I come from the DevOps side of the world, and AI/ML was new to me until I joined my current company, SingleStore. It's been 8 months since I joined, and it is going amazingly well; I've been learning all the cool new things in AI/ML from various blogs, tutorials, and tools. When I joined, I learned that we were investing heavily in AI/ML, so I started looking for tools that could simplify the learning process and let me get hands-on. We had plenty of content and material to start with, but I wanted to see what else was happening in the industry and do my own research.
Like I said, it's been 8 months already. Throughout my journey here, from conducting webinars and speaking at conferences to writing blogs on emerging tech trends, I've stumbled upon a collection of indispensable tools. In this article, I'll share them: emerging programming languages, AI frameworks, vector databases, and development tools that ease the creation of AI/ML applications. Let's get started.
1. Programming Language
Wing
I played with many languages, and what I found was that most of them are overhyped. Then I came across a community discussing a new language built for cloud and AI applications. That was the first time I tried the Wing programming language, and I found it pretty impressive.
You might ask: why Wing? Wing offers a unified programming model that consolidates infrastructure and application code within a cohesive framework. This approach lets developers streamline their workflow, eliminating constant context switching and significantly enhancing productivity and creativity.
This is exactly what you need while building AI/ML applications: focusing on the core features rather than the underlying infrastructure. I encountered their OpenAI Joker application, which generates jokes and translates them into different languages, and it was amazing to see how smooth the entire framework was. Just a note that Wing is still under active development.
You can build AI/ML applications with minimal code. Let's see how the Joker application works. The application generates jokes using OpenAI and translates them into different languages. A comedian, an OpenAI assistant, generates jokes; whenever a joke is generated, it is stored in a bucket. There are two translators, Spanish and Hebrew, which subscribe to a topic. Whenever a joke is generated, they receive it, translate it, and store the translated joke in the bucket as well. Pretty simple.
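To make the flow concrete, here is a plain-Python sketch of the Joker app's publish/subscribe pattern. All the names are illustrative stand-ins: the real app uses Wing's cloud bucket and topic resources plus OpenAI calls, not a dictionary and a list.

```python
# Illustrative sketch of the Joker pub/sub flow (names are made up;
# the real app uses Wing cloud resources and the OpenAI API)
bucket = {}

def spanish_translator(joke):
    bucket["joke-es"] = f"[es] {joke}"   # stand-in for a real translation call

def hebrew_translator(joke):
    bucket["joke-he"] = f"[he] {joke}"

# Translators "subscribe" to the topic
subscribers = [spanish_translator, hebrew_translator]

def generate_joke():
    joke = "Why did the model overfit? It memorized the punchline."
    bucket["joke"] = joke                # comedian stores the joke
    for translate in subscribers:        # topic fans the joke out
        translate(joke)
    return joke

generate_joke()
print(sorted(bucket))
```

The key idea is decoupling: the comedian never calls the translators directly; the topic delivers the joke to whoever subscribed.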
Below is a technical overview image of this example of how to use OpenAI’s API with Wing.
If you haven't tried Winglang yet, you can install it with a single command:
npm install -g winglang
Copy the code to your local computer using this git clone command
git clone https://github.com/winglang/wing.git
Go to the examples folder, then run the following commands in the terminal:
npm install
wing it
Invoke the "START HERE" function, and see the results in the "Joke Store".
2. Vector Data Storage and Analysis Tool
SingleStore & Notebooks
For AI/ML applications, you need a database that can store unstructured data as vectors. I joined SingleStore almost 8 months back, new to AI/ML and vector databases. With all the hype around them, I started learning and found that SingleStore is a strong addition to the industry as a vector database. It isn't just used to store vector data; companies use SingleStore for real-time analytics as well. Vector storage combined with real-time analytics: that's a superpower.
Let me introduce SingleStore: it is a cloud-based relational database management system (RDBMS) designed for data-intensive applications, known for its speed in data ingestion, transaction processing, and query processing. SingleStore has supported vector storage since 2017.
SingleStore's Notebook feature is based on the popular Jupyter Notebook, which is widely used in data science and machine learning communities. The SingleStore Notebook extends the capabilities of Jupyter Notebook to enable data professionals to easily work with SingleStore's distributed SQL database while providing great extensibility in language and data sources.
Try SingleStore and get $600 worth of free credits.
3. Data Manipulation and Analysis Tool
I just love working with data and running experiments on publicly available datasets like the Wine and Titanic datasets, to name a few. I was mesmerized by the capabilities of NumPy and Pandas for exploring data and arriving at different solutions.
Pandas and NumPy are two of the most popular libraries in the Python ecosystem for data analysis and scientific computing.
Pandas and NumPy
At the heart of any AI/ML application is data. Tools like Pandas and NumPy are foundational for data manipulation and analysis in Python. Pandas provides high-level data structures and operations for manipulating numerical tables and time series, making it ideal for preprocessing and cleaning data before it is used to train models. NumPy adds support for large, multi-dimensional arrays and matrices, along with a large collection of mathematical functions to operate on them, which is crucial for performance-heavy operations in data preprocessing and model training.
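Here is a minimal sketch of the kind of preprocessing described above: filling missing values with Pandas and standardizing a column with NumPy. The tiny dataset and column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy dataset with a missing value (columns are illustrative)
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# Fill missing ages with the median, a common cleaning step
df["age"] = df["age"].fillna(df["age"].median())

# Standardize the fare column (zero mean, unit variance)
df["fare_scaled"] = (df["fare"] - df["fare"].mean()) / df["fare"].std()

print(df)
```

The same two libraries scale from toy frames like this one to the feature tables you feed into a training pipeline.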
4. AI & Machine Learning Frameworks
I have used TensorFlow and PyTorch, and recently came across LangChain and LlamaIndex. I was impressed by how they give AI/ML engineers the toolkit they need (APIs, vector storage integrations, logic and reasoning components) to build robust applications. Let's go through them one by one to see their superpowers.
TensorFlow and PyTorch
TensorFlow, developed by Google, and PyTorch, developed by Facebook, are two of the most popular frameworks for building and training complex machine learning models. TensorFlow is known for its flexibility and robust scalability, making it suitable for both research prototypes and production deployments. PyTorch is praised for its ease of use, simplicity, and dynamic computational graph that allows for more intuitive coding of complex AI models. Both frameworks support a wide range of AI models, from simple linear regression to complex deep neural networks.
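To show what PyTorch's dynamic style feels like, here is a minimal sketch that fits a one-parameter linear model to synthetic data, assuming torch is installed; the data and hyperparameters are arbitrary.

```python
import torch

# Synthetic data: y = 2x + 1 plus a little noise
torch.manual_seed(0)
x = torch.linspace(0, 1, 100).unsqueeze(1)
y = 2 * x + 1 + 0.01 * torch.randn(x.shape)

model = torch.nn.Linear(1, 1)                      # one weight, one bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for _ in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)   # forward pass builds the graph on the fly
    loss.backward()               # autograd computes gradients
    optimizer.step()

print(model.weight.item(), model.bias.item())
```

The learned weight and bias should land near 2 and 1. The same loop structure scales up to deep networks; only the model definition changes.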
LangChain
Developed by Harrison Chase and debuted in October 2022, LangChain is an open-source framework for building robust applications powered by large language models, from chatbots like ChatGPT to various tailor-made applications.
LangChain seeks to equip data engineers with an all-encompassing toolkit for applying LLMs to diverse use cases such as chatbots, automated question answering, text summarization, and beyond.
LlamaIndex
LlamaIndex is an advanced orchestration framework designed to amplify the capabilities of LLMs like GPT-4. While LLMs are inherently powerful, having been trained on vast public datasets, they often lack the means to interact with private or domain-specific data.
LlamaIndex bridges this gap, offering a structured way to ingest, organize, and harness various data sources — including APIs, databases, and PDFs. By indexing this data into formats optimized for LLMs, LlamaIndex facilitates natural language querying, enabling users to seamlessly converse with their private data without the need to retrain the models.
5. Deep Learning Model
As a beginner, I was looking for something simple and flexible for developing deep learning models and that is when I found Keras. Many AI/ML professionals appreciate Keras for its simplicity and efficiency in prototyping and developing deep learning models, making it a preferred choice, especially for beginners and for projects requiring rapid development.
Keras
For developers looking for a high-level neural networks API, Keras, which is now integrated into TensorFlow, offers a simpler interface to build and train deep learning models. Keras abstracts away much of the complexity of building neural networks, making it accessible for beginners while still being powerful enough for research.
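The snippet below sketches how little code Keras needs to define and compile a small classifier, assuming TensorFlow is installed; the layer sizes are arbitrary, and the forward pass runs on random data just to show the output shape.

```python
import numpy as np
from tensorflow import keras

# A tiny fully-connected classifier: 4 inputs, 3 output classes
model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Forward pass on random data to inspect the output shape
probs = model.predict(np.random.rand(2, 4), verbose=0)
print(probs.shape)
```

Each row of `probs` is a softmax distribution over the 3 classes; training would just be a `model.fit(x, y)` call on real data.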
6. Development and Version Control Platforms
GitHub and DVC
Collaboration and version control are crucial in AI/ML development projects due to the iterative nature of model development and the need for reproducibility. GitHub is the leading platform for source code management, allowing teams to collaborate on code, track issues, and manage project milestones. DVC (Data Version Control) complements Git by handling large data files, data sets, and machine learning models that Git can't manage effectively, enabling version control for the data and model files used in AI projects.
7. AI Model Deployment and Monitoring
I built some AI/ML applications, but how and where do I deploy them? That is where my mind jumped to the two main tools in this category: Docker and Kubernetes. As I said before, I come from the DevOps world, and I have already worked with these tools and know how they work. Docker containerizes your application; Kubernetes deploys and runs it at scale.
Docker and Kubernetes
Deploying AI models into production requires tools that can package applications and manage them at scale. Docker simplifies the deployment of AI applications by containerizing them, ensuring that the application runs smoothly in any environment. Kubernetes, an orchestration system for Docker containers, allows for the automated deployment, scaling, and management of containerized applications, essential for AI applications that need to scale across multiple servers or cloud environments.
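As a sketch of how these two fit together, here is a minimal Dockerfile for a Python inference service and a Kubernetes Deployment that runs it in three replicas. Every name here (files, image registry, port) is a placeholder for illustration.

```dockerfile
# Dockerfile: package a Python inference service (file names are placeholders)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "serve.py"]
```

```yaml
# Kubernetes Deployment: run three replicas of the containerized model
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
        - name: model-server
          image: registry.example.com/model-server:latest  # placeholder image
          ports:
            - containerPort: 8080
```

Docker produces the image; Kubernetes keeps the declared number of copies running and restarts them if they fail.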
8. Cloud Platforms for AI
You cannot scale anything without cloud platforms like AWS, Google Cloud, and Azure. While AWS is my favorite, I have also explored the other options and cover all three major cloud providers here.
AWS, Google Cloud, and Azure
Cloud platforms like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure offer a range of AI and machine learning services that abstract away much of the infrastructure required to train and deploy AI models. These platforms provide managed services for machine learning model training, deployment, and monitoring, along with a vast array of computational resources scalable to the needs of any AI project.
9. Specialized AI Development Tools
While I still prefer SingleStore's Notebook feature, I know most of you already use Jupyter Notebooks for data exploration and analysis. I use Jupyter Notebooks occasionally, and another interesting tool is MLflow, which helps with the end-to-end ML workflow.
Jupyter Notebooks
For exploratory data analysis, model development, and documentation, Jupyter Notebooks are an indispensable tool. They allow developers to create and share documents that contain live code, equations, visualizations, and narrative text, making it an excellent tool for collaborative AI research and development.
MLflow
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It includes features for experiment tracking, model versioning, and deployment, enabling developers to track and compare experiments, package models into reproducible runs, and manage model deployment across multiple environments.
Some final thoughts
The AI/ML landscape is growing like an ocean. Every day, another language model makes its debut carrying high expectations. Many other great developer tools could be part of this list, but from my personal experience, these are a good starting point for any AI/ML engineer building applications.
Let me know what your current favorite tools in the AI/ML arena are.