The Open-Source Toolkit for Building AI Agents

Curated frameworks, tools, and libraries every developer needs to build functional and efficient AI agents

Nov 28, 2024

June ‘25 update: Released an updated map with new frameworks and repositories since this post was published in Nov ‘24.

AI Agents

The Open-Source Toolkit for Building AI Agents v2

Sahar Mor

Jun 1

An opinionated, developer-first guide to building AI agents with real-world impact

Welcome to a new post in the AI Agents Series - helping AI developers and researchers deploy and make sense of the next step in AI.

My last post explored how the internet will transform for an agent-first future - from websites optimizing for AI interaction through "agent-responsive design" to the emergence of Agent Engine Optimization (AEO) as the next SEO. We saw how tech giants like Google, Apple, OpenAI, and Anthropic are racing to define this next evolution of digital interaction, with Gartner projecting that by 2028, 33% of enterprise software applications will include agentic AI.

Agent-Responsive Design: Rethinking the web for an agentic future

Sahar Mor

November 17, 2024

In this post, I'll outline a curated, though non-exhaustive, overview of the open-source ecosystem for developers creating these AI agents. While numerous market maps exist for AI agents, they often cater more to venture capitalists than builders. Developers need actionable tools and frameworks to launch functional AI agents today.

Which tools do other builders rely on to develop voice agents? What’s the leading open model for document understanding? With new packages emerging almost daily, I’ll focus solely on the libraries I’ve personally found most effective. This list is, therefore, intentionally selective rather than exhaustive.

Every package included here supports commercial use and has a permissive open-source license.

With the holiday season coming, there's no better time to dive into these tools and start building.

Categories covered in this piece:
→ Frameworks for Building and Orchestrating Agents
→ Computer and Browser Use
→ Voice
→ Document Understanding
→ Memory
→ Testing and Evaluation
→ Monitoring and Observability
→ Simulation
→ Vertical Agents

Become a premium AI Tidbits subscriber and get over $1k in free credits to build AI agents with Vapi, Claude, and other leading AI tools (Hugging Face, Deepgram, etc.), along with exclusive access to the LLM Builders series and in-depth explorations of crucial topics, such as the future of the internet in an era driven by AI agents.

Many readers expense the paid membership from their learning and development education stipend.

Frameworks for Building and Orchestrating Agents

Building AI agents requires robust frameworks that can handle complex workflows, memory management, and tool integration. These foundational frameworks serve as the backbone for creating agents that can understand, plan, and execute tasks autonomously.

CrewAI - a framework for orchestrating role-playing, autonomous AI agents
Phidata - build AI assistants with memory, knowledge, and tools
Camel - build customized multi-agent systems to generate data, complete tasks, or simulate real-world interactions
AutoGPT - create, deploy, and manage continuous AI agents that automate complex workflows
AutoGen - develop LLM applications using multiple agents that can converse with each other
SuperAGI - build, manage, and run autonomous AI agents quickly and reliably
Superagent - an open framework for building AI assistants
LangChain & LlamaIndex - the usual suspects, facilitating AI Agents through composability

CrewAI Mind Map — CrewAI supports running customized agents with specific roles, goals, and tools
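To make "understand, plan, and execute" concrete, here is a minimal, framework-free sketch of the loop that frameworks like the ones above wrap for you. Everything in it is hypothetical: fake_model stands in for an LLM call, and the TOOLS registry and run_agent are illustrative names, not any framework's API.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}


def fake_model(task: str, history: list[str]) -> str:
    """Stand-in for an LLM call; returns 'tool:arg' or 'final:answer'."""
    if not history:                    # nothing observed yet: pick a tool
        return f"add:{task}"
    return f"final:{history[-1]}"      # otherwise answer with the last observation


def run_agent(task: str, model=fake_model, max_steps: int = 5) -> str:
    """Loop: ask the model for an action, run the tool, feed the result back."""
    history: list[str] = []
    for _ in range(max_steps):
        kind, _, payload = model(task, history).partition(":")
        if kind == "final":
            return payload
        if kind not in TOOLS:
            history.append(f"error: unknown tool {kind}")
            continue
        history.append(TOOLS[kind](payload))
    return "stopped: step limit reached"


if __name__ == "__main__":
    print(run_agent("2+3"))
```

The step limit is the important design choice here: without it, a model that never emits a final answer would loop forever.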

Computer and Browser Use

For AI agents to be truly useful, they need to interact with computers and browsers just like humans do. These tools enable agents to navigate websites, control applications, and execute commands programmatically, bridging the gap between AI reasoning and real-world actions.

Open Interpreter - turn natural language commands into code that runs on your local machine
Self-Operating Computer - enables multimodal models to operate a computer
Agent-S - an open agentic framework that uses computers like a human
LaVague - create web agents that take actions on websites using LLMs as their reasoning engines
Playwright - a framework for web testing and automation
Puppeteer - a JavaScript library that provides a high-level API to control Chrome or Firefox

Self-Operating Computer generates a poem and saves it in a Google Doc
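One way to bridge model output and browser actions is a small dispatcher that accepts only an allow-list of actions. The sketch below is an assumption-heavy illustration: the action schema (type, url, selector, value) is made up for this example, and PageLike only mirrors the goto, click, and fill method names from Playwright's sync Page API. RecordingPage is a test double, so nothing here needs a real browser.

```python
from typing import Protocol


class PageLike(Protocol):
    """Subset of a browser page; method names mirror Playwright's sync Page API."""
    def goto(self, url: str) -> object: ...
    def click(self, selector: str) -> None: ...
    def fill(self, selector: str, value: str) -> None: ...


def execute(page: PageLike, action: dict) -> None:
    """Translate one model-proposed action (hypothetical schema) into a page call."""
    kind = action.get("type")
    if kind == "goto":
        page.goto(action["url"])
    elif kind == "click":
        page.click(action["selector"])
    elif kind == "fill":
        page.fill(action["selector"], action["value"])
    else:
        # Refuse anything outside the allow-list instead of guessing.
        raise ValueError(f"unsupported action: {kind!r}")


class RecordingPage:
    """Test double that logs calls instead of driving a real browser."""
    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def fill(self, selector: str, value: str) -> None:
        self.calls.append(("fill", selector, value))


if __name__ == "__main__":
    page = RecordingPage()
    for step in [
        {"type": "goto", "url": "https://example.com"},
        {"type": "fill", "selector": "#q", "value": "open-source agents"},
        {"type": "click", "selector": "button[type=submit]"},
    ]:
        execute(page, step)
    print(page.calls)
```

With Playwright installed, a real page object should be usable in place of RecordingPage, since it exposes the same three methods.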

Voice

Voice interfaces represent the most natural way for humans to interact with AI agents. These tools enable the creation of agents that can understand spoken language, maintain context in conversations, and respond with natural-sounding speech, making AI interaction more accessible and intuitive.

Speech2speech

Ultravox - a speech2speech model for real-time voice interaction, superior to Moshi for now
Moshi - a speech2speech model for real-time voice interaction
Pipecat - a framework for voice and multimodal conversational AI, supporting speech2text, text2speech, video, etc.

Speech2text

Whisper - OpenAI's speech2text model
Stable-ts - a lightweight Whisper wrapper with timestamps and more

Speaker diarization 3.1 - pyannote’s flagship model for speaker diarization, i.e. detecting who spoke when
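Transcription and diarization are usually run separately and then merged by timestamp. The sketch below shows one simple way to do that merge: give each transcript segment the speaker whose turn overlaps it the most. The Segment type and assign_speakers function are hypothetical; the real outputs of Whisper, Stable-ts, and pyannote have their own formats and would need converting into something like this first.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    label: str     # transcript text, or a speaker id for diarization turns


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time range shared by two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(words: list[Segment], turns: list[Segment]) -> list[tuple[str, str]]:
    """Pair each transcript segment with the speaker turn it overlaps most."""
    labeled = []
    for w in words:
        best = max(turns, key=lambda t: overlap(w, t), default=None)
        speaker = best.label if best and overlap(w, best) > 0 else "unknown"
        labeled.append((speaker, w.label))
    return labeled


if __name__ == "__main__":
    words = [
        Segment(0.0, 2.0, "Hi, how can I help?"),
        Segment(2.2, 4.0, "I need to change my flight."),
    ]
    turns = [Segment(0.0, 2.1, "SPEAKER_00"), Segment(2.1, 4.5, "SPEAKER_01")]
    for speaker, text in assign_speakers(words, turns):
        print(f"{speaker}: {text}")
```

Segments that overlap no turn are labeled "unknown" rather than guessed, which keeps misattribution visible in the transcript.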

Text2speech

The only decent open model I came across was ChatTTS, which still falls short of production quality. I therefore default to ElevenLabs or Cartesia.

Misc

Vocode - a toolkit for building voice-based LLM agents
Voice Lab - a comprehensive testing and evaluation framework for voice agents across language models, prompts, and agent personas

Document Understanding

Modern AI agents need to process and understand documents in various formats, from PDFs to images with text. These tools provide the crucial ability to extract, comprehend, and act on information from unstructured documents, enabling agents to handle real-world business processes.

Qwen2-VL - a vision language model from Alibaba, reported by its authors to outperform GPT-4o and Claude 3.5 Sonnet on several visual understanding benchmarks
DocOwl2 - an efficient multimodal LLM for OCR-free document understanding

Qwen2-VL excels in document and chart understanding with a commercially permissive license

Memory

Without memory, AI agents are limited to single-turn interactions. These memory tools enable agents to maintain context over long conversations, remember user preferences, and learn from past interactions, making them truly personal assistants rather than just query responders.

Mem0 - provides an efficient, self-improving memory layer for LLMs, enabling personalized AI experiences
Letta (fka MemGPT) - create LLM agents with long-term memory and custom tools
LangChain - offers memory components to manage conversation history and context

Screenshot of the Letta ADE (Agent Development Environment) — Stateful agents with Letta
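To show what "remembering user preferences" means mechanically, here is a deliberately tiny memory layer: store facts per user, then pull back the ones relevant to the current query and prepend them to the prompt. MemoryStore and its methods are hypothetical names, and word overlap is only a crude stand-in for the embedding search a real memory tool would use.

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for embedding similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class MemoryStore:
    """Hypothetical long-term memory: per-user facts, retrieved by word overlap."""
    facts: dict[str, list[str]] = field(default_factory=dict)

    def remember(self, user_id: str, fact: str) -> None:
        self.facts.setdefault(user_id, []).append(fact)

    def recall(self, user_id: str, query: str, k: int = 2) -> list[str]:
        q = tokens(query)
        scored = [(len(q & tokens(f)), f) for f in self.facts.get(user_id, [])]
        # Keep only facts sharing at least one word, best matches first.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [f for _, f in ranked[:k]]


if __name__ == "__main__":
    memory = MemoryStore()
    memory.remember("u1", "Prefers vegetarian food")
    memory.remember("u1", "Lives in Berlin")
    print(memory.recall("u1", "Suggest a vegetarian restaurant"))
```

Capping recall at k items matters in practice: memories compete with the rest of the prompt for context window space.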

Testing and Evaluation

As AI agents become more complex, robust testing becomes critical. These tools help developers evaluate agent performance, identify failure modes, and ensure reliability across different scenarios and environments.

Voice Lab - a comprehensive testing and evaluation framework for voice agents
AgentOps - tools for monitoring and benchmarking agent performance
AgentBench - a benchmark to evaluate LLMs as agents across various environments (Web, Minecraft, Visual Design, etc.)

Demo usage — Test and refine your voice agents with Voice Lab
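At its core, agent evaluation means running the agent over a fixed set of scenarios and recording which ones fail and why. The harness below is a minimal sketch of that idea; Case, evaluate, and the toy agent are hypothetical and not taken from any of the tools listed above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt: str
    check: Callable[[str], bool]   # True when the agent's output is acceptable


def evaluate(agent: Callable[[str], str], cases: list[Case]) -> dict:
    """Run every case, catching crashes so one failure does not hide the rest."""
    failures: list[tuple[str, str]] = []
    for case in cases:
        try:
            ok = case.check(agent(case.prompt))
        except Exception as exc:   # an agent crash counts as a failure
            failures.append((case.prompt, f"raised {type(exc).__name__}"))
            continue
        if not ok:
            failures.append((case.prompt, "check failed"))
    passed = len(cases) - len(failures)
    rate = passed / len(cases) if cases else 0.0
    return {"pass_rate": rate, "failures": failures}


def toy_agent(prompt: str) -> str:
    """Stand-in agent: uppercases input and crashes on refund requests."""
    if "refund" in prompt:
        raise RuntimeError("tool timeout")
    return prompt.upper()


if __name__ == "__main__":
    cases = [
        Case("hello", lambda out: out == "HELLO"),
        Case("refund my order", lambda out: "refund" in out.lower()),
        Case("bye", lambda out: out == "bye"),
    ]
    print(evaluate(toy_agent, cases))
```

Separating "raised an exception" from "check failed" is a small detail that pays off, since the two usually point to different fixes (tooling versus prompting).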

Monitoring and Observability

Understanding how AI agents perform in production is crucial for maintaining reliability and optimizing costs. These tools provide insights into agent behavior, resource usage, and performance metrics essential for running agents at scale.

OpenLLMetry - an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications
AgentOps - agent monitoring, LLM cost tracking, benchmarking, and more

Session Replays — Debug agents with AgentOps
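The core of agent observability is recording, per step, how long it took and how many tokens it consumed. Below is a minimal sketch of that idea using a context manager. The span helper, the SPANS list, and the prices are all made up for illustration; this is not the OpenLLMetry or AgentOps API, and a real setup would export to a tracing backend rather than a list.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink; a real setup would export spans to a backend.
SPANS: list[dict] = []

# Illustrative prices per 1K tokens, not real vendor pricing.
PRICE_PER_1K = {"prompt": 0.001, "completion": 0.002}


@contextmanager
def span(name: str):
    """Record duration, token counts, and estimated cost for one agent step."""
    record = {"name": name, "prompt_tokens": 0, "completion_tokens": 0}
    start = time.perf_counter()
    try:
        yield record
    finally:
        # Runs even if the step raises, so failed steps are still recorded.
        record["seconds"] = time.perf_counter() - start
        record["cost_usd"] = (
            record["prompt_tokens"] / 1000 * PRICE_PER_1K["prompt"]
            + record["completion_tokens"] / 1000 * PRICE_PER_1K["completion"]
        )
        SPANS.append(record)


if __name__ == "__main__":
    with span("plan") as s:
        # In a real agent these counts would come from the LLM response.
        s["prompt_tokens"] = 1000
        s["completion_tokens"] = 500
    print(SPANS)
```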

Simulation

Before deploying agents to real-world scenarios, testing them in controlled environments is crucial. These simulation tools allow developers to validate agent behavior, test edge cases, and refine decision-making capabilities in safe, reproducible environments.

AgentVerse - facilitates the deployment of multiple LLM-based agents in various applications, including simulations
Tau-Bench - a benchmark and testing code for agent-user interactions in real-world domains like retail and airline
ChatArena - multi-agent language game environments for research on autonomous LLM agents
AI Town - a virtual town where AI characters live, chat, and socialize
Generative Agents - Stanford’s interactive simulacra of human behavior

Simulate agentic environments with AgentVerse
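A simulated user is the simplest reproducible environment: script one side of the conversation, run the agent against it, and inspect the transcript. The sketch below assumes a made-up convention where the simulated user sends DONE once its goal is met; simulate, scripted_user, and toy_agent are hypothetical names, not the interface of any benchmark listed above.

```python
from typing import Callable

# A policy maps the transcript so far to the next message.
Policy = Callable[[list[str]], str]


def simulate(agent: Policy, user: Policy, max_turns: int = 6) -> list[str]:
    """Alternate user and agent messages until the user says DONE or turns run out."""
    transcript: list[str] = []
    for _ in range(max_turns):
        user_msg = user(transcript)
        transcript.append(f"user: {user_msg}")
        if user_msg == "DONE":
            break
        transcript.append(f"agent: {agent(transcript)}")
    return transcript


def scripted_user(transcript: list[str]) -> str:
    """Simulated airline customer: asks once, then ends when the booking is confirmed."""
    if not transcript:
        return "Please rebook my flight to Friday."
    return "DONE" if "confirmed" in transcript[-1] else "Is it confirmed?"


def toy_agent(transcript: list[str]) -> str:
    """Stand-in for the agent under test."""
    return "Your Friday flight is confirmed."


if __name__ == "__main__":
    for line in simulate(toy_agent, scripted_user):
        print(line)
```

Because both sides are deterministic functions, the same run can be replayed exactly, which is the property that makes edge cases debuggable.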

Vertical Agents

There are dozens of open vertical agents out there, so here are just a few select ones I’ve tinkered with and found the most useful:

OpenHands (Coding) - a platform for software development agents powered by AI
aider (Coding) - pair programming in your terminal
GPT Engineer (Low code) - build applications using natural language. Specify what you want to build, and the AI will ask for clarification before building it.
screenshot-to-code - convert screenshots into a functioning website using HTML/Tailwind/React/Vue
GPT Researcher (Research) - an autonomous agent that performs comprehensive research on any given topic
Vanna (SQL) - chat with your SQL database

Aider brings pair programming to your terminal

Looking Ahead

While this post focused on open-source packages with permissive licenses, I plan to publish another comprehensive list specifically for engineers building voice agents. This upcoming guide will include both open-source and commercial tools, covering solutions like OpenAI's Realtime API (speech2speech) and ElevenLabs (text2speech), along with detailed comparisons of their capabilities, pricing models, and ideal use cases.

Stay tuned for more deep dives in the AI Agents Series.

Comprehensive list of open-source packages for AI engineers (last update: Aug ‘23)
