LinkedIn Highlights, Oct 2024
Turning unstructured docs into structured data with language models, a platform for generating Windows-operating agents, open-source text-to-speech, answering questions over databases, and LLM security
Something different today: Rather than our usual Thursday roundup, I'll take a slight detour to share some in-depth insights about AI Agents that have occupied my mind lately. For the next two weeks, expect more of Sahar's 2¢ pieces.
This deep dive series will probably launch next week, so I'm fast-tracking the LinkedIn Highlights post.
Welcome to LinkedIn Highlights!
Each month, I'll share my five top-performing LinkedIn posts, bringing you the best of AI straight from the frontlines of academia and industry.
As a frequent LinkedIn contributor, I regularly share insights on groundbreaking papers, promising open-source packages, and significant AI product launches. These posts offer more depth and detail than our weekly snippets, providing a comprehensive look at the latest AI developments.
Whether you aren't on LinkedIn or simply missed a post, this monthly roundup keeps you informed about the most impactful AI news and innovations.
1. Sparrow
A new open-source project called Sparrow simplifies the challenging task of extracting structured data from unstructured documents like forms, invoices, and images using machine learning and LLM pipelines.
Its modular and pluggable architecture lets you seamlessly integrate tools like LlamaIndex, Haystack, and Unstructured for customizable data processing workflows. Whether you're processing PDFs or extracting content from images, Sparrow provides independent agents for each task.
Sparrow's standout feature is that you can build and deploy LLM agents through a simple API, which keeps integration with your existing systems straightforward. It also supports running LLMs locally with Ollama or Apple MLX.
Key agents include:
llamaindex - PDF processing with LlamaIndex
vprocessor - OCR + LlamaIndex for image processing
haystack - PDF processing with Haystack
unstructured-light - PDF and image processing with Unstructured and LangChain
GitHub repo https://github.com/katanaml/sparrow
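To make the "pluggable agents" idea concrete, here is a minimal sketch of a registry that routes a document to a named agent. This is not Sparrow's actual code or API: the agent names come from the list above, while `register_agent`, `run_agent`, and the returned dictionary shape are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

# Hypothetical registry: maps an agent name to a function that processes a file path.
AgentFn = Callable[[str], dict]
AGENTS: Dict[str, AgentFn] = {}


def register_agent(name: str) -> Callable[[AgentFn], AgentFn]:
    """Decorator that plugs a processing function into the registry under `name`."""
    def decorator(fn: AgentFn) -> AgentFn:
        AGENTS[name] = fn
        return fn
    return decorator


@register_agent("llamaindex")
def llamaindex_agent(path: str) -> dict:
    # Placeholder body: a real agent would parse the PDF and query an LLM here.
    return {"agent": "llamaindex", "source": path, "fields": {}}


@register_agent("vprocessor")
def vprocessor_agent(path: str) -> dict:
    # Placeholder body: a real agent would run OCR on the image first.
    return {"agent": "vprocessor", "source": path, "fields": {}}


def run_agent(name: str, path: str) -> dict:
    """Look up the requested agent by name and run it on the given file."""
    if name not in AGENTS:
        raise ValueError(f"Unknown agent {name!r}; available: {sorted(AGENTS)}")
    return AGENTS[name](path)


result = run_agent("llamaindex", "invoice.pdf")
```

The point of the design is that adding a new backend only means registering one more function; the calling code never changes.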
2. Windows Agent Arena
Are AI Agents coming to Windows? Microsoft just released an open-source project for developers to build autonomous agents for its Windows operating system.
As part of the release, Microsoft open-sourced OmniParser, the current top-performing screen understanding model in their benchmark.
A ready-made Windows OS environment lets agents be evaluated under real-world conditions. Microsoft also integrated it with Azure ML so multiple agents can run in parallel and complete their tasks in minutes rather than days, thanks to cloud scaling.
Code https://github.com/microsoft/WindowsAgentArena
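A quick back-of-the-envelope sketch shows why parallel workers turn days into minutes. The task count, per-task duration, and worker count below are hypothetical numbers picked for illustration, not figures from the Windows Agent Arena release.

```python
import math


def estimated_wall_clock_minutes(num_tasks: int, minutes_per_task: float, num_workers: int) -> float:
    """Estimate total run time when equally long tasks are spread evenly over workers."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Each "round" runs up to num_workers tasks at the same time.
    rounds = math.ceil(num_tasks / num_workers)
    return rounds * minutes_per_task


# Hypothetical workload: 150 tasks at 20 minutes each.
serial_minutes = estimated_wall_clock_minutes(150, 20, num_workers=1)     # one machine
parallel_minutes = estimated_wall_clock_minutes(150, 20, num_workers=50)  # 50 cloud workers
```

Under these assumptions the serial run takes 3,000 minutes (about two days), while 50 workers finish in 60 minutes. Real runs will vary because tasks differ in length and cloud machines take time to start.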
3. ChatTTS
A new breakthrough in text-to-speech technology is here: ChatTTS.
Explicitly designed for dialogue-based scenarios like LLM assistants, ChatTTS pushes the boundaries of conversational AI with the ability to generate natural, expressive speech.
ChatTTS is optimized for multi-speaker dialogue tasks, making it ideal for AI assistants and interactive conversation models. The model also allows fine-grained prosody control, such as pauses, laughter, and interjections, significantly enhancing the expressiveness of synthesized speech.
Its ability to predict and replicate natural speech patterns surpasses that of many open-source TTS models.
ChatTTS was trained on 100,000+ hours of English and Chinese audio and is open-sourced for research use.
Repo https://github.com/2noise/ChatTTS
Example notebook https://github.com/2noise/ChatTTS/blob/main/examples/ipynb/example.ipynb
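For a feel of what prosody control looks like in practice, here is a minimal sketch. It assumes the control tokens and calls documented in the ChatTTS README at the time of writing (`[oral_N]`, `[laugh_N]`, `[break_N]`, `[uv_break]`, `[laugh]`, `[lbreak]`, `ChatTTS.Chat`, `load`, `infer`, `RefineTextParams`); these may differ between versions, so check the repo before relying on them. The helper names `build_refine_prompt` and `synthesize`, and the value ranges the helper enforces, are my own assumptions.

```python
from typing import List


def build_refine_prompt(oral: int = 2, laugh: int = 0, pause: int = 6) -> str:
    """Build a sentence-level prosody prompt such as '[oral_2][laugh_0][break_6]'."""
    # Assumed upper bounds for each control value (lower bound is 0).
    limits = {"oral": (oral, 9), "laugh": (laugh, 2), "break": (pause, 7)}
    for name, (value, upper) in limits.items():
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}, got {value}")
    return "".join(f"[{name}_{value}]" for name, (value, _) in limits.items())


def synthesize(texts: List[str], prompt: str):
    """Run ChatTTS with the given prosody prompt (needs the package and model weights)."""
    import ChatTTS  # imported lazily so the helper above works without the package

    chat = ChatTTS.Chat()
    chat.load(compile=False)  # assumption: older releases used a different loader name
    params = ChatTTS.Chat.RefineTextParams(prompt=prompt)
    return chat.infer(texts, params_refine_text=params)


# Word-level control: tokens are placed inline where the pause or laugh should occur.
word_level_text = "What is [uv_break]your favorite English food?[laugh][lbreak]"
prompt = build_refine_prompt(oral=2, laugh=0, pause=6)
```

Calling `synthesize([word_level_text], prompt)` should return the generated waveforms, which you can then save with an audio library of your choice.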
4. Table-Augmented Generation