AI Tidbits

LinkedIn Highlights, Oct 2024
Monthly's

Turning unstructured docs into structured data with language models, a platform for building agents that operate Windows, open-source text-to-speech, answering questions over databases, and LLM security

Sahar Mor
Nov 07, 2024

Something different today: rather than our usual Thursday roundup, I'm taking a slight detour to share in-depth insights about AI agents that have been on my mind lately. For the next two weeks, expect more of Sahar's 2¢ pieces.

This deep-dive series will probably launch next week, so I'm fast-tracking this LinkedIn Highlights post.


Welcome to LinkedIn Highlights!

Each month, I'll share my five top-performing LinkedIn posts, bringing you the best of AI straight from the frontlines of academia and industry.

As a frequent LinkedIn contributor, I regularly share insights on groundbreaking papers, promising open-source packages, and significant AI product launches. These posts offer more depth and detail than our weekly snippets, providing a comprehensive look at the latest AI developments.

Whether you're not on LinkedIn or simply missed a post, this monthly roundup ensures you stay informed about the most impactful AI news and innovations.


1. Sparrow

Sparrow, a new open-source project, uses machine learning and LLM pipelines to simplify the challenging task of extracting structured data from unstructured documents such as forms, invoices, and images.

Its modular, pluggable architecture lets you integrate tools like LlamaIndex, Haystack, and Unstructured into customizable data-processing workflows. Whether you're processing PDFs or extracting content from images, Sparrow provides a dedicated agent for each task.

Sparrow's standout feature is letting users build and deploy LLM agents through a simple API, which makes integrating it into existing systems straightforward. It even supports running LLMs locally with Ollama or Apple MLX.

Key agents include:

  • llamaindex - PDF processing with LlamaIndex

  • vprocessor - OCR + LlamaIndex for image processing

  • haystack - PDF processing with Haystack

  • unstructured-light - PDF and image processing with Unstructured and LangChain

GitHub repo https://github.com/katanaml/sparrow
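
The "one agent per task" design can be illustrated with a small plug-in registry. The Python below is a minimal sketch, not Sparrow's actual API: the agent names `pdf-stub` and `image-stub` and the functions `register` and `extract` are hypothetical, and the stub agents return empty fields where a real pipeline would call LlamaIndex, Haystack, or an OCR step.

```python
from typing import Callable, Dict

# Hypothetical sketch of a pluggable agent registry; this is NOT Sparrow's real API.
# Each "agent" takes a document path and returns extracted fields as a dict.
Agent = Callable[[str], dict]
AGENTS: Dict[str, Agent] = {}


def register(name: str) -> Callable[[Agent], Agent]:
    """Decorator that adds an agent to the registry under `name`."""
    def decorator(fn: Agent) -> Agent:
        if name in AGENTS:
            raise ValueError(f"agent {name!r} already registered")
        AGENTS[name] = fn
        return fn
    return decorator


@register("pdf-stub")
def pdf_stub(path: str) -> dict:
    # Placeholder: a real agent would run a PDF pipeline (e.g. LlamaIndex or Haystack).
    return {"agent": "pdf-stub", "source": path, "fields": {}}


@register("image-stub")
def image_stub(path: str) -> dict:
    # Placeholder: a real agent would run OCR first, then pass the text to an LLM.
    return {"agent": "image-stub", "source": path, "fields": {}}


def extract(agent_name: str, path: str) -> dict:
    """Dispatch a document to the named agent, failing loudly on unknown names."""
    try:
        agent = AGENTS[agent_name]
    except KeyError:
        raise ValueError(
            f"unknown agent {agent_name!r}; available: {sorted(AGENTS)}"
        ) from None
    return agent(path)


if __name__ == "__main__":
    print(extract("pdf-stub", "invoice.pdf"))
```

Adding a new document type then means registering one more function, without touching the dispatch code, which is the property a pluggable architecture is after.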


2. Windows Agent Arena

Are AI agents coming to Windows? Microsoft just released an open-source project for developers to build and evaluate autonomous agents that operate its Windows operating system.

As part of the release, Microsoft open-sourced OmniParser, the current top-performing screen-understanding model in their benchmark.

A ready-to-use Windows OS environment lets developers test agents under real-world conditions. Microsoft also integrated it with Azure ML, so multiple agents can run in parallel in the cloud and complete their tasks in minutes rather than days.

Code https://github.com/microsoft/WindowsAgentArena
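
To make the parallel-execution idea concrete, here is a minimal Python sketch, assuming a hypothetical `run_episode` function that stands in for one agent episode on one Windows VM. It is not Windows Agent Arena's harness or Azure ML's API; it only shows why fanning independent tasks out to many workers cuts wall-clock time roughly in proportion to the number of workers.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_episode(task_id: str) -> str:
    """Hypothetical stand-in for one agent episode on one Windows VM."""
    time.sleep(0.1)  # pretend the agent is clicking through the UI
    return "done"


def run_all(task_ids: List[str], workers: int) -> Dict[str, str]:
    # Each worker plays the role of one VM; `workers` episodes run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(task_ids, pool.map(run_episode, task_ids)))


if __name__ == "__main__":
    tasks = [f"task-{i}" for i in range(8)]
    for workers in (1, 8):
        start = time.perf_counter()
        results = run_all(tasks, workers)
        elapsed = time.perf_counter() - start
        print(f"{workers} worker(s): {len(results)} tasks in {elapsed:.2f}s")
```

With one worker the eight episodes run back to back; with eight they overlap. In a real benchmark each worker would be a separate cloud VM rather than a thread.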


Become a premium member to get full access to my content and $1k+ in free credits for leading AI tools and APIs, including Claude, Hugging Face, and Deepgram. It’s common to expense the paid membership from your company’s learning and development stipend.


3. ChatTTS

A new breakthrough in text-to-speech technology is here: ChatTTS. Designed explicitly for dialogue-based scenarios like LLM assistants, ChatTTS pushes the boundaries of conversational AI by generating natural, expressive speech.

ChatTTS is optimized for multi-speaker dialogue tasks, making it ideal for AI assistants and interactive conversation models. The model also allows fine-grained prosody control, such as pauses, laughter, and interjections, significantly enhancing the expressiveness of synthesized speech.

In terms of prosody, its ability to predict and replicate natural speech patterns surpasses that of many open-source TTS models.

ChatTTS was trained on 100,000+ hours of English and Chinese audio and is open-sourced for research use.

Repo https://github.com/2noise/ChatTTS
Example notebook https://github.com/2noise/ChatTTS/blob/main/examples/ipynb/example.ipynb
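
The fine-grained prosody control works by embedding control tokens in the input text. The ChatTTS README's word-level example uses the string `'What is [uv_break]your favorite english food?[laugh][lbreak]'`; the hypothetical `annotate` helper below (not part of the ChatTTS package) builds such strings from (text, event) pairs. The token names are taken from that README example and may change between releases, so check them against the repo.

```python
from typing import Iterable, Tuple

# Control tokens copied from the ChatTTS README's word-level control example.
# Assumption: these names may change between releases; verify against the repo.
TOKENS = {"pause": "[uv_break]", "laugh": "[laugh]", "end": "[lbreak]"}


def annotate(segments: Iterable[Tuple[str, str]]) -> str:
    """Join (text, event) pairs into one string with ChatTTS-style control tokens.

    Hypothetical helper, not part of the ChatTTS package. `event` is a key of
    TOKENS, or "" to append no token after that segment.
    """
    parts = []
    for text, event in segments:
        parts.append(text)
        if event:
            if event not in TOKENS:
                raise ValueError(f"unknown event {event!r}; expected one of {sorted(TOKENS)}")
            parts.append(TOKENS[event])
    return "".join(parts)


if __name__ == "__main__":
    text = annotate([
        ("What is ", "pause"),
        ("your favorite english food?", "laugh"),
        ("", "end"),
    ])
    print(text)  # What is [uv_break]your favorite english food?[laugh][lbreak]
```

In the README, a string like this is passed to `chat.infer(...)` with `skip_refine_text=True` so the model keeps the manual tokens; treat that call as something to verify against the current example notebook.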


4. Table-Augmented Generation
