LinkedIn Highlights, Sep 2024

A Perplexity-like open-source package, ready-to-run notebooks for advanced RAG techniques, a GPT-powered OCR, Anthropic's tool to refine prompts, and a novel RAG method

Oct 13, 2024

Introducing: AI Tidbits LinkedIn Highlights

Welcome to a new AI Tidbits series! Each month, I'll share my five top-performing LinkedIn posts, bringing you the best of AI straight from the frontlines of academia and industry.

As a frequent LinkedIn contributor, I regularly share insights on groundbreaking papers, promising open-source packages, and significant AI product launches. These posts offer more depth and detail than our weekly snippets, providing a comprehensive look at the latest AI developments.

Whether you aren't on LinkedIn or simply missed a post, this monthly roundup keeps you informed about the most impactful AI news and innovations.

1. MindSearch

A new open-source search engine is rivaling top-tier AI products like Perplexity.ai Pro and ChatGPT-Web.

MindSearch is an innovative AI search engine framework that combines LLMs and a multi-agent system to tackle three critical issues that often limit LLM-powered search engines:

LLMs struggle to decompose complex queries into simpler, actionable requests
Search results often contain too much noise, making it hard to filter and extract relevant information
Iterative searches can quickly overload the LLM’s input length capacity

MindSearch utilizes two main components:

WebPlanner – decomposes complex queries into sub-tasks and creates a dynamic graph structure for problem-solving
WebSearcher – conducts fine-grained searches and delivers summarized information back to WebPlanner for further refinement

This approach allows MindSearch to handle massive web content (e.g., more than 300 pages) effectively, surpassing limitations faced by traditional LLM-based search systems.
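The planner/searcher split can be illustrated with a minimal sketch. This is not MindSearch's actual code: `PlanNode`, `run_plan`, and `toy_searcher` are hypothetical names, and the searcher is a stub standing in for web search plus LLM summarization. The point it shows is that each sub-question only receives the short summaries of its parents, which is how context stays small even when many pages are read.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PlanNode:
    """One sub-question in the planner's graph (hypothetical structure)."""
    question: str
    depends_on: List[str] = field(default_factory=list)  # names of parent nodes
    summary: str = ""  # filled in by the searcher


def run_plan(
    graph: Dict[str, PlanNode],
    searcher: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Resolve nodes in dependency order, passing only parent summaries forward."""
    done: Dict[str, str] = {}
    pending = dict(graph)
    while pending:
        # A node is ready once every node it depends on has a summary.
        ready = [name for name, node in pending.items()
                 if all(dep in done for dep in node.depends_on)]
        if not ready:
            raise ValueError("cycle or missing dependency in plan graph")
        for name in ready:
            node = pending.pop(name)
            context = [done[dep] for dep in node.depends_on]
            node.summary = searcher(node.question, context)
            done[name] = node.summary
    return done


def toy_searcher(question: str, context: List[str]) -> str:
    """Stub: a real searcher would query the web and summarize with an LLM."""
    return f"summary({question}; uses {len(context)} parent summaries)"


plan = {
    "a": PlanNode("What is the first sub-question?"),
    "b": PlanNode("What follows from it?", depends_on=["a"]),
    "answer": PlanNode("Combine the findings", depends_on=["a", "b"]),
}
results = run_plan(plan, toy_searcher)
print(results["answer"])
```

In this sketch the graph is fixed up front; the technical report describes the planner building it dynamically as results come in.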

According to subjective evaluations from human experts, MindSearch's answers outperform those of ChatGPT-Web and Perplexity.ai Pro in depth, breadth, and factual accuracy. The technical report also evaluates it on both open-set and closed-set QA tasks.

Technical report https://arxiv.org/abs/2407.20183
Code https://github.com/InternLM/MindSearch

2. Advanced RAG techniques

A new GitHub repository provides the most comprehensive RAG tutorials you’ll find, showcasing advanced techniques to enhance the accuracy, efficiency, and contextual richness of RAG systems.

The repository offers easy-to-start notebooks covering methods like:

Reliable RAG – refining and validating retrieved information for better accuracy
Proposition Chunking – breaking down text into meaningful sentences for improved control over query handling
Query Transformations – optimizing queries by rewriting and decomposing complex ones into sub-queries
Semantic Chunking – dividing documents based on semantic coherence for more meaningful retrieval
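To make one of these concrete, here is a minimal sketch of semantic chunking. It is not taken from the repository's notebooks: `embed`, `cosine`, and `semantic_chunks` are hypothetical names, the 0.2 threshold is arbitrary, and the bag-of-words "embedding" is a toy stand-in for a real embedding model. The idea it illustrates is starting a new chunk wherever adjacent sentences stop being similar.

```python
import math
import re
from collections import Counter
from typing import List


def embed(sentence: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Start a new chunk when a sentence is dissimilar to the one before it."""
    chunks: List[List[str]] = []
    for sentence in sentences:
        if chunks and cosine(embed(chunks[-1][-1]), embed(sentence)) >= threshold:
            chunks[-1].append(sentence)
        else:
            chunks.append([sentence])
    return chunks


doc = [
    "Cats sleep most of the day.",
    "Cats also sleep at night.",
    "Stock markets fell sharply today.",
    "Stock markets may fall again tomorrow.",
]
chunks = semantic_chunks(doc)
for chunk in chunks:
    print(chunk)
```

With real embeddings the threshold is usually tuned per corpus, or replaced by a percentile cutoff over all adjacent-sentence distances.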

GitHub repo https://github.com/NirDiamant/RAG_Techniques

3. Zerox OCR

OCR just got simpler thanks to Zerox OCR, a dead simple open-source solution for extracting text from documents for AI ingestion.

Documents are visual by nature, filled with tricky layouts, tables, and charts, making vision models the perfect fit. Zerox uses GPT-4o mini to turn visual documents into text, à la OCR.

The process is straightforward:

Feed in a PDF
The PDF is converted into a series of images
Each image is sent to GPT, which is asked to convert it to Markdown
The responses for all images are aggregated into a single, cohesive Markdown file
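The four steps above can be sketched as a small pipeline. This is an illustration, not Zerox's API: `pdf_to_markdown`, `render_pages`, `transcribe_page`, and the prompt wording are all assumptions, and the two `fake_*` functions are stubs so the example runs without a PDF renderer or an API key. In practice, `render_pages` would wrap a PDF-to-image library and `transcribe_page` would call a vision model.

```python
from typing import Callable, List, Sequence

# Assumed prompt wording; the real project may phrase this differently.
PROMPT = "Convert this page to Markdown. Return only the Markdown."


def pdf_to_markdown(
    pdf_path: str,
    render_pages: Callable[[str], Sequence[bytes]],
    transcribe_page: Callable[[bytes, str], str],
) -> str:
    """Render each page to an image, transcribe it, and join results in page order."""
    pages: List[str] = []
    for image in render_pages(pdf_path):
        markdown = transcribe_page(image, PROMPT).strip()
        if markdown:  # skip pages that come back empty
            pages.append(markdown)
    return "\n\n".join(pages)


# Hypothetical stand-ins for the renderer and the vision-model call.
def fake_render(path: str) -> Sequence[bytes]:
    return [b"page-1-png", b"page-2-png"]


def fake_transcribe(image: bytes, prompt: str) -> str:
    return f"# {image.decode()}\n"


result = pdf_to_markdown("report.pdf", fake_render, fake_transcribe)
print(result)
```

Passing the renderer and the model call in as functions keeps the aggregation logic testable and makes it easy to swap models.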

While it may sound basic, Zerox OCR with gpt-4o-mini is cost-effective and delivers superior results compared to existing specialized solutions like AWS Textract, Google Document AI, and Azure Document AI.

Try it out https://github.com/getomni-ai/zerox

4. Anthropic’s metaprompt
