Welcome to the weekly edition of AI Tidbits, where I curate the firehose of AI research papers and tools every week so you won’t have to.
📩 Published a new breakthrough paper? Just released an open-source package? Submit it here to ensure we don’t miss it and that it gets featured in next week’s post.
Overview
✨ Highlights (6 entries)
Language Models (10 entries)
Multimodal (6 entries)
Autonomous Agents (6 entries)
Vision (3 entries)
Audio (1 entry)
AI Tools (1 entry)
Open-source Packages (2 entries)
Nvidia releases Llama-3.1-Nemotron-70B - a language model that outperforms GPT-4o and Claude 3.5 Sonnet on instruction-following benchmarks like AlpacaEval and MT-Bench and is licensed for commercial use (Hugging Face)
OpenAI open-sources Swarm - a new framework that orchestrates multiple AI agents with routines and handoffs for efficient multi-agent collaboration (OpenAI)
Mistral releases Ministral 3B and 8B models for edge computing, pushing new limits in reasoning and function-calling within the sub-10B range, outperforming Llama 3 8B and Mistral 7B on instruction-following benchmarks (Mistral)
Peking University develops a new method for efficient video generation, producing smooth, high-quality 10-second videos in 768p resolution at 24 FPS (Project website)
Sequoia Capital publishes a report on the evolution of generative AI, highlighting the shift from fast, pattern-based responses ("System 1 thinking") to deliberate reasoning at inference time ("System 2 thinking") (Sequoia)
Anthropic's CEO presents a hopeful vision for AI, predicting breakthroughs in health, economics, and governance if AI’s potential is harnessed correctly (Blog)
⭐️ Exciting news - Deepgram and Writer.com join the AI Tidbits credits program!
AI Tidbits premium members receive $100 in Writer credits to build LLM-powered apps with RAG tools, AI guardrails, and more, along with $200 in Deepgram credits to build real-time voice agents.
Premium members also get full access to AI Tidbits content and $800+ for other leading AI tools and APIs, including Claude and Hugging Face.
We will be announcing more partners soon. Stay tuned.
Zyphra releases Zamba2-7B - a Mamba-based model outperforming Mistral-7B, Gemma-7B, and Llama3-8B in quality and efficiency for enterprise and on-device use
MIT unveils and open-sources DuoAttention - a framework that optimizes KV caching for different attention heads, reducing LLM memory by 2.5x and speeding up decoding by 2x while preserving long-context abilities
Researchers present AutoDAN-Turbo - a jailbreak framework achieving 74% higher attack success rates than baseline methods and a 93% success rate on GPT-4-1106-turbo by combining automatic and human-designed strategies
Researchers present StructRAG - a novel framework that restructures retrieved information for better reasoning, achieving state-of-the-art results in knowledge-intensive tasks
Meta and Berkeley propose Thought Preference Optimization (TPO) - a training method that teaches LLMs to think before responding through iterative optimization, yielding superior performance across diverse tasks on benchmarks like AlpacaEval and Arena-Hard
ETH Zurich, INSAIT, and LatticeFlow introduce COMPL-AI - the first evaluation platform that aligns generative AI models with the EU AI Act by providing technical interpretations and benchmarking tools
Writer showcases Palmyra X 004 - a state-of-the-art LLM with tool-calling that automates workflows by interacting with external tools and generating structured outputs
OpenAI's research reveals that ChatGPT occasionally offers more nurturing advice to users with female-sounding names and more technical suggestions to users with male-sounding ones, highlighting subtle biases
Apple introduces GSM-Symbolic - an improved benchmark designed to assess LLMs' mathematical reasoning, revealing fragility in performance when numerical values or question clauses are altered
HKUST proposes a multi-agent collaborative data selection mechanism that improves LLM pretraining efficiency, delivering a 10.5% performance boost over state-of-the-art methods
Rhymes AI openly releases Aria - the first open-source multimodal Mixture-of-Experts (MoE) model, outperforming Llama-3.2 and GPT-4o mini with 3.9B activated parameters and a 64k token-long context window
Mistral releases the technical paper for its Pixtral-12B multimodal model, which excels in both language and vision tasks, surpassing larger models like Llama-3.2 90B while being 7x smaller
Homebrew releases Ichigo - an open-source multimodal Llama 3.1 model that processes speech and responds in voice, streamlining audio interactions similar to OpenAI's new Advanced Voice Mode
Baichuan develops Baichuan-Omni - a 7B open-source multimodal model capable of handling images, videos, audio, and text, offering high performance and real-time interaction
Researchers present DeCo - a decoding method that reduces hallucination rates by adaptively integrating knowledge and visual information, enhancing multimodal LLM performance
Researchers present LOKI - a benchmark designed to assess large multimodal models' ability to detect synthetic data across multiple modalities, with 18K questions spanning 26 subcategories
OpenAI introduces MLE-bench - a benchmark that evaluates AI agents on practical machine learning tasks using 75 real-world Kaggle competitions
Meta and KAUST present Agent-as-a-Judge - a new evaluation framework where agentic systems assess other agentic systems, offering superior feedback and scalability over LLM-as-a-Judge
Apple publishes CAMPHOR - an innovative on-device Small Language Model multi-agent framework that surpasses closed-source LLMs in task completion by 35%, ensuring privacy and eliminating server communication
Researchers introduce Agent S - an open framework that achieves an 83% relative improvement in task success rate through hierarchical planning, showcasing broad applicability across operating systems
Researchers present UGround - a visual grounding model for GUI agents that improves grounding accuracy by up to 20% and enables agents to outperform state-of-the-art alternatives using only visual perception
Researchers present Animate-X - a universal animation framework that uses a latent diffusion model (LDM) and a Pose Indicator to generate high-quality animations of anthropomorphic and other character types
The University of Rochester and ByteDance unveil TextToon - a system for generating drivable, high-quality toonified avatars using a short video and text prompts, achieving real-time performance on GPUs and mobile devices
UT Austin and Google researchers present a novel rectified flow (RF)-based inversion framework that achieves state-of-the-art zero-shot image inversion and editing
Stanford develops a large-scale dataset and a novel learning-based framework that leverages diffusion models and reinforcement learning to generate lifelike hand motions for piano performance, even for unseen music
Replace an image background with any color, image, or video
finic - connect scrapers or automations to a fleet of cloud-hosted browsers configured for reliability and stealth
Ditto - generate Flask apps from natural language
Plus >70 more open-source packages for AI engineers