Welcome to the weekly edition of AI Tidbits, where I curate the firehose of AI research papers and tools so you won’t have to.
📩 Published a new breakthrough paper? Just released an open-source package? Submit it here to ensure we don’t miss it and that it gets featured in next week’s post.
Overview
✨ Highlights (6 entries)
Language Models (13 entries)
Multimodal (2 entries)
Autonomous Agents (2 entries)
Vision (7 entries)
Audio (2 entries)
Open-source Packages (2 entries)
GitHub unveils multi-model Copilot at its annual developer conference, adding Claude 3.5 and Gemini 1.5, and launches GitHub Spark, an AI-native tool that builds micro web apps entirely through natural language with no coding required
Apple develops and openly releases Ferret-UI 2 – a multimodal large language model to automate tasks across mobile devices like iPhones and web browsers
Meta releases NotebookLlama, an open-source replica of Google’s NotebookLM that uses Llama models and text-to-speech tools to generate podcast-style audio from PDFs
Anthropic shows that its recently released Claude 3.5 Sonnet achieves a new state-of-the-art score of 49% on SWE-bench, the real-world software engineering tasks benchmark, and provides concrete tips for developers using Claude
Anthropic presents an enhanced Claude chatbot with JavaScript coding capabilities and an analysis tool for precise data insights
Nvidia introduces HOVER - a framework that combines different control tasks into one approach, allowing robots to switch smoothly between actions like walking and handling objects
⭐️ Exciting news - Deepgram and Writer.com join AI Tidbits credits program!
AI Tidbits premium members receive $100 in Writer credits to build LLM-powered apps with RAG tools, AI guardrails, and more, along with $200 in Deepgram credits to build real-time voice agents.
Premium members also get full access to AI Tidbits content and $800+ in credits for other leading AI tools and APIs, including Claude and Hugging Face.
We will be announcing more partners soon. Stay tuned.
Support AI Tidbits as a premium member
OpenAI introduces SimpleQA - a factuality benchmark designed to assess the accuracy and hallucination tendencies of language models on concise, fact-seeking questions, with top models like GPT-4o scoring under 40%
OpenAI publishes the system card for its GPT-4o model five months after its launch, providing details on its development and the safety mechanisms employed
Meta releases quantized Llama 3.2 models, delivering faster on-device AI processing with reduced size and memory use for mobile deployment
Cohere introduces Aya Expanse models, setting new multilingual AI standards with 8B and 32B parameter models outperforming larger competitors
Google releases the Japanese version of Gemma 2 - a lightweight AI model designed to match GPT-3.5’s Japanese proficiency with only 2B parameters
Researchers present Bielik 7B - a generative Polish language model that leverages innovative training techniques, surpassing Mistral-7B by 9% on the RAG Reader task
Researchers present a comprehensive review of document parsing, i.e. converting unstructured and semi-structured documents into structured machine-readable data, reviewing current methods from modular pipelines to end-to-end models powered by vision-language architectures
Shanghai AI Laboratory unveils CompassJudger-1 - an open-source judge LLM capable of scoring, critiquing, and executing evaluation tasks across diverse formats and scenarios
UIUC and AWS introduce CodeFavor - a code preference learning framework that improves model accuracy by 28.8% and is 34x more cost-efficient, using synthetic evolution data
Researchers introduce HalluEditBench – a comprehensive benchmark for evaluating and advancing knowledge editing techniques, offering new insights through a rigorous assessment of methods across diverse hallucination cases
Researchers introduce STRING – a method that enhances the effective context length of LLMs by shifting position embeddings during inference, achieving over 10-point gains on long-context benchmarks and boosting LLMs like Llama 3.1 70B
Researchers introduce HarmAug – a data augmentation technique that generates harmful instructions to improve small safety guard models, achieving large-model performance with minimal computational cost
Researchers present a survey on Small Language Models (SLMs), proposing a novel taxonomy of optimization methods and outlining key datasets, metrics, and challenges
Meta and KAUST develop LongVU – an advanced compression mechanism for multimodal LLMs achieving state-of-the-art performance on long-video benchmarks through selective token reduction
Researchers introduce CLEAR - a benchmark for evaluating multimodal unlearning methods, enabling the removal of sensitive information from large multimodal models
Researchers present ROCKET-1 - a system combining GPT-4o, Molmo, and SAM-2, enabling precise AI interactions in virtual environments like Minecraft
Researchers develop and release AgentStore - a scalable platform with a MetaAgent strategy that integrates diverse agents for complex computer tasks, more than doubling previous performance on the OSWorld benchmark
Meta and KAUST introduce MarDini - a family of video diffusion models achieving state-of-the-art video interpolation and versatile video generation tasks
Researchers introduce Framer – an interactive frame interpolation system allowing users to customize keypoint trajectories, achieving smooth transitions between images with precise local motion control
OpenAI unveils sCM models, enabling 50x faster image generation with stable training and scalable performance, achieving top-tier FID scores
Researchers introduce SocialGPT – a modular framework combining vision models and LLMs to achieve competitive zero-shot social relation recognition with interpretable text-based reasoning
Google and UNC introduce Unbounded - a generative infinite game powered by real-time LLM-driven mechanics and emergent gameplay, with IP-Adapter ensuring visual consistency across dynamic environments
Researchers develop and openly release GenIR and DreamClear - a dual strategy combining innovative data curation and a Diffusion Transformer model to achieve state-of-the-art image restoration
TU Darmstadt shows that even advanced AI models like GPT-4 and Claude perform poorly on visual puzzles, exposing limitations in their visual reasoning abilities
Researchers develop and open source FasterCache - a training-free acceleration strategy that speeds up video diffusion models by 1.67x while maintaining high-quality output
Researchers present F5-TTS – a fully non-autoregressive text-to-speech system capable of cloning voices and generating speech quickly
Zhipu AI open sources GLM-4-Voice - an end-to-end speech model that handles nuanced conversations with real-time adaptability in both English and Chinese
Agent.exe - a local app that lets Claude 3.5 Sonnet control your computer with the new computer-use API
Memary - an open-source memory for agents
Plus >70 more open-source packages for AI engineers
Agent.exe allows you to run Claude Computer Use in minutes
Last week’s AI Tidbits roundup
Reach AI builders, researchers, and entrepreneurs by partnering with AI Tidbits
If you find AI Tidbits valuable, share it with a friend and consider showing your support.
Regarding NotebookLlama, it is cool to have an open competitor, but as of today it's sadly not at the same quality.