February 2024 - AI Tidbits Monthly Roundup
New language models that push the envelope from Anthropic and Google, OpenAI's foray into video generation, new AI models to solve math problems, and models to navigate mobile apps autonomously
Welcome to the monthly round-up, where we curate the firehose of AI research papers and tools so you won’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read: it features the absolute must-knows.
This February edition of AI Tidbits Monthly unravels the latest and greatest in AI. February continued January’s strong momentum for commercial and open-source AI across modalities.
On the commercial LLM front, Google released Gemini 1.5, which supports a context window of up to 1M tokens and was tested at up to 10M tokens in research. Anthropic released Claude 3, a suite of powerful language models with image understanding capabilities and reported benchmark performance that surpasses GPT-4. Mistral launched its largest and most powerful model to date, Mistral Large.
Open-source language models experienced a step change in performance, with Google’s Gemma, Abacus’ Smaug, and Qwen 1.5 all demonstrating GPT-3.5-level performance under commercially permissive licenses.
Still, February’s biggest announcement was OpenAI’s new text-to-video model, Sora, which produces Hollywood-grade videos up to a minute long. Alibaba unveiled a remarkable new framework designed to bring portraits to life with incredibly realistic expressions and accurate lip-syncing. Lastly, Google unveiled a pioneering model that turns any image into an interactive 2D game.
These breakthroughs, along with many more across speech, video, multimodal AI, and autonomous agents, are featured in this month’s roundup.
Let's dive in!
Overview
Industry announcements (7 entries)
✨ Special feature: Speech recognition and text-to-speech AI (5 entries)
Large Language Models
Open-source (15 entries)
Research (9 entries)
Autonomous Agents (5 entries)
Image and Video (13 entries)
Audio (3 entries)
Multimodal (5 entries)
Robotics (4 entries)
Open-source packages (5 entries)
AI tools (5 entries)
Recent Deep Dives
Industry announcements
Stability AI launches SVD 1.1 - an image-to-video model optimized for better motion and consistency
Ideogram releases Ideogram 1.0 - a text-to-image model excelling in text rendering and photorealism
Become a premium member to get full access to my content and $1k in free credits for leading AI tools and APIs like Perplexity, Replicate, and Hugging Face. It’s common to expense the paid membership through your company’s learning and development stipend.
✨ Special feature: Speech recognition and text-to-speech AI
Large Language Models (LLMs)
Open-source