February 2024 - AI Tidbits Monthly Roundup

New language models that push the envelope from Anthropic and Google, OpenAI's foray into video generation, new AI models to solve math problems, and models to navigate mobile apps autonomously

Mar 09, 2024

Welcome to the monthly round-up, where we sift through the firehose of AI research papers and tools so you won’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read: it features the absolute must-knows.

This February edition of AI Tidbits Monthly unravels the latest and greatest in AI. February continued January’s strong momentum for commercial and open-source AI across modalities.

On the commercial LLM front, Google released Gemini 1.5, supporting a context window of up to 1M tokens (and up to 10M tokens in research testing). Anthropic released Claude 3, a suite of language models with image understanding capabilities that reportedly outperforms GPT-4 on several benchmarks. Mistral launched its largest and most powerful model to date, Mistral Large.

Open-source language models experienced a step change in performance, with Google’s Gemma, Abacus’ Smaug, and Qwen 1.5—all demonstrating GPT-3.5-level performance with a commercially permissive license.

Still, February’s biggest announcement was OpenAI’s new text-to-video model, Sora, which generates high-fidelity videos up to one minute long. Alibaba unveiled a new framework that animates portraits with realistic expressions and accurate lip-syncing. Lastly, Google unveiled a model that turns a single image into an interactive, playable 2D game.

These breakthroughs, along with many more across speech, video, multimodal AI, and autonomous agents, are featured in this month’s roundup.

Let's dive in!

Overview

Industry announcements (7 entries)
✨ Special feature: Speech recognition and text-to-speech AI (5 entries)
Large Language Models
- Open-source (15 entries)
- Research (9 entries)
Autonomous Agents (5 entries)
Image and Video (13 entries)
Audio (3 entries)
Multimodal (5 entries)
Robotics (4 entries)
Open-source packages (5 entries)
AI tools (5 entries)

Industry announcements

✨ Special feature: Speech recognition and text-to-speech AI

Large Language Models (LLMs)

Open-source

Google open sources Gemma - a suite of small language models (2B, 7B) that outperform Llama 2 and Mistral 7B on several benchmarks and permit commercial use
