AI Tidbits


February 2024 - AI Tidbits Monthly Roundup

New language models that push the envelope from Anthropic and Google, OpenAI's foray into video generation, new AI models to solve math problems, and models to navigate mobile apps autonomously

Arthur Mor
Mar 09, 2024

Welcome to the monthly round-up, where we curate the firehose of AI research papers and tools so you won’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read: it features the absolute must-knows.


Welcome to the February edition of AI Tidbits Monthly, where we unravel the latest and greatest in AI. February continued January’s strong momentum for commercial and open-source AI across modalities.

On the commercial LLMs front, Google released Gemini 1.5, supporting a context window of up to 1M tokens, with up to 10M tokens tested in research. Anthropic released Claude 3, a suite of powerful language models with image understanding capabilities that Anthropic reports outperform GPT-4 on common benchmarks. Mistral launched its largest and most powerful model to date, Mistral Large.

Open-source language models experienced a step change in performance, with Google’s Gemma, Abacus’ Smaug, and Qwen 1.5—all demonstrating GPT-3.5-level performance with a commercially permissive license.

Nonetheless, February’s biggest announcement was OpenAI’s new text-to-video model, Sora, which produces Hollywood-grade one-minute videos. Alibaba unveiled a remarkable new framework designed to bring portraits to life with incredibly realistic expressions and accurate lip-syncing. Lastly, Google released a pioneering tool that turns any image into an interactive 2D game.

These breakthroughs, along with many more across speech, video, multimodal AI, and autonomous agents, are featured in this month’s roundup.

Let's dive in!


Overview

  • Industry announcements (7 entries)

  • ✨ Special feature: Speech recognition and text-to-speech AI (5 entries)

  • Large Language Models

    • Open-source (15 entries)

    • Research (9 entries)

  • Autonomous Agents (5 entries)

  • Image and Video (13 entries)

  • Audio (3 entries)

  • Multimodal (5 entries)

  • Robotics (4 entries)

  • Open-source packages (5 entries)

  • AI tools (5 entries)

Recent Deep Dives

  • Top 8 leaderboards to choose the right AI model for your task (Sahar Mor, February 17, 2024)

  • [cross-post] 7 methods to secure LLM apps from prompt injections and jailbreaks (Sahar Mor, February 9, 2024)

  • 12 techniques to reduce your LLM API bill and launch blazingly fast products (Sahar Mor, January 13, 2024)

  • Harnessing research-backed prompting techniques for enhanced LLM performance (Sahar Mor, December 10, 2023)

  • Most popular and upcoming Generative AI tools and APIs (Sahar Mor, December 19, 2023)

Industry announcements

  1. OpenAI unveils Sora - a groundbreaking text-to-video model that creates realistic videos up to a minute long from text prompts

  2. Anthropic announces Claude 3 - three state-of-the-art language models, setting new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision

  3. Google releases Gemini 1.5, featuring a context window of up to 1M tokens (with up to 10M tested in research) for superior performance across multiple modalities with reduced compute

  4. Mistral AI releases Mistral Large - a top-tier model that rivals GPT-4 with advanced multilingual reasoning and competitive pricing, now available on Azure as part of a new partnership with Microsoft

  5. Stability AI unveils Stable Diffusion 3, featuring enhanced multi-subject and advanced text prompt handling

  6. Stability AI launches SVD 1.1 - an image-to-video model optimized for better motion and consistency

  7. Ideogram releases Ideogram 1.0 - a text-to-image model excelling in text rendering and photorealism

    [Video: OpenAI’s novel text-to-video system, Sora]

✨ Special feature: Speech recognition and text-to-speech AI

  1. MetaVoice open sources a commercially permissive 1B base model for text-to-speech, supporting voice cloning and emotional speech synthesis

  2. Amazon unveils BASE TTS - the largest text-to-speech model to date, trained on 100K hours of speech and achieving highly natural speech synthesis with novel tokenization

  3. Nvidia presents Audio Flamingo - a novel audio language model that improves LLMs' abilities to understand audio

  4. Nvidia releases Canary 1B - a state-of-the-art automatic speech recognition and speech translation model that supports four languages and leads the Open ASR Leaderboard

  5. Nvidia introduces Parakeet-TDT - a speech recognition model with improved accuracy and 64% faster processing than previous models

Large Language Models (LLMs)

Open-source

  1. Google open sources Gemma - a suite of small language models (2B, 7B) that outperforms Llama 2 and Mistral 7B, permitting commercial use
