AI Tidbits

September 2024 - AI Tidbits Monthly Roundup
OpenAI’s new reasoning model and realtime API to power AI assistants, Qwen's and AI2’s new fully open-sourced state-of-the-art multimodal models, and speech2text with Whisper v3 Turbo

Arthur Mor
Oct 06, 2024

Welcome to the monthly round-up, where we curate the firehose of AI research papers and tools so you don't have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read—featuring the absolute must-knows.


September was a busy month for everyone in the AI space, packed with groundbreaking developments across AI domains, from industry giants to open-source projects.

In the realm of large language models, we've seen significant strides from both industry leaders and open-source initiatives. OpenAI introduced its advanced o1-preview and o1-mini models, excelling in high-level reasoning for coding and math. Meanwhile, Alibaba released the impressive Qwen 2.5 family of open multilingual models, handling an expansive 128K tokens. Meta also made waves with Llama 3.2, featuring edge-optimized text models and their first large multimodal models.

Multimodal AI saw remarkable progress, with AI2's Molmo models rivaling, and on some benchmarks surpassing, industry giants like GPT-4V and Gemini 1.5. Nvidia's NVLM 1.0 and Apple's MM1.5 further pushed the boundaries of vision-language reasoning and diverse task performance.

In the audio domain, OpenAI released Whisper Large v3 Turbo, a faster and more capable speech-to-text model, while Google developed a promising zero-shot Voice Transfer module for cross-lingual applications.

The image and video generation landscape continued to evolve, with Meta's Imagine Yourself technology enabling personalized image generation and advancements in text-to-video models like CogVideoX-5B pushing the boundaries of visual content creation.

This month's roundup features these breakthroughs and many more exciting updates across AI tools, research methodologies, and vision AI.

Let's dive in!


Overview

  • Industry announcements (11 entries)

  • Large Language Models

    • Open-source (11 entries)

    • Research (9 entries)

  • Multimodal (7 entries)

  • Autonomous Agents (3 entries)

  • Image and Video (8 entries)

  • Audio (4 entries)

  • AI Tools (5 entries)

  • Open-source Packages (5 entries)

Recent Deep Dives

  • The Great AI Consolidation (Sahar Mor, September 29, 2024)

  • Harnessing research-backed prompting techniques for enhanced LLM performance (Sahar Mor, December 10, 2023)

  • [cross-post] 7 methods to secure LLM apps from prompt injections and jailbreaks (Sahar Mor, February 9, 2024)

Industry announcements

  1. OpenAI introduces o1-preview and o1-mini, advanced models that excel in high-level reasoning for coding and math, with o1-mini being a faster, cost-efficient option 

  2. As part of its DevDay event, OpenAI releases the Realtime API, allowing developers to create low-latency, voice-to-voice interactions with continuous audio streaming, along with new tools like vision fine-tuning, prompt caching, and model distillation

  3. OpenAI releases Advanced Voice Mode for ChatGPT Plus and Team users, adding Custom Instructions, Memory, and five new voices

  4. Anthropic releases the Quickstarts repo, providing ready-to-deploy app projects powered by its API, starting with a Claude-based customer support agent

  5. In its annual Connect conference, Meta announced multiple AI-related releases, including Ray-Ban Meta smart glasses with real-time AI video processing, AI-powered visual search for Instagram, translation and dubbing tools for creator content with lip sync, and Meta AI vocal responses across platforms with customizable celebrity voices

  6. OpenAI's CTO, Mira Murati, along with other OpenAI execs, announces she will step down and leave the company

  7. Pika Labs launches its new video-generation model, featuring new "Pikaffects" that transform video subjects with surreal, physics-defying effects like melting and cake-ifying objects

  8. Google introduces new Gemini-1.5 Pro and Flash models, with faster response speeds, reduced prices (>50%) and improved task performance

  9. Deepgram unveils its Voice Agent API, enabling natural, real-time human-machine conversations powered by high-performance speech recognition and synthesis models

  10. Hume releases Empathic Voice Interface 2 (EVI 2) - a GPT-4o-like voice model, allowing users to converse with its AI chatbot with sub-second response times

  11. Replit announces Replit Agent - an AI tool that automates software development tasks like environment setup and deployment

👆 OpenAI’s Realtime API powering Speak’s language learning app
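As a rough sketch of how a client typically drives the Realtime API mentioned above: it opens a WebSocket and exchanges JSON events with the server. The endpoint URL, model name, and event shapes below are assumptions based on OpenAI's DevDay announcement, not verified against the official reference, so treat this as illustrative only.

```python
import json

# Assumed endpoint and model name (hypothetical until checked against
# OpenAI's Realtime API docs).
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"


def session_update_event(voice: str = "alloy") -> str:
    """Build a session.update event selecting output modalities and a voice."""
    return json.dumps({
        "type": "session.update",
        "session": {"modalities": ["text", "audio"], "voice": voice},
    })


def response_create_event(instructions: str) -> str:
    """Build a response.create event asking the server to generate a reply."""
    return json.dumps({
        "type": "response.create",
        "response": {"instructions": instructions},
    })


if __name__ == "__main__":
    # In a real client these strings would be sent over the open WebSocket,
    # while incoming audio/text delta events stream back continuously.
    print(session_update_event())
    print(response_create_event("Greet the user in French."))
```

The point of the event-based design is that audio flows both ways continuously: instead of a request/response round trip per utterance, the client keeps one socket open and reacts to server events as they arrive, which is what enables the low-latency voice-to-voice behavior highlighted in the announcement.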

Become a premium member to get full access to my content and $1k in free credits for leading AI tools and APIs like Claude, Replicate, and Hugging Face. It’s common to expense the paid membership from your company’s learning and development education stipend.


Large Language Models

Open-source

  1. Alibaba releases Qwen 2.5 - a family of open multilingual models handling 128K tokens and outperforming competitors like Mistral Large 2 (123B) on major benchmarks, offering support for 29 languages and specialized variants for math and coding

  2. Meta releases Llama 3.2 - a new family of open models featuring edge-optimized text models (1B and 3B) and Meta's first large multimodal models (11B and 90B) supporting 128K tokens

  3. Kyutai Labs openly releases Moshi - a 7.6B open speech-to-speech model with cutting-edge performance and low latency, alongside Mimi, a SoTA streaming audio codec that compresses 24 kHz audio to 1.1 kbps for optimized real-time speech communication

  4. The DeepSeek team releases DeepSeek-V2.5 - a SOTA versatile open model integrating DeepSeek-Coder with advanced features like Function Calling and JSON output

  5. 01.AI unveils Yi-Coder - a high-performing series of code LLMs with up to 9B parameters, excelling in long-context modeling and outperforming larger models

  6. Alibaba proposes DocOwl2 - a state-of-the-art model for multi-page document understanding that reduces GPU usage and inference time by compressing high-resolution document images into 324 tokens
