AI Tidbits


October 2024 - AI Tidbits Monthly Roundup

Powerful new open-source video generation models, agentic frameworks and tools to control computers and smartphones from Apple and Anthropic, and an open-source NotebookLM

Arthur Mor
Nov 10, 2024

Welcome to the monthly round-up, where we curate the firehose of AI research papers and tools so you won’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read, featuring the absolute must-knows.


October marked a pivotal moment for agentic AI, as interest from industry and academia reached new heights, alongside groundbreaking video generation releases that democratized previously exclusive capabilities.

Genmo's release of Mochi 1 and Rhymes AI's Allegro led the charge, bringing commercial-grade text-to-video generation into the open-source domain. Meanwhile, industry giants made significant moves - Anthropic released Claude 3.5 Sonnet and Haiku with unprecedented computer use capabilities, GitHub expanded Copilot with Claude 3.5 and Gemini 1.5 integration, and OpenAI introduced Canvas for real-time writing and coding assistance.

The open-source community continued its remarkable momentum, with Nvidia's Llama-3.1-nemotron-70b surpassing GPT-4o and Claude 3.5 Sonnet on key instruction-following benchmarks, while Apple made waves by open-sourcing Depth Pro and Ferret-UI 2, pushing the boundaries of on-device AI capabilities.

Multimodal AI saw further developments with Rhymes AI's Aria, the first open-source mixture-of-experts multimodal model, and Meta's innovative Spirit LM, combining text and speech capabilities.

These breakthroughs and numerous advances in autonomous agents, audio generation, and AI tools paint a picture of rapid democratization across AI domains.

Let's dive in!


Overview

  • ✨ Special Feature: Open-source Video Generation

  • Industry announcements (10 entries)

  • Large Language Models

    • Open-source (9 entries)

    • Research (8 entries)

  • Autonomous Agents (8 entries)

  • Multimodal (7 entries)

  • Image and Video (9 entries)

  • Audio (3 entries)

  • AI Tools (3 entries)

  • Open-source Packages (7 entries)


✨ Special Feature: Open-source Video Generation

  1. Genmo openly releases Mochi 1 - a text-to-video model delivering smooth 30fps videos with precise motion and accurate prompt adherence, with downloadable weights on Hugging Face and a commercially permissive license

  2. Rhymes AI releases Allegro - a 2.8B open-source model capable of generating cinematic 6-second videos from text prompts at 15 FPS and 720p resolution

  3. Meta AI presents MovieGen - a next-gen model family that generates HD personalized videos and synchronized audio from text prompts, enabling users to create and edit videos featuring their own faces

  4. Peking University develops a new method for efficient video generation, producing smooth, high-quality 10-second videos in 768p resolution at 24 FPS

Mochi is the leading open-source text2video model, outperforming Runway, Pika, and Luma Labs


Recent Deep Dives

  • The Great AI Consolidation (Sahar Mor, September 29, 2024)

  • 12 techniques to reduce your LLM API bill and launch blazingly fast products (Sahar Mor, January 13, 2024)

  • [cross-post] 7 methods to secure LLM apps from prompt injections and jailbreaks (Sahar Mor, February 9, 2024)


Industry announcements

  1. Anthropic releases new versions of its Claude 3.5 Sonnet and Haiku LLMs, with top-tier performance in coding and problem-solving that surpasses OpenAI o1-preview, along with new human-like software interaction capabilities that let the model click, type, and automate tasks directly through GUIs

  2. OpenAI introduces Canvas - a visual interface that simplifies real-time edits on writing and coding tasks within ChatGPT

  3. OpenAI adds audio support to its Chat Completions API alongside text, enabling both asynchronous audio experiences and real-time interactions

  4. Google elevates its podcast-generating NotebookLM, introducing customizable Audio Overviews and giving users the ability to fine-tune AI summaries with specific instructions 

  5. GitHub unveils multi-model Copilot at its annual developer conference, adding Claude 3.5 and Gemini 1.5, and launches GitHub Spark, an AI-native tool that builds micro web apps entirely through natural language with no coding required

  6. Runway releases Act-One - a cutting-edge tool for transforming simple video and voice inputs into expressive character performances

  7. ElevenLabs unveils Voice Design, allowing users to create unique voices from a text prompt alone

  8. Ideogram releases Canvas - an AI-powered image editor offering inpainting and outpainting capabilities, outperforming competitors like Midjourney

  9. Sequoia Capital publishes a report on the evolution of generative AI, highlighting the shift from fast, pattern-based responses ("System 1 thinking") to deliberate reasoning at inference time ("System 2 thinking")

  10. Anthropic's CEO presents a hopeful vision for AI, predicting breakthroughs in health, economics, and governance if AI’s potential is harnessed correctly

Claude Computer Use scheduling a meeting autonomously


Large Language Models

Open-source

  1. Nvidia releases Llama-3.1-nemotron-70b - a language model that outperforms GPT-4o and Claude 3.5 Sonnet on instruction-following benchmarks like AlpacaEval and MT-Bench, under a license that allows commercial use

  2. Mistral releases Ministral 3B and 8B models for edge computing, pushing new limits in reasoning and function-calling within the sub-10B range, outperforming Llama 3 8B and Mistral 7B on instruction-following benchmarks 

  3. Meta releases quantized Llama 3.2 models, delivering faster on-device AI processing with reduced size and memory use for mobile deployment

  4. Meta releases NotebookLlama - an open-source replica of Google’s NotebookLM that uses Llama models and text-to-speech tools to generate podcast-style audio from PDFs
