Powerful new open-source video generation models, agentic frameworks and tools from Apple and Anthropic to control computers and smartphones, and an open-source NotebookLM
Welcome to the monthly round-up, where we distill the firehose of AI research papers and tools so you don’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read—featuring the absolute must-knows.
October marked a pivotal moment for agentic AI, as interest from industry and academia reached new heights, alongside groundbreaking video generation releases that democratized previously exclusive capabilities.
Genmo's release of Mochi 1 and Rhymes AI's Allegro led the charge, bringing commercial-grade text-to-video generation into the open-source domain. Meanwhile, industry giants made significant moves: Anthropic released Claude 3.5 Sonnet and Haiku with unprecedented computer use capabilities, GitHub expanded Copilot with Claude 3.5 Sonnet and Gemini 1.5 Pro integrations, and OpenAI introduced Canvas for real-time writing and coding assistance.
The open-source community continued its remarkable momentum: Nvidia's Llama-3.1-Nemotron surpassed GPT-4 and Claude 3.5 on key benchmarks, while Apple made waves by open-sourcing Depth Pro and Ferret-UI 2, pushing the boundaries of on-device AI capabilities.
Multimodal AI saw further developments with Rhymes AI's Aria, the first open-source mixture-of-experts multimodal model, and Meta's Spirit LM, which interleaves text and speech in a single model.
These breakthroughs and numerous advances in autonomous agents, audio generation, and AI tools paint a picture of rapid democratization across AI domains.
Become a premium member to get full access to my content and $1k in free credits for leading AI tools and APIs like Claude, Replicate, and Hugging Face. It’s common to expense the paid membership from your company’s learning and development education stipend.