June 2024 - AI Tidbits Monthly Roundup
Apple's AI revolution, Anthropic's powerful new LLM Claude 3.5 Sonnet, a new SOTA AI software engineer, hyperrealistic video generation, and Microsoft's commercially permissive vision models
Welcome to the monthly round-up, where we curate the firehose of AI research papers and tools so you won’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read: it features the absolute must-knows.
As we step into summer, the AI landscape keeps moving, with groundbreaking innovations and exciting developments. June has been a month of significant strides across various AI domains, from industry giants to open-source breakthroughs.
Apple's Worldwide Developers Conference (WWDC) took center stage, unveiling a suite of AI features that promise to revolutionize the user experience across Apple devices. Meanwhile, the language model arena saw remarkable advancements, with Anthropic's Claude 3.5 Sonnet pushing the boundaries of performance, a new software agent that scored 19% on SWE-bench, and a new state-of-the-art version of DeepSeek-Coder.
In video generation, companies like Runway and Luma AI challenge the status quo with their hyperrealistic video creation tools.
This month's roundup also spotlights impressive progress in multimodal AI, with Microsoft openly releasing Florence-2, a commercially permissive state-of-the-art small vision model family, and EPFL and Apple's new training approach setting new benchmarks for multimodal AI.
In addition to these highlights, June’s roundup features novel LLM techniques (e.g. Mixture-of-Agents), promising open-source projects (e.g. Open Interpreter), and a host of other developments in autonomous agents and multimodal AI.
Let's dive in!
Overview
✨ Special feature: Apple Worldwide Developers Conference (WWDC)
Industry announcements (9 entries)
Large Language Models
Open-source (10 entries)
Research (10 entries)
Autonomous Agents (4 entries)
Multimodal (4 entries)
Image and Video (8 entries)
Audio (3 entries)
Open-source Packages (6 entries)
Recent Deep Dives
Become a premium member to get full access to my content and $1k in free credits for leading AI tools and APIs like Perplexity, Replicate, and Hugging Face. It’s common to expense the paid membership from your company’s learning and development stipend.
✨ Special feature: Apple Worldwide Developers Conference (WWDC)
Apple Intelligence - a suite of new AI features for iPhone, Mac, and other Apple devices. This includes a more conversational Siri, custom AI-generated "Genmoji," and integration with OpenAI's GPT-4o. Apple’s on-device models are specialized for specific tasks using an adapter strategy, outperforming larger models at summarizing and composing text.
Enhanced Siri capabilities - Siri will gain new abilities such as managing notifications, writing and summarizing text, and carrying out actions across multiple apps. Users can interact with Siri through voice or typing.
Genmoji and Image Playground - Apple is launching Genmoji to create emoji-like reactions on demand and Image Playground for AI-generated images. These features will be integrated into various apps, including Photos, which will have improved search and editing capabilities similar to Google's Magic Eraser.
OpenAI integration - Siri will leverage ChatGPT, powered by GPT-4o, for complex requests, asking for user permission before sending any data. ChatGPT will be available across iOS, macOS, and iPadOS, supporting AI writing and image generation tools.
—> Apple WWDC 2024 keynote in 18 minutes
Industry announcements
Large Language Models
Open-source