AI Tidbits

June 2024 - AI Tidbits Monthly Roundup

Apple's AI revolution, Anthropic's powerful new LLM Claude 3.5 Sonnet, a new SOTA AI software engineer, hyperrealistic video generation, and Microsoft's commercially permissive vision models

Arthur Mor
Jun 30, 2024

Welcome to the monthly roundup, where we curate the firehose of AI research papers and tools so you won’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read: it features the absolute must-knows.


As we step into summer, the AI landscape keeps moving fast. June brought significant strides across AI domains, from industry giants to open-source breakthroughs.

Apple's Worldwide Developers Conference (WWDC) took center stage, unveiling a suite of AI features that promise to revolutionize the user experience across Apple devices. Meanwhile, the language model arena saw remarkable advancements, with Anthropic's Claude 3.5 Sonnet pushing the boundaries of performance, a new software agent that scored 19% on SWE-bench, and a new state-of-the-art version of DeepSeek-Coder.

In video generation, companies like Runway and Luma AI challenge the status quo with their hyperrealistic video creation tools.

This month's roundup also spotlights impressive progress in multimodal AI, with Microsoft openly releasing Florence-2, a commercially permissive state-of-the-art small vision model family, and EPFL and Apple's new training approach setting new benchmarks for multimodal AI.

In addition to these highlights, June’s roundup features novel LLM techniques (e.g. Mixture-of-Agents), promising open-source projects (e.g. Open Interpreter), and a host of other developments in autonomous agents and multimodal AI.

Let's dive in!


Overview

  • ✨ Special feature: Apple Worldwide Developers Conference (WWDC)

  • Industry announcements (9 entries)

  • Large Language Models

    • Open-source (10 entries)

    • Research (10 entries)

  • Autonomous Agents (4 entries)

  • Multimodal (4 entries)

  • Image and Video (8 entries)

  • Audio (3 entries)

  • Open-source Packages (6 entries)

Recent Deep Dives

  • Top 8 leaderboards to choose the right AI model for your task (Sahar Mor, February 17, 2024)

  • 12 techniques to reduce your LLM API bill and launch blazingly fast products (Sahar Mor, January 13, 2024)

  • Harnessing research-backed prompting techniques for enhanced LLM performance (Sahar Mor, December 10, 2023)

  • [cross-post] 7 methods to secure LLM apps from prompt injections and jailbreaks (Sahar Mor, February 9, 2024)

  • Most popular and upcoming Generative AI tools and APIs (Sahar Mor, December 19, 2023)

Become a premium member to get full access to my content and $1k in free credits for leading AI tools and APIs like Perplexity, Replicate, and Hugging Face. It’s common to expense the paid membership from your company’s learning and development education stipend.

✨ Special feature: Apple Worldwide Developers Conference (WWDC)

  • Apple Intelligence - a suite of new AI features for iPhone, Mac, and other Apple devices. This includes a more conversational Siri, custom AI-generated "Genmoji," and integration with OpenAI's GPT-4o. Apple’s on-device models use an adapter strategy to excel at specific tasks, outperforming larger models in summarizing and composing text.

  • Enhanced Siri capabilities - Siri will gain new abilities such as managing notifications, writing and summarizing text, and carrying out actions across multiple apps. Users can interact with Siri through voice or typing.

  • Genmoji and Image Playground - Apple is launching Genmoji to create emoji-like reactions on demand and Image Playground for AI-generated images. These features will be integrated into various apps, including Photos, which will have improved search and editing capabilities similar to Google's Magic Eraser.

  • OpenAI integration - Siri will leverage ChatGPT, powered by GPT-4o, for complex requests, asking for user permission before sending data. ChatGPT will be available across iOS, macOS, and iPadOS, supporting AI writing and image generation tools.

—> Apple WWDC 2024 keynote in 18 minutes

Industry announcements

  1. Anthropic releases Claude 3.5 Sonnet - a model with a 200k token context window, outperforming GPT-4o and featuring a new dynamic Artifacts workspace

  2. A new startup called Etched is trying to take on Nvidia by presenting the world’s first specialized chip for Transformers, which it claims delivers over 500,000 tokens per second and is more than 10x faster and cheaper than Nvidia’s next-generation Blackwell

  3. Factory emerges out of stealth to automate software engineering by modeling the cognitive processes of developers, achieving top performance on SWE-bench (19.27% compared to Devin's 13.86%)

  4. Ilya Sutskever, OpenAI's former chief scientist, starts Safe Superintelligence Inc. - a new company to build safe superintelligence

  5. Runway releases Gen-3 Alpha - an AI model generating high-quality, hyperrealistic videos with expressive human characters and smooth transitions 

  6. A Chinese company launches a Sora competitor called Kling - an AI model that generates realistic video clips from text prompts using advanced 3D AI techniques

  7. Luma AI debuts Dream Machine - an OpenAI Sora-like tool enabling users to create realistic videos from text prompts in just two minutes

  8. ElevenLabs releases Sound Effects, turning text into rich sounds

  9. Mistral introduces model customization on its platform, enabling efficient fine-tuning of AI models to meet specific user needs at lower cost and with less expertise required

[Video: Claude Artifacts generates and executes code live]

Large Language Models

Open-source

  1. Microsoft releases Florence-2 - a commercially permissive state-of-the-art small vision model family (200M, 800M params) that outperforms larger specialized models in tasks like image description and object recognition

  2. DeepSeek-AI releases DeepSeek-Coder-V2 - an open-source language model supporting 338 programming languages, beating top commercial models like GPT-4 Turbo in code generation and mathematics

  3. Open Interpreter introduces Local III - a suite of tools for running powerful language models locally to control your personal computer, enhancing control and privacy
