AI Tidbits


May 2024 - AI Tidbits Monthly Roundup

A 2M-token context window from Google, an any-to-any model from OpenAI, new open models for document understanding, Mistral’s first coding LLM, and a sneak peek into an LLM’s brain from Anthropic

Arthur Mor
Jun 02, 2024

Welcome to the monthly round-up, where we curate the firehose of AI research papers and tools so you won’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read, featuring the absolute must-knows.


Welcome to the May edition of AI Tidbits Monthly, where we unravel the latest and greatest in AI. This month has been particularly eventful, with major updates from industry leaders such as Google, OpenAI, and Microsoft.

In its Spring Updates event, OpenAI introduced GPT-4o, a multimodal model processing text, vision, and audio with real-time emotion recognition and adaptive speech responses. They expanded the free tier to include ChatGPT Plus features and announced a new Mac app, with a Windows version coming soon.

Google I/O was also filled with announcements, including Gemini 1.5 Pro, featuring a 2M-token context window, and Gemini 1.5 Flash, optimized for speed and cost-efficiency. Google unveiled Project Astra, a real-time multimodal AI assistant, and Veo, a long-form video generator, and announced the Trillium chip (TPU v6) for AI datacenters.

Lastly, Microsoft held its annual developers conference, Microsoft Build, introducing Copilot+ PCs, AI-optimized devices with advanced silicon and all-day battery life. They launched agents as part of Copilot to complete tasks autonomously, unveiled Phi-3 Vision and Phi Silica models, and announced AI-powered real-time video translation for the Edge browser.

In addition to these highlights, May's roundup includes groundbreaking advancements in language models (a new coding model from Mistral!), research, and open-source projects.

Let's dive in!


Overview

  • ✨ Special feature: AI updates from Google, OpenAI, and Microsoft

  • Large Language Models

    • Open-source (10 entries)

    • Research (8 entries)

  • Autonomous Agents (2 entries)

  • Multimodal (5 entries)

  • Image and Video (5 entries)

  • Audio (4 entries)

  • Open-source Packages (6 entries)

Recent Deep Dives

  • Top 8 leaderboards to choose the right AI model for your task (Sahar Mor, February 17, 2024)

  • [cross-post] 7 methods to secure LLM apps from prompt injections and jailbreaks (Sahar Mor, February 9, 2024)

  • 12 techniques to reduce your LLM API bill and launch blazingly fast products (Sahar Mor, January 13, 2024)

  • Harnessing research-backed prompting techniques for enhanced LLM performance (Sahar Mor, December 10, 2023)

  • Most popular and upcoming Generative AI tools and APIs (Sahar Mor, December 19, 2023)

Become a premium member to get full access to my content and $1k in free credits for leading AI tools and APIs like Perplexity, Replicate, and Hugging Face. It’s common to expense the paid membership from your company’s learning and development education stipend.


✨ Special feature: AI updates from Google, OpenAI, and Microsoft

OpenAI Spring Updates

OpenAI introduced GPT-4o, a cutting-edge multimodal model that processes text, vision, and audio, offering superior speed and cost-efficiency compared to GPT-4 Turbo. Key enhancements include real-time emotion recognition and adaptive speech responses, inspired by the movie "Her." The new voice assistant features real-time translation, facial expression reading, and dynamic voice adaptation, significantly improving interactivity. OpenAI expanded its free tier, providing features previously exclusive to ChatGPT Plus users, and limited access to GPT-4o. Additionally, a new desktop app for Mac was announced, with a Windows version coming soon, and potential integration with Apple devices is on the horizon.


Google I/O

At Google I/O 2024, Google unveiled Gemini 1.5 Pro, boasting an expanded context window of up to two million tokens, and Gemini 1.5 Flash, optimized for speed and cost-efficiency. New projects include Project Astra, a real-time multimodal AI assistant, and Veo, a long-form video generator. AI capabilities are being integrated across Google's ecosystem, enhancing Search, Gmail, Google Photos, and Android. The Trillium chip (TPU v6) was introduced, designed for AI datacenters to enhance processing power and energy efficiency, along with Gemini Nano, bringing on-device multimodal capabilities to Pixel devices.


Microsoft Build

Microsoft introduced Copilot+ PCs, a new category of AI-optimized Windows devices with advanced silicon and all-day battery life. The company expanded Copilot AI agents to handle autonomous tasks, with new capabilities launching in Copilot Studio. Phi-3 Vision, a compact multimodal AI model, and Phi Silica, a local language model optimized for Copilot+ PCs, were unveiled. Additionally, the Edge browser will soon feature AI-powered real-time video translation, enabling multilingual video accessibility and enhancing global communication.


Large Language Models (LLMs)

Open-source

  1. Microsoft open sources new Phi-3 models, including 7B and 14B models and a new multimodal variant with vision capabilities featuring a 128k context window

  2. Mistral releases Codestral - a 32k context window coding model supporting over 80 programming languages and setting a new standard in performance and latency

  3. Mistral releases 7B v0.3 - a new version of its small and powerful open language model, with function calling support 

  4. Google announces Gemma 2 - featuring a new, efficient architecture that offers class-leading performance with fewer parameters and lower deployment costs, optimized for diverse hardware

  5. Google releases PaliGemma - a state-of-the-art open vision-language model that can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images

  6. Researchers open-source Prometheus 2 - a groundbreaking, cost-effective language model evaluator, achieving unmatched performance in aligning with human assessments

  7. Cohere open sources Aya 23 - a new family of multilingual LLMs (8B, 35B) that supports 23 languages and outperforms existing models in various linguistic tasks

  8. IBM open sources for commercial use the Granite Code models - a series of code LLMs optimized for a wide range of coding tasks, from code generation to repository maintenance, achieving state-of-the-art performance in software development tasks

  9. Researchers unveil AutoCoder - the first coding model to outperform GPT-4 Turbo and GPT-4o on the HumanEval benchmark, achieving a pass@1 score of 90.9% and featuring an enhanced code interpreter with the capability to install external packages

  10. CMU showcases MAmmoTH2 - a novel LLM that significantly boosts reasoning performance by using web-extracted instruction data, setting new standards in efficiency and effectiveness

    Figure: Microsoft’s Phi-3 is a suite of small, powerful language models

Research

  1. Anthropic releases a detailed study peeking into LLMs' brains, showcasing how millions of concepts such as gender, conversational styles, and political views are represented, revealing the first-ever detailed look inside a modern LLM

  2. Anthropic releases a new tool that turns task descriptions into production-ready optimized prompts
