AI Tidbits

AI Tidbits

Share this post

AI Tidbits
AI Tidbits
August 2024 - AI Tidbits Monthly Roundup
Copy link
Facebook
Email
Notes
More
Monthly's

August 2024 - AI Tidbits Monthly Roundup

Musk's xAI unveils Grok-2, OpenAI's new GPT-4o model, Microsoft's open-source Phi 3.5 series, Nvidia's Eagle multimodal LLMs, Black Forest Labs' FLUX.1 to flood the internet with uncensored images

Arthur Mor's avatar
Arthur Mor
Sep 08, 2024
∙ Paid
16

Share this post

AI Tidbits
AI Tidbits
August 2024 - AI Tidbits Monthly Roundup
Copy link
Facebook
Email
Notes
More
1
Share

Welcome to the monthly curated round-up, where we curate the firehose of AI research papers and tools so you won’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read—featuring the absolute must-knows.


August has been a month of remarkable progress across various AI domains, from industry giants to open-source breakthroughs.

Elon Musk's xAI made waves with the unveiling of Grok-2 and Grok-2 mini, showcasing advanced capabilities that rival top models. OpenAI continued to refine its offerings with a more efficient GPT-4o and the introduction of Structured Outputs. The open-source community saw significant advancements, with Microsoft's Phi 3.5 series and AI21's Jamba models pushing the boundaries of what's possible with freely available models.

In the realm of multimodal AI, Nvidia's Eagle and Alibaba's Qwen2-VL demonstrated impressive performance in visual understanding tasks. The image and video generation field saw major leaps with Black Forest Labs' FLUX.1 and Tsinghua University's CogVideoX-5B. Audio AI also made strides, with Qwen2-Audio enabling multilingual voice interaction and HuggingFace's Parler TTS v1 offering enhanced text-to-speech capabilities.

Perhaps most intriguingly, Sakana AI introduced The AI Scientist, a system that could revolutionize scientific research by automating idea generation, execution, and documentation.

These developments, along with many more exciting updates across language models, multimodal AI, and specialized applications, are part of this month's comprehensive roundup.

Let's dive in!


Overview

  • Industry announcements (8 entries)

  • Large Language Models

    • Open-source (8 entries)

    • Research (8 entries)

  • Multimodal (11 entries)

  • Image and Video (10 entries)

  • Audio (6 entries)

  • Robotics (2 entries)

  • Open-source Packages (8 entries)

Recent Deep Dives

Top 8 leaderboards to choose the right AI model for your task

Top 8 leaderboards to choose the right AI model for your task

Sahar Mor
·
February 17, 2024
Read full story
12 techniques to reduce your LLM API bill and launch blazingly fast products

12 techniques to reduce your LLM API bill and launch blazingly fast products

Sahar Mor
·
January 13, 2024
Read full story
Harnessing research-backed prompting techniques for enhanced LLM performance

Harnessing research-backed prompting techniques for enhanced LLM performance

Sahar Mor
·
December 10, 2023
Read full story
[cross-post] 7 methods to secure LLM apps from prompt injections and jailbreaks

[cross-post] 7 methods to secure LLM apps from prompt injections and jailbreaks

Sahar Mor
·
February 9, 2024
Read full story
Become a premium member to get full access to my content and $1k in free credits for leading AI tools and APIs like Claude, Replicate, and Hugging Face. It’s common to expense the paid membership from your company’s learning and development education stipend.

Upgrade to Premium

Industry announcements

  1. Elon Musk's xAI unveils Grok-2 and Grok-2 mini, showcasing advanced chat, coding, and reasoning capabilities that outperform top models like Claude 3.5 Sonnet and GPT-4 Turbo

  2. OpenAI releases a new GPT-4o version that is slightly better and 50% cheaper than the previous GPT-4o model

  3. OpenAI releases Structured Outputs, ensuring AI-generated data conforms precisely to developer-supplied JSON schemas for enhanced reliability, achieving perfect accuracy with the new gpt-4o-2024-08-06 model

  4. Anthropic introduces prompt caching on its API, reducing costs and latency for large prompts by up to 90% and 85%, respectively, now in public beta for Claude 3.5 Sonnet and Claude 3 Haiku

  5. Google releases Gemini Live - a voice-interactive AI chatbot with enhanced emotional expression and real-time adaptive dialogue, similar to ChatGPT's new Advanced Voice Mode capability, offering hands-free and long-context conversational capabilities

  6. DeepMind releases Imagen 3 - a latent diffusion model that outperforms state-of-the-art models in generating high-quality images from text prompts, with built-in measures to enhance safety and representation

  7. OpenAI introduces fine-tuning for GPT-4o, allowing developers to tailor model responses and improve performance on domain-specific tasks like software engineering and text-to-SQL 

  8. Ideogram introduces Ideogram 2.0 - a new version of its text-to-image model, outperforming DALL-E, Midjourney, and FLUX Pro with improved text accuracy and an API for developers

    ssstwitter.com_1725662377237.mp4 [optimize output image]
    xAI’s Grok-2 is in 2nd place on the Chatbot Arena Leaderboard, surpassing Claude and the previous GPT-4o version


Large Language Models

Open-source

  1. Microsoft releases three new open-source AI models in the Phi 3.5 series: Phi 3.5 mini-instruct, MoE-instruct, and vision-instruct models, offering scalable reasoning capabilities for commercial and scientific use across languages

  2. AI2 introduces OLMoE - a sparse Mixture-of-Experts language model that activates only 1B parameters per token, achieving state-of-the-art performance and outperforming larger models like Llama2-13B-Chat

  3. AI21 releases Jamba Large and Jamba Mini - two new language models in its family of Mamba-Transformer models, featuring the longest context window for open models (256k) and rivaling state-of-the-art models like Llama 3.1 and Mistral Large

  4. Researchers open source OpenResearcher - an AI-driven platform that integrates LLMs with domain-specific knowledge through Retrieval-Augmented Generation, enabling researchers to efficiently navigate and generate insights from scientific literature

Keep reading with a 7-day free trial

Subscribe to AI Tidbits to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Substack Inc
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More