Welcome to the weekly edition of AI Tidbits, where I curate the firehose of AI research papers and tools so you won’t have to.
Sahar’s weekly take: This week, OpenAI equipped ChatGPT with memory capabilities—ChatGPT will now ‘remember’ key details about you across conversations, such as your name, writing style, and even political views.
I'm excited to see this feature rolled out. It is the first concrete step toward "LLM memory" beyond plain retrieval à la RAG, and toward personalized language models tailored to our preferences and personalities.
I can also see such a personalized ChatGPT powering experiences beyond better-generated text. The rumor is that OpenAI will soon release models that excel at planning and reasoning, key capabilities for autonomous AI assistants. Imagine a ChatGPT that remembers your Amazon and Uber accounts: you ask it for a book recommendation, and it transforms your natural language request into real-world action, finding and buying a book you’d enjoy.
With over 100 million monthly active users, seemingly minor updates like this memory capability have the potential to profoundly transform our interactions with our phones and computers.
Overview
Microsoft releases the Phi-3 family of open models, with the compact phi-3-mini outperforming much larger models while being able to run directly on a smartphone (Paper)
Apple unveils OpenELM - a series of open-source language models, enhancing AI capabilities on iPhones with efficient, privacy-focused technology (Hugging Face)
Google develops Med-Gemini - a specialized multimodal Gemini model that sets new benchmarks in medicine, including 91.1% accuracy on the MedQA medical-exam benchmark (Paper)
Synthesia introduces Expressive Avatars - its next-gen AI avatars, enhancing video communication with realistic emotional and physical mimicry (Company blog)
MyShell and MIT open source OpenVoice V2 - a commercially permissive text-to-speech model that can clone any voice in any language (X)
Stanford's AI Index Report 2024 reveals AI's superior performance in specific tasks and significant economic impact, alongside growing ethical concerns and public anxiety (Stanford blog)
OpenAI's ChatGPT now offers a memory function to all Plus users, enabling more tailored conversations by remembering past interactions
DeepMind shows that providing many in-context learning examples significantly boosts LLMs' performance (up to 36%) on tasks like translation and summarization, potentially reducing the need for specialized training
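The recipe is easy to try: instead of fine-tuning, pack as many demonstrations as the context window allows ahead of the real query. A minimal sketch, where the task, prompt format, and examples are hypothetical placeholders:

```python
# Minimal sketch of many-shot in-context learning: concatenate many
# demonstrations into one prompt instead of fine-tuning the model.

def build_many_shot_prompt(examples, query, instruction="Translate English to French."):
    """Concatenate input/output pairs ahead of the real query."""
    shots = "\n\n".join(f"English: {src}\nFrench: {tgt}" for src, tgt in examples)
    return f"{instruction}\n\n{shots}\n\nEnglish: {query}\nFrench:"

# With a long-context model, `examples` can hold hundreds of pairs;
# DeepMind's finding is that performance keeps climbing as shots scale.
prompt = build_many_shot_prompt(
    examples=[("Good morning.", "Bonjour."), ("Thank you.", "Merci.")],
    query="See you tomorrow.",
)
print(prompt)
```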
Fudan University develops AutoCrawler - a new framework combining LLMs with web crawlers to enhance adaptability and efficiency in web automation
Researchers extend Llama-3-8B context length from 8K to 1M tokens
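The exact recipe behind this release aside, long-context extensions typically rescale the RoPE position encoding before continued training. A toy illustration of that one ingredient; the 500K base matches Llama-3's config, while the scaled-up value is an assumption for illustration:

```python
import torch

# Rotary position embedding (RoPE) inverse frequencies. Long-context
# recipes commonly raise the base `theta` so rotations wrap around much
# later, letting attention distinguish positions up to ~1M tokens apart.

def rope_inv_freq(head_dim: int, theta: float) -> torch.Tensor:
    """One inverse frequency per pair of head dimensions."""
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

base_freqs   = rope_inv_freq(head_dim=128, theta=500_000.0)      # Llama-3 default base
scaled_freqs = rope_inv_freq(head_dim=128, theta=100_000_000.0)  # hypothetical long-context base

# Larger theta -> slower-rotating dimensions -> far-apart tokens still
# receive distinguishable relative rotations.
print(base_freqs[:4], scaled_freqs[:4])
```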
Researchers propose a multi-token prediction approach for training LLMs, which boosts efficiency and performance, outperforming baselines on coding benchmarks without additional training time
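The core idea fits in a few lines: a shared trunk feeds k output heads, and head i is trained to predict the token i steps ahead, all in a single backward pass. A toy sketch; the embedding trunk and sizes are stand-ins, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    """Toy multi-token prediction: k heads on top of a shared trunk."""

    def __init__(self, d_model=64, vocab=1000, k=4):
        super().__init__()
        self.trunk = nn.Embedding(vocab, d_model)  # stands in for a transformer
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))

    def loss(self, tokens):                        # tokens: (batch, seq)
        h = self.trunk(tokens)                     # (batch, seq, d_model)
        total = 0.0
        for i, head in enumerate(self.heads, start=1):
            logits = head(h[:, :-i])               # head i predicts i steps ahead
            target = tokens[:, i:]
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), target.reshape(-1)
            )
        return total / len(self.heads)

model = MultiTokenPredictor()
print(model.loss(torch.randint(0, 1000, (2, 16))))  # one pass trains all heads
```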
Researchers propose Kolmogorov-Arnold Networks (KANs) - an innovative network architecture with learnable activation functions on edges, which outperforms traditional MLPs in accuracy, scalability, and interpretability
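To make the contrast with MLPs concrete: an MLP learns a scalar weight per edge and fixes the activation at each node, while a KAN puts a learnable one-dimensional function on every edge. A toy layer where each edge function is a mix of fixed Gaussian bumps (the paper uses B-splines):

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """Simplified KAN layer: a learnable 1-D function on every edge."""

    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        self.centers = torch.linspace(-2, 2, n_basis)  # fixed basis grid
        self.coef = nn.Parameter(torch.randn(in_dim, out_dim, n_basis) * 0.1)

    def forward(self, x):                              # x: (batch, in_dim)
        # Evaluate each Gaussian bump at each input: (batch, in_dim, n_basis)
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        # Output j sums the edge functions phi_ij(x_i) over all inputs i
        return torch.einsum("bik,iok->bo", basis, self.coef)

layer = ToyKANLayer(in_dim=3, out_dim=2)
print(layer(torch.randn(5, 3)).shape)                  # torch.Size([5, 2])
```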
Cohere proposes PoLL - a cost-effective "LLM as a judge" evaluation method that scores outputs with a diverse panel of smaller models, reducing bias and expense while outperforming a single large model as the judge
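The mechanics are simple enough to sketch: score each answer with several small judges and pool their votes. The judge callables below are stubs standing in for real model calls:

```python
from statistics import mean

def poll_score(question: str, answer: str, judges) -> float:
    """Average the scores from a panel of independent judge models."""
    return mean(judge(question, answer) for judge in judges)

# Stubs standing in for, e.g., three different small LLM judges
# that each return a 1-5 quality score.
judges = [
    lambda q, a: 4,  # judge model A
    lambda q, a: 5,  # judge model B
    lambda q, a: 4,  # judge model C
]
print(poll_score("What is 2+2?", "4", judges))  # 4.33...
```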
Anthropic introduces "defection probes" to effectively detect sleeper agent behaviors in AI, achieving over 99% accuracy with Claude 2, promising broader applicability across various LLMs
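A defection probe is essentially a linear classifier over the model's internal activations. A toy version on synthetic vectors; real probes are trained on residual-stream activations from a specific layer and evaluated on held-out data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 256  # stand-in for the residual-stream width

# Synthetic activations: benign behavior vs. slightly shifted
# "sleeper" behavior (a stand-in for real model internals).
safe = rng.normal(0.0, 1.0, size=(500, d))
defect = rng.normal(0.3, 1.0, size=(500, d))

X = np.vstack([safe, defect])
y = np.array([0] * 500 + [1] * 500)

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"probe accuracy: {probe.score(X, y):.2f}")
```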
OpenAI proposes a hierarchical instruction model for LLMs that improves security by prioritizing system prompts, enhancing robustness to attacks without compromising performance
Meta develops AdvPrompter - an LLM that rapidly generates adversarial prompts to expose vulnerabilities
Snowflake releases Arctic - an open-source LLM optimized for business tasks, boasting cost-effective training and high performance on par with Meta's more expensive Llama models
BigCode open sources StarCoder2-15B-Instruct, setting a new standard in coding LLMs with a 72.6 HumanEval score, achieved through a fully open self-alignment training pipeline rather than distillation from proprietary models
Rice University proposes SpaceByte - a novel byte-level decoding architecture that matches tokenized model performance while overcoming traditional tokenization drawbacks
Hugging Face releases FineWeb - a 15 trillion (!) token high-quality web dataset from CommonCrawl optimized for LLMs that outperforms existing datasets like C4 and The Pile
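At 15T tokens you will want to stream rather than download. A quick peek at the data, assuming the dataset id from the announcement and the standard `datasets` streaming API:

```python
from datasets import load_dataset

# Stream FineWeb from the Hugging Face Hub instead of downloading it.
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

for i, doc in enumerate(fineweb):
    print(doc["text"][:200])  # each record is a cleaned CommonCrawl document
    if i == 2:
        break
```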
Microsoft proposes FILM-7B, employing information-intensive training to enhance LLMs' ability to utilize long contexts effectively, addressing the lost-in-the-middle challenge with long prompts
Researchers show that GPT-4 can autonomously exploit publicly disclosed security vulnerabilities, succeeding in 87% of cases when given the detailed CVE description
Researchers release InternVL 1.5 - an enhanced open-source multimodal language model with advanced visual and linguistic capabilities, achieving state-of-the-art results in numerous benchmarks
Researchers present a detailed survey on hallucination in multimodal LLMs, exploring its causes, evaluation techniques, and strategies for mitigation to advance the models' reliability and applicability
The University of Hong Kong and ByteDance release Groma - a multimodal LLM with a unique visual tokenization mechanism, enabling enhanced region-level image understanding and interaction
Hugging Face releases Jack of All Trades (JAT) - a versatile transformer-based model achieving strong performance across Reinforcement Learning, Computer Vision, and NLP tasks, fully open-sourced to advance general AI model development
Vidu, China's first text-to-video AI, launches with the ability to create consistent, high-definition 16-second videos, though still trailing behind counterparts like OpenAI's Sora
Researchers develop PLLaVA (Pooling LLaVA) - a novel model that adapts image-language pre-training to videos via a simple pooling strategy over visual features, reducing resource use and setting new performance benchmarks on video understanding tasks
Researchers release a novel diffusion model trained on a unique dataset of natural image pairs to effectively add objects into images based on text instructions, surpassing existing methods
Nvidia presents VisualFactChecker (VFC) - a training-free pipeline that generates high-fidelity, detailed captions for visual content, significantly outperforming current methods on 2D and 3D datasets
Researchers introduce Hyper-SD - a distillation framework for diffusion models, enhancing performance across minimal inference steps while preserving trajectory integrity, setting a new standard in the field
Researchers introduce InstantFamily - a novel approach using masked cross-attention to achieve state-of-the-art zero-shot multi-ID image generation with remarkable identity preservation
Adobe researchers release VideoGigaGAN, achieving a significant advance in video super-resolution with enhanced detail at up to eight times the original resolution
Microsoft releases VASA-1 - a framework that creates lifelike talking faces with synchronized lip movements and authentic facial expressions from a single image and audio clip, achieving real-time performance at high resolution
Cohere’s Toolkit - a chat interface for building RAG-powered chatbots
Perplexica - an open-source AI-powered search tool that digs deep into the web to find answers
GPTCache - a semantic cache for LLM queries, fully integrated with LangChain and LlamaIndex (see the sketch after this list)
Devika - an open-source replication of Devin, an agentic AI software engineer
Plus >70 more open-source packages for AI engineers
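For a sense of what GPTCache (listed above) does under the hood, here is a from-scratch sketch of a semantic cache, not GPTCache's actual API: embed each query, and on a near-duplicate query return the cached answer instead of calling the LLM. The toy `embed` below stands in for a real sentence-embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding; real caches use a sentence encoder."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

cache = []  # list of (embedding, answer) pairs

def cached_answer(query: str, threshold: float = 0.8):
    q = embed(query)
    for vec, answer in cache:
        if float(q @ vec) >= threshold:  # cosine similarity (unit vectors)
            return answer                # cache hit: skip the LLM call
    return None                          # miss: call the LLM, then cache it

cache.append((embed("What is the capital of France?"), "Paris"))
print(cached_answer("what is the capital of france"))  # near-duplicate -> "Paris"
print(cached_answer("Who wrote Hamlet?"))              # miss -> None
```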
Previous AI Tidbits roundup
Reach AI builders, researchers, and entrepreneurs by partnering with AI Tidbits
If you find AI Tidbits valuable, share it with a friend and consider showing your support.