Welcome to the weekly edition of AI Tidbits, where I curate the firehose of AI research papers and tools so you won’t have to.
Sahar’s weekly take: This week, OpenAI equipped ChatGPT with memory capabilities—ChatGPT will now ‘remember’ key details about you across conversations, such as your name, writing style, and even political views.
I'm excited to see this feature rolled out. It is the first concrete step toward "LLM memory" beyond plain retrieval à la RAG, and toward personalized language models tailored to our preferences and personalities.
I can also see such a personalized ChatGPT powering experiences beyond better-generated text. The rumor is that OpenAI will soon release models that excel at planning and reasoning, key capabilities for autonomous AI assistants. Imagine a ChatGPT that remembers your Amazon and Uber accounts: you ask it for a book recommendation, and it transforms your natural language request into real-world action, finding and buying a book you’d enjoy.
With over 100 million monthly active users, seemingly minor updates like this memory capability have the potential to profoundly transform our interactions with our phones and computers.
Overview
Microsoft releases the Phi-3 family of open models, with the compact phi-3-mini outperforming much larger models while being able to run directly on a smartphone (Paper)
Apple unveils OpenELM - a series of open-source language models, enhancing AI capabilities on iPhones with efficient, privacy-focused technology (Hugging Face)
Google develops Med-Gemini - a specialized multimodal Gemini model that sets new benchmarks in medicine, including 91.1% accuracy on the MedQA medical-exam benchmark (Paper)
Synthesia introduces Expressive Avatars - its next-gen AI avatars, enhancing video communication with realistic emotional and physical mimicry (Company blog)
MyShell and MIT open source OpenVoice V2 - a commercially permissive text-to-speech model that can clone any voice in any language (X)
Stanford's AI Index Report 2024 reveals AI's superior performance in specific tasks and significant economic impact, alongside growing ethical concerns and public anxiety (Stanford blog)
OpenAI's ChatGPT now offers a memory function to all Plus users, enabling more tailored conversations by remembering past interactions
DeepMind shows that providing many in-context learning examples significantly boosts LLMs' performance (up to 36%) on tasks like translation and summarization, potentially reducing the need for specialized training
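The recipe is easy to try: instead of fine-tuning, pack as many demonstrations as the context window allows ahead of the real query. A minimal sketch, where the task, prompt format, and examples are hypothetical placeholders:

```python
# Minimal sketch of many-shot in-context learning: concatenate many
# demonstrations into one prompt instead of fine-tuning the model.

def build_many_shot_prompt(examples, query, instruction="Translate English to French."):
    """Concatenate input/output pairs ahead of the real query."""
    shots = "\n\n".join(f"English: {src}\nFrench: {tgt}" for src, tgt in examples)
    return f"{instruction}\n\n{shots}\n\nEnglish: {query}\nFrench:"

# With a long-context model, `examples` can hold hundreds of pairs;
# DeepMind's finding is that performance keeps climbing as shots scale.
prompt = build_many_shot_prompt(
    examples=[("Good morning.", "Bonjour."), ("Thank you.", "Merci.")],
    query="See you tomorrow.",
)
print(prompt)
```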
Fudan University develops AutoCrawler - a new framework combining LLMs with web crawlers to enhance adaptability and efficiency in web automation
Researchers extend Llama-3-8B context length from 8K to 1M tokens
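The exact recipe behind this release aside, long-context extensions typically rescale the RoPE position encoding before continued training. A toy illustration of that one ingredient; the 500K base matches Llama-3's config, while the scaled-up value is an assumption for illustration:

```python
import torch

# Rotary position embedding (RoPE) inverse frequencies. Long-context
# recipes commonly raise the base `theta` so rotations wrap around much
# later, letting attention distinguish positions up to ~1M tokens apart.

def rope_inv_freq(head_dim: int, theta: float) -> torch.Tensor:
    """One inverse frequency per pair of head dimensions."""
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

base_freqs   = rope_inv_freq(head_dim=128, theta=500_000.0)      # Llama-3 default base
scaled_freqs = rope_inv_freq(head_dim=128, theta=100_000_000.0)  # hypothetical long-context base

# Larger theta -> slower-rotating dimensions -> far-apart tokens still
# receive distinguishable relative rotations.
print(base_freqs[:4], scaled_freqs[:4])
```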
Researchers propose a multi-token prediction approach for training LLMs, which boosts efficiency and performance, outperforming baselines on coding benchmarks without additional training time
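The core idea fits in a few lines: a shared trunk feeds k output heads, and head i is trained to predict the token i steps ahead, all in a single backward pass. A toy sketch; the embedding trunk and sizes are stand-ins, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    """Toy multi-token prediction: k heads on top of a shared trunk."""

    def __init__(self, d_model=64, vocab=1000, k=4):
        super().__init__()
        self.trunk = nn.Embedding(vocab, d_model)  # stands in for a transformer
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))

    def loss(self, tokens):                        # tokens: (batch, seq)
        h = self.trunk(tokens)                     # (batch, seq, d_model)
        total = 0.0
        for i, head in enumerate(self.heads, start=1):
            logits = head(h[:, :-i])               # head i predicts i steps ahead
            target = tokens[:, i:]
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), target.reshape(-1)
            )
        return total / len(self.heads)

model = MultiTokenPredictor()
print(model.loss(torch.randint(0, 1000, (2, 16))))  # one pass trains all heads
```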
Researchers propose Kolmogorov-Arnold Networks (KANs) - an innovative network architecture with learnable activation functions on edges, which outperforms traditional MLPs in accuracy, scalability, and interpretability
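To make the contrast with MLPs concrete: an MLP learns a scalar weight per edge and fixes the activation at each node, while a KAN puts a learnable one-dimensional function on every edge. A toy layer where each edge function is a mix of fixed Gaussian bumps (the paper uses B-splines):

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """Simplified KAN layer: a learnable 1-D function on every edge."""

    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        self.centers = torch.linspace(-2, 2, n_basis)  # fixed basis grid
        self.coef = nn.Parameter(torch.randn(in_dim, out_dim, n_basis) * 0.1)

    def forward(self, x):                              # x: (batch, in_dim)
        # Evaluate each Gaussian bump at each input: (batch, in_dim, n_basis)
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        # Output j sums the edge functions phi_ij(x_i) over all inputs i
        return torch.einsum("bik,iok->bo", basis, self.coef)

layer = ToyKANLayer(in_dim=3, out_dim=2)
print(layer(torch.randn(5, 3)).shape)                  # torch.Size([5, 2])
```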
Cohere proposes PoLL - a cost-effective "LLM as a judge" evaluation method that scores outputs with a diverse panel of smaller models, reducing bias and expense while outperforming a single large model as the judge
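The mechanics are simple enough to sketch: score each answer with several small judges and pool their votes. The judge callables below are stubs standing in for real model calls:

```python
from statistics import mean

def poll_score(question: str, answer: str, judges) -> float:
    """Average the scores from a panel of independent judge models."""
    return mean(judge(question, answer) for judge in judges)

# Stubs standing in for, e.g., three different small LLM judges
# that each return a 1-5 quality score.
judges = [
    lambda q, a: 4,  # judge model A
    lambda q, a: 5,  # judge model B
    lambda q, a: 4,  # judge model C
]
print(poll_score("What is 2+2?", "4", judges))  # 4.33...
```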
Anthropic introduces "defection probes" to effectively detect sleeper agent behaviors in AI, achieving over 99% accuracy with Claude 2, promising broader applicability across various LLMs
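A defection probe is essentially a linear classifier over the model's internal activations. A toy version on synthetic vectors; real probes are trained on residual-stream activations from a specific layer and evaluated on held-out data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 256  # stand-in for the residual-stream width

# Synthetic activations: benign behavior vs. slightly shifted
# "sleeper" behavior (a stand-in for real model internals).
safe = rng.normal(0.0, 1.0, size=(500, d))
defect = rng.normal(0.3, 1.0, size=(500, d))

X = np.vstack([safe, defect])
y = np.array([0] * 500 + [1] * 500)

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"probe accuracy: {probe.score(X, y):.2f}")
```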
OpenAI proposes a hierarchical instruction model for LLMs that improves security by prioritizing system prompts, enhancing robustness to attacks without compromising performance
Meta develops AdvPrompter - an LLM that rapidly generates adversarial prompts to expose vulnerabilities
Snowflake releases Arctic - an open-source LLM optimized for business tasks, boasting cost-effective training and high performance on par with Meta's more expensive Llama models
BigCode open sources StarCoder2-15B-Instruct, setting a new standard in coding LLMs with a 72.6 HumanEval score, achieved through a fully open self-alignment training pipeline rather than distillation from proprietary models
Rice University proposes SpaceByte - a novel byte-level decoding architecture that matches tokenized model performance while overcoming traditional tokenization drawbacks
Hugging Face releases FineWeb - a 15 trillion (!) token high-quality web dataset from CommonCrawl optimized for LLMs that outperforms existing datasets like C4 and The Pile
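At 15T tokens you will want to stream rather than download. A quick peek at the data, assuming the dataset id from the announcement and the standard `datasets` streaming API:

```python
from datasets import load_dataset

# Stream FineWeb from the Hugging Face Hub instead of downloading it.
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

for i, doc in enumerate(fineweb):
    print(doc["text"][:200])  # each record is a cleaned CommonCrawl document
    if i == 2:
        break
```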
Microsoft proposes FILM-7B, employing information-intensive training to enhance LLMs' ability to utilize long contexts effectively, addressing the lost-in-the-middle challenge with long prompts
Researchers show that GPT-4 can autonomously exploit publicly disclosed security vulnerabilities, succeeding in 87% of cases when given the detailed CVE description
Researchers release InternVL 1.5 - an enhanced open-source multimodal language model with advanced visual and linguistic capabilities, achieving state-of-the-art results in numerous benchmarks
Researchers present a detailed survey on hallucination in multimodal LLMs, exploring its causes, evaluation techniques, and strategies for mitigation to advance the models' reliability and applicability
The University of Hong Kong and ByteDance release Groma - a multimodal LLM with a unique visual tokenization mechanism, enabling enhanced region-level image understanding and interaction
Hugging Face releases Jack of All Trades (JAT) - a versatile transformer-based model achieving strong performance across Reinforcement Learning, Computer Vision, and NLP tasks, fully open-sourced to advance general AI model development
Vidu, China's first text-to-video AI, launches with the ability to create consistent, high-definition 16-second videos, though still trailing behind counterparts like OpenAI's Sora
Researchers develop PLLaVA (Pooling LLaVA) - a novel model that adapts image-language pre-training to videos via a simple pooling strategy over visual features, reducing resource use and setting new performance benchmarks on video understanding tasks
Researchers release a novel diffusion model trained on a unique dataset of natural image pairs to effectively add objects into images based on text instructions, surpassing existing methods
Nvidia presents VisualFactChecker (VFC) - a training-free pipeline that generates high-fidelity, detailed captions for visual content, significantly outperforming current methods on 2D and 3D datasets
Researchers introduce Hyper-SD - a distillation framework for diffusion models, enhancing performance across minimal inference steps while preserving trajectory integrity, setting a new standard in the field
Researchers introduce InstantFamily - a novel approach using masked cross-attention to achieve state-of-the-art zero-shot multi-ID image generation with remarkable identity preservation
Adobe researchers release VideoGigaGAN, achieving a significant advance in video super-resolution with enhanced detail at up to eight times the original resolution
Microsoft releases VASA-1 - a framework that creates lifelike talking faces with synchronized lip movements and authentic facial expressions from a single image and audio clip, achieving real-time performance at high resolution
Cohere’s Toolkit - a chat interface for building RAG-powered chatbots
Perplexica - an open-source AI-powered search tool that digs deep into the web to find answers
GPTCache - a semantic cache for LLM queries, fully integrated with LangChain and LlamaIndex (see the sketch after this list)
Devika - an open-source replication of Devin, an agentic AI software engineer
Plus >70 more open-source packages for AI engineers
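For a sense of what GPTCache (listed above) does under the hood, here is a from-scratch sketch of a semantic cache, not GPTCache's actual API: embed each query, and on a near-duplicate query return the cached answer instead of calling the LLM. The toy `embed` below stands in for a real sentence-embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding; real caches use a sentence encoder."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

cache = []  # list of (embedding, answer) pairs

def cached_answer(query: str, threshold: float = 0.8):
    q = embed(query)
    for vec, answer in cache:
        if float(q @ vec) >= threshold:  # cosine similarity (unit vectors)
            return answer                # cache hit: skip the LLM call
    return None                          # miss: call the LLM, then cache it

cache.append((embed("What is the capital of France?"), "Paris"))
print(cached_answer("what is the capital of france"))  # near-duplicate -> "Paris"
print(cached_answer("Who wrote Hamlet?"))              # miss -> None
```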
Previous AI Tidbits roundup
Reach AI builders, researchers, and entrepreneurs by partnering with AI Tidbits
If you find AI Tidbits valuable, share it with a friend and consider showing your support.