December 2023 - AI Tidbits Monthly Roundup
Google’s long-awaited Gemini, Apple's first major strides into generative AI, a state-of-the-art transcription model outperforming Whisper, and an autonomous AI that operates smartphone apps for you
Welcome to the monthly round-up, where I curate the firehose of AI research papers and tools so you won’t have to. If you're pressed for time and can only catch one AI Tidbits edition, this is the one to read, featuring the absolute must-knows.
Welcome to the December edition of AI Tidbits Monthly, where we unravel the latest and greatest in AI. December provided an exciting finale to a year filled with innovative breakthroughs and groundbreaking research.
This December, Google debuted its long-awaited large multimodal AI, Gemini, incorporating it into its Bard chatbot and providing API access. Apple also made its first major strides into generative AI with its efficient on-device inference framework, an open-source multimodal model named Ferret, and MLX, a new machine learning framework built for Apple silicon.
On the open-source front, Mistral released a fully open-source Mixture of Experts model that outperforms GPT-3.5 and Llama 2 70B. Deci released a state-of-the-art 7B base model, and Microsoft introduced a coding LLM, WaveCoder, fine-tuned on its CodeOcean instruction dataset, that beats the current SOTA open and closed LLMs on coding tasks.
Open-source speech understanding and generation also moved forward: Nvidia released Parakeet, a speech-to-text model that outperforms Whisper v3. Additional noteworthy developments include the unveiling of OpenVoice, a novel voice cloning technology, and Amphion, an extensive toolkit dedicated to generating audio, music, and speech.
These and many more exciting updates across novel prompting frameworks, autonomous agents, multimodal AI, and open-source repositories are part of this month’s roundup.
Let's dive in!
Overview
Industry announcements (6 entries)
Large Language Models
Open-source (9 entries)
Prompting techniques (3 entries)
Research (8 entries)
Autonomous Agents (3 entries)
Image and Video (8 entries)
Audio (5 entries)
Multimodal (5 entries)
Open-source packages (4 entries)
Recent Deep Dives
Industry announcements
Large Language Models (LLMs)
Open-source
Prompting techniques