OpenAI DevDay - a pivotal moment for AI
Making sense and sharing insights from OpenAI's announcements including a faster and cheaper GPT-4 model, a new text-to-speech API, an App Store for GPT agents, and more.
Welcome to Deep Dives - an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI.
OpenAI's latest announcements have sent shockwaves through the tech world, signaling a new era for artificial intelligence. The company's ground-breaking announcements and bold moves would eliminate thousands of companies, from GPT-wrappers to deep tech ones, and pose a substantial threat to big tech incumbents.
These developments aren't just about shaking up the industry—they create a goldmine of possibilities for innovators and businesses in AI, turning previously unfeasible ideas into profitable ventures.
In this post, I'll unpack each of these pivotal announcements and outline the impact they have on those at the cutting edge of AI.
Announcements covered:
GPT-4 Turbo with a 128k context length
Substantial price reductions
New text-to-speech (TTS) model and API
Whisper v3
Assistants API and Retrieval
GPTs and the GPT Store, à la OpenAI’s App Store
Other announcements
A new model: GPT-4 Turbo having a 128k context length
GPT-4 Turbo is a cheaper and faster version of GPT-4. It has an updated knowledge cutoff of April 2023, with OpenAI stating they will keep it up-to-date. It has a 128k context window so it can fit the equivalent of more than 300 pages of text in a single prompt.
Availability
Available for all paying developers
Why it matters?
Cheaper - having GPT-4 level capabilities at a 2.75x cheaper cost enables multiple applications that up until now didn’t make sense from a margins perspective.
Faster - latency is a main consideration for LLM builders, especially for those building user-facing apps. A faster yet capable model unlocks applications for which latency is a core part of the user experience.
Longer context window - context extends the language model’s knowledge, e.g. by augmenting it with your company’s Slack data. Up until today, the GPT-4 context window was limited to 32k tokens. A 4x longer context window means more data can fit in one prompt, reducing the need for retrieval augmented generation and the frequency of hallucinations by grounding GPT’s response in your data.
Examples of affected companies
LLM API companies such as AWS Bedrock, Google PaLM 2, Anthropic (Claude Instant), Hugging Face, and AI21
Substantial price reductions
The new GPT-4 Turbo API will be 2.75x cheaper than GPT-4. The same applies to the GPT-3.5 Turbo 16k model.
Developers using the 4k context version of GPT-3.5 would also benefit from a 33% reduction.
Fine-tuned GPT-3.5 Turbo 4K model input tokens are reduced by 3x as well.
Availability
Available for all paying developers
Why it matters?
GPT-powered app builders constantly monitor the cost of serving their apps. Cheaper generations translate to better margins, and better margins enable more innovation.
Substantially cheaper fine-tuned GPT models reduce the strain on long context windows and RAG applications.
Examples of affected companies
Anthropic, AWS Bedrock, Hugging Face, Google PaLM 2, AI21 Labs
New text-to-speech (TTS) model and API
Developers can now generate human-quality speech from text via a text-to-speech API. The current TTS model offers six preset voices to choose from and two model variants, tts-1 and tts-1-hd.
tts-1 is optimized for real-time use cases and tts-1-hd is optimized for quality, i.e. more human-like speech in exchange of latency.
OpenAI’s TTS supports real-time audio streaming and pricing starts at $0.015 per input 1,000 characters. For comparison, ElevenLabs, which is considered to be the current best TTS service, starts at $0.165 per 1k characters. That’s >10x the cost.
OpenAI's preset voice Nova in action:
Availability
Available for all developers
Why it matters?
Generating voice was the missing modality in OpenAI’s ecosystem. An OpenAI-grade TTS engine, that is cheaper and faster thanks to OpenAI’s economies of scale, will enable more business use cases and increase competition in the space.
Combined with OpenAI’s new GPTs vision (see below), a future of OpenAI-powered voice assistants is imminent.
Examples of affected companies
ElevanLabs, PlayHT, Coqui, Resemble AI, and cloud providers (AWS Polly, Google Text-to-Speech, Azure TTS)
Whisper v3
Whisper is OpenAI’s cutting-edge Automatic Speech Recognition model (ASR) model. Whisper’s open-source release in Sep 2022 had a profound impact on the speech2text industry and enabled many speech-powered applications since then.
Whisper large-v3 is OpenAI’s next-generation ASR which features improved performance across languages.
Availability
Immediately via the Whisper package on GitHub. API access will arrive in the “near future”.
Why it matters?
Natural language is how humans interact with one another. Incorporating Whisper v3 into the OpenAI API ecosystem will make this technology more accessible to developers, enabling them to integrate sophisticated speech-to-text features into their applications. This move would also drive the commoditization of the currently expensive speech2text market, allowing for a broader spectrum of uses and users.
Examples of affected companies
Deepgram, Azure TTS, Google Text-to-Speech AI, Amazon Transcribe
Assistants API and Retrieval
Using the Assistants API, developers can create agent-like AI within their applications, equipped with specialized functions like Code Interpreter, Retrieval, and function calling for efficient task execution. No more fancy Retrieval Augmented Generation (RAG) pipelines. Users just need to upload files to extend GPT’s knowledge. This alone eliminates GPT wrappers like ChatPDF and the need for smaller LangChain apps for conversing over your data.
The Assistants API also features persistent threads, which are designed to allow developers to manage long-running conversations and complex tasks without the limitations of a short-term memory context, enabling more coherent and contextually aware interactions over time.
Assistants can run Python code, manage diverse data, create visual content, and tap into external knowledge sources, eliminating the need for developers to embed or search through large datasets. Developers can also define and call custom functions through the API, e.g. calculating shipping costs based on weight, dimensions, and destination provided by the customer in a chat message.
Availability
Available for all developers under a beta program
Why it matters?
The Assistants API and interface is a no-code builder for intelligence AI agents. Non-engineers can build small pieces of software powered by OpenAI’s models and then share them with others. They can even monetize those agents through OpenAI’s new GPT Store (see below).
Users can also expand their assistant’s knowledge by uploading documents such as PDFs, Excel files, etc., removing the need for fancy RAG frameworks or customized LangChain scripts. OpenAI takes care of this all!
Examples of affected companies
AutoGPT, Characther.AI, ChatPDF, LangChain, Adept, Hugging Face
GPTs and the GPT Store, à la OpenAI’s App Store
Users can now create customized ChatGPT versions, dubbed GPTs, enabling users to craft personalized AI for specific uses like learning, work, or leisure, and to share these with others.
GPTs offer task-oriented assistance, such as explaining board game rules, teaching math, or designing graphics. They require zero coding skills and come with the capability to perform web searches, create images, and analyze data.
GPTs would also have access to custom actions, connecting them to external APIs and enabling real-world interactions. This functionality can transform GPTs into versatile tools capable of interacting with databases, managing emails, or assisting with shopping. Building on the Plugins beta experience, the update gives developers more control and simplifies the transition for those with existing plugins, allowing them to seamlessly integrate these capabilities into their GPTs.
Lastly, OpenAI enables users to create and share custom GPTs, with a monetizable GPT Store launching soon.
Availability
Immediately for ChatGPT Plus and Enterprise users
Why it matters?
Customizable GPTs mark a pivotal shift towards more personalized and specific AI utility. Those without coding expertise will be able to craft AI tools for a range of tasks, thereby broadening the technology's accessibility and application.
For developers, it facilitates the integration of AI with other services, fostering more dynamic and practical uses in real-world scenarios.
For enterprises, custom GPTs offer a new frontier in customization, enabling the creation of AI tailored to specific corporate needs and proprietary data. Companies can now streamline operations, creating AI solutions for internal tasks like marketing, customer support, and employee onboarding while ensuring data privacy. Such integration makes many startups that raised lofty funding rounds to monetize generative AI for the enterprise almost obsolete.
For consumers, having specialized GPTs erodes the unique selling proposition of companies such as Character AI, which just last month saw almost 5M monthly active users.
Examples of affected companies
AutoGPT, Adept, Character AI, LangChain, Contextual AI, Hugging Face
Other notable announcements
DALL-E 3 API access, with pricing starting at $0.04 per image
Custom models - a new program from OpenAI that offers selected organizations the chance to collaborate with researchers to create bespoke GPT-4 models that are highly specialized to their domains, ensuring exclusive access and privacy for their proprietary data
GPT-4 fine-tuning
Function calling is now more efficient and accurate, with updates that enable the calling of multiple functions in a single message and enhancements that increase the likelihood of returning the correct function parameters.
Higher rate limits - doubling the current tokens per minute limit and allowing users to request rate increases.
JSON mode - ensuring GPT returns valid JSON outputs
A new seed parameter for consistent completions, facilitating debugging, comprehensive unit testing, and enhanced control over model behavior.
Copyright Shield - OpenAI will step in to defend and pay the costs incurred of any legal claims around copyright infringement for its users.
A new era
It is clear that the horizon for AI is not just broadening—it's being redefined.
With the advent of more efficient, cost-effective, and powerful models like GPT-4 Turbo, new text-to-speech capabilities, and an innovative GPT Store, OpenAI is empowering creators and businesses with tools that were once out of reach.
The implications are profound: barriers to entry are crumbling, enabling a democratization of technology that accelerates innovation at a breakneck pace. The sheer volume of possibilities for personalization, integration, and expansion in AI applications is staggering.
As developers, entrepreneurs, and technologists harness these breakthroughs, the next wave of AI utility and business models is upon us. To all those at the forefront of this change: the future is not just knocking, it has already stepped through the door.
cool ..., OpenAI just dropped the AI mic with these announcements! ,I feel like I've just witnessed the Avengers assembling, but in the tech universe. Let's break down this buffet of innovation:
GPT-4 Turbo with a 128k context length - It's like giving your AI superhero a turbo boost, making it cheaper, faster, and with a memory that can rival an elephant's. Who needs a sidekick when you have a context window longer than my weekend to-do list?
Substantial price reductions - Cheaper AI? That's music to every developer's ears! It's like having a Black Friday sale, but for coding. Sorry, budget constraints, but OpenAI just threw you out the window!
New text-to-speech model and API - OpenAI is now giving voices to the voiceless... well, the textless? And at a fraction of the cost compared to ElevenLabs? It's like getting a high-quality concert ticket for the price of a microwave burrito.
Whisper v3 - Automatic Speech Recognition on steroids! It's like upgrading from a walkie-talkie to a satellite phone. Goodbye, language barriers, hello global communication domination!
Assistants API and Retrieval - No more fancy RAG pipelines, just upload files and let the AI magic happen. It's like having a personal assistant that doesn't ask for coffee breaks or a raise. Take that, LangChain!
GPTs and the GPT Store - The birth of ChatGPT's customizable siblings, and they're monetizable? It's like the App Store, but for your personalized AI sidekicks. Move over, Siri, there's a new GPT in town!
Other announcements - DALL-E 3 API, custom models, GPT-4 fine-tuning, higher rate limits, JSON mode, a seed parameter, and a Copyright Shield! It's like OpenAI just opened Pandora's box of awesomeness. Can I get a "Hallelujah" for the Copyright Shield?
In the grand scheme of things, OpenAI isn't just knocking on the door of a new era; it's kicked it wide open! The future of AI isn't coming—it's already here, and it's wearing a cape.
Thanks for the summary! It's easy elsewhere to get distracted by lot's of marketing hype