<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI Tidbits: Sahar's 2¢]]></title><description><![CDATA[Editorial takes, insights, and tools to stay ahead on the latest in AI]]></description><link>https://www.aitidbits.ai/s/deep-dives</link><image><url>https://substackcdn.com/image/fetch/$s_!-amS!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png</url><title>AI Tidbits: Sahar&apos;s 2¢</title><link>https://www.aitidbits.ai/s/deep-dives</link></image><generator>Substack</generator><lastBuildDate>Wed, 20 May 2026 21:42:37 GMT</lastBuildDate><atom:link href="https://www.aitidbits.ai/feed" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><webMaster><![CDATA[aitidbits@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aitidbits@substack.com]]></itunes:email><itunes:name><![CDATA[Sahar Mor]]></itunes:name></itunes:owner><itunes:author><![CDATA[Sahar Mor]]></itunes:author><googleplay:owner><![CDATA[aitidbits@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aitidbits@substack.com]]></googleplay:email><googleplay:author><![CDATA[Sahar Mor]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Google I/O '25 - Research to reality]]></title><description><![CDATA[How Google is finally taking the lead on AI]]></description><link>https://www.aitidbits.ai/p/google-io-25</link><guid isPermaLink="false">https://www.aitidbits.ai/p/google-io-25</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Fri, 23 May 2025 14:31:13 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!X7GG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This post is part of my 2&#162; series - my raw thoughts about recent topics in AI. Not always practical thoughts, but always thought-provoking. Some of my previous ones covered the <a href="https://www.aitidbits.ai/p/when-machines-learn-to-speak">new wave of conversational AI</a>, <a href="https://www.aitidbits.ai/p/economies-of-scale-gen-ai">economies of scale for foundation AI models</a>, and the <a href="https://www.aitidbits.ai/p/the-great-ai-consolidation">consolidation in the AI space</a>.</em></p><p><em>This post captures my takeaways from attending Google&#8217;s flagship event, I/O 2025. It&#8217;s not a comprehensive announcement round-up. Instead, I&#8217;ve focused on the launches that matter most to anyone building or working with AI. I also share my perspective on what these moves mean for the broader AI ecosystem, and for founders, developers, and researchers alike.</em></p><div><hr></div><p>A NotebookLM-powered podcast episode discussing this post:</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;a5491c71-b429-4d3c-9fc1-36b0eb70e9b7&quot;,&quot;duration&quot;:1007.88245,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><div><hr></div><p>Since 2018, when Google unveiled the groundbreaking <a href="https://www.youtube.com/watch?v=D5VN56jQMWM">Duplex demo</a> at its biggest event of the year, Google I/O, I've been captivated by the company's AI advancements. For me, it was the first truly practical, consumer-facing use of AI&#8212;a clear example of how AI could take over routine tasks like booking appointments. 
But more importantly, it marked a key step toward a future where AI helps people express themselves in ways that were previously out of reach.</p><p>In recent years, the AI community has often viewed Google as trailing behind leaders like OpenAI and Anthropic. However, this year's Google I/O conference felt different&#8212;everything finally clicked. Google moved from research to reality, capitalizing on its massive distribution channels and deep technological prowess: state-of-the-art technology combined with access to real-world usage through Search, Google Workspace (Gmail, Sheets, Docs, etc.), and Android (smart TVs, glasses, phones).</p><p>And it wasn&#8217;t only me. The same sentiment echoed across the press tent at I/O last Tuesday, capturing an energy reminiscent of OpenAI&#8217;s <a href="https://www.aitidbits.ai/p/openai-devday">inaugural DevDay</a>.</p><p>The winning combination, as defined by Google at this week&#8217;s I/O, manifests across three principles:</p><ul><li><p>Powerful - deploying best-in-class models to support real-time, reliable experiences</p></li><li><p>Personal - tailoring AI to understand and cater to individual user preferences and needs</p></li><li><p>Proactive - developing AI that anticipates user needs and acts accordingly without being too intrusive or eager</p></li></ul><p>Out of these three, the one I found the most promising is <em>Personal</em>.</p><p>Google's unparalleled access to user data gives it a powerful edge over competitors like OpenAI and Apple. It understands my interests through the searches I make (Search), the places I go (Maps), the music I listen to (YouTube), my payment habits (Google Pay), and even my work life (Gmail, Calendar, Docs). 
This breadth of insight uniquely positions Google to deliver truly personalized AI experiences.</p><p>Google didn&#8217;t just launch new products at I/O; it made deliberate moves into markets long held by OpenAI, Meta, Perplexity, Anthropic, and even Shopify and Stripe. Each announcement, from Jules to Gemini Live, stepped directly into competitive territory. If you&#8217;re working on dev tools, agent platforms, creative apps, e-commerce flows, or voice interfaces, these updates are worth your attention. I&#8217;ve included a breakdown of the most directly affected companies and industries at the end of this post&#8212;worth reviewing if you want to stay ahead of what&#8217;s coming.</p><p>The real story isn&#8217;t about how many features Google shipped, though. It&#8217;s about the strategy taking shape. Google is doubling down on vertical integration and deeply contextual AI. That&#8217;s the new game. In Ben Thompson&#8217;s (Stratechery) terms, it&#8217;s Aggregation Theory with agency. Google owns the user interface, the distribution (Android, Chrome, Search), and now, increasingly, the intelligence layer.</p><p>In this post, I'll outline a selected subset of announcements I found most promising and share my <em>2&#162;</em> on why this event marks a turning point in AI's evolution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X7GG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X7GG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png 424w, 
https://substackcdn.com/image/fetch/$s_!X7GG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png 848w, https://substackcdn.com/image/fetch/$s_!X7GG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png 1272w, https://substackcdn.com/image/fetch/$s_!X7GG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X7GG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png" width="1456" height="815" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:815,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1299858,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.aitidbits.ai/i/164053881?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X7GG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png 
424w, https://substackcdn.com/image/fetch/$s_!X7GG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png 848w, https://substackcdn.com/image/fetch/$s_!X7GG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png 1272w, https://substackcdn.com/image/fetch/$s_!X7GG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F956c0295-b4fc-40d0-b6fb-26b04a4ec154_1718x962.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2><strong>Google AI Studio, Jules, and Stitch</strong></h2><p>One of the most significant announcements at Google I/O was the upgraded Google AI Studio, with a whole new Build section&#8212;an integrated development environment explicitly designed for building AI-driven applications.</p><p>Positioned directly against tools like Cursor, Windsurf, Lovable, and Bolt, Google <strong>AI Studio</strong> unifies Google's flagship multimodal Gemini models into one streamlined interface. Developers can now build apps using natural language and deploy them to Google Cloud with a single click, reinforcing Google's strategic advantage through infrastructure integration.</p><p><strong>Jules</strong>, a particularly intriguing release, is Google's take on the autonomous coding agent, similar to Devin and Factory. Quietly entering public beta at <a href="https://jules.google/">jules.google</a>, Jules represents Google's ambitions to dominate the software development lifecycle: from writing documentation and deploying applications to autonomously submitting pull requests. Though overshadowed by flashier announcements, Jules may well emerge as a sleeper hit among developers seeking highly efficient, AI-augmented development workflows.</p><p><strong><a href="http://labs.google/stitch">Stitch</a></strong>, another groundbreaking tool revealed at I/O, could radically simplify UI design processes. 
Through natural language prompts, designers can describe interfaces, which Stitch then generates and exports directly into Figma.</p><p>Together, Google AI Studio, Jules, and Stitch exemplify Google's strategy of leveraging its state-of-the-art models and infrastructure to deliver highly integrated, practical, and transformative tools for developers and designers alike.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tL5C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tL5C!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif 424w, https://substackcdn.com/image/fetch/$s_!tL5C!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif 848w, https://substackcdn.com/image/fetch/$s_!tL5C!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif 1272w, https://substackcdn.com/image/fetch/$s_!tL5C!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tL5C!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif" width="800" height="450" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;temp.mov [optimize output image]&quot;,&quot;title&quot;:&quot;temp.mov [optimize output image]&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="temp.mov [optimize output image]" title="temp.mov [optimize output image]" srcset="https://substackcdn.com/image/fetch/$s_!tL5C!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif 424w, https://substackcdn.com/image/fetch/$s_!tL5C!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif 848w, https://substackcdn.com/image/fetch/$s_!tL5C!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif 1272w, https://substackcdn.com/image/fetch/$s_!tL5C!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a842c6-6de4-4917-a7c5-7025551b03a7_800x450.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Jules, Google&#8217;s new coding agent, in action</figcaption></figure></div><h2><strong>Powerful models</strong></h2><p>Gemini 2.5 took center stage at I/O, topping nearly every major AI benchmark, from coding and web development to complex reasoning and video understanding. 
Compared to leading commercial models, it stands out with a January 2025 knowledge cutoff and a 1 million-token context window, and it operates at around a quarter of the cost of OpenAI&#8217;s GPT-4o.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Jcpq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jcpq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png 424w, https://substackcdn.com/image/fetch/$s_!Jcpq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png 848w, https://substackcdn.com/image/fetch/$s_!Jcpq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png 1272w, https://substackcdn.com/image/fetch/$s_!Jcpq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Jcpq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png" width="1456" height="654" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:654,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:256077,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.aitidbits.ai/i/164053881?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Jcpq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png 424w, https://substackcdn.com/image/fetch/$s_!Jcpq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png 848w, https://substackcdn.com/image/fetch/$s_!Jcpq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png 1272w, https://substackcdn.com/image/fetch/$s_!Jcpq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ecf1fb-d1ab-4d0a-a09f-cb0ceb0155b4_2430x1092.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini 2.5 tops the leaderboard for <a href="https://web.lmarena.ai/leaderboard">web coding tasks</a></figcaption></figure></div><p>Key improvements include:</p><ul><li><p><strong>Deep Think</strong> - an advanced reasoning capability, achieving state-of-the-art results in complex mathematical and programming tasks in exchange for increased cost and latency.</p></li><li><p><strong>Enhanced function calling and Structured Outputs</strong> - until now, the real-time Gemini models haven&#8217;t been usable for anyone needing function calling or structured output. Now, that gap is finally closed.</p></li><li><p><strong>Gemini Diffusion</strong> - Google unveiled Gemini Diffusion, which generates text 5x faster than the leading Flash Lite model. 
This advancement is powered by recent research utilizing diffusion models for text generation, marking a significant leap forward in efficiency and responsiveness.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iYIK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iYIK!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif 424w, https://substackcdn.com/image/fetch/$s_!iYIK!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif 848w, https://substackcdn.com/image/fetch/$s_!iYIK!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif 1272w, https://substackcdn.com/image/fetch/$s_!iYIK!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iYIK!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif" width="800" height="450" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;temp.mov [optimize output image]&quot;,&quot;title&quot;:&quot;temp.mov [optimize output image]&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="temp.mov [optimize output image]" title="temp.mov [optimize output image]" srcset="https://substackcdn.com/image/fetch/$s_!iYIK!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif 424w, https://substackcdn.com/image/fetch/$s_!iYIK!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif 848w, https://substackcdn.com/image/fetch/$s_!iYIK!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif 1272w, https://substackcdn.com/image/fetch/$s_!iYIK!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini Diffusion starts with text that is "pure noise", then gradually transforms the random input into coherent and contextually accurate natural language aligned with the desired prompt</figcaption></figure></div><h2><strong>Search &amp; AI Mode</strong></h2><p>Google has been experimenting with a new way to search over the last few weeks, dubbed &#8220;AI Mode&#8221;. This new mode became generally available in the US last Tuesday. Powered by the Gemini 2.5 Pro model, AI Mode allows users to engage in multi-turn dialogues, enabling more complex and nuanced information retrieval.</p><p>Highlights from the new search experience:</p><ul><li><p><strong>Personal Context</strong> - for an even more customized experience, AI Mode will offer personalized suggestions based on your connected Google apps, starting with Gmail, to bring in more of your personal context. 
For example, if you&#8217;re searching for &#8220;things to do in Nashville this weekend with friends, we're big foodies who like music&#8221; ahead of an upcoming trip, AI Mode can show you restaurants with outdoor seating based on your past restaurant bookings and searches.</p></li><li><p><strong>Agentic Checkout - </strong>streamlining the purchasing process by allowing users to complete transactions directly within Search, bypassing the need to navigate to third-party websites. For example, when searching for concert tickets, AI Mode will find the best options and facilitate the purchase through Google Pay, all within the same interface. This seamless integration has the potential to disrupt traditional e-commerce models and reshape how users interact with online marketplaces. <em>I wrote <a href="https://www.aitidbits.ai/p/agent-responsive-design">a whole series</a> on the new agentic internet!</em></p></li><li><p><strong>Try It On - </strong>enhancing the virtual shopping experience, Google's "Try It On" feature utilizes Google&#8217;s strong image generation diffusion models to allow users to visualize clothing items on themselves. Users can upload their picture using Google Photos and see how different garments would look on their own bodies. Google&#8217;s generative AI capabilities meet distribution (Google Photos).</p></li><li><p><strong>Deep Search - </strong>by synthesizing information from multiple sources, AI Mode can provide comprehensive answers to multifaceted questions, making Google Search relevant again in the face of competing tools such as OpenAI&#8217;s and Perplexity&#8217;s Deep Research.</p></li></ul><h2><strong>The revenue gamble</strong></h2><p>While Google's AI Mode represents a significant leap forward in search capabilities, it also reveals a fundamental tension at the heart of the company's strategy. 
Google is essentially betting against its own golden goose: the advertising-driven search model that has generated more than half of its revenue for over two decades.</p><p>The math is straightforward but concerning: if AI Mode provides comprehensive answers directly within search results, users will click through to fewer websites. Independent studies already suggest this trend with AI Overviews, and AI Mode's conversational interface offers even fewer opportunities for traditional paid link placements. Google's executives at I/O spoke confidently about the technical capabilities of their new search experience, but when it came to discussing how this translates into sustainable revenue streams, the answers were notably vague.</p><p>This isn't just a minor product pivot; it's a fundamental reimagining of how Google makes money. The company appears to be racing toward a future where AI assistants and conversational interfaces replace link-based search. While there are certainly ways to imagine business models around personalized AI assistants and agentic workflows, Google hasn't articulated what those might look like or how they'll replace the massive cash flows from traditional search advertising.</p><h2><strong>Project Mariner</strong></h2><p>Project Mariner is Google's step toward giving AI true agency across your devices. It&#8217;s their answer to OpenAI's Operator and Anthropic's Computer Use: an infrastructure-level system for teaching AI to interact with your digital environment just like a human would.</p><p>At its core, Mariner is about <em>"teach and repeat"</em>. 
Show Gemini how to perform a task (filling out a form, generating a weekly status report, uploading data to a dashboard) and it can replicate that workflow again and again.</p><p>Mariner will be released as part of the Gemini API later this summer, which means developers can build agents that don&#8217;t just reason and plan, but <em>act</em>: navigating apps, automating browser actions, and manipulating on-screen interfaces.</p><p>Whether it&#8217;s booking a flight, copying events into a spreadsheet, or handling repetitive workflows across company tools, Mariner helps AI move beyond suggestions and into action.</p><h2><strong>Gemini app and Gemini Live</strong></h2><p>With the new Gemini app and its Live feature, Google is officially entering the race for the &#8220;everything AI assistant&#8221;, a direct challenger to ChatGPT, Meta AI, and Apple Intelligence.</p><p>The Gemini app is no longer just a chatbot. It&#8217;s a real-time, context-aware assistant that lives across your devices and ties directly into Google&#8217;s ecosystem: Gmail, Calendar, Keep, Docs, Maps, and even YouTube. Thanks to its tight OS-level integration (powered by Project Mariner), Gemini can also take actions on your phone.</p><p>But what really sets Gemini apart isn&#8217;t just input; it&#8217;s output:</p><ul><li><p><strong>Search Live and Project Astra - </strong>building on the capabilities of AI Mode, Google introduced Search Live, a feature that combines real-time camera input with search functionality. Users can point their device's camera at an object or scene and receive immediate information (similar to OpenAI&#8217;s Advanced Voice Mode), effectively turning their environment into an interactive search field. 
This feature is powered by Project Astra, Google's multimodal AI assistant that integrates visual and auditory data to provide contextually relevant responses.</p></li><li><p><strong>Canvas</strong> is Google&#8217;s answer to tools like OpenAI&#8217;s Canvas and Anthropic&#8217;s Artifacts. Ask Gemini to summarize an article and it will build an interactive webpage, infographic, quiz, or even a lightweight app.</p></li><li><p><strong>Deep Research</strong> now supports uploaded personal files and connects directly to your Drive and Gmail, synthesizing them into study guides, plans, or insights with context-rich reasoning grounded in your data.</p></li><li><p><strong>Agent Mode</strong> enables task automation across Gmail, Calendar, and partner services like Zillow. Unlike a basic plugin system, this builds on Mariner&#8217;s deeper Android-level control and Google's new MCP support, enabling multi-step reasoning and actions.</p></li><li><p><strong>Quiz and Video Generation</strong> taps into Veo (text2video) and Lyria (music generation model), turning documents into test prep material and short videos.</p></li></ul><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;2e467f75-2166-4e7a-b9ad-419556273e7f&quot;,&quot;duration&quot;:null}"></div><h2><strong>Generative models for creatives</strong></h2><p>Google&#8217;s generative media stack is finally starting to feel competitive.</p><ul><li><p><strong>Veo 3</strong> is their new text-to-video model - high-quality, photorealistic footage, now with native audio generation. Think Pika or Runway, but with better motion, longer clips, and built-in sound.</p></li><li><p><strong>Imagen 4</strong>&nbsp;brings sharper details and better text rendering, and is now integrated into Gemini.</p></li><li><p><strong>Lyria 2</strong> is Google&#8217;s music generation model. 
Based on the demo, Lyria is still in its infancy and far from the quality of Suno and Udio.</p></li><li><p><strong>Flow</strong> is a new AI-powered video editor. Type a prompt, get an 8-second clip. Stitch clips together, tweak scenes with natural language. It&#8217;s Google&#8217;s answer to creative environments like Adobe Premiere, but for AI-native workflows.</p></li></ul><p>Taken together, this is Google&#8217;s most serious push yet into generative video, music, and imagery, accessible via Google AI Studio and the Gemini API.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;0d8b4881-a4de-4e28-87de-13c3d59c4c44&quot;,&quot;duration&quot;:null}"></div><h2><strong>Google AI Glasses</strong></h2><p>Twelve years after the original Google Glass flop, Google&#8217;s trying again, and this time, it looks promising.</p><p>Google unveiled a new pair of smart glasses powered by Android XR and deeply integrated with the Gemini model family. They come equipped with microphones, speakers, a camera, and an in-lens display, offering a level of interactivity that goes beyond Meta&#8217;s Ray-Ban glasses, which don&#8217;t have a display. Google is going a step further: your real world now comes with real-time captions, directions, translations, and a personal assistant whispering relevant information.</p><p>And that&#8217;s the key difference: Google has the phone and app distribution. Meta and OpenAI (with its ChatGPT consumer app) do not. That means Google can natively integrate with Gmail, Calendar, Maps, Docs, Translate, and YouTube&#8212;capabilities that come pre-installed on Android and are used by billions. Need to translate a live conversation? Snap a photo and auto-organize it? Navigate to a meeting while rescheduling the next one? 
All of that is now on your face.</p><p>To get there, Google partners with Gentle Monster and Warby Parker for manufacturing, echoing the Meta + Ray-Ban strategy.</p><p>If you&#8217;re thinking this sounds like something Ben Thompson would write a thousand-word piece about, you're not wrong. This is exactly the kind of vertical integration that makes Apple and others sweat: powerful native models, fused with real-time inputs (voice, vision), and paired with a ubiquitous OS.</p><p>The world was not ready for wearable AI in 2013. But in 2025, with AI-native operating systems and mainstream model adoption, and after Meta has proven market traction, Google may have found the perfect moment for a comeback.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;4603c5fd-9e9b-4738-9c2e-440022295229&quot;,&quot;duration&quot;:null}"></div><h2><br>Industry impact</h2><p>So what does all this mean if you're not Google? Below is a breakdown of the major announcements from I/O and the companies most likely to feel the heat.</p>
      <p>
          <a href="https://www.aitidbits.ai/p/google-io-25">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[When machines learn to speak]]></title><description><![CDATA[One API call from human-like AI conversation: the profound shift from typing to talking and what it means for human interaction]]></description><link>https://www.aitidbits.ai/p/when-machines-learn-to-speak</link><guid isPermaLink="false">https://www.aitidbits.ai/p/when-machines-learn-to-speak</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Sun, 30 Mar 2025 15:01:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a1afcdf8-4b1e-46a7-831a-c6e57fa9f24f_800x450.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This post is part of my 2&#162; series - my raw thoughts about recent topics in AI. Not always practical thoughts, but always thought-provoking. Some of my previous ones covered the <a href="https://www.aitidbits.ai/p/economies-of-scale-gen-ai">economies of scale for foundation AI models</a>, <a href="https://www.aitidbits.ai/p/the-great-ai-consolidation">consolidation in the AI space</a>, and <a href="https://www.aitidbits.ai/p/the-rise-of-autonomous-agents">autonomous agents</a>.</em></p><p><em>This post is about the unprecedented shift happening in voice AI interfaces and what it means for human interaction. As these new capabilities become accessible through simple APIs, a massive opportunity is emerging for founders to build products that reimagine how we communicate with technology and each other.</em></p><div><hr></div><p>A NotebookLM-powered podcast episode discussing this post:</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;ea4da737-7bce-4579-98df-b6214c8cab6f&quot;,&quot;duration&quot;:610.6906,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><div><hr></div><p>June 2025. Sarah paces in her living room, rehearsing an important client presentation. 
Her AI companion listens intently, chiming in when relevant to offer real-time feedback on her delivery and content. "I think you rushed through the ROI section," it suggests in a warm, natural voice. "Let's try that part again, but this time&#8212;" Sarah cuts in mid-sentence, "Actually, can we focus on the opening first? And don't be so nitpicky!" The AI smoothly adjusts, without awkward pauses or robotic transitions. What was once a frustrating experience of rigid, unnatural interactions with voice assistants has evolved into fluid, human-like conversation.</p><p>I've spent considerable time lately thinking about and building in the voice AI space, and something unprecedented is emerging: for the first time in history, we have real-time, affordable, and competent artificial voice that's just one API call away. In just a few months, we've seen significant leaps forward from the likes of OpenAI&#8217;s Advanced Voice Mode (AVM) and new <a href="https://www.openai.fm/">speech models</a>, Google&#8217;s real-time conversational Gemini Flash, and Sesame&#8217;s emotionally intelligent AI<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><p>This isn't just a technical milestone&#8212;it's a fundamental shift in how we interact with technology and, potentially, with each other. 
It will create numerous new opportunities for builders while redefining the very nature of human communication.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;d8960ba2-475e-4fe4-ae6a-fc93c1e1b86d&quot;,&quot;duration&quot;:null}"></div><p>Gavin Purcell is <a href="https://www.reddit.com/r/singularity/comments/1j1yern/roleplay_with_sesames_new_voice_ai_feels_like_the/">arguing with Sesame&#8217;s realtime voice AI</a> &#128070;</p><h2><strong>The dawn of natural voice AI</strong></h2><p>Remember the last time you called your bank's automated system? The familiar dance of repeated phrases, misunderstood words, and the desperate pressing of "0" to reach a human operator. That era is ending. OpenAI's release of Advanced Voice Mode (AVM) last September marked a pivotal moment when conversing with AI began to feel genuinely human.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;e668c6d8-4d18-47ef-a59b-44e2f23a0e05&quot;,&quot;duration&quot;:null}"></div><p>This transformation stems from two key breakthroughs. First, the shift from cascading architectures (speech-to-text &#8594; text processing &#8594; text-to-speech) to direct speech-to-speech models eliminates intermediate processing stages that previously slowed conversational AI interactions. Second, latency and cost have dropped dramatically. When OpenAI initially released its Realtime API, the price ($18/hour) made it impractical for widespread adoption. But just four months later, Google's release of Gemini 2.0 Flash and OpenAI's 60% price reduction opened the floodgates for affordable and human-like voice AI applications that are one API call away.</p><p>Just last week, OpenAI unveiled its most human-like speech models yet, enabling developers to embed expressive cues like [WHISPERING] or [LAUGHING] directly into the text. 
Here's a quick demo from <a href="https://www.openai.fm/">OpenAI.fm</a>&#8212;a public tool launched alongside this release, showcasing what this new level of expressiveness sounds like in action:</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;e45f8f81-7949-4739-8d4e-b4cd7104a2ee&quot;,&quot;duration&quot;:28.029388,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><p>Builders can now launch phone assistants that <a href="http://11x.ai/">qualify sales leads</a>, <a href="https://sierra.ai/blog/sierra-speaks">resolve customer support calls</a>, <a href="https://domu-ai.com/">automate insurance sales</a>, or <a href="https://www.helpcare.ai/">screen patients before their upcoming appointments</a>. The necessary tools are already available and are just a <a href="https://www.aitidbits.ai/p/voice-agents-toolkit">single API call away</a>.</p><h2><strong>The interruption problem</strong></h2><p>However, building truly natural voice interactions isn't just about faster processing and better voice synthesis. One of the most fascinating challenges lies in handling interruptions&#8212;a fundamental aspect of human conversation that AI still struggles with.</p><p>Current voice AI systems, including the ones mentioned like OpenAI&#8217;s AVM, face several key challenges:</p><ol><li><p>Oversensitivity to background noise (I always mute myself when not speaking)</p></li><li><p>Inability to distinguish between relevant speakers and ambient conversation</p></li><li><p>Lack of visual cues that humans use to anticipate and manage interruptions</p></li></ol><p>Unlike human phone conversations, where near-zero latency and natural turn-taking make interruptions manageable, AI interactions often feel clunky when users try to interject<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. 
Interestingly, humans tend to interrupt AI more frequently and aggressively than they would other humans, posing a new challenge for voice AI developers while establishing a new interaction paradigm for human-AI conversation.</p><h2><strong>The social impact</strong></h2><p>This voice revolution raises profound questions about human interaction and relationships:</p><ul><li><p>Could the instant gratification of interruptible AI conversations and the ability to be rude without consequences degrade our patience and interpersonal skills, similar to how ubiquitous access to pornography has distorted societal expectations around intimacy?</p></li><li><p>The convenience of always-available AI consultation might reduce our reliance on human relationships. Consider how we once relied on reading maps and asking locals for directions&#8212;skills now largely abandoned as we defer to GPS. Could meaningful conversations be next?</p></li><li><p>Could we soon have more conversational exchanges with AI agents than with human companions?</p></li></ul><p>Think: Would you rather rehearse a high-stakes presentation in front of a potentially judgmental friend or instantly consult a non-judgmental AI companion available 24/7?</p><p>What does this mean for our interpersonal relationships?</p><h2><strong>Cultural nuances in AI conversation</strong></h2><p>One size doesn't fit all in human conversation, and the same is true for AI. 
OpenAI's recent update of GPT-4o to GPT-4.5 was mainly about moving away from its "corporate HR" tone, recognizing that natural conversation varies significantly across cultures and contexts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6PdD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6PdD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png 424w, https://substackcdn.com/image/fetch/$s_!6PdD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png 848w, https://substackcdn.com/image/fetch/$s_!6PdD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png 1272w, https://substackcdn.com/image/fetch/$s_!6PdD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6PdD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png" width="600" height="497.48743718592965" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:990,&quot;width&quot;:1194,&quot;resizeWidth&quot;:600,&quot;bytes&quot;:217360,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.aitidbits.ai/i/159103330?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6PdD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png 424w, https://substackcdn.com/image/fetch/$s_!6PdD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png 848w, https://substackcdn.com/image/fetch/$s_!6PdD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png 1272w, https://substackcdn.com/image/fetch/$s_!6PdD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e5ddd84-8aa7-483d-81f7-5ccc270e9b1c_1194x990.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Ex-OpenAI/Tesla, Andrej Karpathy, outlining GPT-4.5&#8217;s new personality</em></figcaption></figure></div><p>Different cultures have distinct interruption patterns, politeness norms, and conversation styles. 
Today's systems largely fail to account for these cultural differences, creating a significant opportunity for AI builders to develop models that adapt to:</p><ul><li><p>Cultural background</p></li><li><p>Individual user patterns</p></li><li><p>Contextual cues</p></li><li><p>Historical interactions</p></li></ul><p>OpenAI already possesses such context through its Memory feature, and Google, of course, knows virtually everything about us already.</p><p>I imagine the best conversational AI systems of the future will incorporate nuances that we take for granted.</p><h2><strong>Rethinking communication</strong></h2><p>The holy grail for conversational AI might be achieving the natural flow of a phone call between humans, where interruptions feel natural and turn-taking is seamless. But perhaps we need to aim higher. As AI systems gain multimodal capabilities (vision, touch, etc.), they could potentially surpass human conversation by reading subtle cues we often miss.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;f19c2f1e-d9be-44e3-b323-81d868a5d58a&quot;,&quot;duration&quot;:null}"></div><p>Figure's household robots <a href="https://x.com/Figure_robot/status/1892577871366939087">learn tasks</a> on the fly &#128070;</p><p><br>What surprises me most is how slowly Advanced Voice Mode is being adopted. Despite its impressive capabilities, many of my friends still default to typing or using Whisper (OpenAI's speech-to-text model) rather than having natural conversations with it. Perhaps this hesitation reflects our collective uncertainty about speaking naturally to machines, or simply a lack of awareness&#8212;after all, it only became available to free users <a href="https://x.com/OpenAI/status/1894495906952876101">last month</a>, and many may not yet know <a href="https://help.openai.com/en/articles/9617425-advanced-voice-mode-faq">how to use it</a>. 
Either way, it suggests we're in an awkward adolescent phase of voice AI adoption&#8212;the technology is capable, but our habits and expectations haven't quite caught up.</p><p>The voice AI revolution isn't just about making machines sound more human&#8212;it's about fundamentally changing how we think about conversation, relationships, and human interaction. While we'll certainly see a proliferation of phone AI agents and computer assistants in the short term, there's a more profound transformation taking shape beneath the surface.</p><p>As we build these systems, we need to consider not just what's technically possible, but what's socially desirable. For now, it's clear that we're entering an era where the line between human and AI conversation is increasingly blurry&#8212;for better or worse.</p><div><hr></div><p><em>To end on a lighter note, here&#8217;s a <a href="https://x.com/CodeByPoonam/status/1840436242326110618">fun video</a> of ChatGPT&#8217;s Voice Mode reimagining an alternate ending to Titanic.</em></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;a34f6346-5801-432d-9054-ae3533c7810b&quot;,&quot;duration&quot;:null}"></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Sesame just released an <a href="https://huggingface.co/sesame/csm-1b">open-sourced</a> (Apache 2.0) version of its impressive voice assistant model</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Word around San Francisco is that top AI labs are on the cusp of a breakthrough that could solve these challenges</p></div></div>]]></content:encoded></item><item><title><![CDATA[Economies of scale 
for foundational AI models]]></title><description><![CDATA[Big Tech's strategic race for defensible AI systems]]></description><link>https://www.aitidbits.ai/p/economies-of-scale-gen-ai</link><guid isPermaLink="false">https://www.aitidbits.ai/p/economies-of-scale-gen-ai</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Sun, 03 Nov 2024 16:00:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/371e38fa-c3d8-4161-ae84-2f716660c8c2_1912x1422.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Welcome to AI Tidbits Deep Dives: short posts offering a perspective on AI-related topics. Some of my previous ones covered <a href="https://www.aitidbits.ai/p/the-great-ai-consolidation">consolidation in the AI space</a>, <a href="https://www.aitidbits.ai/p/the-rise-of-autonomous-agents">autonomous agents</a>, and <a href="https://www.aitidbits.ai/p/doc-extraction-gpt4">document extraction with LLMs</a>.</em></p><div><hr></div><p>A NotebookLM-powered podcast episode discussing this post:</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;b3b4bde7-1d70-42d6-80b7-5b987b56b2eb&quot;,&quot;duration&quot;:914.57306,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><div><hr></div><p>After listening to a recent podcast featuring Andrej Karpathy<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, I've pondered the concept of economies of scale in AI systems, specifically when it comes to data acquisition. The idea of economies of scale originated in the manufacturing world, where increased production led to lower per-unit costs. In the tech world, this concept was adapted by SaaS startups, where the marginal cost of serving an additional customer approached zero. 
A prime example is Uber, which leveraged its platform to achieve massive scale: as more drivers joined, more riders were attracted, creating a virtuous cycle that dramatically reduced per-ride costs while improving service quality.</p><p>Hungry generative AI models drive major tech companies to pursue high-stakes data partnerships, exemplified by OpenAI's agreements with TIME magazine and Reddit and <a href="https://www.axios.com./2024/10/25/meta-reuters-ai-news-facebook-instagram">Meta's strategic alliance</a> with Reuters to secure premium training content.</p><p>This post explores how economies of scale in the context of data apply to AI and generative models, focusing on three key areas: software vs. hardware, humanoid robots, and large language models.</p><h3><strong>Scaling AI software vs. hardware - autonomous vehicles</strong></h3><p>Tesla and Waymo represent two distinct approaches to achieving autonomous driving capabilities. Tesla, under Elon Musk's leadership, has pursued a vision of making self-driving technology accessible to the mass market through its consumer vehicles. Their strategy revolves around deploying a large fleet of cars equipped with cameras and neural networks that learn from real-world driving data. Waymo, originally Google's self-driving car project, has taken a more cautious approach, focusing on developing a robust autonomous driving system using high-end sensors and detailed mapping technology. While both companies aim to revolutionize transportation, their contrasting strategies highlight fundamental differences in how AI systems can be scaled.</p><p>Software-driven AI solutions, such as Tesla's AI systems, can scale more efficiently than hardware-based systems like those deployed by Google&#8217;s Waymo.</p><p>Waymo started deploying its autonomous fleet using Jaguar cars, featuring expensive hardware such as LiDAR, radar, and high-precision GPS to capture and interpret real-time data. 
Penetrating a market with a high-end offering is a familiar strategy: Uber started with Uber Black before offering the more affordable option, Uber X. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3VXv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3VXv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3VXv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3VXv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3VXv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3VXv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg" width="595" height="595" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:595,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Waymo's Self-Driving Jaguars Arrive With New, Homegrown Tech | WIRED&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Waymo's Self-Driving Jaguars Arrive With New, Homegrown Tech | WIRED" title="Waymo's Self-Driving Jaguars Arrive With New, Homegrown Tech | WIRED" srcset="https://substackcdn.com/image/fetch/$s_!3VXv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3VXv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3VXv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3VXv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30c22851-4988-4cd2-a68b-946c15299f34_1600x1600.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Waymo&#8217;s fifth-generation system used in the Jaguar I-PACE includes 29 cameras and 6 radar sensors</figcaption></figure></div><p>Tesla is less hardware-dependent and mostly relies on cameras, which are substantially cheaper: a fully deployed Waymo costs <a href="https://blog.dshr.org/2023/11/robotaxi-economics.html">$200k</a> compared to Tesla's Model 3 starting price of ~$39k.</p><p>Software has a distribution advantage&#8212;once built, it can be deployed and iterated across millions of devices, i.e. vehicles, with minimal additional cost. 
In contrast, hardware solutions are constrained by physical components such as sensors and processors, and by the maintenance they require, all of which scale more slowly and are harder to replicate.</p><p>In this case, Waymo is limited by its deployed hardware because its models are tightly coupled to the data its specific sensors produce.</p><p>Also, unlike Tesla, Waymo's pace of data acquisition is a function of the number of rides it provides, i.e., its utilization rate. This is why partnership deals like the one with Uber make sense.</p><p>While many view Waymo&#8217;s partnership with Uber as primarily a commercial move to increase revenue, the real strategic value lies in the diverse data it gathers from the wide geographic spread of Uber&#8217;s drivers and riders, both across the U.S. and, potentially, globally. The Uber partnership could also serve as a stepping stone for Waymo to expand into broader delivery services. Imagine collaborations with major retailers like Walmart or Target, or fast-food giants such as McDonald&#8217;s. Such deals would directly challenge Uber and logistics providers like UPS in the last-mile delivery race while generating even more data to improve Waymo&#8217;s underlying self-driving technology.</p><p>But beating Tesla on data quantity isn&#8217;t a simple feat. Each day, Tesla owners drive <a href="https://www.roadtoautonomy.com/tesla-data-advantage/#:~:text=100%2C000%20miles%20per%20minute">137 million miles</a>, generating and sending data that includes human overrides, video from external cameras, location and trip logs, and, in some cases, footage from in-cabin cameras: an endless stream of real-world, human-labeled data. 
Recognizing the value of detailed data, Tesla made it extremely easy for drivers to contribute richer feedback by allowing them to provide voice input immediately after disengaging the autopilot system.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ypH0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ypH0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ypH0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ypH0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ypH0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ypH0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg" width="647" height="364.1584699453552" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:412,&quot;width&quot;:732,&quot;resizeWidth&quot;:647,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;2022.45.12 (FSD 11.3.3) Official Tesla Release Notes - Software Updates&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="2022.45.12 (FSD 11.3.3) Official Tesla Release Notes - Software Updates" title="2022.45.12 (FSD 11.3.3) Official Tesla Release Notes - Software Updates" srcset="https://substackcdn.com/image/fetch/$s_!ypH0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ypH0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ypH0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ypH0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba03fe3d-a41b-4f0e-96d8-c951f3495872_732x412.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Tesla prompts human drivers to provide context when they disengage the autopilot system</figcaption></figure></div><p>It's not only about the quantity. Tesla benefits from the diversity of data its drivers generate&#8212;different driving styles, varied terrains, and ever-changing weather conditions. </p><p>More data * More diverse data == faster iteration and deployment of better AI models.</p><p>So, even though Waymo is objectively ahead with real-world autonomous rides across California, Arizona, and Texas, Tesla could catch up quickly with a software update across its vehicle fleet. 
Waymo, on the other hand, would likely require a change in its sensor hardware, leading to scale issues.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!82jL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!82jL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg 424w, https://substackcdn.com/image/fetch/$s_!82jL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg 848w, https://substackcdn.com/image/fetch/$s_!82jL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!82jL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!82jL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg" width="702" height="394.875" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:702,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Waymo announced their sixth generation self-driving car.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Waymo announced their sixth generation self-driving car." title="Waymo announced their sixth generation self-driving car." srcset="https://substackcdn.com/image/fetch/$s_!82jL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg 424w, https://substackcdn.com/image/fetch/$s_!82jL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg 848w, https://substackcdn.com/image/fetch/$s_!82jL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!82jL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc5e3f2-dbd6-4bc5-b8dd-924f41872a67_1858x1045.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">With the integration of new hardware and software, Waymo is expected to handle a broader range of weather conditions while reducing the need for expensive cameras and sensors</figcaption></figure></div><h3><strong>Humanoid robots and teleoperations - Better Labeled Data</strong></h3><p>Another key insight involves how humanoid robots tackle real-world environments.</p><p>I once thought the primary value of humanoid robots was their relatable, user-friendly design. Picture a square robot on wheels navigating your home versus a humanoid: it&#8217;s clear why human-like robots feel more intuitive and approachable.</p><p>But human-like robots play another significant role: they let humans operate them remotely, a practice known as teleoperation. For example, in a manufacturing setting, a skilled technician can wear motion-tracking equipment to guide a humanoid robot through complex assembly tasks, like connecting delicate electronic components or threading wires through tight spaces. The robot mirrors the technician's precise hand movements and finger positions in real time.</p><p>This approach is crucial for gathering high-quality labeled data in real-world conditions. Through teleoperation, companies like Figure can collect diverse, precisely labeled data that mirrors human decision-making in complex environments. Such data is critical for training AI systems to perform effectively in real-world scenarios.</p><p>As Karpathy notes in the podcast, there's a significant transfer from automotive AI to humanoid robotics. 
Tesla's Optimus robot initially used the same computer and cameras as Tesla cars, showcasing how foundational AI systems can be adapted across different applications. This cross-pollination of technology and data between automotive and humanoid robotics accelerates development and scaling in both domains.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IZxo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IZxo!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif 424w, https://substackcdn.com/image/fetch/$s_!IZxo!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif 848w, https://substackcdn.com/image/fetch/$s_!IZxo!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif 1272w, https://substackcdn.com/image/fetch/$s_!IZxo!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IZxo!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif" width="622" height="350.3933333333333" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:338,&quot;width&quot;:600,&quot;resizeWidth&quot;:622,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;temp.mov [optimize output image]&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="temp.mov [optimize output image]" title="temp.mov [optimize output image]" srcset="https://substackcdn.com/image/fetch/$s_!IZxo!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif 424w, https://substackcdn.com/image/fetch/$s_!IZxo!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif 848w, https://substackcdn.com/image/fetch/$s_!IZxo!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif 1272w, https://substackcdn.com/image/fetch/$s_!IZxo!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c26a37e-faa2-431b-9461-9c2eef61bc20_600x338.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The Figure 01 robot is a humanoid robot powered by OpenAI</figcaption></figure></div><h3><strong>Economies of scale for LLMs</strong></h3><p>Language models also benefit from scale:</p><ol><li><p>Inference becomes cheaper because server utilization is more predictable at scale, which allows tailored hardware and software optimization.</p></li><li><p>Broader distribution means more users, who then generate more data. Data is the <a href="https://web.archive.org/web/20240406095041/https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html">#1 blocker for generative AI companies</a>, making scale crucial for continued improvement.</p></li></ol><p>Every user interaction with ChatGPT's feedback system, from rating responses to choosing between alternatives, becomes valuable training data. 
For example, when ChatGPT users click the thumbs-down icon or select their preferred generation out of two options, this feedback is stored in an internal database. Later, it can be used to evaluate future models or to apply Reinforcement Learning from Human Feedback (RLHF), helping to better align the model for ChatGPT users or, even better, to personalize responses based on individual user preferences and interaction patterns.</p><p>Theoretically, OpenAI and other model providers can go as far as clustering users according to demographics like age and political views to better align ChatGPT&#8217;s responses. When a model is better aligned with its users, it becomes more engaging, which increases both usage and data generation and fuels the improvement cycle in a positive feedback loop.<br></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;40a9734e-7a73-4666-81dc-b48913146a70&quot;,&quot;duration&quot;:null}"></div><p><em>OpenAI started collecting more detailed usage feedback for its new Advanced Voice Mode</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zKEM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zKEM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png 424w, https://substackcdn.com/image/fetch/$s_!zKEM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png 848w, 
https://substackcdn.com/image/fetch/$s_!zKEM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png 1272w, https://substackcdn.com/image/fetch/$s_!zKEM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zKEM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png" width="700" height="401.0301109350238" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1262,&quot;resizeWidth&quot;:700,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zKEM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png 424w, https://substackcdn.com/image/fetch/$s_!zKEM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png 848w, 
https://substackcdn.com/image/fetch/$s_!zKEM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png 1272w, https://substackcdn.com/image/fetch/$s_!zKEM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4f3386-d96c-40d7-b5cf-2b8d61f6e423_1262x723.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">ChatGPT prompts the user to choose the generation they find better, later to be used to &#8220;help make ChatGPT 
better&#8221;</figcaption></figure></div><p>Meta's launch of its AI chatbot is a prime example of this strategy. By integrating AI assistants into widely used platforms like Facebook and Instagram, Meta can collect vast amounts of real-world interaction data. Such data is critical for improving its models&#8217; performance and adaptability across diverse contexts.</p><p>This strategy extends beyond language models to image, video, and audio. By contributing to the open-source AI ecosystem with the multimodal Llama and the state-of-the-art image segmentation model <a href="https://ai.meta.com/sam2/">Segment Anything 2</a>, Meta leverages both users and AI developers to improve the same underlying technology that powers Instagram, WhatsApp, and its recent blockbuster, the <a href="https://techcrunch.com/2024/10/21/metas-smart-glasses-outsell-traditional-ray-bans-in-some-stores-even-before-ai-features-roll-out/">Ray-Ban Meta smart glasses</a>.</p><h3><strong>The Future of AI Systems</strong></h3><p>The next frontier in AI development involves:</p><ol><li><p>Balancing software scalability and hardware constraints</p></li><li><p>Building data collection devices (e.g. 
robots) that resemble the real world to benefit from transfer learning and easier human labeling</p></li><li><p>Maximizing user distribution for (a) data collection (see Meta&#8217;s example above) and (b) brand recognition </p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c0Dk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c0Dk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png 424w, https://substackcdn.com/image/fetch/$s_!c0Dk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png 848w, https://substackcdn.com/image/fetch/$s_!c0Dk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png 1272w, https://substackcdn.com/image/fetch/$s_!c0Dk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c0Dk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png" width="1456" height="557" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:557,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:129422,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c0Dk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png 424w, https://substackcdn.com/image/fetch/$s_!c0Dk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png 848w, https://substackcdn.com/image/fetch/$s_!c0Dk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png 1272w, https://substackcdn.com/image/fetch/$s_!c0Dk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8060c39-6e0d-4906-a6e6-e628c04f024c_2384x912.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">OpenAI&#8217;s ChatGPT is the most searched term, far surpassing Google&#8217;s Gemini and Anthropic&#8217;s Claude, solidifying its position as the go-to choice for companies seeking a model provider</figcaption></figure></div><p>Tech giants like Google and Microsoft are <a href="https://www.aitidbits.ai/p/the-great-ai-consolidation">already positioning themselves</a> for this AI-dominated future, recognizing the critical role of economies of scale in their race to tomorrow's AI landscape.</p><p>From autonomous vehicles to humanoid robots and LLMs, the ability to efficiently scale both software and data collection is becoming a key differentiator. 
As AI continues to evolve, companies that can effectively leverage these economies of scale will likely lead the way in innovation and market dominance.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://youtube.com/watch?v=hM_h0UA7upI">No Priors Ep. 80 | With Andrej Karpathy from OpenAI and Tesla</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[The Great AI Consolidation]]></title><description><![CDATA[Are we entering the AI consolidation phase? Thoughts on where the AI market is heading with the acquisitions of well-funded AI companies like Character AI and Inflection]]></description><link>https://www.aitidbits.ai/p/the-great-ai-consolidation</link><guid isPermaLink="false">https://www.aitidbits.ai/p/the-great-ai-consolidation</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Sun, 29 Sep 2024 15:01:08 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/35bffcff-1d4f-4670-a10e-7af96da00945_2020x1406.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>I&#8217;m excited to share a new Deep Dive after a short hiatus.</em></p><p><em>Deep Dives are short posts offering a perspective on AI-related topics. 
Some of my previous ones covered <a href="https://www.aitidbits.ai/p/the-rise-of-autonomous-agents">autonomous agents</a>, <a href="https://www.aitidbits.ai/p/doc-extraction-gpt4">document extraction with LLMs</a>, and a review of <a href="https://www.aitidbits.ai/p/2023-sota-report">2023&#8217;s state-of-the-art AI</a>.</em></p><div><hr></div><p>A NotebookLM-powered podcast episode discussing this post:</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;6c72c870-2e25-4549-aeac-fadf85c4f9fa&quot;,&quot;duration&quot;:533.969,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><div><hr></div><p>The AI industry is entering a consolidation phase as major tech giants like Microsoft and Amazon acquire or hire talent from smaller AI startups, securing exclusive rights to cutting-edge technologies. Meanwhile, key players like&nbsp;<a href="https://www.semafor.com/article/04/07/2023/stability-ai-is-on-shaky-ground-as-it-burns-through-cash">Stability AI</a>&nbsp;and Aleph Alpha, until recently Germany&#8217;s sole language model provider, are pivoting away from ambitious projects, citing commercialization challenges. The future of the remaining model providers, like Mistral and Cohere, remains a work in progress as they strive to differentiate themselves and establish their positioning in the fast-paced generative AI landscape.</p><p>But is this merely a phase in the AI industry's evolution, or are we witnessing the start of an AI oligopoly? 
What follows is a brief look at the recent acquisitions and pivots of leading AI labs, including Adept, Inflection, Character AI, and Aleph Alpha, and a humble forecast for the companies still in the race.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2CeX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2CeX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png 424w, https://substackcdn.com/image/fetch/$s_!2CeX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png 848w, https://substackcdn.com/image/fetch/$s_!2CeX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png 1272w, https://substackcdn.com/image/fetch/$s_!2CeX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2CeX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png" width="696" height="391.97802197802196" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:696,&quot;bytes&quot;:318775,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2CeX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png 424w, https://substackcdn.com/image/fetch/$s_!2CeX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png 848w, https://substackcdn.com/image/fetch/$s_!2CeX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png 1272w, https://substackcdn.com/image/fetch/$s_!2CeX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16339ab1-69aa-401f-83d9-b9365568331e_1892x1066.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The rise of Big Tech acquisitions</h3><p>AI startups have reached unprecedented valuations and garnered massive customer bases. Character AI, a company that develops advanced conversational AI models allowing users to interact with customizable AI characters like Elon Musk and a couples therapist, <a href="https://techcrunch.com/2023/09/11/ai-app-character-ai-is-catching-up-to-chatgpt-in-the-u-s/">had over 5 million</a> monthly active users a year ago and was ranked as the <a href="https://a16z.com/100-gen-ai-apps/">3rd top generative AI web app</a> by a16z. 
Founded by Noam Shazeer and Daniel De Freitas, two highly respected AI researchers from Google, the company recently reached a valuation of $1 billion.</p><p>Last August, Google acquired its founders and most of its research team for <a href="https://futurism.com/the-byte/google-paid-billion-single-ai-researcher-back">$2.7B</a>.</p><p>Adept, a company building AI-powered agents to automate digital tasks, was well-positioned to capitalize on the recent autonomous agents trend. Its novel ACT-1 model, which debuted in September 2022, was celebrated in the AI community long before the concept of AI agents became popular. Adept raised $415 million and was valued at ~$1 billion.</p><p>Last June, Amazon acquired its founders to join its growing AI division, reporting to Rohit Prasad, the former Alexa head who&#8217;s leading a new AGI team focused on building LLMs.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;1519055c-1dc0-4888-8468-8be2fec992c8&quot;,&quot;duration&quot;:null}"></div><div class="pullquote"><p>Using Adept&#8217;s <a href="https://www.adept.ai/blog/act-1">Action Transformer</a> model to turn natural language into actions</p></div><p>Microsoft has also joined the wave of AI consolidation, paying a staggering $650 million to AI startup Inflection for a non-exclusive licensing deal and, crucially, to hire the bulk of its top talent, including its co-founders.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w7r-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!w7r-!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif 424w, https://substackcdn.com/image/fetch/$s_!w7r-!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif 848w, https://substackcdn.com/image/fetch/$s_!w7r-!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif 1272w, https://substackcdn.com/image/fetch/$s_!w7r-!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w7r-!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif" width="384" height="480" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:720,&quot;resizeWidth&quot;:384,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Snapinsta.app_video_AQPRsvuxdtB47BsL3oKpxHMclFIcmmE8IV7bgwHEywp80_g1aMCJP6uGciEdXIRRRDuyRONOpr7xq1ShIYIJGDA4.mp4 [optimize output image]&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="Snapinsta.app_video_AQPRsvuxdtB47BsL3oKpxHMclFIcmmE8IV7bgwHEywp80_g1aMCJP6uGciEdXIRRRDuyRONOpr7xq1ShIYIJGDA4.mp4 [optimize output image]" title="Snapinsta.app_video_AQPRsvuxdtB47BsL3oKpxHMclFIcmmE8IV7bgwHEywp80_g1aMCJP6uGciEdXIRRRDuyRONOpr7xq1ShIYIJGDA4.mp4 [optimize output image]" srcset="https://substackcdn.com/image/fetch/$s_!w7r-!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif 424w, https://substackcdn.com/image/fetch/$s_!w7r-!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif 848w, https://substackcdn.com/image/fetch/$s_!w7r-!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif 1272w, https://substackcdn.com/image/fetch/$s_!w7r-!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6351b4d4-0d2c-4986-b29d-32f8c4081744_720x900.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 
13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Inflection&#8217;s Pi chatbot was considered a more empathetic conversational AI than ChatGPT and Claude</figcaption></figure></div><p>What makes these acquisitions striking is that they often don&#8217;t involve outright purchases of entire companies. Instead, Big Tech firms opt for reverse &#8220;acquihires,&#8221; focusing on poaching talent and securing non-exclusive licensing deals, thereby sidestepping antitrust scrutiny. This tactic allows tech giants to assimilate the best minds and technology without raising the ire of regulators.</p><p>This all makes sense in an era when AI researchers are hard to come by, and Google&#8217;s co-founder <a href="https://www.yahoo.com/tech/sergey-brin-personally-called-google-114836472.html">calls researchers</a> to prevent them from leaving for competing AI labs.</p><h3>Big AI dreams, bigger bills</h3><p>Building world-class AI models is becoming prohibitively expensive for startups. Developing competitive AI products, especially large language models, requires <a href="https://web.archive.org/web/20240611124001/https://fortune.com/2024/04/04/ai-training-costs-how-much-is-too-much-openai-gpt-anthropic-microsoft/">enormous computational resources</a> and access to vast datasets. 
For instance, Adept AI&#8217;s ambitious plans to create models that translate natural language into machine actions would have required not only technical innovation but also continuous fundraising.</p><p>For a startup, surviving in a space where you must lease computational power from the same giants you&#8217;re trying to compete with is unsustainable. Startups often rely on cloud services from AWS, Azure, or Google Cloud, directly paying their rivals for the means to develop their own products. Add to this the exorbitant cost of AI research talent, which can command salaries into the millions, and it is easy to see why many startups struggle to stay independent.</p><p>With these acquisitions, not only are startups losing their autonomy, but Big Tech&#8217;s growing control raises concerns about a potential oligopoly in AI.</p><h3>AI oligopoly</h3><p>The ongoing trend of Big Tech&#8217;s dominance in AI raises a critical question: Will all major AI research and applications ultimately end up in the hands of a few giants? If so, what does that mean for competition and innovation?</p><p>OpenAI, once an independent startup, is now heavily funded by Microsoft. 
Anthropic relies on investment from Google and Amazon.</p><p>One might argue that AI consolidation is a natural phase, much like in other industries where larger companies eventually absorb smaller players. However, AI is different. The technology is poised to influence everything from personal assistants to healthcare diagnostics, and control over its development could have significant societal implications. As we can see from Microsoft's acquisition of Inflection AI&#8217;s talent and technology, these moves are not just about enhancing corporate capabilities but also about positioning Big Tech as the gatekeeper of the future of AI.</p><p>With companies like Microsoft, Amazon, and Alphabet investing heavily in their AI ecosystems, startups increasingly face a difficult choice: either scale up dramatically to remain competitive or sell out to a tech giant.</p><h3>Europe exiting the LLM arena, or not?</h3><p>The pressure to transition from research to commercial success has not spared Europe's AI sector.</p><p>Aleph Alpha, founded in 2019 by Jonas Andrulis, was once hailed as Germany's most promising AI startup. The company aimed to develop powerful language models that could compete with those created by American tech giants while adhering to European values and data protection standards.</p><p>The company raised a $500 million Series B less than a year ago, and its efforts were crucial for maintaining European technological sovereignty in AI, especially given the dominance of U.S. and Chinese firms in the sector.</p><p>However, Aleph Alpha <a href="https://techcrunch.com/2024/09/05/german-llm-maker-aleph-alpha-pivots-to-ai-support/">recently announced</a> a pivot away from developing LLMs. Instead, the company will focus on creating AI systems for specific enterprise applications. 
This shift marks a significant change in strategy and raises questions about Europe's ability to compete in the global AI race.</p><p>With another well-funded AI company out of the game, Europe's AI ambitions now lie with the 18-month-old Paris-based Mistral. Founded by former Google and Meta AI researchers, Mistral has been making waves with its open-source approach and impressive technical achievements.</p><div><hr></div><h3>Further consolidation</h3><p>What lies ahead for the teams still building foundational models?</p><p>Mistral, known for its cutting-edge generative models like Mistral 7B 
and the multimodal Pixtral 12B, now faces increasing pressure from both sides of the AI model development market. On the open-source front, Meta continues to release powerful models such as Llama 3.1 405B and 3.2 90B, which have outperformed Mistral&#8217;s proprietary Large 2 on benchmarks and the Chatbot Arena leaderboard. Commercially, Mistral&#8217;s hosted Large 2 model also lags behind competitors like Claude 3.5 Sonnet and GPT-4o, raising questions about the company&#8217;s ability to maintain its competitive edge in both open-source innovation and proprietary offerings.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ps4u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ps4u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png 424w, https://substackcdn.com/image/fetch/$s_!ps4u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png 848w, https://substackcdn.com/image/fetch/$s_!ps4u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png 1272w, https://substackcdn.com/image/fetch/$s_!ps4u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ps4u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png" width="620" height="352.7946127946128" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:676,&quot;width&quot;:1188,&quot;resizeWidth&quot;:620,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ps4u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png 424w, https://substackcdn.com/image/fetch/$s_!ps4u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png 848w, https://substackcdn.com/image/fetch/$s_!ps4u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png 1272w, https://substackcdn.com/image/fetch/$s_!ps4u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c3aab5-369e-4a6a-aa77-3016a6f7cb31_1188x676.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Mistral&#8217;s ingenious way of releasing new models - a mysterious magnet link on their X account</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FwMk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!FwMk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png 424w, https://substackcdn.com/image/fetch/$s_!FwMk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png 848w, https://substackcdn.com/image/fetch/$s_!FwMk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png 1272w, https://substackcdn.com/image/fetch/$s_!FwMk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FwMk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png" width="634" height="360.1085164835165" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:827,&quot;width&quot;:1456,&quot;resizeWidth&quot;:634,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!FwMk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png 424w, https://substackcdn.com/image/fetch/$s_!FwMk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png 848w, https://substackcdn.com/image/fetch/$s_!FwMk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png 1272w, https://substackcdn.com/image/fetch/$s_!FwMk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee556b88-7259-4ade-817b-3f37151bf3ac_1600x909.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Chatbot Arena leaderboard, Sep &#8217;24</figcaption></figure></div><p>The Toronto/SF-based AI company Cohere is another long-time player in this space. It recently raised <a href="https://techcrunch.com/2024/07/22/cohere-raises-500m-to-beat-back-generative-ai-rivals/">$500 million at a $5.5 billion valuation</a>, despite reportedly having only <a href="https://fortune.com/2024/04/25/cohere-ceo-openai-rival-aidan-gomez-enterprise-ai-revenues-set-to-soar/">$35M in annual recurring revenue</a>.</p><p>Mistral and Cohere represent two distinct strategies in the AI market, both attempting to carve out a niche in the face of Big Tech dominance. Mistral, for instance, has embraced an open-source approach, releasing models like Mistral 7B to foster community adoption and differentiate itself from the proprietary systems of larger players like Microsoft and OpenAI. This strategy not only aligns with the growing demand for transparency and on-premise deployment but also positions Mistral as an alternative for developers seeking more accessible and customizable tools.</p><p>Despite limited commercial traction, both Mistral and Cohere have raised significant capital (Mistral raised $645 million at a <a href="https://web.archive.org/web/20240805120500/https://www.cnbc.com/2024/06/12/mistral-ai-raises-645-million-at-a-6-billion-valuation.html">$6 billion</a> valuation), largely due to investor confidence in their cutting-edge technology and the potential for future breakthroughs. 
Investors are betting that, even with modest current revenues, these companies can scale by offering specialized models for enterprise applications or monetizing their technological advancements in the long term.</p><p>This ability to secure funding in the absence of high recurring revenue highlights the immense value placed on innovation and technical leadership in the AI sector, where future returns can outweigh immediate profitability. Nonetheless, both companies will likely face the same outcome as their AI peers before them unless they generate the cash flow to sustain further model development and keep investors happy.</p><h3>The tale of a founding team of researchers</h3><p>The current AI wave brought a new trait to the best-funded and most lucrative AI startups, one that Adept, Cohere, Aleph Alpha, Mistral, and Character all share: their founders come from deep research backgrounds.</p><p>All of these companies launched groundbreaking models within months of incubation, but they faced the same challenges: commercialization and leadership. Adept lost two co-founders early on, while Character AI raised $150M without generating any revenue.</p><p>The lack of go-to-market experience may be a significant factor in these companies' openness to acquisitions. 
Generating recurring revenue is a new and difficult challenge for researchers who are used to focusing more on innovation and technical breakthroughs than on commercial strategy.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3htR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3htR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3htR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3htR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3htR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3htR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg" width="634" height="634" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:634,&quot;bytes&quot;:2068020,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3htR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3htR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3htR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3htR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba5c67-596a-41a6-9d6c-a80e33a28dd6_2880x2880.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Every founder in this picture comes from a heavy research background. Character AI (top left), Cohere (top right), Adept (bottom left), Mistral (bottom right)</figcaption></figure></div><h3><strong>Conclusion: A Consolidated AI Future?</strong></h3><p>While the recent flurry of acquisitions suggests that we are indeed entering the consolidation phase of AI investments, it&#8217;s not necessarily the end of innovation in the space. 
Startups still play a crucial role in advancing AI research, but as development costs rise, they may increasingly rely on Big Tech for survival.</p><p>The real question isn&#8217;t whether Big Tech will dominate AI (it already does) but whether that dominance will stifle innovation and investment.</p><p>What&#8217;s clear, however, is that the age of wild, independent AI startups might be coming to an end.</p><p></p><p><em>If you find AI Tidbits valuable, share it with a friend and consider showing your support.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.aitidbits.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.aitidbits.ai/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[2023 Most Impactful Generative AI Papers]]></title><description><![CDATA[The papers that shaped AI research and industry in 2023 and beyond, from Meta's LLaMA to Stanford's ControlNet and Microsoft Orca]]></description><link>https://www.aitidbits.ai/p/2023-impactful-papers</link><guid isPermaLink="false">https://www.aitidbits.ai/p/2023-impactful-papers</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Sat, 13 Jan 2024 03:24:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wv-T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is a cross-post of my guest post with AI Supremacy:</p><div class="embedded-post-wrap" 
data-attrs="{&quot;id&quot;:140396282,&quot;url&quot;:&quot;https://aisupremacy.substack.com/p/most-impactful-generative-ai-papers&quot;,&quot;publication_id&quot;:396235,&quot;publication_name&quot;:&quot;AI Supremacy &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc548f8c4-823b-4a2a-b499-528f9a84cb5c_215x215.png&quot;,&quot;title&quot;:&quot;Most Impactful Generative AI Papers of 2023&quot;,&quot;truncated_body_text&quot;:&quot;&#9757; Image Created: Sahar Mor, January, 2023. Hey Everyone, I&#8217;ve really enjoyed reading AI papers in 2023 more than ever, and I hope you have as well! One of the best people to follow on breaking news in Generative A.I. is actually Sahar Mor , his LinkedIn posts&quot;,&quot;date&quot;:&quot;2024-01-06T10:56:11.421Z&quot;,&quot;like_count&quot;:52,&quot;comment_count&quot;:9,&quot;bylines&quot;:[{&quot;id&quot;:21731691,&quot;name&quot;:&quot;Michael Spencer&quot;,&quot;handle&quot;:&quot;aisupremacy&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/75d1bf99-dcf3-4af6-be2a-416c08c954a1_450x450.jpeg&quot;,&quot;bio&quot;:&quot;Michael is an amateur futurist with 210,000 LinkedIn followers and a 2-time LinkedIn Top Voice. Obsessed with future topics such as A.I, robotics, quantum computing, Web3, investing, venture capital, startups, business and technology trends. 
&quot;,&quot;profile_set_up_at&quot;:&quot;2021-07-09T21:10:50.118Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:320401,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:396235,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:396235,&quot;name&quot;:&quot;AI Supremacy &quot;,&quot;subdomain&quot;:&quot;aisupremacy&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;News at the intersection of Artificial Intelligence, technology and business including Op-Eds, research summaries, guest contributions and valuable info about A.I. startups. &quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/c548f8c4-823b-4a2a-b499-528f9a84cb5c_215x215.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#8AE1A2&quot;,&quot;created_at&quot;:&quot;2021-06-28T21:51:38.676Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}},{&quot;id&quot;:316708,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:392690,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:392690,&quot;name&quot;:&quot;Penny Stock Central &quot;,&quot;subdomain&quot;:&quot;stockquest&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Stock Quest is becoming Penny Stock Central. This is to renew my own motivation to cover a very specific kind of trading that has the highest value to my audience. 
I will be covering micro cap and small-cap stocks, otherwise known as penny stocks.  &quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/49f58045-7931-49dc-85b0-2b281d63c4b2_171x171.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#EA410B&quot;,&quot;created_at&quot;:&quot;2021-06-24T17:43:37.779Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}},{&quot;id&quot;:319445,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:395325,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:395325,&quot;name&quot;:&quot;Artificial Intelligence Survey &#129302;&#127974;&#129517;&quot;,&quot;subdomain&quot;:&quot;futuresin&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Bite size curation of links to A.I. News, funding and trending topics from around the web. 
&quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/e519bec5-40b6-4892-9de0-865f77e668f8_230x230.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#2096FF&quot;,&quot;created_at&quot;:&quot;2021-06-27T19:57:15.745Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}},{&quot;id&quot;:321214,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:397002,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:397002,&quot;name&quot;:&quot;Datascience Learning Center&quot;,&quot;subdomain&quot;:&quot;datasciencelearningcenter&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Datascience, programming, datascience, future work, digital transformation, WFH trends and the future of coding. 
&quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/966bde96-aa76-4d37-ab91-a3ba0299eff1_406x406.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#67BDFC&quot;,&quot;created_at&quot;:&quot;2021-06-29T20:22:07.141Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}},{&quot;id&quot;:321230,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:397016,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:397016,&quot;name&quot;:&quot;A.I. Startups &amp; Funding &quot;,&quot;subdomain&quot;:&quot;cryptobullsbears&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A.I. Startups &amp; Funding will cover recent venture capital and funding rounds for A.I. startups all around the world. 
&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c05c9c1-5eff-4106-844d-1fff41e3a8f9_617x617.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#121BFA&quot;,&quot;created_at&quot;:&quot;2021-06-29T20:53:00.677Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}},{&quot;id&quot;:321351,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:397128,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:397128,&quot;name&quot;:&quot;OK, Robot&quot;,&quot;subdomain&quot;:&quot;firstfuturist&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A Newsletter about the present and future of robotics, startups, applications and emerging technology. 
&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f889377-e147-4d98-abb4-16acd781a1f9_489x489.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#A33ACB&quot;,&quot;created_at&quot;:&quot;2021-06-29T23:31:49.731Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:&quot;Michael Spencer of Space Academy &quot;,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}},{&quot;id&quot;:321532,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:397300,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:397300,&quot;name&quot;:&quot;Quantum Foundry &quot;,&quot;subdomain&quot;:&quot;ipotimes&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Quantum computing, IPOs, startups, future companies, business models, venture capital deals, research &amp; papers, global news coverage, etc...&quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/52907a7b-c016-4530-874c-e6e5da3a7340_168x168.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#9A6600&quot;,&quot;created_at&quot;:&quot;2021-06-30T05:55:21.469Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding 
Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}},{&quot;id&quot;:323389,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:399085,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:399085,&quot;name&quot;:&quot;A.I. Papers with Infographics&quot;,&quot;subdomain&quot;:&quot;chinasuperpowers&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A.I. Papers with Infographics will go through some of the latest A.I. papers displaying key visual infographics about them and short explanations. &quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/372198a7-ac4e-4ff4-a1f1-6f27390f4e1f_719x719.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#6C0095&quot;,&quot;created_at&quot;:&quot;2021-07-02T02:45:54.225Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}},{&quot;id&quot;:323428,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:399124,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:399124,&quot;name&quot;:&quot;Creator Economy Tips &quot;,&quot;subdomain&quot;:&quot;basicincomeworld&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Writing tips, creator economy, building Email lists, building an audience hacks. Substack Growth insights. 
\n&quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f1e9b1cd-4933-46ea-abd3-0d9a66c344da_720x720.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#00C2FF&quot;,&quot;created_at&quot;:&quot;2021-07-02T04:36:36.683Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}},{&quot;id&quot;:500088,&quot;user_id&quot;:21731691,&quot;publication_id&quot;:569093,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:569093,&quot;name&quot;:&quot;Artificial Intelligence Learning &#129302;&#129504;&#129470;&quot;,&quot;subdomain&quot;:&quot;offthegridxp&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;I wanted a place to put some Artificial Intelligence definitions, what is, and how-to short articles to complement my A.I. coverage on A.I. Supremacy and A.I. Survey. 
&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bf35ccb-94b4-4eac-a7b3-621a7d4f3198_326x326.png&quot;,&quot;author_id&quot;:21731691,&quot;theme_var_background_pop&quot;:&quot;#45D800&quot;,&quot;created_at&quot;:&quot;2021-11-15T20:08:43.092Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Michael Spencer&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}}],&quot;twitter_screen_name&quot;:&quot;AISupremacyNews&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100},{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;handle&quot;:&quot;saharmor&quot;,&quot;previous_name&quot;:&quot;No Name&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github repos&quot;,&quot;profile_set_up_at&quot;:&quot;2022-09-08T15:34:44.062Z&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:100,&quot;primaryPublicationId&quot;:1079420,&quot;primaryPublicationName&quot;:&quot;AI Tidbits&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://www.aitidbits.ai&quot;,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://www.aitidbits.ai/subscribe?&quot;}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:false,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" 
href="https://aisupremacy.substack.com/p/most-impactful-generative-ai-papers?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!mF83!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc548f8c4-823b-4a2a-b499-528f9a84cb5c_215x215.png"><span class="embedded-post-publication-name">AI Supremacy </span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Most Impactful Generative AI Papers of 2023</div></div><div class="embedded-post-body">&#9757; Image Created: Sahar Mor, January, 2023. Hey Everyone, I&#8217;ve really enjoyed reading AI papers in 2023 more than ever, and I hope you have as well! One of the best people to follow on breaking news in Generative A.I. is actually Sahar Mor , his LinkedIn posts&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">2 years ago &#183; 52 likes &#183; 9 comments &#183; Michael Spencer and Sahar Mor</div></a></div><div><hr></div><p>Over 1,100 curated announcements and papers were featured in AI Tidbits in 2023. 
Only a handful of them changed the trajectory of AI research for years to come, powering the products we use daily.</p><p>From Meta&#8217;s Llama to Stanford&#8217;s ControlNet and Microsoft&#8217;s phi, here are the 48 most impactful papers of 2023, listed just as 2024 ushers in a fresh wave of groundbreaking discoveries.</p><p>Defining "most impactful" is a nuanced task, so I adopted a blend of objective and subjective criteria for selection:</p><ul><li><p>Objective - the paper&#8217;s number of citations and GitHub repository stars</p></li><li><p>Subjective - papers I identified as having a significant influence across various modalities and applications</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wv-T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wv-T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png 424w, https://substackcdn.com/image/fetch/$s_!wv-T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png 848w, https://substackcdn.com/image/fetch/$s_!wv-T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png 1272w, https://substackcdn.com/image/fetch/$s_!wv-T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wv-T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png" width="667" height="798.0178571428571" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1742,&quot;width&quot;:1456,&quot;resizeWidth&quot;:667,&quot;bytes&quot;:3548407,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wv-T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png 424w, https://substackcdn.com/image/fetch/$s_!wv-T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png 848w, https://substackcdn.com/image/fetch/$s_!wv-T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png 1272w, https://substackcdn.com/image/fetch/$s_!wv-T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>January</h2><ul><li><p><a href="https://arxiv.org/abs/2301.12597">BLIP-2, Salesforce</a> - a novel pre-training approach for vision-language tasks, outperforming the then state-of-the-art models like Flamingo 80B in efficiency and zero-shot performance with significantly fewer parameters, and ushering in a new wave of vision-language models throughout 2023</p></li><li><p><a href="https://arxiv.org/abs/2211.09800">InstructPix2Pix, Berkeley</a> - a conversational UI allowing image editing via textual prompts, enabling further research in image, and later video, editing using natural language</p></li><li><p><a href="https://google-research.github.io/seanet/musiclm/examples/">MusicLM, Google</a> - a transformer-based 
text-to-audio model capable of producing tracks of varying genres, instruments, and concepts with superior audio quality, piquing both professional and amateur musicians&#8217; imagination and serving as a bedrock for further generative audio research</p></li></ul><h2>February</h2><ul><li><p><a href="https://ai.meta.com/blog/large-language-model-llama-meta-ai/">LLaMA, Meta</a> - an open 65B-parameter LLM trained on 1.4 trillion tokens, outperforming larger state-of-the-art LLMs like GPT-3 and PaLM-540B on most benchmarks, enabling the likes of Alpaca and Vicuna and sending the open-source LLM community off to the races</p></li><li><p><a href="https://arxiv.org/abs/2302.05543">ControlNet, Stanford</a> - a groundbreaking architecture that robustly integrates spatial conditioning into text-to-image diffusion models, offering enhanced controllability and wide-ranging applicability</p></li><li><p><a href="https://arxiv.org/abs/2302.04761">Toolformer, Meta</a> - a language model capable of teaching itself when and which external tools such as calculators and Wikipedia to use to generate accurate answers</p></li><li><p><a href="https://arxiv.org/abs/2302.14045">KOSMOS-1, Microsoft</a> - a transformer-based multimodal LLM supporting a wide range of perception tasks such as image captioning, visual question answering, and zero-shot image classification</p></li></ul><div class="pullquote"><p>The state-of-the-art today compared to December 2022 across generative AI verticals. From LLMs to generative video, image, and audio - the generative AI space has leapfrogged in 2023 across commercial companies and the open-source community.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;d047a5bf-7966-4cea-9d87-eaf9787fc8b2&quot;,&quot;caption&quot;:&quot;Note: \&quot;SOTA\&quot; stands for state-of-the-art, referring to the most advanced and effective models currently available in the field. 
Exactly a year ago, ChatGPT was one month old, Anthropic just released Claude, and Microsoft unveiled the first zero-shot model to clone someone&#8217;s voice. Long before Google Bard&#8217;s debut, Stanford&#8217;s inaugural autonomous agents pa&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;AI Tidbits 2023 SOTA Report&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-12-28T16:00:54.496Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/22e5f043-5e0c-41b3-b685-b5c6c62806d1_2008x1130.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/2023-sota-report&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:140124891,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:28,&quot;comment_count&quot;:4,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></div><h2>March</h2><ul><li><p><a 
href="https://crfm.stanford.edu/2023/03/13/alpaca.html">Alpaca, Stanford</a> - the first paper that mined instructions from large proprietary language models like ChatGPT to power smaller instruction-following language models</p></li><li><p><a href="https://palm-e.github.io/">PaLM-E, Google</a> - a 562B-parameter multimodal AI that uses visual data to enhance its language processing capabilities, achieving exceptional performance in both robotic applications and visual-language tasks</p></li><li><p><a href="https://arxiv.org/abs/2303.10130">GPTs are GPTs, OpenAI</a> - detailing the influence of LLMs on the American workforce, revealing their potential to affect 80% of American workers</p></li></ul><pre><code><code>Become a premium member to get full access to my content and $1k in free credits for leading AI tools and APIs. It&#8217;s common to expense the paid membership from your company&#8217;s learning and development education stipend.</code></code></pre><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.aitidbits.ai/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to Premium&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.aitidbits.ai/subscribe"><span>Upgrade to Premium</span></a></p><h2>April</h2><ul><li><p><a href="https://arxiv.org/abs/2304.02643">Segment Anything, Meta</a> -&nbsp;the Segment Anything project, which includes both the Segment Anything Model (SAM) and SA-1B dataset, revolutionized image segmentation and was the bedrock for further novel papers like EdgeSAM, MobileSAM, and EfficientSAM</p></li><li><p><a href="https://arxiv.org/abs/2304.03442v2">Generative Agents, Stanford</a> - using language models to power autonomous agents and simulate realistic human behavior, leading to the creation of AutoGPT, BabyAGI, and many agent-powered applications</p></li><li><p><a 
href="https://arxiv.org/abs/2304.01373">Pythia, EleutherAI</a> - one of the first fully open-source families of language models, Pythia is a suite of 16 LLMs from 70M to 12B parameters, facilitating research into model training dynamics, bias reduction, and performance</p></li><li><p><a href="https://lmsys.org/blog/2023-03-30-vicuna/">Vicuna, LMSYS</a> - Vicuna-13B is a fine-tuned LLaMA model that achieved outstanding performance, reportedly competitive with ChatGPT and Bard, at a training cost of just $300</p></li><li><p><a href="https://llava-vl.github.io/?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">LLaVA, Microsoft</a> - one of the first capable large multimodal models and the seed for further innovative research like LLaVA-1.5 and Qwen</p></li></ul><h2>May</h2><ul><li><p><a href="https://arxiv.org/abs/2305.14314">QLoRA, University of Washington</a> - QLoRA fine-tunes large language models with reduced memory usage, resulting in high-performing models that are extremely efficient</p></li><li><p><a href="https://arxiv.org/abs/2305.10973">DragGAN, Max Planck Institute</a> - a new model that took the internet by storm with its ability to manipulate the pose, shape, expression, and layout of any image</p></li><li><p><a href="https://arxiv.org/abs/2305.18290">Direct Preference Optimization (DPO), Stanford</a> - a new algorithm for fine-tuning language models to align with human preferences, surpassing traditional complex RLHF methods</p></li><li><p><a href="https://gorilla.cs.berkeley.edu/?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Gorilla, Berkeley</a> - a fine-tuned LLaMA model explicitly designed for API calls, surpassing the performance of GPT-4 in writing API calls and powering further research such as ToolLLM and AutoGen</p></li><li><p><a href="https://voyager.minedojo.org/?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Voyager, 
Nvidia</a> - an LLM-powered Minecraft agent that made the rounds thanks to its innovative approach of learning skills by generating code routines, later storing them in a database and retrieving them when needed</p></li><li><p><a href="https://arxiv.org/abs/2305.09617?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Med-PaLM 2, Google</a> - Med-PaLM 2 was the first LLM to perform at an expert test-taker level on the MedQA, reaching 85%+ accuracy. It was also the first AI system to reach a passing score on the MedMCQA dataset, scoring 72.3%</p></li></ul><h2>June</h2><ul><li><p><a href="https://audiocraft.metademolab.com/musicgen.html">MusicGen, Meta</a> - a simple and controllable model for music generation using text prompts and input melodies. Also, one of the first audio models to be widely used by musicians for producing music.</p></li><li><p><a href="https://arxiv.org/abs/2306.11644?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Textbooks are All You Need (phi-1), Microsoft</a> - this paper proved (again) that high-quality data can enable smaller models to punch above their weight and outperform larger LMs. 
Since phi-1, Microsoft released phi-1.5 and phi-2 - a 2.7B model rivaling models up to 25x larger.</p></li><li><p><a href="https://arxiv.org/abs/2304.12244">WizardLM, Microsoft</a> - one of the first papers to utilize LLM-generated instructions over human-generated ones, resulting in multiple powerful models: WizardLM, WizardCoder, and WizardMath</p></li><li><p><a href="https://arxiv.org/abs/2305.06500">InstructBLIP, Salesforce</a> - one of the first papers to explore vision-language instruction tuning, achieving SOTA zero-shot performance across diverse tasks</p></li><li><p><a href="https://arxiv.org/abs/2303.16199">LLaMA-Adapter, Shanghai AI Laboratory</a> - the first efficient fine-tuning method for LLaMA-based models, adding minimal parameters to create high-quality instruction-following models in under an hour and the basis for further innovative papers like LLaMA2-Accessory, LLaMA-Adapter V2 multimodal, and ImageBind-LLM</p></li></ul><h2>July</h2><ul><li><p><a href="https://ai.meta.com/llama/?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Llama 2, Meta</a> - Meta&#8217;s first commercially permissive LLM enabled a host of new SOTA models from Code Llama to OpenHathi (the first Hindi LLM)</p></li><li><p><a href="https://deepmind.google/discover/blog/rt-2-new-model-translates-vision-and-language-into-action/">Robotic Transformer 2 (RT-2), Google</a> - a groundbreaking vision-language-action (VLA) model that combines web and robotics data to provide generalized instructions for robotic control, cited by subsequent novel papers like Robogen and JARVIS-1</p></li></ul><h2>August</h2><ul><li><p><a href="https://about.fb.com/news/2023/08/code-llama-ai-for-coding/">Code Llama, Meta</a> - a commercially permissive SOTA model built on top of Llama 2, fine-tuned for generating and discussing code</p></li><li><p><a href="https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">3D Gaussian 
Splatting, Inria</a> - Radiance Field methods revolutionized novel-view synthesis of scenes in 2023, and this paper from Inria was the first to achieve state-of-the-art visual quality while maintaining competitive training times and allowing real-time generation at 1080p resolution</p></li><li><p><a href="https://arxiv.org/abs/2308.06571">ModelScope, Alibaba</a> - a powerful and commercially permissive text-to-video model that outperformed existing models with only 1.7B parameters</p></li><li><p><a href="https://arxiv.org/abs/2308.01390?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">OpenFlamingo, University of Washington</a> - an open-source alternative to DeepMind's multimodal Flamingo, achieving up to 89% of its vision-language performance</p></li></ul><h2>September</h2><ul><li><p><a href="https://mistral.ai/news/announcing-mistral-7b/?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Mistral 7B, Mistral</a> - a fully open-source model that outperformed all available open-source models up to 13B parameters. 
Mistral 7B&#8217;s edge is its efficiency, delivering strong performance with less computational demand than larger LMs, powering a suite of SOTA LLMs such as Solar 10.7B, Zephyr, OpenHermes, and even the multimodal Nous-Hermes-2-Vision</p></li><li><p><a href="https://arxiv.org/abs/2309.10668?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Language Modeling Is Compression, DeepMind</a> - discovering that LLMs are powerful lossless compressors, outperforming domain-specific counterparts (PNG, gzip, FLAC) in compressing data</p></li><li><p><a href="https://arxiv.org/abs/2309.12307?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">LongLoRA, MIT</a> - leveraging LoRA to extend LLMs' context window, demonstrating significant computational savings with strong performance on various tasks and models</p></li><li><p><a href="https://arxiv.org/abs/2309.16609">Qwen, Alibaba</a> -  this suite of powerful multilingual LLMs, including versatile base models and specialized chat models in coding and mathematics, substantially outperformed Llama 2</p></li><li><p><a href="https://arxiv.org/abs/2309.03409">Large Language Models as Optimizers (OPRO), DeepMind</a> - OPRO took the field of prompt engineering to the next level by leveraging LLMs to create prompts, outperforming human-designed prompts</p></li></ul><h2>October</h2><ul><li><p><a href="https://arxiv.org/abs/2310.05344">SteerLM, Nvidia</a> - a technique that enables real-time customization of LLMs during inference, addressing the two major limitations of RLHF: (1) complex training setup and (2) static values that end users cannot control at run-time</p></li><li><p><a href="https://arxiv.org/abs/2310.16944">Zephyr, Hugging Face</a> - a series of Mistral-based chat models with comparable performance to Anthropic's Claude 2 on AlpacaEval. 
The key innovation is distilled supervised fine-tuning, a method involving using the output from a larger, more capable &#8216;teacher&#8217; model to train a smaller &#8216;student&#8217; model</p></li><li><p><a href="https://minigpt-v2.github.io/?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">MiniGPT-v2, KAUST</a> - a unified interface for diverse vision-language tasks, enhancing image description, visual question answering, and visual grounding</p></li><li><p><a href="https://arxiv.org/abs/2310.11441">Set-of-Mark (SoM), Microsoft</a> - a technique that boosts the performance of multimodal models like GPT-4V by segmenting objects in an image before passing it to the model</p></li></ul><h2>November</h2><ul><li><p><a href="https://arxiv.org/abs/2311.11045">Orca + Orca 2, Microsoft</a> - Orca was one of the first to create tailored and high-quality synthetic data to equip smaller LMs with enhanced reasoning abilities, typically found only in much larger models. Orca progressively learns from complex explanation traces of GPT-4, with Orca-2 teaching small language models how to reason by employing advanced training methods and achieving reasoning levels comparable to models 5-10 times its size</p></li><li><p><a href="https://arxiv.org/abs/2311.03079">CogVLM, Tsinghua University</a> - a novel model combining vision and language features, incorporating a trainable visual expert module, and achieving state-of-the-art performance across various multimodal benchmarks, rivaling larger models like PaLI-X 55B</p></li><li><p><a href="https://arxiv.org/abs/2311.16452?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Medprompt, Microsoft</a> - demonstrating how novel prompt engineering strategies can enable generalized models like GPT-4 to outperform specialist models like Med-PaLM 2 with significant efficiency gains</p></li><li><p><a href="https://arxiv.org/abs/2311.05556">Latent Consistency Models-LoRA, </a><a 
href="https://arxiv.org/abs/2311.05556">Tsinghua University</a> - a distillation method that enables real-time image generation with Stable Diffusion</p></li></ul><h2>December</h2><ul><li><p><a href="https://arxiv.org/abs/2312.11514?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">LLM in a flash, Apple</a> - on-device LLMs hold great promise for AI and personal agents. Apple achieved up to 25x faster LLM inference thanks to dynamic storage and memory management, windowing for efficient data transfer, and a unique memory allocation algorithm.</p></li><li><p><a href="https://blog.research.google/2023/12/videopoet-large-language-model-for-zero.html?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">VideoPoet, Google</a> - an innovative LLM that excels at diverse video generation tasks, supporting multimodal inputs, diverse motion and style generation, various video orientations, audio generation, and integration with other modalities</p></li><li><p><a href="https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models">FunSearch, DeepMind</a> - in the pursuit of an AI capable of scientific breakthroughs, FunSearch represents a pioneering stride. It suggested a correct and previously unknown solution to the cap set problem and showcased the power of LLMs for making discoveries in mathematical and computer sciences.</p></li><li><p><a href="https://showlab.github.io/magicanimate/?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">MagicAnimate, National University of Singapore</a> - MagicAnimate is considered a step change in human image animation thanks to its improved quality over previous models. 
It is still far from an indistinguishable generated video, but its diffusion framework lays the foundation for progress in this space in 2024</p></li></ul><p></p><p><strong>Recent Deep Dives</strong></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;99546385-9be0-44dd-b0ac-473831b6de2a&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Over ten papers outlining novel prompting techniques were published in the last few months alone. While our X and LinkedIn feeds buzz with countless secret prompting tips &#8220;97% of ChatGPT users don&#8217;t know about&#8221;, a definitive, research-backed guide aggregating these advanced prompting strategies is hard to come by. This gap prevents LLM developers and everyday users from harnessing these novel frameworks to enhance performance and achieve more accurate results.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Harnessing research-backed prompting techniques for enhanced LLM performance&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github 
repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-12-10T16:00:41.722Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ccf1c5f-bca1-40ef-be43-2a7ec84c2f40_2014x1132.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/advanced-prompting&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:139449913,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:33,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f4e43924-8570-46c8-a898-e0a43bdfb66d&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - a dedicated AI Tidbits section providing editorial takes and insights to make sense of the latest in AI.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Most popular and upcoming Generative AI tools and APIs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github 
repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-12-19T15:30:19.597Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52307a3c-6727-4ca5-a4da-208969e7b833_1944x1090.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/most-used-tools&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:139821359,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:18,&quot;comment_count&quot;:4,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p><em>If you find AI Tidbits valuable, share it with a friend and consider showing your support.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.aitidbits.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.aitidbits.ai/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[AI Tidbits 2023 SOTA 
Report]]></title><description><![CDATA[Looking back at 2023's advancements to gauge how far we've come since 2022]]></description><link>https://www.aitidbits.ai/p/2023-sota-report</link><guid isPermaLink="false">https://www.aitidbits.ai/p/2023-sota-report</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Thu, 28 Dec 2023 16:00:54 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/22e5f043-5e0c-41b3-b685-b5c6c62806d1_2008x1130.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Note: "SOTA" stands for state-of-the-art, referring to the most advanced and effective models currently available in the field.</em></p><p>Exactly a year ago, ChatGPT was one month old, Anthropic had just released Claude, and Microsoft had unveiled the first zero-shot model to clone someone&#8217;s voice. That was long before Google Bard&#8217;s debut, Stanford&#8217;s inaugural autonomous agents paper, and the incorporation of the video-generating startup Pika Labs.</p><p>In 2023 alone, more than 1,100 curated announcements and papers were featured in AI Tidbits. 
It is hard to keep up with such mind-boggling progress when hundreds of innovative papers are published every single week, yet it is easy to forget the leap the AI community has made in just one year.</p><p>Before we yell at ChatGPT once again for getting one detail wrong, let&#8217;s review the state-of-the-art today compared to December 2022 across different generative AI verticals.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q-1w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q-1w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg 424w, https://substackcdn.com/image/fetch/$s_!q-1w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg 848w, https://substackcdn.com/image/fetch/$s_!q-1w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!q-1w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!q-1w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;No alt text provided for this image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="No alt text provided for this image" title="No alt text provided for this image" srcset="https://substackcdn.com/image/fetch/$s_!q-1w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg 424w, https://substackcdn.com/image/fetch/$s_!q-1w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg 848w, https://substackcdn.com/image/fetch/$s_!q-1w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!q-1w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F557f0c51-c676-4dfd-a948-49e66a2603ed_2008x1130.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;9b6a068a-1121-4b53-8fce-75d503ffbce3&quot;,&quot;caption&quot;:&quot;The papers that shaped AI research and industry in 2023 and beyond, from Meta's LLaMA to Stanford's ControlNet and Microsoft Orca.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;md&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;2023 Most Impactful Generative AI Papers&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github 
repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2024-01-13T03:24:06.678Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46fb20cb-534b-45f1-8907-745083a474b9_4428x5298.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/2023-impactful-papers&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:140306457,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h2>Language Models</h2><h3><strong>Open-source models</strong></h3><p>2023 was the year open-source language models started catching up, with Yi's 200k context window and Mistral's Mixture of Experts outperforming GPT-3.5, which was the SOTA just earlier this year.</p><p>Commercially permissive models with groundbreaking architectures such as Llama 2 gave birth to advanced models like Code Llama and whole new model families such as Vicuna. 
Small language models also had their moment, yielding dozens of tokens per second on consumer-grade devices.</p><p>Despite the significant advancements in open-source models in 2023, the evaluation of language models has yet to achieve consensus, and the AI community&#8217;s trust in researchers&#8217; published benchmarks is at an all-time low.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pvsY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pvsY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png 424w, https://substackcdn.com/image/fetch/$s_!pvsY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png 848w, https://substackcdn.com/image/fetch/$s_!pvsY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png 1272w, https://substackcdn.com/image/fetch/$s_!pvsY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pvsY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png" width="588" height="218.25762711864408" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:438,&quot;width&quot;:1180,&quot;resizeWidth&quot;:588,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pvsY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png 424w, https://substackcdn.com/image/fetch/$s_!pvsY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png 848w, https://substackcdn.com/image/fetch/$s_!pvsY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png 1272w, https://substackcdn.com/image/fetch/$s_!pvsY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99dea2e6-793f-4ff2-a2aa-bb7cb3dec5c7_1180x438.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://twitter.com/karpathy/status/1737544497016578453">Andrej Karpathy on X</a></figcaption></figure></div><p>Our report therefore only includes community-vetted LLMs or ones I&#8217;ve tinkered with and can vouch for. 
You can find the full list of almost 1,000 open-source language models on <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">Hugging Face&#8217;s Open LLM Leaderboard</a>.</p><p>As performance proxies, I chose <a href="https://arxiv.org/abs/2009.03300">MMLU</a> for its wide-ranging evaluation of language understanding across various domains and <a href="https://arxiv.org/abs/2109.07958">TruthfulQA</a> for its assessment of a model's ability to provide factual and reliable information, covering both depth and accuracy.</p><h4>SOTA models &lt;=7B parameters</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FoD-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FoD-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png 424w, https://substackcdn.com/image/fetch/$s_!FoD-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png 848w, https://substackcdn.com/image/fetch/$s_!FoD-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png 1272w, https://substackcdn.com/image/fetch/$s_!FoD-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png 1456w"
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FoD-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png" width="566" height="316.0425824175824" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:566,&quot;bytes&quot;:1326299,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FoD-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png 424w, https://substackcdn.com/image/fetch/$s_!FoD-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png 848w, https://substackcdn.com/image/fetch/$s_!FoD-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png 1272w, https://substackcdn.com/image/fetch/$s_!FoD-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0abe7aeb-312c-445e-84f1-d62d079f9d4f_2052x1146.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://huggingface.co/EleutherAI/gpt-j-6b">GPT-J</a> and <a href="https://deci.ai/blog/introducing-decilm-7b-the-fastest-and-most-accurate-7b-large-language-model-to-date/">DeciLM</a> as SOTA for 2022 and 2023, respectively</figcaption></figure></div><h4>SOTA models 7B &#8594; 40B parameters</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Gw7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Gw7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png 424w, https://substackcdn.com/image/fetch/$s_!5Gw7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png 848w, https://substackcdn.com/image/fetch/$s_!5Gw7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png 1272w, https://substackcdn.com/image/fetch/$s_!5Gw7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Gw7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png" width="564" height="315.70054945054943" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:815,&quot;width&quot;:1456,&quot;resizeWidth&quot;:564,&quot;bytes&quot;:976399,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!5Gw7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png 424w, https://substackcdn.com/image/fetch/$s_!5Gw7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png 848w, https://substackcdn.com/image/fetch/$s_!5Gw7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png 1272w, https://substackcdn.com/image/fetch/$s_!5Gw7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb86e00f2-3378-45f8-8c4c-149c2291273b_1716x960.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://huggingface.co/google/flan-t5-xxl">Flan-T5-xxl</a> and <a href="https://huggingface.co/01-ai/Yi-34B-200K">Yi-34B-200K</a> as SOTA for 2022 and 2023, respectively</figcaption></figure></div><h4>SOTA models &gt;40B parameters</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gPuC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gPuC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png 424w, https://substackcdn.com/image/fetch/$s_!gPuC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png 848w, https://substackcdn.com/image/fetch/$s_!gPuC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png 1272w, https://substackcdn.com/image/fetch/$s_!gPuC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!gPuC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png" width="570" height="319.45054945054943" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:570,&quot;bytes&quot;:994790,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gPuC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png 424w, https://substackcdn.com/image/fetch/$s_!gPuC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png 848w, https://substackcdn.com/image/fetch/$s_!gPuC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png 1272w, https://substackcdn.com/image/fetch/$s_!gPuC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432f4ef4-2f45-49c7-99c3-395859f230e3_1724x966.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://huggingface.co/bigscience/bloom">BLOOM</a> and <a href="https://huggingface.co/Qwen/Qwen-72B">Qwen 72B</a> as SOTA for 2022 and 2023, respectively</figcaption></figure></div><p>Mistral&#8217;s recently released <a href="https://mistral.ai/news/mixtral-of-experts/">Mixtral 8x7B model</a> deserves a special shout-out. It is a Mixture of Experts (MoE) model that outperforms Llama 2 70B and GPT-3.5 on most benchmarks with 6x faster inference.
It allows commercial use, and judging by its predecessor, Mistral 7B, we can expect a host of Mixtral-based state-of-the-art language models in the coming months.</p><p>While the language models mentioned above are the state of the art today, several other noteworthy models released in 2023 paved the way for them: <a href="https://huggingface.co/google/flan-ul2">Flan-UL2</a>, <a href="https://ai.meta.com/llama/">Llama 2</a>, <a href="https://www.mosaicml.com/mpt">MosaicML MPT</a>, <a href="https://falconllm.tii.ae/">Falcon</a>, <a href="https://lmsys.org/blog/2023-03-30-vicuna/">Vicuna</a>, <a href="https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm">Dolly</a>, <a href="https://huggingface.co/microsoft/phi-1_5">phi 1.5</a>, <a href="https://huggingface.co/microsoft/Orca-2-13b">Orca 2</a>, and <a href="https://github.com/nlpxucan/WizardLM">WizardLM</a>.</p><p>Just listing these models made me realize what a monumental year 2023 was for open-source language models, and we are only scratching the surface of what's possible: most of these models are base models, i.e.
they can be further fine-tuned for specific tasks to boost accuracy and achieve a smaller model size.</p><h3><strong>Commercial models</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iCbu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iCbu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png 424w, https://substackcdn.com/image/fetch/$s_!iCbu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png 848w, https://substackcdn.com/image/fetch/$s_!iCbu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png 1272w, https://substackcdn.com/image/fetch/$s_!iCbu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iCbu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png" width="582" height="326.57554945054943" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:582,&quot;bytes&quot;:1014033,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iCbu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png 424w, https://substackcdn.com/image/fetch/$s_!iCbu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png 848w, https://substackcdn.com/image/fetch/$s_!iCbu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png 1272w, https://substackcdn.com/image/fetch/$s_!iCbu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4965d9bf-6463-473e-96da-7c6deee09f5a_1722x966.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Surprisingly enough, GPT remained SOTA throughout 2023, with Gemini Ultra emerging as the only potential contender a few weeks ago, claiming to outperform GPT-4 across several benchmarks. However, given the divide between Gemini Pro's proclaimed and actual capabilities, such supremacy remains doubtful.</p><p>Anthropic also made great progress with its release of Claude 2.1, featuring the largest context window for proprietary LLMs (200k) at a substantially lower price than GPT-4.</p><h2>Multimodal AI</h2><p>Multimodal AI refers to AI systems that can process and interpret multiple forms of data input, such as text, images, and audio, simultaneously or in an integrated manner, very much like humans.</p><p>Models specialized for a single modality, such as images or text, are limited in their capabilities and require much more training data. This contrasts with human learning, in which people learn much more efficiently thanks to different kinds of sensory inputs.</p><p>Progress in this space is paramount because it enables more sophisticated, efficient, and accurate AI applications and powers the next generation of <a href="https://hu.ma.ne/">wearables</a> and embodied robotics.</p><p><strong>2022</strong> was relatively quiet for multimodal AI. The only major release was Meta&#8217;s <a href="https://ai.meta.com/blog/ai-self-supervised-learning-data2vec/">data2vec</a>, a model operating across multiple modalities, including speech, images, and text.</p><p><strong>2023</strong> saw several breakthrough models and frameworks, with <a href="https://github.com/THUDM/CogVLM">CogVLM</a> crowned as the SOTA open-source model.</p><p>CogVLM outperformed all previous models, such as LLaVA 1.5, an open-source chatbot alternative to ChatGPT that can converse using images and text, and Adept&#8217;s Fuyu, capable of understanding charts, documents, and interfaces.
Meta also made significant contributions through its <a href="https://ai.meta.com/blog/imagebind-six-modalities-binding-ai/">ImageBind</a> and <a href="https://arxiv.org/abs/2309.16058?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">AnyMAL</a> papers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xxlQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xxlQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png 424w, https://substackcdn.com/image/fetch/$s_!xxlQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png 848w, https://substackcdn.com/image/fetch/$s_!xxlQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png 1272w, https://substackcdn.com/image/fetch/$s_!xxlQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xxlQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png" width="632" height="468.1856" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:926,&quot;width&quot;:1250,&quot;resizeWidth&quot;:632,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xxlQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png 424w, https://substackcdn.com/image/fetch/$s_!xxlQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png 848w, https://substackcdn.com/image/fetch/$s_!xxlQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png 1272w, https://substackcdn.com/image/fetch/$s_!xxlQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb55a60f-3da4-48a9-8ce3-de9b39c9f85b_1250x926.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">CogVLM understands and answers various types of questions</figcaption></figure></div><p>On the proprietary front, GPT-4V(ision) is the current industry leader, showcasing strong performance.
Gemini Ultra appears to be a close runner-up, though it is hard to tell because Google hasn&#8217;t opened developer access yet.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Oawj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Oawj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png 424w, https://substackcdn.com/image/fetch/$s_!Oawj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png 848w, https://substackcdn.com/image/fetch/$s_!Oawj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png 1272w, https://substackcdn.com/image/fetch/$s_!Oawj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Oawj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png" width="629" height="354.2445054945055" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:629,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Oawj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png 424w, https://substackcdn.com/image/fetch/$s_!Oawj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png 848w, https://substackcdn.com/image/fetch/$s_!Oawj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png 1272w, https://substackcdn.com/image/fetch/$s_!Oawj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8bec4c-627b-4c5d-8747-faf80e4b582f_1600x901.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">GPT can count objects (left) and read JRR Tolkien&#8217;s handwritten text (right) <a href="https://encord.com/blog/gpt4-vision/">Source</a></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0z_o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0z_o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif 424w, 
https://substackcdn.com/image/fetch/$s_!0z_o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif 848w, https://substackcdn.com/image/fetch/$s_!0z_o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif 1272w, https://substackcdn.com/image/fetch/$s_!0z_o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0z_o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif" width="632" height="345.319587628866" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:318,&quot;width&quot;:582,&quot;resizeWidth&quot;:632,&quot;bytes&quot;:1920532,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!0z_o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif 424w, 
https://substackcdn.com/image/fetch/$s_!0z_o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif 848w, https://substackcdn.com/image/fetch/$s_!0z_o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif 1272w, https://substackcdn.com/image/fetch/$s_!0z_o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5fb405-f5e3-4f18-8499-c9e73fd44d3b_582x318.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini solves visual puzzles thanks to its vision capabilities. Note that this demo was <a href="https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html">meticulously forged</a></figcaption></figure></div><h2>Autonomous Agents</h2><p>An autonomous agent is an AI program capable of planning and executing tasks based on a given objective. Imagine asking an AI to &#8220;book a flight&#8221; or &#8220;create a website for people interested in renting out their apartments when they&#8217;re away&#8221;, and the agent goes to work.</p><p>It does so by repeatedly asking &#8220;What should the next steps be to achieve the task at hand?&#8221; and using an LLM to answer that question and devise a plan to execute.</p><p>AI agents that can do work for us hold great promise. In <strong>2022,</strong> the main player was Adept with its <a href="https://www.adept.ai/blog/act-1">ACT-1 model</a>. Although it was mainly a demo, Adept showcased an agent that can find apartments on Redfin, take actions on Google Sheets, and input information into Salesforce using natural language.</p><p>In contrast, <strong>2023</strong> was filled with open-source and commercial progress. 
Stanford&#8217;s <a href="https://arxiv.org/abs/2304.03442">Simulacra</a> paper ignited everyone&#8217;s imagination and inspired innovative frameworks for agent builders (<a href="https://github.com/Significant-Gravitas/AutoGPT">AutoGPT</a>, <a href="https://github.com/yoheinakajima/babyagi">BabyAGI</a>) and agent-based applications (<a href="https://github.com/gpt-engineer-org/gpt-engineer">GPTEngineer</a>, <a href="https://github.com/geekan/MetaGPT">MetaGPT</a>).</p><p>To date, the open-source SOTA model is <a href="https://arxiv.org/abs/2312.08914">CogAgent</a> (<a href="http://36.103.203.44:7861/">demo</a>), a powerful 18B visual language model capable of navigating mobile applications and websites, achieving state-of-the-art results on both text-rich and general VQA benchmarks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Vb0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Vb0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png 424w, https://substackcdn.com/image/fetch/$s_!8Vb0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png 848w, https://substackcdn.com/image/fetch/$s_!8Vb0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8Vb0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Vb0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png" width="613" height="583.2336874051593" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1254,&quot;width&quot;:1318,&quot;resizeWidth&quot;:613,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Vb0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png 424w, https://substackcdn.com/image/fetch/$s_!8Vb0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png 848w, https://substackcdn.com/image/fetch/$s_!8Vb0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8Vb0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2384ff9-f7e8-4f94-9308-037b9a64af2c_1318x1254.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">CogAgent in action</figcaption></figure></div><p>There is still no clear winner on the commercial front. 
The leading startups operating in this space include <a href="https://www.adept.ai/">Adept</a>, <a href="https://embra.app/">Embra</a>, <a href="https://www.lindy.ai/">Lindy</a>, <a href="https://www.induced.ai/">Induced</a>, and <a href="https://www.hyperwriteai.com/personal-assistant">HyperWrite AI</a>. As the language models powering such agents become cheaper and more capable, 2024 might be the year of the first widely used agent, our first AI companion.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;bca40c66-7c2f-4bd8-9899-d186612763c0&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go! Last February, Stanford published a paper that sparked everyone&#8217;s imagination. In this paper, the researchers leveraged ChatGPT to power human-like agents. A mini-simulation of humanity.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The rise of autonomous agents&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github 
repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-11-19T16:30:28.414Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56d15b18-239c-4403-839e-544d2e9dac77_600x378.gif&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/the-rise-of-autonomous-agents&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:138981811,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:3,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h2>Image generation</h2><p>Image generation is the modality that has seen the most notable progress. Over the past two years, the transformation has been astounding, evolving from artificial-looking creations to artistry so professional it is virtually indistinguishable from human work.</p><p><strong>2022 </strong>was the year in which image generation moved away from GANs and toward diffusion models. In one buzzing summer, DALL-E 2, Stable Diffusion, and Midjourney were released, sending the image synthesis space off to the races. 
In the eighteen months since these releases, both the open-source and commercial models have undergone massive improvements.</p><p>Given the subjective nature of art, pinpointing a definitive <strong>2023 </strong>SOTA is challenging. However, OpenAI's DALL-E 3 and Midjourney are the reigning champions, distinguished by their widespread usage and consistently high-quality outputs. In 2023, these image generation models overcame a significant hurdle that plagued image diffusion models in 2022: the accurate rendering of faces, hands, and text.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hhtu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hhtu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png 424w, https://substackcdn.com/image/fetch/$s_!Hhtu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png 848w, https://substackcdn.com/image/fetch/$s_!Hhtu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png 1272w, https://substackcdn.com/image/fetch/$s_!Hhtu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Hhtu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png" width="634" height="356.18956043956047" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:1456,&quot;resizeWidth&quot;:634,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hhtu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png 424w, https://substackcdn.com/image/fetch/$s_!Hhtu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png 848w, https://substackcdn.com/image/fetch/$s_!Hhtu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png 1272w, https://substackcdn.com/image/fetch/$s_!Hhtu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f1a42d-266d-4756-a8d8-c54617bbaadc_1600x899.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">DALL-E 3 (right) is better at generating faces and hands compared to DALL-E 2 (left, <a href="https://spectrum.ieee.org/openai-dall-e-2">source</a>) using the same prompt: <em>Seven engineers gathered around a whiteboard</em></figcaption></figure></div><p>DALL-E 3 has introduced several advances over its <strong>2022</strong> predecessor with improved caption fidelity, overall image quality, and better steerability to reduce the generation of harmful images, demographic biases, and public figures.</p><p>In <strong>2023</strong>, Midjourney graduated from Discord onto a web app and jumped from v5 to v6, showcasing notable improvements in detail and text generation capabilities.</p><div class="captioned-image-container"><figure><a class="image-link 
image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YqKb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YqKb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png 424w, https://substackcdn.com/image/fetch/$s_!YqKb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png 848w, https://substackcdn.com/image/fetch/$s_!YqKb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!YqKb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YqKb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png" width="591" height="738.75" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:960,&quot;resizeWidth&quot;:591,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YqKb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png 424w, https://substackcdn.com/image/fetch/$s_!YqKb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png 848w, https://substackcdn.com/image/fetch/$s_!YqKb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!YqKb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a42a0e2-afe1-45ce-984e-bbce485c8044_960x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Midjourney&#8217;s generation quality leaped ahead in less than two years (<a href="https://twitter.com/Evolving_AI/status/1737858378549088412">source</a>)</figcaption></figure></div><p>Other noteworthy step changes include Stability AI&#8217;s SDXL Turbo, supporting high-resolution image generation at a fraction of the compute and cost, and <a href="https://ideogram.ai/">Ideogram</a>, which made the rounds with its text generation capabilities, a differentiating feature before DALL-E 3 was released.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2B5p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!2B5p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png 424w, https://substackcdn.com/image/fetch/$s_!2B5p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png 848w, https://substackcdn.com/image/fetch/$s_!2B5p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png 1272w, https://substackcdn.com/image/fetch/$s_!2B5p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2B5p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png" width="620" height="346.6208791208791" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:620,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!2B5p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png 424w, https://substackcdn.com/image/fetch/$s_!2B5p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png 848w, https://substackcdn.com/image/fetch/$s_!2B5p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png 1272w, https://substackcdn.com/image/fetch/$s_!2B5p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024652e-d3e0-4d97-9fde-9a65b3ee0af7_1600x895.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Stable Diffusion 2.0 (left, <a href="https://promptbase.com/profile/transeunte">source</a>) vs. SDXL Turbo (right, <a href="https://blog.segmind.com/generating-photographic-images-with-stable-diffusion/">source</a>)</figcaption></figure></div><p>On the open-source front, we got <a href="https://github.com/lllyasviel/Fooocus">Fooocus</a>, a package that lets anyone generate Midjourney-quality images on their own computer without needing the advanced prompting techniques employed by Midjourney experts.</p><p>Generation latency has also dropped substantially thanks to novel techniques such as <a href="https://latent-consistency-models.github.io/">Latent Consistency Models</a> (LCM), supporting real-time image generation at typing speed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BNgw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BNgw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif 424w, https://substackcdn.com/image/fetch/$s_!BNgw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif 848w, 
https://substackcdn.com/image/fetch/$s_!BNgw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif 1272w, https://substackcdn.com/image/fetch/$s_!BNgw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BNgw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif" width="464" height="371.8740920096852" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:662,&quot;width&quot;:826,&quot;resizeWidth&quot;:464,&quot;bytes&quot;:1080044,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BNgw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif 424w, https://substackcdn.com/image/fetch/$s_!BNgw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif 848w, 
https://substackcdn.com/image/fetch/$s_!BNgw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif 1272w, https://substackcdn.com/image/fetch/$s_!BNgw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa102910a-363d-4cf1-aa4f-e56b59d2c962_826x662.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Real-time generation with a reasonable quality. 
<a href="https://huggingface.co/spaces/latent-consistency/Real-Time-LCM-Text-to-Image-Lora-SD1.5">Try it yourself</a></figcaption></figure></div><h2>Video generation</h2><p>Generative models in video weren&#8217;t really a thing in <strong>2022</strong>. The space was practically dormant, with merely a teaser from Runway about its upcoming text-to-video model Gen-1 and a few closed releases such as Google&#8217;s <a href="https://imagen.research.google/video/">Imagen Video</a> and Meta&#8217;s <a href="https://makeavideo.studio/">Make-A-Video</a>.</p><p>This changed in <strong>2023</strong>, with numerous open-source packages emerging and notable advancements in commercial products, including an extended maximum video duration of 18 seconds (up from 4 seconds in 2022), alongside substantial enhancements in video quality and consistency.</p><p>On the commercial front, <a href="https://pika.art/">Pika Labs</a> and <a href="https://runwayml.com/">Runway</a> lead the pack, both developing their own foundation text-to-video models. 
This year, they added video inpainting and outpainting capabilities and the ability to render styles such as anime and cinematic.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;cebcca55-eb50-48f0-84f6-1d77cdc7eed4&quot;,&quot;duration&quot;:null}"></div><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;bce82f59-5e45-41a1-a894-a2e95524dcd6&quot;,&quot;duration&quot;:null}"></div><p>Another notable contender is <a href="https://www.heygen.com/video-translate">HeyGen</a>, which uses AI to translate videos into almost 30 languages by cloning the speaker&#8217;s voice and adjusting lip movements to the target language.</p><p>On the open-source front, <a href="https://github.com/AILab-CVC/VideoCrafter">VideoCrafter1</a> and <a href="https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis">ModelScope</a> are the current state-of-the-art models. VideoCrafter1 can generate realistic, cinematic-quality videos from text at a resolution of 1024 &#215; 576, outperforming previous open-source models in terms of quality. It also includes an image-to-video model designed to produce videos that strictly adhere to the content of the provided reference image. 
ModelScope can generate videos up to 25 seconds long at a reasonable quality.</p><p>Also this year, Stability AI released <a href="https://huggingface.co/stabilityai/stable-video-diffusion-img2vid">Stable Video Diffusion</a>, which turns images into short video clips, though its license permits research use only.</p><p>Lastly, just before year-end, Meta released <a href="https://emu-video.metademolab.com">Emu Video</a>, capable of efficiently generating high-quality, 512px, 4-second-long videos, conditioned on text prompts and initial generated images, outperforming other state-of-the-art text-to-video models like Runway and Pika in terms of video quality and faithfulness to prompts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rE-8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rE-8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png 424w, https://substackcdn.com/image/fetch/$s_!rE-8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png 848w, https://substackcdn.com/image/fetch/$s_!rE-8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rE-8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rE-8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png" width="630" height="292.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:676,&quot;width&quot;:1456,&quot;resizeWidth&quot;:630,&quot;bytes&quot;:141383,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rE-8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png 424w, https://substackcdn.com/image/fetch/$s_!rE-8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png 848w, https://substackcdn.com/image/fetch/$s_!rE-8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rE-8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b3315b5-b7f3-4f25-8b30-c7ac5cc1855c_1886x876.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Emu Video&#8217;s win rates against other generative video models</figcaption></figure></div><h2>Speech understanding</h2><p>Transcribing sound and audio has been predominantly human-driven up until recently.</p><p>OpenAI&#8217;s release of Whisper in <strong>2022</strong>, back then the SOTA open source transcription model, reignited the space 
and fostered a wave of speech-powered applications.</p><p>Two more <a href="https://github.com/openai/whisper">Whisper</a> versions followed in <strong>2023</strong>, supporting 58 languages with reduced hallucinations and a 10%-20% lower Word Error Rate (WER). Thanks to packages such as <a href="https://github.com/Vaibhavs10/insanely-fast-whisper">Insanely Fast Whisper</a>, inference latency has also been reduced to near real-time, enabling live conversational use cases.</p><p>The current commercial SOTA speech-to-text model is <a href="https://deepgram.com/learn/nova-2-speech-to-text-api">Nova-2</a> from Deepgram. It outperforms all alternatives in terms of accuracy, speed, and cost and achieves an 8.4% WER, 30% lower than its 2022 predecessor.</p><h2>Speech generation</h2><p>The space of generative voice AI didn&#8217;t see much progress until this year, with incumbents like Amazon and Google providing ok-ish solutions.</p><p>In <strong>2023</strong>, <a href="https://elevenlabs.io/">ElevenLabs</a> wowed the industry with its remarkably fast text-to-speech model, outperforming all other models on latency and quality. AI-generated voice is no longer distinguishable from human voice. As competition heats up in this space with contenders such as <a href="https://platform.openai.com/docs/guides/text-to-speech">OpenAI&#8217;s TTS</a>, we can expect better and more affordable models in 2024.</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;16df2d2f-ecbf-40f8-a2c5-042120bcafcc&quot;,&quot;duration&quot;:11.990204,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><p><em>Joe Biden&#8217;s voice cloned via ElevenLabs (<a href="https://www.theverge.com/2023/1/31/23579289/ai-voice-clone-deepfake-abuse-4chan-elevenlabs">source</a>)</em></p><p>The open-source community is not far behind. 
Meta open-sourced Seamless, a suite of AI translation models that can generate and translate speech in real time.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;15989cb9-a9ee-4f9a-beed-17304cf5bb13&quot;,&quot;duration&quot;:null}"></div><p>Microsoft also introduced <a href="https://www.microsoft.com/en-us/research/project/vall-e-x/">VALL-E X</a> (<a href="https://github.com/Plachtaa/VALL-E-X">open-source implementation</a>), capable of synthesizing high-quality speech with only a 3-second enrolled recording of an unseen speaker as input, and Coqui released <a href="https://coqui.ai/blog/tts/open_xtts">XTTS</a> - a voice generation model capable of cloning voices into 17 different languages by using a 6-second audio clip.</p><h2>Music generation</h2><p><strong>2022</strong> ended with the release of <a href="https://arstechnica.com/information-technology/2022/12/riffusions-ai-generates-music-from-text-using-visual-sonograms/">Riffusion</a>, an app that generates music from text using visual sonograms.</p><p>As in other generative AI verticals, <strong>2023</strong> was different. On the open-source front, Meta&#8217;s <a href="https://audiocraft.metademolab.com/musicgen.html">MusicGen</a> demonstrated remarkable performance, turning text and melodies into music. MusicGen&#8217;s release was complemented by <a href="https://github.com/open-mmlab/Amphion">Amphion</a> - an open-source toolkit for audio, music, and speech generation released a few weeks ago.</p><p>On the commercial side, <a href="https://www.suno.ai/">Suno AI</a> is the prominent industry leader, with <a href="https://stableaudio.com/">Stable Audio</a> and Google&#8217;s <a href="https://aitestkitchen.withgoogle.com/tools/music-fx">MusicFX</a> as runners-up. Suno generates original songs with lyrics and melody, all from text prompts and within a few minutes. 
To fully appreciate how far we have come, I strongly recommend experimenting with Suno through its <a href="https://app.suno.ai/">web application</a> or by using <a href="https://copilot.microsoft.com/">Microsoft Copilot</a>.</p><p>In eight months, Suno extended song duration into full-length songs and added lyrics generation capabilities. April &#8216;23 (top) compared to Dec &#8216;23 (bottom):</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;ee15531d-4c64-4166-bce7-eda69bbd5c05&quot;,&quot;duration&quot;:null}"></div><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;53cdf64b-bd9c-4028-90a2-b905685bf65b&quot;,&quot;duration&quot;:null}"></div><div><hr></div><h2>2023 Deep Dives</h2><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;5fbfa9a5-297c-45d6-b54c-986b9a38b2ac&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - a dedicated AI Tidbits section providing editorial takes and insights to make sense of the latest in AI.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Most popular and upcoming Generative AI tools and APIs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github 
repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-12-19T15:30:19.597Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52307a3c-6727-4ca5-a4da-208969e7b833_1944x1090.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/most-used-tools&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:139821359,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:3,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7e450059-aa64-4de7-b1ad-375e80e95227&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Over ten papers outlining novel prompting techniques were published in the last few months alone. 
While our X and LinkedIn feeds buzz with countless secret prompting tips &#8220;97% of ChatGPT users don&#8217;t know about&#8221;, a definitive, research-backed guide aggregating these advanced prompting strategies is hard to come by. This gap prevents LLM developers and everyday users from harnessing these novel frameworks to enhance performance and achieve more accurate results.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Harnessing research-backed prompting techniques for enhanced LLM performance&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-12-10T16:00:41.722Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ccf1c5f-bca1-40ef-be43-2a7ec84c2f40_2014x1132.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/advanced-prompting&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:139449913,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:32,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI 
Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1a313309-7ef8-4eb2-bcbb-00b800b21514&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go! Last February, Stanford published a paper that sparked everyone&#8217;s imagination. In this paper, the researchers leveraged ChatGPT to power human-like agents. A mini-simulation of humanity.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The rise of autonomous agents&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github 
repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-11-19T16:30:28.414Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56d15b18-239c-4403-839e-544d2e9dac77_600x378.gif&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/the-rise-of-autonomous-agents&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:138981811,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:3,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8e8465ee-47c7-439f-8484-2860eb255619&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;OpenAI DevDay - a pivotal moment for AI &quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar 
Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-11-07T15:30:25.921Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/openai-devday&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:138650315,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:27,&quot;comment_count&quot;:3,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;eefb8602-9f8f-467d-8b4a-93fb40f5e5b1&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go! In June 2020, OpenAI unveiled GPT-3. 
As a veteran in the document processing domain, I had long recognized the limitations of prevailing document extraction technologies, which largely relied on rigid, rule-based logic. I wondered if language models could be the answer to intelligent data extraction. And indeed, they were.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Revolutionizing document processing with multimodal GPT&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-10-30T14:30:30.962Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/doc-extraction-gpt4&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:138339915,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:14,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI 
Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;cf6b40c0-264b-4cb5-83c4-34cde130208a&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go! Three years ago, I started a company that turns PDF and image documents into structured data. My twist? Using language models. Two years later, I decided to refocus my energy elsewhere with the main reason being commoditization. The OCR and document intelligence market were a race to the bottom, and even though I could raise VC money - I realized it was a lost war.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The era of AI-powered SMBs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github 
repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-09-24T15:00:30.517Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fbf9e3ae-a4ee-4758-8510-834bad752d4e_480x360.gif&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/ai-powered-smbs&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:137343711,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c0aba9f5-5533-4902-b236-7e7885dfcbc2&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go! I was somewhat new to the Payments space when I joined Stripe. I remember being dazzled by the sheer amount of complexities taking place when one hits the &#8220;Book a Ride&#8221; button to catch an Uber. 
It was at Stripe where I got to learn about a new concept - Multiprocessor.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The Multiprocessor of Language Models&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-08-20T15:30:09.330Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0df50929-00cd-4dd8-8768-9ca090ebe0bd_200x250.gif&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/the-multiprocessor-of-language-models&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:136039501,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:14,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;15de8b18-d124-4f68-b46c-8dcf300acc36&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - an AI Tidbits section providing editorial 
takes and insights to make sense of the latest in AI. Let&#8217;s go! Three months ago, after years without any substantial updates, Google refactored its money-making machine - Google Search, infusing it with generative AI and launched&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The future of Internet Search in the era of LLMs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-08-13T15:31:31.847Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e8f6f3-55fa-4805-8e90-301404560ddb_730x440.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/future-of-internet-search&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:135923979,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:5,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI 
Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;ddb3257f-564d-469f-864d-a12e71979f5c&quot;,&quot;caption&quot;:&quot;Welcome to Deep Dives - a new section of AI Tidbits providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go! &#8220;What do you mean there is an open-source library for that? We built the entire thing ourselves&#8221; is a quote I often hear from builders in the LLM space. I&#8217;ve been building with LLMs for the last year and turned my personal list of >60 useful packages into a public table so others won&#8217;t experience the same frustration.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Open-source Generative AI&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:3770805,&quot;name&quot;:&quot;Sahar Mor&quot;,&quot;bio&quot;:&quot;Bringing the latest in AI to the mass through writings and Github 
repos&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa06b2072-0444-44f7-8106-7892097e4128_1690x1762.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2023-08-06T16:30:15.749Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/885bba4a-9f47-4763-82f1-b7b9196ed69d_1664x958.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.aitidbits.ai/p/open-source-llms&quot;,&quot;section_name&quot;:&quot;Deep Dives&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:135729768,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:17,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;AI Tidbits&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F71d6ea06-1f4c-478d-b0f2-6227eede6b25_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p><em>If you find AI Tidbits valuable, share it with a friend and consider showing your support.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.aitidbits.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.aitidbits.ai/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[OpenAI DevDay - a pivotal moment for AI ]]></title><description><![CDATA[Making sense and sharing insights from OpenAI's 
announcements including a faster and cheaper GPT-4 model, a new text-to-speech API, an App Store for GPT agents, and more.]]></description><link>https://www.aitidbits.ai/p/openai-devday</link><guid isPermaLink="false">https://www.aitidbits.ai/p/openai-devday</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Tue, 07 Nov 2023 15:30:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6VQ_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Welcome to Deep Dives <strong>- </strong>an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.aitidbits.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.aitidbits.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>OpenAI's latest announcements have sent shockwaves through the tech world, signaling a new era for artificial intelligence. 
The company's ground-breaking announcements and bold moves would eliminate thousands of companies, from GPT-wrappers to deep tech ones, and pose a substantial threat to big tech incumbents.</p><p>These developments aren't just about shaking up the industry&#8212;they create a goldmine of possibilities for innovators and businesses in AI, turning previously unfeasible ideas into profitable ventures.</p><p>In this post, I'll unpack each of these pivotal announcements and outline the impact they have on those at the cutting edge of AI.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6VQ_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6VQ_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png 424w, https://substackcdn.com/image/fetch/$s_!6VQ_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png 848w, https://substackcdn.com/image/fetch/$s_!6VQ_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png 1272w, https://substackcdn.com/image/fetch/$s_!6VQ_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!6VQ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png" width="1456" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1855077,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6VQ_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png 424w, https://substackcdn.com/image/fetch/$s_!6VQ_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png 848w, https://substackcdn.com/image/fetch/$s_!6VQ_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png 1272w, https://substackcdn.com/image/fetch/$s_!6VQ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98fdd507-a5dc-4517-a165-87cab224ee7c_2300x1286.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Announcements covered:</p><ul><li><p>GPT-4 Turbo with a 128k context length</p></li><li><p>Substantial price reductions</p></li><li><p>New text-to-speech (TTS) model and API</p></li><li><p>Whisper v3</p></li><li><p>Assistants API and Retrieval </p></li><li><p>GPTs and the GPT Store, &#224; la OpenAI&#8217;s App Store</p></li><li><p>Other announcements</p></li></ul><h2><strong>A new model: GPT-4 Turbo having a 128k context length</strong></h2><p>GPT-4 Turbo is a cheaper and faster version of GPT-4. It has an updated knowledge cutoff of April 2023, with OpenAI stating they will keep it up-to-date. 
It has a 128k-token context window, so it can fit the equivalent of more than 300 pages of text in a single prompt.</p><h3><strong>Availability</strong></h3><p>Available for all paying developers</p><h3><strong>Why it matters?</strong></h3><ul><li><p><strong>Cheaper</strong> - having GPT-4 level capabilities at a 2.75x lower cost enables multiple applications that up until now didn&#8217;t make sense from a margins perspective.</p></li><li><p><strong>Faster</strong> - latency is a key consideration for LLM builders, especially for those building user-facing apps. A faster yet capable model unlocks applications for which latency is a core part of the user experience.</p></li><li><p><strong>Longer context window</strong> - context extends the language model&#8217;s knowledge, e.g. by augmenting it with your company&#8217;s Slack data. Up until today, the GPT-4 context window was limited to 32k tokens. A 4x longer context window means more data can fit in one prompt, reducing the need for retrieval-augmented generation and, by grounding GPT&#8217;s response in your data, the frequency of hallucinations.</p></li></ul><h3>Examples of affected companies</h3><p>LLM API companies such as AWS Bedrock, Google PaLM 2, Anthropic (Claude Instant), Hugging Face, and AI21 Labs</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;90257cef-b0d5-4783-906b-e90ca29eac32&quot;,&quot;duration&quot;:null}"></div><div><hr></div><h2><strong>Substantial price reductions</strong></h2><p>The new GPT-4 Turbo API will be 2.75x cheaper than GPT-4. 
The same applies to the GPT-3.5 Turbo 16k model.</p><p>Developers using the 4k context version of GPT-3.5 would also benefit from a 33% reduction.</p><p>Fine-tuned GPT-3.5 Turbo 4K model input tokens are reduced by 3x as well.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2l3d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2l3d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png 424w, https://substackcdn.com/image/fetch/$s_!2l3d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png 848w, https://substackcdn.com/image/fetch/$s_!2l3d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png 1272w, https://substackcdn.com/image/fetch/$s_!2l3d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2l3d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png" width="1456" height="969" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:969,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184864,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!2l3d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png 424w, https://substackcdn.com/image/fetch/$s_!2l3d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png 848w, https://substackcdn.com/image/fetch/$s_!2l3d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png 1272w, https://substackcdn.com/image/fetch/$s_!2l3d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d84e7fd-fa8b-4471-ad23-c3804891c2ad_1644x1094.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Updated pricing across base and fine-tuned models</figcaption></figure></div><h3><strong>Availability</strong></h3><p>Available for all paying developers</p><h3><strong>Why it matters?</strong></h3><p>GPT-powered app builders constantly monitor the cost of serving their apps. Cheaper generations translate to better margins, and better margins enable more innovation.</p><p>Substantially cheaper fine-tuned GPT models reduce the strain on long context windows and RAG applications.</p><h3>Examples of affected companies</h3><p>Anthropic, AWS Bedrock, Hugging Face, Google PaLM 2, AI21 Labs</p><div><hr></div><h2><strong>New text-to-speech (TTS) model and API</strong></h2><p>Developers can now generate human-quality speech from text via a <a href="https://platform.openai.com/docs/guides/text-to-speech">text-to-speech API</a>. 
The current TTS model offers six preset voices to choose from and two model variants, tts-1 and tts-1-hd.</p><p>tts-1 is optimized for real-time use cases and tts-1-hd is optimized for quality, i.e. more human-like speech at the cost of higher latency.</p><p>OpenAI&#8217;s TTS supports real-time audio streaming and pricing starts at $0.015 per 1,000 input characters. For comparison, ElevenLabs, which is considered to be the current best TTS service, starts at $0.165 per 1,000 characters. That&#8217;s &gt;10x the cost.</p><p>OpenAI's preset voice Nova in action:</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;ed26572a-ce95-4bc8-8a39-6d9ce2982029&quot;,&quot;duration&quot;:13.165714,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><h3><strong>Availability</strong></h3><p>Available for all developers</p><h3><strong>Why it matters?</strong></h3><p>Generating voice was the missing modality in OpenAI&#8217;s ecosystem. An OpenAI-grade TTS engine that is cheaper and faster, thanks to OpenAI&#8217;s economies of scale, will enable more business use cases and increase competition in the space.</p><p>Combined with OpenAI&#8217;s new GPTs vision (see below), a future of OpenAI-powered voice assistants is imminent.</p><h3>Examples of affected companies</h3><p>ElevenLabs, PlayHT, Coqui, Resemble AI, and cloud providers (Amazon Polly, Google Text-to-Speech, Azure TTS)</p><div><hr></div><h2><strong>Whisper v3</strong></h2><p>Whisper is OpenAI&#8217;s cutting-edge Automatic Speech Recognition (ASR) model. 
Whisper&#8217;s open-source release in September 2022 had a profound impact on the speech-to-text industry and has enabled many speech-powered applications since.</p><p>Whisper large-v3 is OpenAI&#8217;s next-generation ASR model, which features improved performance across languages.</p><h3><strong>Availability</strong></h3><p>Immediately via the <a href="https://github.com/openai/whisper">Whisper package on GitHub</a>. API access will arrive in the &#8220;near future&#8221;.</p><h3><strong>Why it matters?</strong></h3><p>Natural language is how humans interact with one another. Incorporating Whisper v3 into the OpenAI API ecosystem will make this technology more accessible to developers, enabling them to integrate sophisticated speech-to-text features into their applications. This move would also drive the commoditization of the currently expensive speech-to-text market, allowing for a broader spectrum of uses and users.</p><h3><strong>Examples of affected companies</strong></h3><p>Deepgram, Azure Speech to Text, Google Speech-to-Text, Amazon Transcribe</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hcz_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hcz_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png 424w, https://substackcdn.com/image/fetch/$s_!hcz_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png 848w, 
https://substackcdn.com/image/fetch/$s_!hcz_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png 1272w, https://substackcdn.com/image/fetch/$s_!hcz_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hcz_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png" width="618" height="886.2097649186256" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1586,&quot;width&quot;:1106,&quot;resizeWidth&quot;:618,&quot;bytes&quot;:237801,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!hcz_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png 424w, https://substackcdn.com/image/fetch/$s_!hcz_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png 848w, 
https://substackcdn.com/image/fetch/$s_!hcz_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png 1272w, https://substackcdn.com/image/fetch/$s_!hcz_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee592dc-3fd6-42a7-bff5-3f54aac8174d_1106x1586.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Comparing Whisper v2 and v3 across languages. 
A smaller value means better performance (WER stands for Word Error Rate)</figcaption></figure></div><div><hr></div><h2><strong>Assistants API and Retrieval</strong></h2><p>Using the <a href="https://platform.openai.com/docs/assistants/overview">Assistants API</a>, developers can create agent-like AI within their applications, equipped with built-in tools like Code Interpreter, Retrieval, and function calling for efficient task execution. No more fancy Retrieval Augmented Generation (RAG) pipelines. Users just need to upload files to extend GPT&#8217;s knowledge. This alone eliminates GPT wrappers like ChatPDF and the need for smaller LangChain apps for conversing over your data.</p><p>The Assistants API also features persistent threads, which let developers manage long-running conversations and complex tasks without hitting the limits of a short-term memory context, enabling more coherent and contextually aware interactions over time.</p><p>Assistants can run Python code, manage diverse data, create visual content, and tap into external knowledge sources, eliminating the need for developers to embed or search through large datasets. Developers can also define and call custom functions through the API, e.g. calculating shipping costs based on the weight, dimensions, and destination provided by the customer in a chat message.</p><h3><strong>Availability</strong></h3><p>Available for all developers under a beta program</p><h3><strong>Why it matters?</strong></h3><p>The Assistants API and its interface form a no-code builder for intelligent AI agents. Non-engineers can build small pieces of software powered by OpenAI&#8217;s models and then share them with others. They can even monetize those agents through OpenAI&#8217;s new GPT Store (see below).</p><p>Users can also expand their assistant&#8217;s knowledge by uploading documents such as PDFs, Excel files, etc., removing the need for fancy RAG frameworks or customized LangChain scripts. 
OpenAI takes care of all of this!</p><h3>Examples of affected companies</h3><p>AutoGPT, Character.AI, ChatPDF, LangChain, Adept, Hugging Face</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eRIe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eRIe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif 424w, https://substackcdn.com/image/fetch/$s_!eRIe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif 848w, https://substackcdn.com/image/fetch/$s_!eRIe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif 1272w, https://substackcdn.com/image/fetch/$s_!eRIe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eRIe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif" width="600" height="600" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2935217,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eRIe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif 424w, https://substackcdn.com/image/fetch/$s_!eRIe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif 848w, https://substackcdn.com/image/fetch/$s_!eRIe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif 1272w, https://substackcdn.com/image/fetch/$s_!eRIe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8044c76-b6b7-4417-bcf8-f9298126cd4b_600x600.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Assistants Playground</figcaption></figure></div><div><hr></div><h2><strong>GPTs and the GPT Store, &#224; la OpenAI&#8217;s App Store</strong></h2><p>Users can now create customized ChatGPT versions, dubbed GPTs, enabling users to craft personalized AI for specific uses like learning, work, or leisure, and to share these with others.</p><p>GPTs offer task-oriented assistance, such as explaining board game rules, teaching math, or designing graphics. They require zero coding skills and come with the capability to perform web searches, create images, and analyze data.&nbsp;</p><p>GPTs would also have access to custom actions, connecting them to external APIs and enabling real-world interactions. This functionality can transform GPTs into versatile tools capable of interacting with databases, managing emails, or assisting with shopping. 
Building on the Plugins beta experience, the update gives developers more control and simplifies the transition for those with existing plugins, allowing them to seamlessly integrate these capabilities into their GPTs.</p><p>Lastly, OpenAI enables users to create and share custom GPTs, with a monetizable GPT Store launching soon.</p><h3><strong>Availability</strong></h3><p>Immediately for ChatGPT Plus and Enterprise users</p><h3><strong>Why it matters?</strong></h3><p>Customizable GPTs mark a pivotal shift towards more personalized and specific AI utility. Those without coding expertise will be able to craft AI tools for a range of tasks, thereby broadening the technology's accessibility and application. </p><p>For <strong>developers</strong>, it facilitates the integration of AI with other services, fostering more dynamic and practical uses in real-world scenarios.</p><p>For <strong>enterprises</strong>, custom GPTs offer a new frontier in customization, enabling the creation of AI tailored to specific corporate needs and proprietary data. Companies can now streamline operations, creating AI solutions for internal tasks like marketing, customer support, and employee onboarding while ensuring data privacy. 
Such integration makes many startups that raised lofty funding rounds to monetize generative AI for the enterprise almost obsolete.</p><p>For <strong>consumers</strong>, having specialized GPTs erodes the unique selling proposition of companies such as Character AI, which just last month saw almost 5M monthly active users.</p><h3>Examples of affected companies</h3><p>AutoGPT, Adept, Character AI, LangChain, Contextual AI, Hugging Face</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dul3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dul3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dul3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dul3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dul3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!dul3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg" width="1199" height="670" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1199,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!dul3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dul3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dul3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dul3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352de6b1-39e8-465d-bdac-d400e457f083_1199x670.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">A host of agents</figcaption></figure></div><div><hr></div><h2><strong>Other notable announcements</strong></h2><ul><li><p><a href="https://platform.openai.com/docs/guides/images?context=node">DALL-E 3 API access</a>, with pricing starting at $0.04 per image</p></li><li><p><a href="https://openai.com/form/custom-models">Custom models</a> - a new program from OpenAI that offers selected organizations the chance to collaborate with researchers to create bespoke GPT-4 models that are highly specialized to their domains, ensuring exclusive access and privacy for their proprietary data</p></li><li><p>GPT-4 fine-tuning</p></li><li><p><a href="https://platform.openai.com/docs/guides/function-calling">Function calling</a> is now more 
efficient and accurate, with updates that enable the calling of multiple functions in a single message and enhancements that increase the likelihood of returning the correct function parameters.</p></li><li><p>Higher rate limits - doubling the current tokens-per-minute limit and allowing users to request rate limit increases.</p></li><li><p><a href="https://platform.openai.com/docs/guides/text-generation/json-mode">JSON mode</a> - ensuring GPT returns valid JSON outputs</p></li><li><p>A <a href="https://platform.openai.com/docs/guides/text-generation/reproducible-outputs">new seed parameter</a> for consistent completions, facilitating debugging, comprehensive unit testing, and enhanced control over model behavior.</p></li><li><p>Copyright Shield - OpenAI will step in to defend its users and pay the costs incurred from legal claims around copyright infringement.</p></li></ul><h2>A new era</h2><p>It is clear that the horizon for AI is not just broadening&#8212;it's being redefined. </p><p>With the advent of more efficient, cost-effective, and powerful models like GPT-4 Turbo, new text-to-speech capabilities, and an innovative GPT Store, OpenAI is empowering creators and businesses with tools that were once out of reach.</p><p>The implications are profound: barriers to entry are crumbling, enabling a democratization of technology that accelerates innovation at a breakneck pace. The sheer volume of possibilities for personalization, integration, and expansion in AI applications is staggering.</p><p>As developers, entrepreneurs, and technologists harness these breakthroughs, the next wave of AI utility and business models is upon us. 
To all those at the forefront of this change: the future is not just knocking, it has already stepped through the door.</p>]]></content:encoded></item><item><title><![CDATA[Revolutionizing document processing with multimodal GPT]]></title><description><![CDATA[A set of experiments with the new GPT-4V demonstrating its potential in making a whole industry obsolete]]></description><link>https://www.aitidbits.ai/p/doc-extraction-gpt4</link><guid isPermaLink="false">https://www.aitidbits.ai/p/doc-extraction-gpt4</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Mon, 30 Oct 2023 14:30:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pUGl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Welcome to Deep Dives <strong>- </strong>an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go!</em></p><div><hr></div><p>In June 2020, OpenAI unveiled GPT-3. 
As a veteran in the document processing domain, I had long recognized the limitations of prevailing document extraction technologies, which largely relied on rigid, rule-based logic. I wondered if language models could be the answer to intelligent data extraction. And indeed, they were.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pUGl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pUGl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif 424w, https://substackcdn.com/image/fetch/$s_!pUGl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif 848w, https://substackcdn.com/image/fetch/$s_!pUGl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif 1272w, https://substackcdn.com/image/fetch/$s_!pUGl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pUGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif" width="434" height="559.86" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1032,&quot;width&quot;:800,&quot;resizeWidth&quot;:434,&quot;bytes&quot;:2147827,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pUGl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif 424w, https://substackcdn.com/image/fetch/$s_!pUGl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif 848w, https://substackcdn.com/image/fetch/$s_!pUGl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif 1272w, https://substackcdn.com/image/fetch/$s_!pUGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a4c326a-53e0-492d-b375-9c69899b8fcd_800x1032.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">GPT-powered document intelligence with AirPaper. <a href="https://twitter.com/theaievangelist/status/1300862719969681411">Source</a></figcaption></figure></div><p>What started as a side project turned into a venture called <a href="https://airpaper.ai/">AirPaper</a>. Back then, GPT-3 was the cutting-edge language model, and it was only one API call away. 
The main challenges were that GPT-3 was expensive, 55x the price of today&#8217;s GPT-3.5 Turbo, and had a tiny context window of 2,048 tokens, compared to today&#8217;s 32k.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6dxZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6dxZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png 424w, https://substackcdn.com/image/fetch/$s_!6dxZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png 848w, https://substackcdn.com/image/fetch/$s_!6dxZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png 1272w, https://substackcdn.com/image/fetch/$s_!6dxZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6dxZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png" width="1260" height="542" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:542,&quot;width&quot;:1260,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!6dxZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png 424w, https://substackcdn.com/image/fetch/$s_!6dxZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png 848w, https://substackcdn.com/image/fetch/$s_!6dxZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png 1272w, https://substackcdn.com/image/fetch/$s_!6dxZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa67465-c1f3-499c-b200-84293c0347c6_1260x542.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">OpenAI&#8217;s original pricing, June 2020</figcaption></figure></div><p>Another challenge was that language models, even if performant, only play along with text. This necessitated an extensive preprocessing phase to prepare documents for GPT: extracting text, &#224; la OCR, structuring it in a way that would fit GPT&#8217;s limited context window, and intelligently mapping GPT's output to the relevant fields, such as invoice numbers or sales tax amounts on an invoice.</p><p>That was in 2020.</p><h2>Enter Multimodal AI</h2><p>The space of document intelligence has undergone massive shifts in recent years, with the underlying technology getting gradually commoditized. 
More and more state-of-the-art libraries were released, most of them with a commercially permissive license:</p><ul><li><p><a href="https://github.com/clovaai/donut">Donut &#127849;</a></p></li><li><p><a href="https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/README_en.md">PaddleOCR</a></p></li><li><p><a href="https://huggingface.co/microsoft/layoutlmv3-large">layoutlm-document-qa</a></p></li><li><p><a href="https://github.com/deepdoctection/deepdoctection">Deepdoctection</a></p></li><li><p>And the impressive <a href="https://huggingface.co/microsoft/layoutlmv3-large">LayoutLMv3</a>, which is the only one not allowing commercial use</p></li></ul><p>But then powerful multimodal AI arrived in the form of <a href="https://llava.hliu.cc/">LLaVA</a> and GPT-4V, easily accessible to anyone with ChatGPT access.</p><p>I again wondered: can GPT-4V turn mere images of documents into structured data? The results were mind-blowing.</p><p>Let's dive deeper into some of the use cases I've explored and their results.</p>
      <p>
          <a href="https://www.aitidbits.ai/p/doc-extraction-gpt4">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The era of AI-powered SMBs]]></title><description><![CDATA[Who needs VCs&#8217; money anyway?]]></description><link>https://www.aitidbits.ai/p/ai-powered-smbs</link><guid isPermaLink="false">https://www.aitidbits.ai/p/ai-powered-smbs</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Sun, 24 Sep 2023 15:00:30 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fbf9e3ae-a4ee-4758-8510-834bad752d4e_480x360.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Welcome to Deep Dives <strong>- </strong>an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go!</em></p><div><hr></div><p>Three years ago, I started a company that turns PDF and image documents into structured data. My twist? Using language models. Two years later, I decided to refocus my energy elsewhere, mainly because of commoditization. The OCR and document intelligence market was a race to the bottom, and even though I could raise VC money, I realized it was a losing battle.</p><p>Since then, the same commoditization conclusion has spread to many more applications and fields. We are in an era in which AI can finally construct plausible essays. Actually, not only that - it can <a href="https://arxiv.org/abs/2309.07430?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">outperform doctors</a> in summarizing clinical text, write poems, and act as your <a href="https://www.vice.com/en/article/z3mnve/we-spoke-to-people-who-started-using-chatgpt-as-their-therapist">therapist</a>. It can turn text into songs, images, and videos in mere seconds and at a fraction of the cost of what used to be cutting-edge technology just a few years ago.</p><p>Those with great aspirations would conclude such breakthroughs prompt starting a venture. Their mission? Democratize [enter profession here] and make it more accessible to all. The route? 
Raising money from venture capital funds.</p><p>That makes sense. A whole generation from 2008-2023, myself included, grew up in an environment that worships growth at all costs. The caliber of talent you can access will depend on the VCs on your About page. An era where VCs ask: &#8220;<em>Well, what if Google builds the same thing?</em>&#8221;, followed by a room-wide laugh, acknowledging the fact that incumbents don&#8217;t build, they buy.</p><p>Things have changed. Incumbents are no longer sleeping giants. The latest technology makes building new products too easy for companies to spend time integrating with that new YC startup, and disruptive progress is outpacing distribution, preventing companies from building their distribution moat before getting commoditized (see <a href="https://www.theinformation.com/articles/jasper-an-early-generative-ai-winner-cuts-internal-valuation-as-growth-slows">Jasper</a>).</p><p>The tricky part? We are far from reaching a standstill. With the releases of multimodal GPT and Google&#8217;s Gemini around the corner, things are only going to become less predictable. 
The era of VC-backed ventures as the default is over, and the faster we adjust, the better.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t4dE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t4dE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png 424w, https://substackcdn.com/image/fetch/$s_!t4dE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png 848w, https://substackcdn.com/image/fetch/$s_!t4dE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png 1272w, https://substackcdn.com/image/fetch/$s_!t4dE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t4dE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png" width="658" height="367.41346153846155" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:658,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t4dE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png 424w, https://substackcdn.com/image/fetch/$s_!t4dE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png 848w, https://substackcdn.com/image/fetch/$s_!t4dE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png 1272w, https://substackcdn.com/image/fetch/$s_!t4dE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9209ed9-4cba-4db1-8b0e-de2911568245_1600x893.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Multimodal GPT announced soon?</figcaption></figure></div><p><em>P.S. It is not all gloomy. I also explore a few areas where I still see opportunities for VC-backed startups to operate in the outro section below.</em></p><h1>New Truths</h1><h2>#1 The energized giants</h2>
      <p>
          <a href="https://www.aitidbits.ai/p/ai-powered-smbs">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The Multiprocessor of Language Models]]></title><description><![CDATA[How having a centralized LLM API can leapfrog companies ahead of the competition]]></description><link>https://www.aitidbits.ai/p/the-multiprocessor-of-language-models</link><guid isPermaLink="false">https://www.aitidbits.ai/p/the-multiprocessor-of-language-models</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Sun, 20 Aug 2023 15:30:09 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0df50929-00cd-4dd8-8768-9ca090ebe0bd_200x250.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Welcome to Deep Dives <strong>- </strong>an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go!</em></p><div><hr></div><p>I was somewhat new to the Payments space when I joined Stripe. I remember being dazzled by the sheer amount of complexity behind hitting the &#8220;Book a Ride&#8221; button to catch an Uber. It was at Stripe that I learned about a new concept - Multiprocessor.&nbsp;</p><p>Imagine you are the Head of Payments at Amazon or Uber&#8212;you operate across dozens of countries, serve multiple types of customers, and employ different pricing structures, such as subscription-based and one-time payments. How can you maximize your approval rate, i.e., ensure every legitimate payment is approved? How do you avoid losing millions of dollars for every minute your partnering payment processor is down?</p><p>Enter <a href="https://www.pymnts.com/news/retail/2019/merchants-multiple-payment-processing-modo/">Multiprocessor</a>.</p><p>The likes of Amazon partner with more than one payment processor (Stripe is one). That way, they ensure redundancy by falling back to another processor if their main one is down. 
They also achieve maximal approval rates by routing payments to regional processors based on their geography, e.g., using a European processor for a Norway-based cardholder. Beyond better payment performance, they get to leverage this multiprocessor setup to drive down partners&#8217; costs and push them to deliver better results. If a processor underdelivers, the merchant can shift some of its payment traffic elsewhere.</p><p>Now let&#8217;s talk about language models. Today, there are five major proprietary LLM providers: OpenAI (GPT), Google (PaLM), Anthropic (Claude), Inflection (Pi), and Cohere. There are dozens more open-source language models, small and large, with the <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">best-performing ones</a> including Meta&#8217;s Llama 2, Stability AI&#8217;s StableBeluga2, and the recently introduced <a href="https://platypus-llm.github.io/">Platypus</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TGV3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TGV3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png 424w, https://substackcdn.com/image/fetch/$s_!TGV3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png 848w, https://substackcdn.com/image/fetch/$s_!TGV3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png 1272w, 
https://substackcdn.com/image/fetch/$s_!TGV3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TGV3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png" width="720" height="204.72527472527472" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:414,&quot;width&quot;:1456,&quot;resizeWidth&quot;:720,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TGV3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png 424w, https://substackcdn.com/image/fetch/$s_!TGV3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png 848w, https://substackcdn.com/image/fetch/$s_!TGV3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png 1272w, 
https://substackcdn.com/image/fetch/$s_!TGV3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2136d3c-7dff-430a-b18f-d2bfbefd0414_1600x455.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">A snapshot of today&#8217;s open-source leaderboard, Aug 20th, 2023</figcaption></figure></div><p>Each of these models has its own traits and quirks: some are faster, cheaper, or easier to experiment with, while others fit on consumer hardware or are better suited for specific language tasks. Numerous guides teach how to choose the right LLM, with everyone stating the obvious - start with OpenAI and go from there. The truth is that everyone defaults to GPT-4 anyway. &#8220;We will take care of costs and latency later&#8221; is usually what I hear.</p><p>There is merit to it, but why choose just one?</p><h2>A unified LLM API layer, &#224; la LLM Multiprocessor</h2>
      <p>
          <a href="https://www.aitidbits.ai/p/the-multiprocessor-of-language-models">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The future of Internet Search in the era of LLMs]]></title><description><![CDATA[How OpenAI's GPTBot and Google's new generative-AI-infused Search will redefine how we seek information and shop online]]></description><link>https://www.aitidbits.ai/p/future-of-internet-search</link><guid isPermaLink="false">https://www.aitidbits.ai/p/future-of-internet-search</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Sun, 13 Aug 2023 15:31:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29e8f6f3-55fa-4805-8e90-301404560ddb_730x440.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Welcome to Deep Dives <strong>- </strong>an AI Tidbits section providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go!</em></p>
      <p>
          <a href="https://www.aitidbits.ai/p/future-of-internet-search">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Open-source Generative AI]]></title><description><![CDATA[A comprehensive list of >60 repositories every AI builder should know.]]></description><link>https://www.aitidbits.ai/p/open-source-llms</link><guid isPermaLink="false">https://www.aitidbits.ai/p/open-source-llms</guid><dc:creator><![CDATA[Sahar Mor]]></dc:creator><pubDate>Sun, 06 Aug 2023 16:30:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/885bba4a-9f47-4763-82f1-b7b9196ed69d_1664x958.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Welcome to Deep Dives <strong>- </strong>a new section<strong>&nbsp;</strong>of AI Tidbits providing editorial takes and insights to make sense of the latest in AI. Let&#8217;s go!</em></p><div><hr></div><p><em>&#8220;What do you mean there is an open-source library for that? We built the entire thing ourselves&#8221;</em> is a quote I often hear from builders in the LLM space. I&#8217;ve been building with LLMs for the last year and turned my personal list of &gt;60 useful packages into a public table so others won&#8217;t experience the same frustration.<br><br>Each of these packages will save you hours and enable use cases you wanted to build but lacked the technical chops for.</p><p><strong>Highlighted packages (link to full list below)</strong></p><ol><li><p><a href="https://github.com/jerryjliu/llama_index?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">LlamaIndex</a> - a framework to augment LLM applications with private data</p></li><li><p><a href="https://github.com/AntonOsika/gpt-engineer?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">GPT Engineer</a> - let AI code full apps for you</p></li><li><p><a href="https://github.com/mukulpatnaik/researchgpt?utm_source=aitidbits.ai&amp;utm_medium=newsletter">ResearchGPT</a> - an LLM-based research assistant that allows you to have a conversation with a research 
paper</p></li><li><p><a href="https://github.com/context-labs/autodoc?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Autodoc</a> - a toolkit for auto-generating codebase documentation using LLMs</p></li><li><p><a href="https://github.com/mlc-ai/web-llm?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">web-llm</a> - run LLMs in web browsers without the need for a backend server</p></li><li><p><a href="https://github.com/oobabooga/text-generation-webui?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Text generation web UI</a> - web UI for running LLMs like Llama 2, llama.cpp, and Vicuna</p></li><li><p><a href="https://github.com/Mintplex-Labs/anything-llm?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">AnythingLLM</a> - turn any document into an intelligent chatbot</p></li><li><p><a href="https://github.com/vocodedev/vocode-python?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Vocode</a> - build voice-based LLM agents</p></li><li><p><a href="https://github.com/young-geng/EasyLM?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">EasyLM</a> - a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax.</p></li><li><p><a href="https://github.com/getmetal/motorhead?utm_source=aitidbits.substack.com&amp;utm_medium=newsletter">Motorhead</a> - a memory and information retrieval server for LLMs</p></li></ol><p><br>Full table with &gt;60 packages (I keep it up to date):</p>
      <p>
          <a href="https://www.aitidbits.ai/p/open-source-llms">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>