The Rise of Cloud Coding Agents
What it’s actually like to work with today’s leading agents such as Devin, Codex, and Cursor
Welcome to another post in the AI Coding Series, where I'll share the strategies and insights I've developed for effective AI-assisted coding.
In this post, I break down the shift from desktop to cloud-based coding agents, exploring what makes them different, how they fit into real-world development workflows, and where each leading tool stands today. Whether you’re exploring Devin, Codex, Jules, Factory, or Cursor Background Agents, this guide will help you understand how they work, their strengths and trade-offs, and how to get the most out of them.
A NotebookLM-powered video podcast summarizing this post
Agent-assisted coding is evolving quickly. Tools like Cursor, Windsurf, and Claude Code are already part of many developers’ workflows. These desktop agents run locally and rely on continuous back-and-forth: Developer drafts a coding task prompt → Coding agent generates code → Developer asks for changes/fixes → Coding agent implements the change → Developer commits the local changes as part of a pull request.
This pair-programming style boosts productivity, but it doesn’t scale. The interaction is synchronous: you must constantly steer the agent from the initial coding task prompt to creating a pull request. Running multiple coding agents in parallel feels like managing multiple junior developers simultaneously.
Real-world engineering teams work differently.
Enter cloud agents, asynchronous coding agents that better resemble a dev on your team: You assign a task → Cloud agent spins up its own environment in the cloud (as if it had its own laptop) → Cloud agent makes changes → Cloud agent opens a pull request for you to review.
You can request changes and merge once the code meets your standards. Some agents even integrate with Slack and other collaboration tools such as Linear and GitHub, further streamlining development and CI/CD cycles.
In 2025, the line between desktop and cloud agents is blurring. Cognition, the creator of Devin, the web-managed cloud agent, acquired Windsurf, a Cursor-like IDE that acts as a desktop agent. Cursor, on the other hand, now offers background agents that run asynchronously both locally and on the web. Factory AI (cloud agent) offers a downloadable bridge that enables asynchronous workflows in local environments. Google’s Jules (cloud agent) just graduated out of beta to complement the Gemini CLI (desktop agent), mirroring OpenAI’s Codex (web) and Codex CLI (desktop) approach.
The path for coding alongside AI is set: as models and tooling improve and best practices solidify, coding agents are shifting to asynchronous-first workflows. To clarify, autonomy isn’t a “web” feature; it’s an agent capability. It just so happens that, today, most fully autonomous agents are delivered as web-based tools.
In this post, I’ll walk you through what it’s like to work with each of the leading cloud agents, including a screen recording of my workflow so you can see how the interfaces look and behave in practice. Whether you’re curious about what these agents can actually do or trying to figure out which one fits best into your development workflow, this guide is for you. I’ve also included a comparison table at the end that makes clear which tools truly stand out.
Evaluation framework
Each agent was evaluated across four criteria:
Overall experience - onboarding flow, coding UX, working process smoothness (planning → execution → testing), and pull request clarity.
Team integration - how well the agent fits into real workflows: taking tasks, opening solid pull requests, addressing feedback, and communicating through platforms like Slack.
Autonomy - the level of independence from assignment to pull request: does the agent require step-by-step guidance and close supervision, or can it deliver end-to-end?
Cost - pricing model and the actual cost of completing the benchmark task.
To evaluate the agents, I gave each one the same benchmark assignment: add recurring task support to a lightweight to-do app repository:
Add support for recurring tasks. Users should be able to pick from daily, weekly, or monthly recurrence options when creating or editing a task. When a recurring task is marked complete, create the next occurrence immediately with the due date shifted by the chosen interval. Keep changes simple.
I deliberately chose a straightforward task that all agents completed successfully. The goal of this post is not to benchmark their performance, but to evaluate the experience of working with them. In future posts, I plan to run more complex evaluations comparing these agents on challenging, real-world tasks.
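To make the scope concrete, here is a minimal sketch of the kind of recurrence logic the assignment calls for. The Task model, field names, and helper functions below are hypothetical; they illustrate the expected behavior rather than the benchmark repository's actual code.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional
import calendar


@dataclass
class Task:
    title: str
    due: date
    recurrence: Optional[str] = None  # "daily", "weekly", "monthly", or None
    done: bool = False


def next_due(due: date, recurrence: str) -> date:
    """Shift a due date forward by the chosen recurrence interval."""
    if recurrence == "daily":
        return due + timedelta(days=1)
    if recurrence == "weekly":
        return due + timedelta(weeks=1)
    if recurrence == "monthly":
        # Advance one calendar month, clamping the day (e.g., Jan 31 -> Feb 28).
        year, month = (due.year + 1, 1) if due.month == 12 else (due.year, due.month + 1)
        return date(year, month, min(due.day, calendar.monthrange(year, month)[1]))
    raise ValueError(f"unknown recurrence: {recurrence}")


def complete(task: Task) -> Optional[Task]:
    """Mark a task done; for recurring tasks, return the next occurrence immediately."""
    task.done = True
    if task.recurrence is None:
        return None
    return replace(task, due=next_due(task.due, task.recurrence), done=False)
```

The only mildly tricky part is the month arithmetic (clamping Jan 31 to Feb 28); everything else is simple by design, which is exactly why the task works well for evaluating the working experience rather than raw capability.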
The results
Devin
Overall experience
Setup took minutes: sign in, connect GitHub, and Devin is ready to go. It scanned the codebase, created a confident plan, executed the task, and opened a well-structured pull request, all autonomously.
The experience felt like pair programming with a senior engineer: you see the shell (the command line for running code), VS Code (where code is edited), and a browser (for testing), all updating in real time.
The pull request included a clear summary, test plan, and even a diagram, making review easy.
Devin handled feedback directly through GitHub, just like a real teammate. It felt like collaborating with someone who not only ships quality code, but also knows how to get it merged.
Team integration
Devin slots naturally into team workflows. On GitHub, you can review its code or ask it questions exactly as you would with a colleague. It also integrates with Slack, Linear, and Jira, allowing you to tag it in a thread or assign it to an issue.
Devin can also connect to MCP servers, enabling seamless connections to external tools and internal systems. Through its MCP server, Devin can pull in structured context from documentation, analytics, and monitoring platforms like Notion, Sentry, and Datadog. This makes it easier for Devin to act with deeper awareness of your infrastructure and business logic.
Autonomy
Devin is fully autonomous: once you assign a task, it produces a pull request without further input. For web apps, it can even run and test the app itself. This autonomy is powerful because it allows you to run multiple coding agents that don’t require supervision. The downside is that it can go off-track if your prompt and intentions are vague, wasting time and tokens. Fortunately, Devin has substantially improved since the last time I tested it in December, making it autonomous and useful.
Cost
Devin’s pricing is structured in Agent Compute Units (ACUs). Those units represent the work done by Devin in a single session. Steps like planning, gathering context, running code, or using the browser all consume ACUs.
Each ACU costs $2.25. My benchmark task used 3 ACUs, which comes to about $6.75. That’s steep for a simple job. This novel ACU model also introduces friction. Since no other coding agent uses it, there's no mental benchmark, making it harder for developers to estimate costs. The lack of transparency creates hesitation that hinders adoption, especially when simpler pricing models are the norm.
How to get the most out of Devin
(1) The Prompt Improvement Button
Devin has a built-in prompt improver that refines your instructions before it starts. Running prompts through it clarifies intent and removes ambiguity, which helps Devin produce more accurate, review-ready pull requests.
(2) Leverage Devin’s Knowledge capability
Knowledge lets you onboard Devin with your project’s context, just like you’d ramp up a new engineer. It serves the same purpose as coding-agent context files such as .cursorrules or AGENTS.md, but with structured triggers built in.
Add information in small pieces, group it in folders, and link it to repositories with triggers so Devin knows when to apply it. Store anything you would want an engineer on your team to know: coding standards, workflows, deployment steps, bug fixes, etc. Once added, Devin recalls and applies it automatically. More tips covering Knowledge here.
(3) Devin Playbooks, à la Claude Subagents
Playbooks are reusable prompts for recurring tasks. Instead of re-explaining a process to Devin every time, create a playbook and ask Devin to use it. It’s like showing a teammate how to do something once and having them write it down so they never ask again.
(4) Connect Devin to Slack/Linear/Jira
Plug Devin into your team’s task management workflow: assign it issues or tag it in threads. It will pick up the task immediately and get to work. That is especially useful when you’re on the road, as you can tag Devin in a Slack conversation and ask it to take a first (and last?) pass at fixing a bug or implementing a feature.
OpenAI Codex
Overall experience
Codex is built into ChatGPT, so setup is quick: load your repo, assign a task, and it gets to work. It scans the code, plans, runs tests, and opens a pull request. Also, as of last week, Codex can run in your IDE of choice as an extension, supporting Windsurf, VS Code, and Cursor. You can use it locally or delegate tasks to the cloud.
The interface is minimal, with a shell view and collapsible logs. When complete, Codex provides a concise summary with direct code references to expedite the review process.
Team integration
Codex doesn’t play well with existing team tools such as Slack, Linear, and Jira. That is a major downside, given that many teams use Slack to discuss and assign work and treat GitHub as the familiar interface for code reviews.
Autonomy
Codex is fully autonomous: once assigned a task, it works in its isolated cloud environment and delivers a complete pull request within minutes, without requiring further prompting. As with Devin, this is a neat feat when the agent is capable enough, but dangerous when vague coding tasks lead to poorly generated code. For our simple task, Codex performed well enough.
Cost
Codex is included with ChatGPT Plus or Pro, so there is no extra charge beyond your subscription. That said, usage limits are not publicly disclosed, which can be a red flag for many teams that require a reliable coding agent with transparent rate limits.
How to get the most out of Codex
(1) Generate multiple responses simultaneously, choose the best one
Codex’s Best of N feature allows you to generate several independent solutions in parallel for the same task. This lets you quickly explore different approaches and select the one that best fits your needs, without adding time to your workflow.
(2) Enable internet access when needed
By default, Codex runs offline after setup, so it cannot look up documentation or install new packages. Enabling internet access solves this, but it should only be used when the task requires it, as it carries security risks. For instance, an attacker could slip a malicious instruction into content Codex fetches, causing it to execute a command that leaks sensitive data.
(3) Tag Codex in pull requests
Tag Codex for questions or reviews via @codex <question> or @codex review. You can also tag it for changes, but that spawns a new task with a separate pull request you’ll need to merge back into the original.
Google Jules
Overall experience
Setup was quick: I signed in with Google, linked GitHub, and Jules was ready. I provided the task, and Jules then scanned the codebase, generated a detailed plan, and executed it step by step until a pull request was opened. The interface, structured around the plan, made progress easy to follow with expandable diffs, similar to Claude Code’s and Cursor’s recently released planning features.
However, the pull request’s description was lacking. For example, it stated “This change adds support for recurring tasks…”, which reads more like a product note than a description of a technical change. I expected at least one section covering the technical components added or changed, to make the review easier and more efficient. Code review also felt unnatural. As with Codex, I couldn’t review the pull request as I would on GitHub: all reviews had to happen in Jules, where I could only leave a single block of feedback instead of commenting on individual lines. While grouping the changes by the plan’s action items made the diffs easy to follow, the process felt rigid compared to reviewing a colleague’s PR.
Team integration
Jules does not fit smoothly into team workflows. On GitHub, you cannot review its code or collaborate with it the way you would with another developer. It also lacks integrations with tools like Slack, Linear, or Jira, leaving its interface as the primary means of interaction. As a result, Jules feels separate from the normal channels teams rely on to collaborate.
Autonomy
Jules is semi-autonomous. It generates an implementation plan for you to approve at the start. If you don’t respond, the plan auto-approves and executes, completing the task end-to-end without further input unless it hits roadblocks or needs clarification, similar to how Claude Code prompts the user for input.
Cost
Jules is part of Google’s AI package. It offers the following tiers:
Free tier - 15 tasks per day, 3 concurrent tasks
Pro - 100 tasks per day, 15 concurrent tasks
Ultra - 300 tasks per day, 60 concurrent tasks
The free tier makes it easy to get started. The pricing model is transparent and predictable: because limits are counted in tasks per day, usage doesn’t depend on task complexity or other unpredictable factors. For developers, this makes Jules straightforward to adopt and scale.
How to get the most out of Jules
(1) Interactive Plan mode
Start tasks using the Interactive Plan mode. Jules will scan the codebase, ask clarifying questions, and create an implementation plan you approve before execution. This prevents ambiguity and ensures the output matches your intent, and it is already the way I code nowadays. It feels like a junior developer confirming requirements before writing code.
(2) Configure environment setup
If a repo requires setup commands such as dependencies or environment variables, define them upfront. Jules will run them automatically when working with that repo, keeping environments consistent.
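For illustration, the setup you define is just a short list of commands Jules runs in its environment before starting work. The commands below are a hypothetical example for a Python repo, not a prescribed format:

```
pip install -r requirements.txt   # install dependencies
cp .env.example .env              # provide required environment variables
pytest -q                         # optional sanity check before handing the task to Jules
```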
Factory AI
Overall experience
Factory is built around the concept of “Droids”: autonomous agents purpose-built for different roles across the software development lifecycle (Code, Reliability, Knowledge, Product). Each Droid comes with its own architecture, memory, and tools, and can operate independently or in parallel. Unlike traditional coding assistants, Droids are designed to own and execute tasks end-to-end, making Factory feel more like a full-stack engineering team than a co-pilot, similar to the idea behind Claude Code’s Subagents.
When using Factory, it’s clear that the company is focused on enterprise users, rather than indie developers, which could hinder the organic bottom-up growth that Claude Code and Cursor benefit from. For instance, you must enter credit card details for a trial with no way to pre-cancel. Pricing is also somewhat opaque and tied to token usage: $40 for 20 million tokens.
Factory’s most significant drawback, however, is the indexing process. For our small benchmark repository, indexing took anywhere from several minutes to multiple hours. That’s a considerable delay just to get started. In a developer world optimized for instant feedback loops and low-friction experimentation, this delay feels off.
On the other hand, Factory shines in its coding cycle: it builds a clear to-do list, asks intelligent clarifying questions, and executes with visible reasoning and file-by-file diffs. The pull request is detailed and reads like something a senior engineer would write. It also provides the broadest set of integrations, allowing you to consume relevant context from popular tools such as Notion, Google Drive, and even incident management tools like Sentry and PagerDuty.
Additionally, to Factory’s credit, user experience appears to have become a recent priority. When I first used it back in March, the interface was cluttered and overwhelming: too many panes, not enough clarity. During my coding session for this article, however, the redesign was substantially better: the layout is cleaner, more focused, and thoughtfully organized around the key decisions and information needed at each stage.
Team integration
Integrations are where Factory differentiates itself, showcasing again its enterprise focus. It can consume context from your team’s internal knowledge systems: Slack, Linear, Jira, Notion, Google Drive, Sentry, and PagerDuty. These integrations infuse the agent with context well beyond what’s available in the project’s repository, which improves its performance. However, they are read-only: Factory can ingest context from these platforms but cannot be actively directed through them. You can’t, for example, tag Factory in a Slack thread or assign it a Linear ticket and expect it to take action, unlike agents like Devin.
Autonomy
Factory is semi-autonomous and often requires your input. It avoids making assumptions, often asks clarifying questions before starting, and seeks approval before taking actions such as creating a pull request. By default, it will not create branches or commit files without your approval, though you can disable this safeguard.
Cost
Factory offers a 14-day trial, after which pricing starts at $40 per month for 20 million tokens. Our task consumed ~330k tokens. As a full-time engineer, you’ll likely burn through this quota quickly: our simple feature alone ate through over 1.5% of the monthly allowance.
How to get the most out of Factory
(1) Use Factory Bridge
Bridge is a secure connector that links Factory's cloud platform to your local machine. It enables running CLI commands, managing local processes, and accessing local files directly from Factory sessions. Tip: use it within an isolated environment (e.g., Docker) for safe and optimal results. I used Bridge when I couldn’t access my laptop or wanted to split tasks between Factory and Claude Code, saving Factory tokens while running both agents in parallel.
(2) Use a remote machine
When local access is not needed, connect Factory to a remote machine. It still grants full command and workflow access without requiring you to touch your computer.
Cursor Background Agent
Overall Experience
Getting started is easy for existing Cursor users: simply submit a task and select “Send to Background”. On the web, log in and paste your coding prompt.
You can watch the agent’s progress by connecting to its virtual cloud environment (“Open VM”), which mirrors a live coding session. Once complete, you get a change summary and file diffs in both desktop and web.
The pull request description is barebones, often limited to a single sentence, and lacks the detailed context that would streamline reviews. Code review, however, feels seamless: you can comment directly on GitHub, tag the agent, and get an in-line response, just like working with a human teammate.
Team integration
Cursor integrates directly into Slack and Linear, so you can tag it in threads or assign it to issues.
Autonomy
Cursor’s Background version is fully autonomous. Once you assign a task, it analyzes the codebase, executes the plan, and opens a pull request without requiring user input.
Cost
Background agents are billed at the API rate of the model you choose, which is affordable with models like gpt-5-high. The task I ran cost 8¢. Token-based pricing is inherently unpredictable, but that’s already the norm with the popular Claude Code and Cursor desktop agents.
How to get the most out of Cursor Background Agent
(1) Add a .cursorrules file to your repositories
Cursor’s best practices also apply to background agents. Make sure every repository you use includes a .cursorrules file, just as you would when working locally (you can use my recently released tool to generate one).
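For reference, a .cursorrules file is plain-text project guidance rather than code; the entries below are a hypothetical example, not the benchmark repo’s actual conventions:

```
# Project rules (illustrative example)
- Keep changes minimal and focused; prefer small, pure helper functions
- Every new behavior needs a test before the pull request is opened
- Follow the existing naming and date-handling conventions in the codebase
- Never commit secrets or .env files
```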
(2) Connect Cursor to Slack or Linear
If your team uses either tool, integrate Cursor so you can assign it issues or tag it in live discussions. Whether it is a bug in Slack or a ticket in Linear, the agent can pick it up immediately and start working.
More tips here: https://www.aitidbits.ai/p/sahar-ai-coding
Choosing the right coding agent
We’re just getting started with coding agents
All of these tools are still in their early stages. We can expect them to evolve quickly—I had to update my post twice over the span of two weeks due to new releases.
Just as importantly, the paradigm of working with autonomous agents is still taking shape. Tools like Claude Code’s Subagents and newcomers like Conductor and Task Master hint at what’s to come. In future posts, I’ll dive deeper into each of the tools reviewed here. Subscribe to follow along as I learn how to collaborate with this new generation of coding agents.