The Loop Is the Product

Loop engineering is being framed as a coding-agent trick. The more important reading is that the loop is a product pattern: an autonomous agent is a loop closed over a human's goal tightly enough that they don't have to stay in the turn. A first-person, skeptical take, with Otto as the example.

On loop engineering, and why it matters more for products than for code.

There’s a phrase going around right now: loop engineering. The idea wasn’t Addy Osmani’s — Boris Cherny, who built Claude Code, had said “my job is to write loops,” and Peter Steinberger had urged developers to “design the loops that prompt your agents” — but Osmani is the one who put a name on it, in a post in early June that defined it as “replacing yourself as the person who prompts the agent — you design the system that does it instead.” The label stuck, which tells you the shape was already everywhere and just needed a word.

I find the idea genuinely exciting, and I’m also skeptical of it, and I want to hold both of those at once for the length of this post. Because I think the people making the argument are mostly making a smaller version of it than the one that’s actually true.

What a loop is

Strip away the tooling and a loop is almost embarrassingly simple. You take an action, you observe what happened, you reason about it, and you repeat until a goal is met. That’s it. It’s distinct from single-shot prompting, where you ask once and take whatever comes back, and from a linear chain, where you wire up a fixed sequence of steps that runs start to finish regardless of what it learns along the way. A loop has a goal and a stopping condition, and it keeps going until the second satisfies the first.

This is not a new shape. The military strategist John Boyd described decision-making as an Observe–Orient–Decide–Act loop in the 1970s. Control theory has run on monitor-analyze-plan-execute feedback loops for longer than that. What’s new is that the thing inside the loop — the part that observes and reasons — can now be a language model, which means the loop can handle goals you could never have specified as a fixed program.

The research community has spent the last three years mapping the variants, and they rhyme with each other in a useful way. ReAct (Yao et al., 2022) interleaves a reasoning step with an action and an observation, over and over — the model thinks, calls a tool, reads the result, thinks again. Reflexion (Shinn et al., 2023) wraps a second loop around the first: after a full attempt, the agent writes itself a verbal critique of what went wrong and carries that note into the next try, so it improves across attempts rather than just within one. Plan-Execute loops separate a planner from an executor and re-plan when reality diverges from the plan. Generator–Verifier loops put one model’s output in front of a second model that judges it, and only stop when the judge is satisfied. There are hierarchical versions, where a high-level controller decomposes a goal and hands sub-goals to inner loops, and reward-driven versions that turn the whole thing into a training signal.

If you read the literature as a taxonomy — which I did, with some help — the through-line is that every one of these is a way of answering two questions: how do I decide what to do next, and how do I know when to stop. Everything else is detail.

Where I get nervous

Before the bigger claim, the honest caveat. A loop that runs unattended is also a loop that makes mistakes unattended, and a loop that runs every few minutes spends money every few minutes — and that cost is wildly uneven. If you’re token-rich, you can afford a verifier checking every step all day and the loop just hums; if you’re token-poor, the same design bankrupts you, and you end up rationing the very iterations that make it work. Geoffrey Huntley’s “Ralph loop” — a bash loop that resets the agent’s context each iteration and reloads its state from disk — got popular precisely because keeping a long-running loop from drowning in its own context takes real discipline. So “stop prompting, write loops” isn’t something you adopt wholesale. It’s a leverage point that moved, and whether it helps you depends on judgment you still have to supply.

Hold that thought. It comes back, and it bites hardest in the place almost nobody is talking about.

The loop is not just an architecture pattern

Almost everyone writing about loop engineering is writing about coding agents. The loop babysits your pull requests, triages your CI failures, fixes the bug somebody added last week. That’s a real and useful application, and it’s also a strangely narrow one, because the loop in those examples runs against a codebase — a closed world with tests as ground truth and a repo as memory.

The more interesting claim is that the loop is also a product pattern, and that the world can be the thing it runs against.

I work on an autonomous travel agent, and the clearest way I can explain what I mean is to point at what it already does. At the architecture level, none of this is novel — the agent runs ReAct-style reasoning-action loops to handle a booking, and Reflexion-style self-correction when a step fails. That’s table stakes now; it’s the same machinery everyone’s agents run on. The part I think is underappreciated is that the most valuable loops in the product aren’t inside the agent’s turn at all. They run on a timer, against reality, with no human in the loop and no chat window open.

Price-drop monitoring is a loop: observe the fare on a trip you’ve already booked, reason about whether the drop is worth acting on, rebook if it clears the bar, repeat until the trip is over. Disruption management is a loop: watch for the cancelled flight or the missed connection, reason about the options, rebook the traveler, notify them — and the observe step is running against the airline’s world, not a database we control. Future-trip detection is a loop that watches for the signal that a trip is even going to happen before the customer has asked for anything. None of these are “the user prompts the agent and reads the reply.” The user set a goal once — get me there, keep me whole, don’t make me think about it — and the loop carries that goal forward indefinitely.

That’s the move I want to make with the phrase. Loop engineering, applied to a product, isn’t about replacing the engineer who prompts the agent. It’s about replacing the customer as the person who has to keep poking the system to get value out of it. The single-shot version of a travel product is a search box: you ask, it answers, you’re back to holding the tool one turn at a time. The loop version is a standing goal that the product keeps satisfying while you do literally anything else.

Why this is the right frame for autonomy

If your vision is fully autonomous agents that complete tasks for humans — which is ours, at Otto — then “the loop is the product” isn’t a metaphor, it’s the spec. An autonomous agent is a loop that has closed over a human’s goal tightly enough that the human doesn’t have to stay in the turn. Everything that makes that hard is a loop-engineering problem: how does the loop know what the goal still is when circumstances change, how does it decide an action is worth taking, and — the one I lose sleep over — how does it know when to stop, or when to stop and ask. Get those right and you don’t have a smarter chatbot; you have a product that quietly does the thing while the customer gets on with their life.

Why it’s harder than a coding loop

This is where the worry from earlier comes back, and why it bites harder here than anywhere in the coding-agent conversation. When a coding loop is wrong, you get a bad pull request, and the cost lands on a reviewer who can throw it away. When a product loop is wrong, it rebooks a flight that shouldn’t have been touched, or moves money, or wakes a customer at 3am about a disruption that already resolved itself. The action has consequences in the world, and somebody bears them. The whole reason the research literature is obsessed with verification gates, deterministic halting, and keeping the maker separate from the checker is that an unsupervised loop with real-world side effects is exactly where things go quietly and expensively wrong.

There’s empirical weight behind this. A 2026 study from Google Research turned a family of self-improvement methods into a generator–verifier game — a loop grading its own work — and found it reliably improves and then plateaus, leaving a persistent eight-to-thirteen-percent gap to a model trained on real ground-truth answers; what narrowed the gap was a bigger, more expensive checker. A loop is only ever as good as its verifier. In a product, the verifier isn’t a benchmark detail — it’s the thing standing between the agent and a wrong booking, and making it better costs exactly the tokens that earlier caveat was about.

So this is also where the human role survives, rather than where it disappears. The loop can observe, reason, and act far faster and more tirelessly than I can. What it can’t do is bear the consequence of being wrong — that lands on a person or a company, and so designing the loop is really designing where accountability sits. You can automate the cognition all the way down. You cannot automate the skin in the game. The best loops I’ve seen don’t try to; they push the iteration out to the machine and pull the consequential decisions — the irreversible booking, the spend above a threshold, the wake-the-customer call — back to a place where someone is answerable for them.

ARIA, an agent ByteDance deployed inside TikTok Pay’s compliance screening, is almost a literal instance of this, running against more than 150 million users’ worth of activity. It does the obvious loop — read a case, judge it, move on — but the interesting part is what it does with its own uncertainty: it interrogates its own judgment, and when it finds a genuine knowledge gap it spends from a fixed budget to ask a human expert, then writes the answer into a timestamped store it can reconcile against later. The iteration runs unattended; the consequential, low-confidence calls get routed to a person who’s accountable for them; the memory lives outside any single run. And that memory is doing more work than it looks like — the agent forgets everything between runs, so whatever it carries forward has to live in a store someone deliberately maintains and prunes, or the loop quietly resets to zero every morning.

Where this leaves me

Loop engineering is a real shift, and probably a preview of how a lot of work is going to feel: you stop doing the steps and start designing the system that does them. For coding agents it’s already here — you can watch it in Claude Code’s /goal and Codex’s automations. But the version I care about points the other way. A coding loop relocates work from inside your own hands to a system you supervise. A product loop relocates it out of the customer’s life entirely — the product stops being a thing they operate and becomes a thing that operates on their behalf. That’s the bet at Otto, and it’s why I think “the loop is the product” is the more important reading of the phrase, and the one that separates it from every post telling you to go automate your pull requests.

Just don’t mistake “runs without you” for “needs no one.” The same property that makes a loop powerful is the one that should keep you honest: a loop is only as good as its stopping condition and its verifier, and the consequential calls still have to land on someone accountable. Build the loop. Then spend most of your effort on the two least glamorous parts, because in a product they aren’t engineering details — they’re the whole game: how it knows it’s done, and who answers for it when it’s wrong.

Sources & further reading

Addy Osmani, “Loop Engineering” (June 7, 2026)
The New Stack, “The Anthropic leader who built Claude Code says he ditched prompting — now he just writes loops”
Data Science Dojo, “Agentic Loops: From ReAct to Loop Engineering”
Geoffrey Huntley, “Ralph Wiggum as a software engineer” (the Ralph loop)
Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models (arXiv:2210.03629)
Shinn et al., Reflexion: Language Agents with Verbal Reinforcement Learning (arXiv:2303.11366)
Qi et al., On the Generalization Gap in Self-Evolving Language Model Reasoning (Google Research, arXiv:2606.01075)
He et al., Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance — ARIA (arXiv:2507.17131)

DEC 4

After 9 months in Beta, Otto is now open to everyone! Read our announcement