Why the Best Agent-Native Apps Use Less AI

The mark of a great agent-native application is what it doesn't send to the model.

It was 2 AM, and I was still grinding away on my 'free design.md' hackathon project, so I barely noticed it at first. But my own agent had just routed a string into a frontier model so it could parse a few dozen lines of JSON.

The response, to be fair, was perfectly formed. The schema was understood. But then something slowly dawned on me, with some horror: the whole string I'd passed in was 12 fields and about 400 bytes. And the way my system was wired, the agent was the only execution layer for anything the underlying code couldn't directly support.

With only one option, the agent passed my request to an expensive frontier model, which chewed on it for over 50 seconds, burned through about 50,000 tokens (because it was carrying a lot of context it didn't need), and then returned the solution. I was stunned. That same answer could have come from a JSON.parse call that would have returned in under one millisecond, with zero risk of hallucination and at zero cost.
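For contrast, here is a minimal sketch of the deterministic path. The payload, the ParsedPayload shape, and the helper names are all hypothetical stand-ins, since the real hackathon string isn't reproduced here; the only real API in play is the built-in JSON.parse.

```typescript
// Hypothetical payload standing in for the small JSON string from the anecdote.
const rawPayload = '{"id": 42, "status": "ok", "tags": ["design", "hackathon"]}';

// Hypothetical shape; the real project's fields are not known.
interface ParsedPayload {
  id: number;
  status: string;
  tags: string[];
}

// Narrow `unknown` to the expected shape so malformed input fails loudly.
function isParsedPayload(value: unknown): value is ParsedPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const tags = v["tags"];
  return (
    typeof v["id"] === "number" &&
    typeof v["status"] === "string" &&
    Array.isArray(tags) &&
    tags.every((t) => typeof t === "string")
  );
}

function parsePayload(raw: string): ParsedPayload {
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on invalid JSON
  if (!isParsedPayload(parsed)) {
    throw new Error("payload does not match the expected shape");
  }
  return parsed;
}

const payload = parsePayload(rawPayload);
console.log(payload.status, payload.tags.length);
```

No tokens, no network round trip, and the same input always produces the same output or the same error.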

That was the moment I understood I needed to write this piece. You see, the dominant agent-native discourse right now has the quality signal backward. Agent-native applications are the future. There's no denying it. There's no going back. But we've been measuring agent-native applications by their agentic surface area, by how much the agent can do, by how many tools it can reach, by how autonomous its loops are.

Instead, we should be measuring success by the inverse. By how much work an agent-native system can route back to production code, or to actions, which are newly written, reusable snippets of code that run on the backend. My app shouldn't need Claude to parse 20 lines of JSON.

So here's my argument: AI restraint will become the true quality signal for all future software.

The two-surface trap: the biggest problem with agent-native apps

It's worth asking why well-engineered AI products keep routing string parsing, arithmetic, and field lookups through the most expensive component in their stack.

The answer is architectural, not behavioral. Most agent-native applications, as the term is used today, give their users exactly two execution surfaces.

The two-surface trap — user intent splits between a fixed UI and an expensive agent, with no place else to go

The first surface is the UI. Buttons, forms, and flows are the things the developers thought to build. Fast, predictable, testable, deterministic. But fixed. If a workflow isn't in the UI, it's not available to the user.

The second surface is the agent. It can access an LLM along with whatever tools the developers chose to wire up at build time. This surface is infinitely flexible, in the sense that I can describe anything I want, and the model will attempt it. But it's also slow, expensive, non-deterministic, and prone to confident hallucinations.

The UI is finite. The user's intent is infinite. And the agent is the only available bridge. So everything that falls into the gap, every parse, every filter, every date calculation, every status lookup, every sort, gets routed through inference by default. Not because anyone decided it should be. Because there's no other place for it to go.

This is the architectural defect at the heart of most agent-native applications. They've made the model the universal solvent for anything the UI can't do. And the universal solvent is, predictably, overkill for most of the things it dissolves.

The third execution surface

Agent-native applications, if the term is to mean anything precise, introduce a third execution surface.

The third surface unifies human and agent — both call the same registry of actions authored at runtime

This third execution surface is where users can define deterministic actions that the UI or agent can call. These actions are authored at runtime, not at build time. They are defined by people who are not the original engineers. They become available immediately to both the human UI and the agent loop. They are cheap, fast, testable, and correct by construction.

The crucial detail is the unification. In an agent-native architecture, the 'actions' surface is the same surface the agent calls and the same surface the human invokes through the UI. The agent doesn't know, and doesn't need to know, whether a given action was shipped by the original developer six months ago or written by a power user last Tuesday afternoon. The agent just sees an action it can call. The user just sees one thing the product can do.
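One way to picture that unification is a single registry with one entry point. This is only a sketch under my own assumptions: ActionRegistry, its methods, and the daysBetween action are hypothetical names for illustration, not the API of any particular framework.

```typescript
type ActionArgs = Record<string, unknown>;

interface ActionDefinition {
  name: string;
  description: string; // shown in the UI and handed to the agent as a tool description
  run: (args: ActionArgs) => unknown;
}

// Hypothetical registry: one place where actions live, whoever wrote them.
class ActionRegistry {
  private actions = new Map<string, ActionDefinition>();

  // Called at runtime, for example when a user saves a new action.
  register(action: ActionDefinition): void {
    this.actions.set(action.name, action);
  }

  // The same listing can feed a UI menu and the agent's tool list.
  list(): { name: string; description: string }[] {
    return Array.from(this.actions.values()).map(({ name, description }) => ({
      name,
      description,
    }));
  }

  // One entry point for both callers; it does not record who authored the action.
  invoke(name: string, args: ActionArgs): unknown {
    const action = this.actions.get(name);
    if (!action) throw new Error(`unknown action: ${name}`);
    return action.run(args);
  }
}

const registry = new ActionRegistry();
registry.register({
  name: "daysBetween",
  description: "Whole days between two ISO dates",
  run: (args) => {
    const start = Date.parse(String(args["start"]));
    const end = Date.parse(String(args["end"]));
    return Math.round((end - start) / 86400000); // milliseconds per day
  },
});

// A button handler and an agent tool call would go through the same method.
const fromUi = registry.invoke("daysBetween", { start: "2026-05-01", end: "2026-05-26" });
const fromAgent = registry.invoke("daysBetween", { start: "2026-05-01", end: "2026-05-26" });
console.log(fromUi, fromAgent);
```

The design choice worth noticing is that invoke has no parameter for who is calling or who wrote the action, which is what lets the human and the agent share one surface.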

This is the architectural move that makes restraint possible. Without it, restraint is a discipline that one team practices and another team forgets. With it, restraint becomes a property of the system itself. Every time someone notices that the agent is being asked to do the same deterministic thing repeatedly, they can crystallize that work into an action, and from that moment forward, the agent calls a fast, free function instead of running a slow, expensive inference.

My JSON.parse debacle stops being a hackathon embarrassment the moment somebody adds a parseResponse(endpoint, schema) action. That doesn't require a code deploy. It doesn't require the original engineers to be in the room. The agent learns about the new action through the same registry that exposes everything else, and from then on, the work happens at the speed of a function call. This is agent-native at its best.
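Here is a rough sketch of what a parseResponse(endpoint, schema) action might look like. Only the name and the two parameters come from the text above; the schema format (field name mapped to an expected typeof), the injected body loader, and the canned endpoint are my assumptions, chosen so the sketch runs without a network. A real version would most likely be asynchronous.

```typescript
type FieldType = "string" | "number" | "boolean";
type FieldSchema = Record<string, FieldType>; // assumed schema format
type BodyLoader = (endpoint: string) => string; // assumed: injected instead of a real HTTP call

function makeParseResponse(loadBody: BodyLoader) {
  return function parseResponse(
    endpoint: string,
    schema: FieldSchema
  ): Record<string, unknown> {
    const parsed: unknown = JSON.parse(loadBody(endpoint));
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("expected a JSON object");
    }
    const record = parsed as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    // Keep only the fields the schema names, and check each one's type.
    for (const field of Object.keys(schema)) {
      const value = record[field];
      if (typeof value !== schema[field]) {
        throw new Error(`field "${field}" should be ${schema[field]}, got ${typeof value}`);
      }
      result[field] = value;
    }
    return result;
  };
}

// Hypothetical canned response standing in for a live endpoint.
const cannedBodies: Record<string, string> = {
  "/api/status": '{"service": "billing", "healthy": true, "latencyMs": 12, "extra": "ignored"}',
};

const parseResponse = makeParseResponse((endpoint) => {
  const body = cannedBodies[endpoint];
  if (body === undefined) throw new Error(`no canned body for ${endpoint}`);
  return body;
});

const statusFields = parseResponse("/api/status", {
  service: "string",
  healthy: "boolean",
  latencyMs: "number",
});
console.log(statusFields);
```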

Agents are the prototype, actions are the product

There's a useful way to think about how the agent and the action surface relate to each other over the lifetime of an application.

The agent is the prototype. It's where novelty gets handled. It's where unfamiliar requests get reasoned through. It's the surface that absorbs the long tail of user intent that nobody anticipated and nobody coded for. It's, in a sense, the runtime version of an engineer thinking through a problem for the first time, which is exactly what makes it valuable and exactly what makes it expensive.

The crystallization loop — repeated agent work gets promoted to actions, dropping cost per invocation by orders of magnitude

The actions are the resulting production code. They are what the prototype crystallizes into once the work becomes repeatable. The pattern is roughly this: when a user, a team, a metric, or an on-call log notices that the agent is being asked to do the same shape of thing repeatedly, that shape becomes a candidate for promotion to an action. Once promoted, the agent stops re-deriving the answer from first principles and starts calling the function. The reasoning moves from runtime to design time. The cost per invocation drops by 5×, 10×, or 100×. The variance collapses to zero.
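The "same shape of thing" test can be sketched as a counter over normalized requests. Everything here is an assumption for illustration: the shapeOf normalizer (which just masks digits), the PromotionTracker class, and the threshold of three are hypothetical, and a real system would need a far better notion of shape.

```typescript
// Hypothetical normalizer: lowercases and masks digits so similar requests collide.
function shapeOf(request: string): string {
  return request.toLowerCase().replace(/\d+/g, "#").trim();
}

class PromotionTracker {
  private counts = new Map<string, number>();
  private threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Returns true exactly once per shape: on the call that reaches the threshold.
  record(shape: string): boolean {
    const next = (this.counts.get(shape) ?? 0) + 1;
    this.counts.set(shape, next);
    return next === this.threshold;
  }

  // Shapes seen often enough to be worth promoting to an action.
  candidates(): string[] {
    return Array.from(this.counts.entries())
      .filter(([, count]) => count >= this.threshold)
      .map(([shape]) => shape);
  }
}

const tracker = new PromotionTracker(3); // assumed threshold
const agentRequests = [
  "Days between 2026-05-01 and 2026-05-26",
  "days between 2025-01-10 and 2025-02-01",
  "days between 2024-12-31 and 2025-01-01",
  "Summarize this contract",
];
const promotedNow = agentRequests.map((r) => tracker.record(shapeOf(r)));
console.log(promotedNow, tracker.candidates());
```

In this run the third date request crosses the threshold and becomes a promotion candidate, while the one-off summarization request stays with the agent.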

This isn't a new idea in computing. It's how every other layer of the stack already works. Hot paths get optimized. Interpreters give way to compilers. Manual queries become stored procedures. Repeated business logic gets extracted from one-off scripts into shared libraries. What's new is that the AI era compresses this entire process. The "prototype" can be authored by typing a sentence in natural language. The promotion to "production code" can be done by a non-engineer at runtime. The crystallization happens at the speed of usage, not at the speed of sprint planning.

A great agent-native application is, structurally, one that makes this crystallization easy and continuous. A mediocre one keeps everything routed through the model forever, because there's no third surface to crystallize into.

Why AI restraint compounds into a moat

It's tempting to read my argument so far as a developer-experience point. It's more than that. The economics of restraint compound in a way that becomes a massive advantage at scale.

In any market where AI cost is a meaningful fraction of the unit economics, which is to say all of them, the company that aggressively cultivates AI restraint will end up with structurally better margins, faster products, and higher trust. It will be able to undercut competitors on price, ship faster, and operate at a scale that a maximalist competitor can't match without setting its gross margins on fire.

Restraint compounds. The companies that figure this out first will look, from the outside, like they have a moat that competitors cannot cross. But the moat is just the cumulative effect of years of crystallization on the third surface.

What this means for how I build

The practical implication for anyone building in this space is that the architecture of the third execution surface (Actions) is not a feature you add later. It's what determines whether your product can ever become great.

The Builder team has been making this argument under the name "agent-native architecture," and the framework at agent-native.com is one concrete instantiation of the third-surface pattern. There will be others. The pattern is more important than any single implementation. What matters is recognizing that the third surface exists as a distinct architectural choice, and that the choice to build it (or not) determines whether your product can practice restraint in any serious way.

Build the action surface early. Make actions a first-class primitive, not a power-user escape hatch. Expose the same surface to humans and agents, so that anything the agent can call, a human can also invoke, and vice versa. Make it cheap and obvious for non-engineers to author new actions. Build the registry that lets the agent discover newly authored actions without a deploy. Track which actions get called most often, because that telemetry is your roadmap. Track which agent invocations could have been actions but weren't, because that telemetry is your debt.
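The two telemetry streams at the end of that list could be summarized along these lines. This is a sketch with hypothetical names throughout: the TelemetryEvent shape, the couldBeAction flag (which assumes something upstream, a reviewer or a heuristic, has already made that call), and the sample events are all invented for illustration.

```typescript
type TelemetryEvent =
  | { kind: "action"; name: string }
  | { kind: "agent"; request: string; couldBeAction: boolean };

interface TelemetrySummary {
  roadmap: [string, number][]; // action names, most-called first
  debt: string[]; // agent requests that could have been actions
  inferenceShare: number; // fraction of all events that reached the model
}

function summarize(events: TelemetryEvent[]): TelemetrySummary {
  const actionCalls = new Map<string, number>();
  const debt: string[] = [];
  let agentCalls = 0;

  for (const event of events) {
    if (event.kind === "action") {
      actionCalls.set(event.name, (actionCalls.get(event.name) ?? 0) + 1);
    } else {
      agentCalls += 1;
      if (event.couldBeAction) debt.push(event.request);
    }
  }

  const roadmap = Array.from(actionCalls.entries()).sort((a, b) => b[1] - a[1]);
  const inferenceShare = events.length === 0 ? 0 : agentCalls / events.length;
  return { roadmap, debt, inferenceShare };
}

// Hypothetical sample events.
const sampleEvents: TelemetryEvent[] = [
  { kind: "action", name: "parseResponse" },
  { kind: "action", name: "daysBetween" },
  { kind: "action", name: "parseResponse" },
  { kind: "agent", request: "sort these rows by date", couldBeAction: true },
];

const summary = summarize(sampleEvents);
console.log(summary);
```

Watching inferenceShare over time is one crude way to see whether work is actually moving off the model.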

The signal

We're in a moment where the agent-native discourse rewards the wrong things. Demos optimize for what the agent can do on its own. Investor decks measure agentic surface area. Engineering blog posts brag about loop depth and tool count. None of these is the quality signal.

The quality signal is, and will be, whether the system uses inference judiciously. Whether the architecture admits a third execution surface where deterministic work can crystallize. Whether the team treats every recurring agent call as a candidate for promotion to a function. Whether, over time, the share of work that actually reaches the model keeps shrinking relative to the work the agent could, in principle, take on.

In a few years, the gap between agent-native applications that practice restraint and the ones that don't will be one of the most visible quality signals in software. The restrained ones will be cheaper, faster, more correct, and more trustworthy. They'll feel, to users, like products that respect their time and money. The maximalist ones will feel like products that are constantly thinking out loud at the user's expense.

This is the case for restraint. The best agent-native applications use less AI. Not because AI is bad, but because the architecture that earns the agent-native label in the first place is the architecture that makes restraint possible.


May 26, 2026

Written By Matt Abrams
