
10 Minutes vs. Three Months: How Anthropic Turned Infrastructure into an API

You spent three months building Agent infrastructure. Anthropic says it now takes just 10 minutes.

This isn't clickbait. According to the official numbers, Rakuten used Managed Agents to complete Agent deployment across five departments in a week, and Notion can run 30+ concurrent tasks. The headline figures: a 79% reduction in time-to-market and a 97% reduction in critical errors.

But I don't want to talk about these numbers. I want to talk about a more fundamental question: why is building Agent infrastructure so painful, and is this pain unavoidable?


An Overlooked Fact

In the past year, the industry has been discussing what Agents "can do."

What GPT-4o can do. What the Claude Agent SDK can do. What OpenAI's Agents SDK can do.

But "what it can do" and "what it can scale to" are two different things.

According to 2026 enterprise AI research reports, over 70% of companies see deployment as the biggest obstacle, not model capability. In other words, models are already strong enough that they are no longer the bottleneck.

The bottleneck lies elsewhere: How to make AI work continuously, stably, and safely.

This is what I call Agent Infra—not the Agent itself, but the entire set of engineering capabilities around the Agent:

  • How to set up sandbox environments
  • How to manage state
  • How to orchestrate tools
  • How to recover from errors
  • How to control permissions
  • How to maintain sessions

These problems, individually, aren't hard. But together, they add up to three months of engineering work.


How Anthropic Decoupled It

Anthropic's engineering blog published an article called "Scaling Managed Agents: Decoupling the brain from the hands." The title says it all.

Core idea: Separate the "brain" from the "hands."

Translation:

For each piece, the traditional approach versus Anthropic's approach:

  • Brain: embedded in the Agent → Model + Harness, separable
  • Hands: build your own environment → virtualized as a Sandbox
  • Memory: stuffed into the Context → extracted as a Session
  • Control loop: write your own loop → abstracted as a Harness

Why does this work?

Because any component can be independently replaced without dragging down other parts.

Example: Today you use Claude Sonnet 4.6 with a general Harness. Tomorrow the model upgrades to Opus 4.6, or a new specialized model appears, and you can swap out the Brain without rebuilding the entire system.
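To make the decoupling concrete, here is a minimal sketch in Python. Every class and method name below is my own illustration of the idea, not Anthropic's actual API:

```python
# A minimal sketch of "brain decoupled from hands".
# All names here are hypothetical, for illustration only.
from dataclasses import dataclass
from typing import Protocol


class Model(Protocol):
    def complete(self, prompt: str) -> str: ...


class Sandbox(Protocol):
    def run(self, command: str) -> str: ...


@dataclass
class Brain:
    """Brain = Model + Harness: a loop that drives whichever model it is given."""
    model: Model

    def step(self, task: str, sandbox: Sandbox) -> str:
        plan = self.model.complete(f"Propose the next shell command for: {task}")
        return sandbox.run(plan)


# Swapping the Brain is a one-line change; the Sandbox and Session stay untouched.
# brain = Brain(model=sonnet_model)   # today
# brain = Brain(model=opus_model)     # tomorrow, no rebuild of the rest
```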

What does this resemble? An operating system virtualizing hardware.

The problem operating systems solved: Programmers didn't need to care whether the machine used Intel or ARM. What Anthropic is doing is similar: People building Agents don't need to care about how the underlying environment is set up.

But there's a key difference here: They didn't just give you a tool—they gave you a managed service.


Four Core Components

Managed Agents consists of four components. Understanding these helps you judge whether this product fits you.

1. Session

An Agent's memory shouldn't go in the model's Context. Model Context has capacity limits, and every re-send counts toward billing.

Session is an append-only log, recording all events that occurred. It exists outside Claude's context, as a persistent state object.

Technical value: Long-running tasks won't be interrupted by Context overflow.
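As a rough illustration (my own sketch, assuming nothing about the real Session API), an append-only log can be as simple as a JSONL file that survives restarts:

```python
# A minimal sketch of an append-only session log that lives outside the model
# context. Names and file format are illustrative assumptions.
import json
import time
from pathlib import Path


class SessionLog:
    """Persistent, append-only event log kept outside Claude's context."""

    def __init__(self, path: str):
        self.path = Path(path)

    def append(self, event_type: str, payload: dict) -> None:
        record = {"ts": time.time(), "type": event_type, "payload": payload}
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")   # append only; nothing is rewritten

    def replay(self) -> list[dict]:
        """Rebuild the Agent's state after a restart without re-sending old context."""
        with self.path.open() as f:
            return [json.loads(line) for line in f]


log = SessionLog("agent_session.jsonl")
log.append("tool_call", {"tool": "search", "query": "quarterly revenue"})
```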

2. Harness

This is the loop logic that calls the model and routes tool execution.

In traditional Agent implementations, Harness and model are bound together. Anthropic abstracted this layer—you can replace Harness without changing the model.

Technical value: You can use different Harnesses for different scenarios without retraining the model.
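A harness, stripped to its essentials, is just this loop. The sketch below is the generic pattern, not Anthropic's implementation; the callables are stand-ins for whichever model and tools you plug in:

```python
# A generic harness loop: call the model, route any requested tool,
# feed the result back, repeat. Illustrative only.
from typing import Callable


def run_harness(
    call_model: Callable[[list[dict]], dict],      # any model, swappable
    tools: dict[str, Callable[[dict], str]],       # tool name -> executor
    task: str,
    max_steps: int = 10,
) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if reply.get("tool") is None:              # model produced a final answer
            return reply["content"]
        messages.append({"role": "assistant", "content": str(reply)})
        result = tools[reply["tool"]](reply["args"])          # route tool execution
        messages.append({"role": "tool", "name": reply["tool"], "content": result})
    return "Stopped: max_steps reached without a final answer."
```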

3. Brain

Brain = Model + Harness.

This combination should be replaceable. When model capabilities improve or new models appear, you can switch directly.

Technical value: Architecture won't become obsolete due to model iteration.

4. Sandbox

The isolated environment where Agents execute tasks. Could be containers, phone emulators, or anything else.

Key point: Sandbox and Brain are separated. Brain can work across environments.

Technical value: No vendor lock-in—experience in this environment can transfer to the next.
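As an illustration of that boundary, here is a hypothetical container-backed sandbox. Managed Agents provisions its own environments, so treat this purely as a sketch of the interface:

```python
# A hypothetical sandbox behind the same run() interface the Brain sees.
# Swapping this for a phone emulator or a remote VM would not touch the Brain.
import subprocess


class DockerSandbox:
    """Runs each command inside a throwaway container for isolation."""

    def __init__(self, image: str = "python:3.12-slim"):
        self.image = image

    def run(self, command: str) -> str:
        completed = subprocess.run(
            ["docker", "run", "--rm", self.image, "sh", "-c", command],
            capture_output=True, text=True, timeout=60,
        )
        return completed.stdout or completed.stderr
```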


Who Should Use It, and How

Three questions.

Question One: Use Cases

Managed Agents fits these situations:

  • Team has no dedicated AI infra engineer: You need something that works, not something to research
  • Need long-running Agents: Not something that finishes and is done, but continuous service
  • Multi-Agent collaboration: Notion can run 30+ concurrent tasks—this isn't something you can build yourself
  • Non-technical user scenarios: You need a managed solution

If you meet any of the above, keep reading.

Question Two: Not Suitable For

  • Already have a satisfactory solution: If you've built something that works well, don't change for the sake of change
  • Short, synchronous tasks: If it's the kind of send-and-receive call, you don't need managed service
  • Extremely cost-sensitive: Managed service has additional costs, at $0.08 per session-hour (rough math below)
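For a sense of scale (my own back-of-envelope arithmetic, not an official quote): an Agent that runs 8 hours a day for 22 working days consumes 176 session-hours, roughly $14 per month on the managed layer alone, before model token costs.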

Question Three: Decision Checklist

Answer these three questions and you'll know whether to use it:

  1. How long does your Agent need to run? If it's within a few minutes, probably unnecessary
  2. Does your team have someone dedicated to infra? If not, this is an option
  3. In your scenario, is there a big difference between "done" and "done well"? If you have high quality requirements, read the next section

The Overlooked Capability: Self-evaluation

Most Agent problems aren't about not doing the work; they're about doing it poorly.

We ourselves often encounter this with Claude Code: It says it's done, but when you look, it's far from it.

Anthropic did something right: Separated evaluation from execution.

They introduced the Generator-Evaluator pattern in their architecture:

  • Generator: Responsible for generating output
  • Evaluator: Independent evaluation Agent, scoring against your defined success criteria
  • Feedback loop: Evaluator's results flow back to Generator, guiding the next iteration

This design solves a fundamental problem: When the model is asked to evaluate its own output, it tends to give itself high scores.

The solution: Let another Agent evaluate.
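The pattern itself is easy to sketch. The code below is my own illustration of the loop, with made-up function signatures; the actual Managed Agents interface (still in research preview) will look different:

```python
# An illustrative Generator-Evaluator loop: a separate evaluator scores the
# generator's output against user-defined success criteria and feeds back.
from typing import Callable


def generate_until_good(
    generator: Callable[[str, str], str],                      # (task, feedback) -> draft
    evaluator: Callable[[str, list[str]], tuple[bool, str]],   # (draft, criteria) -> (passed, feedback)
    task: str,
    success_criteria: list[str],
    max_rounds: int = 3,
) -> str:
    draft, feedback = "", ""
    for _ in range(max_rounds):
        draft = generator(task, feedback)                      # Generator produces output
        passed, feedback = evaluator(draft, success_criteria)  # independent Agent scores it
        if passed:
            return draft                                       # criteria met, stop iterating
    return draft                                               # best effort after max_rounds
```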

Currently, this capability (Self-evaluation) is still in research preview and requires applying for access. But the approach is confirmed: you define the success criteria, and the system judges whether the work was done well.

If your scenario is quality-sensitive, this matters more than anything else.


My Assessment

Writing this, I have a few assessments to share:

Assessment One: Infrastructure being productized is a trend, but "value" is still your own.

Managed Agents solves "how to make AI work continuously." But "what to make AI do," "what level counts as good"—these are your own business problems.

Engineering capability can be productized. Business value cannot.

Assessment Two: Don't reject managed solutions just because "you already have it."

Many teams think, "We can build it ourselves, and ours will be more customized."

But your time is also a cost. Rakuten completed deployment across five departments in a week; could you do the same in three months building it yourself?

Assessment Three: Evaluation is the watershed for Agents.

Agents that don't evaluate are like interns who hand in work—you can't say they didn't work, but you wouldn't trust them with important matters.

Automating "doing well" is the key capability for Agents to level up.


To Conclude

Back to the opening question: 10 minutes vs. three months.

The stuff you spent three months building, Anthropic packaged as a product. This is a trend: Infrastructure is becoming as callable as APIs.

But what's being API-fied is just engineering capability. Your business logic, your domain knowledge, your definition of "good"—these you still have to do yourself.

The best Agent infra is one that makes you forget infra exists. But the premise is, you first need to figure out what to have the Agent do.


References


Based on Anthropic's April 8, 2026 product and engineering documentation.