The safest agent systems are quietly removing a step everyone still talks about

The market is starting to admit something it did not want to admit six months ago: human approval is often a ceremonial control.

That is the phenomenon worth naming. Approval theater is the practice of presenting repeated permission prompts, browser checkpoints, and nominal human signoffs as if they constitute real supervision, even when the human approves almost everything, cannot inspect the underlying action chain, and is not positioned to intervene usefully in the moment that matters.

Three recent signals make the shift visible. Cloudflare now lets an agent run `wrangler deploy --temporary` and deploy a Worker into a temporary account that stays live for 60 minutes unless a human claims it and makes it permanent. The company states the reason plainly: background AI sessions have no human in the loop, and agents need a tight write, deploy, and verify loop. Anthropic reports that users approved roughly 93% of permission prompts and that the more approvals a user sees, the less attention they pay to each. DeepMind frames untrusted AI agents as potential insider threats and describes supervisor systems that can step in to block an action before damage occurs, with metrics such as coverage, recall, and time-to-response.

1. The common diagnosis is wrong

Most teams still explain agent oversight failure in one of two ways. They say users need better training so they will approve less carelessly, or they say the platform needs cleaner prompts so the human can make better decisions. That diagnosis is comforting because it preserves the old story: the interface remains the control surface, the human remains in the loop, and the loop merely needs refinement.

That is not what these signals point to.

If users approve 93% of prompts, the problem is not primarily judgment quality. It is control design. If background agents cannot pause constantly for browser-based approvals, the problem is not primarily awareness. It is workflow physics. If safety work is measuring coverage, recall, and time-to-response for supervisory intervention, the problem is not whether a warning modal appeared. It is whether a containment and interruption system exists at runtime.

Approval theater survives because it flatters everyone involved. Vendors can say a human remained involved. Buyers can say governance exists. Operators can point to a prompt log as evidence that caution happened. But a prompt is not oversight just because a person clicked it.

Acknowledgment is not supervision. Presence is not control. A permission prompt is only meaningful if it sits inside a system that makes the human decision consequential.

2. The mechanism sits below the prompt

The oversight problem is usually described at the interface layer, but the mechanism is lower.

An agent does not simply ask for permission and then perform a single visible action. It moves through a sequence: task framing, context retrieval, tool selection, intermediate outputs, downstream calls, retries, and side effects. Risk accumulates across that chain. The meaningful question is not whether a human saw a checkpoint. The meaningful question is whether the system preserved a legible boundary between reversible work and consequential work, and whether it could stop the run when the sequence began to drift.
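Here is a minimal Python sketch of that chain-level view. It assumes each step can be labeled reversible or consequential and given a rough risk score. The `Step` type, the scores, `supervise_run`, and the `risk_budget` threshold are hypothetical illustrations, not any vendor's API. The supervisor stops the run in two cases: when accumulated risk exceeds the budget, or when a consequential step arrives without a declared checkpoint.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step in an agent run (hypothetical schema)."""
    name: str
    risk: float       # hypothetical per-step risk score, 0.0 to 1.0
    reversible: bool  # True if the step's effects can be undone or expire


def supervise_run(steps, risk_budget=1.0, checkpointed=frozenset()):
    """Walk a run step by step and stop it when the chain drifts.

    Returns (executed_step_names, stop_reason). stop_reason is None if the
    whole run completed. A consequential (non-reversible) step may only run
    if its name appears in `checkpointed`.
    """
    executed = []
    accumulated = 0.0
    for step in steps:
        # Risk is budgeted across the whole chain, not per prompt.
        if accumulated + step.risk > risk_budget:
            return executed, f"risk budget exceeded at {step.name}"
        # Crossing from reversible into consequential work needs a real checkpoint.
        if not step.reversible and step.name not in checkpointed:
            return executed, f"unchecked consequential step: {step.name}"
        accumulated += step.risk
        executed.append(step.name)
    return executed, None
```

Suppose `retrieve_context`, `draft_patch`, and `run_tests` are marked reversible and `deploy_prod` is marked consequential. The run then stops before `deploy_prod` unless that name is passed in `checkpointed`. The budget applies per run rather than per prompt, which matches the idea that risk accumulates across the chain.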

That is why approval theater becomes more dangerous as agents become more capable. A stronger agent does not just generate better text. It compresses more action into each run. It can move faster across tools, chain more steps, and exploit more valid authority in less time. Under those conditions, repeated approvals do not increase control. They often degrade it. The human gets trained into rhythm, not scrutiny.

Anthropic's observation matters here because it exposes the fatigue mechanism directly. The more approvals a user sees, the less attention they pay to each. That is not a minor usability problem. It means the nominal control surface decays as usage scales. The system generates the appearance of caution while steadily reducing the quality of the only human judgment it claims to rely on.

Cloudflare's temporary account pattern reveals a second part of the mechanism. For low-risk iteration, the right question is often not how to interrupt the agent more often. It is how to give the agent a disposable lane where fast action is acceptable because permanence has been deferred. That is a structural solution. The identity expires unless claimed. The work can proceed without constant friction because the substrate is already limiting blast radius.

DeepMind's framing reveals the third part. Once an agent is treated as a potential insider threat, oversight becomes a runtime supervision problem. A supervisor must be able to detect risky action, block it before damage occurs, and measure whether the control actually worked. That is not theater. That is a protocol.

3. Approval theater appears when organizations confuse authorization with supervision

This confusion is older than AI.

Companies have always known that giving someone access is not the same as supervising their work. A login is not management. A role is not review. A written policy is not an intervention path. Human organizations survive because they separate standing authority from high-consequence authority, reversible work from irreversible work, and routine execution from exception handling.

Agents force the same separation into software.

When an organization says it wants a human in the loop, it often means something vague: keep a person near the system so responsibility feels retained. That is not enough. A real loop has to specify where the person enters, what evidence they see, what action they can stop, how quickly they can stop it, and what automatically happens if nobody responds in time. Without those properties, the human is not supervising. The human is decorating the workflow.
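One way to bring those five properties into the open is to make each checkpoint a declared object rather than an ad hoc modal. The Python sketch below is hypothetical: `Checkpoint`, `Outcome`, and `resolve` are illustrative names. Falling back to a predeclared default when the human is silent or answers late is an assumption about good design, not a description of any product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Outcome(Enum):
    PROCEED = "proceed"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical description of one human entry point in an agent run."""
    entry_point: str             # where the person enters the loop
    evidence: Tuple[str, ...]    # what they are shown before deciding
    stoppable_action: str        # the specific action they can stop
    deadline_seconds: float      # how long they have to respond
    default_on_timeout: Outcome  # what happens automatically if nobody responds


def resolve(checkpoint: Checkpoint, decision: Optional[bool], elapsed_seconds: float) -> Outcome:
    """Turn a human response (or silence) into a concrete outcome.

    decision is True for approve, False for reject, None for no response.
    """
    if decision is None or elapsed_seconds > checkpoint.deadline_seconds:
        # Silence or a late answer falls back to the predeclared default.
        return checkpoint.default_on_timeout
    return Outcome.PROCEED if decision else Outcome.BLOCKED
```

The key design choice is declaring `default_on_timeout` up front. For a high-consequence step it would normally be `BLOCKED`, so an absent or distracted human cannot silently turn into an approval.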

This is why the current market language is shifting away from static permission design and toward runtime control, spend controls, replay, containment, and supervisory metrics. The market is slowly learning that oversight has to be engineered where the work actually unfolds.

4. The protocol is simpler than the debate

The practical response does not require a giant ethics program. It requires a clearer operating protocol.

First, stop treating approval prompts as the main control surface. Keep them where they are useful, but demote them. A prompt can support oversight. It cannot be the definition of oversight.

Second, create a temporary lane for cheap, reversible work. If the agent is drafting, testing, deploying into a short-lived environment, or performing exploratory tasks whose outputs can expire safely, optimize for speed inside a disposable boundary. Cloudflare's temporary account design is useful because it turns low-risk experimentation into an expiring identity problem instead of an endless approval problem.

Third, make identity claimable and expiring before it becomes permanent. This is the hinge between iteration and ownership. The agent can move quickly, but permanence requires an explicit human act of adoption. That is a much cleaner control than constant interruption because it concentrates judgment at the moment permanence begins.
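Below is a minimal Python sketch of an expiring, claimable identity, loosely modeled on the Cloudflare pattern described earlier. `ProvisionalResource` and its methods are hypothetical and are not Cloudflare's API. The 60-minute lifetime is borrowed from that example only for illustration, and the caller supplies the clock so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional

TTL_SECONDS = 60 * 60  # illustrative 60-minute lifetime, echoing the Cloudflare example


@dataclass
class ProvisionalResource:
    """Hypothetical agent-created resource that expires unless a human claims it."""
    resource_id: str
    created_at: float                  # seconds, on a caller-supplied clock
    ttl_seconds: float = TTL_SECONDS
    owner: Optional[str] = None        # set only by an explicit human claim

    def is_permanent(self) -> bool:
        return self.owner is not None

    def is_live(self, now: float) -> bool:
        # Claimed resources persist; unclaimed ones lapse at the deadline.
        return self.is_permanent() or now < self.created_at + self.ttl_seconds

    def claim(self, human: str, now: float) -> None:
        """Adopt the resource. This is the single point where permanence begins."""
        if self.is_permanent():
            raise ValueError(f"{self.resource_id} is already owned by {self.owner}")
        if not self.is_live(now):
            raise ValueError(f"{self.resource_id} expired before it was claimed")
        self.owner = human
```

Only `claim` can make the resource permanent, and it refuses to run after expiry. Judgment therefore concentrates at the one moment permanence begins.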

Fourth, move high-risk actions into synchronous controls and containment boundaries. If an action can move money, alter regulated records, expose sensitive data, or trigger external commitments, the system should not rely on attention-intensive prompt review. It should rely on containment, predeclared lanes, and supervisory paths that can block execution before the action lands.
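Here is a minimal Python sketch of a synchronous control. It assumes actions arrive tagged with a category. The category names mirror the examples in the text, while `HIGH_CONSEQUENCE`, `BlockedAction`, `execute`, and the `supervisor` callable are hypothetical. The supervisor check runs before the side effect, and any denial or supervisor failure means the action never runs.

```python
from typing import Callable

# Categories mirror the text's examples; the tagging scheme itself is hypothetical.
HIGH_CONSEQUENCE = {
    "move_money",
    "alter_regulated_record",
    "expose_sensitive_data",
    "external_commitment",
}


class BlockedAction(Exception):
    """Raised when a high-consequence action is stopped before it lands."""


def execute(action_name: str, category: str, perform: Callable[[], object],
            supervisor: Callable[[str, str], bool]) -> object:
    """Run `perform` only after the synchronous controls allow it.

    Low-risk categories run directly. High-consequence categories must pass
    `supervisor` first; if the supervisor denies or errors, `perform` is
    never called (fail closed).
    """
    if category in HIGH_CONSEQUENCE:
        try:
            allowed = supervisor(action_name, category)
        except Exception as exc:
            raise BlockedAction(f"{action_name}: supervisor unavailable") from exc
        if not allowed:
            raise BlockedAction(f"{action_name}: denied by supervisor")
    return perform()
```

Failing closed when the supervisor errors is deliberate. An unavailable control should block consequential work, not wave it through.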

Fifth, measure oversight with runtime metrics, not policy claims. DeepMind's language is useful here because it treats supervision as something measurable: coverage, recall, time-to-response. Those metrics ask the only questions that matter. Did the system see the risky behavior? Did it catch enough of it? Did it intervene fast enough to matter?
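The three questions can be computed from an action log. The definitions below are assumptions for illustration, not DeepMind's published formulas:

- coverage is the share of actions the supervisor observed;
- recall is the share of risky actions it intervened on;
- time-to-response is the median latency of those interventions.

`ActionRecord` and `oversight_metrics` are hypothetical names, and the `risky` label is assumed to come from later review.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class ActionRecord:
    """One logged agent action (hypothetical schema)."""
    observed: bool       # did the supervisor see this action at all?
    risky: bool          # ground-truth label assigned in later review
    intervened: bool     # did the supervisor block or interrupt it?
    response_seconds: Optional[float] = None  # action start to intervention


def oversight_metrics(log: List[ActionRecord]) -> Dict[str, Optional[float]]:
    """Return coverage, recall, and median time-to-response for a log."""
    risky = [r for r in log if r.risky]
    caught = [r for r in risky if r.intervened]
    latencies = [r.response_seconds for r in caught if r.response_seconds is not None]
    return {
        # Did the system see the risky behavior?
        "coverage": sum(r.observed for r in log) / len(log) if log else None,
        # Did it catch enough of it?
        "recall": len(caught) / len(risky) if risky else None,
        # Did it intervene fast enough to matter?
        "median_time_to_response": median(latencies) if latencies else None,
    }
```

Each metric returns `None` when it is undefined (an empty log, no risky actions, or no recorded latencies). It does not report a reassuring number the data cannot support.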

That is the protocol. It is not glamorous. It is operational.

5. What this looks like in practice

A useful way to think about agent oversight is to divide work into lanes.

Lane one is temporary and reversible. The agent can write, test, inspect, stage, and verify inside an expiring identity or sandbox. The goal is not to force a person to babysit every step. The goal is to make the steps cheap enough that speed does not create unacceptable downside.

Lane two is claimable. Something valuable has been produced, but it is not yet permanent. A human adopts the artifact, the account, the configuration, or the decision point. This is where accountability attaches. The claim is not a decorative click. It is the transfer from provisional action into durable ownership.

Lane three is high consequence. The action affects customers, money, regulated data, or critical systems. Here the control surface changes. The system needs hard boundaries, not soft reminders. It needs containment, narrow authority, and supervisory interruption that can actually stop the run.
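The following is a minimal Python sketch of routing each proposed action into one of the three lanes. The `Lane` enum, the `ProposedAction` flags, and the `route` function are hypothetical. A real system would presumably derive the flags from tool metadata and policy rather than trust the agent to set them.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    TEMPORARY = 1         # lane one: reversible, expiring, no babysitting
    CLAIMABLE = 2         # lane two: a human adopts it before it becomes durable
    HIGH_CONSEQUENCE = 3  # lane three: containment and interruptible supervision


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical flags an orchestrator might attach to each agent action."""
    name: str
    affects_critical_assets: bool  # customers, money, regulated data, or critical systems
    becomes_durable: bool          # would outlive its sandbox or expiring identity


def route(action: ProposedAction) -> Lane:
    # Check the most severe condition first so nothing high-consequence
    # is ever mistaken for cheap, reversible work.
    if action.affects_critical_assets:
        return Lane.HIGH_CONSEQUENCE
    if action.becomes_durable:
        return Lane.CLAIMABLE
    return Lane.TEMPORARY
```

The ordering matters: the high-consequence condition is checked first. Anything that touches customers, money, regulated data, or critical systems therefore lands in lane three even if it also looks like claimable work. Each lane can then carry its own control (expiry, a claim, or containment with a blocking supervisor) instead of sharing one prompt pattern.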

That three-lane model explains why approval theater feels so common right now. Many organizations are trying to run all three lanes through the same interface pattern. They are using prompts where expiration would work better, using prompts where claimable ownership would work better, and using prompts where only containment and intervention are serious enough.

The result is the worst of both worlds: too much friction in low-risk work and too little control in high-risk work.

6. Why this matters now

The agent market is entering a phase where delegated action matters more than model novelty. OpenAI's recent enterprise messaging around usage analytics, spend controls, and record-and-replay points in the same direction. Once a system is acting on behalf of a business, observability, spending boundaries, replayability, and intervention stop being secondary features. They become deployment conditions.

That is the broader meaning of approval theater. It is a category name for a governance pattern that breaks under production conditions.

Once named, it becomes easier to design against.

The contribution here is not the claim that humans matter. Of course they do. The contribution is a more precise statement of where humans matter. Human judgment should concentrate at the transitions that change risk class: from temporary to permanent, from reversible to irreversible, from delegated execution to supervisory intervention. Everywhere else, the substrate should be doing more of the control work than the interface.

That is how agent oversight becomes real. Not by asking for approval more often, but by building systems where approval is rare, legible, and decisive.

Sources

Cloudflare, “Temporary Cloudflare Accounts for AI agents,” published June 19, 2026. https://blog.cloudflare.com/temporary-accounts/
Google DeepMind, “Securing the future of AI agents,” published June 18, 2026. https://deepmind.google/blog/securing-the-future-of-ai-agents/
Anthropic Engineering, “How we contain Claude across products,” published May 25, 2026. https://www.anthropic.com/engineering/how-we-contain-claude
OpenAI, “New usage analytics and updated spend controls for enterprises,” referenced in OpenAI News RSS on June 18, 2026. https://openai.com/index/chatgpt-enterprise-spend-controls
OpenAI YouTube, “What Codex Unlocks for NTT Data,” uploaded June 19, 2026. https://www.youtube.com/watch?v=0JIbgZ544wU
OpenAI YouTube, “Record & Replay in Codex,” uploaded June 18, 2026. https://www.youtube.com/watch?v=ZK3JhU73W18

Stephen Nickerson.
Built for operators who need AI agents they can test, trust, and improve.