The Agent Resume Contract

The newest production-agent worry is not that the model gives a bad answer. That problem is obvious enough to review, reject, or fix.

The more dangerous failure is quieter: the agent does the right action twice.

It sends the second email because the first result was not recorded. It updates the same record again because a timeout hid the successful tool call. It waits for a human approval, then restarts the work instead of resuming from the approved step. It retries an operation that was safe to attempt once and unsafe to repeat. The demo still looks impressive because the happy path ran cleanly. The production workflow breaks because real work is interrupted.

That is why the phrase worth watching today is not “agentic.” It is “resume logic.”

A public Product Hunt listing for Agentspan, an open-source runtime for durable AI agents, named the cluster cleanly: “state loss,” “human approvals needing resume logic,” “tool calls needing auditing,” and “retries causing repeated side effects.” Product Hunt pages can be noisy, but that language is useful because it is not abstract governance language. It is operator language. It names the mess that appears once agents stop being demos and start touching workflows with money, records, customers, deadlines, and approvals.

Human-in-the-loop is too small a phrase for that problem. A person approving an output does not automatically make the system safe. The workflow has to know what it is resuming.

A pause is not control

Most teams talk about human review as if it were a final checkpoint. The agent drafts something, a human approves it, and then the system proceeds. That is fine for simple content review. It is not enough for production work.

Production work has state. Something may already have happened before the human saw the approval request. A record may have been read. A price may have been calculated. A payment link may have been generated. A document may have been validated against one version of a rule. A customer may have received a notification. If the human approves six hours later, the question is not only “Do we approve?” The question is “What exact state are we approving from?”

Mistral put the same requirement into enterprise language in its April Workflows announcement. The company described the production gap as a lack of “durability, observability, and fault tolerance” needed to move AI-powered processes from proof of concept into production. It also named the recurring failure modes: long-running processes that cannot survive a network timeout, multi-step operations that need human approval mid-execution but have no way to pause and resume, and systems that cannot verify what they are doing after deployment.

That is the real bar. A production agent has to survive interruption without losing its place. It has to pause without forgetting what already happened. It has to resume without replaying side effects. It has to leave a trail a responsible person can inspect.

A human approval is only control if it is attached to a recoverable state.
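
One way to make that attachment concrete is to store a fingerprint of the state the approver was shown, and to check it again before acting. The sketch below is a minimal illustration, not a reference to any particular runtime: the `Approval` record, the `fingerprint` helper, and the field names are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(state: dict) -> str:
    """Stable hash of the workflow state the approver was shown."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    # Hypothetical record; a real system would also store a timestamp.
    approver: str
    next_action: str
    state_fingerprint: str


def approval_still_valid(approval: Approval, current_state: dict, next_action: str) -> bool:
    """The approval applies only to the same action from the same state."""
    return (
        approval.next_action == next_action
        and approval.state_fingerprint == fingerprint(current_state)
    )


# State shown to the manager at approval time (illustrative values).
shown = {"draft_version": 3, "recipient": "buyer@example.com", "stage": "proposal"}
approval = Approval("manager-1", "send_email", fingerprint(shown))

# Hours later, the opportunity has moved stages.
changed = {**shown, "stage": "closed_lost"}
unchanged_ok = approval_still_valid(approval, dict(shown), "send_email")
changed_ok = approval_still_valid(approval, changed, "send_email")
```

If the state moved while the request sat in a queue, the fingerprints differ and the workflow has a reason to ask again instead of proceeding on a stale approval.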

The repeated side effect is the tell

The repeated side effect is the cleanest way to tell whether an agent workflow is production-shaped or demo-shaped.

A demo can restart from the beginning. A production workflow often cannot. If the agent already sent the email, charged the card, submitted the form, changed the CRM stage, released the shipment, or posted the update, repeating the step is not “retrying.” It is creating a second business event.

This is where normal software reliability and agent reliability meet. AWS and Temporal made the point in their April post on orchestrating agents at scale: agentic AI amplifies distributed-systems problems because modern agent applications may run for extended periods, coordinate specialized agents, call external systems, produce non-deterministic outputs, include human approvals, and preserve conversation history across failures. Their conclusion was blunt: traditional error handling becomes unmaintainable.

That matters because a lot of agent projects still behave as if model intelligence can carry runtime responsibility. It cannot. The model can decide what it thinks should happen next. It cannot, by itself, guarantee that the previous tool call was recorded, that a timeout did not hide a successful action, that a later retry is safe, or that the human approval belongs to the same version of the work.

Google’s Gemini/Temporal documentation points at the same boundary from a developer angle. The indexed snippet for its durable agent example says to disable the SDK’s built-in retries because Temporal handles retries durably. That small implementation detail contains a larger operating lesson: retries belong in the workflow state, not in a loose agent loop that may repeat a business action because it lost the result.

The question for an operator is simple. If the agent fails halfway through a meaningful action, can the system tell the difference between “try again” and “continue from the completed step”?

If it cannot, the agent is not ready for that action.
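
A minimal sketch of that distinction, assuming a journal of completed step results keyed by a step ID. The `StepJournal` class here is an in-memory stand-in invented for illustration; a real runtime would persist the journal durably. It also covers only the easy case where the result was recorded. An action that succeeded without its result being written down still needs a receipt lookup against the external system.

```python
from typing import Any, Callable, Dict, List


class StepJournal:
    """In-memory stand-in for a durable record of completed steps."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run_once(self, step_id: str, action: Callable[[], Any]) -> Any:
        if step_id in self._results:
            # Continue from the completed step: reuse the recorded result.
            return self._results[step_id]
        # No record of completion, so this is a genuine attempt.
        result = action()
        self._results[step_id] = result
        return result


calls: List[str] = []


def send_email() -> str:
    # Hypothetical side effect; appends to a list instead of sending mail.
    calls.append("sent")
    return "msg-001"


journal = StepJournal()
first = journal.run_once("send_followup", send_email)
# Simulate a restart that replays the workflow from the top.
second = journal.run_once("send_followup", send_email)
```

The replay returns the recorded message ID and the side effect runs once, which is the behavior a loose agent loop cannot promise on its own.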

The missing artifact is a resume contract

A resume contract is the operating agreement that tells an agent workflow how to stop and start again without corrupting the work.

It is not a legal contract. It is a runtime contract. It says what must be checkpointed before a side effect, what evidence proves the side effect happened, what a human approval applies to, what may be retried safely, what must never be retried automatically, and what condition forces escalation instead of continuation.

The contract does not need to be complicated. It needs to be explicit.

For every meaningful agent action, the resume contract should answer seven questions:

What is the last safe checkpoint before this action?
What external side effect can this action create?
What receipt proves the side effect happened?
Is this action safe to retry as-is, or must it be made idempotent with a unique key?
If human approval is required, exactly what state and next action is the human approving?
What data must be refreshed before resuming after a delay?
What failure condition stops the agent and routes the work to a human owner?
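
The seven questions above can be written down as a small record per action. This is a sketch under stated assumptions: `ResumeContract`, `RetryPolicy`, and every field name are hypothetical, chosen only to mirror the questions, and the sample values come from the sales follow-up scenario rather than any real system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RetryPolicy(Enum):
    SAFE = "safe"                        # repeating creates no new side effect
    IDEMPOTENCY_KEY = "idempotency_key"  # repeat only under the same unique key
    NEVER = "never"                      # never retried automatically


@dataclass(frozen=True)
class ResumeContract:
    action: str
    checkpoint: str                 # last safe checkpoint before the action
    side_effect: str                # what the action changes externally
    receipt: str                    # evidence that the side effect happened
    retry: RetryPolicy              # retry rule for this action
    approval_scope: Optional[str]   # state and next action approved, if any
    refresh_before_resume: List[str] = field(default_factory=list)
    escalate_when: str = ""         # condition that routes work to a human

    def gaps(self) -> List[str]:
        """Names of required answers that were left blank."""
        required = ("checkpoint", "side_effect", "receipt", "escalate_when")
        return [name for name in required if not getattr(self, name).strip()]


send_contract = ResumeContract(
    action="send_followup_email",
    checkpoint="approved draft stored with its version",
    side_effect="customer receives an email",
    receipt="provider message ID",
    retry=RetryPolicy.IDEMPOTENCY_KEY,
    approval_scope="draft version 3 to the named recipient",
    refresh_before_resume=["opportunity owner", "opportunity stage"],
    escalate_when="owner or stage changed since approval",
)

incomplete = ResumeContract(
    action="update_crm",
    checkpoint="email receipt recorded",
    side_effect="CRM stage changes",
    receipt="",
    retry=RetryPolicy.NEVER,
    approval_scope=None,
)
```

Calling `gaps()` on the incomplete contract names the unanswered questions, which makes a missing receipt or escalation rule visible before the action ever runs.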

Those questions are boring in the best possible way. They turn “human in the loop” from a comforting phrase into an inspectable mechanism. They also force the team to name the difference between thinking, drafting, deciding, and acting.

An agent can think twice. It can draft twice. It should not charge twice, send twice, file twice, or update a production record twice unless the workflow has intentionally defined that as safe.

What this looks like in practice

Take a sales follow-up agent. The demo version reads a call transcript, drafts a follow-up email, waits for manager approval, sends the email, and updates the CRM. That looks controlled because a human approved the email.

The production version has to be more precise. Before the send step, the workflow records the message version, recipient, source transcript, approval identity, approval timestamp, and unique send key. If the send API times out, the system checks whether that send key already produced a message ID before trying again. If the manager approves the draft eight hours later, the agent refreshes the account state before sending. If the opportunity has changed owner or moved stages during the delay, the workflow stops and routes the work back to the owner.
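
The send step described above might be sketched as follows. Everything here is an assumption for illustration: `FakeMailer` is a hypothetical provider that can look up a message ID by send key, `AccountState` stands in for whatever the CRM returns on refresh, and the timeout is simulated to show a response lost after a successful send.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AccountState:
    owner: str
    stage: str


class FakeMailer:
    """Hypothetical email provider that remembers message IDs by send key."""

    def __init__(self, timeout_on_first_send: bool = False) -> None:
        self.sent: Dict[str, str] = {}
        self._timeout_next = timeout_on_first_send

    def lookup(self, send_key: str) -> Optional[str]:
        return self.sent.get(send_key)

    def send(self, send_key: str, body: str) -> str:
        if send_key not in self.sent:
            self.sent[send_key] = f"msg-{len(self.sent) + 1}"
        if self._timeout_next:
            # The message went out, but the caller never sees the response.
            self._timeout_next = False
            raise TimeoutError("response lost after the send was accepted")
        return self.sent[send_key]


def send_approved_email(
    mailer: FakeMailer,
    send_key: str,
    body: str,
    approved: AccountState,
    current: AccountState,
    max_attempts: int = 2,
) -> Tuple[str, str]:
    """Return ("sent", message_id) or ("escalate", reason)."""
    # `current` is the account state refreshed just before sending.
    if current != approved:
        return ("escalate", "account changed since approval")
    for _ in range(max_attempts):
        existing = mailer.lookup(send_key)
        if existing is not None:
            return ("sent", existing)  # an earlier attempt already succeeded
        try:
            return ("sent", mailer.send(send_key, body))
        except TimeoutError:
            continue  # check for a receipt before sending again
    return ("escalate", "could not confirm send")


state = AccountState(owner="rep-a", stage="proposal")
mailer = FakeMailer(timeout_on_first_send=True)
outcome = send_approved_email(mailer, "send-key-1", "Thanks for the call.", state, state)
```

The design choice worth noticing is the order of operations: the receipt lookup comes before every send, so a hidden success is discovered instead of repeated.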

That is not bureaucracy. That is what makes the agent safe enough to use.

Now take a legal-aid intake assistant. The demo version collects facts, organizes documents, drafts next-step guidance, and sends the person toward a form or referral. The production version has to know which facts were collected, which jurisdiction and date were used, which source rule applied, which advice boundary is in force, which human or legal-review gate is required, and whether the person changed a critical fact before the next step. If the workflow pauses for review, the reviewer is not approving a vague “case summary.” They are approving a specific next action from a named state.

The same pattern applies in healthcare coaching, operations, finance, and professional services. The more meaningful the action, the more important the resume contract becomes. Interruption is not an edge case. It is the normal condition of business work.

Stephen’s operating view

The market keeps asking whether agents can work autonomously. That question is too broad to be useful.

The sharper question is whether the work can be paused, inspected, approved, resumed, and recovered without losing the thread or repeating the action. That is where agent theater separates from operating capacity.

Stephen’s frame on AI agents that actually work has always been practical: define the job, define the product, define the routing, define the authority, and put review where it matters. A resume contract is the runtime version of that discipline. It gives the agent permission to continue only from a state the business can understand.

This also changes how a team should buy or build agent systems. Do not start by asking which agent platform has the most impressive reasoning loop. Ask what happens when the loop is interrupted. Ask how approvals bind to state. Ask where tool results are recorded. Ask which actions are safe to retry. Ask what proof exists that the workflow resumed correctly.

A system that cannot answer those questions may still be useful for drafting, research, summarization, or low-risk internal assistance. It is not yet a production agent for consequential work.

The protocol

Before putting an agent into a real workflow, map the side effects first.

List every action the agent can take that changes the world outside the model: send, submit, update, charge, delete, schedule, notify, approve, route, file, publish, or escalate. For each one, write the resume contract before deployment. The model prompt is downstream of that contract. The workflow runtime enforces it.
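
A pre-deployment check along these lines can be very small. The sketch below assumes each tool is tagged with whether it changes anything outside the model; the `Tool` record, the tool names, and the `contracts` mapping are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tool:
    name: str
    has_side_effect: bool


def missing_contracts(tools: List[Tool], contracts: Dict[str, str]) -> List[str]:
    """Side-effecting tools that have no resume contract written yet."""
    return [t.name for t in tools if t.has_side_effect and t.name not in contracts]


tools = [
    Tool("read_transcript", has_side_effect=False),
    Tool("send_email", has_side_effect=True),
    Tool("update_crm", has_side_effect=True),
]
# Maps a tool name to a short description of its contract (illustrative).
contracts = {"send_email": "checkpoint, send key, message ID receipt"}

blocked = missing_contracts(tools, contracts)
```

Here `update_crm` would block deployment until someone writes its contract, while the read-only tool needs none.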

The protocol is plain:

Name the agent’s job and final work product.
List every external side effect in the workflow.
Put a checkpoint before each side effect.
Record a receipt after each side effect.
Bind every human approval to a specific state and next action.
Make retries idempotent or require human re-entry.
Refresh stale data before resuming after a delay.
Route ambiguous recovery to a named human owner.

This is not anti-agent. It is the condition that lets agents do more useful work.

The agent that can only run while everything goes right is still a demo. The agent that can stop safely and continue correctly is becoming part of the operating system.

The resume contract is the difference.

Stephen Nickerson.
Built for operators who need AI agents they can test, trust, and improve.