AI Agents Need Job Descriptions

_This article was researched and drafted by Mike, AI Chief of Staff for Stephen Nickerson._

This article has one job: make AI agents less mysterious and more manageable.

If you are trying to use an AI agent in a business, the first failure usually does not come from the model. It comes from the assignment. The agent is told to “help with marketing,” “handle customer support,” “research competitors,” or “run operations.” Those are not jobs. They are vague assignments. A vague assignment gives the agent room to drift, and drift is where the expensive mistakes begin.

A real job has a product. It has inputs. It has limits. It has a handoff. It has a point where the worker stops and asks for judgment.

An AI agent needs the same thing.

What an agent actually is

An AI agent is not just a chatbot with a nicer name. A chatbot answers. An agent carries a workflow.

OpenAI’s plain definition is useful here: agents are “systems that independently accomplish tasks on your behalf.” That independence is the point, but it is also the risk. The more freedom the agent has, the more clearly the work has to be defined.

Anthropic draws a helpful distinction between workflows and agents. Workflows follow predefined code paths. Agents “dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.” In normal language: a workflow follows the track; an agent can choose the track.

That means the question is not, “Can the model do this?”

The better question is, “Have we defined the job well enough that the agent can do this without creating new problems?”

The management failure hiding inside the technology failure

When an employee misses expectations, a good manager does not begin by asking whether the employee is intelligent. They ask whether the assignment was clear.

What outcome was expected?

What information was available?

What authority did the person have?

What counted as done?

When should they have escalated?

AI agents deserve the same management discipline. Not because they are people. Because they are being put into work systems that already depend on management discipline.

A vague agent turns into a content generator. It produces words because words are the easiest visible output. A loosely scoped agent turns into a wandering assistant. It follows interesting trails because no one gave it a finish line. An over-permissioned agent turns into a business risk. It can take actions without enough context, review, or accountability.

That is not an AI breakthrough. That is bad delegation with an API key.

The job description an agent needs

Before an agent gets tools, memory, automations, or access to customer data, write its job description in plain English.

It should answer seven questions.

What is the mission?
What product is the agent responsible for producing?
What inputs is it allowed to use?
What tools is it allowed to call?
What output must it create?
Where does that output go next?
When must it stop and ask for human review?

These are not philosophical questions. They are operating controls.

If the mission is unclear, the agent will optimize for whatever the prompt seems to reward. If the product is unclear, you will get activity instead of production. If the inputs are unclear, the agent will mix strong evidence with weak context. If the tools are unclear, the agent may reach for actions it should not take. If the output is unclear, the next person cannot use what the agent produced. If the handoff is unclear, the work dies in the chat window. If the stop condition is unclear, the agent may keep going past the point where human judgment was needed.
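One way to make the seven questions hard to skip is to write them down as a structured record and refuse to proceed while any answer is blank. The sketch below is a minimal illustration in Python. The class name `AgentJobDescription`, its field names, and the `missing_answers` helper are assumptions made for this example, not part of any agent framework.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass
class AgentJobDescription:
    """One field per question. All names here are illustrative."""
    mission: str = ""
    product: str = ""
    allowed_inputs: Tuple[str, ...] = ()
    allowed_tools: Tuple[str, ...] = ()
    required_output: str = ""
    handoff: str = ""
    stop_conditions: Tuple[str, ...] = ()


def missing_answers(job: AgentJobDescription) -> List[str]:
    """Return the names of the questions that were left blank."""
    return [f.name for f in fields(job) if not getattr(job, f.name)]


# A vague assignment answers only the first question.
draft = AgentJobDescription(mission="Help us with sales.")
print(missing_answers(draft))
```

A record like this does not judge whether the answers are good. It only makes a missing answer visible before the agent gets tools or data.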

Here is a simple example.

Bad assignment: “Help us with sales.”

Better assignment: “Read new inbound contact forms, classify each lead by fit, draft a short first response in our voice, and send the draft to the sales owner for approval. Do not send externally. Stop and flag the lead if budget, authority, timeline, or need is unclear.”

That second version is not glamorous. It is usable. It gives the agent a product, a route, and a boundary.
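The boundary in the better assignment can be made checkable. As a rough sketch, assume a contact form arrives as a dictionary with the hypothetical keys `budget`, `authority`, `timeline`, and `need`. The function name `triage_lead` and the route labels it returns are invented for illustration, and the steps of classifying fit and drafting the reply are left out.

```python
from typing import Dict, List, Optional

# The four things the assignment says must be clear before drafting.
REQUIRED_FIELDS = ("budget", "authority", "timeline", "need")


def triage_lead(form: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Decide the route for one inbound form. Nothing here sends externally."""
    unclear: List[str] = [
        name for name in REQUIRED_FIELDS
        if not (form.get(name) or "").strip()
    ]
    if unclear:
        # Stop condition: hand the lead to a human instead of guessing.
        return {"route": "flag_for_review", "unclear": unclear}
    # Otherwise the only permitted next step is a draft for approval.
    return {"route": "draft_for_sales_owner", "unclear": []}


lead = {"budget": "20k", "authority": "VP Sales", "timeline": "", "need": "CRM cleanup"}
print(triage_lead(lead))
```

In this sketch "unclear" means missing or blank. A real deployment would need its own definition, but the shape stays the same: the stop rule is checked in code, and there is no code path that sends anything to the customer.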

Why this matters more as agents get more powerful

The strongest public guidance from major AI builders does not say, “Just make the prompt better and trust the magic.” It points in the opposite direction.

Anthropic’s guidance on building effective agents says successful implementations tend to use “simple, composable patterns rather than complex frameworks.” It also recommends “finding the simplest solution possible, and only increasing complexity when needed.” That matters because complexity hides responsibility. A simple agent with a clear job is easier to test, easier to inspect, and easier to improve.

Anthropic also notes that agents need ground truth from the environment during execution and may need to pause for human feedback at checkpoints. That is practical agent design: give the system a task, let it work, check reality, and bring in judgment when the situation calls for it.

NIST’s AI Risk Management Framework points in the same direction from a trust and risk perspective. It says trustworthy AI systems include characteristics such as being “valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed.” Read as operating requirements rather than slogans, those are conditions of responsible use that a team can define and check.

Microsoft’s security guidance is even more direct for autonomous agentic systems. It warns that these systems can “plan, invoke tools, access data, and execute actions with limited human intervention,” and recommends explicit action schemas, narrow permissions, logging, observability, and deterministic human review for high-risk or irreversible actions.
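To make that recommendation concrete, here is a minimal sketch of an action allowlist in Python. It is not Microsoft's API or schema. The `ActionSpec` record, the action names, and the `authorize` function are hypothetical, chosen only to show allowed actions, required inputs, risk levels, and logging working together.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.actions")


@dataclass(frozen=True)
class ActionSpec:
    required_inputs: Tuple[str, ...]
    risk: str  # "low" or "high" in this sketch


# Anything not listed here is denied by default.
ALLOWED_ACTIONS: Dict[str, ActionSpec] = {
    "draft_reply": ActionSpec(("lead_id", "body"), risk="low"),
    "send_email": ActionSpec(("lead_id", "body"), risk="high"),
}


def authorize(action: str, inputs: Dict[str, str], human_approved: bool = False) -> str:
    """Return 'allow', 'needs_review', or 'deny', and log the decision."""
    spec = ALLOWED_ACTIONS.get(action)
    if spec is None or any(key not in inputs for key in spec.required_inputs):
        decision = "deny"
    elif spec.risk == "high" and not human_approved:
        decision = "needs_review"
    else:
        decision = "allow"
    log.info("action=%s decision=%s", action, decision)
    return decision


print(authorize("send_email", {"lead_id": "42", "body": "Hello"}))
```

The review gate here is a plain conditional, not something the model decides. That is one reading of what deterministic human review means in practice.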

So the practical lesson is simple: the more an agent can do, the less casual its job definition can be.

The test: could a human do the job from the same instructions?

A clean way to evaluate an agent assignment is to ask whether a competent human could do the job from the same instructions.

If the answer is no, the agent is not ready.

Not because humans and agents work the same way. They do not. The test works because unclear delegation fails before intelligence matters.

Try it with a proposed agent:

Could a new employee tell what final product they owe you?
Could they tell what information they are allowed to trust?
Could they tell what they are forbidden to do?
Could they tell who receives the work next?
Could they tell when they must stop?

If those answers are missing, you do not have an agent problem yet. You have a management problem.

Evals are part of the job description

A job description is incomplete unless you know how the work will be judged.

Anthropic’s article on agent evaluations defines an evaluation as “a test for an AI system: give an AI an input, then apply grading logic to its output to measure success.” It also says evals force product teams to specify what success means for the agent.

That is exactly the point. If you cannot define success, you cannot responsibly automate the work.

For a sales lead agent, success might mean:

classified the lead correctly;
used only approved source fields;
drafted in the right voice;
did not invent details;
routed to the right person;
stopped when required information was missing.

For a research agent, success might mean:

used current sources;
included exact URLs and publication dates;
separated facts from interpretation;
flagged uncertainty;
produced a decision-ready brief instead of a pile of links.

The evaluation does not have to be complicated at the start. It does have to exist.
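As one illustration of how small a first eval can be, the sketch below grades a single sales lead result against four of the sales lead criteria listed earlier. The shapes of the `expected` and `actual` records, the `APPROVED_FIELDS` set, and the `grade_lead_result` function are assumptions for this example. Criteria such as voice or invented details would need a human or model grader and are not covered here.

```python
from typing import Dict

# Hypothetical list of form fields the agent is allowed to rely on.
APPROVED_FIELDS = {"name", "company", "message", "budget", "timeline"}


def grade_lead_result(expected: Dict, actual: Dict) -> Dict[str, bool]:
    """Apply simple pass/fail checks to one agent output."""
    return {
        "classified_correctly": actual["fit"] == expected["fit"],
        "approved_fields_only": set(actual["fields_used"]) <= APPROVED_FIELDS,
        "routed_correctly": actual["routed_to"] == expected["routed_to"],
        "stopped_when_required": actual["stopped"] == expected["should_stop"],
    }


expected = {"fit": "high", "routed_to": "sales_owner", "should_stop": False}
actual = {
    "fit": "high",
    "fields_used": ["name", "company", "linkedin_bio"],  # one unapproved field
    "routed_to": "sales_owner",
    "stopped": False,
}

scorecard = grade_lead_result(expected, actual)
print(scorecard)
print("passed:", all(scorecard.values()))
```

Running a handful of labeled examples through a function like this after every prompt or tool change is enough to show whether the agent still does the job it was given.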

What to do before building the next agent

Do not start with the model. Start with the role.

Write the agent’s job as if you were hiring for it tomorrow.

Name the product. Name the inputs. Name the allowed tools. Name the route. Name the stop conditions. Name the scorecard.

Then build the smallest agent that can produce that product reliably.

That is how agents become useful. Not by sounding impressive. By carrying a defined piece of work from start to finish without making the business harder to manage.

Sources

OpenAI, “A practical guide to building agents,” PDF created April 7, 2025. URL: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf. Quote: “Agents are systems that independently accomplish tasks on your behalf.” Quote: “Clear instructions reduce ambiguity and improve agent decision-making, resulting in smoother workflow execution and fewer errors.”
Anthropic, “Building effective agents,” published December 19, 2024. URL: https://www.anthropic.com/engineering/building-effective-agents. Quote: “Consistently, the most successful implementations use simple, composable patterns rather than complex frameworks.” Quote: “Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.”
Anthropic, “Demystifying evals for AI agents,” published January 9, 2026. URL: https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents. Quote: “An evaluation (‘eval’) is a test for an AI system: give an AI an input, then apply grading logic to its output to measure success.” Quote: “Early on, evals force product teams to specify what success means for the agent.”
NIST AI Resource Center, “AI Risks and Trustworthiness,” excerpt from the NIST AI Risk Management Framework 1.0, 2023. URL: https://airc.nist.gov/airmf-resources/airmf/3-sec-characteristics/. Quote: “Characteristics of trustworthy AI systems include: valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed.”
NIST, “AI Risk Management Framework,” released January 26, 2023. URL: https://www.nist.gov/itl/ai-risk-management-framework. Quote: “The NIST AI Risk Management Framework (AI RMF) is intended for voluntary use and to improve the ability to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems.”
Microsoft Learn, “Secure autonomous agentic AI systems,” dated March 19, 2026. URL: https://learn.microsoft.com/en-us/security/zero-trust/sfi/secure-agentic-systems. Quote: “Autonomous agentic AI systems can plan, invoke tools, access data, and execute actions with limited human intervention.” Quote: “Define allowed actions, required inputs, risk levels, execution constraints, and logging requirements.”

Stephen Nickerson.
Built for operators who need AI agents they can test, trust, and improve.