The OpenAI Agents SDK’s needs_approval flag works exactly as advertised. Set it on a tool, the run pauses, RunResult.interruptions surfaces the pending call, and you resume after calling state.approve(). The OpenAI Agents SDK human-in-the-loop problem surfaces in production, not in development, because in development the developer is the approver.
You’re in the same process. You see the interruption. You approve it and the run continues.
Then you deploy. The agent calls a tool flagged with needs_approval. It waits. Your approver — the finance lead, the legal reviewer, the customer success manager — has no idea anything is pending. The thread sits until someone notices the workflow has stalled, traces the logs, and either manually resumes or restarts the whole run.
This is the gap between a working demo and a working approval system. This article maps it precisely.
Key takeaways
- `needs_approval` pauses tool execution and exposes pending calls via `RunResult.interruptions` — it does not notify, time out, escalate, or produce an audit record.
- Building the surrounding infrastructure (notifications, timeouts, escalation, audit log) involves seven distinct components and represents real engineering work.
- Replacing the approval call with a single SDK method adds all four missing capabilities without changing the agent or tool structure.
- Both patterns work in the same agent: `needs_approval` for in-session decisions, a dedicated approval API for out-of-band approvals.
- `dev_mode=True` lets you test the full approval flow without reaching your approver during development.
What OpenAI Agents SDK needs_approval gives you
The mechanics are documented in depth in the official human-in-the-loop guide. The short version:
@function_tool(needs_approval=True) marks a tool for unconditional approval. Pass an async callable to make approval conditional — useful when only high-value or sensitive calls need a human check:
```python
from agents import Agent, Runner, function_tool

async def requires_review(ctx, params, tool_call_id) -> bool:
    return params.get("amount", 0) > 1000

@function_tool(needs_approval=requires_review)
async def send_payment(recipient: str, amount: float) -> str:
    return f"Payment of ${amount} sent to {recipient}"
```
When the tool fires and approval is required, the run pauses. RunResult.interruptions contains ToolApprovalItem entries with the agent name, tool name, and arguments. Call result.to_state() to capture a resumable RunState, approve or reject each item, and pass the state back to Runner.run() to continue.
State serialises to JSON with state.to_json() and restores with RunState.from_json(). This makes long-running approvals possible: serialize to a database, restore when the approver responds, resume the run. The SDK also traces every run automatically — tool calls, arguments, model reasoning, and timing are all captured.
needs_approval is available on function_tool, Agent.as_tool, ShellTool, ApplyPatchTool, and both local and hosted MCP tools. Sticky decisions (always_approve=True passed to state.approve()) persist through serialization.
This is the right mechanism for:
- In-session approvals where the user is active in the same application
- Developer-facing tools where you are the reviewer
- Flows where the approval decision can be computed in code
The limitation appears when the approver is somewhere else. That is where OpenAI Agents SDK human in the loop requirements exceed what the SDK ships with.
What needs_approval doesn’t include
When the approver is not watching the same terminal, the OpenAI Agents SDK human in the loop flow has three significant gaps. The official guardrails guide makes this explicit: “decision logic, review interfaces, persistence, and policy enforcement” are left to the developer.
No notification. needs_approval pauses execution and surfaces an interruption. No email is sent. No Slack message fires. If your approver is Priya in compliance and she needs to sign off before the agent files a regulatory report, you need to build that notification layer yourself. There is no built-in channel to tell her the agent is waiting.
No timeout or escalation. Paused runs wait indefinitely. There is no built-in mechanism to say: “If nobody responds within 30 minutes, escalate to the backup approver. If nobody responds in two hours, auto-deny and log the expiry.” GitHub issue #636, opened by developers calling for production-grade HITL, explicitly confirms the absence: “no timeout or escalation mechanisms when human input isn’t received within expected windows.” Getting this requires external scheduling logic, retry handling, fallback routing, and a way to resume the run with a denial.
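The scheduling logic itself is not complicated; the work is in wiring it to notifications and resume. A sketch of the sweep policy described above, using the 30-minute and two-hour windows from the example (thresholds are illustrative):

```python
from datetime import datetime, timedelta

ESCALATE_AFTER = timedelta(minutes=30)
AUTO_DENY_AFTER = timedelta(hours=2)

def next_action(requested_at: datetime, now: datetime) -> str:
    """Decide what a periodic sweep should do with one pending approval."""
    age = now - requested_at
    if age >= AUTO_DENY_AFTER:
        return "auto_deny"   # resume the run with a rejection and log the expiry
    if age >= ESCALATE_AFTER:
        return "escalate"    # notify the backup approver
    return "wait"            # still inside the response window

t0 = datetime(2026, 1, 1, 9, 0)
print(next_action(t0, t0 + timedelta(minutes=10)))  # wait
print(next_action(t0, t0 + timedelta(minutes=45)))  # escalate
print(next_action(t0, t0 + timedelta(hours=3)))     # auto_deny
```

A cron job or background worker runs this over every pending row, which is exactly the external scheduling layer the SDK leaves to you.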
No structured audit trail. The SDK’s tracing captures run history for debugging — which tools ran, with what arguments, how long each step took. Useful for engineers investigating a failed run. Not a decision record. It does not produce a structured log of who was asked, when they responded, what they chose, and any notes they added. For most production deployments — and specifically for EU AI Act Article 14 compliance — that record is required.
These are not edge cases. They are the core requirements of any OpenAI Agents SDK HITL production deployment.
The cost of building the missing layer yourself
The SDK’s own documentation signals what you are taking on. The state serialization story — state.to_json(), store in a database, RunState.from_json() when the approver responds — describes a persistence layer you are expected to build and maintain. The developer community summarised the result in issue #636: without native HITL scaffolding, teams end up with “brittle, custom logic” that “breaks the elegance and composability the SDK otherwise offers.”
Consider what Dani’s team at a legal-tech startup discovered in early 2026. They had needs_approval working in development, where Dani herself approved tool calls in the terminal during testing. After deploying to production, the agent would pause waiting for their compliance officer, Marcus, to respond. Marcus had no way of knowing. Runs sat pending for hours. Dani’s team spent several weeks adding a webhook notification system, a cron-based timeout handler, and a decision-log schema before the workflow was fit for production use. Those weeks came directly out of the feature backlog.
The seven components needed (the same seven documented in our guide to preventing irreversible agent actions):
- Interception — detect the interruption and extract action context from the `ToolApprovalItem`
- Notification — send the approver an email, Slack message, or webhook with enough detail to decide
- Context delivery — format the tool name, arguments, and agent state so the decision is informed
- Decision capture — receive a structured approved, denied, or modified response
- Timeout handling — define what happens if nobody responds in 30 minutes or two hours
- Audit log — record who was asked, when, what they chose, and any notes
- Resume or abort — translate the decision back into `state.approve()` or `state.reject()` and rerun
Components 1 and 7 are agent code. Components 2 through 6 are infrastructure. Most teams build 1 and 7 during development and discover the rest the first time a production approver has no idea the agent is waiting.
How The Handover fills the gap (with code)
The agent structure stays identical. The approval call changes.
Here is the needs_approval version, with the TODO comments marking what production requires:
```python
from agents import Agent, Runner, function_tool
from agents.run_state import RunState

@function_tool(needs_approval=True)
async def send_contract(recipient: str, amount: float) -> str:
    """Send a contract to the recipient."""
    return f"Contract sent to {recipient} for ${amount}"

agent = Agent(name="ContractAgent", tools=[send_contract])
result = await Runner.run(agent, "Send contract to acme@corp.com for $50,000")

while result.interruptions:
    state = result.to_state()
    # TODO: notify approver (email, Slack, webhook)
    # TODO: serialize state, wait for async response
    # TODO: escalate if no response in 30 minutes
    # TODO: log who approved, when, and why
    for interruption in result.interruptions:
        approved = input(f"Approve {interruption.tool_name}? (y/n): ")
        if approved.lower() == "y":
            state.approve(interruption)
        else:
            state.reject(interruption)
    result = await Runner.run(agent, state)
```
Here is the same agent using the Handover SDK. The TODOs become the implementation:
```python
from agents import Agent, Runner, function_tool
from the_handover import HandoverClient

client = HandoverClient(api_key="ho_your_key_here")

@function_tool
async def send_contract(recipient: str, amount: float) -> str:
    """Send a contract to the recipient."""
    decision = client.decisions.create(
        action=f"Send contract to {recipient} for ${amount}",
        context="Agent requesting sign-off before sending.",
        approver="legal@yourcompany.com",
        urgency="high",
        timeout_minutes=30,
    )
    return f"Contract sent to {recipient}" if decision.approved else "Contract send denied."

agent = Agent(name="ContractAgent", tools=[send_contract])
result = await Runner.run(agent, "Send contract to acme@corp.com for $50,000")
```
No needs_approval decorator. No custom resume loop. No RunState juggling. The approval gate lives inside the tool function. The approver receives an email with the action description, full context, and one-click Approve, Deny, or Modify buttons. The decision — including any notes the approver added — comes back structured. Every decision is logged automatically.
This is what the OpenAI agent approval API provides that needs_approval does not: the full out-of-band approval lifecycle in a single method call.
This version does not require managing RunState serialization for the approval itself. The agent polls until the approver responds. If you would rather fire and move on, pass a callback_url — the POST /decisions endpoint returns immediately and POSTs to your server the moment the approver decides.
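On your side, the callback endpoint only needs to parse the decision and resume or abort the work it gated. A minimal sketch of the parsing step — the payload field names ("id", "status") are assumptions for illustration, not the documented Handover payload shape:

```python
import json

def handle_decision_callback(body: bytes) -> tuple[str, bool]:
    """Parse a hypothetical decision callback and return (decision_id, approved).

    Field names here are illustrative assumptions; check the provider's
    webhook documentation for the real payload schema.
    """
    payload = json.loads(body)
    return payload["id"], payload["status"] == "approved"

raw = json.dumps({"id": "dec_123", "status": "approved"}).encode()
print(handle_decision_callback(raw))  # ('dec_123', True)
```

In production this sits behind a web framework route that also verifies the request came from the approval service before acting on it.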
For testing during development, initialise with dev_mode=True. Decisions auto-approve without sending notifications, so you can run the full agent workflow in CI or locally without reaching your approver.
```python
client = HandoverClient(api_key="ho_your_key_here", dev_mode=True)
```
If you want the tool to raise on denial rather than checking a flag, set enforce=True. The SDK raises ActionDenied and the run stops cleanly. See the full API reference for the complete options. Install with pip install the-handover or npm install @the-handover/sdk.
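The enforce pattern reduces the tool body to a guarded call plus an exception handler. A sketch of that control flow, with ActionDenied stubbed locally since the real exception class ships with the Handover SDK:

```python
class ActionDenied(Exception):
    """Local stand-in for the exception the Handover SDK raises on denial."""

def guarded_send(approved: bool) -> str:
    # With enforce=True the client raises instead of returning a flag;
    # this simulates that branch so the handling pattern is visible.
    if not approved:
        raise ActionDenied("send_contract was denied by the approver")
    return "Contract sent"

try:
    guarded_send(approved=False)
except ActionDenied as exc:
    print(f"Run stopped cleanly: {exc}")
```

The upside of raising is that a denial cannot be silently ignored by a caller that forgets to check the flag.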
OpenAI Agents SDK approval workflow: needs_approval vs. a dedicated API
| Need | OpenAI SDK needs_approval | The Handover |
|---|---|---|
| Approver is in the same session | Sufficient | Overkill |
| External approver (email notification) | Build yourself | Built in |
| Slack DM notifications | Build yourself | Scale+ |
| Timeout + auto-escalation | Build yourself | Built in |
| Structured audit trail | Build yourself | Built in |
| EU AI Act decision record | Build yourself | Built in |
| Long-running approval (persist state) | Manual (state.to_json()) | Built in |
| Dev mode for testing | Build yourself | dev_mode=True |
| Raise on denial | Build yourself | enforce=True |
| Rich structured response | Raw resume value | choose, number, text, file_upload |
The same pattern applies across frameworks. We documented it for LangGraph’s interrupt() and in our human approval guide for LangChain agents, and it holds here: the SDK provides the pause primitive, not the surrounding workflow. Use needs_approval when the approver is interactive and in the same session. Use The Handover when the approver needs to be notified, their decision needs to be tracked, and their response needs to be logged.
Both work in the same agent. needs_approval for in-session decisions where the user is present. The Handover for out-of-band approvals that require notification, timeout handling, and an audit record.
OpenAI Agents SDK human in the loop: FAQ
Does OpenAI Agents SDK needs_approval send notifications?
No. needs_approval pauses the run and surfaces pending calls in RunResult.interruptions. No notification of any kind is sent. If your approver needs to know the agent is waiting, you must build that notification layer yourself or use a dedicated approval API.
What happens if nobody approves an OpenAI Agents SDK interruption?
The run waits indefinitely. The SDK has no built-in timeout or escalation. For OpenAI Agents SDK HITL production deployments, you need external scheduling logic that monitors for stale pending approvals and either escalates or auto-denies after a defined window. OpenAI agent human approval flows that need automatic fallback require you to build that logic yourself.
Can I use needs_approval and The Handover in the same agent?
Yes. Use needs_approval for tools where the approver is active in the same session. Use The Handover for tools requiring out-of-band approval, where the approver receives a notification and responds asynchronously. A single agent can use both patterns across different tools.
How do I test an approval flow without interrupting my approver?
Pass dev_mode=True when initialising HandoverClient. Decisions auto-approve without sending notifications, so you can run the full agent workflow in CI or local development without reaching your approver.
The needs_approval flag is the right primitive for pausing tool execution in the OpenAI Agents SDK. A dedicated human approval API is the right wrapper for turning that pause into a complete OpenAI Agents SDK approval workflow with notifications, timeout handling, escalation, and an audit trail. They solve different problems at different layers.
Ready to add human oversight to your agent?
Free to start. No credit card required. Takes five minutes.
Get Started Free