The EU AI Act came into force in August 2024 and will apply in full from August 2026. For developers building AI agents that act on behalf of users or businesses, one requirement shows up repeatedly across the risk categories: human oversight. Not as a nice-to-have — as a technical and organisational obligation.
This post breaks down what that actually means in practice for agent developers, without the legal fog.
What the Act classifies as high-risk
The EU AI Act sorts AI systems into risk tiers. General-purpose chatbots mostly carry only light transparency obligations. But AI systems that make or influence consequential decisions in specific sectors are classified as high-risk and face strict requirements. The high-risk categories (Annex III) include:
- Credit scoring and financial risk assessment
- Employment decisions (screening, evaluation, management)
- Access to education and vocational training
- Administration of justice and legal interpretation
- Critical infrastructure management
- Essential services, public benefits, and migration decisions
- Remote biometric identification and biometric categorisation of individuals
Note: real-time remote biometric identification in publicly accessible spaces (for law enforcement, with narrow exceptions) falls into the prohibited category, a stricter classification. If you're building an agent that operates in any of the Annex III domains, or produces outputs that feed into decisions in those domains, you're likely in scope.
The human oversight requirement, specifically
Article 14 of the Act mandates that high-risk AI systems be designed so that humans can "effectively oversee" them during operation. The requirements include:
- Understanding and monitoring: Operators must be able to understand the system's capabilities and limitations, and monitor for signs of malfunction or unexpected behaviour.
- Override capability: The system must allow authorised persons to override, interrupt, or shut it down through appropriate interfaces.
- Preventing undue reliance on automation: Systems must be designed to avoid operators becoming over-reliant on outputs without proper review — the "automation bias" problem.
What this means for agents specifically
The challenge with agents is that they act across multiple steps, often autonomously. A single user prompt can trigger a chain of tool calls — querying databases, drafting documents, calling external APIs — before any human sees the output.
Under Article 14, that chain needs to have human oversight points built in, not bolted on. In practice, this typically means:
- Approval gates before irreversible actions. An agent that can modify records, send communications, or trigger financial transactions needs a mechanism for a human to review and approve those actions before they execute — not just log them after the fact.
- Audit trails. You need to be able to reconstruct what the agent decided, why, who was asked for approval, when they responded, and what they said. This needs to be stored and accessible.
- Clear escalation paths. If the agent is uncertain or the action is above a defined risk threshold, it must be able to escalate to a human rather than proceed.
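The approval-gate idea above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the action names, the `ProposedAction` type, and the `request_approval` callback are all hypothetical stand-ins for whatever your agent framework provides.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class ProposedAction:
    name: str          # e.g. "send_email" (hypothetical action name)
    description: str   # human-readable summary shown to the reviewer

# Actions that must never execute without a human decision first.
IRREVERSIBLE_ACTIONS = {"send_email", "update_record", "initiate_payment"}

def execute_with_oversight(action: ProposedAction, request_approval, run):
    """Gate irreversible actions on a human decision before executing them."""
    if action.name in IRREVERSIBLE_ACTIONS:
        decision = request_approval(action)  # blocks until a human responds
        if decision is not Decision.APPROVED:
            return {"status": "rejected", "action": action.name}
    result = run(action)
    return {"status": "executed", "action": action.name, "result": result}
```

The key property is that the gate sits *before* `run(action)`: a rejected decision means the side effect never happens, which is what distinguishes oversight from after-the-fact logging.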
GPAI models and systemic risk
General-purpose AI models (like GPT-4, Claude, Gemini) that are used as foundations for agents face additional rules under the Act — particularly if they pose systemic risk, which is presumed when training compute exceeds 10^25 FLOPs. These models' providers must maintain technical documentation, report serious incidents, and support downstream providers in meeting their obligations.
If you're building on top of a GPAI model, the compliance burden is split: the model provider handles the model-level requirements, and you handle the application-level requirements — including human oversight of what your agent does with the model's output.
The practical gap most teams miss
Most agent implementations have logging. Few have actionable oversight. There's a significant difference between:
- Logging: "The agent sent this email at 14:32."
- Oversight: "Before sending the email, a human reviewed and approved it. Here is the decision record."
The Act cares about the second one. A log that shows what happened after the fact doesn't satisfy the Article 14 requirement for humans to be able to oversee, intervene, and override during operation.
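The structural difference between the two shows up in the data you keep. A sketch of a decision record, with an illustrative schema (the field names are assumptions, not a standard):

```python
import json
from dataclasses import dataclass, asdict

# A log entry records what happened. A decision record additionally captures
# who was asked, when they answered, what they decided, and what they said --
# all captured *before* the action executed.
@dataclass
class DecisionRecord:
    action: str
    requested_at: str     # ISO 8601 timestamps
    approver: str
    decided_at: str
    decision: str         # "approved" | "rejected"
    approver_notes: str

record = DecisionRecord(
    action="send_email to customer@example.com",
    requested_at="2025-03-14T14:30:05Z",
    approver="j.smith@example.com",
    decided_at="2025-03-14T14:31:40Z",
    decision="approved",
    approver_notes="Checked the draft; pricing figures are correct.",
)

# Persist one JSON line per decision so the trail can be reconstructed later.
audit_line = json.dumps(asdict(record))
```

A plain log line can only tell you the email went out at 14:32; a record like this tells you it went out *because* a named human approved it at 14:31.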
Building compliant oversight into your agent
The simplest pattern that satisfies the core requirements:
- Define what requires approval. List the actions your agent can take that are irreversible, consequential, or in-scope for high-risk classification. Be specific.
- Gate those actions on a human decision. Before the agent executes them, it requests approval — describing the action, the context, and the urgency. A human reviews and responds.
- Record everything. Who was asked, when, what they decided, any notes they added. This is your audit trail.
- Escalate when uncertain. If the agent's confidence is low or the action is above a threshold, route to a senior approver.
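The escalation step can be as simple as a routing function. The thresholds and approver tiers below are illustrative assumptions — the Act doesn't prescribe numeric values; you define what counts as high-risk for your system.

```python
def route_approval(action_risk: float, agent_confidence: float,
                   risk_threshold: float = 0.7,
                   confidence_floor: float = 0.8) -> str:
    """Choose an approver tier for a proposed action.

    Thresholds are placeholders: calibrate them against your own
    risk assessment, not these example numbers.
    """
    if action_risk >= risk_threshold:
        return "senior_approver"    # high-stakes action: senior review
    if agent_confidence < confidence_floor:
        return "senior_approver"    # agent is unsure: escalate anyway
    return "standard_approver"
```

The important design choice is that low confidence escalates rather than proceeds: the agent's default when uncertain is to ask a human, never to act.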
The Handover API provides all of this in a single integration — one API call creates a decision, notifies the approver by email or Slack, records the response, and gives you a full audit trail. The docs cover the setup in detail.
Timeline
The Act applies progressively:
- February 2025: Prohibited AI practices enforceable
- August 2025: GPAI model transparency rules apply
- August 2026: Full applicability; most obligations for Annex III high-risk systems take effect
- August 2027: Obligations apply for high-risk systems embedded in products covered by existing EU safety legislation (Annex I)
The time to build oversight in is now — retrofitting compliance into a deployed agent is significantly harder than including it from the start.
Note: This post is a technical overview, not legal advice. If your system is in scope for the EU AI Act, review the official text and consult a legal professional for obligations specific to your situation.
Ready to add human oversight to your agent?
Free to start. No credit card required. Takes five minutes.
Get Started Free