Most teams building with LLMs start the same way: one prompt, one response. That works fine until the task is complex enough to need multiple steps, external tools, or branching logic. At that point you need to think about agentic patterns — the architectural shapes that reliable AI systems tend to take.
Anthropic's published guidance on building effective agents identifies five patterns that cover the vast majority of real-world agentic workloads. Understanding them — and knowing when to reach for each — is the difference between a brittle prototype and a system you can actually put in production.
A note on simplicity first
Before diving in: the right number of agents is usually fewer than you think. Every additional agent or loop adds latency, cost, and failure surface. Anthropic's guidance is direct — "success in the LLM space isn't about building the most sophisticated system" but about using the right tool for the task. Start simple. Add complexity only when it demonstrably improves outcomes.
Pattern 1: Prompt chaining
The simplest multi-step pattern. You break a task into sequential subtasks, where each LLM call processes the output of the previous one.
When to use it: When a task decomposes cleanly into fixed ordered steps — generate, then review, then translate. Each step can have its own model, system prompt, and validation.
Classic example: Draft a marketing email → extract key claims → fact-check each claim → rewrite with corrections.
Trade-off: Errors compound. A mistake in step 2 propagates through steps 3 and 4. Add a validation gate between steps when the downstream cost of an error is high.
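The chain-with-gates idea can be sketched in a few lines. This is a minimal illustration, not a production implementation: `call_llm` is a hypothetical stand-in you would replace with your provider's SDK, and `validate` is a deliberately simple gate.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; swap in your provider's SDK."""
    return f"[model output for: {prompt[:40]}]"

def validate(output: str) -> bool:
    """Gate between steps: reject empty or obviously broken output
    before it propagates downstream."""
    return bool(output.strip())

def chain(task: str, steps: list[str]) -> str:
    """Run fixed, ordered steps; each step consumes the previous step's output."""
    result = task
    for step in steps:
        result = call_llm(f"{step}\n\nInput:\n{result}")
        if not validate(result):
            raise ValueError(f"Validation failed after step: {step!r}")
    return result

final = chain(
    "Write a launch email for our new analytics dashboard.",
    ["Draft the email", "Extract key claims",
     "Fact-check each claim", "Rewrite with corrections"],
)
```

Note that the validation gate sits between steps, which is exactly where you stop error compounding: a failed check halts the chain instead of feeding a bad intermediate into the next prompt.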
Pattern 2: Routing
A classifier model (or prompt) reads the input and directs it to the right specialist handler.
When to use it: When your inputs fall into distinct categories that are genuinely better handled differently — a customer support triage that routes billing questions to one agent and technical issues to another, or a query router that sends simple questions to a fast/cheap model and complex ones to a powerful model.
Trade-off: Routing adds a round-trip and can mislabel edge cases. Make the category boundaries explicit in the router's prompt and log misclassifications so you can improve coverage.
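Structurally, routing is a classify-then-dispatch step with a fallback. In this sketch the "classifier" is a keyword stub purely so the example runs; in practice it would be a cheap, fast model call. All names here are illustrative.

```python
def classify(query: str) -> str:
    """Stand-in classifier. A real router is a cheap LLM call whose
    prompt makes the category boundaries explicit."""
    text = query.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"

HANDLERS = {
    "billing":   lambda q: f"[billing agent handles: {q}]",
    "technical": lambda q: f"[technical agent handles: {q}]",
    "general":   lambda q: f"[general agent handles: {q}]",
}

def route(query: str) -> str:
    """Classify, then dispatch to the specialist handler.
    Unknown labels fall back to the general handler (and should be logged)."""
    label = classify(query)
    return HANDLERS.get(label, HANDLERS["general"])(query)
```

The fallback branch is where you would also log the misclassification, giving you the data to tighten the router's prompt over time.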
Pattern 3: Parallelization
Run multiple LLM calls simultaneously and aggregate their outputs. Two common variants:
- Sectioning: Divide a large task into independent chunks and process them in parallel. Summarise 20 documents simultaneously, then aggregate. Dramatically reduces wall-clock time.
- Voting / ensembling: Run the same task multiple times with different prompts or temperatures, then pick the majority answer or have a meta-LLM synthesise the best response. Improves reliability for high-stakes outputs.
When to use it: When subtasks genuinely don't depend on each other's outputs, or when you need higher confidence than a single model call provides.
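Both variants are easy to express with standard concurrency tools. A sketch of each, with the model call stubbed so the example is self-contained (real calls are I/O-bound, so a thread pool is a reasonable fit):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"summary({prompt})"

def section(docs: list[str]) -> list[str]:
    """Sectioning: process independent chunks in parallel.
    pool.map preserves input order, so outputs line up with inputs."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(call_llm, docs))

def vote(answers: list[str]) -> str:
    """Voting: run the same task several times, keep the majority answer."""
    return Counter(answers).most_common(1)[0][0]

summaries = section(["doc1", "doc2", "doc3"])
winner = vote(["A", "B", "A"])
```

A meta-LLM synthesis step would replace `vote` when answers are free-form text rather than discrete labels.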
Pattern 4: Orchestrator-workers
A central "orchestrator" LLM breaks a complex task into subtasks dynamically, delegates them to specialised worker agents, and synthesises the results.
When to use it: For open-ended tasks where you can't predefine the steps. A coding agent that needs to read files, write tests, run them, interpret failures, and iterate. A research agent that searches, reads sources, identifies gaps, and searches again.
Trade-off: This pattern is powerful but expensive and hard to debug. The orchestrator's quality determines everything — unclear delegation instructions lead to confused workers. Invest heavily in the orchestrator's system prompt and test edge cases explicitly.
Where human approval fits: This is the pattern where a human-in-the-loop gate is most valuable. Before the orchestrator delegates an irreversible action (send this email, delete these records, deploy this build), insert an approval step. The orchestrator describes what it wants to do and waits for sign-off before the worker executes. The Handover API is designed for exactly this.
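The delegation loop with an approval gate might look like the following. This is a generic sketch, not the Handover API itself: `approve` is any callable that blocks on human sign-off (a ticket, a Slack prompt, an approval endpoint), and the worker registry and plan here are hypothetical; a real orchestrator LLM would produce the plan dynamically.

```python
# Actions the orchestrator may never execute without human sign-off.
IRREVERSIBLE = {"send_email", "delete_records", "deploy"}

def orchestrate(plan, workers, approve):
    """Run a list of (action, payload) subtasks through worker agents.
    Irreversible actions are described to a human and wait for approval
    before the worker executes; unapproved actions are skipped, not run."""
    results = []
    for action, payload in plan:
        if action in IRREVERSIBLE and not approve(action, payload):
            results.append((action, "skipped: not approved"))
            continue
        results.append((action, workers[action](payload)))
    return results

workers = {
    "research":   lambda p: f"notes on {p}",
    "send_email": lambda p: f"sent: {p}",
}
plan = [("research", "Q3 churn"), ("send_email", "summary to team")]

# Deny everything, to show the gate working; a real `approve` waits on a human.
log = orchestrate(plan, workers, approve=lambda action, payload: False)
```

The key design choice is that the gate sits between the orchestrator's decision and the worker's execution, so a denied action is never partially performed.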
Pattern 5: Evaluator-optimizer
One LLM generates a response; a separate evaluator LLM scores it against defined criteria and provides feedback. The generator iterates until the evaluator is satisfied, or a maximum-iteration limit is hit.
When to use it: When you have clear, articulable quality criteria and iterative refinement measurably helps — literary translation, code generation with automated tests, structured data extraction with validation rules.
Trade-off: Works best when the evaluation criteria are explicit and the evaluator is genuinely better at judging than the generator. If the evaluator is the same model with a slightly different prompt, you're often just adding latency for marginal gains.
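The generate–evaluate loop itself is small. In this sketch both the generator and evaluator are deterministic stubs so the example runs; in a real system each is a separate model call (or, for code, a test suite) and the feedback string is what makes iteration productive.

```python
def generate(task: str, feedback: str) -> str:
    """Stand-in generator; a real one is an LLM call that incorporates
    the evaluator's feedback into its prompt."""
    return f"draft of {task}" + (f" (revised: {feedback})" if feedback else "")

def evaluate(draft: str) -> tuple[float, str]:
    """Stand-in evaluator returning (score, feedback). A real evaluator
    scores against explicit, written criteria."""
    if "revised" in draft:
        return 1.0, ""
    return 0.4, "tighten the opening paragraph"

def refine(task: str, threshold: float = 0.9, max_iters: int = 5):
    """Iterate until the evaluator is satisfied or the budget runs out."""
    feedback = ""
    for i in range(max_iters):
        draft = generate(task, feedback)
        score, feedback = evaluate(draft)
        if score >= threshold:
            return draft, i + 1
    return draft, max_iters
```

The `max_iters` cap matters: without it, a generator that can never satisfy the evaluator loops (and bills) forever.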
Choosing the right pattern
A practical decision tree:
- Task has fixed ordered steps? → Prompt chaining
- Inputs fall into distinct categories? → Routing
- Subtasks are independent? → Parallelization
- Steps are unpredictable or open-ended? → Orchestrator-workers
- Quality criteria are explicit and iterative refinement helps? → Evaluator-optimizer
Most real systems combine patterns. A customer support agent might route by topic (pattern 2), then use orchestrator-workers (pattern 4) for complex cases, with an evaluator (pattern 5) checking that final responses meet tone guidelines before sending.
The one thing all patterns share
Every pattern above can cause real harm if an agent takes an irreversible action without oversight. Building in approval checkpoints — especially at the boundaries between agents and between steps — is how you get the benefits of autonomy without the blast radius. Start with the most consequential actions your system takes, add an approval gate there, and expand coverage as confidence grows.
Ready to add human oversight to your agent?
Free to start. No credit card required. Takes five minutes.
Get Started Free