How Anthropic Built a Multi-Agent Research System for Claude
Anthropic recently introduced Research capabilities in Claude, enabling it to explore complex topics by coordinating multiple AI agents. In a blog post, the engineering team shared key insights from developing this system. Here’s a summary of their approach and lessons learned.
Why Multi-Agent Systems?
Research tasks are inherently dynamic — unlike structured workflows, they require adapting strategies based on new findings. A single AI agent struggles with:
- Token limits (context windows constrain deep exploration).
- Sequential bottlenecks (slow, one-step-at-a-time searches).
- Lack of parallelization (difficulty exploring multiple angles simultaneously).
A multi-agent system solves these issues by:
- Delegating tasks (a lead agent spawns specialized subagents).
- Running parallel searches (subagents explore different aspects at once).
- Compressing insights (each subagent summarizes findings for the lead agent; see the sketch below).
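To make that orchestration pattern concrete, here is a minimal sketch in Python using the official `anthropic` SDK. The `ask` helper, the decomposition prompt, and the three-subtask split are our own illustrative choices, not Anthropic's actual implementation, and the model ID is a placeholder for whichever Claude model you use.

```python
import asyncio
from anthropic import AsyncAnthropic  # official Anthropic Python SDK

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def ask(prompt: str) -> str:
    # Single illustrative call to a Claude model; swap in your model ID.
    msg = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

async def run_subagent(subtask: str) -> str:
    # Each subagent explores one angle, then compresses its findings so
    # the lead agent never has to hold the full raw context.
    findings = await ask(f"Research this subtopic:\n{subtask}")
    return await ask(f"Summarize the key findings in under 200 words:\n{findings}")

async def research(query: str) -> str:
    # Lead agent delegates: split the query into distinct subtasks...
    plan = await ask(
        "Split this research question into 3 distinct, non-overlapping "
        f"subtasks, one per line:\n{query}"
    )
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # ...run the subagents in parallel rather than one at a time...
    summaries = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    # ...and synthesize the compressed summaries into one answer.
    return await ask(
        f"Question: {query}\n\nSubagent findings:\n\n" + "\n\n".join(summaries)
    )

# asyncio.run(research("How do interest rates affect housing markets?"))
```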
Results:
- 90.2% better performance than single-agent Claude Opus 4 on Anthropic's internal research evaluation (the multi-agent setup used Opus 4 as the lead with Sonnet 4 subagents).
- Faster execution (parallel tool calls cut research time by up to 90% on complex queries).
Key Engineering Challenges
1. Prompt Engineering for Coordination
- Teach delegation: Lead agents must clearly define subagent tasks to avoid duplication.
- Scale effort to complexity: simple fact-finding may need a single agent, while open-ended research can warrant 10+ subagents (a rough heuristic is sketched after this list).
- Guide search strategy: Start broad, then narrow down (like human researchers).
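A rough illustration of the scaling rule and the delegation brief. The post gives the single-agent and 10+-subagent endpoints; the middle tiers, exact numbers, and prompt wording below are our assumptions:

```python
def subagent_budget(query_type: str) -> int:
    # Illustrative effort-scaling heuristic: simple fact-finding gets one
    # agent; open-ended research fans out to ten or more. The middle tiers
    # and exact cutoffs are assumptions, not Anthropic's values.
    budgets = {
        "simple_fact": 1,   # e.g. "When was X founded?"
        "comparison": 4,    # e.g. "Compare frameworks A, B, and C"
        "open_ended": 10,   # e.g. "Survey the state of battery research"
    }
    return budgets.get(query_type, 3)

# Each subagent brief should bound the task explicitly so subagents do
# not duplicate each other's work. Wording is ours, not Anthropic's prompt.
SUBAGENT_BRIEF = """Objective: {objective}
Output format: bullet-point summary with source URLs.
Tools allowed: web search only.
Out of scope (another subagent covers this): {exclusions}
"""
```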
2. Tool Design & Reliability
- Bad tool descriptions send agents down the wrong path; clear documentation is critical (see the example definition after this list).
- Agents can improve their own tools: Claude 4 models helped refine prompts and tool descriptions, reducing errors by 40%.
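As an example of what clear documentation means in practice, here is a tool definition in the Anthropic Messages API tool format. The `web_search` name, parameters, and description wording are illustrative, not Anthropic's internal tooling:

```python
# A tool is defined by a name, a natural-language description, and a JSON
# Schema for its inputs. The description is what the agent actually reads,
# so it states when to use the tool and how to phrase queries.
web_search_tool = {
    "name": "web_search",
    "description": (
        "Search the web for current information. Use for facts that may "
        "have changed recently or are unlikely to be in training data. "
        "Prefer short, keyword-style queries; start broad, then narrow."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keyword-style search query.",
            },
            "max_results": {
                "type": "integer",
                "description": "Number of results to return (1-10).",
            },
        },
        "required": ["query"],
    },
}

# Passed to the API as: client.messages.create(..., tools=[web_search_tool])
```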
3. Evaluation & Debugging
- LLMs as judges: grading outputs on accuracy, citations, and completeness (a sample rubric is sketched after this list).
- Human testing catches edge cases (e.g., agents favoring SEO-optimized junk over authoritative sources).
- Emergent behaviors require monitoring — small prompt changes can drastically alter agent interactions.
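A minimal LLM-as-judge sketch along these lines. The rubric criteria mirror the ones above, but the prompt wording, score scale, and JSON shape are our assumptions, and the model ID is a placeholder:

```python
import json
from anthropic import Anthropic

client = Anthropic()

JUDGE_PROMPT = """Grade the research report against each criterion, 0.0 to 1.0.
Return only JSON with keys: factual_accuracy, citation_accuracy,
completeness, source_quality (authoritative sources over SEO-optimized junk).

Question: {question}

Report:
{report}
"""

def judge(question: str, report: str) -> dict:
    # One grading call per report; a production harness would also
    # spot-check scores against human raters to catch judge drift.
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model ID
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, report=report),
        }],
    )
    return json.loads(msg.content[0].text)  # assumes the model returns bare JSON
```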
4. Production Challenges
- Stateful execution: errors compound across long runs, so checkpoints and retries are essential (see the sketch below).
- Synchronous bottlenecks: the lead agent currently waits for each set of subagents to finish before proceeding; future versions may spawn subagents dynamically and asynchronously.
- Rainbow deployments prevent disruptions when updating live agents.
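A bare-bones checkpoint-and-retry pattern to make the first point concrete. The file-based state store, retry count, and backoff schedule are our illustrative choices, not Anthropic's production design:

```python
import json
import time
from pathlib import Path

CHECKPOINT = Path("research_state.json")

def load_state() -> dict:
    # Resume from the last checkpoint instead of restarting the whole
    # research run after a transient failure.
    return json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}

def run_step(step_fn, state: dict, key: str, retries: int = 3):
    if key in state:
        return state[key]  # step already completed in a previous run
    for attempt in range(retries):
        try:
            state[key] = step_fn()
            CHECKPOINT.write_text(json.dumps(state))  # persist progress
            return state[key]
        except Exception:
            time.sleep(2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError(f"step {key!r} failed after {retries} retries")
```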
Conclusion
Multi-agent systems unlock new capabilities for AI research but require careful engineering to handle coordination, reliability, and efficiency. Anthropic’s approach — combining parallel agents, smart tool use, and iterative prompting — demonstrates how AI can tackle open-ended problems at scale.
What’s next?
- Asynchronous agent coordination for faster workflows.
- Better memory management for long-horizon tasks.
- Expanding use cases beyond research (e.g., coding, business strategy).