AI Agents in Production: MCP, Tool Use, and Orchestration
Beyond the demos: how to architect autonomous agents with MCP, tool use, and multi-agent orchestration for real enterprise workloads.
The agent hype cycle of 2024 produced a lot of impressive demos and very few production systems. In 2025, that gap is closing. Anthropic's Model Context Protocol (MCP), maturing orchestration frameworks like LangGraph and CrewAI, and OpenAI's Agents SDK have given engineering teams a concrete vocabulary for building autonomous systems. This post is a field guide for tech leads who want to move past the prototype.
What actually changed in 2024-2025
Three shifts matter:
- Standardized tool interfaces. MCP, released by Anthropic in late 2024 and adopted by OpenAI in early 2025, is becoming the USB-C of agent tooling. Instead of writing a custom integration per model, you expose a tool server once and any MCP-compatible client (Claude Desktop, Cursor, your in-house agent) can use it.
- Stateful orchestration. Frameworks have moved from linear chains to graphs with explicit state, checkpointing, and human-in-the-loop interrupts. LangGraph and Temporal-style durable execution are converging.
- Reasoning models as planners. Models like Claude 3.7 Sonnet, GPT-5, and o3 can serve as dedicated planners that delegate execution to cheaper, faster models, a pattern formalized as the orchestrator-worker topology.
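To make the second shift concrete, here is a minimal sketch of a run with explicit state, a checkpoint after every step, and a human-in-the-loop interrupt. It uses only the Python standard library rather than LangGraph or Temporal, and the `Checkpointer` class, the step functions, and the invoice data are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class Checkpointer:
    """Persists run state as JSON so a run can resume after a crash or a pause."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {"next_step": 0, "data": {}}

    def save(self, state):
        self.path.write_text(json.dumps(state))


# Hypothetical steps; in a real agent each would call a model or a tool.
def fetch(data):
    data["invoice"] = {"id": "INV-1", "amount": 120}
    return data


def draft(data):
    data["draft"] = f"Approve {data['invoice']['id']}?"
    return data


def commit(data):
    data["committed"] = True
    return data


# (name, function, requires human approval before it runs)
STEPS = [("fetch", fetch, False), ("draft", draft, False), ("commit", commit, True)]


def run(checkpointer, approved=False):
    """Run from the last checkpoint; stop before a gated step unless approved."""
    state = checkpointer.load()
    while state["next_step"] < len(STEPS):
        name, fn, needs_approval = STEPS[state["next_step"]]
        if needs_approval and not approved:
            return ("interrupted", name)
        state["data"] = fn(state["data"])
        state["next_step"] += 1
        checkpointer.save(state)  # a failure after this point resumes here
    return ("done", state["data"])


if __name__ == "__main__":
    cp = Checkpointer(Path(tempfile.mkdtemp()) / "run.json")
    print(run(cp))                 # pauses before the gated step
    print(run(cp, approved=True))  # resumes from the checkpoint and finishes
```

The second call does not repeat `fetch` or `draft`: it picks up at the step recorded in the checkpoint, which is the property that matters when a run fails or waits hours for a reviewer.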
The MCP primer
MCP defines three primitives: tools (functions the model can call), resources (read-only context), and prompts (reusable templates). A minimal MCP server in Python:
```python
from mcp.server.fastmcp import FastMCP

# erp_client is assumed to be your existing ERP client, created elsewhere.
mcp = FastMCP("invoice-service")

@mcp.tool()
def get_invoice(invoice_id: str) -> dict:
    """Fetch an invoice by ID from the ERP."""
    return erp_client.fetch(invoice_id)

@mcp.tool()
def approve_invoice(invoice_id: str, approver: str) -> bool:
    """Approve an invoice. Requires manager role."""
    return erp_client.approve(invoice_id, approver)

if __name__ == "__main__":
    # Serve over stdio so a local MCP host can launch this as a subprocess.
    mcp.run(transport="stdio")
```
The payoff: this server now plugs into any MCP host without modification. We have clients running the same internal MCP servers across Claude Code for developer workflows and a custom LangGraph agent for back-office automation.
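For readers curious what a host actually sends, MCP is built on JSON-RPC 2.0, and a tool invocation is a `tools/call` request that names the tool and its arguments. The sketch below builds and answers one such message using the standard library only. The `handle_tools_call` dispatcher and the stubbed `get_invoice` are hypothetical stand-ins for work the SDK does for you.

```python
import json


def make_tools_call(request_id, tool_name, arguments):
    """Build the JSON-RPC request a host sends to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical stub standing in for the ERP-backed tool.
def get_invoice(invoice_id):
    return {"invoice_id": invoice_id, "status": "pending"}


TOOLS = {"get_invoice": get_invoice}


def handle_tools_call(request):
    """Dispatch a tools/call request and wrap the result as text content."""
    params = request["params"]
    tool = TOOLS.get(params["name"])
    if tool is None:
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32602, "message": f"Unknown tool: {params['name']}"},
        }
    result = tool(**params["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {
            "content": [{"type": "text", "text": json.dumps(result)}],
            "isError": False,
        },
    }


if __name__ == "__main__":
    request = make_tools_call(1, "get_invoice", {"invoice_id": "INV-42"})
    print(json.dumps(handle_tools_call(request), indent=2))
```

Because every host speaks this same message shape, the server does not need to know which client is on the other end.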
Choosing an orchestration pattern
Not every problem needs multi-agent. Most don't. Here's how we decide:
| Pattern | When to use | Example |
|---|---|---|
| Single agent + tools | Linear task, <10 tool calls, one domain | Customer support triage |
| Orchestrator-worker | Parallelizable subtasks, shared goal | Document analysis across 200 contracts |
| Hierarchical multi-agent | Distinct domains with specialized prompts | Sales ops: lead enrichment + CRM update + email draft |
| Swarm / peer agents | Open-ended exploration, research | Competitive intelligence gathering |
A common mistake is jumping straight to swarm architectures. They're harder to debug, more expensive, and rarely outperform a well-designed single agent with good tools.
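One way to read the table is as an escalation ladder: default to a single agent and move up only when the task shows a specific trait. The sketch below encodes that reading. The `Task` fields and the order of the checks are assumptions made for illustration, not a formal rule.

```python
from dataclasses import dataclass


@dataclass
class Task:
    domains: int                  # distinct domains needing specialized prompts
    parallelizable: bool = False  # subtasks can run independently
    open_ended: bool = False      # exploratory work with no fixed plan


def choose_pattern(task):
    """Return the simplest pattern from the table that fits the task."""
    if task.domains > 1:
        return "hierarchical multi-agent"
    if task.parallelizable:
        return "orchestrator-worker"
    if task.open_ended:
        return "swarm / peer agents"
    # Default: nothing in the task justifies more than one agent.
    return "single agent + tools"


if __name__ == "__main__":
    print(choose_pattern(Task(domains=1)))
    print(choose_pattern(Task(domains=1, parallelizable=True)))
```

The useful part is the default branch: a task has to earn its way out of the single-agent pattern.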
Enterprise use cases that actually work
From recent client engagements, these are the deployments delivering measurable ROI:
- Tier-1 IT support automation. An agent with MCP access to Jira, Confluence, and Okta resolves 40-60% of password resets, access requests, and how-to tickets without human escalation. Build cost: ~3 weeks.
- Contract review. Orchestrator splits contracts into clauses, dispatches to a worker agent per clause type (liability, IP, termination), aggregates findings. Cuts legal review time by 70% on standard NDAs.
- Data pipeline incident response. Agent reads Datadog alerts, queries Snowflake for affected rows, checks dbt lineage, drafts a Slack post-mortem. Reduces MTTR on data quality issues from hours to minutes.
- Sales research. Daily agent run enriches new leads from public sources, scores them against ICP, writes a personalized first-touch draft for review.
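The contract review flow above follows the orchestrator-worker shape: split, route by clause type, fan out, aggregate. Here is a minimal sketch of that shape using a thread pool. The keyword-based `classify` function and the worker functions are hypothetical placeholders for model calls, and the clause types are the three named above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_clauses(contract):
    """Naive splitter: treat blank-line-separated paragraphs as clauses."""
    return [c.strip() for c in contract.split("\n\n") if c.strip()]


def classify(clause):
    """Placeholder router; a real system would likely use a model here."""
    text = clause.lower()
    if "liab" in text:
        return "liability"
    if "intellectual property" in text:
        return "ip"
    if "terminat" in text:
        return "termination"
    return "other"


# Hypothetical workers, one per clause type.
def review_liability(clause):
    return "check liability cap"


def review_ip(clause):
    return "check ownership of work product"


def review_termination(clause):
    return "check notice period"


WORKERS = {
    "liability": review_liability,
    "ip": review_ip,
    "termination": review_termination,
}


def review(clause):
    kind = classify(clause)
    worker = WORKERS.get(kind)
    finding = worker(clause) if worker else "no specialized review"
    return {"type": kind, "finding": finding}


def orchestrate(contract, max_workers=4):
    """Fan clauses out to workers in parallel and collect findings in order."""
    clauses = split_into_clauses(contract)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(review, clauses))


if __name__ == "__main__":
    sample = (
        "Liability is capped at fees paid.\n\n"
        "All intellectual property remains with the client.\n\n"
        "Either party may terminate with 30 days notice."
    )
    for finding in orchestrate(sample):
        print(finding)
```

`pool.map` returns results in input order, so findings line up with the original clause sequence even though the workers run concurrently.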
Production checklist
Before shipping an agent to production, verify:
- [ ] Bounded tool surface. Every tool has explicit input validation and a clear failure mode. No `exec` or unrestricted shell access.
- [ ] Cost ceilings per run. Hard cap on tokens and tool calls. We typically set 50 tool calls and $2 per run as default limits.
- [ ] Observability. Traces in LangSmith, Langfuse, or Arize. You cannot debug what you cannot see.
- [ ] Checkpointing. State persisted at each step so a failure mid-run doesn't restart from zero.
- [ ] Human-in-the-loop gates on any write operation with financial or compliance impact.
- [ ] Evaluation set. At least 50 labeled scenarios run on every prompt or model change. Vibes-based testing fails in production.
- [ ] Permission model. Agents act with their own service identity, not the requesting user's credentials. Audit logs tie back to the originating request.
- [ ] Prompt injection defenses. Treat all tool outputs as untrusted input. Separate planner context from tool result context where possible.
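Two of these items, cost ceilings and human-in-the-loop gates, are easiest to enforce in one place: a thin wrapper that every tool call passes through. The sketch below is one way to do that, assuming the default limits from the checklist (50 tool calls, $2 per run). The `RunGuard` class, its exceptions, and the `is_write` and `approved` flags are hypothetical names.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run hits its tool-call or cost ceiling."""


class ApprovalRequired(RuntimeError):
    """Raised when a write operation is attempted without human approval."""


class RunGuard:
    def __init__(self, max_tool_calls=50, max_cost_usd=2.0):
        self.max_tool_calls = max_tool_calls
        self.max_cost_usd = max_cost_usd
        self.tool_calls = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd):
        """Record model spend for this run; stop the run once over the cap."""
        self.cost_usd += cost_usd
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost ${self.cost_usd:.2f} exceeds cap")

    def call_tool(self, tool, *, is_write=False, approved=False, **kwargs):
        """Run a tool only if it is within budget and, for writes, approved."""
        if is_write and not approved:
            raise ApprovalRequired(tool.__name__)
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded("tool call limit reached")
        self.tool_calls += 1
        return tool(**kwargs)


if __name__ == "__main__":
    def approve_invoice(invoice_id):  # hypothetical write tool
        return True

    guard = RunGuard()
    try:
        guard.call_tool(approve_invoice, is_write=True, invoice_id="INV-1")
    except ApprovalRequired as exc:
        print(f"waiting for human approval: {exc}")
```

Raising an exception rather than returning an error string keeps the decision out of the model's hands: the orchestration layer catches it and either pauses for review or ends the run.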
What we're watching in 2026
Two trends are worth tracking. First, computer-use agents (Anthropic's Computer Use API, OpenAI's Operator) are crossing from demo to selective production use for legacy systems without APIs. Second, agent-to-agent protocols, such as Google's A2A and the emerging MCP extensions, will likely standardize how agents from different vendors collaborate, the way REST standardized service integration a decade ago.
Key takeaways
- MCP is the integration standard to bet on. Build tools once, reuse across hosts and models.
- Start with a single agent. Add orchestration only when you have evidence of parallelism or domain specialization needs.
- Observability and evals are non-negotiable. Without them, every model update is a production incident waiting to happen.
- Constrain tool surfaces aggressively. The most reliable agents have fewer, better tools — not more.
- Pick use cases where 80% accuracy with human review beats 100% manual. That's where agents pay back fast.
