AI Agents & Automation · May 11, 2026

AI Agents in Production: MCP, Tool Use, and Orchestration

From autonomous agents to multi-agent orchestration with MCP and LangGraph — what actually works in enterprise settings, with patterns, pitfalls and code.

The hype around AI agents peaked in 2024, but 2025 is when engineering teams started shipping them to production. The difference between a flashy demo and a reliable enterprise agent comes down to three things: a clean tool-use contract, robust orchestration, and observability. Let's break down what's working today.

From chatbots to agents: what actually changed

A classic LLM call is stateless and read-only. An agent is an LLM in a loop that can decide to call tools, observe results, and iterate until a goal is reached. Two shifts made this practical at enterprise scale:

  1. Function calling / tool use became reliable. GPT-4.1, Claude 3.5/4 Sonnet, and Gemini 2.0 all produce structured tool calls with >95% schema adherence on well-specified APIs.
  2. MCP (Model Context Protocol), open-sourced by Anthropic in late 2024 and now adopted by OpenAI, Google DeepMind, and most agent frameworks, standardized how agents discover and call tools across processes.

Before MCP, every integration was a bespoke adapter. After MCP, a server exposes tools, resources, and prompts over a JSON-RPC contract — and any compliant client can use them. Think of it as LSP for AI agents: the same server works for Claude Desktop, Cursor, your internal copilot, or a LangGraph workflow.
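Concretely, a tool invocation travels as a plain JSON-RPC 2.0 message. A sketch of the wire shape for a `tools/call` request and its response (the tool name and field values are illustrative):

```python
import json

# Illustrative MCP "tools/call" request over JSON-RPC 2.0.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_customer",
        "arguments": {"customer_id": "cus_123"},
    },
}

# A successful response carries the tool result as content blocks.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": json.dumps({"id": "cus_123"})}],
        "isError": False,
    },
}
```

Because the contract is this simple, a compliant client can discover tools (`tools/list`) and call them without knowing anything about the server's implementation language or framework.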

A minimal MCP tool server

Here's a Python MCP server exposing a single tool to query an internal CRM:

import os

import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm-tools")

@mcp.tool()
async def get_customer(customer_id: str) -> dict:
    """Fetch a customer record by ID from the internal CRM."""
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"https://crm.internal/api/v1/customers/{customer_id}",
            headers={"Authorization": f"Bearer {os.environ['CRM_TOKEN']}"},
        )
        r.raise_for_status()
        return r.json()

if __name__ == "__main__":
    mcp.run(transport="stdio")

Any MCP-compatible agent now has typed access to your CRM. No glue code, no per-vendor adapter. That's the whole point.

Single agent vs. multi-agent: pick the right shape

Multi-agent systems are seductive but often overkill. Use this as a decision matrix:

| Pattern | When to use | Example |
|---|---|---|
| Single agent + tools | Linear task, <15 tool calls, one domain | Invoice extraction, ticket triage |
| Supervisor + workers | Parallelizable subtasks, clear router | Research assistant, code migration |
| Hierarchical teams | Distinct expertise, long horizons | Full SDLC automation, M&A due diligence |
| Swarm / handoff | Dynamic role switching | Customer support across departments |

Anthropic's June 2025 post-mortem on their multi-agent research system is worth reading: they found that a supervisor with parallel sub-agents outperformed a single agent by ~90% on research tasks — but consumed ~15x more tokens. Multi-agent is a cost-quality trade, not a free lunch.

Orchestration frameworks: the current landscape

  • LangGraph (LangChain): graph-based, stateful, strong checkpointing. Best for complex workflows with human-in-the-loop.
  • OpenAI Agents SDK (March 2025): lean Python SDK with handoffs, guardrails, and tracing. Good default if you're already on OpenAI.
  • CrewAI: role-based abstractions, fast prototyping, less control over execution.
  • Microsoft AutoGen v0.4: actor-model, async-first, solid for research and complex agent topologies.
  • PydanticAI: type-safe, FastAPI-style ergonomics for Python shops.

For production at DCT, we typically pair LangGraph for orchestration with MCP servers for tools. LangGraph gives us deterministic state transitions and replay; MCP keeps the tool layer portable across models.

Three enterprise use cases that pay back fast

1. Tier-1 support deflection

An agent with read access to the knowledge base, ticketing system, and customer account. Measured impact at a B2B SaaS client: 38% of L1 tickets resolved without human touch, average handle time down 22%. Key: a strict guardrail forbidding any write action without confirmation.
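That guardrail is simpler than it sounds. A minimal sketch of a write-action gate, independent of any framework (tool names and the return shape are hypothetical):

```python
# Destructive tools require explicit human confirmation; reads pass through.
READ_ONLY_TOOLS = {"search_kb", "get_ticket", "get_account"}

def guarded_call(tool_name, args, execute, confirmed=False):
    """Run a tool call, blocking write actions unless confirmed."""
    if tool_name not in READ_ONLY_TOOLS and not confirmed:
        # Surface a pending action for a human to approve instead of executing.
        return {"status": "needs_confirmation", "tool": tool_name, "args": args}
    return {"status": "ok", "result": execute(tool_name, args)}
```

The agent sees `needs_confirmation` as a normal tool result and can ask the user to approve, so the gate lives in the tool layer rather than in the prompt.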

2. Internal data Q&A over governed sources

Replaces "can someone pull this report?" Slack messages. The agent calls MCP servers wrapping Snowflake, dbt metadata, and Looker. ROI shows up in analyst hours, not in flashy demos.

3. Code modernization at scale

Multi-agent setup: a planner reads the codebase, workers refactor file-by-file, a reviewer agent runs tests and proposes diffs. Used recently to migrate a 400k-LOC Java 8 monolith to Java 21 in weeks rather than quarters.
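The shape of that pipeline can be sketched in a few lines. Everything here is schematic and the function names are hypothetical; a real implementation would make LLM calls and run the test suite at each step:

```python
def planner(files):
    """Split the codebase into per-file refactoring tasks."""
    return [{"file": f, "goal": "migrate to Java 21 idioms"} for f in files]

def worker(task):
    """Refactor one file; stands in for an LLM-driven rewrite."""
    return {**task, "diff": f"--- {task['file']} (refactored)"}

def reviewer(results):
    """Gate diffs before merge; stands in for a test-running reviewer agent."""
    return [r for r in results if r.get("diff")]

diffs = reviewer([worker(t) for t in planner(["Billing.java", "Auth.java"])])
```

The key design point is that only the reviewer can promote a diff, which keeps the blast radius of a bad worker rewrite contained to a single file.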

The production checklist

Before shipping any agent, verify:

  • [ ] Tool schemas are typed and validated (Pydantic, Zod, JSON Schema). Bad schemas = hallucinated arguments.
  • [ ] Every tool call is idempotent or has a confirmation step for destructive actions.
  • [ ] Token and step budgets are enforced (max iterations, max cost per session).
  • [ ] Traces are captured with LangSmith, Langfuse, or OpenTelemetry GenAI semantic conventions.
  • [ ] Eval suite runs in CI — golden tasks + LLM-as-judge for regression detection on model upgrades.
  • [ ] Secrets never reach the model context (use tool-side auth, not prompt-injected tokens).
  • [ ] Prompt injection defenses on any tool that ingests external content (emails, web pages, PDFs).
  • [ ] Human-in-the-loop gates on actions above a defined risk threshold.
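Budget enforcement in particular is cheap to add. A minimal sketch of a step- and cost-capped agent loop (the limits and the cost accounting are illustrative, not tied to any framework):

```python
class BudgetExceeded(Exception):
    """Raised when an agent run exceeds its step or cost budget."""

def run_agent(step_fn, max_steps=15, max_cost_usd=1.00):
    """Drive an agent loop, aborting when either budget is hit.

    step_fn returns (done, result, cost_usd) per iteration; in a real
    system it would make one LLM call plus any tool calls it decided on.
    """
    spent = 0.0
    for _ in range(max_steps):
        done, result, cost = step_fn()
        spent += cost
        if spent > max_cost_usd:
            raise BudgetExceeded(f"cost budget exceeded: ${spent:.2f}")
        if done:
            return result
    raise BudgetExceeded(f"step budget exceeded after {max_steps} steps")
```

Raising instead of silently truncating matters: a budget hit should show up in traces and alerts, not be absorbed as a partial answer.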

The last two are non-negotiable. The OWASP Top 10 for LLM Applications (2025) lists prompt injection (LLM01) and excessive agency (LLM06) among the highest-impact risks for agentic systems, and we see both regularly in audits.

Key takeaways

  • MCP is becoming the default integration layer for AI agents — design new tool integrations as MCP servers, not framework-specific adapters.
  • Start with a single agent. Move to multi-agent only when you have measurable evidence the orchestration cost is justified.
  • Tool design dominates agent quality. Typed schemas, idempotency, and clear descriptions matter more than prompt engineering.
  • Treat agents as software, not magic: CI evals, tracing, budget enforcement, and security gates are mandatory.
  • The fast ROI lives in internal workflows (support, data Q&A, code modernization) — not in customer-facing autonomous agents, where the failure cost is still too high in most regulated industries.