
How AI Agents Actually Work Behind the Scenes
A technical deep-dive into AI agent architectures: planning loops, tool use, memory, orchestration, and the hard problems nobody talks about.
I asked an AI agent to book a meeting, research a competitor, and draft a proposal, all in one prompt. It confidently did all three. One of the results was fabricated. One was three months out of date. And the meeting invite went to the wrong person.
The agent wasn't broken. It was working exactly as designed. The problem was I didn't understand what "working as designed" actually meant under the hood.
That changed when I stopped treating agents as black boxes and started reading what they were actually doing: the prompts, the tool calls, the memory lookups, the planning loops. What I found was fascinating, occasionally terrifying, and completely explainable once you know the architecture.
This is that explanation. And if you want to see these patterns applied in a real production system, check out InsightPilot and SOM.ai, two projects where I built multi-step agent pipelines from scratch.
What an AI Agent Actually Is
Most definitions of AI agents are either too vague ("an AI that takes actions") or too narrow ("an LLM with tool access"). Here's the one I've settled on:
An AI agent is a system that perceives inputs, reasons about them using a language model, decides which actions to take, executes those actions through tools, observes the results, and repeats until a goal is achieved or a stopping condition is met.
The key word is loop. A single LLM call is not an agent. An agent is what happens when you put a model inside a feedback cycle with the world.
The four core components of that loop:
┌─────────────────────────────────────────────────────────┐
│                       AGENT LOOP                        │
│                                                         │
│  Perception ──► Reasoning ──► Action ──► Observation    │
│      ▲                                        │         │
│      └────────────────────────────────────────┘         │
└─────────────────────────────────────────────────────────┘
- Perception: what the agent receives (user input, tool results, memory retrievals, environment state)
- Reasoning: what the LLM does with that input (plans, decides, reflects)
- Action: what the agent does (calls a tool, writes to memory, sends a message, terminates)
- Observation: what comes back (tool output, error, confirmation, new state)
Everything else, including orchestration layers, memory systems, and safety controls, is scaffolding built around this loop.
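To make the loop concrete, here is a minimal sketch with the model stubbed out. The `reason` function stands in for an LLM call and the `tools` dictionary is hypothetical; neither comes from a real library.

```python
from typing import Callable


def reason(observations: list[str]) -> dict:
    """Stand-in for the LLM: pick the next action from what has been observed."""
    if not observations:
        return {"action": "lookup", "input": "agent loop"}
    return {"action": "finish", "input": observations[-1]}


def run_loop(tools: dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    observations: list[str] = []              # perception: everything seen so far
    for _ in range(max_steps):
        decision = reason(observations)       # reasoning
        if decision["action"] == "finish":    # stopping condition
            return decision["input"]
        tool = tools[decision["action"]]      # action
        observations.append(tool(decision["input"]))  # observation feeds the next pass
    return "stopped: step limit reached"


result = run_loop({"lookup": lambda q: f"notes about {q}"})
```

The step limit matters as much as the goal check: without it, a model that never decides to finish would loop forever.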
The Full System Architecture
Here's how a production AI agent system actually fits together end to end:
                 ┌──────────────┐
                 │  User Input  │
                 └──────┬───────┘
                        │
                 ┌──────▼───────┐
                 │ Orchestrator │ ◄── system prompt + agent config
                 └──────┬───────┘
                        │
       ┌────────────────▼─────────────────┐
       │             LLM Core             │
       │ (planning + reasoning + output)  │
       └─────┬──────────────────┬─────────┘
             │                  │
     ┌───────▼──────┐   ┌───────▼────────┐
     │ Tool Router  │   │ Memory Manager │
     └───────┬──────┘   └───────┬────────┘
             │                  │
    ┌────────▼───────┐  ┌───────▼────────┐
    │ Tool Execution │  │ Vector Store / │
    │ (APIs, code,   │  │ Episodic Cache │
    │ search, DBs)   │  └────────────────┘
    └────────┬───────┘
             │
    ┌────────▼───────┐
    │ Observation    │
    │ (results +     │ ──────────► back to Orchestrator
    │ error state)   │
    └────────────────┘
Every box in this diagram is a real engineering decision. Let's go through each one.
Component Breakdown
The LLM Core: Reasoning Engine, Not Magic Oracle
The language model is the reasoning engine. It reads the full context window, including the system prompt, conversation history, tool results, and memory retrievals, and produces either a text response or a structured action (tool call, plan step, final answer).
What the model doesn't have: persistent state, real-time data, or the ability to actually execute anything. It only produces tokens. Everything else is infrastructure that interprets and acts on those tokens.
The model's context window is the agent's working memory. Everything the agent "knows" in a given reasoning step lives in that window. This makes context management (what goes in, what gets compressed, what gets dropped) one of the most consequential engineering decisions in any agent system.
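One of the simplest context policies is "keep the newest messages that fit a budget." The sketch below assumes messages are dicts with a string `content`, and `estimate_tokens` is a crude stand-in of my own (about four characters per token), not a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the newest messages whose combined estimated size fits the budget."""
    kept: list[dict] = []
    used = 0
    for message in reversed(messages):        # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break                             # everything older is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))               # restore chronological order


history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens
    {"role": "assistant", "content": "b" * 200},  # ~50 tokens
    {"role": "user", "content": "c" * 40},        # ~10 tokens
]
trimmed = trim_history(history, budget=70)
```

Dropping the oldest messages is the bluntest option. Summarizing them before they fall out of the window is the usual next step, at the cost of an extra model call.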
If you're building a RAG-based agent, chunking strategy directly determines what ends up in that context window. I wrote a deep-dive on exactly that: The Definitive Guide to Chunking Strategies for LLMs.
Tool Use: Where Agents Touch the World
Tool use is what separates an agent from a chatbot. The model outputs a structured tool call; the orchestrator intercepts it, executes the tool, and feeds the result back into the context.
A minimal tool definition looks like this:
tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information. Use when the user asks about recent events or facts you may not have.",
        # JSON Schema for the arguments. The key name varies by provider:
        # Anthropic's Messages API, used in the orchestrator below, calls it "input_schema".
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "run_python",
        "description": "Execute a Python code snippet and return stdout.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Valid Python code to execute"
                }
            },
            "required": ["code"]
        }
    }
]

The LLM doesn't call the tool directly. It outputs something like:
{
    "tool": "search_web",
    "parameters": { "query": "Q1 2026 SaaS churn benchmarks" }
}

The orchestrator parses this, runs the actual search, and appends the result to the context before the next LLM call. The model never "runs" anything. It only describes what to run.
This indirection is both a strength (safety, observability) and a source of failure: the model can hallucinate tool names or parameters, or assume tools exist that don't.
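A cheap defense against hallucinated calls is to validate each one against the tool definitions before executing anything. This is a minimal sketch that assumes the generic `tool` / `parameters` shapes shown above; `validate_tool_call` is a hypothetical helper, and it checks only names and required keys, not types.

```python
def validate_tool_call(call: dict, tools: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    by_name = {tool["name"]: tool for tool in tools}
    name = call.get("tool")
    if name not in by_name:
        return [f"unknown tool: {name!r}"]
    schema = by_name[name]["parameters"]
    params = call.get("parameters", {})
    problems = []
    for required in schema.get("required", []):
        if required not in params:
            problems.append(f"missing required parameter: {required!r}")
    for key in params:
        if key not in schema.get("properties", {}):
            problems.append(f"unexpected parameter: {key!r}")
    return problems


example_tools = [{
    "name": "search_web",
    "description": "Search the web.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

ok = validate_tool_call({"tool": "search_web", "parameters": {"query": "churn"}}, example_tools)
bad = validate_tool_call({"tool": "search_internet", "parameters": {}}, example_tools)
```

Feeding the problem list back to the model as an observation usually works better than raising: the model gets a chance to correct the call on the next step.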
Orchestration: The Traffic Controller
The orchestrator manages the agent loop: sending prompts to the model, routing tool calls to executors, handling errors, enforcing stop conditions, and maintaining message history.
A minimal ReAct-style orchestrator in Python:
import json

import anthropic

client = anthropic.Anthropic()

# Placeholder: replace with the agent's real system prompt.
SYSTEM_PROMPT = "You are a helpful agent. Use tools when they help."


def extract_text(response) -> str:
    # One possible implementation: join the text blocks of the final response.
    return "".join(b.text for b in response.content if b.type == "text")


def run_agent(user_query: str, tools: list, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_query}]
    for step in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system=SYSTEM_PROMPT,
            tools=tools,
            messages=messages,
        )

        # Agent produced a final answer
        if response.stop_reason == "end_turn":
            return extract_text(response)

        # Agent wants to use a tool
        if response.stop_reason == "tool_use":
            tool_calls = [b for b in response.content if b.type == "tool_use"]
            tool_results = []
            for call in tool_calls:
                # execute_tool is your own dispatcher: it maps a tool name to real code.
                result = execute_tool(call.name, call.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": call.id,
                    "content": json.dumps(result),
                })
            # Append assistant turn + tool results to history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason (for example "max_tokens") would otherwise
        # resend the same messages on every remaining step, so bail out.
        return f"Stopped early: {response.stop_reason}"

    return "Max steps reached without resolution."

This loop is the skeleton of almost every production agent. The complexity lives in execute_tool, error handling, and what happens when the model loops without making progress.
Memory: The Four Layers
Memory in agent systems isn't one thing. It's four distinct layers with different scopes and engineering requirements:
┌────────────────────────────────────────────────────────┐
│                     MEMORY LAYERS                      │
│                                                        │
│  In-context  │  Current conversation + tool results    │
│  (working)   │  Scope: single run. Fast. Limited.      │
│              │                                         │
│  Episodic    │  Past conversations, stored + indexed   │
│  (external)  │  Scope: across runs. Vector search.     │
│              │                                         │
│  Semantic    │  Facts, documents, knowledge bases      │
│  (external)  │  Scope: static or slowly updated.       │
│              │                                         │
│  Procedural  │  Tool definitions, system prompts,      │
│  (baked-in)  │  few-shot examples, fine-tune weights   │
└────────────────────────────────────────────────────────┘
The most common mistake in early agent systems is treating in-context memory as the only memory. It works until the context fills up or the conversation ends. Production systems need all four layers, with explicit logic for when to read from and write to each.
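Here is a rough sketch of what "explicit logic for each layer" can look like. The `MemoryManager` class is hypothetical, and plain keyword overlap stands in for the vector search a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryManager:
    procedural: str                                     # system prompt, fixed per agent
    semantic: list[str] = field(default_factory=list)   # slow-changing facts
    episodic: list[str] = field(default_factory=list)   # summaries of past runs
    working: list[str] = field(default_factory=list)    # current run only

    def _search(self, items: list[str], query: str) -> list[str]:
        # Keyword overlap as a placeholder for vector search.
        words = set(query.lower().split())
        return [item for item in items if words & set(item.lower().split())]

    def build_context(self, query: str) -> list[str]:
        """Read path: pull from every layer, most stable first."""
        return (
            [self.procedural]
            + self._search(self.semantic, query)
            + self._search(self.episodic, query)
            + self.working
        )

    def end_run(self, summary: str) -> None:
        """Write path: persist a summary to episodic memory, then clear working memory."""
        self.episodic.append(summary)
        self.working.clear()


memory = MemoryManager(
    procedural="You are a support agent.",
    semantic=["refunds take 5 days"],
    episodic=["user asked about refunds last week"],
)
memory.working.append("user: where is my refund?")
context = memory.build_context("refunds status")
memory.end_run("user asked about refund status again")
```

The point of the structure is that reads and writes are deliberate: working memory never silently becomes long-term memory, and long-term memory only enters the context when a query matches it.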
Safety Controls: The Layer Nobody Builds First (And Should)
Safety controls sit between the orchestrator and the outside world. They include:
- Input validation: sanitize user inputs before they reach the model
- Output filtering: intercept model outputs before tool execution
- Tool scope limiting: whitelist which tools each agent persona can access
- Action confirmation: require human approval for irreversible actions (writes, sends, deletes)
- Rate limiting and cost caps: prevent runaway loops from burning budget
- Audit logging: record every tool call, input, and output for post-hoc debugging
Here's the full guard pattern in Python:
import asyncio
import logging
from dataclasses import dataclass, field
from typing import Any, Optional

logger = logging.getLogger(__name__)

IRREVERSIBLE_TOOLS = {"send_email", "delete_record", "push_to_production", "charge_payment"}


@dataclass
class AgentConfig:
    allowed_tools: list[str]
    auto_approve: bool = False
    tool_timeout_ms: int = 5000
    max_cost_usd: float = 1.0
    total_cost_usd: float = field(default=0.0, init=False)


@dataclass
class ToolResult:
    output: Optional[Any] = None
    error: Optional[str] = None
    cost_usd: float = 0.0


async def safe_tool_execute(
    tool_name: str,
    params: dict,
    config: AgentConfig,
    approve_fn=None,
    execute_fn=None,
) -> ToolResult:
    # 0. Fail fast if no executor was supplied
    if execute_fn is None:
        return ToolResult(error="No execute_fn provided.")

    # 1. Scope check: is this tool allowed for this agent?
    if tool_name not in config.allowed_tools:
        logger.warning(f"Blocked tool call: '{tool_name}' not in allowed list.")
        return ToolResult(error=f"Tool '{tool_name}' not permitted for this agent.")

    # 2. Cost cap: prevent runaway spend
    if config.total_cost_usd >= config.max_cost_usd:
        return ToolResult(error=f"Cost cap of ${config.max_cost_usd:.2f} reached. Halting.")

    # 3. Human-in-the-loop for irreversible actions
    if tool_name in IRREVERSIBLE_TOOLS and not config.auto_approve:
        if approve_fn is None:
            return ToolResult(error=f"'{tool_name}' requires human approval but no approve_fn provided.")
        approved = await approve_fn(tool_name, params)
        if not approved:
            logger.info(f"User rejected irreversible action: '{tool_name}'")
            return ToolResult(error="Action rejected by user.")

    # 4. Execute with timeout
    try:
        timeout_secs = config.tool_timeout_ms / 1000
        result: ToolResult = await asyncio.wait_for(
            execute_fn(tool_name, params),
            timeout=timeout_secs,
        )
        # 5. Track cumulative cost
        config.total_cost_usd += result.cost_usd
        logger.info(f"Tool '{tool_name}' succeeded. Session cost: ${config.total_cost_usd:.4f}")
        return result
    except asyncio.TimeoutError:
        logger.error(f"Tool '{tool_name}' timed out after {config.tool_timeout_ms}ms.")
        return ToolResult(error=f"Tool '{tool_name}' timed out.")
    except Exception as e:
        logger.exception(f"Tool '{tool_name}' raised an unexpected error.")
        return ToolResult(error=f"Unexpected error in '{tool_name}': {str(e)}")


# --- Usage example ---

async def mock_approve(tool_name: str, params: dict) -> bool:
    print(f"\n[APPROVAL REQUIRED] Tool: {tool_name}\nParams: {params}")
    # input() blocks the event loop; acceptable for a mock, not for production.
    return input("Approve? (y/n): ").strip().lower() == "y"


async def mock_execute(tool_name: str, params: dict) -> ToolResult:
    # Replace with real tool dispatch logic
    return ToolResult(output=f"Result from {tool_name}", cost_usd=0.002)


async def main():
    config = AgentConfig(
        allowed_tools=["search_web", "run_python", "send_email"],
        auto_approve=False,
        tool_timeout_ms=3000,
        max_cost_usd=0.50,
    )
    result = await safe_tool_execute(
        tool_name="send_email",
        params={"to": "team@company.com", "subject": "Agent Report"},
        config=config,
        approve_fn=mock_approve,
        execute_fn=mock_execute,
    )
    print(result)


if __name__ == "__main__":
    asyncio.run(main())

Interaction Patterns
Global Planning vs. Reactive Execution
Two dominant patterns, and knowing when to use each is a real architectural decision.
Global planning (Plan-then-Execute): The agent first produces a full plan, a sequence of steps toward the goal, then executes each step. Good for well-defined tasks with predictable tool behavior. Brittle when the environment changes mid-execution.
User Goal
    │
    ▼
[PLAN STEP] ──► Step 1: search market data
                Step 2: run analysis script
                Step 3: draft report
                Step 4: send summary email
    │
    ▼
[EXECUTE each step sequentially]
Reactive execution (ReAct-style): The agent reasons and acts one step at a time, using each observation to decide the next action. More adaptive. More token-expensive. Better for open-ended, exploratory tasks.
Observe → Think → Act → Observe → Think → Act → ...
Most production systems use a hybrid: global planning for task decomposition, reactive execution within each sub-task.
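A hybrid can be sketched in a few lines: plan once up front, then let each step adapt to what its tool returns. Everything here is a stand-in. `make_plan` replaces an LLM planning call, and `flaky_tool` is a fake tool that fails on the first phrasing so the reactive part has something to react to.

```python
from typing import Callable


def make_plan(goal: str) -> list[str]:
    # Stand-in for an LLM planning call; a real planner would derive steps from the goal.
    return ["search market data", "draft report"]


def run_sub_task(sub_task: str, tool: Callable[[str], str], max_attempts: int = 3) -> str:
    """Reactive inner loop: act, observe, and adjust until the sub-task yields a result."""
    query = sub_task
    for attempt in range(1, max_attempts + 1):
        observation = tool(query)
        if observation != "no results":
            return observation
        # React to the observation by rephrasing before the next attempt.
        query = f"{sub_task} (retry {attempt}, broader terms)"
    return f"failed: {sub_task}"


def run_hybrid(goal: str, tool: Callable[[str], str]) -> dict[str, str]:
    """Outer loop: plan once, then hand each step to the reactive inner loop."""
    return {step: run_sub_task(step, tool) for step in make_plan(goal)}


def flaky_tool(query: str) -> str:
    # Returns nothing useful for the first phrasing of the search step.
    if query == "search market data":
        return "no results"
    return f"done: {query}"


results = run_hybrid("write a market report", flaky_tool)
```

The plan stays fixed while each step gets a bounded number of attempts, which keeps the token cost of reactivity contained to the steps that actually need it.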
State Management Across Steps
Agent state, including what the agent knows, what it has done, and what it is waiting on, needs to be explicitly tracked and serialized. Don't rely on the model to remember it across turns.
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AgentState:
    session_id: str
    goal: str
    plan: list[str] = field(default_factory=list)
    completed_steps: list[str] = field(default_factory=list)
    tool_results: dict[str, str] = field(default_factory=dict)
    current_step_index: int = 0
    status: str = "running"  # running | waiting | complete | failed
    error: Optional[str] = None

    def advance(self):
        # Guard against advancing past the end of the plan (or an empty plan).
        if self.current_step_index >= len(self.plan):
            self.status = "complete"
            return
        self.completed_steps.append(self.plan[self.current_step_index])
        self.current_step_index += 1
        if self.current_step_index >= len(self.plan):
            self.status = "complete"

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, data: str) -> "AgentState":
        return cls(**json.loads(data))

Serialize this state to a database between agent turns. It's what enables pause-and-resume, human-in-the-loop approval, and post-mortem debugging.
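As a sketch of that persistence step, here is one way to store snapshots with SQLite from the standard library. The `agent_state` table and the helper names are assumptions of mine, and SQLite is only standing in for whatever database you actually run. The JSON string is the kind of thing `to_json()` produces.

```python
import json
import sqlite3
from typing import Optional


def init_store(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_state ("
        "session_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )


def save_state(conn: sqlite3.Connection, session_id: str, state_json: str) -> None:
    # INSERT OR REPLACE keeps only the latest snapshot per session.
    conn.execute(
        "INSERT OR REPLACE INTO agent_state (session_id, state) VALUES (?, ?)",
        (session_id, state_json),
    )
    conn.commit()


def load_state(conn: sqlite3.Connection, session_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT state FROM agent_state WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
init_store(conn)
save_state(conn, "s-1", json.dumps({"goal": "draft report", "current_step_index": 0, "status": "running"}))
save_state(conn, "s-1", json.dumps({"goal": "draft report", "current_step_index": 1, "status": "waiting"}))
resumed = json.loads(load_state(conn, "s-1"))
```

Keeping only the latest snapshot is enough for pause-and-resume. For replay and post-mortems you would append a row per step instead of overwriting.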
Real-World Scenarios
Scenario 1: Customer Support Agent
A Tier-1 support agent for a SaaS product. The agent handles inbound tickets, looks up account data, checks known issues, drafts responses, and escalates when it can't resolve.
This is exactly what I built at R Systems for Edgenta: an intelligent support desk that streamlined ticket handling and query resolution end to end. You can read more about that work on my experience page.
Tool set: lookup_account, search_knowledge_base, check_open_incidents, create_ticket, escalate_to_human
Critical design decisions:
- System prompt includes explicit escalation triggers (billing disputes, data loss, detected frustration)
- escalate_to_human is always available and never blocked by safety controls
- All tool calls are logged to the CRM with the agent's reasoning step attached
- The agent never sends emails directly. It drafts, and a human confirmation step triggers send
Failure mode to guard against: The model confidently drafts a resolution based on outdated knowledge base articles. Guard: always call check_open_incidents before drafting. If a known issue exists, reference it explicitly. Never synthesize a fix from first principles when a documented answer exists.
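That guard is more reliable when the orchestrator enforces it rather than the prompt. A minimal sketch, assuming a hypothetical `draft_response` tool (not part of the tool set above) and a list of tool names already called in the current run:

```python
# Tools that must have run earlier in the same session before the key tool is allowed.
# "draft_response" is a hypothetical tool name used for illustration.
PREREQUISITES = {"draft_response": {"check_open_incidents"}}


def missing_prerequisites(tool_name: str, called_so_far: list[str]) -> set[str]:
    """Return the tools that must run before tool_name but have not run yet."""
    return PREREQUISITES.get(tool_name, set()) - set(called_so_far)


def guard_tool_call(tool_name: str, called_so_far: list[str]) -> str:
    missing = missing_prerequisites(tool_name, called_so_far)
    if missing:
        # Returned to the model as an observation so it can correct course.
        return f"blocked: call {', '.join(sorted(missing))} before {tool_name}"
    return "allowed"


early = guard_tool_call("draft_response", ["lookup_account"])
ready = guard_tool_call("draft_response", ["lookup_account", "check_open_incidents"])
```

Tools without an entry in the table, such as escalate_to_human, are never blocked by this check.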
Scenario 2: Autonomous Data Analyst
An internal agent that accepts a natural language analysis request, writes and runs Python to query a database, interprets results, and returns a structured report with caveats.
I built a production version of this exact pattern. InsightPilot transforms natural language questions into interactive charts and insights, with 95% faster time-to-insight for non-technical teams. The agent architecture underneath it follows the design below almost exactly.
Tool set: run_sql_query, execute_python, write_to_report, request_clarification
System prompt design:
You are a data analyst agent. When given an analysis request:
1. Clarify any ambiguous metrics or time ranges before querying.
2. Write SQL to retrieve the relevant data. Always add LIMIT 10000 as a safeguard.
3. Use Python to analyze and visualize the result.
4. Summarize findings in plain English. Flag any anomalies or data quality issues.
5. State confidence level and list assumptions explicitly.
You do NOT have access to production write operations. Your access is read-only.
Never interpret missing data as zero. Flag it as unknown.
Critical design decisions:
- SQL queries run against a read-only replica. Write operations are blocked at infrastructure level, not just prompt level.
- The agent is required to state assumptions and confidence explicitly in every report.
- request_clarification is a first-class tool. The agent is rewarded for asking rather than guessing.
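The prompt asks for a LIMIT, but the orchestrator can also enforce it before a query ever reaches the replica. This is a heuristic sketch, not a SQL parser: `guard_sql` is a hypothetical helper, it will reject a legitimate query that contains a semicolon inside a string literal, and it treats any existing LIMIT clause as sufficient. The read-only replica remains the real protection.

```python
import re


def guard_sql(query: str, max_rows: int = 10000) -> str:
    """Reject anything that is not a single SELECT and make sure a LIMIT is present."""
    statement = query.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"select\b", statement, flags=re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    if not re.search(r"\blimit\s+\d+", statement, flags=re.IGNORECASE):
        statement = f"{statement} LIMIT {max_rows}"
    return statement


capped = guard_sql("SELECT * FROM orders;")
unchanged = guard_sql("select id from t limit 50")
```

A rejected query should come back to the agent as an error observation, so it can rewrite the query instead of failing the whole run.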
Evaluation and Debugging
What to Measure
Evaluating agents is harder than evaluating single LLM calls because failures compound across steps. The metrics that matter in production:
- Task completion rate: Did the agent actually achieve the stated goal, end to end? This is the only metric that truly matters to users.
- Step efficiency: Did it take 12 tool calls to do a 3-step task? High step counts signal planning failures or prompt issues, not tool failures.
- Hallucination rate: Did the agent fabricate tool results, cite sources that don't exist, or invent intermediate facts? Measure this separately from task completion.
- Latency per step: Break down wall-clock time by step: LLM call vs. tool execution vs. orchestrator overhead. The bottleneck is usually not where you expect.
- Error recovery rate: When a tool fails or returns empty, does the agent adapt and recover, or does it spiral into retries and then give a confident wrong answer?
- Human escalation rate: For agents with an escalation path, is it escalating appropriately? Too low means it's overconfident. Too high means the system prompt or tool set needs work.
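Several of these metrics fall out of a simple per-run summary. The sketch below assumes you already extract a `RunRecord` from each trace; the record shape and field names are hypothetical, and hallucination rate and latency are left out because they need labeled data and timing respectively.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:  # hypothetical per-run summary extracted from traces
    completed: bool
    steps: int
    tool_errors: int
    recovered_errors: int
    escalated: bool


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    if not runs:
        raise ValueError("need at least one run to summarize")
    total = len(runs)
    errors = sum(r.tool_errors for r in runs)
    recovered = sum(r.recovered_errors for r in runs)
    return {
        "task_completion_rate": sum(r.completed for r in runs) / total,
        "mean_steps": sum(r.steps for r in runs) / total,
        # With no tool errors at all there was nothing to recover from.
        "error_recovery_rate": recovered / errors if errors else 1.0,
        "escalation_rate": sum(r.escalated for r in runs) / total,
    }


runs = [
    RunRecord(completed=True, steps=3, tool_errors=0, recovered_errors=0, escalated=False),
    RunRecord(completed=True, steps=5, tool_errors=2, recovered_errors=1, escalated=False),
    RunRecord(completed=False, steps=12, tool_errors=2, recovered_errors=1, escalated=True),
    RunRecord(completed=True, steps=4, tool_errors=0, recovered_errors=0, escalated=False),
]
report = summarize(runs)
```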
Debugging Strategies That Actually Work
Trace every step. Log the full context sent to the model at each step, the model's output, the tool called, and the tool's result. Don't log just the final answer. The failure is almost always in the middle.
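A trace does not need special tooling to be useful. Here is a minimal sketch that records one entry per step and serializes the run as JSON Lines; the field names and the `record_step` / `dump_trace` helpers are my own choices, not a standard format.

```python
import json
import time
from typing import Any, Optional


def record_step(
    trace: list[dict],
    step: int,
    context: list[dict],
    model_output: str,
    tool_name: Optional[str] = None,
    tool_result: Any = None,
) -> dict:
    entry = {
        "step": step,
        "timestamp": time.time(),
        "context": context,            # full message list sent to the model
        "model_output": model_output,
        "tool_name": tool_name,        # None when the step made no tool call
        "tool_result": tool_result,
    }
    trace.append(entry)
    return entry


def dump_trace(trace: list[dict]) -> str:
    """Serialize as JSON Lines: one step per line, easy to grep and replay."""
    return "\n".join(json.dumps(entry, default=str) for entry in trace)


trace: list[dict] = []
first_context = [{"role": "user", "content": "find churn benchmarks"}]
record_step(trace, 0, first_context, "call search_web", "search_web", {"hits": 3})
record_step(trace, 1, first_context + [{"role": "user", "content": "3 hits"}], "final answer")
lines = dump_trace(trace).splitlines()
```

Storing the full context at every step is verbose, but it is exactly what lets you see what the model saw when it made a bad decision.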
Replay broken runs. Store agent state snapshots so you can replay a failed run from any step with a modified prompt or tool response. Invaluable for fixing edge cases without running the whole task again.
Inject deliberate failures. Test what happens when a tool returns an error, returns empty results, or times out. Most agent loops handle the happy path well. The failure modes reveal the real weaknesses.
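One lightweight way to do this is to wrap a tool so a test can force each failure mode in turn. In this sketch `with_fault`, `search`, and `call_safely` are all hypothetical names, and the tools take and return plain dicts.

```python
from typing import Callable


def with_fault(tool: Callable[[dict], dict], mode: str) -> Callable[[dict], dict]:
    """Wrap a tool so tests can force a specific failure mode."""
    def wrapped(params: dict) -> dict:
        if mode == "error":
            raise RuntimeError("injected tool failure")
        if mode == "empty":
            return {"results": []}
        if mode == "timeout":
            raise TimeoutError("injected timeout")
        return tool(params)  # any other mode passes through to the real tool
    return wrapped


def search(params: dict) -> dict:
    return {"results": [f"hit for {params['query']}"]}


def call_safely(tool: Callable[[dict], dict], params: dict) -> dict:
    # The behaviour under test: failures become observations instead of crashes.
    try:
        return tool(params)
    except (RuntimeError, TimeoutError) as exc:
        return {"results": [], "error": str(exc)}


outcomes = {
    mode: call_safely(with_fault(search, mode), {"query": "churn"})
    for mode in ("none", "error", "empty", "timeout")
}
```

The interesting assertions come one level up: run the full agent loop against each faulty tool and check that the final answer admits the failure rather than papering over it.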
Watch for infinite loops. The most common production failure: the agent calls the same tool repeatedly with slightly different parameters, making no progress. Add a loop detector:
def detect_loop(tool_calls: list[dict], window: int = 5) -> bool:
    if len(tool_calls) < window:
        return False
    recent = tool_calls[-window:]
    tool_names = [c["name"] for c in recent]
    most_common = max(set(tool_names), key=tool_names.count)
    # Flag if more than 80% of recent calls are the same tool.
    # With the default window of 5, that means all 5 calls.
    return tool_names.count(most_common) / window > 0.8

Current Limitations and Where This Goes Next
Context window as working memory. Even at 200K tokens, long-running agents hit limits. Summarization, compression, and hierarchical memory are active research areas, but every approach involves lossy compression of the agent's history. There's no clean solution yet. The chunking strategies I covered in The Definitive Guide to Chunking Strategies for LLMs directly affect how much useful signal you can pack into that window.
Reliability compounds. A single-step LLM call might be 95% reliable. A 10-step agent with tool calls multiplies those per-step probabilities: at 95% per step, and assuming steps fail independently, a 10-step task succeeds roughly 60% of the time (0.95^10 ≈ 0.60). Reliability engineering for agents means designing each step to fail gracefully, not just optimizing each step in isolation.
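The arithmetic is short enough to check directly. This sketch assumes every step succeeds independently with the same probability, which real agents only approximate; `chain_success` is a name I made up for illustration.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


# Ten steps at 95% each: about 0.599.
ten_steps = chain_success(0.95, 10)

# Per-step reliability needed for 90% end-to-end success over ten steps: about 0.9895.
needed_per_step = 0.90 ** (1 / 10)
```

Read the second number carefully: getting a ten-step agent to 90% means each step has to be right about 99% of the time.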
Goal drift. Agents can drift from the original goal over many steps, especially when tool results introduce new information. Maintaining goal fidelity across a long execution trace is harder than it sounds, and it remains an open alignment problem.
Multimodal agents. The next generation of agents perceive images, audio, video, and structured data natively. Architecturally this adds perception modules before the reasoning loop, but the core loop stays the same. The hard problems shift to grounding (connecting what the model sees to what it should do) and latency.
Edge deployment. Running agent loops on-device requires model compression and rethinking which parts of the architecture live locally vs. in the cloud. The orchestrator can run locally; the LLM core probably can't yet for most real tasks.
What This Changes About How You Build
The mental model shift that mattered most for me: stop thinking about agents as smart assistants and start thinking about them as distributed systems with an LLM as the decision node.
All the lessons from distributed systems apply: fail gracefully, make state explicit, design for idempotency, log everything, test failure modes before happy paths. The LLM doesn't change these fundamentals. It just adds a new kind of nondeterminism to manage.
Build the observability layer before you build the capabilities. You can't debug what you can't see.
And when your agent does something unexpected (and it will), the answer is almost always in the trace, sitting there in the context window, waiting for you to read it.
Thanks for reading! Until next time, stay curious. ~ Vansh Garg
