
AI Engineering Series

RAG to Agents - From Retrieval to Action

Deep dive into AI agents: the agent loop, tools, ReAct pattern, memory systems, when agents are wrong, and agent failure modes you'll encounter in production

Why This Matters

RAG answers questions. Agents solve problems.

When a user asks “What’s the status of order #12345?”, RAG retrieves a document. But what if answering requires:

  • Querying an order database
  • Checking shipping status from an API
  • Calculating estimated delivery based on location
  • Composing a response with all that information

RAG can’t do this. RAG retrieves static documents. Agents take actions.

If you try to build multi-step systems with RAG patterns, you’ll create brittle pipelines that break on variation. Understanding the agent mental model lets you build flexible systems that adapt.

What Goes Wrong Without This:

[Diagram: Agent Failure Patterns]

Pipelines vs Agents

There are two ways to build multi-step AI systems:

[Diagram: Pipelines vs Agents]

The key question: Who decides the next step—your code or the model?

  • Pipeline: You enumerate all paths. Reliable for known scenarios. Fails on novel scenarios.
  • Agent: Model reasons about what to do. Handles variation. Can make mistakes.

Neither is better. They solve different problems.
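The "who decides" distinction can be made concrete with a small sketch (hypothetical helper functions, not the article's code): the same order-status task written once as a pipeline, where your code fixes every step, and once as an agent, where a model chooses each step.

```python
# Illustrative sketch (hypothetical helpers): the same task as a
# pipeline versus an agent.

def pipeline(order_id: str) -> str:
    # Pipeline: YOUR code fixes the steps and their order.
    order = {"id": order_id, "status": "shipped"}   # stand-in for a DB query
    shipping = {"eta_days": 2}                      # stand-in for an API call
    return (f"Order {order['id']} is {order['status']}, "
            f"arriving in {shipping['eta_days']} days.")

def agent(goal: str, llm_choose, tools: dict) -> str:
    # Agent: the MODEL picks the next step on every iteration.
    context = [goal]
    for _ in range(5):                              # always cap iterations
        name, args = llm_choose(context, tools)     # the model decides here
        if name == "finish":
            return args["answer"]
        context.append(tools[name](**args))         # observe the tool result
    return "Max iterations reached"
```

Note that adding a new scenario to the pipeline means writing a new branch; the agent handles it only if the model can reason its way there with the available tools.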


The Agent Loop

An agent is a loop. The LLM decides what to do, executes it, observes the result, and decides again.

[Diagram: The Agent Loop]

Each iteration:

  1. Observe: What information do I have? What just happened?
  2. Think: Given my goal and current state, what’s the best next action?
  3. Act: Execute the chosen action
  4. Evaluate: Am I done? If not, loop.

The magic: the model decides the action at step 2. This is what makes it an agent, not a pipeline.
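The four steps above can be sketched as a skeleton, assuming hypothetical `think`, `act`, and `done` callables (in a real system, `think` wraps the LLM call):

```python
def agent_loop(goal, think, act, done):
    """Skeleton of the observe-think-act-evaluate cycle."""
    state = {"goal": goal, "history": []}
    while not done(state):                                         # 4. Evaluate
        last = state["history"][-1] if state["history"] else goal  # 1. Observe
        action = think(state, last)                                # 2. Think
        state["history"].append(act(action))                       # 3. Act
    return state["history"]
```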


Tools: The Agent’s Capabilities

An agent without tools is just a chatbot. Tools are functions the agent can call.

[Diagram: Tools Give Agents Capabilities]

Common tool categories:

| Category | Examples | What it enables |
| --- | --- | --- |
| Data retrieval | search_docs, query_database | Access information |
| External APIs | get_weather, check_inventory | Real-time data |
| Actions | send_email, create_ticket | Side effects |
| Computation | calculate, run_code | Complex logic |
| User interaction | ask_user, show_options | Clarification |

Tool descriptions are prompts. Good descriptions = agent uses tools correctly. Bad descriptions = agent guesses wrong.
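Here is a hypothetical before/after showing the function portion of a tool definition (the same shape used in the code example later in this article). The difference is whether the model can tell when to use the tool, what each argument means, and what comes back:

```python
# A vague description: the agent has to guess when and how to use this.
bad = {
    "name": "lookup",
    "description": "Looks things up",
    "parameters": {"type": "object", "properties": {"q": {"type": "string"}}},
}

# A description that works as a prompt: when to use it, what the
# arguments mean, and what the tool returns.
good = {
    "name": "search_orders",
    "description": (
        "Search the order database. Use when the user asks about an order's "
        "status, contents, or history. Provide either an email or an order ID. "
        "Returns a list of matching orders with id, status, and amount."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Customer email, e.g. alice@example.com"},
            "order_id": {"type": "string", "description": "Order ID, e.g. '456'"},
        },
    },
}
```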

RAG as a Tool

Here’s the insight: RAG doesn’t get replaced by agents—it becomes a tool.

[Diagram: RAG as Agent Tool]

RAG is retrieval. Agents can use retrieval as one capability among many.
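In practice this means wrapping your existing retriever as one tool definition among the others. A minimal sketch, with a toy keyword scorer standing in for the real embedding search:

```python
def search_docs(query: str, top_k: int = 3) -> list[str]:
    """Retrieve the top_k most relevant doc chunks.
    Toy relevance (keyword overlap) stands in for a vector search."""
    corpus = {
        "refund policy": "Refunds are processed within 5 business days.",
        "shipping policy": "Standard shipping takes 3-5 business days.",
    }
    scored = sorted(
        corpus.items(),
        key=lambda kv: -sum(w in kv[0] for w in query.lower().split()),
    )
    return [text for _, text in scored[:top_k]]

# The retriever exposed as a tool, in the same shape as the other tools.
rag_tool = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documentation for policy and FAQ answers.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search query"}},
            "required": ["query"],
        },
    },
}
```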


The ReAct Pattern

The most common agent architecture is ReAct (Reason + Act). The model explicitly reasons before acting.

[Diagram: ReAct Example]

The THOUGHT step makes the agent’s reasoning visible. This helps with:

  • Debugging (you can see why it chose an action)
  • Guidance (you can provide examples of good reasoning)
  • Error recovery (model realizes when it’s stuck)
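One practical consequence of the explicit THOUGHT/ACTION format is that the runtime can parse the ACTION line to know which tool to call. A sketch, using a hypothetical model output (in production this text comes from the LLM):

```python
import re

# Hypothetical ReAct-style step emitted by the model.
step = (
    "THOUGHT: The user wants a refund status, so I should find the order first.\n"
    'ACTION: search_orders(email="alice@example.com")'
)

# The runtime parses the ACTION line to dispatch the tool call.
m = re.search(r'ACTION:\s*(\w+)\((.*)\)', step)
tool_name, raw_args = m.group(1), m.group(2)
print(tool_name)  # the tool to invoke
```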

Agent Memory

Agents without memory forget everything between turns. Production agents need memory.

[Diagram: Memory Types]

Without memory, agents can’t handle multi-turn conversations, learn user preferences, or maintain context across sessions.
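One common design (assumed here, not prescribed by this article) splits memory into a bounded short-term buffer of recent turns and a long-term key-value store of durable facts, with both injected into the model's context:

```python
from collections import deque

class AgentMemory:
    """Minimal sketch: short-term turn buffer plus long-term facts."""

    def __init__(self, short_term_limit: int = 10):
        self.short_term = deque(maxlen=short_term_limit)  # recent turns only
        self.long_term = {}                               # persists across sessions

    def remember_turn(self, role: str, content: str):
        self.short_term.append({"role": role, "content": content})

    def remember_fact(self, key: str, value: str):
        self.long_term[key] = value  # e.g. "preferred_language"

    def context(self) -> list:
        # Surface long-term facts as a system message ahead of recent turns.
        facts = "; ".join(f"{k}={v}" for k, v in self.long_term.items())
        prefix = [{"role": "system", "content": f"Known user facts: {facts}"}] if facts else []
        return prefix + list(self.short_term)
```

The `maxlen` bound is what keeps the context window from growing forever; anything worth keeping past that bound has to be promoted to the long-term store.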


When Agents Are Wrong

Agents are not always the answer. Sometimes they’re the problem.

[Diagram: When to Use What]

The “agent for everything” anti-pattern:

[Diagram: Over-Engineering Anti-Pattern]

Agents add:

  • Latency: Multiple LLM calls per request
  • Cost: Each thought/action cycle costs tokens
  • Non-determinism: Same input can produce different paths
  • New failure modes: Wrong tool selection, hallucinated arguments, infinite loops

Don’t use an agent when a simpler approach works.
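The latency and cost points are easy to quantify with a back-of-envelope sketch. All numbers below are illustrative assumptions, not real prices; the point is that each iteration re-sends the growing context, so cost per iteration climbs:

```python
# Back-of-envelope agent-loop cost (all figures are made-up assumptions).
price_per_1k_input = 0.00015   # assumed $/1K input tokens
price_per_1k_output = 0.0006   # assumed $/1K output tokens
base_context = 1_500           # system prompt + tool schemas + user message
tokens_per_observation = 300   # assumed tool-result size
output_per_iteration = 150     # assumed thought + tool call

cost = 0.0
context = base_context
for _ in range(4):             # a 4-iteration run
    cost += context / 1000 * price_per_1k_input
    cost += output_per_iteration / 1000 * price_per_1k_output
    context += output_per_iteration + tokens_per_observation  # context grows

print(f"~${cost:.4f} per request")
```

A single-call pipeline pays the input cost once; the loop pays it every iteration, on an ever-larger context.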


Agent Failure Modes

Agents introduce new ways to fail:

[Diagram: Agent-Specific Failures]
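Two of these failure modes, hallucinated tool names and hallucinated arguments, can be caught before execution by validating each call against the tool schemas. A sketch with assumed schemas (the names match this article's example tools):

```python
# Assumed per-tool argument schemas, derived from the tool definitions.
TOOL_SCHEMAS = {
    "check_refund": {"required": ["order_id"], "allowed": {"order_id"}},
    "search_orders": {"required": [], "allowed": {"email", "order_id"}},
}

def validate_call(name, arguments):
    """Return an error string to feed back to the model, or None if valid."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return f"Unknown tool: {name}"                   # hallucinated tool
    missing = [a for a in schema["required"] if a not in arguments]
    if missing:
        return f"Missing required arguments: {missing}"  # incomplete call
    extra = set(arguments) - schema["allowed"]
    if extra:
        return f"Unexpected arguments: {sorted(extra)}"  # hallucinated args
    return None
```

Returning the error string as the tool result, instead of raising, gives the model a chance to self-correct on the next iteration.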

Code Example

Minimal agent loop demonstrating the observe-think-act cycle:

from openai import OpenAI
import json

client = OpenAI()

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_orders",
            "description": "Search for orders by user email or order ID",
            "parameters": {
                "type": "object",
                "properties": {
                    "email": {"type": "string", "description": "User email"},
                    "order_id": {"type": "string", "description": "Order ID"},
                },
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_refund",
            "description": "Check refund status for an order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "Order ID"},
                },
                "required": ["order_id"],
            },
        },
    },
]

# Mock tool implementations
def search_orders(email=None, order_id=None):
    return [{"order_id": "456", "status": "refund_requested", "amount": 99}]

def check_refund(order_id):
    return {"status": "approved", "processed_date": "Dec 3"}

def execute_tool(name, arguments):
    """Route tool calls to implementations."""
    if name == "search_orders":
        return search_orders(**arguments)
    elif name == "check_refund":
        return check_refund(**arguments)
    return {"error": f"Unknown tool: {name}"}

def run_agent(user_message: str, max_iterations: int = 5) -> str:
    """Run the agent loop."""
    messages = [
        {"role": "system", "content": "You are a helpful customer service agent."},
        {"role": "user", "content": user_message},
    ]

    for i in range(max_iterations):
        # THINK: Model decides what to do
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
        )

        message = response.choices[0].message

        # Check if done (no tool calls)
        if not message.tool_calls:
            return message.content

        # ACT: Execute each tool call
        messages.append(message)

        for tool_call in message.tool_calls:
            name = tool_call.function.name
            arguments = json.loads(tool_call.function.arguments)

            # Execute tool
            result = execute_tool(name, arguments)

            # OBSERVE: Add result to context
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })

    return "Max iterations reached"

# Test
result = run_agent("What's the refund status for alice@example.com?")
print(result)

Key Takeaways

[Diagram: Key Takeaways]

Verify Your Understanding

Before proceeding:

Explain the difference between a pipeline and an agent to someone who hasn’t read this document. If you say “an agent uses an LLM,” that’s insufficient.

Given this task: “Summarize the top 3 news articles about AI today”

  • Could this be done with RAG?
  • When would this need an agent?
  • What tools would the agent need?

Your agent has these tools: [search_docs, query_database, send_email, calculate]. User asks: “What’s our revenue this quarter?” Which tool(s) should the agent use? What if query_database fails?

Identify the error in this statement: “I built an agent with 30 tools so it can handle any request.”


What’s Next

After this, you can:

  • Continue → Agents → Evaluation — measuring what matters in multi-step systems
  • Build → Production agent with proper guardrails

Go Deeper: Production Agents

This article covers the agent mental model. For production patterns (idempotency, checkpointing, HITL, cost control), see the Production Agents Deep Dive series:

| Part | Topic | What You'll Learn |
| --- | --- | --- |
| 0 | Overview | Why 98% of orgs haven't deployed agents at scale |
| 1 | Idempotency | Safe retries, the Stripe pattern |
| 2 | State & Memory | Checkpointing, memory systems |
| 3 | Human-in-the-Loop | Confidence routing, escalation |
| 4 | Cost Control | Token budgets, circuit breakers |
| 5 | Observability | Silent failure detection |
| 6 | Durable Execution | Temporal, Inngest, Restate |
| 7 | Security | Sandboxing, prompt injection |
| 8 | Testing | Golden datasets, evaluation |