
AI Engineering Series

RAG to Agents - From Retrieval to Action

Deep dive into AI agents: the agent loop, tools, ReAct pattern, memory systems, when agents are wrong, and agent failure modes you'll encounter in production

Why This Matters

RAG answers questions. Agents solve problems.

When a user asks “What’s the status of order #12345?”, RAG retrieves a document. But what if answering requires:

  • Querying an order database
  • Checking shipping status from an API
  • Calculating estimated delivery based on location
  • Composing a response with all that information

RAG can’t do this. RAG retrieves static documents. Agents take actions.

If you try to build multi-step systems with RAG patterns, you’ll create brittle pipelines that break on variation. Understanding the agent mental model lets you build flexible systems that adapt.

What Goes Wrong Without This:

[Diagram: Agent Failure Patterns]

Pipelines vs Agents

There are two ways to build multi-step AI systems:

[Diagram: Pipelines vs Agents]

The key question: Who decides the next step—your code or the model?

  • Pipeline: You enumerate all paths. Reliable for known scenarios. Fails on novel scenarios.
  • Agent: Model reasons about what to do. Handles variation. Can make mistakes.

Neither is better. They solve different problems.
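The "who decides" distinction can be made concrete with a small sketch (hypothetical helper functions, not the article's code): the same order-status task written once as a pipeline, where your code fixes every step, and once as an agent, where a model chooses each step.

```python
# Illustrative sketch (hypothetical helpers): the same task as a
# pipeline versus an agent.

def pipeline(order_id: str) -> str:
    # Pipeline: YOUR code fixes the steps and their order.
    order = {"id": order_id, "status": "shipped"}   # stand-in for a DB query
    shipping = {"eta_days": 2}                      # stand-in for an API call
    return (f"Order {order['id']} is {order['status']}, "
            f"arriving in {shipping['eta_days']} days.")

def agent(goal: str, llm_choose, tools: dict) -> str:
    # Agent: the MODEL picks the next step on every iteration.
    context = [goal]
    for _ in range(5):                              # always cap iterations
        name, args = llm_choose(context, tools)     # the model decides here
        if name == "finish":
            return args["answer"]
        context.append(tools[name](**args))         # observe the tool result
    return "Max iterations reached"
```

Note that adding a new scenario to the pipeline means writing a new branch; the agent handles it only if the model can reason its way there with the available tools.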


The Agent Loop

An agent is a loop. The LLM decides what to do, executes it, observes the result, and decides again.

[Diagram: The Agent Loop]

Each iteration:

  1. Observe: What information do I have? What just happened?
  2. Think: Given my goal and current state, what’s the best next action?
  3. Act: Execute the chosen action
  4. Evaluate: Am I done? If not, loop.

The magic: the model decides the action at step 2. This is what makes it an agent, not a pipeline.
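The four steps above can be sketched as a skeleton, assuming hypothetical `think`, `act`, and `done` callables (in a real system, `think` wraps the LLM call):

```python
def agent_loop(goal, think, act, done):
    """Skeleton of the observe-think-act-evaluate cycle."""
    state = {"goal": goal, "history": []}
    while not done(state):                                         # 4. Evaluate
        last = state["history"][-1] if state["history"] else goal  # 1. Observe
        action = think(state, last)                                # 2. Think
        state["history"].append(act(action))                       # 3. Act
    return state["history"]
```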


Tools: The Agent’s Capabilities

An agent without tools is just a chatbot. Tools are functions the agent can call.

[Diagram: Tools Give Agents Capabilities]

Common tool categories:

| Category | Examples | What it enables |
| --- | --- | --- |
| Data retrieval | search_docs, query_database | Access information |
| External APIs | get_weather, check_inventory | Real-time data |
| Actions | send_email, create_ticket | Side effects |
| Computation | calculate, run_code | Complex logic |
| User interaction | ask_user, show_options | Clarification |

Tool descriptions are prompts. Good descriptions = agent uses tools correctly. Bad descriptions = agent guesses wrong.
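Here is a hypothetical before/after showing the function portion of a tool definition (the same shape used in the code example later in this article). The difference is whether the model can tell when to use the tool, what each argument means, and what comes back:

```python
# A vague description: the agent has to guess when and how to use this.
bad = {
    "name": "lookup",
    "description": "Looks things up",
    "parameters": {"type": "object", "properties": {"q": {"type": "string"}}},
}

# A description that works as a prompt: when to use it, what the
# arguments mean, and what the tool returns.
good = {
    "name": "search_orders",
    "description": (
        "Search the order database. Use when the user asks about an order's "
        "status, contents, or history. Provide either an email or an order ID. "
        "Returns a list of matching orders with id, status, and amount."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Customer email, e.g. alice@example.com"},
            "order_id": {"type": "string", "description": "Order ID, e.g. '456'"},
        },
    },
}
```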

RAG as a Tool

Here’s the insight: RAG doesn’t get replaced by agents—it becomes a tool.

[Diagram: RAG as Agent Tool]

RAG is retrieval. Agents can use retrieval as one capability among many.
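In practice this means wrapping your existing retriever as one tool definition among the others. A minimal sketch, with a toy keyword scorer standing in for the real embedding search:

```python
def search_docs(query: str, top_k: int = 3) -> list[str]:
    """Retrieve the top_k most relevant doc chunks.
    Toy relevance (keyword overlap) stands in for a vector search."""
    corpus = {
        "refund policy": "Refunds are processed within 5 business days.",
        "shipping policy": "Standard shipping takes 3-5 business days.",
    }
    scored = sorted(
        corpus.items(),
        key=lambda kv: -sum(w in kv[0] for w in query.lower().split()),
    )
    return [text for _, text in scored[:top_k]]

# The retriever exposed as a tool, in the same shape as the other tools.
rag_tool = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documentation for policy and FAQ answers.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search query"}},
            "required": ["query"],
        },
    },
}
```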


The ReAct Pattern

The most common agent architecture is ReAct (Reason + Act). The model explicitly reasons before acting.

[Diagram: ReAct Example]

The THOUGHT step makes the agent’s reasoning visible. This helps with:

  • Debugging (you can see why it chose an action)
  • Guidance (you can provide examples of good reasoning)
  • Error recovery (model realizes when it’s stuck)
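One practical consequence of the explicit THOUGHT/ACTION format is that the runtime can parse the ACTION line to know which tool to call. A sketch, using a hypothetical model output (in production this text comes from the LLM):

```python
import re

# Hypothetical ReAct-style step emitted by the model.
step = (
    "THOUGHT: The user wants a refund status, so I should find the order first.\n"
    'ACTION: search_orders(email="alice@example.com")'
)

# The runtime parses the ACTION line to dispatch the tool call.
m = re.search(r'ACTION:\s*(\w+)\((.*)\)', step)
tool_name, raw_args = m.group(1), m.group(2)
print(tool_name)  # the tool to invoke
```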

Agent Memory

Agents without memory forget everything between turns. Production agents need memory.

[Diagram: Memory Types]

Without memory, agents can’t handle multi-turn conversations, learn user preferences, or maintain context across sessions.
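One common design (assumed here, not prescribed by this article) splits memory into a bounded short-term buffer of recent turns and a long-term key-value store of durable facts, with both injected into the model's context:

```python
from collections import deque

class AgentMemory:
    """Minimal sketch: short-term turn buffer plus long-term facts."""

    def __init__(self, short_term_limit: int = 10):
        self.short_term = deque(maxlen=short_term_limit)  # recent turns only
        self.long_term = {}                               # persists across sessions

    def remember_turn(self, role: str, content: str):
        self.short_term.append({"role": role, "content": content})

    def remember_fact(self, key: str, value: str):
        self.long_term[key] = value  # e.g. "preferred_language"

    def context(self) -> list:
        # Surface long-term facts as a system message ahead of recent turns.
        facts = "; ".join(f"{k}={v}" for k, v in self.long_term.items())
        prefix = [{"role": "system", "content": f"Known user facts: {facts}"}] if facts else []
        return prefix + list(self.short_term)
```

The `maxlen` bound is what keeps the context window from growing forever; anything worth keeping past that bound has to be promoted to the long-term store.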


When Agents Are Wrong

Agents are not always the answer. Sometimes they’re the problem.

[Diagram: When to Use What]

The “agent for everything” anti-pattern:

[Diagram: Over-Engineering Anti-Pattern]

Agents add:

  • Latency: Multiple LLM calls per request
  • Cost: Each thought/action cycle costs tokens
  • Non-determinism: Same input can produce different paths
  • New failure modes: Wrong tool selection, hallucinated arguments, infinite loops

Don’t use an agent when a simpler approach works.
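The latency and cost points are easy to quantify with a back-of-envelope sketch. All numbers below are illustrative assumptions, not real prices; the point is that each iteration re-sends the growing context, so cost per iteration climbs:

```python
# Back-of-envelope agent-loop cost (all figures are made-up assumptions).
price_per_1k_input = 0.00015   # assumed $/1K input tokens
price_per_1k_output = 0.0006   # assumed $/1K output tokens
base_context = 1_500           # system prompt + tool schemas + user message
tokens_per_observation = 300   # assumed tool-result size
output_per_iteration = 150     # assumed thought + tool call

cost = 0.0
context = base_context
for _ in range(4):             # a 4-iteration run
    cost += context / 1000 * price_per_1k_input
    cost += output_per_iteration / 1000 * price_per_1k_output
    context += output_per_iteration + tokens_per_observation  # context grows

print(f"~${cost:.4f} per request")
```

A single-call pipeline pays the input cost once; the loop pays it every iteration, on an ever-larger context.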


Agent Failure Modes

Agents introduce new ways to fail:

[Diagram: Agent-Specific Failures]
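Two of these failure modes, hallucinated tool names and hallucinated arguments, can be caught before execution by validating each call against the tool schemas. A sketch with assumed schemas (the names match this article's example tools):

```python
# Assumed per-tool argument schemas, derived from the tool definitions.
TOOL_SCHEMAS = {
    "check_refund": {"required": ["order_id"], "allowed": {"order_id"}},
    "search_orders": {"required": [], "allowed": {"email", "order_id"}},
}

def validate_call(name, arguments):
    """Return an error string to feed back to the model, or None if valid."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return f"Unknown tool: {name}"                   # hallucinated tool
    missing = [a for a in schema["required"] if a not in arguments]
    if missing:
        return f"Missing required arguments: {missing}"  # incomplete call
    extra = set(arguments) - schema["allowed"]
    if extra:
        return f"Unexpected arguments: {sorted(extra)}"  # hallucinated args
    return None
```

Returning the error string as the tool result, instead of raising, gives the model a chance to self-correct on the next iteration.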

Code Example

Minimal agent loop demonstrating the observe-think-act cycle:

from openai import OpenAI
import json

client = OpenAI()

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_orders",
            "description": "Search for orders by user email or order ID",
            "parameters": {
                "type": "object",
                "properties": {
                    "email": {"type": "string", "description": "User email"},
                    "order_id": {"type": "string", "description": "Order ID"},
                },
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_refund",
            "description": "Check refund status for an order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "Order ID"},
                },
                "required": ["order_id"],
            },
        },
    },
]

# Mock tool implementations
def search_orders(email=None, order_id=None):
    return [{"order_id": "456", "status": "refund_requested", "amount": 99}]

def check_refund(order_id):
    return {"status": "approved", "processed_date": "Dec 3"}

def execute_tool(name, arguments):
    """Route tool calls to implementations."""
    if name == "search_orders":
        return search_orders(**arguments)
    elif name == "check_refund":
        return check_refund(**arguments)
    return {"error": f"Unknown tool: {name}"}

def run_agent(user_message: str, max_iterations: int = 5) -> str:
    """Run the agent loop."""
    messages = [
        {"role": "system", "content": "You are a helpful customer service agent."},
        {"role": "user", "content": user_message},
    ]

    for i in range(max_iterations):
        # THINK: Model decides what to do
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
        )

        message = response.choices[0].message

        # Check if done (no tool calls)
        if not message.tool_calls:
            return message.content

        # ACT: Execute each tool call
        messages.append(message)

        for tool_call in message.tool_calls:
            name = tool_call.function.name
            arguments = json.loads(tool_call.function.arguments)

            # Execute tool
            result = execute_tool(name, arguments)

            # OBSERVE: Add result to context
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })

    return "Max iterations reached"

# Test
result = run_agent("What's the refund status for alice@example.com?")
print(result)

Key Takeaways

[Diagram: Key Takeaways]

Verify Your Understanding

Before proceeding:

Explain the difference between a pipeline and an agent to someone who hasn’t read this document. If you say “an agent uses an LLM,” that’s insufficient.

Given this task: “Summarize the top 3 news articles about AI today”

  • Could this be done with RAG?
  • When would this need an agent?
  • What tools would the agent need?

Your agent has these tools: [search_docs, query_database, send_email, calculate]. User asks: “What’s our revenue this quarter?” Which tool(s) should the agent use? What if query_database fails?

Identify the error in this statement: “I built an agent with 30 tools so it can handle any request.”


What’s Next

After this, you can:

  • Continue → Agents → Evaluation — measuring what matters in multi-step systems
  • Build → Production agent with proper guardrails

Go Deeper: Production Agents

This article covers the agent mental model. For production patterns (idempotency, checkpointing, HITL, cost control), see the Production Agents Deep Dive series:

| Part | Topic | What You'll Learn |
| --- | --- | --- |
| 0 | Overview | Why 98% of orgs haven't deployed agents at scale |
| 1 | Idempotency | Safe retries, the Stripe pattern |
| 2 | State & Memory | Checkpointing, memory systems |
| 3 | Human-in-the-Loop | Confidence routing, escalation |
| 4 | Cost Control | Token budgets, circuit breakers |
| 5 | Observability | Silent failure detection |
| 6 | Durable Execution | Temporal, Inngest, Restate |
| 7 | Security | Sandboxing, prompt injection |
| 8 | Testing | Golden datasets, evaluation |