
Production Agents Series

State Persistence & Agent Memory - The Complete Domain

Deep dive into agent memory systems: working memory, episodic memory, semantic memory, checkpointing patterns, context management, and long-running workflow persistence

Prerequisite: This is Part 2 of the Production Agents Deep Dive series. Start with Part 0: Overview for context.

Why This Matters

Your agent is 45 minutes into a complex research task. User closes their browser. Server restarts. All progress lost.

Or worse: agent crashes mid-booking. User refreshes. Agent starts over. Now there’s an orphaned booking in your system.

The Core Problem (from Anthropic, November 2025):

“The core challenge of long-running agents is that they must work in discrete sessions, and each new session begins with no memory of what came before.”

Imagine a software project staffed by engineers working in shifts, where each new engineer arrives with no memory of what happened on the previous shift. That’s what happens when context windows fill up or processes crash.

What Goes Wrong Without This:

[Diagram: state persistence failure patterns]

Why Context Windows Aren’t Enough

Even with 200k token windows (Claude) or 1M tokens (Gemini):

  • Complex tasks overflow: software development, research, and financial modeling need more context than any window holds
  • Token costs scale linearly: at, say, $3 per million input tokens, re-sending a 150k-token context on every turn costs roughly $0.45 per turn
  • Latency increases: larger contexts mean slower inference
  • Attention degrades: models perform worse on very long contexts

Most production tasks require work across many sessions.


The Three Challenges

| Challenge        | Problem                          | Solution                           |
| ---------------- | -------------------------------- | ---------------------------------- |
| Persistence      | State lost on crash/restart      | Checkpoint to durable storage      |
| Recovery         | Don't know which steps completed | Track progress explicitly          |
| Context Bridging | New session lacks context        | Progress files, structured handoff |

Agent Memory Systems: The Complete Picture

State management is really about memory. Understanding the different types of memory helps you design robust agents.

The Memory Taxonomy

[Diagram: agent memory systems]

Memory Type Comparison

| Memory Type | Persistence           | Scope        | Storage                 | Retrieval         |
| ----------- | --------------------- | ------------ | ----------------------- | ----------------- |
| Working     | None (context window) | Current turn | LLM context             | Automatic         |
| Episodic    | Session               | Current task | Checkpointer (Postgres) | By thread_id      |
| Semantic    | Permanent             | All tasks    | Vector DB               | Similarity search |
| Procedural  | Permanent             | All tasks    | Prompts / fine-tuning   | Always loaded     |

How They Map to Implementation

class AgentMemory:
    def __init__(self):
        # Working Memory: Current context window
        self.working_memory = []  # Just conversation turns

        # Episodic Memory: Checkpointed session state
        self.episodic = PostgresSaver.from_conn_string(DB_URL)

        # Semantic Memory: Long-term learned knowledge
        self.semantic = VectorDB(embedding_model="text-embedding-3-small")

        # Procedural Memory: Baked into the system prompt
        self.procedural = load_system_prompt("agent_instructions.md")

    def process_turn(self, user_input, thread_id):
        # 1. Load episodic memory (session state)
        session_state = self.episodic.load(thread_id) or {}  # new threads start empty

        # 2. Query semantic memory (relevant long-term knowledge)
        relevant_knowledge = self.semantic.search(user_input, k=3)

        # 3. Build working memory (context for this turn)
        self.working_memory = [
            {"role": "system", "content": self.procedural},
            *session_state.get("conversation_history", []),
            # Inject retrieved knowledge as an extra system message
            # (chat APIs have no "context" role)
            {"role": "system", "content": format_knowledge(relevant_knowledge)},
            {"role": "user", "content": user_input}
        ]

        # 4. Get response
        response = llm.chat(self.working_memory)

        # 5. Update episodic memory
        session_state["conversation_history"].append(
            {"role": "user", "content": user_input}
        )
        session_state["conversation_history"].append(
            {"role": "assistant", "content": response}
        )
        self.episodic.save(thread_id, session_state)

        # 6. Optionally update semantic memory with learned insights
        if self.should_memorize(response):
            self.semantic.insert(extract_insight(response))

        return response

The Context Management Problem

The core tradeoff: More context = better understanding, but also:

  • Higher token costs
  • Increased latency
  • Attention degradation on very long contexts

The solution hierarchy:

[Diagram: context management strategies]
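
One concrete way to read that hierarchy: spend a fixed token budget in priority order, so higher-value tiers (system prompt, recent turns) are never evicted before lower-value ones (retrieved memories, summaries of old turns). A minimal sketch, assuming messages are role/content dicts; the budget figure and the chars/4 token estimate are illustrative stand-ins:

def build_context(system_prompt, recent_turns, retrieved, summaries,
                  budget=8_000):
    """Fill a token budget in priority order; cut lower tiers first."""
    def count_tokens(msg):
        return len(msg["content"]) // 4  # rough estimate; use a real tokenizer

    context, used = [], 0
    tiers = [
        [{"role": "system", "content": system_prompt}],  # always included first
        recent_turns,   # working memory: keep verbatim
        retrieved,      # semantic memory: only what's relevant now
        summaries,      # compacted episodic history
    ]
    for tier in tiers:
        for msg in tier:
            cost = count_tokens(msg)
            if used + cost > budget:
                return context  # budget exhausted; drop remaining tiers
            context.append(msg)
            used += cost
    return context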

Memory Flow Diagram

[Diagram: memory orchestration flow]

Common Memory Anti-patterns

| Anti-pattern          | Problem                                 | Fix                                                       |
| --------------------- | --------------------------------------- | --------------------------------------------------------- |
| Everything in context | Token explosion, attention degradation  | Use semantic memory for stable knowledge                  |
| No session continuity | Agent forgets mid-conversation          | Checkpoint episodic memory                                |
| Context as database   | Slow, expensive, fragile                | Store data externally, retrieve what's needed             |
| No memory pruning     | Unbounded growth                        | TTL on episodic, compaction on working (see sketch below) |
| Ignoring procedural   | Agent reinvents wheels                  | Bake patterns into system prompt                          |
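
The pruning fix from the table can be as simple as a scheduled delete. A minimal sketch against a SQLite checkpoint store, assuming a checkpoints table with a created_at epoch column (both names are illustrative, not a real checkpointer schema):

import sqlite3
import time

TTL_SECONDS = 7 * 24 * 3600  # assumption: episodic state expires after a week

def prune_episodic(db_path="checkpoints.db"):
    """Delete checkpoints past their TTL; run from a cron job or scheduler."""
    cutoff = time.time() - TTL_SECONDS
    with sqlite3.connect(db_path) as conn:
        conn.execute("DELETE FROM checkpoints WHERE created_at < ?", (cutoff,))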

Solution 1: LangGraph Checkpointers

LangGraph is one of the most widely used frameworks for agent state management. Here's how to use it in production.

Basic Setup

from langgraph.graph import StateGraph
from langgraph.checkpoint.postgres import PostgresSaver

# Production: PostgreSQL for durability
# (recent langgraph versions expose from_conn_string as a context
#  manager, used as `with PostgresSaver.from_conn_string(...) as cp:`,
#  and need a one-time cp.setup() to create the checkpoint tables)
checkpointer = PostgresSaver.from_conn_string(
    "postgresql://user:pass@host:5432/db"
)

# Create graph with checkpointing
graph = StateGraph(AgentState)
graph.add_node("think", think_node)
graph.add_node("act", act_node)
# ... add edges ...

app = graph.compile(checkpointer=checkpointer)

# Execute with thread_id for persistence
config = {"configurable": {"thread_id": "user-123-task-456"}}
result = app.invoke({"input": "Book flight to NYC"}, config)

# Later: resume from checkpoint
# Same thread_id = same state
result = app.invoke({"input": "Make it morning flight"}, config)

What StateSnapshot Captures

# Every checkpoint stores:
{
    "channel_values": {...},     # Current state data
    "next_nodes": ["act"],       # What to execute next
    "config": {...},             # Configuration
    "metadata": {
        "writes": {...},         # Recent modifications
        "step": 5                # Progress counter
    },
    "pending_tasks": [...]       # Incomplete work
}
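
To inspect these snapshots at runtime, a compiled LangGraph app exposes get_state and get_state_history; the actual StateSnapshot attributes are values, next, metadata, and tasks. A quick sketch, reusing app and config from the setup above:

# Latest checkpoint for this thread
snapshot = app.get_state(config)
print(snapshot.values)    # current state data
print(snapshot.next)      # nodes queued to execute next

# Walk back through earlier checkpoints (most recent first)
for past in app.get_state_history(config):
    print((past.metadata or {}).get("step"), past.next)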

Storage Options

| Storage       | Use Case    | Tradeoffs                        |
| ------------- | ----------- | -------------------------------- |
| MemorySaver   | Development | Fast, lost on restart            |
| SqliteSaver   | Single-node | Local persistence, limited scale |
| PostgresSaver | Production  | Multi-node, ACID guarantees      |
| S3            | Archival    | Long-term storage, slower access |

Production rule: Always use PostgresSaver (or equivalent) in production. MemorySaver is for local development only.
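
For reference, wiring up the development-tier savers looks like this (a sketch; SqliteSaver ships separately as langgraph-checkpoint-sqlite, and recent versions expose from_conn_string as a context manager):

from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver

# Development: in-memory, gone on restart
app = graph.compile(checkpointer=MemorySaver())

# Single-node: persists to a local file
with SqliteSaver.from_conn_string("checkpoints.db") as checkpointer:
    app = graph.compile(checkpointer=checkpointer)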


Solution 2: Checkpoint Timing

This is where most teams get it wrong. The timing of checkpoints matters.

Wrong: Checkpoint After Execution

# WRONG: If crash happens between execute and checkpoint,
# you don't know if step ran
def execute_step(self, step):
    result = step.run()           # Execute
    self.state['completed'].append(step.id)
    self.checkpoint()             # Save state
    # ^ If crash happens before checkpoint, step ran but state doesn't show it
    return result

Right: Checkpoint Before AND After

def execute_step(self, step):
    # BEFORE: Mark intent (crash here = know step was attempted)
    self.state['in_progress'] = step.id
    self.checkpoint()

    # Execute
    result = step.run()

    # AFTER: Mark completion
    self.state['completed'].append(step.id)
    del self.state['in_progress']
    self.state['last_result'] = result
    self.checkpoint()

    return result

Resume Logic

def resume(self):
    state = self.load_checkpoint()

    if 'in_progress' in state:
        # Crashed during execution
        step_id = state['in_progress']

        # Check if step actually completed (idempotent read)
        if self.check_step_completed_externally(step_id):
            # Step ran, just didn't checkpoint
            state['completed'].append(step_id)
            del state['in_progress']
            self.checkpoint()
        else:
            # Step didn't complete — re-execute with idempotency key
            step = self.get_step(step_id)
            self.execute_step(step)

    # Continue from last known good state
    return state

Why this works: If you crash between the two checkpoints, the in_progress marker tells you exactly what was happening. You can check if it completed and act accordingly.
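
The resume path above leans on side effects being safe to retry. A minimal sketch of the idempotency-key idea for a side-effecting step (the booking endpoint and payload are illustrative assumptions; the Idempotency-Key header is a common API convention, e.g. Stripe's):

import uuid
import requests

def run_booking_step(self, step):
    # Mint the key once per step and checkpoint it BEFORE the call,
    # so a retry after a crash reuses the same key
    key = self.state.setdefault("idempotency_keys", {}).setdefault(
        step.id, str(uuid.uuid4())
    )
    self.checkpoint()

    # A retried POST with the same key cannot create a second booking
    return requests.post(
        "https://api.example.com/bookings",              # illustrative endpoint
        json={"flight": self.state["selected_flight"]},
        headers={"Idempotency-Key": key},
        timeout=30,
    )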


Solution 3: Progress Tracking Files (Anthropic Pattern)

For multi-session tasks, explicit progress files bridge context gaps.

The Two-Agent Pattern

# Initializer Agent (first run only)
def initialize_project(task):
    # Set up environment
    setup_environment()

    # Create progress file
    progress = {
        "goal": task.description,
        "completed_steps": [],
        "blockers": [],
        "next_action": "Analyze requirements",
        "context": {"files": [], "apis": []}
    }

    write_file("claude-progress.txt", format_progress(progress))
    git_commit("Initial project setup")

# Coding Agent (every session)
def continue_work():
    # Read progress from last session
    progress = read_file("claude-progress.txt")

    # Make incremental progress
    result = work_on_next_action(progress)

    # Update progress for next session
    progress["completed_steps"].append(result.action)
    progress["next_action"] = result.next_step

    write_file("claude-progress.txt", format_progress(progress))
    git_commit(f"Completed: {result.action}")

Progress File Structure

# Progress: Book Flight to NYC

## Current Goal

Book morning flight to NYC for tomorrow

## Completed Steps

1. [x] Parsed user intent: destination=NYC, date=tomorrow
2. [x] Inferred departure: SFO (from calendar)
3. [x] Searched flights: 47 options found
4. [x] User clarified: wants LaGuardia, not JFK
5. [x] Filtered to LGA: 18 options

## Current Blocker

8am flight sold out while user was deciding

## Next Action

Present 9am alternative ($12 more)

## Context

- User prefers aisle seats
- Corporate travel policy: max $500
- Departure: SFO
- Arrival: LGA

Why this works: New session reads progress file first. Immediate context on what’s done, what’s blocked, what’s next. No wasted tokens re-discovering state.
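
Reading the file back is equally simple. A minimal sketch that splits a claude-progress.txt-style file on its "## " headings (the parser is illustrative, not part of Anthropic's pattern):

def parse_progress(text):
    """Return {section name: body} for each '## ' section in the file."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}

progress = parse_progress(read_file("claude-progress.txt"))
next_action = progress.get("Next Action", "")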


Solution 4: Hybrid Memory

For sophisticated agents, combine short-term checkpoints with long-term vector memory.

class HybridMemory:
    def __init__(self, checkpointer, vector_db):
        self.checkpointer = checkpointer  # Short-term
        self.vector_db = vector_db        # Long-term

    def save_session_state(self, thread_id, state):
        """Short-term: current conversation, active task"""
        self.checkpointer.save(thread_id, state)

    def save_insight(self, insight):
        """Long-term: learned patterns, preferences"""
        embedding = embed(insight)
        self.vector_db.insert(embedding, insight)

    def recall_relevant(self, query, k=5):
        """Retrieve relevant long-term memories"""
        return self.vector_db.search(embed(query), k=k)

    def load_context(self, thread_id, current_input):
        """Combine short-term state + relevant long-term memories"""
        state = self.checkpointer.load(thread_id)
        memories = self.recall_relevant(current_input)
        return {**state, "relevant_memories": memories}

When to Use Each

| Memory Type               | Use For                                               | Don't Use For                  |
| ------------------------- | ----------------------------------------------------- | ------------------------------ |
| Short-term (Checkpointer) | Current conversation, active task state               | Preferences learned months ago |
| Long-term (Vector DB)     | User preferences, learned patterns, domain knowledge  | Ephemeral conversation turns   |

Key insight: Query long-term memory as a tool (retrieve when needed), don’t jam everything into context.
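
One way to realize "memory as a tool" is to expose the vector search as a function the model may call. A sketch using a generic tool schema; the names and the .text attribute on results are assumptions:

RECALL_TOOL = {
    "name": "recall_memory",
    "description": "Search long-term memory for preferences or past decisions",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def handle_tool_call(memory, call):
    """Dispatch the model's recall requests to HybridMemory."""
    if call.name == "recall_memory":
        hits = memory.recall_relevant(call.input["query"], k=5)
        return "\n".join(h.text for h in hits)  # assumes results carry .text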


Observation Masking

For software engineering agents, most of the tokens in a turn are observations (test output, file contents). This explodes context fast.

def compact_history(history):
    compacted = []
    for turn in history:
        if turn.type == "observation":
            # Compress verbose output
            compacted.append({
                "type": "observation_summary",
                "content": summarize(turn.content, max_tokens=100)
            })
        else:
            # Keep action/reasoning in full
            compacted.append(turn)
    return compacted

# Before: 50k tokens of test output
# After: 100 token summary of test results

Result: Targets the token-heavy part while preserving decision history.


Common Gotchas

| Gotcha                         | Symptom                         | Fix                                                            |
| ------------------------------ | ------------------------------- | -------------------------------------------------------------- |
| Checkpoint too large           | Save/load becomes bottleneck    | Prune old observations, limit history depth                    |
| Checkpoint corruption          | State lost or inconsistent      | Atomic writes, versioning, backup checkpoints                  |
| Session resume confusion       | Agent repeats completed tasks   | Explicit progress files, structured state schema               |
| No checkpoint before execution | Can't tell if step ran on crash | Checkpoint intent BEFORE execution                             |
| No atomic writes               | Partial checkpoint on crash     | Database transactions, write-ahead logging (see sketch below)  |
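
For file-based checkpoints, the atomic-write fix is a write-then-rename; databases give you the same guarantee through transactions. A minimal sketch:

import json
import os
import tempfile

def atomic_checkpoint(state, path="checkpoint.json"):
    """Write to a temp file, fsync, then rename. On POSIX the rename is
    atomic, so a crash leaves either the old or the new checkpoint,
    never a half-written one."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic swap
    except BaseException:
        os.unlink(tmp)
        raise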

Multi-Agent State (Still Fragile)

2025 Reality Check (from research):

“Multi-agent systems are not yet capable of engaging in long-context, proactive discourse with significantly more reliability than a single agent.”

Why Multi-Agent State Is Hard:

  • Context fragmentation across agents
  • Synchronization overhead
  • Network latency disrupts state updates
  • Error compounding from fragmented information

Claude Code’s Solution: Single-threaded subtasking

  • Spawns subtasks but never runs parallel work
  • Main agent retains comprehensive context
  • Prevents error compounding from fragmented state
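
A minimal sketch of that single-threaded shape (not Claude Code's actual implementation; run_subagent and summarize are illustrative stand-ins):

def run_task(main_context, subtasks):
    """Run subtasks one at a time; the main agent keeps the full picture."""
    for sub in subtasks:
        # Each subtask gets a focused brief, not the whole context
        result = run_subagent(goal=sub, brief=summarize(main_context))
        # Results flow back into one authoritative context: no parallel
        # branches to reconcile, no fragmented state to compound errors
        main_context.append({"subtask": sub, "result": result})
    return main_context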

Recommendation: Start with single-agent, add multi-agent only when necessary.


The Checkpointing Checklist

Before deploying an agent with persistent state:

[Checklist: checkpointing deployment checklist]

Key Takeaways

  1. Context windows aren’t enough. Complex tasks require state that survives sessions.

  2. Checkpoint timing matters. Checkpoint BEFORE execution to know what was attempted. Checkpoint AFTER to know what succeeded.

  3. Progress files bridge sessions. New session reads progress first. No wasted tokens rediscovering state.

  4. Hybrid memory separates concerns. Short-term state in checkpointer. Long-term knowledge in vector DB.

  5. Multi-agent state is fragile. Start single-agent. Add complexity only when necessary.


Next Steps

State persists. But what happens when the agent needs human judgment?

Part 3: Human-in-the-Loop Patterns
