Memory Systems for AI

TL;DR

AI memory enables personalization and context retention across conversations. Session memory is raw message history, short-term is compressed summaries, long-term stores facts about entities, and episodic tracks timestamped events. Production systems need all four types.

Visual Overview

Without Memory

┌───────────────────────────────────────────────────────────┐
│                                                           │
│   User (Monday): "I am allergic to shellfish"             │
│   Agent: "Got it, I will note that."                      │
│                                                           │
│   User (Tuesday): "What should I order?"                  │
│   Agent: "The shrimp scampi is excellent!"  ← FAILURE     │
│                                                           │
└───────────────────────────────────────────────────────────┘

Business cost of no memory:

Support: 40% of tickets are repeat issues (wasted agent time)
Sales: Lost context = lost deals ($50K avg deal, 15% close rate drop)
Product: Users churn when AI “forgets” them (12% higher churn)

Memory is not a feature. Memory is table stakes.

Memory Types

┌─────────────┬─────────────────────────┬─────────────────┐
│ TYPE        │ DEFINITION              │ BOUNDARY        │
├─────────────┼─────────────────────────┼─────────────────┤
│ SESSION     │ Raw message history     │ Clears on:      │
│ (Buffer)    │ in current conversation │ • Session end   │
│             │                         │ • Context limit │
│             │ Storage: Context window │                 │
├─────────────┼─────────────────────────┼─────────────────┤
│ SHORT-TERM  │ Compressed/summarized   │ Triggers when:  │
│ (Working)   │ version of session      │ • Session > 4K  │
│             │                         │ • Explicit sum  │
│             │ Storage: Prompt or cache│ Clears: Sess end│
├─────────────┼─────────────────────────┼─────────────────┤
│ LONG-TERM   │ FACTS about entities    │ Persists until: │
│ (Semantic)  │ (user, org, domain)     │ • Explicit upd  │
│             │                         │ • Contradiction │
│             │ Storage: Vector DB      │ • TTL expiry    │
│             │                         │                 │
│             │ Examples:               │                 │
│             │ • "User prefers email"  │                 │
│             │ • "Company has 50 emps" │                 │
│             │ • "Budget is $100K"     │                 │
├─────────────┼─────────────────────────┼─────────────────┤
│ EPISODIC    │ EVENTS that happened    │ Persists until: │
│ (Temporal)  │ (timestamped, queryable)│ • Retention pol │
│             │                         │ • User deletion │
│             │ Storage: Vector DB +    │                 │
│             │ timestamp index         │                 │
│             │                         │                 │
│             │ Examples:               │                 │
│             │ • "On 3/15, user asked  │                 │
│             │    about refund"        │                 │
│             │ • "Last week, discussed │                 │
│             │    pricing concerns"    │                 │
└─────────────────────────────────────────────────────────┘

The Critical Distinction

Long-term vs Episodic

┌───────────────────────────────────────────────────────────┐
│                                                           │
│   LONG-TERM: "User is allergic to shellfish"  ← FACT      │
│              (no timestamp needed, always true)           │
│                                                           │
│   EPISODIC:  "On 3/15, user mentioned shellfish allergy"  │
│              ← EVENT (when it happened matters)           │
│                                                           │
│   TEST: Can you answer "WHEN did you learn this?"         │
│         YES → Episodic │ NO → Long-term                   │
│                                                           │
└───────────────────────────────────────────────────────────┘

Memory Operations

WRITE — When & What Gets Stored

Trigger	What to Store	Memory Type
User states fact	Extracted fact	Long-term
User states preference	Preference + confidence	Long-term
Conversation ends	Summary of key points	Short-term
Significant event	Event + timestamp	Episodic
Entity mentioned	Entity attributes	Long-term

Extraction prompt example:

Extraction Prompt

From this conversation, extract:
1. Facts about the user (preferences, constraints, attributes)
2. Significant events (decisions made, problems discussed)
3. Action items or commitments

Format: {type: "fact"|"event", content: "...", confidence: 0-1}

READ — Retrieval Strategies

Strategy	How	When to Use
Recency	Last N memories	Continuation context
Relevance	Semantic similarity search	Topic-specific recall
Temporal	”Last week”, “In March”	Time-referenced query
Entity	All facts about X	Entity-focused task
Hybrid	Relevance + Recency boost	General retrieval

Retrieval prompt injection:

Retrieval Prompt Injection

Context from memory:
• User preference: Prefers email communication (confidence: 0.9)
• Recent event: Discussed billing issue on 3/15 (resolved)
• Fact: Company size is 50 employees

[Rest of prompt...]

FORGET — Critical for Production

Mechanism	Trigger	Implementation
Explicit delete	User requests “forget X”	Hard delete + audit
Contradiction	New fact contradicts old	Update, keep history
Decay	Memory not accessed in N	Reduce retrieval weight
Consolidation	Many similar memories	Merge into summary
TTL	Retention policy expiry	Hard delete
GDPR request	”Right to be forgotten”	Full user purge

Memory Conflicts

THE PROBLEM
┌───────────────────────────────────────────────────────────┐
│                                                           │
│   January: User says "I hate spicy food"                  │
│   March:   User says "I love spicy food"                  │
│                                                           │
│   Now what?                                               │
│                                                           │
└───────────────────────────────────────────────────────────┘

RESOLUTION STRATEGIES
┌───────────────────────────────────────────────────────────┐
│                                                           │
│ 1. LAST WRITE WINS                                        │
│ Simple: most recent fact replaces old                     │
│ Risk: Loses nuance ("I am on a diet this month")          │
│                                                           │
│ 2. KEEP BOTH WITH TIMESTAMPS                              │
│ Store both, retrieve most recent by default               │
│ Allows: "You mentioned hating spicy food in Jan..."       │
│                                                           │
│ 3. ASK FOR CLARIFICATION                                  │
│ "I see you mentioned both. Which is current?"             │
│ Best UX but interrupts flow                               │
│                                                           │
│ 4. CONFIDENCE DECAY                                       │
│ Older facts have lower confidence                         │
│ Retrieve based on recency-weighted confidence             │
│                                                           │
└───────────────────────────────────────────────────────────┘

Architecture Patterns

SIMPLE: SESSION ONLY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│   User → [Context Window] → LLM → Response                │
│                                                           │
│   Pros: No infrastructure, works today                    │
│   Cons: Forgets everything between sessions               │
│   Use for: Stateless Q&A, simple chatbots                 │
│                                                           │
└───────────────────────────────────────────────────────────┘

INTERMEDIATE: SUMMARIZATION
┌───────────────────────────────────────────────────────────┐
│                                                           │
│ User → [Recent messages + Summary of older] → LLM         │
│                                                           │
│ When context fills:                                       │
│ 1. Summarize older messages                               │
│ 2. Keep summary + recent N messages                       │
│                                                           │
│ Pros: Handles long conversations                          │
│ Cons: Loses detail, still per-session                     │
│ Use for: Long-form chat, customer support                 │
│                                                           │
└───────────────────────────────────────────────────────────┘

PRODUCTION: FULL MEMORY STACK
┌───────────────────────────────────────────────────────────┐
│                                                           │
│ ┌────────────────┐                                        │
│ User ──────────→│ Memory Layer │                          │
│ └───────┬────────┘                                        │
│ │                                                         │
│ ┌───────────────┼───────────────┐                         │
│ │ │ │                                                     │
│ ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐                 │
│ │ Session │ │ Long-term │ │ Episodic │                    │
│ │ Buffer │ │ Facts │ │ Events │                           │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘                 │
│ │ │ │                                                     │
│ └───────────────┼───────────────┘                         │
│ │                                                         │
│ ┌───────▼────────┐                                        │
│ │ Retrieval │                                             │
│ └───────┬────────┘                                        │
│ │                                                         │
│ ┌───────▼────────┐                                        │
│ │ LLM │                                                   │
│ └────────────────┘                                        │
│                                                           │
└───────────────────────────────────────────────────────────┘

Implementation Checklist

MINIMUM VIABLE MEMORY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│   [ ] Session persistence (Redis/DB, not just context)    │
│   [ ] Summary generation on session end                   │
│   [ ] User fact storage (vector DB or KV store)           │
│   [ ] Retrieval on conversation start                     │
│   [ ] Explicit forget mechanism                           │
│                                                           │
└───────────────────────────────────────────────────────────┘

PRODUCTION MEMORY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│ [ ] All of above, plus:                                   │
│ [ ] Episodic memory with timestamps                       │
│ [ ] Conflict resolution strategy                          │
│ [ ] Confidence scores on facts                            │
│ [ ] Decay/consolidation for old memories                  │
│ [ ] GDPR compliance (deletion, export)                    │
│ [ ] Memory debugging UI for support                       │
│ [ ] Metrics: retrieval latency, relevance scores          │
│                                                           │
└───────────────────────────────────────────────────────────┘

When This Matters

Situation	What to implement
Simple chatbot	Session buffer only
Customer support	+ Summaries + User facts
Sales assistant	+ Episodic (deal history matters)
Personal assistant	Full stack with long-term memory
Enterprise deployment	+ Compliance, audit, deletion
Multi-turn conversations	Session + summarization
Personalization	Long-term user preferences
”Remember when” queries	Episodic memory required