TL;DR
AI memory enables personalization and context retention across conversations. Session memory is raw message history, short-term is compressed summaries, long-term stores facts about entities, and episodic tracks timestamped events. Production systems need all four types.
Visual Overview
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  User (Monday): "I am allergic to shellfish"              │
│  Agent: "Got it, I will note that."                       │
│                                                           │
│  User (Tuesday): "What should I order?"                   │
│  Agent: "The shrimp scampi is excellent!"  ← FAILURE      │
│                                                           │
└───────────────────────────────────────────────────────────┘
Business cost of no memory:
- Support: 40% of tickets are repeat issues (wasted agent time)
- Sales: Lost context = lost deals ($50K avg deal, 15% close rate drop)
- Product: Users churn when AI “forgets” them (12% higher churn)
Memory is not a feature. Memory is table stakes.
Memory Types
| Type | Definition | Storage | Boundary | Examples |
|---|---|---|---|---|
| Session (Buffer) | Raw message history in current conversation | Context window | Clears on: session end, context limit | |
| Short-term (Working) | Compressed/summarized version of session | Prompt or cache | Triggers when: session > 4K, explicit summarization. Clears: session end | |
| Long-term (Semantic) | FACTS about entities (user, org, domain) | Vector DB | Persists until: explicit update, contradiction, TTL expiry | "User prefers email"; "Company has 50 employees"; "Budget is $100K" |
| Episodic (Temporal) | EVENTS that happened (timestamped, queryable) | Vector DB + timestamp index | Persists until: retention policy expiry, user deletion | "On 3/15, user asked about refund"; "Last week, discussed pricing concerns" |
The Critical Distinction
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  LONG-TERM: "User is allergic to shellfish"  ← FACT       │
│             (no timestamp needed, always true)            │
│                                                           │
│  EPISODIC:  "On 3/15, user mentioned shellfish allergy"   │
│             ← EVENT (when it happened matters)            │
│                                                           │
│  TEST: Can you answer "WHEN did you learn this?"          │
│        YES → Episodic  │  NO → Long-term                  │
│                                                           │
└───────────────────────────────────────────────────────────┘
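The fact/event distinction above can be sketched as two record types. This is a minimal illustration using Python dataclasses; the names `Fact`, `Episode`, and `make_memory` are hypothetical, and the year in the example date is an assumption (the text only gives 3/15).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union


@dataclass
class Fact:
    """Long-term memory: a statement treated as true until it is updated."""
    subject: str
    content: str
    confidence: float = 1.0


@dataclass
class Episode:
    """Episodic memory: something that happened at a known time."""
    subject: str
    content: str
    occurred_at: datetime


def make_memory(subject: str, content: str,
                occurred_at: Optional[datetime] = None) -> Union[Fact, Episode]:
    # Apply the TEST rule: if "when" matters (a timestamp is supplied),
    # store an episode; otherwise store a timeless fact.
    if occurred_at is not None:
        return Episode(subject, content, occurred_at)
    return Fact(subject, content)


allergy = make_memory("user", "is allergic to shellfish")
# The year 2024 is assumed purely for illustration.
mention = make_memory("user", "mentioned shellfish allergy",
                      occurred_at=datetime(2024, 3, 15))
```

Keeping the two types separate lets the same statement live in both stores: the fact answers "what is true", the episode answers "when did we learn it".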
Memory Operations
WRITE — When & What Gets Stored
| Trigger | What to Store | Memory Type |
|---|---|---|
| User states fact | Extracted fact | Long-term |
| User states preference | Preference + confidence | Long-term |
| Conversation ends | Summary of key points | Short-term |
| Significant event | Event + timestamp | Episodic |
| Entity mentioned | Entity attributes | Long-term |
Extraction prompt example:
From this conversation, extract:
1. Facts about the user (preferences, constraints, attributes)
2. Significant events (decisions made, problems discussed)
3. Action items or commitments
Format: {type: "fact"|"event", content: "...", confidence: 0-1}
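Model output should be validated before it is written to memory. The sketch below assumes the model is asked to return a strict JSON array of objects with the `type`, `content`, and `confidence` keys from the prompt; the strict-JSON requirement and the `parse_extraction` helper are assumptions, not part of the original prompt.

```python
import json
from typing import Dict, List

ALLOWED_TYPES = {"fact", "event"}


def parse_extraction(raw: str) -> List[Dict]:
    """Validate extracted memories; drop malformed items rather than guess."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        kind = item.get("type")
        content = item.get("content")
        confidence = item.get("confidence")
        if not isinstance(kind, str) or kind not in ALLOWED_TYPES:
            continue
        if not isinstance(content, str) or not content.strip():
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            continue
        if not 0.0 <= confidence <= 1.0:
            continue
        valid.append({"type": kind, "content": content.strip(),
                      "confidence": float(confidence)})
    return valid


sample = '''[
  {"type": "fact", "content": "Prefers email communication", "confidence": 0.9},
  {"type": "event", "content": "Discussed billing issue", "confidence": 0.8},
  {"type": "note", "content": "Unknown type", "confidence": 0.5},
  {"type": "fact", "content": "Out of range", "confidence": 1.7}
]'''
memories = parse_extraction(sample)  # keeps the first two items only
```

Dropping invalid items silently is one design choice; a production system might log them or retry the extraction instead.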
READ — Retrieval Strategies
| Strategy | How | When to Use |
|---|---|---|
| Recency | Last N memories | Continuation context |
| Relevance | Semantic similarity search | Topic-specific recall |
| Temporal | “Last week”, “In March” | Time-referenced query |
| Entity | All facts about X | Entity-focused task |
| Hybrid | Relevance + Recency boost | General retrieval |
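The hybrid strategy can be sketched as a weighted blend of semantic similarity and a recency score. Everything below is illustrative: the two-dimensional vectors stand in for real embeddings, and `recency_weight` and `half_life_days` are hypothetical tuning parameters, not recommended values.

```python
import math
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(similarity: float, age_days: float,
                 recency_weight: float = 0.3,
                 half_life_days: float = 30.0) -> float:
    # Recency is 1.0 for a brand-new memory and halves every half_life_days.
    recency = 0.5 ** (max(age_days, 0.0) / half_life_days)
    return (1.0 - recency_weight) * similarity + recency_weight * recency


def retrieve(query_vec: Sequence[float],
             memories: List[Tuple[str, Sequence[float], datetime]],
             now: datetime, k: int = 2) -> List[str]:
    scored = []
    for text, vec, created_at in memories:
        age_days = (now - created_at).total_seconds() / 86400
        scored.append((hybrid_score(cosine(query_vec, vec), age_days), text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


now = datetime(2024, 4, 1)  # fixed date so the example is reproducible
memories = [
    ("Discussed billing issue", [1.0, 0.0], now - timedelta(days=2)),
    ("Asked about billing a year ago", [1.0, 0.0], now - timedelta(days=365)),
    ("Prefers email communication", [0.0, 1.0], now - timedelta(days=1)),
]
top = retrieve([1.0, 0.0], memories, now, k=2)
```

With these weights, relevance dominates: the year-old billing memory still outranks the fresh but unrelated preference, while recency breaks the tie between the two billing memories.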
Retrieval prompt injection:
Context from memory:
• User preference: Prefers email communication (confidence: 0.9)
• Recent event: Discussed billing issue on 3/15 (resolved)
• Fact: Company size is 50 employees

[Rest of prompt...]
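A context block like the one above can be assembled from retrieved records with a small formatter. This is a sketch only; the record shape (`kind`, `content`, optional `confidence`) and the function names are assumptions made for illustration.

```python
from typing import Dict, List

# Display labels per memory kind; unknown kinds fall back to "Note".
LABELS = {"preference": "User preference", "event": "Recent event", "fact": "Fact"}


def render_memory_context(memories: List[Dict], max_items: int = 5) -> str:
    """Format retrieved memories as a bulleted block for the prompt."""
    lines = ["Context from memory:"]
    for memory in memories[:max_items]:
        label = LABELS.get(memory["kind"], "Note")
        line = f"• {label}: {memory['content']}"
        if "confidence" in memory:
            line += f" (confidence: {memory['confidence']})"
        lines.append(line)
    return "\n".join(lines)


def build_prompt(memories: List[Dict], user_message: str) -> str:
    # Memory context goes first so the model reads it before the request.
    return f"{render_memory_context(memories)}\n\nUser: {user_message}"


retrieved = [
    {"kind": "preference", "content": "Prefers email communication", "confidence": 0.9},
    {"kind": "event", "content": "Discussed billing issue on 3/15 (resolved)"},
    {"kind": "fact", "content": "Company size is 50 employees"},
]
prompt = build_prompt(retrieved, "How should we follow up?")
```

Capping the block with `max_items` keeps memory from crowding out the rest of the prompt.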
FORGET — Critical for Production
| Mechanism | Trigger | Implementation |
|---|---|---|
| Explicit delete | User requests “forget X” | Hard delete + audit |
| Contradiction | New fact contradicts old | Update, keep history |
| Decay | Memory not accessed in N | Reduce retrieval weight |
| Consolidation | Many similar memories | Merge into summary |
| TTL | Retention policy expiry | Hard delete |
| GDPR request | “Right to be forgotten” | Full user purge |
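Several of these mechanisms can be sketched against a simple in-memory store: explicit delete with an audit entry, TTL expiry, a per-user purge, and access-based decay of retrieval weight. All class and method names are hypothetical, and the 90-day half-life is an arbitrary illustrative value. A real purge would also need to cover any other copies of the data (indexes, caches, backups), which this sketch ignores.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Memory:
    user_id: str
    content: str
    created_at: datetime
    last_accessed: datetime
    ttl: Optional[timedelta] = None  # None means no retention limit


@dataclass
class MemoryStore:
    items: Dict[str, Memory] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def forget(self, memory_id: str, reason: str) -> bool:
        # Hard delete; the audit entry records only the id and the reason.
        if self.items.pop(memory_id, None) is None:
            return False
        self.audit_log.append(f"deleted {memory_id}: {reason}")
        return True

    def purge_expired(self, now: datetime) -> int:
        expired = [mid for mid, m in self.items.items()
                   if m.ttl is not None and now >= m.created_at + m.ttl]
        for mid in expired:
            self.forget(mid, "ttl expired")
        return len(expired)

    def purge_user(self, user_id: str) -> int:
        owned = [mid for mid, m in self.items.items() if m.user_id == user_id]
        for mid in owned:
            self.forget(mid, "user purge request")
        return len(owned)


def retrieval_weight(memory: Memory, now: datetime,
                     half_life_days: float = 90.0) -> float:
    # Decay: the weight halves for every half_life_days without access.
    idle_days = max((now - memory.last_accessed).total_seconds() / 86400, 0.0)
    return 0.5 ** (idle_days / half_life_days)


now = datetime(2024, 4, 1)  # fixed date so the example is reproducible
store = MemoryStore()
store.items["m1"] = Memory("u1", "prefers email", now - timedelta(days=100),
                           now - timedelta(days=90), ttl=timedelta(days=30))
store.items["m2"] = Memory("u1", "budget is $100K", now - timedelta(days=10), now)
store.items["m3"] = Memory("u2", "company has 50 employees", now, now)
```

Decay lowers a memory's ranking without deleting it, so it stays recoverable; TTL and purge are irreversible by design.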
Memory Conflicts
THE PROBLEM
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  January: User says "I hate spicy food"                   │
│  March:   User says "I love spicy food"                   │
│                                                           │
│  Now what?                                                │
│                                                           │
└───────────────────────────────────────────────────────────┘
RESOLUTION STRATEGIES
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  1. LAST WRITE WINS                                       │
│     Simple: most recent fact replaces old                 │
│     Risk: Loses nuance ("I am on a diet this month")      │
│                                                           │
│  2. KEEP BOTH WITH TIMESTAMPS                             │
│     Store both, retrieve most recent by default           │
│     Allows: "You mentioned hating spicy food in Jan..."   │
│                                                           │
│  3. ASK FOR CLARIFICATION                                 │
│     "I see you mentioned both. Which is current?"         │
│     Best UX but interrupts flow                           │
│                                                           │
│  4. CONFIDENCE DECAY                                      │
│     Older facts have lower confidence                     │
│     Retrieve based on recency-weighted confidence         │
│                                                           │
└───────────────────────────────────────────────────────────┘
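Strategies 2 and 4 combine naturally: keep every claim with its timestamp, and pick the current one by recency-weighted confidence. The sketch below is one possible reading; `Claim`, `resolve`, `history`, and the 180-day half-life are hypothetical, and the dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Claim:
    key: str             # what the claim is about, e.g. "spicy_food"
    value: str
    stated_at: datetime
    confidence: float = 1.0


def effective_confidence(claim: Claim, now: datetime,
                         half_life_days: float = 180.0) -> float:
    # Confidence decay: older claims count for less.
    age_days = max((now - claim.stated_at).total_seconds() / 86400, 0.0)
    return claim.confidence * 0.5 ** (age_days / half_life_days)


def resolve(claims: List[Claim], key: str, now: datetime) -> Optional[Claim]:
    """Return the claim with the highest recency-weighted confidence."""
    candidates = [c for c in claims if c.key == key]
    if not candidates:
        return None
    return max(candidates, key=lambda c: effective_confidence(c, now))


def history(claims: List[Claim], key: str) -> List[Claim]:
    # Nothing is overwritten, so the full timeline stays available.
    return sorted((c for c in claims if c.key == key), key=lambda c: c.stated_at)


now = datetime(2024, 4, 1)  # illustrative dates only
claims = [
    Claim("spicy_food", "hates", datetime(2024, 1, 10)),
    Claim("spicy_food", "loves", datetime(2024, 3, 5)),
]
current = resolve(claims, "spicy_food", now)
```

With equal stated confidence the most recent claim wins, which matches last-write-wins. The difference shows when a strongly stated older claim competes with a tentative newer one, and `history` still supports replies such as "You mentioned hating spicy food in Jan...".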
Architecture Patterns
SIMPLE: SESSION ONLY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  User → [Context Window] → LLM → Response                 │
│                                                           │
│  Pros: No infrastructure, works today                     │
│  Cons: Forgets everything between sessions                │
│  Use for: Stateless Q&A, simple chatbots                  │
│                                                           │
└───────────────────────────────────────────────────────────┘
INTERMEDIATE: SUMMARIZATION
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  User → [Recent messages + Summary of older] → LLM        │
│                                                           │
│  When context fills:                                      │
│    1. Summarize older messages                            │
│    2. Keep summary + recent N messages                    │
│                                                           │
│  Pros: Handles long conversations                         │
│  Cons: Loses detail, still per-session                    │
│  Use for: Long-form chat, customer support                │
│                                                           │
└───────────────────────────────────────────────────────────┘
PRODUCTION: FULL MEMORY STACK
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                     ┌────────────────┐                    │
│     User ──────────→│  Memory Layer  │                    │
│                     └───────┬────────┘                    │
│                             │                             │
│             ┌───────────────┼───────────────┐             │
│             │               │               │             │
│       ┌─────▼─────┐   ┌─────▼─────┐   ┌─────▼─────┐       │
│       │  Session  │   │ Long-term │   │ Episodic  │       │
│       │  Buffer   │   │   Facts   │   │  Events   │       │
│       └─────┬─────┘   └─────┬─────┘   └─────┬─────┘       │
│             │               │               │             │
│             └───────────────┼───────────────┘             │
│                             │                             │
│                     ┌───────▼────────┐                    │
│                     │   Retrieval    │                    │
│                     └───────┬────────┘                    │
│                             │                             │
│                     ┌───────▼────────┐                    │
│                     │      LLM       │                    │
│                     └────────────────┘                    │
│                                                           │
└───────────────────────────────────────────────────────────┘
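The intermediate pattern (summarize older messages, keep the summary plus the recent N) can be sketched as a small buffer class. This is an illustration under stated assumptions: `SummarizingBuffer` is a hypothetical name, `count_tokens` is a crude word count rather than a real tokenizer, and the `summarize` callable stands in for what would normally be an LLM call. The 4000 default reads the 4K trigger from the memory types table as tokens, which is also an assumption.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in: whitespace word count, NOT a real tokenizer.
    return len(text.split())


class SummarizingBuffer:
    def __init__(self, summarize: Callable[[str, List[str]], str],
                 max_tokens: int = 4000, keep_recent: int = 4):
        self.summarize = summarize      # (previous summary, older messages) -> summary
        self.max_tokens = max_tokens
        self.keep_recent = keep_recent
        self.summary = ""
        self.messages: List[str] = []

    def add(self, message: str) -> None:
        self.messages.append(message)
        total = count_tokens(self.summary) + sum(count_tokens(m) for m in self.messages)
        if total > self.max_tokens and len(self.messages) > self.keep_recent:
            # Fold everything except the most recent N messages into the summary.
            split = len(self.messages) - self.keep_recent
            older, self.messages = self.messages[:split], self.messages[split:]
            self.summary = self.summarize(self.summary, older)

    def context(self) -> str:
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        parts.extend(self.messages)
        return "\n".join(parts)


def fake_summarize(previous: str, older: List[str]) -> str:
    # Toy summarizer for the example: keeps only the last word of each message.
    topics = [m.split()[-1] for m in older]
    return ", ".join(([previous] if previous else []) + topics)


buf = SummarizingBuffer(fake_summarize, max_tokens=10, keep_recent=2)
for msg in ["user asks about billing", "agent explains the invoice",
            "user requests a refund", "agent confirms the refund"]:
    buf.add(msg)
```

Passing the previous summary back into `summarize` lets the summary accumulate across repeated compactions instead of being overwritten each time.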
Implementation Checklist
MINIMUM VIABLE MEMORY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  [ ] Session persistence (Redis/DB, not just context)     │
│  [ ] Summary generation on session end                    │
│  [ ] User fact storage (vector DB or KV store)            │
│  [ ] Retrieval on conversation start                      │
│  [ ] Explicit forget mechanism                            │
│                                                           │
└───────────────────────────────────────────────────────────┘
PRODUCTION MEMORY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  [ ] All of above, plus:                                  │
│  [ ] Episodic memory with timestamps                      │
│  [ ] Conflict resolution strategy                         │
│  [ ] Confidence scores on facts                           │
│  [ ] Decay/consolidation for old memories                 │
│  [ ] GDPR compliance (deletion, export)                   │
│  [ ] Memory debugging UI for support                      │
│  [ ] Metrics: retrieval latency, relevance scores         │
│                                                           │
└───────────────────────────────────────────────────────────┘
When This Matters
| Situation | What to implement |
|---|---|
| Simple chatbot | Session buffer only |
| Customer support | + Summaries + User facts |
| Sales assistant | + Episodic (deal history matters) |
| Personal assistant | Full stack with long-term memory |
| Enterprise deployment | + Compliance, audit, deletion |
| Multi-turn conversations | Session + summarization |
| Personalization | Long-term user preferences |
| “Remember when” queries | Episodic memory required |
Production signal
Why this concept matters
| Area | Signal |
|---|---|
| Interview | 50% of AI product interviews |
| Production | Essential for conversational AI |
| Performance | 12% higher churn without memory |