
Memory Systems for AI

Summary

Session, short-term, long-term, and episodic memory for AI agents and chatbots

TL;DR

AI memory enables personalization and context retention across conversations. Session memory is raw message history, short-term is compressed summaries, long-term stores facts about entities, and episodic tracks timestamped events. Production systems need all four types.

Visual Overview

Without Memory

                                                           
   User (Monday): "I am allergic to shellfish"             
   Agent: "Got it, I will note that."                      
                                                           
   User (Tuesday): "What should I order?"                  
   Agent: "The shrimp scampi is excellent!"   ← FAILURE
                                                           

Business cost of no memory:

  • Support: 40% of tickets are repeat issues (wasted agent time)
  • Sales: Lost context = lost deals ($50K avg deal, 15% close rate drop)
  • Product: Users churn when AI “forgets” them (12% higher churn)

Memory is not a feature. Memory is table stakes.


Memory Types


 TYPE          DEFINITION                      BOUNDARY

 SESSION       Raw message history in the      Clears on:
 (Buffer)      current conversation            • Session end
                                               • Context limit
               Storage: Context window

 SHORT-TERM    Compressed/summarized           Triggers when:
 (Working)     version of the session          • Session > 4K
                                               • Explicit summarization
               Storage: Prompt or cache        Clears on: Session end

 LONG-TERM     FACTS about entities            Persists until:
 (Semantic)    (user, org, domain)             • Explicit update
                                               • Contradiction
               Storage: Vector DB              • TTL expiry

               Examples:
               • "User prefers email"
               • "Company has 50 employees"
               • "Budget is $100K"

 EPISODIC      EVENTS that happened            Persists until:
 (Temporal)    (timestamped, queryable)        • Retention policy
                                               • User deletion
               Storage: Vector DB +
               timestamp index

               Examples:
               • "On 3/15, user asked about refund"
               • "Last week, discussed pricing concerns"
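
As a rough sketch of how the four types differ in shape and lifetime, they could be modelled as plain Python records. Every class and field name here (SessionMemory, ShortTermMemory, Fact, Episode, AgentMemory, end_session) is a hypothetical illustration, not part of any library, and in-memory lists stand in for the context window, cache, and vector DB.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SessionMemory:
    """Raw message history for the current conversation."""
    messages: List[str] = field(default_factory=list)


@dataclass
class ShortTermMemory:
    """Compressed summary of the session so far."""
    summary: str = ""


@dataclass
class Fact:
    """Long-term memory: a statement about an entity, no timestamp required."""
    entity: str
    content: str
    confidence: float = 1.0


@dataclass
class Episode:
    """Episodic memory: an event that only makes sense with its timestamp."""
    content: str
    occurred_at: datetime


@dataclass
class AgentMemory:
    session: SessionMemory = field(default_factory=SessionMemory)
    short_term: ShortTermMemory = field(default_factory=ShortTermMemory)
    long_term: List[Fact] = field(default_factory=list)
    episodic: List[Episode] = field(default_factory=list)

    def end_session(self) -> None:
        """Session and short-term memory clear; facts and events persist."""
        self.session = SessionMemory()
        self.short_term = ShortTermMemory()
```

The point of the sketch is the boundary: end_session resets only the first two fields, which mirrors the "Clears on" and "Persists until" column above.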

The Critical Distinction

Long-term vs Episodic

                                                           
   LONG-TERM: "User is allergic to shellfish"  → FACT
              (no timestamp needed, always true)

   EPISODIC:  "On 3/15, user mentioned shellfish allergy"
              → EVENT (when it happened matters)

   TEST: Can you answer "WHEN did you learn this?"
         YES → Episodic    NO → Long-term
                                                           

Memory Operations

WRITE — When & What Gets Stored

Trigger                  What to Store              Memory Type
User states fact         Extracted fact             Long-term
User states preference   Preference + confidence    Long-term
Conversation ends        Summary of key points      Short-term
Significant event        Event + timestamp          Episodic
Entity mentioned         Entity attributes          Long-term

Extraction prompt example:

From this conversation, extract:
1. Facts about the user (preferences, constraints, attributes)
2. Significant events (decisions made, problems discussed)
3. Action items or commitments

Format: {type: "fact"|"event", content: "...", confidence: 0-1}
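
A minimal sketch of the write path, assuming the model returns a JSON list of objects in the format above. The function name route_extractions, the min_confidence threshold, and the sample output are all hypothetical; a real system would also validate the model output before trusting it.

```python
import json
from datetime import datetime, timezone


def route_extractions(raw, now, min_confidence=0.5):
    """Split extracted items into long-term facts and episodic events."""
    facts, events = [], []
    for item in json.loads(raw):
        confidence = float(item.get("confidence", 0.0))
        if confidence < min_confidence:
            continue  # drop weak extractions (the threshold is an assumption)
        if item.get("type") == "fact":
            facts.append({"content": item["content"], "confidence": confidence})
        elif item.get("type") == "event":
            # Events are stamped at write time so they can be queried by date.
            events.append({"content": item["content"],
                           "confidence": confidence,
                           "timestamp": now.isoformat()})
    return facts, events


# Hand-written stand-in for a model response, for illustration only.
sample = json.dumps([
    {"type": "fact", "content": "Prefers email", "confidence": 0.9},
    {"type": "event", "content": "Asked about a refund", "confidence": 0.8},
    {"type": "fact", "content": "Might own a boat", "confidence": 0.2},
])
facts, events = route_extractions(sample, datetime.now(timezone.utc))
```

Items with an unrecognized type are silently skipped here; logging them would be a reasonable alternative.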

READ — Retrieval Strategies

Strategy    How                           When to Use
Recency     Last N memories               Continuation context
Relevance   Semantic similarity search    Topic-specific recall
Temporal    "Last week", "In March"       Time-referenced query
Entity      All facts about X             Entity-focused task
Hybrid      Relevance + recency boost     General retrieval
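
The hybrid row can be sketched as a weighted sum of cosine similarity and an exponential recency boost. This is one possible scoring rule, not a standard one: the function names, the 30-day half-life, and the 0.3 recency weight are assumptions, and the two-dimensional embeddings and dates are hand-written toy data.

```python
import math
from datetime import datetime, timedelta, timezone


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(query_vec, memory, now, half_life_days=30.0, recency_weight=0.3):
    relevance = cosine(query_vec, memory["embedding"])
    age_days = (now - memory["timestamp"]).total_seconds() / 86400
    recency = 0.5 ** (age_days / half_life_days)  # 1.0 when new, 0.5 after one half-life
    return (1 - recency_weight) * relevance + recency_weight * recency


def retrieve(query_vec, memories, now, k=3):
    """Return the k memories with the highest hybrid score."""
    ranked = sorted(memories, key=lambda m: hybrid_score(query_vec, m, now), reverse=True)
    return ranked[:k]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)  # arbitrary reference date
memories = [
    {"text": "Prefers email", "embedding": [1.0, 0.0],
     "timestamp": now - timedelta(days=90)},
    {"text": "Asked about refund", "embedding": [0.9, 0.1],
     "timestamp": now - timedelta(days=1)},
    {"text": "Company has 50 employees", "embedding": [0.0, 1.0],
     "timestamp": now - timedelta(days=2)},
]
top = retrieve([1.0, 0.0], memories, now, k=2)
```

In this toy data the slightly less similar but much fresher memory ranks first, which is the intended effect of the recency boost.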

Retrieval prompt injection:

Context from memory:
• User preference: Prefers email communication (confidence: 0.9)
• Recent event: Discussed billing issue on 3/15 (resolved)
• Fact: Company size is 50 employees

[Rest of prompt...]
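
One way to produce a block like the one above is a small formatter that runs between retrieval and the model call. The helper names (build_memory_context, build_prompt), the "kind" field, and the max_items cap are hypothetical choices for this sketch.

```python
LABELS = {"preference": "User preference", "event": "Recent event", "fact": "Fact"}


def build_memory_context(memories, max_items=5):
    """Render retrieved memories as a bulleted context block."""
    lines = ["Context from memory:"]
    for memory in memories[:max_items]:
        line = f"• {LABELS.get(memory['kind'], 'Memory')}: {memory['text']}"
        if "confidence" in memory:
            line += f" (confidence: {memory['confidence']})"
        lines.append(line)
    return "\n".join(lines)


def build_prompt(memories, user_message):
    """Prepend the memory block to the rest of the prompt."""
    return f"{build_memory_context(memories)}\n\nUser: {user_message}"


context = build_memory_context([
    {"kind": "preference", "text": "Prefers email communication", "confidence": 0.9},
    {"kind": "fact", "text": "Company size is 50 employees"},
])
```

Capping the number of injected items keeps the memory block from crowding out the rest of the context window.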

FORGET — Critical for Production

Mechanism         Trigger                     Implementation
Explicit delete   User requests "forget X"    Hard delete + audit
Contradiction     New fact contradicts old    Update, keep history
Decay             Memory not accessed in N    Reduce retrieval weight
Consolidation     Many similar memories       Merge into summary
TTL               Retention policy expiry     Hard delete
GDPR request      "Right to be forgotten"     Full user purge
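
Four of these mechanisms (explicit delete, TTL, user purge, decay) can be sketched against an in-memory store. The MemoryStore class and its method names are hypothetical, a dict stands in for a real database, and the 30-day half-life is an assumption. A real purge would also have to reach backups, caches, and derived indexes, which this sketch does not model.

```python
from datetime import datetime, timedelta, timezone


class MemoryStore:
    def __init__(self):
        self.records = {}    # memory_id -> record dict
        self.audit_log = []  # (timestamp, action, memory_id); content is not logged

    def add(self, memory_id, user_id, text, now, ttl_days=None):
        expires = now + timedelta(days=ttl_days) if ttl_days is not None else None
        self.records[memory_id] = {"user_id": user_id, "text": text,
                                   "last_accessed": now, "expires_at": expires}

    def forget(self, memory_id, now):
        """Explicit delete: hard delete plus an audit entry."""
        if self.records.pop(memory_id, None) is not None:
            self.audit_log.append((now, "delete", memory_id))

    def expire(self, now):
        """TTL: hard-delete every record past its expiry time."""
        expired = [k for k, r in self.records.items()
                   if r["expires_at"] is not None and r["expires_at"] <= now]
        for memory_id in expired:
            self.forget(memory_id, now)
        return expired

    def purge_user(self, user_id, now):
        """Right-to-be-forgotten request: remove every record for one user."""
        for memory_id in [k for k, r in self.records.items() if r["user_id"] == user_id]:
            self.forget(memory_id, now)

    def retrieval_weight(self, memory_id, now, half_life_days=30.0):
        """Decay: the weight halves for each half-life since last access."""
        idle = now - self.records[memory_id]["last_accessed"]
        return 0.5 ** (idle.total_seconds() / 86400 / half_life_days)
```

Decay only lowers the weight used at retrieval time; nothing is deleted until TTL expiry or an explicit request, which keeps the two concerns separate.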

Memory Conflicts

THE PROBLEM

                                                           
   January: User says "I hate spicy food"                  
   March:   User says "I love spicy food"                  
                                                           
   Now what?                                               
                                                           


RESOLUTION STRATEGIES

 
 1. LAST WRITE WINS 
 Simple: most recent fact replaces old 
 Risk: Loses nuance ("I am on a diet this month") 
 
 2. KEEP BOTH WITH TIMESTAMPS 
 Store both, retrieve most recent by default 
 Allows: "You mentioned hating spicy food in Jan..." 
 
 3. ASK FOR CLARIFICATION 
 "I see you mentioned both. Which is current?" 
 Best UX but interrupts flow 
 
 4. CONFIDENCE DECAY 
 Older facts have lower confidence 
 Retrieve based on recency-weighted confidence 
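
Strategies 2 and 4 combine naturally: keep every version with its timestamp, return the latest by default, and optionally rank versions by age-decayed confidence. The sketch below assumes a hypothetical FactHistory class, a 90-day half-life, and an arbitrary year for the January and March statements.

```python
from datetime import datetime, timezone


class FactHistory:
    """Keeps every version of a fact under an (entity, attribute) key."""

    def __init__(self):
        self.versions = {}  # (entity, attribute) -> [(timestamp, value, confidence)]

    def write(self, entity, attribute, value, timestamp, confidence=1.0):
        key = (entity, attribute)
        self.versions.setdefault(key, []).append((timestamp, value, confidence))

    def current(self, entity, attribute):
        """Strategy 2: most recent value wins, older ones stay queryable."""
        history = self.versions.get((entity, attribute), [])
        return max(history, key=lambda v: v[0])[1] if history else None

    def weighted(self, entity, attribute, now, half_life_days=90.0):
        """Strategy 4: rank values by confidence decayed with age."""
        scored = []
        for timestamp, value, confidence in self.versions.get((entity, attribute), []):
            age_days = (now - timestamp).total_seconds() / 86400
            scored.append((confidence * 0.5 ** (age_days / half_life_days), value))
        return sorted(scored, reverse=True)


history = FactHistory()
utc = timezone.utc
history.write("user", "spicy_food", "hates", datetime(2024, 1, 10, tzinfo=utc), 0.9)
history.write("user", "spicy_food", "loves", datetime(2024, 3, 10, tzinfo=utc), 0.9)
latest = history.current("user", "spicy_food")
ranked = history.weighted("user", "spicy_food", datetime(2024, 4, 1, tzinfo=utc))
```

Because both versions survive, the agent can still say "You mentioned hating spicy food in Jan..." or fall back to asking for clarification when the decayed scores are close.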
 



Architecture Patterns

SIMPLE: SESSION ONLY

                                                           
   User → [Context Window] → LLM → Response
                                                           
   Pros: No infrastructure, works today                    
   Cons: Forgets everything between sessions               
   Use for: Stateless Q&A, simple chatbots                 
                                                           


INTERMEDIATE: SUMMARIZATION

 
 User → [Recent messages + Summary of older] → LLM
 
 When context fills: 
 1. Summarize older messages 
 2. Keep summary + recent N messages 
 
 Pros: Handles long conversations 
 Cons: Loses detail, still per-session 
 Use for: Long-form chat, customer support 
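
The two steps above can be sketched as a rolling buffer. This version triggers on message count rather than token count to stay short, and naive_summarize is a deliberately lossy stand-in for an LLM summarization call. The class name, parameters, and thresholds are all hypothetical.

```python
from typing import Callable, List


class SummarizingBuffer:
    def __init__(self, summarize: Callable[[str, List[str]], str],
                 max_messages: int = 6, keep_recent: int = 3):
        self.summarize = summarize
        self.max_messages = max_messages
        self.keep_recent = keep_recent
        self.summary = ""
        self.messages: List[str] = []

    def add(self, message: str) -> None:
        self.messages.append(message)
        if len(self.messages) > self.max_messages:
            split = len(self.messages) - self.keep_recent
            # 1. Summarize older messages, folding in the previous summary.
            self.summary = self.summarize(self.summary, self.messages[:split])
            # 2. Keep summary + recent N messages.
            self.messages = self.messages[split:]

    def context(self) -> str:
        header = [f"Summary of earlier conversation: {self.summary}"] if self.summary else []
        return "\n".join(header + self.messages)


def naive_summarize(previous: str, older: List[str]) -> str:
    """Placeholder summarizer: truncates each message instead of calling a model."""
    joined = " | ".join(message[:20] for message in older)
    return f"{previous} | {joined}" if previous else joined


buffer = SummarizingBuffer(naive_summarize, max_messages=4, keep_recent=2)
for i in range(5):
    buffer.add(f"msg {i}")
```

The truncation in naive_summarize makes the "loses detail" drawback visible: anything past the cut is gone for good once the older messages are dropped.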
 


PRODUCTION: FULL MEMORY STACK

 
  
   User → Memory Layer
               ↓
   Session     Long-term     Episodic
   Buffer      Facts         Events
               ↓
           Retrieval
               ↓
              LLM
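
A compact sketch of how the three stores might sit behind one memory layer. The MemoryLayer class and its methods are hypothetical, retrieval uses the simple "all facts about an entity" strategy instead of vector search, and the LLM call itself is left out.

```python
from datetime import datetime, timezone


class MemoryLayer:
    def __init__(self):
        self.session = []   # session buffer: raw turns
        self.facts = {}     # long-term facts: entity -> list of strings
        self.episodes = []  # episodic events: (timestamp, entity, text)

    def remember_fact(self, entity, text):
        self.facts.setdefault(entity, []).append(text)

    def remember_event(self, entity, text, timestamp):
        self.episodes.append((timestamp, entity, text))

    def retrieve(self, entity, max_events=2):
        """Return all facts for the entity plus its most recent events."""
        events = sorted((e for e in self.episodes if e[1] == entity), reverse=True)
        return self.facts.get(entity, []), [text for _, _, text in events[:max_events]]

    def build_prompt(self, entity, user_message):
        """Assemble memory context and session history for the LLM call."""
        facts, events = self.retrieve(entity)
        self.session.append(f"User: {user_message}")
        lines = ["Context from memory:"]
        lines += [f"• Fact: {fact}" for fact in facts]
        lines += [f"• Recent event: {event}" for event in events]
        return "\n".join(lines + self.session)


layer = MemoryLayer()
layer.remember_fact("user-1", "Allergic to shellfish")
layer.remember_event("user-1", "Asked about refund",
                     datetime(2024, 3, 15, tzinfo=timezone.utc))  # year is arbitrary
prompt = layer.build_prompt("user-1", "What should I order?")
```

With the allergy stored as a long-term fact, the prompt for "What should I order?" now carries the information that the no-memory agent was missing.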
  
 

Implementation Checklist

MINIMUM VIABLE MEMORY

                                                           
   [ ] Session persistence (Redis/DB, not just context)    
   [ ] Summary generation on session end                   
   [ ] User fact storage (vector DB or KV store)           
   [ ] Retrieval on conversation start                     
   [ ] Explicit forget mechanism                           
                                                           


PRODUCTION MEMORY

 
 [ ] All of above, plus: 
 [ ] Episodic memory with timestamps 
 [ ] Conflict resolution strategy 
 [ ] Confidence scores on facts 
 [ ] Decay/consolidation for old memories 
 [ ] GDPR compliance (deletion, export) 
 [ ] Memory debugging UI for support 
 [ ] Metrics: retrieval latency, relevance scores 
 

When This Matters

Situation                   What to implement
Simple chatbot              Session buffer only
Customer support            + Summaries + User facts
Sales assistant             + Episodic (deal history matters)
Personal assistant          Full stack with long-term memory
Enterprise deployment       + Compliance, audit, deletion
Multi-turn conversations    Session + summarization
Personalization             Long-term user preferences
"Remember when" queries     Episodic memory required

Production signal

Why this concept matters

Interview:    50% of AI product interviews
Production:   Essential for conversational AI
Performance:  12% higher churn without memory