AI Agent Memory Management: Complete Solution from Short-term to Persistent Memory
Complete guide to implementing context memory for AI Agents - covers vector database storage, summarization techniques, and persistent storage solutions.
The biggest pain point in conversational AI: every new conversation starts from scratch. The AI forgets everything from previous sessions. This guide covers a complete memory architecture that solves this problem.
Three-Layer Memory Architecture
The solution uses three distinct memory layers:
- Long-term Memory - stores preferences and key decisions (weeks to years retention)
- Mid-term Memory - stores current project context (days to weeks retention)
- Short-term Memory - stores immediate conversation (current session only)
Implementation
Short-term Memory: Recent Message Buffer
```python
from collections import deque

class ShortTermMemory:
    def __init__(self, max_items=10):
        self.buffer = deque(maxlen=max_items)

    def add(self, role: str, content: str):
        self.buffer.append({"role": role, "content": content})

    def get_context(self, system_prompt: str) -> list:
        messages = [{"role": "system", "content": system_prompt}]
        messages.extend(self.buffer)
        return messages
```
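To see the eviction behavior in isolation, note that the `deque` with `maxlen` is what keeps the buffer bounded: once the cap is reached, the oldest turn falls off automatically. A quick standalone check:

```python
from collections import deque

# The buffer at the heart of ShortTermMemory: a deque with maxlen
# silently evicts the oldest entry once the cap is reached.
buffer = deque(maxlen=3)
for i, text in enumerate(["hi", "what's 2+2?", "it's 4", "thanks"]):
    buffer.append({"role": "user" if i % 2 == 0 else "assistant", "content": text})

print(len(buffer))           # 3 — the first message was evicted
print(buffer[0]["content"])  # "what's 2+2?"
```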
Mid-term Memory: Vector Database Storage
```python
import chromadb
from chromadb.config import Settings

class MidTermMemory:
    def __init__(self, collection_name="memories"):
        self.client = chromadb.Client(Settings(
            anonymized_telemetry=False,
            allow_reset=True
        ))
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"}  # cosine distance suits text embeddings
        )

    def add_memory(self, text: str, metadata: dict):
        # ids must be unique per memory, so include a distinct
        # timestamp (or a uuid) in metadata to avoid id collisions
        self.collection.add(
            documents=[text],
            metadatas=[metadata],
            ids=[f"mem_{metadata.get('timestamp', 0)}"]
        )

    def search(self, query: str, top_k=3) -> list:
        results = self.collection.query(
            query_texts=[query],
            n_results=top_k
        )
        return results.get("documents", [[]])[0]
```
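Vector search ranks stored memories by embedding similarity to the query. To illustrate the principle without running Chroma, here is a toy version using bag-of-words vectors and cosine similarity; the documents are made up, and real embeddings capture semantics far better than word overlap:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse bag-of-words vectors
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "user prefers dark mode in the editor",
    "project uses Postgres 16 for storage",
    "deadline for the storage migration is Friday",
]
vectors = [Counter(d.lower().split()) for d in docs]

def search(query: str, top_k: int = 2) -> list:
    q = Counter(query.lower().split())
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [docs[i] for i in ranked[:top_k]]

print(search("what storage does the project use"))
```

Swap the bag-of-words vectors for embedding vectors and this is, conceptually, what the `collection.query` call above does for you.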
Long-term Memory: Key Information Summary
```python
import sqlite3
import time

class LongTermMemory:
    def __init__(self, db_path="./memory.db"):
        self.conn = sqlite3.connect(db_path)
        self._init_db()

    def _init_db(self):
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS memories (
                id INTEGER PRIMARY KEY,
                content TEXT,
                importance TEXT,
                created_at INTEGER
            )
        """)
        self.conn.commit()

    def save_key_memory(self, content: str, importance: str = "medium"):
        self.conn.execute(
            "INSERT INTO memories (content, importance, created_at) VALUES (?, ?, ?)",
            (content, importance, int(time.time()))
        )
        self.conn.commit()

    def get_preferences(self) -> list:
        # High-importance memories double as the "preferences" the agent
        # injects into every prompt (used by AgentMemory.get_full_context)
        rows = self.conn.execute(
            "SELECT content FROM memories WHERE importance = 'high' "
            "ORDER BY created_at DESC"
        ).fetchall()
        return [r[0] for r in rows]
```
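The schema can be sanity-checked without touching disk by pointing `sqlite3` at an in-memory database; this is the same SQL the class above runs:

```python
import sqlite3, time

# Same schema as LongTermMemory, using an in-memory database for the demo
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS memories (
        id INTEGER PRIMARY KEY,
        content TEXT,
        importance TEXT,
        created_at INTEGER
    )
""")
conn.execute(
    "INSERT INTO memories (content, importance, created_at) VALUES (?, ?, ?)",
    ("User prefers concise answers", "high", int(time.time())),
)
conn.commit()

rows = conn.execute(
    "SELECT content FROM memories WHERE importance = 'high'"
).fetchall()
print(rows)  # [('User prefers concise answers',)]
```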
Complete Agent Memory System
```python
import time

class AgentMemory:
    def __init__(self):
        self.short = ShortTermMemory(max_items=10)
        self.mid = MidTermMemory()
        self.long = LongTermMemory()

    def remember(self, role: str, content: str):
        self.short.add(role, content)
        # Sync to mid-term once the buffer fills up
        if len(self.short.buffer) >= 10:
            summary = self._summarize(self.short.buffer)
            self.mid.add_memory(summary, {"source": "conversation",
                                          "timestamp": int(time.time())})

    def _summarize(self, messages) -> str:
        # Placeholder: join the buffered turns verbatim.
        # In production, replace this with an LLM summarization call.
        return " | ".join(f"{m['role']}: {m['content']}" for m in messages)

    def get_full_context(self, query: str) -> str:
        preferences = self.long.get_preferences()
        related = self.mid.search(query)
        recent = self.short.get_context("")
        return f"Preferences: {preferences}\nRelated: {related}\nRecent: {recent}"
```
Tech Stack Recommendations
| Scenario | Recommended Solution |
|---|---|
| Simple chatbot | Short + Mid memory only |
| Personal AI assistant | All three layers |
| Enterprise customer service | Mid + summary (no long-term, for privacy) |
| Multi-user system | Per-user vector indexes |
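For the multi-user row, the key design decision is namespace isolation: each user gets their own index or collection so retrieval can never leak across accounts. A minimal sketch of the routing layer, with a plain list standing in for each user's vector store (the `PerUserMemory` name is illustrative, not from any library; with Chroma you would create one collection per user, e.g. named per `user_id`):

```python
class PerUserMemory:
    def __init__(self):
        # Maps user_id -> that user's private store
        self._stores: dict[str, list[str]] = {}

    def _store(self, user_id: str) -> list:
        return self._stores.setdefault(user_id, [])

    def add(self, user_id: str, text: str):
        self._store(user_id).append(text)

    def search(self, user_id: str, query: str) -> list:
        # Toy keyword match; a real system would query the user's collection
        words = query.lower().split()
        return [t for t in self._store(user_id) if any(w in t.lower() for w in words)]

memory = PerUserMemory()
memory.add("alice", "prefers dark mode")
memory.add("bob", "prefers light mode")
print(memory.search("alice", "mode"))  # only Alice's memories are visible
```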
Start simple and add layers as needed: many projects get by with short-term memory plus basic keyword search, and vector retrieval and persistent storage can be layered on once conversation history actually outgrows the context window.