Architecture · December 18, 2025 · 12 min read

Designing Persistent Memory for AI Agents

AI agents without memory are like employees who forget everything between shifts. They can't build on previous work, can't learn user preferences, and can't maintain context across interactions. Every conversation starts from zero.

Aphelion's memory system gives agents persistent identity and context that spans sessions, users, and time. This post explains how we designed it.

The Memory Model

We think about agent memory in three layers, each with different persistence characteristics and access patterns.

Working memory is the immediate context of the current task. What is the agent trying to accomplish right now? What steps has it taken? What's the next action? This is ephemeral—it exists only for the duration of a task execution.

Session memory spans a single user interaction. If a user asks an agent to book a flight, then changes their mind about the dates, the agent needs to remember what was already discussed. Session memory persists for hours or days, then fades.

Long-term memory is the permanent record. User preferences. Important facts. Learned behaviors. This persists indefinitely and forms the agent's accumulated knowledge.
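To make the distinctions concrete, here is a minimal sketch of the three layers as data. The class and field names are illustrative only, not Aphelion's actual types:

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum
from typing import Optional


class Layer(Enum):
    WORKING = "working"      # exists only for the current task execution
    SESSION = "session"      # persists for hours or days, then fades
    LONG_TERM = "long_term"  # persists indefinitely


@dataclass
class Memory:
    layer: Layer
    content: str
    ttl: Optional[timedelta]  # None = no fixed expiration


# One example record per layer.
step = Memory(Layer.WORKING, "searched SFO->JFK flights, shortlisted option 2", ttl=None)
turn = Memory(Layer.SESSION, "user moved the departure date to Friday", ttl=timedelta(days=2))
pref = Memory(Layer.LONG_TERM, "user prefers aisle seats", ttl=None)
```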

Storage Architecture

Each memory layer has different storage requirements. Working memory needs to be blazingly fast—we're talking single-digit millisecond access times—because agents query it constantly during task execution. We use in-memory data structures with optional persistence to disk.
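As a rough sketch of what a working-memory scratchpad can look like, here is a plain in-process dictionary with an optional JSON snapshot. This is an illustration of the idea, not Aphelion's implementation:

```python
import json
from pathlib import Path
from typing import Any, Optional


class WorkingMemory:
    """In-process scratchpad for the current task, with an optional disk snapshot."""

    def __init__(self, snapshot_path: Optional[Path] = None):
        self._state: dict[str, Any] = {}
        self._snapshot_path = snapshot_path

    def set(self, key: str, value: Any) -> None:
        self._state[key] = value
        if self._snapshot_path:  # optional persistence, as described above
            self._snapshot_path.write_text(json.dumps(self._state))

    def get(self, key: str, default: Any = None) -> Any:
        return self._state.get(key, default)


wm = WorkingMemory()
wm.set("goal", "book a flight for user_123")
wm.set("last_step", "collected travel dates")
```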

Session memory needs to be fast but also durable. A user stepping away for lunch shouldn't lose their context. We use a distributed key-value store with automatic expiration.
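The post doesn't name the store, but any distributed key-value store with native TTLs fits the pattern. A Redis-style sketch, assuming a local instance, shows the shape of it:

```python
import json

import redis  # stand-in for "a distributed key-value store"; any store with TTLs works

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SESSION_TTL_SECONDS = 48 * 3600  # keep session context for roughly two days


def save_session(agent_id: str, session_id: str, state: dict) -> None:
    # SETEX writes the value and its expiration together; the store evicts
    # the session on its own once the TTL lapses.
    r.setex(f"session:{agent_id}:{session_id}", SESSION_TTL_SECONDS, json.dumps(state))


def load_session(agent_id: str, session_id: str) -> dict | None:
    raw = r.get(f"session:{agent_id}:{session_id}")
    return json.loads(raw) if raw is not None else None
```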

Long-term memory is where it gets interesting. Raw storage is easy—the hard part is retrieval. When an agent needs to remember something relevant, it can't scan through gigabytes of historical data. It needs semantic search.

Semantic Retrieval

We embed memories into a high-dimensional vector space using language models. Semantically similar memories cluster together. When an agent needs to recall something, we embed the query and find the nearest neighbors.
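In miniature, semantic retrieval looks like this. The `embed` function below is a self-contained stand-in for a real embedding model (it just hashes words into a vector), but the nearest-neighbor step is the same shape:

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: hash words into a small vector so
    # the example runs on its own. Swap in your model of choice.
    vec = np.zeros(256)
    for word in text.lower().split():
        vec[hash(word) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)  # vectors are already unit-normalized


def recall(query: str, memories: list[dict], k: int = 5) -> list[dict]:
    """Embed the query and return the k memories nearest to it."""
    q = embed(query)
    return sorted(memories, key=lambda m: cosine(q, m["embedding"]), reverse=True)[:k]


memories = [{"text": t, "embedding": embed(t)} for t in (
    "user prefers aisle seats",
    "user's home airport is SFO",
    "last support ticket was about billing",
)]
# Prints the stored memory closest to the query.
print(recall("home airport of the user", memories, k=1)[0]["text"])
```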

But pure semantic similarity isn't enough. A memory from yesterday is probably more relevant than a semantically similar memory from six months ago. A memory about the current user is more relevant than a memory about a different user.

We combine semantic similarity with recency, relevance weights, and access patterns to produce a final ranking. The exact formula is continuously tuned based on feedback from agent developers.
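A blended score of that kind might look like the sketch below. The weights and the 30-day recency constant are placeholders, since the real formula is tuned continuously:

```python
import math
import time


def blended_score(memory: dict, semantic_sim: float, now: float | None = None) -> float:
    """Combine semantic similarity with recency, a stored importance weight,
    and access frequency. All weights here are illustrative placeholders."""
    now = now or time.time()
    age_days = (now - memory["created_at"]) / 86_400
    recency = math.exp(-age_days / 30)                     # yesterday ~0.97, six months ~0.002
    frequency = math.log1p(memory.get("access_count", 0))  # diminishing returns on repeat hits
    importance = memory.get("importance", 0.5)
    return 0.6 * semantic_sim + 0.25 * recency + 0.1 * importance + 0.05 * frequency
```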

Memory Formation

Not everything should become a long-term memory. If an agent processed a thousand customer support tickets, storing every detail would be wasteful and make retrieval slower. We need to be selective.

Agents can explicitly create memories using our SDK—things they've determined are important to remember. But we also have automatic memory formation that watches for patterns: repeated user preferences, successful task completions, error conditions to avoid.
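In SDK terms, explicit memory creation might look something like the following. The import path, `Agent` class, and `remember` call are hypothetical stand-ins to show the shape of the call, not the documented API:

```python
# Hypothetical client, import, and method names; check the SDK reference for the real API.
from aphelion import Agent  # assumed import path

agent = Agent("support-bot")

# The agent decided this fact is worth keeping beyond the current session.
agent.memory.remember(
    "Customer on the Team plan prefers email follow-ups over calls",
    user_id="user_123",
    importance=0.8,  # hint for the relevance ranking described earlier
)
```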

Memories are also consolidated over time. Multiple similar memories get merged into a single, stronger memory. Old memories that are never accessed gradually decay in relevance (but aren't deleted—storage is cheap, and you never know what might become relevant).
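A toy version of both behaviors, with invented field names and a 90-day half-life chosen purely for illustration:

```python
import time


def decayed_relevance(memory: dict, half_life_days: float = 90.0) -> float:
    """Relevance fades for memories that are never accessed; the record itself
    is kept, matching the 'decay but don't delete' behavior described above."""
    idle_days = (time.time() - memory["last_accessed"]) / 86_400
    return memory["relevance"] * 0.5 ** (idle_days / half_life_days)


def consolidate(similar: list[dict]) -> dict:
    """Merge near-duplicate memories into a single, stronger record. How
    'similar' is detected (e.g. embedding distance) is left out of this sketch."""
    strongest = max(similar, key=lambda m: m["relevance"])
    return {
        "text": strongest["text"],
        "relevance": min(1.0, sum(m["relevance"] for m in similar)),  # reinforced
        "last_accessed": max(m["last_accessed"] for m in similar),
        "merged_from": [m["id"] for m in similar],
    }
```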

Privacy and Isolation

Memory systems raise serious privacy considerations. An agent shouldn't accidentally leak one user's information to another. An agent used by multiple companies shouldn't cross-pollinate their data.

We enforce strict isolation at the infrastructure level. Each agent has a separate memory namespace. Each user's memories within an agent are isolated. Cross-boundary access is impossible—not just policy-prohibited, but architecturally prevented.
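The idea is easiest to see in a toy model where the namespace is part of the key itself, so a lookup can only ever see its own prefix. This is a simplification of the real infrastructure:

```python
from typing import Any, Callable


def memory_key(agent_id: str, user_id: str, memory_id: str) -> str:
    """Every record is addressed by agent and user; keys from different
    namespaces can never collide or be confused."""
    return f"{agent_id}/{user_id}/{memory_id}"


def recall_scoped(store: dict[str, Any], agent_id: str, user_id: str,
                  match: Callable[[Any], bool]) -> list[Any]:
    prefix = f"{agent_id}/{user_id}/"
    # Only keys under this prefix are even considered: isolation lives in the
    # lookup path itself rather than in an after-the-fact policy check.
    return [v for k, v in store.items() if k.startswith(prefix) and match(v)]
```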

Developers can also mark memories as sensitive, which adds encryption at rest and more aggressive access logging. For regulated industries, we offer data residency controls to ensure memories stay within geographic boundaries.

The API

We wanted the memory API to feel natural to work with. Creating a memory is a single line of code. Retrieval is equally simple—query with natural language and get back relevant memories ranked by importance.
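Continuing the earlier hypothetical sketch, storing and recalling could look like this. Again, the names are illustrative, not the published surface:

```python
# Hypothetical SDK surface: one call to store, one natural-language query to retrieve.
from aphelion import Agent  # assumed import path

agent = Agent("travel-bot")
agent.memory.remember("User's home airport is SFO", user_id="user_123")

results = agent.memory.recall(
    "where does this user usually fly from?",
    user_id="user_123",
    limit=5,  # ranked by the blended relevance score
)
for m in results:
    print(m.text, m.score)
```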

For developers who want more control, we expose the underlying mechanisms: explicit embedding, custom relevance functions, manual memory management. But most developers never need to touch this—the defaults work well for common use cases.

Performance at Scale

As agents accumulate more memories, retrieval can't slow down. We use approximate nearest neighbor indexes, so lookup latency stays nearly flat as the memory count grows. The trade-off is a small chance of missing the single best match, which we compensate for by retrieving more candidates than we need and re-ranking them.
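The over-fetch-and-re-rank step, sketched with assumed interfaces (`ann_index.search` returning id/score pairs, and an `exact_score` function like the blended score above):

```python
def recall_with_rerank(query_vec, ann_index, memories_by_id, exact_score,
                       k: int = 5, overfetch: int = 4):
    """Over-fetch from the approximate index, then re-rank the candidates with
    an exact score so a near-miss in the ANN step rarely matters.
    `ann_index.search` and `exact_score` are assumed interfaces."""
    candidates = ann_index.search(query_vec, k * overfetch)  # [(memory_id, approx_score), ...]
    rescored = sorted(
        ((exact_score(query_vec, memories_by_id[mid]), mid) for mid, _ in candidates),
        reverse=True,
    )
    return [memories_by_id[mid] for _score, mid in rescored[:k]]
```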

We also shard memories across multiple servers. Hot memories (frequently accessed) get replicated for faster access. Cold memories get compressed and archived. The system automatically manages this based on access patterns.
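A toy tiering policy makes the hot/cold split concrete; the thresholds here are invented for illustration:

```python
import time


def assign_tier(memory: dict, now: float | None = None) -> str:
    """Recently and frequently accessed memories go 'hot' (replicated),
    long-idle ones go 'cold' (compressed and archived)."""
    now = now or time.time()
    idle_days = (now - memory["last_accessed"]) / 86_400
    if memory.get("access_count", 0) >= 10 and idle_days < 7:
        return "hot"
    if idle_days > 90:
        return "cold"
    return "warm"
```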

What We Learned

Building memory systems for AI agents is different from traditional databases. The data is unstructured. Queries are fuzzy. Relevance depends on context that's constantly shifting. There's no schema, no indexes (in the traditional sense), no SQL.

But the payoff is significant. Agents with good memory systems feel intelligent in a way that memoryless agents don't. They build relationships with users. They get better over time. They feel less like tools and more like assistants.

We're continuing to invest heavily in memory capabilities—better retrieval, smarter consolidation, more nuanced privacy controls. If you're building agents that need to remember, we'd love to hear what you're working on.
