Letta

What it is

Letta (formerly MemGPT) is a framework for creating stateful AI agents with "infinite" memory. It manages memory as a tiered system (an in-context tier plus an out-of-context persistent tier) to overcome LLM context window limits, treating the context window as a "cache" over a larger, persistent memory store.
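The "context as cache" idea can be sketched as a toy eviction loop. This is an illustrative sketch with invented names, not Letta's actual implementation:

```python
# Toy sketch of treating a fixed-size context window as a cache
# over a persistent store (illustrative only; not Letta's code).

CONTEXT_BUDGET = 3  # max messages kept in the active LLM context

archival_store = []   # persistent tier: unbounded, searchable
context_window = []   # active tier: what the LLM actually sees

def remember(message: str) -> None:
    """Append to the context; evict the oldest messages to archival storage."""
    context_window.append(message)
    while len(context_window) > CONTEXT_BUDGET:
        archival_store.append(context_window.pop(0))

for i in range(5):
    remember(f"msg-{i}")

print(context_window)  # the most recent messages stay "in cache"
print(archival_store)  # older messages persist out of context
```

The key property: nothing is ever lost, it is only paged out of the active window, which is what lets an agent outlive its context limit.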

What problem it solves

Standard LLMs suffer from "forgetfulness" once their context window is exceeded. Letta enables long-lived agents that remember past interactions, user preferences, and project details over extended periods, making them suitable for personal assistants and complex, multi-session software engineering.

Where it fits in the stack

Category: Agent / Memory Layer

Typical use cases

  • Persistent Personal Assistants: Agents that remember months of conversation history and preferences.
  • Multi-session Coding Projects: Agents that maintain state across different days of development.
  • Durable Workflows: Agents that can be paused and resumed without losing task context.

When to use it

  • Long-Lived Agents: When you need an agent to maintain personality, memory, and state over weeks or months of interaction.
  • Context-Exceeding Tasks: When the information needed for a task (e.g., a large codebase or complex user history) exceeds the LLM's raw context window.
  • Stateful Multi-Session Work: For engineering or research tasks that span multiple sessions and require the agent to remember where it left off.

When not to use it

  • Stateless Transactions: For simple, one-off API calls or basic chatbots, the memory management overhead is unnecessary.
  • Low-Latency Requirements: Tiered memory access (searching vector DBs and updating core memory) increases inference time.
  • Serverless Deployments: Letta requires a persistent server and database, making it less suitable for purely serverless architectures.

Virtual Context Management

Letta implements a "Virtual Context" architecture inspired by operating systems' virtual memory:

  • Core Memory: Fixed-size, high-priority context (e.g., current task, user bio).
  • Archival Memory: Effectively unbounded long-term storage (vector DB) for facts and past logs.
  • Recall Memory: Searchable history of all past interactions.
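The three tiers can be modeled roughly as follows. This is a structural sketch with invented names, not Letta's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class VirtualContext:
    """Rough model of Letta's memory tiers (illustrative names only)."""
    core: dict = field(default_factory=dict)      # fixed-size, always in context
    archival: list = field(default_factory=list)  # long-term facts (a vector DB in Letta)
    recall: list = field(default_factory=list)    # full interaction history

    def log_turn(self, role: str, text: str) -> None:
        self.recall.append((role, text))

    def search_recall(self, term: str) -> list:
        # Letta uses embedding search; a plain substring match stands in here.
        return [t for t in self.recall if term.lower() in t[1].lower()]

mem = VirtualContext(core={"user_bio": "Prefers Alpine base images"})
mem.log_turn("user", "Start the drone hub project")
mem.log_turn("assistant", "Drone hub scaffold created")
print(mem.search_recall("drone"))
```

Only `core` is injected into every prompt; `archival` and `recall` are reached through search tools, which is what keeps the active context bounded.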

Getting started

Installation

pip install letta

Basic Usage

Start the Letta server and interact with a stateful agent that persists its memory in a database.

CLI examples

# Start the interactive Letta Code CLI
letta

# Run a query in headless mode and persist the result
letta -p "Implement a Dockerfile for a Go application and remember my preference for Alpine base images."

# Inside the interactive session, switch the underlying LLM model
/model gpt-4o

API examples

# Python client API (letta 0.x style; newer versions ship a separate
# letta-client package with a different surface — check your version)
from letta import create_client

client = create_client()

# 1. Create a stateful agent whose memory persists in the database
agent = client.create_agent(name="DurableAssistant")

# 2. Send a message that updates the agent's state
response = client.send_message(
    agent_id=agent.id,
    role="user",
    message="My favorite project is the autonomous drone hub."
)

# 3. Inspect the messages (the agent will remember this in later sessions)
for message in response.messages:
    print(message)

Strengths

  • State Persistence: State is stored in a database (PostgreSQL by default), allowing agents to survive process restarts.
  • Infinite Context: Automatically manages what stays in the active LLM context and what goes to long-term storage.
  • Self-Editing Memory: Agents can be given tools to "write" to their own memory.
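Self-editing memory works by exposing memory operations as ordinary tools the model can call. A minimal sketch of two such tools, modeled on MemGPT-style `core_memory_append` / `core_memory_replace` but simplified and not Letta's implementation:

```python
# Minimal sketch of self-editing memory: the agent mutates its own
# core memory blocks via tool calls (simplified; not Letta's code).

core_memory = {"human": "Name: unknown", "persona": "Helpful engineer"}

def core_memory_append(block: str, content: str) -> str:
    """Tool the LLM can call to append a new fact to a memory block."""
    core_memory[block] = core_memory[block] + "\n" + content
    return f"Appended to {block}"

def core_memory_replace(block: str, old: str, new: str) -> str:
    """Tool the LLM can call to correct an outdated fact in a block."""
    core_memory[block] = core_memory[block].replace(old, new)
    return f"Updated {block}"

# Simulated tool calls the model might emit during a conversation:
core_memory_append("human", "Prefers Alpine base images")
core_memory_replace("human", "Name: unknown", "Name: Sam")

print(core_memory["human"])
```

Because the edited blocks are re-injected into every subsequent prompt, a fact the agent writes today shapes its behavior in every future session.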

Limitations

  • Latency: Tiered memory management adds overhead to each inference step.
  • Complexity: Setting up the server and database infrastructure is more involved than simple stateless agents.
  • Token Usage: Managing the memory buffer requires additional tokens for system prompts and internal reasoning.

Contribution Metadata

  • Last reviewed: 2026-05-29
  • Confidence: high