Modern AI agents rely on memory to go beyond single-turn responses and behave more like intelligent, adaptive assistants. Memory enables agents to understand user context, retain important facts, recall past interactions, and personalize conversations over time. Without memory, an agent starts each interaction from scratch — forgetting preferences, goals, and history — which limits its usefulness and realism.
In AI systems, memory is typically divided into two types:
- Short-term memory maintains context within a single session or conversation thread. This allows the agent to track recent messages, follow up on prior turns, and provide coherent responses.
- Long-term memory stores information across sessions, including facts (semantic memory) and personal experiences or preferences (episodic memory). This allows the agent to “remember” what it has learned about a user or domain, making interactions feel more consistent and personalized.
This demo implements both memory types using Redis and Spring AI, combining the speed and flexibility of Redis with the semantic capabilities of vector embeddings. With vector similarity search, agents can retrieve relevant memories even if a user’s phrasing is different from how the information was originally stored. You’ll also see features like memory deduplication and filtering — all designed to give AI agents a robust and scalable memory system.
- Video: What is an embedding model?
- Video: Exact vs Approximate Nearest Neighbors—What's the difference?
- Video: What is semantic search?
- Video: What is a vector database?
The repository for this demo can be found at https://github.com/redis-developer/redis-springboot-recipes.
To run this demo, you’ll need the following installed on your system:
- Docker – Install Docker
- Docker Compose – Included with Docker Desktop or available via CLI installation guide
- An OpenAI API Key – You can get one from platform.openai.com
The easiest way to run the demo is with Docker Compose, which sets up all required services in one command.
If you haven’t already:
git clone https://github.com/redis-developer/redis-springboot-recipes.git
cd redis-springboot-recipes/artificial-intelligence/agent-long-term-memory-with-spring-ai

You can pass your OpenAI API key in two ways.

Option 1: Export it as an environment variable:

export OPENAI_API_KEY=sk-your-api-key

Option 2: Create a .env file in the same directory as the docker-compose.yml file:

OPENAI_API_KEY=sk-your-api-key

Then start everything with:

docker compose up --build

This will start:
- redis: for storing both vector embeddings and chat history
- redis-insight: a UI to explore the Redis data
- agent-memory-app: the Spring Boot app that implements the memory-aware AI agent
When all of your services are up and running, go to localhost:8080 to access the demo:
Type a user ID in the user ID box and then click Start Chat to start a new chat:
Type a message and hit send:
The system will reply to your message and, if it identifies potential memories worth storing, save them as either semantic or episodic memories. You can see the stored memories in the "Memory Management" sidebar.
On top of that, with each message, the system will also return performance metrics.
If you refresh the page, you will see that all memories and the chat history are gone.
If you re-enter the same user ID, the long-term memories will be reloaded in the sidebar, along with the short-term memory (the chat history):
Finally, if we clear the chat, the chatbot won't have access to the short-term memory anymore, but it will still have access to the long-term memory. If we ask something related to one of the long-term memories, we will see that the chatbot retained this information:
RedisInsight is a graphical tool developed by Redis to help developers and administrators interact with and manage Redis databases more efficiently. It provides a visual interface for exploring keys, running commands, analyzing memory usage, and monitoring performance metrics in real-time. RedisInsight supports features like full-text search, time series, streams, and vector data structures, making it especially useful for working with more advanced Redis use cases. With its intuitive UI, it simplifies debugging, optimizing queries, and understanding data patterns without requiring deep familiarity with the Redis CLI.
The Docker Compose file also spins up an instance of RedisInsight. We can access it by going to localhost:5540:
In RedisInsight, we will be able to see the data stored in Redis.
The short-term memory (chat history) is stored in a List data structure:
And the long-term memory is stored as JSON documents:
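You can also inspect the same structures programmatically with the Jedis client. A minimal sketch (the key names are illustrative placeholders, not the app's real key format; check RedisInsight for the actual keys):

val jedis = JedisPooled("localhost", 6379)

// Short-term memory: the chat history lives in a Redis List
val history = jedis.lrange("<chat-history-key>", 0, -1)

// Long-term memory: each memory is stored as a JSON document
val memoryDoc = jedis.jsonGet("<long-term-memory-key>")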
If we go to the workbench in the sidebar and run the FT.INFO 'longTermMemoryIdx' command, we will be able to see the details of the schema that was created to efficiently search through the persisted memories:
Agents rely on both short-term and long-term memory. Short-term memory is typically the chat history: the list of messages exchanged between the agent and the user, which serves as the context the agent uses during its current session.
To implement both of these memories, we're going to rely on the Spring AI Advisors API. Advisors are a way to intercept, modify, and enhance AI-driven interactions.
We are going to create two advisors. The first one, for short-term memory, will rely on the ChatMemory abstraction provided by Spring AI, while the second one will be implemented from scratch by ourselves.
To see how to implement short-term memory (or chat history) with Spring AI, refer to the dedicated recipe.
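For orientation, a minimal short-term memory setup built on that abstraction might look like the sketch below (the bean wiring is an assumption for illustration; the dedicated recipe shows the Redis-backed repository in full):

@Bean
fun chatMemory(repository: ChatMemoryRepository): ChatMemory =
    // Keep a sliding window of the most recent messages per conversation ID
    MessageWindowChatMemory.builder()
        .chatMemoryRepository(repository)
        .maxMessages(20)
        .build()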
Long-term memory is the memory the agent needs to remember across different sessions or interactions. There are two types of long-term memory:
- Episodic: memories tied to past events, personal experiences, and user-specific preferences. E.g. "User went to Paris in 2009 for his honeymoon"
- Semantic: general domain knowledge and facts. E.g. "Americans don't require a visa to travel to Paris"
Unlike short-term memory, not all of this memory needs to be accessed at every interaction, and not every piece of information must be remembered in the long term. Because of that, we will rely on semantic search to retrieve long-term memories and on LLMs to extract them from current interactions.
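For reference, the domain model used throughout the rest of this recipe can be sketched as follows (field names and types are inferred from the snippets below, so treat this as an assumption rather than the repository's exact classes):

enum class MemoryType { EPISODIC, SEMANTIC }

data class Memory(
    val content: String,          // the raw text of the memory
    val memoryType: MemoryType,   // EPISODIC or SEMANTIC
    val userId: String,           // owner of the memory
    val metadata: String,         // free-form additional context
    val createdAt: LocalDateTime  // creation timestamp
)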
The application uses Spring AI's RedisVectorStore to store and search vector embeddings of memories.
@Bean
fun memoryVectorStore(
    embeddingModel: EmbeddingModel,
    jedisPooled: JedisPooled
): RedisVectorStore {
    return RedisVectorStore.builder(jedisPooled, embeddingModel)
        .indexName("longTermMemoryIdx")
        .contentFieldName("content")
        .embeddingFieldName("embedding")
        .metadataFields(
            RedisVectorStore.MetadataField("memoryType", Schema.FieldType.TAG),
            RedisVectorStore.MetadataField("metadata", Schema.FieldType.TEXT),
            RedisVectorStore.MetadataField("userId", Schema.FieldType.TAG),
            RedisVectorStore.MetadataField("createdAt", Schema.FieldType.TEXT)
        )
        .prefix("long-term-memory:")
        .initializeSchema(true)
        .vectorAlgorithm(RedisVectorStore.Algorithm.HSNW)
        .build()
}

Let's break this down:
- Index Name: longTermMemoryIdx – Redis will create an index with this name for searching memories
- Content Field: content – The raw memory content that will be embedded
- Embedding Field: embedding – The field that will store the resulting vector embedding
- Metadata Fields:
  - memoryType: TAG field for filtering by memory type (EPISODIC or SEMANTIC)
  - metadata: TEXT field for storing additional context about the memory
  - userId: TAG field for filtering by user ID
  - createdAt: TEXT field for storing the creation timestamp
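Note that the builder also needs a JedisPooled connection exposed as a bean. A minimal sketch, assuming Redis is reachable on the default port published by the Compose file:

@Bean
fun jedisPooled(): JedisPooled =
    // Pooled connection to the Redis instance started by Docker Compose
    JedisPooled("localhost", 6379)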
Memories are stored as Spring AI Document objects with metadata:
val memory = Memory(
    content = content,
    memoryType = memoryType,
    userId = userId ?: systemUserId,
    metadata = validatedMetadata,
    createdAt = LocalDateTime.now()
)

val document = Document(
    content,
    mapOf(
        "memoryType" to memoryType.name,
        "metadata" to validatedMetadata,
        "userId" to (userId ?: systemUserId),
        "createdAt" to memory.createdAt.toString()
    )
)

memoryVectorStore.add(listOf(document))

The memory service uses Spring AI's SearchRequest and FilterExpressionBuilder to perform vector similarity search with filters:
val b = FilterExpressionBuilder()
val filterList = mutableListOf<FilterExpressionBuilder.Op>()

// Add user filter
val effectiveUserId = userId ?: systemUserId
filterList.add(b.eq("userId", effectiveUserId))

// Add memory type filter if specified
if (memoryType != null) {
    filterList.add(b.eq("memoryType", memoryType.name))
}

// Combine filters
val filterExpression = when (filterList.size) {
    0 -> null
    1 -> filterList[0]
    else -> filterList.reduce { acc, expr -> b.and(acc, expr) }
}?.build()

// Execute search
val searchResults = memoryVectorStore.similaritySearch(
    SearchRequest.builder()
        .query(query)
        .topK(limit)
        .filterExpression(filterExpression)
        .build()
)

This performs a vector similarity search using:
- A semantic query that is embedded into a vector
- A topK setting to limit how many nearest matches to return
- A Redis filter expression to narrow down by user ID and, optionally, memory type
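For instance, a standalone query for a given user's episodic memories could look like this (the query string and user ID are made-up values for illustration):

val b = FilterExpressionBuilder()

val results = memoryVectorStore.similaritySearch(
    SearchRequest.builder()
        .query("Where did the user go on their honeymoon?") // embedded into a vector
        .topK(5)
        .filterExpression(
            b.and(b.eq("userId", "u42"), b.eq("memoryType", "EPISODIC")).build()
        )
        .build()
)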
We will implement two advisors: one for retrieval and another for recording. These advisors will be plugged into our ChatClient and intercept every interaction with the LLM.
The retrieval advisor runs before your LLM call. It takes the user’s current message, performs a vector similarity search over Redis, and injects the most relevant memories into the system portion of the prompt so the model can ground its answer.
@Component
class LongTermMemoryRetrievalAdvisor(
    private val memoryService: MemoryService,
) : CallAdvisor, Ordered {

    companion object {
        const val USER_ID = "ltm_user_id" // pass per-call
        const val TOP_K = "ltm_top_k"     // pass per-call (default 5)
    }

    override fun getOrder() = Ordered.HIGHEST_PRECEDENCE + 40

    override fun getName() = "LongTermMemoryRetrievalAdvisor"

    override fun adviseCall(req: ChatClientRequest, chain: CallAdvisorChain): ChatClientResponse {
        val userId = (req.context()[USER_ID] as? String) ?: "system"
        val k = (req.context()[TOP_K] as? Int) ?: 5

        val query = req.prompt().userMessage.text
        val memories = memoryService.retrieveRelevantMemories(query, userId = userId)
            .take(k)

        val memoryBlock = buildString {
            appendLine("Use the MEMORY below if relevant. Keep answers factual and concise.")
            appendLine("----- MEMORY -----")
            memories.forEachIndexed { i, m -> appendLine("${i + 1}. ${m.memory.content}") }
            appendLine("------------------")
        }

        val enrichedPrompt = req.prompt().augmentSystemMessage { sys ->
            val existing = sys.text
            sys.mutate()
                .text(
                    buildString {
                        appendLine(memoryBlock)
                        if (existing.isNotBlank()) {
                            appendLine()
                            append(existing)
                        }
                    }
                ).build()
        }

        val enrichedReq = req.mutate()
            .prompt(enrichedPrompt)
            .build()

        return chain.nextCall(enrichedReq)
    }
}

The recorder advisor runs after the assistant responds. It looks at the last user message and the assistant's reply, asks the model to extract atomic, useful facts (episodic or semantic), deduplicates them, and stores them in Redis.
@Component
class LongTermMemoryRecorderAdvisor(
    private val memoryService: MemoryService,
    private val chatModel: ChatModel
) : CallAdvisor, Ordered {

    data class MemoryCandidate(val content: String, val type: MemoryType, val userId: String?)
    data class ExtractionResult(val memories: List<MemoryCandidate> = emptyList())

    private val extractorConverter = BeanOutputConverter(ExtractionResult::class.java)

    override fun getOrder(): Int = Ordered.HIGHEST_PRECEDENCE + 60

    override fun getName(): String = "LongTermMemoryRecorderAdvisor"

    override fun adviseCall(req: ChatClientRequest, chain: CallAdvisorChain): ChatClientResponse {
        // 1) Proceed with the normal call (other advisors may have enriched the prompt)
        val res = chain.nextCall(req)

        // 2) Build extraction prompt (user + assistant text of *this* turn)
        val userText = req.prompt().userMessage.text
        val assistantText = res.chatResponse()?.result?.output?.text

        // 3) Ask the model to extract long-term memories as structured JSON
        val schemaHint = extractorConverter.jsonSchema // JSON schema string for the POJO

        val extractSystem = """
            You extract LONG-TERM MEMORIES from a dialogue turn.

            A memory is either:
            1. EPISODIC MEMORIES: Personal experiences and user-specific preferences
               Examples: "User prefers Delta airlines", "User visited Paris last year"
            2. SEMANTIC MEMORIES: General domain knowledge and facts
               Examples: "Singapore requires passport", "Tokyo has excellent public transit"

            Only extract clear, factual information. Do not make assumptions or infer information that isn't explicitly stated.
            If no memories can be extracted, return an empty array.

            The instance must conform to this JSON Schema (for validation, do not output it):
            $schemaHint

            Do not include code fences, schema, or properties. Output a single-line JSON object.
        """.trimIndent()

        val extractUser = """
            USER SAID:
            $userText

            ASSISTANT REPLIED:
            $assistantText

            Extract up to 5 memories with correct type; set userId if present/known.
        """.trimIndent()

        val options: ChatOptions = OpenAiChatOptions.builder()
            .responseFormat(ResponseFormat.builder().type(ResponseFormat.Type.JSON_OBJECT).build())
            .build()

        val extraction = chatModel.call(
            Prompt(
                listOf(
                    UserMessage(extractUser),
                    SystemMessage(extractSystem)
                ),
                options
            )
        )

        val parsed = extractorConverter.convert(extraction.result.output.text ?: "")
            ?: ExtractionResult()

        // 4) Persist memories (MemoryService handles dedupe/thresholding)
        val userId = (req.context["ltm_user_id"] as? String) // optional per-call param
        parsed.memories.forEach { m ->
            val owner = m.userId ?: userId
            memoryService.storeMemory(
                content = m.content,
                memoryType = m.type,
                userId = owner
            )
        }

        return res
    }
}

In our ChatConfig class, we will configure our ChatClient as:
@Bean
fun chatClient(
    chatModel: ChatModel,
    chatMemory: ChatMemory,
    longTermRecorder: LongTermMemoryRecorderAdvisor,
    longTermMemoryRetrieval: LongTermMemoryRetrievalAdvisor
): ChatClient {
    return ChatClient.builder(chatModel)
        .defaultAdvisors(
            MessageChatMemoryAdvisor.builder(chatMemory).build(),
            longTermRecorder,
            longTermMemoryRetrieval
        ).build()
}

The agent is configured with a system prompt that explains its capabilities and its access to two different types of memory:
@Bean
fun travelAgentSystemPrompt(): Message {
    val promptText = """
        You are a travel assistant helping users plan their trips. You remember user preferences
        and provide personalized recommendations based on past interactions.

        You have access to the following types of memory:
        1. Short-term memory: The current conversation thread
        2. Long-term memory:
           - Episodic: User preferences and past trip experiences (e.g., "User prefers window seats")
           - Semantic: General knowledge about travel destinations and requirements

        Always be helpful, personal, and context-aware in your responses.
        Always answer in text format. No markdown or special formatting.
    """.trimIndent()

    return SystemMessage(promptText)
}

Since the advisors have been plugged into the ChatClient itself, we don't need to manage memory ourselves when interacting with the LLM. The only thing we need to ensure is that every interaction sends the expected parameters, namely the session or user ID, so that the advisors know which history to look at.
fun sendMessage(
    message: String,
    userId: String,
): ChatResult {
    // Use userId as the key for conversation history and long-term memory
    log.info("Processing message from user $userId: $message")

    val response = chatClient
        .prompt(
            Prompt(
                travelAgentSystemPrompt,
                UserMessage(message)
            )
        )
        .advisors { it
            .param(ChatMemory.CONVERSATION_ID, userId)
            .param("ltm_user_id", userId)
        }
        .call()

    return ChatResult(
        response = response.chatResponse()!!
    )
}

This orchestration allows the agent to maintain context across multiple interactions, personalize responses based on user history, and continuously learn from conversations.
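As a usage sketch (the chatService reference and user ID are hypothetical), a single call exercises both memory paths: the retrieval advisor grounds the reply, and the recorder advisor extracts and stores any new memories:

// Hypothetical caller of the sendMessage function above
val result = chatService.sendMessage(
    message = "I prefer window seats on long flights",
    userId = "u42"
)
println(result.response.result.output.text)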