What Sets Them Apart
Both agentmemory and claude-mem tackle the same fundamental problem — AI coding agents that forget everything between sessions — but they approach persistent memory from opposite directions. agentmemory is a protocol-first, agent-agnostic MCP server designed to work across the entire coding agent ecosystem with hybrid vector-graph retrieval and a benchmark behind it. claude-mem is a lightweight, Claude Code-native extension that prioritises zero-configuration simplicity for a single agent and a single workflow.
agentmemory and claude-mem at a Glance
agentmemory ships as an npm package (@agentmemory/agentmemory) that runs as an MCP server, exposes 51 MCP tools, 12 hooks, and 4 skills, and works with any agent that speaks MCP, REST, or hook protocols. Out of the box that means Claude Code, Codex CLI, Cursor, Windsurf, Cline, OpenCode, Kilo Code, Hermes, OpenClaw, pi, and Gemini CLI. Memory is stored as markdown files on the local file system, with a vector index and a knowledge graph layered on top, and the project publishes a LongMemEval-S recall score of 95.2%.
claude-mem is purpose-built for Claude Code and optimises for the path of least friction in that environment. It installs as a Claude Code memory layer with minimal configuration, and its abstractions match the way Claude Code already handles context. Cross-agent support is not the point; in exchange for giving it up, you get near-zero setup overhead and a workflow that feels native to a single tool rather than universal across all of them.
The shared ground matters. Both are open source, both keep memory data on the local machine, neither requires a cloud account or a hosted database, and both are free to use. The choice is therefore not about cost or privacy, where the two are on equal footing, but about whether the agent surface you care about is Claude Code only or the broader MCP ecosystem.
Memory Architecture and Recall Quality
agentmemory's hybrid vector-plus-graph architecture is the technical bet. Pure vector search retrieves memories by semantic similarity, which works for fuzzy queries but loses relational structure — you can find a memory about a function but not necessarily memories about everything that calls it. The knowledge graph adds explicit relationships, so retrieval can follow chains: a query about a bug surfaces the related fix attempts, the affected files, and the decisions made along the way.
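The difference between pure vector search and vector-plus-graph retrieval can be sketched in a few lines. This is a minimal illustration, not agentmemory's implementation: the memory IDs, the three-dimensional embeddings, the `EDGES` table, and the function names are all hypothetical, and a real system would use a learned embedding model and a proper index.

```python
import math
from collections import deque

# Hypothetical toy store: memory id -> (text, embedding).
# The 3-d vectors are made up for illustration only.
MEMORIES = {
    "bug": ("Login fails on expired token", [0.9, 0.1, 0.0]),
    "fix1": ("Tried refreshing token in middleware", [0.2, 0.8, 0.1]),
    "file": ("auth/session.py handles token expiry", [0.1, 0.2, 0.9]),
    "note": ("Switched CI to a faster runner", [0.0, 0.1, 0.2]),
}

# Hypothetical knowledge-graph edges: explicit relationships between memories.
EDGES = {
    "bug": ["fix1", "file"],
    "fix1": ["file"],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, k=1):
    """Return the ids of the k memories most similar to the query."""
    ranked = sorted(
        MEMORIES,
        key=lambda mem_id: cosine(query_vec, MEMORIES[mem_id][1]),
        reverse=True,
    )
    return ranked[:k]


def hybrid_search(query_vec, k=1, hops=1):
    """Seed with vector search, then follow graph edges up to `hops` steps."""
    seeds = vector_search(query_vec, k)
    found = list(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for neighbour in EDGES.get(node, []):
            if neighbour not in found:
                found.append(neighbour)
                frontier.append((neighbour, depth + 1))
    return found


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]  # hypothetical embedding of "why does login fail?"
    print(vector_search(query))
    print(hybrid_search(query))
```

With this toy data, vector search alone returns only the bug report, while the hybrid search also pulls in the fix attempt and the affected file because they are linked to it in the graph, even though their embeddings are not close to the query. The unrelated CI note is returned by neither.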
The published 95.2% recall on LongMemEval-S is what justifies the complexity. LongMemEval-S is a benchmark built specifically to measure memory quality in long agent conversations, the exact use case persistent memory exists for, and 95.2% is at the upper end of published numbers. The project also claims a 92% reduction in context tokens, which suggests the recall does not come from dumping everything into the prompt but from retrieving only the relevant slice.
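To make the two headline numbers concrete, here is a small sketch of the arithmetic behind them. It assumes recall is the fraction of benchmark questions for which the right memory was retrieved, and that token reduction compares the retrieved context against replaying the full history. Both definitions, the function names, and the sample counts are illustrative assumptions, not the project's published methodology or data.

```python
def recall(correct, total):
    """Fraction of questions answered from correctly retrieved memory."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def token_reduction(full_tokens, retrieved_tokens):
    """Share of context tokens saved versus sending the full history."""
    if full_tokens <= 0:
        raise ValueError("full_tokens must be positive")
    return 1 - retrieved_tokens / full_tokens


if __name__ == "__main__":
    # Made-up counts chosen only to reproduce the percentages in the text.
    print(f"recall: {recall(119, 125):.1%}")
    print(f"token reduction: {token_reduction(100_000, 8_000):.1%}")
```

Under these made-up counts, 119 correct answers out of 125 questions gives 95.2% recall, and sending 8,000 tokens instead of 100,000 gives a 92% reduction. The point is that the two metrics are independent: a system could reach high recall by sending everything, so reporting both is what supports the claim of selective retrieval.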
claude-mem does not publish a benchmark score, and for its target user it arguably does not need to. Its retrieval is tuned for Claude Code's specific context window and tool-call patterns, so it optimises for the actual workflow rather than a generic benchmark. The trade-off is that its quality is hard to verify from the outside: the system works well in the environment it was designed for, but there is no published metric to compare against agentmemory's number, and the architecture is not designed to scale across agent surfaces.