‘Observational memory’ cuts AI agent costs 10x and outscores RAG on long-context benchmarks

Feb 10, 2026 | Technology

RAG isn’t always fast enough or intelligent enough for modern agentic AI workflows. As teams move from short-lived chatbots to long-running, tool-heavy agents embedded in production systems, those limitations are becoming harder to work around.

In response, teams are experimenting with alternative memory architectures (sometimes called contextual memory or agentic memory) that prioritize persistence and stability over dynamic retrieval.

One of the more recent implementations of this approach is “observational memory,” an open-source technology developed by Mastra, which was founded by the engineers who previously built and sold the Gatsby framework to Netlify.

Unlike RAG systems that retrieve context dynamically, observational memory uses two background agents (Observer and Reflector) to compress conversation history into a dated observation log. The compressed observations stay in context, eliminating retrieval entirely. For text content, the system achieves 3-6x compression. For tool-heavy agent workloads generating large outputs, compression ratios hit 5-40x.

The tradeoff is that observational memory prioritizes what the agent has already seen and decided over searching a broader external corpus, making it less suitable for open-ended knowledge discovery or compliance-heavy recall use cases.

The system scored 94.87% on LongMemEval using GPT-5-mini, while maintaining a completely stable, cacheable context window. On the standard GPT-4o model, observational memory scored 84.23% compared to Mastra’s own RAG implementation at 80.05%.

“It has this great characteristic of being both simpler and it is more powerful, like it scores better on the benchmarks,” Sam Bhagwat, co-founder and CEO of Mastra, told VentureBeat.

How it works: Two agents compress history into observations

The architecture is simpler than traditional memory systems but delivers better results. Observational memory divides the context window into two blocks. The first contains observations: compressed, dated notes extracted from previous conversations. The second holds raw message history …
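
To make the two-block layout concrete, here is a minimal Python sketch. It is not Mastra’s implementation or API. Every name in it (`ObservationalContext`, `compress`, `estimate_tokens`) and both token budgets are hypothetical, and the `compress` function is a simple truncation standing in for an LLM call. The Reflector’s role is also an assumption: the sketch treats it as a second pass that condenses the observation log once it grows too large.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); an assumption, not a real tokenizer.
    return max(1, len(text) // 4)


def compress(texts: List[str], keep_chars: int) -> str:
    # Placeholder for an LLM summarization call: keeps only the start of each text.
    return " | ".join(t[:keep_chars] for t in texts)


@dataclass
class Observation:
    day: date   # the date attached to the note
    note: str   # the compressed text


class ObservationalContext:
    """Two-block context: dated observations first, raw recent messages second."""

    def __init__(self, raw_budget: int = 50, observation_budget: int = 60):
        # Hypothetical budgets; the article does not state any thresholds.
        self.raw_budget = raw_budget
        self.observation_budget = observation_budget
        self.observations: List[Observation] = []  # block 1
        self.raw_messages: List[str] = []          # block 2

    @staticmethod
    def _tokens(texts: Iterable[str]) -> int:
        return sum(estimate_tokens(t) for t in texts)

    def add_message(self, text: str, today: date) -> None:
        self.raw_messages.append(text)
        # "Observer" step: fold raw history into one dated note when it gets too big.
        if self._tokens(self.raw_messages) > self.raw_budget:
            note = compress(self.raw_messages, keep_chars=30)
            self.observations.append(Observation(today, note))
            self.raw_messages.clear()
        # "Reflector" step (assumed role): condense the log itself when it gets too big.
        if self._tokens(o.note for o in self.observations) > self.observation_budget:
            merged = compress([o.note for o in self.observations], keep_chars=20)
            self.observations = [Observation(today, merged)]

    def render(self) -> str:
        # No retrieval step: the prompt is just the two blocks, in order.
        lines = ["## Observations"]
        lines += [f"[{o.day.isoformat()}] {o.note}" for o in self.observations]
        lines.append("## Recent messages")
        lines += self.raw_messages
        return "\n".join(lines)
```

In this sketch the observation block changes only when a compression step fires, so the front of the prompt stays the same between those steps. That is one plausible reading of why the article describes the context window as stable and cacheable.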
