MemClaw Plugin: Add Persistent Memory to OpenClaw Agents

Learn how MemClaw adds long-term memory to OpenClaw agents while cutting token costs by 91%. Step-by-step setup and best practices.

Originally published on Dev.to by Sopaco

What You'll Learn

  • Why OpenClaw agents lose context between sessions and how to solve it
  • How MemClaw's three-layer memory architecture reduces token consumption by 91%
  • Step-by-step installation and configuration of the MemClaw plugin
  • Best practices for managing agent memory across sessions and conversations
  • Real-world use cases and troubleshooting techniques

Introduction: The Memory Problem in AI Agents

OpenClaw agents face a fundamental limitation: when a conversation ends, all context vanishes. API keys must be re-entered, project goals are forgotten, and previous mistakes repeat in new sessions. This isn't a bug—it's the reality of stateless LLM interactions hitting their context window limits.

MemClaw, a plugin built on the Cortex Memory architecture, solves this by adding persistent, multi-layered memory to OpenClaw agents. The results are measurable on the LoCoMo benchmark: 68.42% accuracy (beating OpenViking's 52.08%), about 91% fewer tokens per question than the OpenClaw + LanceDB setup, and roughly 18x that setup's score per thousand tokens.

Prerequisites

System Requirements

  • OpenClaw framework (v0.8+) installed and running
  • Node.js 18+ for OpenClaw gateway
  • Docker (to run Qdrant vector database)
  • At least 2GB free disk space for memory storage
  • Familiarity with OpenClaw agent configuration and plugins

Knowledge Prerequisites

You should understand basic OpenClaw concepts: agents, skills, context windows, and plugin architecture. Familiarity with vector databases and semantic search is helpful but not required—MemClaw abstracts this complexity.

Why OpenClaw Agents Need External Memory

The Three Failure Modes

Scenario 1: Repeated Context Loss. Users must re-enter API keys, credentials, and preferences in every new session. An agent asks for the Alibaba Cloud OSS AccessKey on Day 1, then asks again on Day 2 when the session resets.

Scenario 2: Conversation Amnesia. After 50+ rounds of dialogue establishing project goals and constraints, the agent forgets the original objective. When asked to "design the core architecture based on my earlier goal," it responds: "What goal did you mention?"

Scenario 3: Repeating Mistakes. An agent encounters a skill parameter error, learns the correct format during troubleshooting, but repeats the same error in the next session because it never retained the lesson.

Root Cause: Two Types of Memory Loss

OpenClaw's native memory system has two limitations. First, context window exhaustion: older messages get pushed out when the window fills, even within a single session. Second, session reset: all conversational state (including learned patterns) erases when the session ends.

MemClaw vs. Traditional Memory Solutions

How OpenViking and LanceDB Fall Short

The community's existing solution (plugins like OpenViking paired with vector databases like LanceDB) adds long-term memory but creates a new problem: loading every retrieved memory in full consumes a large number of tokens. A single query might fetch 100 memory records and load the complete context for each. In the LoCoMo benchmark data, the OpenClaw + LanceDB configuration averages ~33,490 tokens per question.

OpenClaw's native memory system scores lower still on accuracy, at 35.65%, while using ~15,982 tokens per question.

Cortex Memory's Three-Layer Architecture

MemClaw solves the "memory precision vs. token cost" paradox through intelligent layering:

  • L0 (Summary Layer): 100-token abstract summaries of each memory. Fast to load, used for initial filtering.
  • L1 (Overview Layer): 2,000-token mid-detail snapshots. Retrieved only if L0 matches are promising.
  • L2 (Full Content Layer): Complete memory with full context. Loaded only when truly necessary.

Traditional approaches load all L2 content upfront. With MemClaw's progressive retrieval, scanning 100 memories at L0 costs about 10,000 tokens (100 summaries × 100 tokens each), plus the cost of the few L1 and L2 entries that are actually loaded, versus the ~200,000+ tokens of naive full-content loading.
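The arithmetic behind that comparison can be sketched in a few lines. The L0 and L1 sizes come from the layer descriptions above; the size of a full memory and the number of memories promoted to deeper layers are assumptions chosen purely for illustration.

```javascript
// Back-of-the-envelope token cost for layered vs. full-content retrieval.
// L0_TOKENS and L1_TOKENS come from the article; FULL_MEMORY_TOKENS and the
// promotion counts below are assumptions, not measured values.
const L0_TOKENS = 100;            // summary size per memory
const L1_TOKENS = 2000;           // overview size per memory
const FULL_MEMORY_TOKENS = 2000;  // assumed lower bound for one full (L2) memory

function naiveCost(candidates) {
  // Every candidate is loaded in full.
  return candidates * FULL_MEMORY_TOKENS;
}

function progressiveCost(candidates, promotedToL1, promotedToL2) {
  // Every candidate is scanned at L0; only a few are promoted further.
  return (
    candidates * L0_TOKENS +
    promotedToL1 * L1_TOKENS +
    promotedToL2 * FULL_MEMORY_TOKENS
  );
}

const scanOnly = progressiveCost(100, 0, 0);   // 100 * 100 = 10000
const withDetail = progressiveCost(100, 3, 1); // 10000 + 6000 + 2000 = 18000
const naive = naiveCost(100);                  // 100 * 2000 = 200000

console.log({ scanOnly, withDetail, naive });
```

Even when a handful of memories are promoted to L1 or L2, the total stays an order of magnitude below the full-content figure under these assumptions.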

Benchmark Performance (LoCoMo):

  • MemClaw: 68.42% accuracy, ~2,900 tokens/question, 23.6 score per 1K tokens
  • OpenViking + OpenClaw: 52.08% accuracy, ~2,769 tokens/question, 18.8 per 1K tokens
  • OpenClaw + LanceDB: 44.55% accuracy, ~33,490 tokens/question, 1.3 per 1K tokens
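The derived figures in this list (score per 1K tokens, the 91% savings, the 18x ratio) can be reproduced from the raw accuracy and token numbers. The sketch below uses only the values quoted above; the variable and function names are illustrative.

```javascript
// Figures copied from the LoCoMo results quoted in this article.
const results = [
  { name: 'MemClaw', accuracy: 68.42, tokensPerQuestion: 2900 },
  { name: 'OpenViking + OpenClaw', accuracy: 52.08, tokensPerQuestion: 2769 },
  { name: 'OpenClaw + LanceDB', accuracy: 44.55, tokensPerQuestion: 33490 },
];

// Accuracy points earned per 1,000 tokens spent.
function scorePer1k({ accuracy, tokensPerQuestion }) {
  return accuracy / (tokensPerQuestion / 1000);
}

const table = results.map((r) => ({
  name: r.name,
  scorePer1k: Number(scorePer1k(r).toFixed(1)),
}));

const memclaw = results[0];
const lancedb = results[2];

// Share of tokens saved relative to the OpenClaw + LanceDB setup.
const tokenSavingsPct =
  (1 - memclaw.tokensPerQuestion / lancedb.tokensPerQuestion) * 100;

// How many times more score per token MemClaw gets than that setup.
const efficiencyRatio = scorePer1k(memclaw) / scorePer1k(lancedb);

console.log(table);
console.log(tokenSavingsPct.toFixed(1), efficiencyRatio.toFixed(1));
```

Both headline numbers are relative to OpenClaw + LanceDB. Against OpenViking, MemClaw uses slightly more tokens per question (~2,900 vs. ~2,769) and earns its advantage through higher accuracy. The unrounded efficiency ratio comes out just under 18; the 18x figure follows from the rounded per-1K scores.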

Step 1: Install Qdrant Vector Database

MemClaw requires a vector database for semantic search. Qdrant is the default choice—lightweight, fast, and available both locally and as a managed service.

Using Docker (Recommended):

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v qdrant_storage:/qdrant/storage \
  qdrant/qdrant:latest

This starts Qdrant on localhost:6333 with persistent storage. The API is immediately available.

Verify Installation:

curl http://localhost:6333/healthz

You should receive the plain-text response "healthz check passed". If Qdrant is unavailable, MemClaw will fail at runtime.
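Because MemClaw fails at runtime when Qdrant is unreachable, a startup script may want to wait for the database first. The following is a minimal sketch, assuming Qdrant's /healthz endpoint and the fetch global built into Node.js 18+. waitForQdrant and its options are hypothetical names, not part of MemClaw or OpenClaw.

```javascript
// Polls Qdrant until it answers, so a startup script can fail fast.
// Assumptions: Qdrant exposes GET /healthz, and Node 18+ provides fetch.
// waitForQdrant is a hypothetical helper, not a MemClaw or OpenClaw API.
async function waitForQdrant(
  baseUrl,
  { attempts = 5, delayMs = 1000, fetchImpl = fetch } = {}
) {
  for (let i = 1; i <= attempts; i += 1) {
    try {
      const res = await fetchImpl(`${baseUrl}/healthz`);
      if (res.ok) return true; // any 2xx response counts as healthy
    } catch (err) {
      // Connection refused or DNS failure: fall through and retry.
    }
    if (i < attempts) {
      // Pause before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

A caller could run `await waitForQdrant('http://localhost:6333')` before restarting the gateway and abort if it resolves to false. The fetchImpl option exists only so the helper can be exercised without a live database.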

For Sensitive Data: Run Qdrant on an isolated internal network, not exposed to the internet. All memories remain local unless explicitly configured otherwise.

Step 2: Install the MemClaw Plugin

With Qdrant running, install MemClaw into your OpenClaw environment:

openclaw plugins install @memclaw/memclaw

This downloads the plugin package and registers it with OpenClaw. Installation takes 30–60 seconds depending on network speed.

Verify Installation:

openclaw plugins list

You should see @memclaw/memclaw in the output with status "installed".

Step 3: Configure MemClaw in openclaw.json

MemClaw is disabled by default. To enable it, edit your openclaw.json configuration file: set enabled to true under plugins.entries.memclaw and place the parameters below in that entry's config object.

Configuration Parameters Explained:

  • qdrantUrl: Connection string for Qdrant. Use http://localhost:6333 for local development.
  • apiKey: Authentication token if Qdrant requires it. Set to null for local unsecured instances.
  • collectionName: Namespace for storing memories. Use different names for different projects.
  • enableAutoCommit: Automatically extract and store memories after conversations. Set to true for production.
  • autoCommitInterval: Time in milliseconds before auto-commit triggers (3600000 = 1 hour).
  • maxMemoriesPerQuery: Maximum number of memories to retrieve per agent query. Lower = faster, higher = more context.
  • l0TokenLimit and l1TokenLimit: Token budgets for each layer. Default values are optimized for most use cases.

Critical Step: Set memorySearch: false under agents.defaults to disable OpenClaw's native memory in favor of MemClaw. Running both simultaneously causes conflicts.
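As a rough sketch of how these settings might fit together, the script below builds the relevant part of openclaw.json as an object and prints it. The key names are the ones described above, and the nesting follows the plugins.entries.memclaw shape used in the multi-tenant troubleshooting example in this guide. All values are illustrative assumptions (the collection name in particular is hypothetical), so check them against your installed version.

```javascript
// Builds the MemClaw-related part of openclaw.json as a plain object.
// Values are illustrative assumptions, not verified defaults.
const config = {
  agents: {
    defaults: {
      memorySearch: false, // disable native memory so it does not conflict with MemClaw
    },
  },
  plugins: {
    entries: {
      memclaw: {
        enabled: true,
        config: {
          qdrantUrl: 'http://localhost:6333',
          apiKey: null,                        // local, unsecured Qdrant
          collectionName: 'openclaw_memories', // hypothetical name; use one per project
          enableAutoCommit: true,
          autoCommitInterval: 3600000,         // 1 hour in milliseconds
          maxMemoriesPerQuery: 10,
          l0TokenLimit: 100,                   // matches the L0 size described earlier
          l1TokenLimit: 2000,                  // matches the L1 size described earlier
        },
      },
    },
  },
};

const json = JSON.stringify(config, null, 2);
console.log(json);
```

Merge the printed fragment into your existing openclaw.json rather than overwriting the file, since it likely contains other agent and plugin settings.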

Step 4: Restart OpenClaw and Verify

Restart the OpenClaw gateway for configuration changes to take effect:

openclaw gateway restart

MemClaw initializes background services on startup. This takes 15–30 seconds for the first run.

Check Initialization Logs:

openclaw logs memclaw

Look for messages like:

  • "MemClaw plugin initialized"
  • "Connected to Qdrant at http://localhost:6333"
  • "Memory collection created or verified"

If you see errors about Qdrant connectivity, verify the database is running and accessible.

Step 5: Use MemClaw Memory Tools in Agents

Once installed, agents automatically gain access to four core memory tools (the maintenance and migration tools are covered later). Use them in your agent skills:

Tool 1: cortex_search

Purpose: Retrieve memories using semantic search with layer-aware control.

Usage:

const results = await agent.call('cortex_search', {
  query: 'OSS API credentials',
  maxResults: 5,
  minLayer: 0,
  maxLayer: 2
});

This searches for memories matching "OSS API credentials" and returns up to 5 results, starting from L0 summaries and progressing to L2 if needed. Results include relevance scores and which layer each memory came from.

Tool 2: cortex_recall

Purpose: Retrieve a specific memory by ID with full context.

Usage:

const memory = await agent.call('cortex_recall', {
  memoryId: 'mem_1234567890',
  includeFullContent: true
});

Use this when you've already identified a specific memory and need complete context. Faster than search for known memories.

Tool 3: cortex_add_memory

Purpose: Explicitly store important information for future retrieval.

Usage:

await agent.call('cortex_add_memory', {
  content: 'User OSS AccessKey: xxxxx, SecretKey: yyyyy. Region: cn-hangzhou.',
  category: 'credentials',
  tags: ['oss', 'alibaba-cloud', 'user-preference'],
  ttl: 0
});

The ttl (time-to-live) parameter, given in milliseconds, controls how long the memory is retained; 0 means permanent. Tags improve search relevance.

Tool 4: cortex_commit_session

Purpose: Explicitly trigger memory extraction at conversation milestones.

Usage:

await agent.call('cortex_commit_session', {
  summary: 'User goal: build B2B sales tool. Tech stack: Node.js + PostgreSQL.',
  learnings: ['Avoid calling skill X with parameter format A']
});

By default, auto-commit extracts memories periodically. Call this explicitly after important conversation milestones to ensure capture.

Real-World Use Case: Cross-Session API Key Management

Problem

Your agent helps users upload files to Alibaba Cloud OSS. Every new session, the agent asks: "What is your AccessKey?" Users must re-enter credentials repeatedly.

Solution with MemClaw

Session 1: User provides OSS credentials. Your agent skill calls:

await agent.call('cortex_add_memory', {
  content: `OSS Configuration:\nEndpoint: ${endpoint}\nAccessKeyId: ${accessKey}\nAccessKeySecret: ${secret}\nBucket: ${bucket}`,
  category: 'user_credentials',
  tags: ['oss', 'alibaba', 'file-upload']
});

Session 2 (next day): User requests file upload. Your agent skill starts with:

const creds = await agent.call('cortex_search', {
  query: 'OSS AccessKey configuration',
  maxResults: 1
});

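// Assumes cortex_search resolves to an array of results.
if (creds.length > 0) {
  // Use retrieved credentials. parseMemoryContent is a helper you supply
  // to turn the stored text back into a config object.
  const config = parseMemoryContent(creds[0].content);
  // Proceed directly to upload
} else {
  // Prompt user only if no memory found (askUserForCredentials is your own helper)
  askUserForCredentials();
}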

The agent retrieves stored credentials automatically. No re-entry needed. MemClaw's layered architecture means this search costs ~100 tokens (just the L0 summary), not the full credential context.
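The parseMemoryContent helper used in Session 2 is not provided by MemClaw. One possible implementation, assuming the memory text keeps the exact "Key: value" template stored in Session 1, is sketched below. The sample values are placeholders.

```javascript
// One possible parseMemoryContent: turns "Key: value" lines into an object.
// Assumes the memory text uses the exact template stored in Session 1.
function parseMemoryContent(content) {
  const config = {};
  for (const line of content.split('\n')) {
    // Split on the first ": " only, so values such as URLs stay intact.
    const idx = line.indexOf(': ');
    if (idx === -1) continue; // skips the "OSS Configuration:" header line
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 2).trim();
    if (key) config[key] = value;
  }
  return config;
}

// Placeholder data in the same shape as the stored memory.
const sample = [
  'OSS Configuration:',
  'Endpoint: https://oss-cn-hangzhou.aliyuncs.com',
  'AccessKeyId: EXAMPLE_KEY_ID',
  'AccessKeySecret: EXAMPLE_SECRET',
  'Bucket: my-bucket',
].join('\n');

console.log(parseMemoryContent(sample));
```

Free-text parsing like this is brittle. If the stored format is under your control, serializing the configuration with JSON.stringify and reading it back with JSON.parse is a sturdier choice.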

Use Case 2: Preserving Long Conversation Context

Problem

After 100 rounds of conversation establishing project requirements, constraints, and architecture decisions, the agent forgets the original goal when the context window fills.

Solution with MemClaw

Explicit memory extraction at key conversation points:

// After user states project goal
await agent.call('cortex_add_memory', {
  content: 'Project Goal: Build a B2B SaaS sales platform targeting mid-market enterprises. Key constraints: must handle 10k concurrent users, GDPR-compliant, on-premise deployment option.',
  category: 'project_definition',
  tags: ['project-goal', 'architecture-constraint']
});

// Later, after discussing tech decisions
await agent.call('cortex_add_memory', {
  content: 'Approved tech stack: Node.js backend, React frontend, PostgreSQL for operational data, Redis for caching. Rejected: Go (team not familiar), microservices (premature optimization).',
  category: 'technical_decision',
  tags: ['tech-stack', 'architecture']
});

After 100 rounds, when the agent needs to "design the core architecture," it retrieves these foundational memories with a single cortex_search call (for example, with the query 'project goal and tech stack'), so decisions remain consistent with the original vision.

Troubleshooting: Common Issues and Solutions

Issue 1: "Qdrant connection refused"

Symptom: MemClaw fails to initialize with error: "Connection refused on http://localhost:6333"

Diagnosis: Qdrant is not running, or it is listening on a different port.

Solution:

  • Verify the Qdrant container is running: docker ps | grep qdrant
  • If not running, restart: docker start qdrant (or re-run the initial docker run command)
  • Check that qdrantUrl in openclaw.json matches your Qdrant deployment
  • For remote Qdrant, verify network connectivity: curl http://your-qdrant-host:6333/healthz

Issue 2: "cortex_search returns no results despite storing memories"

Symptom: Memories are added but searches return empty results.

Diagnosis: Semantic embedding mismatch or memories not yet committed to the vector database.

Solution:

  • Check auto-commit is enabled: "enableAutoCommit": true in config
  • Manually commit with cortex_commit_session to force memory processing
  • Verify search queries use similar language to stored memory content. "OSS AccessKey" will match "OSS API credentials" but "password" may not.
  • Check maxMemoriesPerQuery isn't set to 0, which disables retrieval

Issue 3: "Token usage unexpectedly high despite using MemClaw"

Symptom: Token usage hasn't decreased; the agent still consumes about as many tokens as before.

Diagnosis: Agent is retrieving too many full (L2) memories, or layer limits are set too high.

Solution:

  • Lower maxMemoriesPerQuery from 10 to 5 to reduce retrieved memories
  • Reduce l0TokenLimit and l1TokenLimit (the layer budgets described in the configuration section) to force more aggressive filtering
  • Review agent skills: ensure they're using cortex_search (layer-aware) not loading full contexts manually
  • Add monitoring: log the layer returned by search results to verify L0 filtering is working
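The monitoring suggestion in the last bullet could look like the sketch below. It assumes each search result carries a numeric layer field (0, 1, or 2). The article says results report their layer but does not name the field, so treat `layer` as a hypothetical name.

```javascript
// Counts how many search results came from each layer.
// Assumption: each result has a numeric `layer` field (0, 1 or 2);
// the real field name in MemClaw's results may differ.
function countByLayer(results) {
  const counts = { L0: 0, L1: 0, L2: 0, unknown: 0 };
  for (const result of results) {
    const key = `L${result.layer}`;
    if (Object.hasOwn(counts, key)) {
      counts[key] += 1;
    } else {
      counts.unknown += 1; // missing or unexpected layer value
    }
  }
  return counts;
}

// Example with hand-written results instead of a live search.
console.log(countByLayer([{ layer: 0 }, { layer: 0 }, { layer: 2 }, {}]));
```

If most results land in L2, the agent is probably requesting full content too eagerly, and the settings in the bullets above are the first things to revisit.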

Issue 4: "Multi-tenant isolation: memories from Project A visible in Project B"

Symptom: Multiple projects share one MemClaw deployment, and memories from one project show up in another.

Diagnosis: All projects are using the same collection name or tenant ID.

Solution:

Run separate MemClaw instances or use collection-level isolation:

{
  "plugins": {
    "entries": {
      "memclaw": {
        "enabled": true,
        "config": {
          "collectionName": "openclaw_memories_projecta"
        }
      }
    }
  }
}

Use a different collection name per project, or implement tenant-based filtering with memory tags and search queries.
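To keep per-project collection names consistent, they can be derived from the project name rather than typed by hand. The helper below is a hypothetical sketch. The openclaw_memories prefix mirrors the example above, and both function names are illustrative.

```javascript
// Derives a per-project collection name such as "openclaw_memories_projecta".
// collectionNameFor and memclawEntryFor are hypothetical helpers.
function collectionNameFor(project, prefix = 'openclaw_memories') {
  // Lowercase and drop everything except letters and digits.
  const slug = project.toLowerCase().replace(/[^a-z0-9]+/g, '');
  if (!slug) {
    throw new Error(`Cannot derive a collection name from "${project}"`);
  }
  return `${prefix}_${slug}`;
}

// Builds the plugin entry shown above for a given project.
function memclawEntryFor(project) {
  return {
    enabled: true,
    config: { collectionName: collectionNameFor(project) },
  };
}

console.log(JSON.stringify(memclawEntryFor('Project A'), null, 2));
```

Because punctuation and spaces are stripped, names like "a-b" and "ab" map to the same collection. Pick project names that remain distinct after normalization.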

Best Practices for Production MemClaw Deployments

1. Explicit Memory Tagging Strategy

Don't rely on automatic memory extraction alone. Implement a consistent tagging schema:

  • category: broad type (credentials, project_goal, technical_decision, lesson_learned)
  • tags: specific facets (user-preference, performance-constraint, error-recovery)

This helps searches return relevant results. Filtering on a specific tag such as user-preference is more precise than a generic semantic search.

2. Memory Lifecycle Management

Not all memories are permanent. Implement TTL policies:

await agent.call('cortex_add_memory', {
  content: 'User temporary task: analyze Q3 sales report',
  category: 'task',
  ttl: 86400000 // 24 hours
});

await agent.call('cortex_add_memory', {
  content: 'User permanent preference: always query north-america region',
  category: 'user_preference',
  ttl: 0 // Forever
});

Short-lived memories prevent clutter; permanent ones preserve important context.
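Raw millisecond literals are easy to get wrong by a factor of ten. A small helper can make TTL values self-describing. This is a sketch: ttlFor and PERMANENT are hypothetical names, and the only MemClaw behavior assumed is that ttl is in milliseconds with 0 meaning permanent, as described earlier.

```javascript
// Milliseconds per unit, for building ttl values without magic numbers.
const MS_PER_UNIT = {
  second: 1000,
  minute: 60_000,
  hour: 3_600_000,
  day: 86_400_000,
};

// ttl: 0 means the memory never expires (per the article).
const PERMANENT = 0;

// Hypothetical helper: converts an amount and unit into milliseconds.
function ttlFor(amount, unit) {
  const factor = MS_PER_UNIT[unit];
  if (factor === undefined) throw new Error(`Unknown unit: ${unit}`);
  if (!Number.isFinite(amount) || amount <= 0) {
    throw new Error('amount must be a positive number; use PERMANENT for no expiry');
  }
  return amount * factor;
}

console.log(ttlFor(24, 'hour')); // 86400000, the 24-hour value used above
console.log(ttlFor(7, 'day'));   // 604800000
```

A call such as `ttl: ttlFor(24, 'hour')` then reads the same way as the comment it replaces.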

3. Regular Maintenance

MemClaw includes a maintenance tool for housekeeping:

await agent.call('cortex_maintenance', {
  action: 'cleanup',
  pruneExpired: true,
  rebuildIndex: false
});

Run this weekly to remove expired memories and keep the vector index optimized. rebuildIndex: true should be used sparingly (once per month) as it's resource-intensive.

4. Monitoring and Observability

Track memory system health:

  • Memory Count: Monitor total memories stored; explosive growth suggests the extractor is capturing noise
  • Search Hit Rate: Track percentage of queries returning results vs. empty results; low rates indicate tagging or query issues
  • Average Retrieval Cost: Monitor tokens per search; should trend toward L0-dominated searches as the system matures
  • Qdrant Index Size: Monitor disk usage; large indices slow down searches
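Two of these metrics, search hit rate and average retrieval cost, can be tracked with a small in-process counter. The class below is a hypothetical sketch. How you obtain the token count for a search depends on your agent setup and is not something the article specifies.

```javascript
// Tracks search hit rate and average token cost per search.
// MemoryMetrics is a hypothetical helper, not part of MemClaw.
class MemoryMetrics {
  constructor() {
    this.searches = 0;
    this.hits = 0;
    this.tokens = 0;
  }

  // Call once per search with the number of results and the tokens it consumed.
  recordSearch(resultCount, tokensUsed) {
    this.searches += 1;
    if (resultCount > 0) this.hits += 1;
    this.tokens += tokensUsed;
  }

  // Fraction of searches that returned at least one result.
  get hitRate() {
    return this.searches === 0 ? 0 : this.hits / this.searches;
  }

  // Mean tokens consumed per search.
  get avgTokensPerSearch() {
    return this.searches === 0 ? 0 : this.tokens / this.searches;
  }
}

const metrics = new MemoryMetrics();
metrics.recordSearch(3, 300);
metrics.recordSearch(0, 100);
metrics.recordSearch(1, 200);
metrics.recordSearch(2, 400);
console.log(metrics.hitRate, metrics.avgTokensPerSearch); // 0.75 250
```

Exporting these two numbers to whatever dashboard you already use is usually enough to spot a falling hit rate or a creeping retrieval cost.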

5. Security: Data Privacy by Design

MemClaw stores all data locally by default. To maintain privacy in production:

  • Run Qdrant on an isolated internal network, never expose to the internet
  • Encrypt the qdrant_storage Docker volume if storing sensitive credentials
  • Implement access controls: agent skills should only read memories they created or are explicitly permitted to access
  • Never log raw memory content; use IDs and hashes for debugging instead

6. Performance: Tuning for Your Workload

For High-Frequency Retrieval (e.g., API key lookup every request):

  • Lower maxMemoriesPerQuery to 3–5 for speed
  • Use exact tag matching instead of semantic search
  • Cache frequently accessed memories at the agent level

For High-Precision Retrieval (e.g., architectural decision lookup across 100+ memories):

  • Increase maxMemoriesPerQuery to 15–20
  • Use semantic search; invest in high-quality memory content
  • Accept 50–100ms latency for accuracy
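For the high-frequency case, caching at the agent level could be as simple as a wrapper around the search call. This is a sketch under stated assumptions: createCachedSearch is a hypothetical helper, and the wrapped function stands in for something like `(params) => agent.call('cortex_search', params)`.

```javascript
// Wraps a search function with a small time-based cache.
// createCachedSearch is a hypothetical helper; searchFn is whatever
// performs the real lookup, e.g. a call to cortex_search.
function createCachedSearch(searchFn, { ttlMs = 60_000, now = Date.now } = {}) {
  const cache = new Map();

  return async function cachedSearch(params) {
    // Identical parameter objects (same key order) share a cache entry.
    const key = JSON.stringify(params);
    const entry = cache.get(key);
    if (entry && now() - entry.storedAt < ttlMs) {
      return entry.value; // still fresh: skip the real search
    }
    const value = await searchFn(params);
    cache.set(key, { value, storedAt: now() });
    return value;
  };
}
```

Keep the cache lifetime short for anything that can change, such as rotated credentials, and note that cached values live in process memory, so the privacy practices above apply to them as well.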

Migrating from OpenClaw Native Memory to MemClaw

If you've used OpenClaw's built-in memory system, MemClaw provides a migration tool:

await agent.call('cortex_migrate', {
  source: 'openclaw_native',
  target: 'memclaw',
  preserveIds: true
});

This converts existing memories to MemClaw's format and stores them in Qdrant. It's a one-time operation; run it once and verify results before disabling native memory.

Next Steps and Advanced Configurations

Enable Remote Qdrant for Scalability

For production with multiple agents, use managed Qdrant cloud or a dedicated server:

"qdrantUrl": "https://your-qdrant-cloud-instance.com",
"apiKey": "your-api-key"

Integrate with Claude Desktop (MCP Protocol)

MemClaw supports the Model Context Protocol, allowing direct integration with Claude Desktop:

{
  "mcpServers": {
    "memclaw": {
      "command": "memclaw-mcp-server",
      "args": ["--config", "/path/to/memclaw-config.json"]
    }
  }
}

This enables Claude to search and manage memories directly without intermediate agent layers.

Implement Custom Memory Extractors

For domain-specific memory extraction, extend the plugin with custom logic to identify and store memories automatically based on conversation patterns.

Summary: Why MemClaw Changes the Game

MemClaw addresses a core tradeoff for AI agents: memory precision versus token efficiency. Its three-layer architecture retrieves relevant context progressively, loading only what's needed. On the LoCoMo benchmark this yields 68.42% accuracy, about 91% fewer tokens per question than the OpenClaw + LanceDB setup, and roughly 18x that setup's score per thousand tokens.

For developers building production OpenClaw agents, MemClaw is essential infrastructure. It eliminates context loss between sessions, prevents repeated mistakes, and preserves long-term conversational goals—all while reducing API costs. Setup takes 15 minutes; the value compounds from day one.

Getting Started: Install Qdrant today, add the MemClaw plugin, and configure it in 15 minutes. Your agents will remember.

Original Source

https://dev.to/sopaco/cortex-memorygei-openclaw-zhuang-shang-chao-ji-da-nao-token-cheng-ben-bao-jiang-91-2g72
