Context Window Tax in AI Agents: Cutting 1,800 Tokens

Medium, by Mariano Mattei, March 6, 2026

Multi-agent AI systems can suffer a silent performance degradation that has nothing to do with model capability: context windows bloated with procedural documentation, duplicate configuration blocks, and reference data that never needed to be loaded on every run. A production deployment of a five-agent OpenClaw system revealed that nearly 1,800 tokens of always-loaded context were doing nothing except diluting the behavioral rules that actually mattered. The team then built a classification system to keep the problem from recurring.

Mariano Mattei, VP of AI Innovation at Mattei Systems, documented how his team's OpenClaw deployment had accumulated context bloat over three months of iteration. The deployment consists of five agents: Klaus (orchestrator), Valentina (social media), Felix (security), Blue (coding), and Mildred (calendar/email). Valentina's identity file had grown to 222 lines, Felix had two versions of his agent definition file, and the same three-line memory discipline block appeared identically across all five agents. The result was observable degradation: Felix missed privilege constraints buried below 100 lines of procedure, Valentina's content voice drifted because style rules were sandwiched between analytics runbooks, and Klaus applied calendar defaults inconsistently because the same policy was stated twice with different wording.

The Context Window Tax

OpenClaw agents load their identity files (AGENT.md and SOUL.md) into the system prompt on every execution. Every line in those files competes for the same finite context window. When an AGENT.md file contains 80 lines of behavioral rules, each rule carries weight. When it grows to 222 lines, with half the content being script paths, cron schedules, and platform configuration tables, the model must decide what to prioritize, and it sometimes chooses incorrectly.

The team conducted a systematic audit using a four-bucket classification taxonomy for every section in every agent identity file:

RULE: Behavioral constraints needed on every run (belongs in AGENT.md or SOUL.md)
PROCEDURE: Repeatable workflows needed only for specific tasks (belongs in skills/ directory, loaded on demand)
FACT: Reference data or current state (belongs in memory/ directory)
DUPLICATE: Content that exists elsewhere (should be deleted or consolidated)
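
The four buckets can be written down as a small lookup table. The following is a minimal sketch, not the team's tooling: the names Bucket, DESTINATION, and is_always_loaded are hypothetical and exist only to illustrate the taxonomy above.

```python
from enum import Enum
from typing import Optional


class Bucket(Enum):
    """The four audit buckets from the taxonomy."""
    RULE = "RULE"
    PROCEDURE = "PROCEDURE"
    FACT = "FACT"
    DUPLICATE = "DUPLICATE"


# Where content in each bucket belongs. None means "delete or consolidate".
DESTINATION: dict[Bucket, Optional[str]] = {
    Bucket.RULE: "AGENT.md or SOUL.md",
    Bucket.PROCEDURE: "skills/",
    Bucket.FACT: "memory/",
    Bucket.DUPLICATE: None,
}


def is_always_loaded(bucket: Bucket) -> bool:
    """Only RULE content stays in the files loaded on every run."""
    return bucket is Bucket.RULE


for bucket in Bucket:
    print(bucket.name, "->", DESTINATION[bucket], "| always loaded:", is_always_loaded(bucket))
```

The point of the table is that only one of the four buckets earns a place in always-loaded context; everything else either moves or disappears.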

Audit Findings Across Five Agents

The audit revealed three categories of waste: duplicates, procedural content embedded in identity files, and reference data. Duplicates included the Memory Discipline block appearing identically in all five SOUL.md files (260 tokens of pure duplication) and Felix maintaining both AGENT.md (128 lines, current) and AGENTS.md (112 lines, an older version). Klaus also had two sections stating the same calendar policy in different words, consuming nine lines that should have been three.
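
Identical blocks repeated across files are the easiest kind of waste to find mechanically. The sketch below is one way to do it, under assumptions: it treats any run of non-blank lines as a block, compares blocks after stripping whitespace, and takes file contents as an in-memory dict. The function name find_shared_blocks and the sample text are hypothetical, and the approach will not catch reworded duplicates such as Klaus's calendar policy.

```python
import re
from collections import defaultdict


def find_shared_blocks(files: dict[str, str], min_lines: int = 3) -> dict[str, list[str]]:
    """Map each block found in two or more files to the sorted names of those files.

    A block is a run of non-blank lines; lines are stripped before comparison.
    Blocks shorter than min_lines are ignored to avoid noise.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for name, text in files.items():
        for raw in re.split(r"\n\s*\n", text):
            lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
            if len(lines) >= min_lines:
                seen["\n".join(lines)].add(name)
    return {block: sorted(names) for block, names in seen.items() if len(names) > 1}


# Hypothetical stand-in for the three-line block the audit found.
discipline = "Memory Discipline\n- placeholder line one\n- placeholder line two"
files = {
    "klaus/SOUL.md": "Intro line\n\n" + discipline + "\n",
    "felix/SOUL.md": discipline + "\n\nOther content\n",
    "blue/SOUL.md": "Unique\ncontent\nhere\n",
}
for block, names in find_shared_blocks(files).items():
    print(names, "share a block of", len(block.splitlines()), "lines")
```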

The largest category was procedural content embedded in identity files. Valentina's AGENT.md contained 80 lines of analytics procedures (platform tables, collector commands, data formats) that were only relevant during analytics operations—representing 36% of her always-loaded context. Klaus had 12 lines of newsletter workflow that executed once weekly. Felix had 8 lines of privilege audit steps and 14 lines of tool reference documentation. Blue, notably, was already clean at 86 lines total.

Reference data in identity files included Valentina's 18 lines of approved voice examples, 12 lines of model routing tables, and 25 lines of news scout configuration. This data belonged in the memory/ directory where it could be queried when relevant, not loaded into every execution.
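
To see how these counts add up for a single file, the tally below takes the figures from the audit paragraphs at face value. It is a sketch under one assumption worth flagging: it reads all three reference-data items as belonging to Valentina's AGENT.md, which the phrasing suggests but does not state outright. The function name lines_per_bucket is hypothetical.

```python
# Line counts as reported in the audit findings.
TOTAL_LINES = 222  # Valentina's identity file before cleanup
NON_RULE_SECTIONS = [
    ("analytics procedures", "PROCEDURE", 80),
    ("approved voice examples", "FACT", 18),
    ("model routing tables", "FACT", 12),       # assumed to be Valentina's
    ("news scout configuration", "FACT", 25),   # assumed to be Valentina's
]


def lines_per_bucket(sections):
    """Sum line counts per bucket from (title, bucket, line_count) tuples."""
    totals = {}
    for _title, bucket, count in sections:
        totals[bucket] = totals.get(bucket, 0) + count
    return totals


totals = lines_per_bucket(NON_RULE_SECTIONS)
non_rule_lines = sum(totals.values())
procedure_share = totals["PROCEDURE"] / TOTAL_LINES
non_rule_share = non_rule_lines / TOTAL_LINES

print(totals)
print(f"procedures: {procedure_share:.0%} of the file")
print(f"procedures + facts: {non_rule_lines} lines, {non_rule_share:.0%} of the file")
```

The procedure share reproduces the 36% quoted above. Under the stated assumption, procedures and facts together account for 135 of the 222 lines.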

The Extraction Pattern

For every procedure section, the team created a skill file and replaced the always-loaded content with a one-line pointer. Valentina's 38-line analytics operations table moved to skills/analytics-operations.md and was replaced with "See skills/analytics-operations.md for full platform details." The same pattern applied to newsletter operations (12 lines → skills file), news scout workflow (25 lines → skills file), and topic posting workflow (30 lines → skills file).
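
The extraction step can be sketched as a pure function over text. This is an illustration rather than the team's script: it assumes sections are delimited by lines starting with "## ", which the article does not specify, and the function name extract_section, the sample headings, and the sample content are hypothetical. Only the pointer wording and the skills/analytics-operations.md path come from the description above.

```python
def extract_section(identity_text: str, heading: str, pointer: str) -> tuple[str, str]:
    """Move the section under `heading` out of an identity file.

    Returns (new_identity_text, skill_text). The section runs from its heading
    line up to, but not including, the next line starting with "## ".
    The heading stays in the identity file, followed by the one-line pointer.
    """
    lines = identity_text.splitlines()
    if heading not in lines:
        raise ValueError(f"heading not found: {heading!r}")
    start = lines.index(heading)
    end = start + 1
    while end < len(lines) and not lines[end].startswith("## "):
        end += 1

    skill_text = "\n".join(lines[start:end]).rstrip() + "\n"
    new_lines = lines[:start] + [heading, pointer]
    if end < len(lines):
        new_lines += [""] + lines[end:]
    return "\n".join(new_lines) + "\n", skill_text


# Hypothetical identity file content.
doc = (
    "# Valentina\n\n"
    "## Voice\nBe concise.\n\n"
    "## Analytics Operations\nstep 1\nstep 2\n\n"
    "## Escalation\nAsk Klaus.\n"
)
new_doc, skill = extract_section(
    doc,
    "## Analytics Operations",
    "See skills/analytics-operations.md for full platform details.",
)
print(new_doc)
print(skill)
```

Returning both texts instead of writing files keeps the sketch easy to test; a caller would write skill to the skills/ path and new_doc back to the identity file.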

The resulting directory structure for each agent became:

agents/valentina/
  AGENT.md          # Identity + rules (always loaded)
  SOUL.md           # Values + discipline (always loaded)
  memory/           # Facts, state (loaded on demand)
    model-routing.md
    approved-voice.md
  skills/           # Procedures (loaded on demand)
    analytics-operations.md
    newsletter-operations.md
  tools/            # Scripts
    news_scout.py

The net reduction: approximately 200 lines and 1,800 tokens removed from always-loaded context. Klaus dropped from 105 to 63 lines, Valentina from 222 to 82 lines, Felix from 128 to 98 lines (after removing the duplicate file), and Mildred from 62 to 48 lines.
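
The per-agent figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above. The tokens-per-line value is a derived estimate that assumes the 1,800-token figure corresponds to exactly these removed lines, which the article does not confirm.

```python
# (before, after) line counts per agent, as stated in the article.
LINE_COUNTS = {
    "Klaus": (105, 63),
    "Valentina": (222, 82),
    "Felix": (128, 98),
    "Mildred": (62, 48),
}
TOKENS_REMOVED = 1800  # approximate figure from the article

removed = {name: before - after for name, (before, after) in LINE_COUNTS.items()}
total_removed = sum(removed.values())
# Assumption: the token figure maps onto these lines and nothing else.
tokens_per_line = TOKENS_REMOVED / total_removed

for name, n in removed.items():
    before = LINE_COUNTS[name][0]
    print(f"{name}: -{n} lines ({n / before:.0%} of the original file)")
print(f"total: -{total_removed} lines, ~{tokens_per_line:.1f} tokens per removed line")
```

The four per-agent reductions sum to 226 lines, in the same range as the "approximately 200" headline figure, with Valentina alone contributing 140 of them.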

Preventing Recurrence Through Write Discipline

Cleaning up once solves the immediate problem. Preventing the mess from returning requires architectural discipline. The team added a Write Discipline block to every agent's SOUL.md that enforces classification at write time:

Before writing anything to SOUL.md or AGENT.md, classify it:
RULE (behavioral constraint) → SOUL.md or AGENT.md
PROCEDURE (repeatable steps) → skills/[name].md, add pointer in AGENT.md
FACT (reference data) → memory/
DUPLICATE → don't write it, update the canonical location
If SOUL.md exceeds ~80 lines, run a classification pass on anything added since last audit.

This pattern provides three benefits: classification happens before writing (not during cleanup sprints months later), skills emerge naturally as agents document repeatable workflows, and the 80-line threshold acts as a canary triggering self-audit when identity files grow.
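
In the article, the threshold is enforced by the agent itself through its SOUL.md instructions. The same canary could also be run as an external check, for example in CI. The sketch below assumes that setup; the function name files_needing_audit and the sample file contents are hypothetical.

```python
LINE_THRESHOLD = 80  # the article's ~80-line trigger; adjustable


def files_needing_audit(files: dict[str, str], threshold: int = LINE_THRESHOLD) -> list[str]:
    """Return the sorted names of files whose line count exceeds the threshold."""
    return sorted(
        name for name, text in files.items() if len(text.splitlines()) > threshold
    )


# Hypothetical contents: one file exactly at the threshold, one just over it.
files = {
    "agents/klaus/SOUL.md": "rule\n" * 80,
    "agents/valentina/SOUL.md": "rule\n" * 81,
}
print(files_needing_audit(files))
```

A check like this only flags growth; deciding what to move still requires the classification pass described above.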

Implications for AI Developers

This work highlights a structural problem in agentic AI systems that persists across frameworks. As agents accumulate capabilities and procedures, their identity files bloat unless there's active discipline separating always-needed behavioral rules from sometimes-needed procedural knowledge. The cost isn't immediately visible, since agents don't crash or throw errors, but performance degrades as critical rules compete with irrelevant context.

The classification taxonomy (RULE/PROCEDURE/FACT/DUPLICATE) provides a language-agnostic pattern applicable to any agent system that loads identity or configuration into the system prompt. The pattern is particularly relevant for prompt engineering in production environments where agents operate autonomously over extended periods and accumulate organic documentation.

For teams building multi-agent systems, the write discipline pattern offers a preventive architecture: agents self-organize their knowledge base as they work, keeping identity files lean without manual intervention. The 80-line threshold is adjustable but provides a concrete trigger for self-audit rather than relying on human observation of degraded performance.

Analysis based on article by Mariano Mattei, VP of AI Innovation at Mattei Systems. Full audit template available in the ai-infrastructure-playbooks/agent-context-optimization repository.
