OpenClaw Token Optimization Playbook: Cut AI Agent Costs by 97%

Based on Matt Ganzac's Production OpenClaw Implementation • 🎥 Source Video

Transform AI Agent Economics from Budget-Killer to Profit Center

Default OpenClaw architecture burns 30x more tokens than necessary through context bloat, history loading, and single-model routing. This playbook shows the exact steps to cut costs by 97% while enabling overnight batch processing, multi-agent orchestration, and production B2B lead gen at $1/hour.

Idle Cost Reduction: $3/day → $0
Overnight Job Cost: $150 → $6
Monthly Idle Spend: $90 → $3

⚠️ CRITICAL WARNING

  • Developer skills required - Log analysis, config editing, token auditing
  • Dedicated machine only - Not your personal laptop (security/autonomy risks)
  • Controlled environment - Agent will attempt logins, purchases, actions
  • API access needed - Anthropic, Brave Search, Hunter.io
  • Breaking changes possible - Follow exact steps or risk breaking your setup

Layer 1: Architecture Fix (80% Savings)

The Core Problem: Default OpenClaw loads full context files, entire session history, and runs heartbeats every 30 minutes - all through expensive API calls.

🚨 Discovery Method: Only found through token audit after hitting rate limits. Demo usage never reveals this architectural waste. Production use + daily monitoring required.

Issue #1: Context File Bloat

Problem: Every heartbeat, every message, every prompt loads ALL context files (50KB, then 75KB, then 100KB+ as memory files accumulate over time).

Cost Impact: 2-3M tokens/day while sitting idle.

Solution: Stop loading context files on every message.

Issue #2: Session History Tax

Problem: Slack/WhatsApp integration compiles ENTIRE session history on every API call (111KB text blob observed in audit).

Cost Impact: 1M+ tokens per prompt when using messaging platforms.

Solution: Create "new session" command that dumps history but saves to memory for recall.

Issue #3: Heartbeat API Waste

Problem: System pings every 30 minutes to check tasks - loads full context each time through paid API.

Cost Impact: On Opus: $5/day just sitting idle. On Sonnet: $2-3/day idle.

Solution: Move heartbeats to a local LLM (Ollama) = zero API cost.

📋 Step-by-Step: Context Management Configuration
1. Locate your config file
2. Find the agents.default_model section
3. Add context loading rules:
   - Don't load context on heartbeats
   - Don't load full history on prompts
   - Load selectively based on task type
4. Run a token audit before and after the change
5. Verify context size: 50KB → ~5KB on typical operations
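OpenClaw doesn't document a public rule engine for this, so here is a minimal sketch of what selective context loading could look like. The event names, file names, and 5KB budget are illustrative assumptions, not OpenClaw settings:

```python
# Illustrative sketch of selective context loading. The event names, file
# names, and 5KB budget are assumptions, not OpenClaw configuration keys.
RULES = {
    "heartbeat": [],                              # never load context on heartbeats
    "prompt": ["task_notes.md"],                  # minimal context for chat turns
    "deep_task": ["task_notes.md", "memory.md"],  # fuller context only when needed
}

def context_for(event: str, sizes_kb: dict[str, int], budget_kb: int = 5) -> list[str]:
    """Pick context files for an event, trimmed to a size budget."""
    selected, used = [], 0
    for name in RULES.get(event, []):
        kb = sizes_kb.get(name, 0)
        if used + kb <= budget_kb:
            selected.append(name)
            used += kb
    return selected

sizes = {"task_notes.md": 3, "memory.md": 40}
print(context_for("heartbeat", sizes))  # []
print(context_for("deep_task", sizes))  # ['task_notes.md'] (memory.md exceeds the budget)
```

The point of the budget cap is that a bloated memory file gets excluded automatically instead of silently inflating every call.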
📋 Step-by-Step: Session Cleanup Command
1. Create a custom command: "new_session"
2. Command logic:
   - Dump the current Slack/WhatsApp session
   - Save it to memory for future recall
   - Clear the session buffer
3. Usage: type "new session" before expensive operations
4. Result: 111KB → 0KB session bloat
5. Memory still accessible when needed
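A minimal sketch of the command's core logic, assuming the session buffer is a list of message dicts and memory is a local directory (both assumptions for illustration, not OpenClaw internals):

```python
# Core logic of a "new_session" command: dump the live buffer to memory,
# then clear it so the next API call starts clean. The buffer shape and
# memory directory are illustrative assumptions.
import json
import time
from pathlib import Path

def new_session(buffer: list[dict], memory_dir: str = "memory/sessions") -> str:
    Path(memory_dir).mkdir(parents=True, exist_ok=True)
    path = Path(memory_dir) / f"session-{int(time.time())}.json"
    path.write_text(json.dumps(buffer, indent=2))  # recallable later if needed
    buffer.clear()                                 # next call sends no history
    return str(path)

buf = [{"role": "user", "text": "hello"}, {"role": "agent", "text": "hi"}]
saved = new_session(buf)
print(len(buf))  # 0: the session bloat is gone, but the dump is on disk
```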
📋 Step-by-Step: Local LLM Heartbeat Setup
1. Install Ollama (latest version)
2. Add to your config file:
   {
     "name": "ollama",
     "model": "ollama-latest",
     "use_for": ["heartbeat", "file_organization"]
   }
3. Update the heartbeat routing logic
4. Test heartbeat execution
5. Verify: zero API calls during idle state
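For the heartbeat call itself, Ollama exposes a local HTTP API (port 11434 by default). This sketch builds and sends a heartbeat prompt to it; the model name and prompt wording are placeholders, not OpenClaw's actual heartbeat code:

```python
# Heartbeat via a local Ollama server instead of a paid API. Ollama's HTTP
# API listens on localhost:11434 by default; model name and prompt wording
# here are placeholders.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def heartbeat_payload(tasks: list[str], model: str = "llama3") -> dict:
    """Build the JSON body for a heartbeat check against the local model."""
    return {
        "model": model,
        "prompt": "Check whether any of these tasks are due now: " + "; ".join(tasks),
        "stream": False,
    }

def run_heartbeat(tasks: list[str], model: str = "llama3") -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(heartbeat_payload(tasks, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # local call: zero API cost
        return json.loads(resp.read())["response"]
```

Because the call never leaves the machine, a heartbeat every 30 minutes costs nothing regardless of how often it fires.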
✅ Layer 1 Success Metric: Idle daily cost drops from $2-3 to $0. Context loading verified at ~5KB vs 50KB+. Token audit shows zero heartbeat API calls.

Layer 2: Multi-Model Routing (15% Additional Savings)

The Insight: 85% of AI agent tasks are "brainless" (file management, list building, data collection) - yet default setups use Opus/Sonnet for everything at 10-50x cost.

🧠 Key Discovery: Most users assume one model per agent. Reality: You can run 3-4 models simultaneously with automatic escalation on failures.

Task Complexity Framework

| Task Type | Model | Cost / 1M Tokens | Examples |
|---|---|---|---|
| Brainless | Ollama (local) | $0 | File org, CSV compile, folder structure, heartbeats |
| Low-complexity | Haiku | $0.25 | Data collection, web scraping, list building, basic formatting |
| Medium | Sonnet | $3.00 | Writing, email drafting, research synthesis, code generation |
| High-complexity | Opus | $15.00 | Strategic reasoning, novel problem-solving, complex logic |

Multi-Model Config Template

{
  "agents": {
    "default_model": "haiku",
    "models": [
      {
        "name": "ollama",
        "use_for": ["heartbeat", "file_ops", "csv_compile"],
        "cost_tier": "free"
      },
      {
        "name": "haiku",
        "use_for": ["data_collection", "web_scrape", "list_building"],
        "escalate_to": "sonnet",
        "cost_tier": "cheap"
      },
      {
        "name": "sonnet",
        "use_for": ["writing", "email_draft", "research_synthesis"],
        "escalate_to": "opus",
        "cost_tier": "medium"
      },
      {
        "name": "opus",
        "use_for": ["strategic_reasoning", "complex_logic"],
        "cost_tier": "premium"
      }
    ]
  }
}

Escalation Logic

How it works: When a model hits a block (error, can't complete task, needs deeper reasoning), it automatically escalates to the next tier.

  • Default path: Ollama → Haiku → Sonnet → Opus
  • Success: Task completed at lowest viable tier
  • Failure: Each escalation logged for calibration
  • Learning: Route prediction improves over time
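
The escalation loop above can be sketched in a few lines. `run_model` here is a stand-in for however your setup actually invokes a model, and the toy runner exists only to show the flow:

```python
# Sketch of tiered escalation. TIERS runs cheapest to priciest; run_model is
# a stand-in for your real model-invocation function.
TIERS = ["ollama", "haiku", "sonnet", "opus"]

def run_with_escalation(task: str, run_model, start: str = "ollama"):
    attempts = []
    for model in TIERS[TIERS.index(start):]:
        ok, result = run_model(model, task)
        attempts.append(model)            # log every attempt for calibration
        if ok:
            return result, attempts       # completed at lowest viable tier
    raise RuntimeError(f"all tiers failed: {attempts}")

# Toy runner: pretend only Sonnet-and-up can handle this task.
def toy_runner(model: str, task: str):
    return model in ("sonnet", "opus"), f"{model}:{task}"

result, path = run_with_escalation("draft email", toy_runner, start="haiku")
print(result, path)  # sonnet:draft email ['haiku', 'sonnet']
```

The logged `attempts` list is what feeds the "route prediction improves over time" loop: if a task class keeps escalating past Haiku, start it at Sonnet next time.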
📊 Real-World Routing Example: 6-Hour Overnight Job

Task: B2B Lead Gen for Distressed Businesses

Requirements: 1,000 qualified leads with email validation, LinkedIn profiles, decision-maker identification, personalized cold outreach drafts.

| Sub-Agents | Model | Task | Time |
|---|---|---|---|
| Agents 1-10 | Haiku | Web scraping blogs, reading distressed-business signals, finding contact info | 4 hours |
| Agents 11-12 | Sonnet | Writing personalized cold outreach emails and follow-up sequences | 1.5 hours |
| Agents 13-14 | Ollama | Organizing files, compiling CSVs, structuring folder hierarchy | 0.5 hours |

Total Cost: $6 for 6-hour operation = $1/hour

vs Opus-Only: Would have been $150 (25x more expensive)

Output: 1,000 qualified leads + emails + follow-up sequences + organized deliverable

✅ Layer 2 Success Metric: Token audit shows 80-85% of operations running on Haiku or Ollama. Overnight batch jobs cost $1-2/hour vs $20-25/hour on a single model.

Layer 3: Cost-Aware Execution (2% Additional Savings)

The Final Layer: Embed token optimization awareness into the agent's workspace files and execution loop.

Workspace File Optimization

What to add: Success metrics that include "low token usage" and "run efficiently" as core objectives.

WORKSPACE FILE UPDATE:

SUCCESS_METRICS:
- Complete task accurately
- Optimize for token efficiency
- Pre-estimate token cost before execution
- Post-execution cost reporting
- Calibrate estimates vs actuals

EXECUTION_PROTOCOL:
1. Before task: "This will use ~X tokens (est. $Y)"
2. During task: Monitor and adjust routing
3. After task: "Used X tokens (actual $Y) vs estimate"
4. Learning: Adjust future estimates based on variance
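The calibration step in that protocol can be as simple as scaling new estimates by the historical actual/estimate ratio. A minimal sketch (the token figures are made up for illustration):

```python
# Sketch of the "calibrate estimates vs actuals" step: scale a new estimate
# by the mean actual/estimate ratio observed on past jobs.
def calibrated_estimate(base_estimate: int, history: list[tuple[int, int]]) -> int:
    if not history:
        return base_estimate                  # nothing to calibrate against yet
    ratio = sum(actual / est for est, actual in history) / len(history)
    return round(base_estimate * ratio)

history = [(10_000, 13_000), (8_000, 9_600)]  # (estimated, actual) token pairs
print(calibrated_estimate(10_000, history))   # 12500: past jobs ran ~25% over
```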

Rate Limiting & Pacing

Problem: Anthropic's entry-tier rate limit of 30k tokens/min triggers 429 errors once context bloat pushes per-request token counts up.

Solution: Built-in pacing logic that respects rate limits and queues operations.

  • Monitor token usage per minute
  • Queue operations when approaching limit
  • Batch similar operations together
  • Use cached tokens when available (a 95%-cached run costs a small fraction of an uncached one)

Daily Token Audit Protocol

Why: Only way to catch architectural waste before it compounds.

What to check:

  • Context size on typical operations
  • Session history bloat from messaging platforms
  • Heartbeat API call frequency and cost
  • Model routing distribution (should be 80%+ Haiku/Ollama)
  • Cached vs fresh token ratio (target 90%+ cached for repeated tasks)
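
To make the routing-distribution check concrete, here is a small sketch. The one-JSON-object-per-line log format is an assumption about your setup, not a documented OpenClaw format:

```python
# Sketch of the routing-distribution audit check. Assumes one JSON object
# per log line with a "model" field (an assumed format, not OpenClaw's).
import json
from collections import Counter

LOG = """\
{"model": "haiku", "tokens": 1200}
{"model": "ollama", "tokens": 300}
{"model": "haiku", "tokens": 900}
{"model": "sonnet", "tokens": 5000}
{"model": "opus", "tokens": 8000}
"""

def routing_distribution(lines: list[str]) -> dict[str, float]:
    counts = Counter(json.loads(line)["model"] for line in lines if line.strip())
    total = sum(counts.values())
    return {model: n / total for model, n in counts.items()}

dist = routing_distribution(LOG.splitlines())
cheap = dist.get("haiku", 0) + dist.get("ollama", 0)
print(f"cheap-tier share: {cheap:.0%}")  # cheap-tier share: 60%
```

A 60% cheap-tier share like the toy log above would fail the 80% target and flag the routing config for review.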
🎯 Caching Strategy for Overnight Batch Jobs

Cached Token Exploitation

Discovery: Repeated operations use 95% cached tokens at drastically lower cost.

Example: 6-hour overnight job that cost $6 was 95% cached tokens. Without caching would have been ~$120.

How to maximize caching:

  • Structure repeated tasks consistently
  • Use templates for email drafts and outreach
  • Batch similar operations together
  • Schedule recurring tasks overnight for maximum cache benefit
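
As a sanity check on cache savings, a simple cost model helps. The 90% cache-read discount below is an assumption (providers price cache reads at a fraction of fresh input; check your actual rates), so the exact figure will vary by provider and run:

```python
# Hedged cache-cost model. cache_discount=0.9 (cache reads at 10% of the
# fresh price) is an assumption; substitute your provider's actual rates.
def job_cost(uncached_cost: float, cached_frac: float, cache_discount: float = 0.9) -> float:
    fresh = uncached_cost * (1 - cached_frac)
    cached = uncached_cost * cached_frac * (1 - cache_discount)
    return fresh + cached

# A ~$120 job at 95% cached under these assumed rates:
print(round(job_cost(120.0, 0.95), 2))  # 17.4
```

Note that under this assumed discount a $120 job lands near $17, not $6; landing at $6 would require cache reads to be effectively free, so measure your own cached-vs-fresh rates rather than assuming.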
✅ Layer 3 Success Metric: Agent pre-reports token estimates with 95%+ accuracy. Batch jobs show 90%+ cached token usage. Monthly spend predictable within 10%.

Dependencies & Future Unlock Map

Critical insight: Each optimization layer unlocks new capabilities. You can't skip to sub-agent orchestration without fixing the base architecture.

Level 0: Demo Phase (Default OpenClaw)

Cost: $50-150/month

Capability: Simple tasks, real-time interaction

Blocker: Unsustainable economics for production use

Can't proceed to production without optimization

Level 1: Architecture Fix (Context Management)

Unlocks: 80% cost reduction, idle state becomes free

Enables: Can now run overnight jobs without burning budget

Dependency: Token audit capability + log analysis skills

Level 2: Multi-Model Routing

Unlocks: 15% additional savings, task-appropriate cost structure

Enables: Sub-agent orchestration becomes economical

Dependency: Understanding of model capabilities + task complexity mapping

Level 3: Local LLM Integration

Unlocks: Zero-cost operations for heartbeats + brainless tasks

Enables: Infinite scaling of simple operations

Dependency: Ollama setup + model routing logic

Level 4: Sub-Agent Orchestration

Unlocks: Parallel execution, specialized agents per task type

Enables: "Virtual research team" economics

Dependency: All above layers + task decomposition framework

Level 5: Production Economics (Service Business)

Unlocks: $1/hour for 14-agent overnight operation

Enables: Service business model (sell lead gen as product)

Dependency: All above + API integrations (Brave, Hunter.io)

Example: $6 for 1,000 B2B leads = $0.006/lead

🔮 Future Unlocks (Not Discussed But Enabled):
  • Economic Moat: Competitors burning 30x more = your pricing power or 30x margin advantage
  • Capability Expansion: Any "impossible to scale manually" research becomes viable (patent analysis, competitive intel, due diligence)
  • Business Model Shift: From "I use AI agents" → "I sell AI research services"
  • Full Funnel Automation: Lead gen → outreach → booking → follow-up (each step only economical after previous optimized)

AI Prompts Library: Copy-Paste Implementation

Use these prompts with Claude/ChatGPT to generate exact config files and code needed for each optimization layer.

🤖 Prompt 1: Generate Context Management Config

I'm running OpenClaw and need to optimize context loading to reduce token waste.

CURRENT SITUATION:
- Every heartbeat loads full context files (~50KB+)
- Every user prompt loads all context + session history
- Burning 2-3M tokens/day while idle

GOAL: Generate OpenClaw config modifications that:
1. Prevent context loading on heartbeats
2. Implement selective context loading based on task type
3. Reduce context size from 50KB to ~5KB on typical operations
4. Maintain necessary context for task completion

OUTPUT FORMAT:
- JSON config snippet with exact parameters
- Explanation of each setting
- Test procedure to verify context size reduction
- Expected before/after token metrics

Generate the config modifications now.

🤖 Prompt 2: Build Session Cleanup Command

I'm running OpenClaw with Slack integration and it's loading my entire Slack session history (111KB) on every prompt, burning massive tokens.

GOAL: Create a custom OpenClaw command called "new_session" that:
1. Dumps the current Slack/WhatsApp session buffer
2. Saves dumped content to agent memory for future recall if needed
3. Clears the session buffer to prevent loading on the next API call
4. Provides confirmation that cleanup completed

TECHNICAL REQUIREMENTS:
- Command should be callable by typing "new session" in Slack
- Memory storage format should allow selective recall
- No loss of critical information
- Immediate token savings on the next operation

OUTPUT FORMAT:
- Complete command code
- Integration instructions for OpenClaw
- Usage examples
- Expected token savings per operation

Generate the command implementation now.

🤖 Prompt 3: Configure Multi-Model Routing

I want to set up multi-model routing in OpenClaw to automatically use cheaper models (Haiku, Ollama) for simple tasks and escalate to Sonnet/Opus only when needed.

CURRENT SETUP:
- Running only Sonnet for all operations
- Burning ~$3/day on tasks that could run on Haiku at 10x lower cost

GOAL: Generate a complete OpenClaw config for 4-tier model routing:

TIER 1 - Ollama (local, free):
- Heartbeats
- File organization
- CSV compilation
- Folder structure management

TIER 2 - Haiku ($0.25/1M tokens):
- Data collection
- Web scraping
- List building
- Basic formatting
- Escalate to Sonnet on failure

TIER 3 - Sonnet ($3/1M tokens):
- Writing and email drafting
- Research synthesis
- Code generation
- Escalate to Opus on failure

TIER 4 - Opus ($15/1M tokens):
- Strategic reasoning
- Complex logic
- Novel problem-solving

OUTPUT FORMAT:
- Complete JSON config with all 4 models
- Escalation logic between tiers
- Task classification rules
- Test procedure to verify routing
- Expected cost breakdown (% per tier)

Generate the routing configuration now.

🤖 Prompt 4: Token Audit Analysis Script

I need a daily token audit script for OpenClaw that helps me catch architectural waste before it compounds.

REQUIREMENTS: Create a script that analyzes my OpenClaw logs and reports:

1. CONTEXT SIZE ANALYSIS:
- Average context size per operation type
- Context bloat trends over time
- Operations exceeding 10KB context

2. SESSION HISTORY TRACKING:
- Session buffer size from messaging platforms
- History loading frequency
- Platforms contributing the most bloat

3. HEARTBEAT MONITORING:
- Heartbeat frequency
- Tokens used per heartbeat
- Daily heartbeat cost

4. MODEL ROUTING DISTRIBUTION:
- % of operations per model tier
- Escalation frequency
- Cost per model

5. CACHING EFFICIENCY:
- Cached vs fresh token ratio
- Operations with low cache hit rates
- Optimization opportunities

OUTPUT FORMAT:
- Python script that parses OpenClaw logs
- Daily summary report with metrics
- Cost trend visualization
- Actionable recommendations
- Alert thresholds for anomalies

Generate the audit script now.

🤖 Prompt 5: Sub-Agent Orchestration Framework

I want to create a sub-agent orchestration system for OpenClaw that can spin up specialized agents for complex overnight batch jobs.

USE CASE: B2B Lead Generation
Task: Find 1,000 qualified leads for [industry] with email validation, LinkedIn profiles, and personalized outreach drafts.

GOAL: Design a sub-agent framework that covers:

AGENT SPECIALIZATION:
- Data Collection Agents (Haiku): Web scraping, blog reading, signal detection
- Writing Agents (Sonnet): Email personalization, follow-up sequences
- Organization Agents (Ollama): File management, CSV compilation, folder structure

COORDINATION:
- Master agent distributes tasks
- Sub-agents report completion
- Results aggregated and organized
- Error handling and escalation

COST OPTIMIZATION:
- Route by task complexity automatically
- Parallel execution where possible
- Cached token maximization
- Target: $1/hour for the full operation

OUTPUT FORMAT:
- Sub-agent architecture diagram
- Task distribution algorithm
- Complete implementation code
- Test procedure for a 100-lead trial
- Scaling guidelines for 1,000+ leads

Generate the orchestration framework now.

🤖 Prompt 6: Production Deployment Checklist

I've optimized my OpenClaw setup through all 3 layers and want to deploy to production for B2B lead gen services.

CURRENT STATE:
✅ Architecture optimized (context managed, session cleanup, Ollama heartbeats)
✅ Multi-model routing active (Ollama → Haiku → Sonnet → Opus)
✅ Cost-aware execution configured
✅ Daily token audits running
✅ Sub-agent orchestration tested

GOAL: Create a comprehensive production deployment checklist that covers:

1. SAFETY & SECURITY:
- Dedicated machine isolation
- API key rotation schedule
- Spending limit configuration
- Emergency kill switch
- Audit log retention

2. MONITORING:
- Token usage dashboards
- Cost threshold alerts
- Model routing distribution
- Error rate tracking
- Task completion metrics

3. CLIENT ONBOARDING:
- Service scoping templates
- Cost estimation tools
- Deliverable formats
- SLA definitions
- Reporting procedures

4. SCALING CONSIDERATIONS:
- Multi-client isolation
- Task queue management
- Rate limit handling at scale
- Cost allocation per client
- Performance benchmarks

5. DOCUMENTATION:
- Configuration backup procedures
- Disaster recovery plan
- Troubleshooting runbooks
- Client-facing service docs

OUTPUT FORMAT:
- Complete checklist with verification steps
- Production readiness scorecard
- Risk mitigation strategies
- Scaling roadmap
- Emergency procedures

Generate the production deployment checklist now.
💡 Pro Tip: Use these prompts sequentially. Don't jump to sub-agent orchestration without completing architecture fixes first. Each layer builds on the previous.

Validation Checklist & Resources

✅ Complete Optimization Verification

Before considering optimization complete, verify ALL of these metrics:

  • Idle daily cost = $0 (no API calls during heartbeats)
  • Context size on typical operation ≤ 10KB (down from 50KB+)
  • Session cleanup command functional and integrated
  • Ollama installed and handling heartbeats + file operations
  • 80%+ of operations running on Haiku or Ollama (verified in logs)
  • Escalation logic tested and functional (Haiku → Sonnet → Opus)
  • Daily token audit script running and reporting
  • Agent pre-reports token estimates with 90%+ accuracy
  • Batch jobs showing 85%+ cached token usage
  • Successfully completed overnight batch job at ≤$2/hour cost
⚠️ Common Failure Modes:
  • Skipping token audits: Won't catch architectural waste until after burning budget
  • Not testing escalation: Tasks fail silently instead of escalating to higher model
  • Auto-billing enabled: Wake up to $500 bill from overnight bloat
  • Single-platform testing: Context bloat in Slack not discovered until production
  • Incomplete context management: Fixed heartbeats but not session history = still bleeding tokens