Step 0: Measure Your Current Waste
Before/After Transformation
| ❌ BEFORE (Flying Blind) | ✅ AFTER (Data-Driven) |
|---|---|
| Don't know where tokens go | Exact token breakdown by category |
| Assume cost is just "normal" | Top 3 waste sources identified |
| No visibility into waste sources | Baseline for measuring improvement |
| Can't prioritize fixes | Clear prioritization roadmap |
The Problem
You can't optimize what you don't measure. Most OpenClaw users assume $50-150/month is "normal cost of AI." In reality, 80-95% is architectural waste you can eliminate in one afternoon.
Why This Matters
Matt's Discovery Story:
- Loaded $25 to Anthropic API
- Was on track to spend $20/day IDLE (doing nothing)
- Only found the waste by running token audit after hitting rate limits
- Without measurement, would have just kept burning money
Do Exactly This
- Access your OpenClaw logs directory (location varies by OS)
- Use AI prompt below to generate token audit script
- Run the script and save output to file
- Identify your top 3 waste sources:
- Context bloat (likely 50KB+ per operation)
- Session history (if using Slack/WhatsApp)
- Heartbeat API calls (every 30 min)
- Calculate your daily idle cost (API calls × token cost)
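The idle-cost arithmetic in the last step can be sketched in a few lines of Python. The ~4-characters-per-token ratio and the per-model prices are approximations for illustration, not values pulled from OpenClaw:

```python
# Rough idle-cost estimate: heartbeat calls x context tokens x model price.
# Prices ($ per 1M input tokens) and the 4-chars-per-token ratio are approximations.

PRICE_PER_M_INPUT = {"haiku": 0.25, "sonnet": 3.00, "opus": 15.00}

def daily_idle_cost(context_kb: float, heartbeats_per_day: int = 48,
                    model: str = "sonnet") -> float:
    """Estimate $/day burned by heartbeats that reload the full context."""
    tokens_per_call = context_kb * 1024 / 4          # ~4 characters per token
    daily_tokens = tokens_per_call * heartbeats_per_day
    return daily_tokens / 1_000_000 * PRICE_PER_M_INPUT[model]

# 50KB of context, 48 heartbeats/day, on Sonnet:
print(round(daily_idle_cost(50), 2))  # → 1.84
```

Even with conservative assumptions, a 50KB context on Sonnet lands in the ballpark of Matt's observed $2-3/day idle burn.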
Your Baseline Metrics
```
MY BASELINE METRICS (from audit):

Context size per operation: _____ KB
Daily heartbeat cost:       $_____
Session history size:       _____ KB
Top waste source:           ___________
Current daily idle cost:    $_____
Current model usage:        ___________
```
Matt's Numbers (Comparison)
```
MATT'S BASELINE:

Context size:         50KB+
Daily heartbeat cost: $2-3
Session history:      111KB (Slack)
Top waste:            Context bloat
Daily idle cost:      $3
Model:                100% Sonnet
```
🤖 AI Prompt Library: Token Audit
Prompt 1: Generate Token Audit Script
```
I'm running OpenClaw and need to audit my token usage to find waste.
Generate a Python script that:

1. READS: OpenClaw log files from [SPECIFY YOUR LOG DIRECTORY]
2. EXTRACTS: Token usage data by category
3. CALCULATES:
   - Context size per operation type
   - Session history loading frequency and size
   - Heartbeat token usage
   - Model routing distribution
   - Daily/weekly cost projections
4. OUTPUTS:
   - Summary dashboard with key metrics
   - Top 5 waste sources ranked by cost
   - Before/after projection if waste eliminated
   - CSV export for tracking over time
5. HANDLES:
   - Different log formats
   - Missing data gracefully
   - Rate limit errors separately

Generate a complete, documented script ready to run.
Include installation instructions for dependencies.
```
Prompt 2: Analyze Audit Results
```
I ran a token audit on my OpenClaw instance. Help me interpret the results
and prioritize fixes.

MY AUDIT RESULTS:
[Paste your audit output here]

Analyze and provide:

1. TOP 3 WASTE SOURCES:
   - Rank by cost impact
   - Estimate savings % if eliminated
   - Difficulty to fix (easy/medium/hard)
2. QUICK WIN IDENTIFICATION:
   - What can I fix in <30 min for biggest impact?
   - Which fixes are prerequisites for others?
3. PRIORITIZED FIX SEQUENCE:
   - Step 1: [Fix + expected savings]
   - Step 2: [Fix + expected savings]
   - Step 3: [Fix + expected savings]
4. RED FLAGS:
   - Any unusual patterns?
   - Signs of misconfiguration?
   - Potential security issues?
```
Quality Gates: What's "Good Enough" vs "Exceptional"?
✅ GOOD ENOUGH (Proceed to Step 1)
- Audit script runs successfully
- You have baseline daily token usage number
- You identified at least 1 waste source >20% of total
- You saved the audit output for comparison later
🌟 EXCEPTIONAL (Nice to Have)
- Automated daily audit reports
- Historical trend tracking
- Breakdown by sub-agent/task type
- Alert thresholds configured
Step 1: Eliminate Context Bloat (80% Savings)
Before/After Transformation
| ❌ BEFORE (Context Explosion) | ✅ AFTER (Selective Loading) |
|---|---|
| 50KB context loaded per message | ~5KB context per operation |
| 75KB after a few days, 100KB+ after a week | Context size stable (doesn't grow) |
| 2-3M tokens/day just from heartbeats | Zero tokens on heartbeats |
| $2-3/day sitting completely idle | $0/day idle cost |
The Problem
Every heartbeat (every 30 min), every message, every prompt loads ALL your context files. As your memory grows, this compounds: 50KB → 75KB → 100KB+. You're burning tokens just to keep the agent awake.
Why This Matters
Matt's Numbers:
- Initial context: 50KB per operation
- After fixes: 5KB per operation (90% reduction)
- Savings: $3/day → $0/day idle
- Time to implement: 15 minutes of config changes
This is the SINGLE highest-leverage fix. Do this first.
Do Exactly This
- Locate your config file:
```
Mac:     ~/.openclaw/config.json
Windows: %APPDATA%\OpenClaw\config.json
Linux:   ~/.config/openclaw/config.json
```
- Backup your current config (copy to safe location)
- Use AI prompt below to generate config modifications
- Apply the changes (edit config file)
- Restart OpenClaw and test with simple task
- Re-run token audit to verify reduction
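Steps 2-4 above can be sketched as a small backup-then-patch script. The `heartbeat.load_context` and `context_mode` keys here are hypothetical placeholders, not documented OpenClaw parameters; substitute whatever settings the AI prompt below generates for your version:

```python
import json
import shutil
from pathlib import Path

def patch_config(path: str) -> None:
    """Back up config.json, then apply (hypothetical) context-limiting keys."""
    cfg_path = Path(path).expanduser()
    backup = cfg_path.with_name(cfg_path.name + ".bak")
    shutil.copy2(cfg_path, backup)                  # step 2: keep a rollback copy
    cfg = json.loads(cfg_path.read_text())
    cfg.setdefault("heartbeat", {})["load_context"] = False  # hypothetical key
    cfg["context_mode"] = "selective"                        # hypothetical key
    cfg_path.write_text(json.dumps(cfg, indent=2))
```

The backup-first pattern matters more than the specific keys: if the agent breaks, restoring `config.json.bak` is your one-step rollback.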
Your Verification
```
BEFORE THIS FIX:
Context per operation: _____ KB
Daily idle cost:       $_____

AFTER THIS FIX:
Context per operation: _____ KB
Daily idle cost:       $_____

SAVINGS: _____%
```
Troubleshooting
Common Failures:
- Agent broke / won't complete tasks: Restore backup, use "preserve task context" variant prompt
- Still loading 30KB+ context: Check if session history is the culprit (Step 2)
- Rate limit errors: Implement pacing logic (covered in Step 4)
🤖 AI Prompt Library: Context Management
Prompt 1: Generate Context Management Config
```
I'm running OpenClaw and need to eliminate context bloat.

CURRENT SITUATION:
- Every heartbeat loads full context (~50KB+)
- Every message loads all context + history
- Context size growing: 50KB → 75KB → 100KB+

MY CURRENT CONFIG:
[Paste your config.json here OR say "using default config"]

GOAL: Generate modified config that:
1. Prevents context loading on heartbeats
2. Implements selective context loading by task type
3. Reduces typical operation from 50KB → <10KB
4. Maintains necessary context for task completion
5. Preserves agent functionality

OUTPUT:
- Complete modified config.json
- Explanation of each changed parameter
- Test procedure to verify context reduction
- Rollback instructions if something breaks

CRITICAL: Don't break the agent. Preserve task completion capability.
```
Prompt 2: Verify Context Reduction
```
I just modified my OpenClaw config to reduce context bloat.
Run this verification checklist:

1. AUDIT COMPARISON:
   Before config: [Your Step 0 context size]
   After config:  [Your new audit context size]
   Reduction:     ____%
2. FUNCTIONALITY TEST:
   - Can agent complete simple task? (Y/N)
   - Can agent access memory when needed? (Y/N)
   - Any error messages? [List them]
3. NEXT STEPS:
   - If reduction <50%: [What to check/fix]
   - If agent broke: [Rollback procedure]
   - If reduction >70%: [Proceed to Step 2]

Analyze my results and tell me if I'm good to proceed.

MY RESULTS:
[Paste audit comparison here]
```
📊 Real Example: Matt's Implementation
```
MATT'S IMPLEMENTATION:

Discovery:   Token audit showed 50KB context per heartbeat
Fix Applied: Selective context loading config
Test:        Simple task completion verified
Result:      50KB → 5KB (90% reduction)
Savings:     $3/day → $0/day idle
Time:        15 min config + 5 min test
```
Key Learning: Rate limits forced him to investigate. Most users never look at token breakdown and just accept the cost.
Quality Gates: What's "Good Enough" vs "Exceptional"?
✅ GOOD ENOUGH (Proceed to Step 2)
- Context size reduced by 50%+ (audit shows this)
- Agent still completes basic tasks correctly
- Daily idle cost dropped significantly
- You have backup config if rollback needed
🌟 EXCEPTIONAL (Optimization Round 2)
- Context size <10KB consistently
- Task-specific context rules configured
- 90%+ reduction from baseline
- Automated monitoring alerts if bloat returns
Step 2: Kill Session History Tax (Messaging Platforms)
Before/After Transformation
| ❌ BEFORE (History Explosion) | ✅ AFTER (Clean Sessions) |
|---|---|
| Slack loads entire chat history (111KB!) | "new session" command dumps history |
| Every message sends full history to API | History saved to memory (accessible if needed) |
| 1M+ tokens per prompt when using Slack | Normal token usage restored |
| Rate limit errors constantly | No more rate limit errors |
The Problem
If you use Slack or WhatsApp to communicate with OpenClaw, it's loading your ENTIRE conversation history on every API call. Matt found 111KB of session text being sent every time he prompted the agent.
✅ Slack (confirmed issue)
⚠️ WhatsApp (likely same issue)
Do Exactly This
- Verify you have this issue (token audit shows large session size)
- Use AI prompt below to generate "new session" command
- Integrate the command into your OpenClaw setup
- Test it by typing "new session" in Slack
- Verify dump worked (check memory storage, re-run audit)
- Make it a habit — type "new session" before expensive operations
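A minimal sketch of what such a command could do, assuming a hypothetical `SessionStore` wrapper around the chat buffer (the real integration point depends on how your OpenClaw build exposes the Slack/WhatsApp session):

```python
import json
import time
from pathlib import Path

class SessionStore:
    """Hypothetical stand-in for the Slack/WhatsApp session buffer."""

    def __init__(self, memory_dir: str = "~/.openclaw/memory"):
        self.buffer: list[dict] = []                    # in-flight chat messages
        self.memory_dir = Path(memory_dir).expanduser()

    def new_session(self) -> int:
        """Archive the buffer to memory, clear it, return bytes freed per call."""
        self.memory_dir.mkdir(parents=True, exist_ok=True)
        dump = json.dumps(self.buffer)
        archive = self.memory_dir / f"session-{int(time.time())}.json"
        archive.write_text(dump)                        # still recallable later
        self.buffer.clear()                             # next API call sends no history
        return len(dump.encode())
```

The key property: history is dumped to memory, not deleted, so the agent can still recall it on demand while every subsequent API call starts clean.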
When to Dump Session
```
TYPE "NEW SESSION" BEFORE:
- Running overnight batch jobs
- Expensive research tasks
- Multi-agent orchestration
- Any operation you want cost-optimized

KEEP SESSION WHEN:
- Ongoing conversation needs context
- Quick back-and-forth exchanges
- Currently debugging something
```
Matt's Usage Pattern
```
MATT'S APPROACH:

Morning:         "new session" (clean slate)
During work:     Keep session active
Before bed:      "new session" before overnight jobs
After big tasks: "new session" to prevent bloat

Result: Never hits rate limits anymore
Session bloat: 111KB → 0KB consistently
```
🤖 AI Prompt Library: Session Cleanup
Prompt 1: Generate Session Cleanup Command
```
I'm running OpenClaw with Slack integration and it's loading my entire
Slack history (111KB) on every message.

CREATE: A custom OpenClaw command called "new_session" that:

FUNCTIONALITY:
1. Dumps current Slack/WhatsApp session buffer
2. Saves dumped content to agent memory (accessible for recall)
3. Clears the session buffer completely
4. Confirms cleanup completed with token savings estimate

TECHNICAL REQUIREMENTS:
- Callable by typing "new session" in Slack
- Memory format allows selective recall if needed
- No loss of critical information
- Immediate effect on next API call
- Logs the action for audit trail

OUTPUT:
- Complete command code (ready to integrate)
- Installation/integration instructions for OpenClaw
- Usage examples and best practices
- Verification method (how to confirm it worked)
- Expected token savings per operation after cleanup

PLATFORM: [Slack / WhatsApp / Other]
```
Prompt 2: Session History Audit
```
Help me verify session history is my token waste culprit.

MY TOKEN AUDIT SHOWS:
[Paste relevant audit sections]

Analyze:
1. Is session history loading the issue?
   - Expected indicators: [What to look for in logs]
   - Platform comparison: Does web UI show same pattern?
2. Estimated impact:
   - Current waste from session history: $___/day
   - Potential savings if fixed: $___/day
3. Verification procedure:
   - How do I confirm session dump worked?
   - What should audit show after fix?
4. Alternative causes:
   - If it's NOT session history, what else could cause this pattern?
```
📊 Real Example: Matt's Session Discovery
```
MATT'S SESSION HISTORY DISCOVERY:

Problem:       Hitting rate limits (429 errors) constantly
Investigation: Compared web UI vs Slack token usage
Finding:       Slack used 1M+ tokens, web UI used 50K for the SAME task
Root cause:    111KB session history blob in every Slack API call
Fix:           Created "new session" command
Result:        Rate limits eliminated, token usage normalized
Time:          20 min to build command, 2 min to use
```
Critical Insight: Would never have found this without rate limits forcing investigation. Silent killer for Slack users.
Quality Gates
✅ GOOD ENOUGH (Proceed to Step 3)
- "new session" command works in Slack/WhatsApp
- Token audit shows session size dropped to near-zero
- No more rate limit errors
- Memory recall still functional when needed
Step 3: Heartbeats to Ollama (Zero API Cost)
Before/After Transformation
| ❌ BEFORE (Paying to Stay Awake) | ✅ AFTER (Local Heartbeats) |
|---|---|
| Heartbeat every 30 minutes via API | Heartbeat runs on Ollama (local, free) |
| Using Opus: $5/day idle | $0/day heartbeat cost (any model) |
| Using Sonnet: $2-3/day idle | Infinite heartbeat frequency if wanted |
| Using Haiku: $0.50/day idle | Same functionality, zero cost |
The Problem
Heartbeats keep your agent alive and checking for active tasks. But using Opus/Sonnet/Haiku for this is like hiring a neurosurgeon to take your temperature. It's complete overkill.
Why This Matters
Heartbeat Economics:
- Frequency: Every 30 minutes (48/day)
- Task: Check memory, check task queue, report status
- Complexity: Literally just "system okay?" level logic
- Cost on Opus: ~$5/day for brainless pings
- Cost on Ollama: $0 forever
This is FREE money.
Do Exactly This
- Install Ollama (local LLM runtime, open source, free):
  - Download from ollama.ai and install the latest version
  - Verify it runs: `ollama --version`
- Add Ollama to OpenClaw config using prompt below
- Update heartbeat routing to use Ollama
- Test heartbeat manually (trigger one, verify it works)
- Run overnight and verify zero API calls in audit
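As a sketch, the heartbeat check itself can be a call to Ollama's local HTTP API (the `/api/generate` endpoint on port 11434 is Ollama's documented default); how your OpenClaw config actually routes heartbeats there is setup-specific:

```python
import json
import urllib.request

def heartbeat_payload(model: str = "llama2") -> dict:
    """Request body for Ollama's local /api/generate endpoint."""
    return {"model": model,
            "prompt": "Reply with the single word OK.",
            "stream": False}                 # one JSON response, not a stream

def ollama_heartbeat(model: str = "llama2",
                     url: str = "http://localhost:11434/api/generate") -> str:
    """Ping the local Ollama server; costs $0 regardless of frequency."""
    data = json.dumps(heartbeat_payload(model)).encode()
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"].strip()
```

Because the call never leaves your machine, you can run it every 30 minutes or every 30 seconds; the cost is identical.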
Perfect for Ollama (Free)
- ✅ Heartbeats (system health checks)
- ✅ File organization (moving, renaming)
- ✅ CSV compilation (merging data)
- ✅ Folder structure (creating directories)
- ✅ Basic text formatting
- ✅ Log file parsing
= Brainless operations with zero reasoning required
Still Use API For
- 🔵 Web research (Haiku)
- 🔵 Email writing (Sonnet)
- 🔵 Code generation (Sonnet)
- 🟣 Strategic reasoning (Opus)
- 🟣 Complex analysis (Opus)
= Anything requiring actual intelligence
🤖 AI Prompt Library: Ollama Integration
Prompt 1: Configure Ollama Integration
```
I want to add Ollama (local LLM) to my OpenClaw setup to eliminate API
costs for brainless operations.

CURRENT SETUP:
[Paste your current OpenClaw config.json]

GOAL: Generate config modifications to:

1. ADD OLLAMA:
   - Model: [your installed Ollama model, e.g. llama2]
   - Use for: heartbeats, file_ops, csv_compile, folder_structure
   - Cost tier: FREE
   - Routing: Default for "brainless" task category
2. UPDATE HEARTBEAT:
   - Route all heartbeats to Ollama
   - Remove API calls entirely
   - Maintain check frequency (30 min)
   - Preserve functionality
3. TESTING:
   - How to verify Ollama is handling heartbeats
   - How to confirm zero API calls
   - Fallback if Ollama fails

OUTPUT:
- Complete modified config.json
- Ollama installation verification steps
- Integration test procedure
- Troubleshooting common issues

Make it copy-paste ready.
```
Prompt 2: Identify "Brainless" Operations
```
Help me identify which of my OpenClaw operations should run on Ollama
(free) vs API models (paid).

MY TYPICAL TASKS:
[List your common agent tasks here]

For each task, classify as:

1. BRAINLESS (Ollama - Free):
   - No reasoning required
   - File/data manipulation only
   - Pattern matching at most
2. LOW-COMPLEXITY (Haiku - $0.25/1M):
   - Basic research/collection
   - Simple formatting with light intelligence
3. MEDIUM-COMPLEXITY (Sonnet - $3/1M):
   - Writing, email drafting
   - Code generation
4. HIGH-COMPLEXITY (Opus - $15/1M):
   - Strategic reasoning
   - Novel problem solving

Then estimate my cost savings from routing everything optimally.
```
📊 Real Example: Matt's Ollama Implementation
```
MATT'S OLLAMA IMPLEMENTATION:

Problem:  Spending $2-3/day on heartbeats alone (Sonnet)
Solution: Installed Ollama, routed heartbeats to local
Test:     Let it run overnight, checked logs
Result:   Zero API calls for heartbeats confirmed
Savings:  $2-3/day → $0/day on heartbeats

Extended Use Cases Matt Found:
- File organization during overnight jobs (14 sub-agents)
- CSV compilation from multiple sources
- Folder structure creation
- Log parsing and cleanup

Total Ollama Usage: ~15% of all operations
Total Savings: These operations would have cost $20-30/month
Actual Cost: $0 forever

Installation Time: 10 min
Config Time: 15 min
ROI: Infinite (free forever)
```
Quality Gates
✅ GOOD ENOUGH (Proceed to Step 4)
- Ollama installed and running
- Heartbeats routed to Ollama successfully
- Token audit shows zero API calls for heartbeats
- Agent still functions normally
Step 4: Route by Task Complexity (15% Additional Savings)
Before/After Transformation
| ❌ BEFORE (Single Model Waste) | ✅ AFTER (Smart Routing) |
|---|---|
| 100% operations on Sonnet ($3/1M tokens) | 15% Ollama (free) — brainless ops |
| OR 100% on Opus ($15/1M tokens) | 75% Haiku ($0.25/1M) — data collection |
| Brainless tasks using expensive AI | 10% Sonnet ($3/1M) — writing/code |
| No escalation logic | <1% Opus ($15/1M) — strategic only |
| Monthly: $90-150 | Monthly: $3-10 |
The Problem
Using Opus for everything is like hiring a Fortune 500 CEO to organize your filing cabinet. The work gets done, but you're burning massive money on overkill.
Task Complexity Distribution:
- 85% of operations: Brainless or low-complexity
- 10% of operations: Medium complexity
- 5% of operations: High complexity
But default setups use ONE model for everything.
Cost Comparison for 1M Operations
| Routing Strategy | Cost | Savings vs Opus |
|---|---|---|
| All Opus | $15,000 | — |
| All Sonnet | $3,000 | 80% |
| All Haiku | $250 | 98% |
| Optimized routing | $300-500 | 96-98% |
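The table above can be reproduced with simple arithmetic, assuming roughly 1,000 tokens per operation (so 1M operations is about 1B tokens):

```python
# Prices are $ per 1M tokens; assume ~1,000 tokens per operation,
# so 1M operations is roughly 1B tokens total.
TOKENS = 1_000_000 * 1_000

def fleet_cost(price_per_m_tokens: float) -> int:
    return round(TOKENS / 1_000_000 * price_per_m_tokens)

print(fleet_cost(15.00))   # all Opus   → 15000
print(fleet_cost(3.00))    # all Sonnet → 3000
print(fleet_cost(0.25))    # all Haiku  → 250

# Blended rate for the Step 4 mix (15% Ollama/free, 75% Haiku, 10% Sonnet, 1% Opus).
# Prompt caching on repeated operations is what pushes this toward the
# table's $300-500 optimized figure.
blended = 0.75 * 0.25 + 0.10 * 3.00 + 0.01 * 15.00
print(fleet_cost(blended))  # ≈ 638 before caching
```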
Do Exactly This
- Map your task types to complexity tiers
- Use AI prompt to generate 4-tier routing config
- Implement escalation logic (Haiku → Sonnet → Opus)
- Test each tier with representative tasks
- Run overnight batch job and analyze routing distribution
- Adjust thresholds based on results
4-Tier Routing Framework
```
TIER 0 - OLLAMA (Free):
  Heartbeats, file ops, CSV work, folder management

TIER 1 - HAIKU ($0.25/1M):
  Web scraping, data collection, list building, basic formatting
  → Escalate to Sonnet if blocked

TIER 2 - SONNET ($3/1M):
  Writing, email drafting, code generation, research synthesis
  → Escalate to Opus if blocked

TIER 3 - OPUS ($15/1M):
  Strategic reasoning, novel problems, complex logic
  → Final tier (no escalation)
```
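The framework above reduces to a small routing function. The one-step escalation chain (Haiku → Sonnet → Opus, with Ollama failures only logged) mirrors the tiers; the specific task names are illustrative:

```python
TIERS = ["ollama", "haiku", "sonnet", "opus"]      # cheapest -> priciest
TASK_TIER = {
    "heartbeat": "ollama", "file_ops": "ollama", "csv_compile": "ollama",
    "web_scrape": "haiku", "data_collection": "haiku", "list_building": "haiku",
    "writing": "sonnet", "email_draft": "sonnet", "code_gen": "sonnet",
    "strategic_reasoning": "opus", "novel_problems": "opus",
}

def route(task: str, failed_on: str = "") -> str:
    """Pick the cheapest viable model; escalate one tier after a failure."""
    if not failed_on:
        return TASK_TIER.get(task, "haiku")        # Haiku is the default tier
    idx = TIERS.index(failed_on)
    if failed_on == "ollama" or idx + 1 >= len(TIERS):
        return failed_on                           # no escalation path: retry/log
    return TIERS[idx + 1]

print(route("web_scrape"))                      # → haiku
print(route("web_scrape", failed_on="haiku"))   # → sonnet
```

Keeping escalation to a single step at a time prevents a transient Haiku failure from silently promoting an entire batch job to Opus prices.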
Matt's Actual Distribution
```
MATT'S ACTUAL USAGE:

Ollama: 15%  (brainless ops)
Haiku:  75%  (data collection)
Sonnet: 10%  (writing/email)
Opus:   <1%  (strategic only)

OVERNIGHT JOB BREAKDOWN:
- 14 sub-agents running
- 10 agents: Haiku (web scraping)
- 2 agents:  Sonnet (writing emails)
- 2 agents:  Ollama (file organization)
- 0 agents:  Opus (not needed)

Cost: $6 for 6 hours = $1/hour
```
🤖 AI Prompt Library: Multi-Model Routing
Prompt 1: Generate Multi-Model Routing Config
```
I want to set up 4-tier model routing in OpenClaw to automatically use the
cheapest viable model for each task.

CURRENT SETUP:
[Paste your config.json]

MY TASK BREAKDOWN:
[Describe your common operations and their complexity]

GOAL: Create complete multi-model routing configuration:

TIER 0 - OLLAMA (Free):
- Tasks: heartbeats, file_ops, csv_compile, folder_structure
- Escalation: None (if fails, log error)

TIER 1 - HAIKU ($0.25/1M):
- Tasks: data_collection, web_scrape, list_building, basic_formatting
- Escalation: Sonnet (if blocked or error)
- Default tier: Use unless explicitly routed elsewhere

TIER 2 - SONNET ($3/1M):
- Tasks: writing, email_draft, code_gen, research_synthesis
- Escalation: Opus (if blocked or error)

TIER 3 - OPUS ($15/1M):
- Tasks: strategic_reasoning, complex_logic, novel_problems
- Escalation: None (final tier)

OUTPUT:
- Complete routing config (JSON)
- Task classification rules
- Escalation logic implementation
- Test procedure for each tier
- Expected cost distribution

Make it production-ready and well-documented.
```
Prompt 2: Sub-Agent Orchestration
```
I want to create a sub-agent orchestration system for complex overnight
batch jobs.

USE CASE:
[Describe your overnight job - e.g., B2B lead gen]

REQUIREMENTS:
- Spin up multiple specialized sub-agents
- Each routed to appropriate model tier
- Parallel execution where possible
- Results aggregated and organized
- Target: <$2/hour for full operation

DESIGN:
1. Agent Specialization:
   - How many sub-agents needed?
   - What does each specialize in?
   - Which model tier per agent?
2. Coordination:
   - Master agent responsibilities
   - Sub-agent communication
   - Error handling and escalation
3. Cost Optimization:
   - Task distribution for minimal cost
   - Cached token maximization
   - Parallel vs sequential trade-offs

OUTPUT:
- Sub-agent architecture diagram
- Task distribution algorithm
- Complete implementation code
- Test procedure (100-item trial)
- Scaling guidelines (1,000+ items)
```
📊 Real Example: Matt's 6-Hour Overnight Job
Task: 1,000 Qualified B2B Leads
```
Agents 1-10 (Haiku):
- Web scraping distressed business signals
- Reading blogs and finding contact info
- Using Brave Search API + Hunter.io
- 4 hours runtime

Agents 11-12 (Sonnet):
- Writing personalized cold outreach emails
- Creating follow-up sequences
- 1.5 hours runtime

Agents 13-14 (Ollama):
- Organizing files into folders
- Compiling CSVs with proper headers
- 0.5 hours runtime

RESULTS:
Total Cost: $6 for 6 hours
Per-hour:   $1
Per-lead:   $0.006

If run on Opus only:   $150
If run on Sonnet only: $30
Savings: 96% vs Opus, 80% vs Sonnet
```
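The claimed economics check out with basic arithmetic:

```python
# Back-of-envelope check on the batch-job numbers above.
leads, total_cost, hours = 1_000, 6.0, 6.0

print(total_cost / hours)       # $/hour → 1.0
print(total_cost / leads)       # $/lead → 0.006
print(1 - total_cost / 150)     # savings vs all-Opus → 0.96
print(1 - total_cost / 30)      # savings vs all-Sonnet → 0.8
```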
Key Insight: 95% of tokens were CACHED (repeated operations), further reducing cost. Caching + multi-model routing = production economics.
Quality Gates
✅ GOOD ENOUGH (Production Ready)
- 4-tier routing configured and functional
- Escalation logic tested and working
- 70%+ operations on Haiku or Ollama
- Successfully completed multi-hour batch job
- Cost tracking shows expected distribution
🌟 EXCEPTIONAL (Optimized)
- 85%+ on Haiku/Ollama
- Sub-agent orchestration working
- Cached token ratio >80%
- Custom routing rules per use case
- Automated cost alerts if distribution drifts
Validate Your Optimization: Final Token Audit
Complete Optimization Verification
```
✅ COMPLETE OPTIMIZATION VERIFICATION

Run final token audit and verify:

☐ Daily idle cost = $0 (no API calls during heartbeats)
☐ Context size <10KB per operation (down from 50KB+)
☐ Session cleanup working (if using Slack/WhatsApp)
☐ Ollama handling heartbeats + brainless ops (zero API calls)
☐ Multi-model routing active (80%+ on Haiku/Ollama in logs)
☐ Escalation logic tested (trigger escalation, verify it works)
☐ Overnight batch job completed at <$2/hour
☐ Agent functionality unchanged (same output quality)
☐ Cost reduction >90% from baseline
☐ You understand maintenance (can re-run audit monthly)
```
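The numeric gates in the checklist can be checked mechanically. A sketch, with thresholds taken from the items above:

```python
def passes_gates(idle_cost_per_day: float, context_kb: float,
                 cheap_tier_share: float, reduction: float) -> list[str]:
    """Return the list of failed gates (empty list = fully optimized)."""
    failures = []
    if idle_cost_per_day > 0:
        failures.append("idle cost must be $0/day")
    if context_kb >= 10:
        failures.append("context must be <10KB per operation")
    if cheap_tier_share < 0.80:
        failures.append("80%+ of ops must run on Haiku/Ollama")
    if reduction <= 0.90:
        failures.append("cost reduction must exceed 90%")
    return failures

# Matt's post-optimization numbers: $0 idle, 5KB context, 90% cheap-tier, 97% cut
print(passes_gates(0, 5, 0.90, 0.97))  # → []
```

Feeding in the numbers from your final audit gives an instant pass/fail instead of eyeballing the checklist.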
Your Final Numbers
```
YOUR FINAL NUMBERS:

Before optimization: $_____ /day
After optimization:  $_____ /day
Reduction:           _____ %
Monthly savings:     $_____

CAPABILITIES UNLOCKED:
☐ Overnight batch processing economically viable
☐ Sub-agent orchestration affordable
☐ Can run 24/7 for ~$1/hour
☐ Production lead gen operational
```
Implementation Resources & Next Steps
Essential Resources
Optimization Quick Reference
| Fix | Savings | Time |
|---|---|---|
| Context Bloat | 80% | 30 min |
| Session History | Variable | 20 min |
| Heartbeats → Ollama | $2-5/day | 25 min |
| Multi-Model Routing | 15%+ | 45 min |
| Combined | 97% | ~2.5 hrs |
Dependency Map
```
OPTIMIZATION DEPENDENCY MAP:

Step 0 (Token Audit) ← REQUIRED FIRST (need baseline)
│
├─ Step 1 (Context Bloat) ← HIGHEST IMPACT, do next
│  │
│  ├─ Step 2 (Session History) ← Optional: Slack/WhatsApp only
│  │
│  └─ Step 3 (Heartbeats) ← Independent, can do after Step 1
│     │
│     └─ Step 4 (Multi-Model) ← Requires Steps 1+3 complete
│
└─ Validation ← After all steps complete
```