Setting Up OpenClaw: A Field Guide to Cost Traps and Silent Failure Fixes
This is the practical companion to Building an OpenClaw Multi-Agent AI System.
That post tells the story — what I built, what broke, what it cost. This guide tells you what to do about it: 12 operational rules distilled from eight days of AI-assisted debugging, written for solo operators who want results without accidentally publishing secrets or turning their system into a public attack surface.
If you haven't read the story, this guide still works standalone. But the failures hit harder with context.
If you came from the story...
Here's where each crisis from the narrative maps to rules in this guide:
"The morning was a lie" (silent fallback) → Rule #1: Model transparency
USER.md truncation (Clive didn't know who I was) → Rule #2: Budget bootstrap
"The token crisis" ($50.80 spent per $1 of output) → Rule #2 + Rule #4: Bootstrap budget + On-demand loading
"A second heartbeat" (Quill responded as Clive) → Rule #7: Agent-specific AGENTS.md + Rule #8: Discord hygiene
Model toggle failure (Opus review that wasn't) → Rule #1: Verify via logs, not UI
Hallucinated pipeline entry → Rule #12: Session poisoning
If you're here to fix a specific problem, use the rule numbers above to jump straight there.
What this guide doesn't include
To avoid making it easier to compromise a real system, I'm not publishing:
API keys, tokens, or anything credential-shaped
Discord guild IDs, channel IDs, user IDs, or bot identifiers
Machine paths, mount points, usernames, or host/port layouts
Copy/pasteable proxy code or network configs
You'll still get the operational rules, the failure modes, and the verification steps — the parts that transfer to any setup.
How these rules were discovered
Every lesson in this guide came from a debugging session with Claude Opus 4.6 via Perplexity.
The workflow was always the same:
I noticed something was broken or felt wrong (usually by reading logs or seeing weird behavior)
I copied the evidence and pasted it into Perplexity
Claude asked clarifying questions — "Show me your config," "Show me the bootstrap logs," "What does the cost dashboard say?"
I pasted more files as Claude requested them
Claude proposed a hypothesis and a test
I ran the test and pasted results back
Claude refined the diagnosis and suggested a fix
I applied the fix and tested it
When it worked, I asked Claude to create a summary log of what we'd learned
I called these "context transfer logs" — distilled summaries of the root cause and the fix. When I hit a similar problem days later, I'd paste the old log into a new thread: "We fixed this before — help me remember what we did."
The magic wasn't instant diagnosis. It was patient, methodical troubleshooting with a partner who never got frustrated asking "Can you show me?" And the discipline to document it so future-me could benefit too.
This isn't a guide I wrote by reading documentation. It's a survival manual compiled from eight days of AI-assisted debugging sessions, with AI-generated summaries serving as my operational memory.
Start here if you're overwhelmed
If you do nothing else, implement these five rules and the verification checklist:
Rule #1: Model transparency — Verify which model actually ran
Rule #2: Budget bootstrap — Prevent identity file truncation
Rule #4: On-demand loading — Stop paying to reload the same files
Rule #8: Discord hygiene — One channel, one bot, one agent
Verification checklist — How to check when something feels off
Start there. Add the rest as you hit the problems they solve.
The prime rule: treat silence as a bug
OpenClaw will often keep running even when something critical is wrong. If a failure can be silent, it eventually will be.
Design your organism around one assumption:
If you can't verify it, you can't trust it.
File map: what these files do
This guide references several operational files. Here's what each one is:
AGENTS.md — Your agent's operating manual (loaded at session start). Contains rules, behaviors, and protocols the agent follows every session. This is the most important file in your setup.
USER.md — Your identity, mission, preferences, and business context
IDENTITY.md — The agent's role, authority levels, and domain scope
SOUL.md — The agent's personality and communication style
SKILLSREGISTRY.md — Canonical list of available skills (names, commands, what they do)
Bootstrap files — Any file loaded automatically at session start (AGENTS.md, USER.md, IDENTITY.md, SOUL.md)
On-demand files — Files the agent loads only when needed (project trackers, pipelines, archives)
You don't need all of these on day one. Start with AGENTS.md (operating rules) and USER.md (who you are), then add the rest as your system grows.
The 12 Rules
These are the 12 highest-value lessons from my build logs — the ones that are broadly applicable and safe to publish.
1) Make model transparency mandatory (no silent fallback)
Failure mode
You think you're on your paid cloud model. You're not. Quality drops, costs look weird, and hallucinations become "plausible work."
Rule
At the start of every session, the agent must state:
What model it is running on (exact name)
Whether it is running on a fallback model
Whether its response is based on files it actually loaded vs. "general reasoning"
Add this to your AGENTS.md as a startup rule: "At the start of every session, state your active model. If you're uncertain, say so explicitly — never guess."
Critical warning: Agents can be wrong about what model they're on. The agent doesn't have access to the execution layer — it reports what it "thinks" based on context. Verification must come from logs or billing records, not from asking the agent.
Verify
Use the platform's logs or billing dashboard to confirm the model matches what the agent claims.
If the system can silently route to a different model, assume it will — until you prove otherwise.
After any session where the model matters (strategic reviews, anything important), check your API billing dashboard. Not the UI. Not by asking the agent. The billing logs — they're the only thing that can't lie.
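That billing-log check can be scripted. A minimal sketch, assuming your provider gives you a JSON-lines usage export where each request record carries a "model" field — the field name, model names, and export format here are assumptions, so map them onto whatever your dashboard actually exports:

```python
import json

def models_used(usage_jsonl: str) -> dict:
    """Count requests per model in a billing export.

    Assumes a JSON-lines export with a "model" field per record;
    adjust the field name to your provider's actual format.
    """
    counts = {}
    for line in usage_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        model = record.get("model", "<unknown>")
        counts[model] = counts.get(model, 0) + 1
    return counts

# Illustrative export: model names here are placeholders.
export = "\n".join([
    '{"model": "claude-opus-4", "input_tokens": 1200}',
    '{"model": "claude-haiku-3", "input_tokens": 300}',
    '{"model": "claude-haiku-3", "input_tokens": 450}',
])
print(models_used(export))  # -> {'claude-opus-4': 1, 'claude-haiku-3': 2}
# If a model you never asked for shows up here, you have silent fallback.
```

The point of the script is the habit, not the code: the agent's self-report never enters the check.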
See: Silent Fallback and Logs > UI in the main article.
2) Budget bootstrap explicitly (avoid silent truncation)
Failure mode
Your identity and context files load "successfully" but are silently truncated due to a bootstrap character limit. Your agent "knows" your name and nothing else.
Rule
Treat bootstrap as a resource with a strict budget. Set an explicit bootstrapTotalMaxChars (or the equivalent in your OpenClaw config) — don't trust defaults.
In my case, the platform silently defaulted to ~24,000 characters despite documentation claiming 150,000. My operating manual consumed 69% of that budget before my identity file ever loaded — leaving it with 221 usable characters out of 5,458. Four percent.
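You can simulate the budget before the platform silently spends it for you. A sketch using the four bootstrap files from the file map and the ~24K limit I hit — the file list and budget value are illustrative, so substitute your own configured bootstrapTotalMaxChars:

```python
from pathlib import Path

# Illustrative values: use your own configured limit, not these.
BOOTSTRAP_FILES = ["AGENTS.md", "USER.md", "IDENTITY.md", "SOUL.md"]
BUDGET = 24_000  # chars; the silent default I hit, not the documented 150K

def bootstrap_report(workspace: str) -> list:
    """Return (filename, size, chars_remaining_after) in load order."""
    remaining = BUDGET
    report = []
    for name in BOOTSTRAP_FILES:
        path = Path(workspace) / name
        size = len(path.read_text(encoding="utf-8")) if path.exists() else 0
        remaining -= size
        report.append((name, size, remaining))
    return report

for name, size, remaining in bootstrap_report("."):
    status = "OK" if remaining >= 0 else "TRUNCATION LIKELY"
    print(f"{name}: {size} chars, {remaining} left -> {status}")
```

Run it after every edit to a bootstrap file. A negative "remaining" on any line means everything after that point loads truncated — exactly the failure that left my identity file with 221 characters.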
Verify
Start a clean session and check the logs for any truncation warnings.
The quick test: ask the agent to summarize your mission and top three priorities in three bullets. If it gets them wrong or stays vague, bootstrap isn't loading correctly. Run this test after every major config change.
See: Bootstrap Truncation in the main article.
3) AGENTS.md is a budget hog — keep it lean
Failure mode
AGENTS.md grows into a novel, and every file that loads after it gets starved or truncated.
Rule
AGENTS.md should be "operational law," not your entire knowledge base. Target: under 10,000 characters. Mine started at 15,800 and left no room for the identity and context files that followed.
Practical pattern:
Keep AGENTS.md short and directive — startup rules, behavioral guardrails, file loading order.
Move long references (project trackers, deployment checklists, content pipelines, historical context) into separate files that load on-demand only.
When you add a new policy, ask whether an old one can be removed or shortened. Maintain the cap.
Verify
Check AGENTS.md character count periodically. If it's creeping above your target, prune.
After pruning, run the bootstrap test from Rule #2. The goal is to confirm that later-loading files (USER.md, IDENTITY.md) are getting their full character budget — not 4% of it.
4) Use on-demand loading for operational files
Failure mode
You reload a large pile of files every session, paying to "remind the agent" of things it already knew five minutes ago.
Rule
Only auto-load what you must. Everything else loads conditionally:
Always load: core identity, operating rules, "how to work with you"
Load on demand: trackers, registries, project boards, deep histories, content pipelines
Never auto-load: human reference archives, old threads, long transcripts
Structure your AGENTS.md with explicit loading tiers: "Always load steps 1–5. Load steps 6–10 only when the topic is relevant."
Verify
Watch your input-to-output ratio. If it's poor, you're probably paying for "reading," not "doing."
Check your API billing: if input tokens consistently dwarf output tokens (I was at ~70% input during my build phase), you're likely reloading context you don't need.
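The ratio check can run straight off the same billing export. A sketch, again assuming a JSON-lines format with "input_tokens" and "output_tokens" fields per record (adjust the names to your provider's export):

```python
import json

def token_ratio(usage_jsonl: str) -> float:
    """Fraction of total tokens that were input: context you paid to reload.

    Assumes records carry "input_tokens"/"output_tokens" fields;
    adjust the field names to your provider's format.
    """
    inp = out = 0
    for line in usage_jsonl.splitlines():
        if line.strip():
            rec = json.loads(line)
            inp += rec.get("input_tokens", 0)
            out += rec.get("output_tokens", 0)
    return inp / (inp + out) if (inp + out) else 0.0

# Illustrative numbers, not real usage data.
export = "\n".join([
    '{"input_tokens": 9000, "output_tokens": 400}',
    '{"input_tokens": 5000, "output_tokens": 1600}',
])
ratio = token_ratio(export)
print(f"input share: {ratio:.0%}")  # a share around 70%+ suggests reloading
```

Track the number over time rather than reacting to one session; the trend tells you whether on-demand loading is actually working.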
5) Separate execution and review models — and verify the review ran
Failure mode
You believe you got a premium strategic audit. The model toggle failed silently. You paid for a model to review itself.
Rule
Treat "review" as its own discrete, verifiable step:
Execution model produces the deliverable
Review model audits it
Execution model revises based on the audit
Each step is distinct. Each step is verified.
Verify
Confirm the intended review model ran by checking billing logs for that time window.
Save review notes as an artifact (not just in chat) — this creates an audit trail and makes it obvious when a "review" is just a rubber stamp.
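The three-step pipeline can be audited mechanically. A hypothetical sketch: record the intended model for each step, then line the steps up against the models your billing log shows for that window — the pairing-by-order assumption and all names here are illustrative:

```python
def audit_pipeline(steps, billing_models):
    """Compare intended model per step against what billing shows.

    steps: list of (step_name, intended_model) pairs.
    billing_models: models from the billing log, in request order.
    Assumes one billed request per step, in the same order.
    """
    mismatches = []
    for (name, intended), actual in zip(steps, billing_models):
        if intended != actual:
            mismatches.append((name, intended, actual))
    return mismatches

# Placeholder model names; the review toggle has silently failed here.
steps = [("draft", "claude-sonnet-4"),
         ("review", "claude-opus-4"),
         ("revise", "claude-sonnet-4")]
billing = ["claude-sonnet-4", "claude-sonnet-4", "claude-sonnet-4"]
print(audit_pipeline(steps, billing))
# A non-empty list means a "review" step never ran on the review model.
```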
See: Logs > UI in the main article.
6) Use explicit model routing — never let the agent decide
Failure mode
A weaker model handles something that requires judgment — security decisions, business strategy, irreversible edits — because it concluded the task was "simple enough." The output looks helpful but is wrong in subtle ways.
Rule
Use explicit routing rules and whitelists. The agent cannot expand its own privileges or self-assign to a cheaper model.
Safe baseline:
Cheap/local models: data pulls, formatting, straightforward file operations (with verification)
Strong models: judgment, synthesis, security, anything irreversible
Verify
Any task that touches credentials, permissions, routing, or automation: require strong-model review.
The whitelist expands based on your observed performance, not the model's self-assessment.
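The whitelist can be as small as a dict that fails closed. A sketch with made-up task types and model labels — the key property is that anything not explicitly whitelisted for the cheap model routes to the strong one, never the other way around:

```python
# Hypothetical whitelist: only these task types may use the cheap model.
ROUTING = {
    "data_pull": "cheap-local",
    "formatting": "cheap-local",
    "file_ops": "cheap-local",
    "security_review": "strong-cloud",
    "strategy": "strong-cloud",
}

def route(task_type: str) -> str:
    """Fail closed: unlisted task types go to the strong model by default."""
    return ROUTING.get(task_type, "strong-cloud")

print(route("formatting"))        # cheap-local: explicitly whitelisted
print(route("delete_old_repos"))  # strong-cloud: unknown, so fail closed
```

The agent never calls route() on its own judgment of "simple enough" — the task type comes from you or from a fixed classifier, not from the model's self-assessment.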
7) Every non-primary agent needs its own scoped AGENTS.md
Failure mode
A secondary agent inherits the main agent's operating manual and behaves like the wrong character — with the wrong priorities, personality, and permissions.
Rule
Each agent gets:
Its own minimal AGENTS.md scoped to its role
Its own explicit "what I can touch / what I cannot touch" boundaries
Its own SOUL.md that's distinct from the primary agent
Without a scoped AGENTS.md, a secondary agent will load the primary's — and you'll get your research manager responding with your executive's persona, priorities, and authority level.
Verify
Ask each agent: "State your role, your allowed tools, and your scope."
If two agents answer identically, you have a bootstrapping leak.
See: Discord Routing in the main article.
8) Discord multi-agent hygiene: one channel belongs to one agent
Failure mode
Two bots receive messages in the same channel and the wrong one responds — or they race each other.
Rule
Design Discord routing like production infrastructure:
One channel → one bot account → one agent binding
Avoid "catch-all" bindings for anything important
Each agent gets its own Discord bot account, its own channel, and its own config binding
Verify
Send a test message to each channel. You should get exactly one responder, every time.
If you ever get a response from the wrong agent, check channel bindings immediately — don't assume it'll sort itself out.
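The binding check can also run as a script over your gateway config. A sketch, assuming you can extract (channel, agent) pairs from wherever your bindings live — the pair format and the agent names (borrowed from the story) are illustrative:

```python
def binding_conflicts(bindings):
    """Return channels bound to more than one agent.

    bindings: list of (channel, agent) pairs pulled from gateway config.
    """
    seen = {}
    for channel, agent in bindings:
        seen.setdefault(channel, set()).add(agent)
    return {ch: agents for ch, agents in seen.items() if len(agents) > 1}

# Illustrative bindings: "ops" is double-bound, which is the bug.
bindings = [("ops", "clive"), ("research", "quill"), ("ops", "quill")]
print(binding_conflicts(bindings))
# An empty dict is the only acceptable result.
```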
9) Verify your automation actually runs (not just "succeeds")
Failure mode
Your automated tasks (heartbeats, cron jobs, scheduled checks) execute on schedule and report "OK" — but they aren't loading workspace context, aren't reading the right files, and aren't actually performing the checks you designed.
This is subtler than a failure. The script runs. The cron job fires. The exit code is 0. But the agent inside that execution either loaded the wrong context, hit a silent error, or responded with a generic "all clear" because it didn't have enough information to check anything real.
In my case, heartbeat scripts ran for days returning "OK" while never loading the workspace files that contained the actual checks. The script succeeded. The heartbeat didn't.
Rule
If OpenClaw has a first-class heartbeat system, use it. Native scheduling typically:
Runs with correct agent context
Can suppress "all clear" spam
Plays nicer with caching and long-lived sessions
If you must use scripts, build verification into them: log which files were loaded, what checks were performed, and what the results were — not just "success/fail."
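The script-side verification can be sketched as a heartbeat that records what it checked, not just that it ran. The check names below are placeholders; wire the callables into your real conditions and your real scheduler:

```python
import datetime
import json

def heartbeat(checks):
    """Run named checks and log WHAT was checked, not just that we ran.

    checks: dict of name -> zero-arg callable returning True when healthy.
    """
    results = {name: bool(fn()) for name, fn in checks.items()}
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "checks": results,
        "ok": all(results.values()),
    }
    # A heartbeat that logs an empty "checks" dict is lying by omission.
    print(json.dumps(entry))
    return entry

entry = heartbeat({
    "project_not_stalled": lambda: True,
    "no_past_due_deadlines": lambda: False,  # deliberately induced failure
})
```

Note the induced failure: that is the acid test from the Verify step below, built directly into the example.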
Verify
The acid test: deliberately create a condition your automation should catch (a stalled project, a past-due deadline, a known alert state). If the next heartbeat doesn't flag it, your automation is lying to you.
Check the actual log output, not just the exit code. "HEARTBEAT_OK" should mean "I checked everything and nothing needs attention" — not "I ran without crashing."
10) Know where "real config" lives vs. workspace state
Failure mode
You edit a file in the workspace and expect gateway behavior to change. Nothing changes. You burn hours.
Rule
Understand the difference between:
Gateway-level config — controls routing, model selection, authentication, channel bindings. Lives outside your workspace. Changes here affect how the system runs.
Workspace-level files — your agent's operating manual, identity files, skill scripts. Lives in your workspace. Changes here affect how the agent behaves within sessions.
State/metadata files — session caches, workspace state. Managed by the system. Don't edit these manually.
A common trap: you change a model name in your AGENTS.md (workspace) and expect the system to route to a different model. It won't — model routing is gateway config, not workspace config. The agent will say it's on the new model (because AGENTS.md told it to), but the gateway will route wherever it was already configured to route.
Verify
After any config change, do one clean restart cycle and confirm the new behavior with a simple test.
When troubleshooting, always ask first: "Is this a gateway config issue or a workspace file issue?" That distinction saves hours.
11) Skill discovery must be deterministic (registry-driven)
Failure mode
The agent "thinks" a skill exists because it saw one referenced somewhere, guesses a file path, or tries to infer a command. It runs the wrong thing — or hallucinates output when the skill isn't there at all.
Rule
Adopt one hard guardrail:
If it's not in SKILLSREGISTRY.md, it doesn't exist.
In the registry, include for each skill:
Skill name and what it does
Exact command(s) to run
Required environment variables (names only)
Expected output shape
Verify
Before running any skill, the agent must confirm the registry entry exists and matches the request.
If the agent tries to execute a skill not in the registry, that's a red flag — it's guessing, not checking.
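The guardrail can be enforced in code rather than in prose. A minimal sketch with a deliberately simplified registry format (one "name: command" per line; my real SKILLSREGISTRY.md carries more fields per entry):

```python
def load_registry(registry_text):
    """Parse a minimal registry: lines shaped like 'skill_name: command'."""
    registry = {}
    for line in registry_text.splitlines():
        if ":" in line and not line.lstrip().startswith("#"):
            name, command = line.split(":", 1)
            registry[name.strip()] = command.strip()
    return registry

def resolve_skill(registry, name):
    """If it's not in the registry, it doesn't exist: refuse, don't guess."""
    if name not in registry:
        raise LookupError(f"skill {name!r} not in SKILLSREGISTRY.md: refusing to guess")
    return registry[name]

# Illustrative registry; skill names and commands are placeholders.
registry = load_registry("publish_draft: python publish.py\nfetch_stats: python stats.py")
print(resolve_skill(registry, "publish_draft"))  # -> python publish.py
```

The LookupError is the feature: a loud, immediate failure instead of a plausible-looking hallucinated execution.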
12) Session poisoning is real — reset after failure
Failure mode
A session accumulates failed attempts, partial tool-call narratives, and hallucinated outputs. The model starts pattern-matching on garbage instead of reasoning from first principles. You apply a fix, it still doesn't work, and you assume the fix was wrong.
Rule
When debugging tools, routing, or automation:
Treat failures as contaminating. Each failed attempt adds noise to the session context.
After major failures, start a fresh session with a clear prompt. It will outperform a poisoned session every time.
Make "clean-room rerun" a standard step in your troubleshooting checklist.
Verify
If the system repeats the same wrong behavior despite a fix you're confident in, reset the session before concluding the fix didn't work. The fix may be correct — the session just can't see past the accumulated garbage.
Verification checklist (no secrets required)
Use this checklist when something feels off:
Model check: Confirm the intended model ran (billing logs, not the UI)
Bootstrap check: Have the agent restate your mission and constraints — if it gets them wrong, bootstrap is broken
Skill check: Confirm the skill exists in SKILLSREGISTRY.md before any execution
Discord check: Confirm only one agent is bound to each channel
Heartbeat check: Induce a test alert condition and verify it triggers
Cost check: If spend rises while output doesn't, you're paying for context reloading or session bloat
Minimal safe defaults for new builders
If you're building your first organism, start here:
One primary agent, one Discord channel, no secondary bots yet
Bootstrap budget explicitly set; keep core files under 10K characters each
On-demand loading for everything non-core
Execution vs. review model separation, verified via logs
Model transparency rule in your AGENTS.md from day one
Registry-driven skills; no guessing
Scale up from there. Add heartbeats after you can prove they actually check something. Add secondary agents after your primary is stable. Add automation after you trust your verification habits.
Closing note
The big shift for me wasn't learning "AI."
It was learning operations: logging, budgets, guardrails, verification, and the discipline to treat silent failure as the default mode — not the edge case.
And I learned all of it by asking Claude Opus to read my error logs and tell me what was broken.
The workflow was simple:
Debug a problem with Claude's help
Ask Claude to create a summary log of what we learned
Save that "context transfer log" as operational memory
When a similar issue appeared days later, paste the old log into a new thread: "We fixed this before — what did we do?"
I didn't need to become an engineer. I needed to become fluent in: "Here's what's broken — what do I fix and in what order?" And I needed to let AI document what AI taught me.
If you build with that assumption — and with an AI debugging partner who never gets tired of explaining what a bootstrap budget is — OpenClaw becomes a platform you can trust instead of a slot machine that sometimes pays out.
If you want the full story behind these rules — the failures, the costs, and what it felt like to watch a second agent come alive — read Building an OpenClaw Multi-Agent AI System.