How Your Voice Agent Thinks
A tour of what goes into every phone call: the three layers of context that always load, the optional tools that may or may not be installed, and the simple rule that decides which file you should edit.
The big picture
Every time someone calls your number, the TTMA Voice Gateway opens a new session and assembles a fresh prompt from a handful of files on your server. Three layers always load. A fourth optional layer plugs in only if you've enabled it.
Most operators only ever touch one of these layers at a time. Knowing which layer holds which knob is the entire game.
Layer 1 - Identity (always loaded)
Who the agent is. Three Markdown files in your OpenClaw workspace, read on every call. If you change them, the next call uses the new version. No restart needed.
| File | Role | What goes in it |
|---|---|---|
| SOUL.md | Voice & personality | Tone, pacing, conversational style. The bot's vibe. |
| IDENTITY.md | Name & role | Who the agent is, what it represents, which number it answers on. |
| MEMORY.md | Long-term memory | Known contacts, learned facts, recurring preferences. The bot's accumulated context. |
These three files are shared with your OpenClaw agent. Edit them once and both the chat agent and the voice agent inherit the changes.
Layer 2 - Operational rules (always loaded)
How the agent handles calls. One Markdown playbook at:
~/.openclaw/workspace/protocols/dojo-voice-agent-playbook.md
This is where you write:
- Hard rules (reply length, when to escalate, what never to say)
- Pricing facts and other things the bot must always know
- Conversation flow (how to open, how to discover the caller's goal, how to close)
- Examples of good answers - short, contextual, mapped to caller intent
Like Layer 1, the playbook is re-read on every call. Edit it, hang up, call back - the new version is live.
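To make the "re-read on every call" behavior concrete, here is a minimal sketch of how Layers 1 and 2 could be assembled at the start of a session. It is an illustration, not the gateway's actual code: the function name, the section headers it writes, and the assumption that the three Layer 1 files sit at the workspace root are all hypothetical. Only the file names and the playbook path come from this page.

```python
from pathlib import Path

# Assumption: the Layer 1 files sit at the workspace root. This page only
# says they live "in your OpenClaw workspace".
IDENTITY_FILES = ("SOUL.md", "IDENTITY.md", "MEMORY.md")
PLAYBOOK = Path("protocols") / "dojo-voice-agent-playbook.md"


def load_static_layers(workspace: Path) -> str:
    """Read Layers 1 and 2 from disk and join them into one prompt string.

    Nothing is cached, so calling this at the start of each session picks
    up any edits made since the previous call.
    """
    sections = []
    for relative in (*map(Path, IDENTITY_FILES), PLAYBOOK):
        text = (workspace / relative).read_text(encoding="utf-8").strip()
        # Hypothetical formatting: label each section with its file name.
        sections.append(f"## {relative.name}\n{text}")
    return "\n\n".join(sections)
```

Calling `load_static_layers(Path.home() / ".openclaw" / "workspace")` once per session is what makes a restart unnecessary in this sketch: the files are opened fresh each time rather than held in memory.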
Layer 3 - This-call brief (per-call only)
The first two layers are about the bot itself. This third layer is about this specific call. It's injected only for the duration of one session, then thrown away.
For outbound calls
- purposeTemplate - "You're calling Sarah at Acme about her overdue invoice"
- firstMessage - the exact opening line the bot speaks first
- Any per-call data your app passes in (lead score, account state, last-touch date)
For inbound calls
- Caller phone number (and, if known, their CRM record)
- Whether they're the owner (private mode) or a stranger (public mode)
- Which playbook section to enter on
Layer 3 is the only layer your code touches at runtime. Everything else lives on disk and changes only when you edit the files.
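As a rough sketch of what a Layer 3 brief for an outbound call might look like, the snippet below builds one as a plain dictionary. The `purposeTemplate` and `firstMessage` names come from this page. The `callData` key, the helper function, and the sample values are hypothetical, and the request that would carry this payload is deliberately left out because the endpoint is not described here.

```python
import json


def build_outbound_brief(purpose, first_message, call_data=None):
    """Return a fresh Layer 3 brief for one outbound call."""
    return {
        "purposeTemplate": purpose,
        "firstMessage": first_message,
        # Hypothetical key: this page does not name the field that carries
        # per-call data such as lead score or account state.
        "callData": dict(call_data or {}),
    }


brief = build_outbound_brief(
    purpose="You're calling Sarah at Acme about her overdue invoice",
    first_message="Hi Sarah, this is the Acme billing assistant. Is now a good time?",
    call_data={"leadScore": 72, "accountState": "past_due"},
)
print(json.dumps(brief, indent=2))
```

Each call gets its own dictionary, which mirrors the point above: the brief exists for one session and nothing in it carries over to the next.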
Optional tools (only if you enable them)
Off by default on lean setups.
The three layers above are enough for a fully functional voice agent - answering questions, discovering goals, closing the call. Tools add the ability to do things mid-call: look something up, save a contact, hand off to an external system.
Each tool you enable adds a tool definition to every call's system prompt (a few hundred tokens) and gives the model the option to call it. More tools = more capability, but also more latency and more chances the model picks the wrong action. Turn them on as you need them.
A clean info-only bot like our public demo runs with zero tools enabled. Every answer comes from Layers 1 and 2. Zero tool latency, zero tool-call errors, perfectly predictable conversations.
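The prompt cost of tools is simple arithmetic, sketched below. The figure of 300 tokens per tool is an assumption chosen to stand in for "a few hundred", and the tool names are made up for illustration. Measure your own tool definitions before relying on any number here.

```python
# Assumption: 300 tokens stands in for "a few hundred tokens" per tool.
ASSUMED_TOKENS_PER_TOOL = 300


def tool_prompt_overhead(enabled_tools, tokens_per_tool=ASSUMED_TOKENS_PER_TOOL):
    """Estimate the extra system-prompt tokens added to every call."""
    return len(enabled_tools) * tokens_per_tool


# An info-only bot with no tools adds nothing to the prompt.
print(tool_prompt_overhead([]))

# Hypothetical tool names; three tools at the assumed size.
print(tool_prompt_overhead(["lookup_contact", "save_contact", "handoff"]))
```

Under these assumptions the two calls print 0 and 900. The overhead applies to every call, whether or not the model ends up using a tool.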
The optimization rule
When you want to change something about how your agent behaves, this table tells you which file to open.
| If it changes… | Edit here | Why |
|---|---|---|
| Every call (tone, persona, role) | SOUL.md / IDENTITY.md | Identity is stable across the deployment |
| Every call (rules, scripts, pricing) | dojo-voice-agent-playbook.md | The bot's operational playbook is one file |
| Recurring facts the bot must always know | MEMORY.md | Long-term memory survives across all calls |
| One particular call | purposeTemplate / firstMessage in your API call | Layer 3 is per-session and doesn't leak into other calls |
| What the bot can do (capabilities) | voice-tools.json on the server | Tools are gateway-side feature toggles, not prompt edits |
Common mistakes
config.json is for audio knobs, timeouts, and feature flags. Words the bot says live in the playbook and the three Layer-1 files.
Quick setup checklist
New deployment? Walk through these in order. Most operators finish in 15 minutes.
1. Write IDENTITY.md - Name, role, what the agent represents.
2. Write SOUL.md - Tone, pacing, conversational style.
3. Seed MEMORY.md with canonical facts - Pricing, hours, the things the bot must always know.
4. Customize the playbook - Hard rules, opening flow, discovery questions, end-of-call behavior.
5. Decide which tools you need - Default to off. Turn each on only with a clear reason.
6. Place a test call from your owner phone - Confirm the bot says the right thing first and follows your rules.
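Before placing the test call, a quick preflight check can confirm that the files from the first four steps exist and are not empty. This is a hypothetical helper, not something shipped with the gateway, and it assumes the three Layer 1 files sit at the workspace root next to the `protocols` directory.

```python
from pathlib import Path

# Assumption: Layer 1 files sit at the workspace root; the playbook path
# relative to the workspace comes from this page.
REQUIRED_FILES = (
    "IDENTITY.md",
    "SOUL.md",
    "MEMORY.md",
    "protocols/dojo-voice-agent-playbook.md",
)


def preflight_problems(workspace: Path) -> list[str]:
    """List required files that are missing or contain only whitespace."""
    problems = []
    for relative in REQUIRED_FILES:
        path = workspace / relative
        if not path.is_file():
            problems.append(f"missing: {relative}")
        elif not path.read_text(encoding="utf-8").strip():
            problems.append(f"empty: {relative}")
    return problems
```

Run it as `preflight_problems(Path.home() / ".openclaw" / "workspace")`. An empty list means the files are in place. It says nothing about whether their contents are any good, which is what the test call is for.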
TL;DR
- Always loaded: SOUL, IDENTITY, MEMORY, and the playbook.
- Per call: purposeTemplate + firstMessage.
- Optional: tools, only if you enabled them.
- Edit where the change belongs. Identity in SOUL/IDENTITY, rules in the playbook, this-call data in the API.
- Less is more. Fewer tools, shorter prompts, faster calls.
Keep reading
- Setup & Settings - every knob in config.json, where it lives, and how to change it.
- Playbook Customization - section-by-section walkthrough of the operational rules file.
- Tools & Skills - when to enable each tool and what it costs you in latency.
Questions? Reach out at hello@talktomyagent.io.