Voice Agent: Setup & Settings

Every knob you can turn, where it lives, and how to change it. Most operators only ever touch one file (config.json) and one command (restart) - but knowing the layers underneath is what turns a stock voice agent into yours.

The four places settings live

Your voice agent reads settings from four operator-editable sources, in this order, with compiled defaults as a fifth layer underneath. Each layer wins over the next.

Layer | What lives here | Edited by
1. Env var override | Emergency knob overrides exported on the shell before service start. Almost never needed. | On-call engineer
2. Cloud portal | Persona, voice, greeting, recording on/off, owner phone, call mode, call/idle timeouts. Authoritative source for these. | You, in the dashboard
3. config.json on the server | Audio tuning, context-file whitelist, gateway feature toggles. Operator-tuned by SSH-ing in and editing the file directly. | You, on the box
4. Workspace files | Identity and knowledge: SOUL.md, IDENTITY.md, USER.md, MEMORY.md, plus the voice playbook. Same files your OpenClaw agent uses. | You, anywhere
5. Compiled defaults | Safe fallbacks baked into the binary for everything operators don't override. The floor. | Nobody (rebuild required)

One source of truth per setting. Persona lives in the cloud portal. Audio knobs live on the box. Identity lives in workspace files. Secrets live in .env. Each layer does one job.
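
The "each one wins over the next" rule can be sketched in Python as a first-match lookup across layers. This is an illustration only, not the gateway's code: the setting names come from this guide, but the `resolve` helper, the sample values, and the flat dict per layer are assumptions (config.json actually nests audio knobs under `audio`, and workspace files hold documents rather than key/value settings, so they are left out).

```python
from collections import ChainMap

# Hypothetical layer contents, flattened for illustration.
env_overrides = {"vadSilenceMs": 250}
cloud_portal = {"voice": "Callirrhoe", "callMode": "private"}
config_json = {"vadSilenceMs": 200, "outboundGainTrimDb": -3}
compiled_defaults = {"vadSilenceMs": 200, "outboundGainTrimDb": -3, "temperature": 0.4}

# ChainMap searches its maps left to right, so earlier layers win.
settings = ChainMap(env_overrides, cloud_portal, config_json, compiled_defaults)

def resolve(key):
    """Return the effective value for key; raises KeyError if no layer sets it."""
    return settings[key]

print(resolve("vadSilenceMs"))  # 250: the env override beats config.json
print(resolve("temperature"))   # 0.4: only the compiled default sets it
```

In practice the layers rarely collide, because each setting has one home; the lookup order matters mainly when an emergency env override is in play.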

The first install (3 minutes)

  1. Generate an install token in the voice portal. Pick the version (default is Stable; pin to a test build only if you're testing a release candidate).
  2. Run the install command on a Linux host (Ubuntu, Debian, or any modern distro). Use the same host as your OpenClaw agent:
    curl -sSL https://api.talktomyagent.io/install.sh | \
      bash -s -- --token <YOUR_TOKEN> --accept-license
  3. Wait ~30 seconds. The installer downloads the gateway binary, sets up a Cloudflare tunnel, writes config.json with your defaults, seeds the voice playbook, and starts the service. Output shows a check mark for each step.
  4. Place a test call from your owner phone. The agent should greet you in private mode immediately.

What you have on disk afterward

~/ninja-talk/                         ← install dir
  config.json                         ← operator settings (this guide)
  .env                                ← secrets + identity (don't edit)
  ninja-talk                          ← the gateway binary
  start.sh                            ← what systemd runs
  sync-config.sh                      ← pull cloud-managed settings
  integrations/                       ← optional direct-API configs
  install-meta.json                   ← installer-recorded metadata

Re-running install.sh with a fresh token is safe: it refreshes the cloud-managed fields and your binary, but preserves your audio, context, and feature edits in config.json.

config.json - the main operator surface

Open ~/ninja-talk/config.json with nano on the server and you see something like this:

{
  "schemaVersion": 1,

  "callMode": "private",
  "ownerPhone": "+16478359044",
  "greetingPrompt": "Hi {name}, what can I do for you?",
  "voice": "Callirrhoe",
  "language": "auto",
  "sharePrivateVoiceWithMainDm": true,
  "mainDmSourceKey": null,
  "recordingEnabled": true,
  "maxCalls": 2,
  "callTimeoutMinutes": 60,
  "idleTimeoutSeconds": 120,

  "audio":    { /* outbound pipeline tuning, VAD, watchdogs */ },
  "context":  { /* which workspace files load into the prompt */ },
  "features": { /* gateway-side capability toggles */ }
}

Three groups of fields, three different rules:

  • Top-level flat fields (callMode, voice, greetingPrompt, …)
    Cloud-managed. Edit them in the voice portal - running ~/ninja-talk/sync-config.sh overwrites these from the dashboard. Hand edits on the box survive a single restart but get overwritten on the next sync.
  • The three nested sections (audio, context, features)
    Local only. The sync script never touches them; cloud changes never overwrite them. Edit them directly on the server and they survive everything except a hand rm config.json.
  • schemaVersion
    Don't change. The installer bumps this when the schema evolves so old configs migrate cleanly.

audio - outbound pipeline tuning

Twelve knobs that control how the gateway prepares the model's voice for the phone-line codec. Defaults are tuned for a typical realtime voice model, carrier, and PSTN path - change them per-deployment if a particular voice or carrier line needs adjustment.

The knobs you'll actually touch

outboundGainTrimDb · -12 to 0 dB · default -3

Pre-attenuation applied to outbound audio before the fade chain. The default lands peaks in telephony's sweet spot (-5 to -7 dBFS on G.711 μ-law). Make it more negative (e.g. -5) if a particular voice plus carrier line still sounds harsh; raise toward 0 only if your voice ends up too quiet.
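
To get a feel for what a dB trim does to the signal, here is the standard decibel-to-amplitude conversion as a small Python sketch. The formula is general audio math; whether the gateway applies the trim exactly this way internally is an assumption, and `db_to_linear` is a name made up for this example.

```python
def db_to_linear(db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

# Print the multiplier for a few values inside the documented -12 to 0 range.
for trim in (0, -3, -5, -12):
    print(f"{trim:>4} dB -> x{db_to_linear(trim):.3f}")
```

The default of -3 dB scales every sample to roughly 0.71 of its original amplitude; -12 dB scales it to about a quarter.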

vadSilenceMs · 0-5000 ms · default 200

How long the model's server VAD waits for silence before deciding the caller stopped speaking. Each ms here lands directly on the perceived gap between “I stopped talking” and “the bot answers.” Lower = snappier but risks interrupting; higher = safer but adds latency. Don't go below ~150 unless you're sure your callers don't pause mid-sentence.

openingSilencePadMs · 0-2000 ms · default 500

Silence prepended to the greeting flush so the carrier's bidirectional codec has time to stabilize before audio crosses the transcode boundary. Without enough padding, you hear popping or breakups in the first 1-2 seconds of every call. 500 ms is tested clean across the carriers we've seen and stays well under any human-perceptible greeting latency. Bump higher (800-1000 ms) if a particular carrier still produces opening pops; set to 0 only as a kill-switch for diagnostic comparison.

responseWatchdogMs · 1000-60000 ms · default 8000

How long the gateway waits for the model to start replying before nudging it with a text prompt. If your model is unusually slow (e.g. tool-call heavy), bump this. Out-of-range values fall back to the default - you can't accidentally set a watchdog that fires instantly.

temperature · 0.0-2.0 · default 0.4

Model generation temperature. 0.4 is tuned for phone customer-service work: deterministic enough to stay on-prompt, warm enough to sound conversational. Lower for crisper, more rule-following speech; higher for more variety (rarely useful on a phone call).

outboundEchoGraceMs · 0-1500 ms · default 500

Half-duplex echo-gating window. While the bot is speaking and for this many ms after, the gateway sends silence (not real audio) into the model's input - stops the model's VAD from reacting to the bot's own PSTN echo. Trade-off: callers can't barge-in mid-sentence. Set to 0 only if your voice/line combo doesn't need it (test first).
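
The gating decision described above might look something like the following Python sketch. It is a guess at the logic, not the gateway's implementation: the function name, the millisecond timestamps, and in particular the choice to treat 0 as "gating fully off" are assumptions drawn from the wording of this section.

```python
def should_gate_input(now_ms, bot_speaking, last_bot_audio_ms, grace_ms=500):
    """Return True when caller audio should be replaced with silence."""
    if grace_ms <= 0:
        return False  # assumption: 0 disables echo gating entirely
    if bot_speaking:
        return True   # bot is mid-utterance: feed the model silence
    if last_bot_audio_ms is None:
        return False  # bot has not spoken yet on this call
    # Keep gating until the grace window after the last bot audio has passed.
    return now_ms - last_bot_audio_ms < grace_ms
```

With the default 500 ms, audio arriving 300 ms after the bot stops is still gated, while audio arriving 500 ms after passes through to the model.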

Advanced (rarely changed)

The fade chain, large-step adaptive smoother, and audio mode are stable defaults from field-tuning across many calls. Touch only if audio analysis shows a specific defect:

Knob | Range | Default | Purpose
outboundTurnFadeMs | 0-50 | 15 | linear fade-in on the first sample of every turn
outboundChunkSmoothMs | 0-20 | 2 | cross-fade at every chunk boundary within a turn
outboundLargeStepThresholdSamples | 0-32767 | 3000 | sample-step size that triggers the wider smoother
outboundLargeStepSmoothMs | 0-50 | 15 | widened smoother window above the threshold
outboundGapRecoveryFadeEnabled | true/false | true | re-apply turn fade after low-energy filler chunks
audioMode | downsample or passthrough | downsample | how 24 kHz model audio reaches the carrier's 8 kHz μ-law line

Out-of-range or non-numeric values fall back to the default - bad values can't kill calls.
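
The fall-back-to-default behaviour can be pictured with a short Python sketch. The ranges and defaults are copied from this guide; the `effective_value` helper and the table layout are hypothetical, and the real gateway may validate differently.

```python
# name -> (min, max, default), as documented in the audio section.
AUDIO_KNOBS = {
    "outboundGainTrimDb": (-12, 0, -3),
    "vadSilenceMs": (0, 5000, 200),
    "openingSilencePadMs": (0, 2000, 500),
    "responseWatchdogMs": (1000, 60000, 8000),
    "temperature": (0.0, 2.0, 0.4),
    "outboundEchoGraceMs": (0, 1500, 500),
}

def effective_value(name, raw):
    """Return raw if it is a number inside the knob's range, else the default."""
    lo, hi, default = AUDIO_KNOBS[name]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        return default
    if not lo <= raw <= hi:  # also catches NaN, which fails every comparison
        return default
    return raw

print(effective_value("responseWatchdogMs", 10))  # 8000: below the 1000 ms floor
print(effective_value("vadSilenceMs", 150))       # 150: in range, kept as-is
```

The practical consequence is that a typo never breaks calls, but it also never announces itself; check the startup log if a value seems to be ignored.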

context - which workspace files load into every call

The voice agent inherits the same identity and memory as your OpenClaw agent by reading four files from ~/.openclaw/workspace/ at the start of every call:

  • SOUL.md - voice and personality
  • IDENTITY.md - name, role, who the agent is
  • USER.md - the owner's profile (private mode only)
  • MEMORY.md - long-term memory (contacts, preferences, learned facts)

By default all four load when present. The context.files field lets you include or exclude any of them per-deployment - useful for measuring prompt-size impact on first-token latency, or for trimming a giant MEMORY.md out of the prompt on a low-context test bot.

Default (everything loads)

"context": {
  "files": ["SOUL.md", "IDENTITY.md", "USER.md", "MEMORY.md"]
}

Drop MEMORY.md to trim the prompt

"context": {
  "files": ["SOUL.md", "IDENTITY.md", "USER.md"]
}

On a typical bot this drops 8-10 KB from the system instruction (roughly 2,000-2,500 tokens), which usually saves 100-300 ms on first-token latency.

The whitelist is filename-only against this fixed list of four. Anything else in the array is silently ignored - there is no path traversal vector.
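
A filename-only whitelist of this kind can be sketched in a few lines of Python. The four filenames are the ones listed above; `select_context_files` is a made-up helper, and the exact matching rules (for example, case sensitivity) are assumptions.

```python
ALLOWED_FILES = ("SOUL.md", "IDENTITY.md", "USER.md", "MEMORY.md")

def select_context_files(requested):
    """Keep only exact matches against the fixed list, in canonical order."""
    wanted = {name for name in requested if isinstance(name, str)}
    return [name for name in ALLOWED_FILES if name in wanted]

# A path-like entry never matches a bare filename, so it is simply dropped.
print(select_context_files(["SOUL.md", "../../etc/passwd", "MEMORY.md"]))
# ['SOUL.md', 'MEMORY.md']
```

Because entries are compared against known names rather than resolved as paths, nothing outside the workspace directory can be pulled into the prompt.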

features - gateway capability toggles

On/off flags that control which capabilities the gateway offers per-call. All default to a safe, fully-functional baseline.

Field | Default | What it does
crmCapture | true | Single CRM on/off toggle. True: at call start, look up the caller by phone in your CRM and inject their profile (name, company, history) into the voice session for personalized greetings; auto-save new caller info via the save_caller_info tool when they introduce themselves. False: skip both the lookup AND the save (no Python subprocess at call start, no transcript-driven extraction). Recommended false for public-mode info-only bots where random callers will not be in your CRM.
kbSearch | true | Register the kb_search tool for callers to query your knowledge base.
kbVector | true | Use the vector-search backend for KB queries (versus FTS-only). Set false if vector indexing is broken on your deployment.
openclawQuery | true | Allow the bot to query your OpenClaw agent during calls (lookups, actions, complex questions). Set false for outbound-only or info-only bots that should rely only on the call data you provide via the API. You can also toggle individual tools via voice-tools.json (see the Voice Tools Guide).
openclawStream | false | Use Server-Sent Events on openclaw_query for incremental responses. Off by default - opt in per-deployment to measure latency impact. Has no effect when openclawQuery is false.
requireCloudPrompts | true | Refuse calls when running on the bundled-stub prompt set. Keep true in production - security guard against misconfigured boots.
aniRateLimitPerHour | 20 | Max calls accepted per phone number per hour (toll-fraud guard).
aniRateLimitWindowMs | 3600000 | Rolling-window length in ms (default 1 hour).
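
The two ANI rate-limit fields describe a per-number rolling window. A minimal Python sketch of that idea follows; the `AniRateLimiter` class is hypothetical and only shows how a limit and a window length interact, not how the gateway stores or enforces them.

```python
from collections import defaultdict, deque

class AniRateLimiter:
    """Accept at most `limit` calls per caller number within a rolling window."""

    def __init__(self, limit=20, window_ms=3_600_000):
        self.limit = limit
        self.window_ms = window_ms
        self._calls = defaultdict(deque)  # caller number -> accepted timestamps (ms)

    def allow(self, ani, now_ms):
        calls = self._calls[ani]
        # Drop timestamps that have aged out of the rolling window.
        while calls and now_ms - calls[0] >= self.window_ms:
            calls.popleft()
        if len(calls) >= self.limit:
            return False  # over the limit: reject and do not record the attempt
        calls.append(now_ms)
        return True
```

With the defaults, the 21st call from one number inside an hour is refused, while calls from other numbers are unaffected.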

Apply changes - edit, then restart

config.json is read at service start. To apply an edit, restart the service:

# 1. SSH to your server, then edit:
sudo -u <openclaw-user> nano ~/ninja-talk/config.json

# 2. Restart (user-scope service - most common):
sudo -u <openclaw-user> systemctl --user restart ninja-talk

# 3. Verify the new value reached the gateway:
sudo -u <openclaw-user> journalctl --user -u ninja-talk -n 20 --no-pager | grep -E "GAIN_TRIM|VAD_SILENCE|context"

If your service runs at the system level, replace step 2 with:

sudo systemctl restart ninja-talk

Sanity-check after a change

The gateway logs the effective values on startup and the assembled system instruction at the start of every call. After a restart, the next call will print:

[ninja-talk:context] System instruction built {
  mode: "private",
  totalBytes: 22628,           ← drops if you removed MEMORY.md from context.files
  agentName: "Donna",
  hasMemory: true,             ← false when MEMORY.md is excluded
  hasPlaybook: true,
  truncated: false,
  stubMode: false
}

No restart required for changes to SOUL.md, IDENTITY.md, USER.md, MEMORY.md, or the playbook - those files are read fresh on every call.
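
If you want to compare prompt size before and after a change without eyeballing the journal, a small Python sketch like this can pull the two interesting fields out of captured log text. It assumes the log entry looks exactly like the sample shown in this section; `summarize_context_log` is a made-up helper, not part of the product.

```python
import re

# Sample copied from the log excerpt in this guide.
SAMPLE = """[ninja-talk:context] System instruction built {
  mode: "private",
  totalBytes: 22628,
  agentName: "Donna",
  hasMemory: true,
  hasPlaybook: true,
  truncated: false,
  stubMode: false
}"""

def summarize_context_log(text):
    """Extract totalBytes and hasMemory from one logged context block."""
    total = re.search(r"totalBytes:\s*(\d+)", text)
    memory = re.search(r"hasMemory:\s*(true|false)", text)
    return {
        "totalBytes": int(total.group(1)) if total else None,
        "hasMemory": (memory.group(1) == "true") if memory else None,
    }

print(summarize_context_log(SAMPLE))  # {'totalBytes': 22628, 'hasMemory': True}
```

Run it on the log text from one call before the edit and one call after; a lower totalBytes with hasMemory false confirms that MEMORY.md was dropped.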

Cloud sync - what comes from the portal

Running ~/ninja-talk/sync-config.sh (or ninja-talk sync from the CLI) pulls the latest cloud-managed fields from the voice portal and merges them into your config.json. The sync explicitly preserves your local audio, context, and features sections - they are never touched.

Fields the sync overwrites

  • callMode, ownerPhone, greetingPrompt, voice, language
  • recordingEnabled, maxCalls, callTimeoutMinutes, idleTimeoutSeconds
  • sharePrivateVoiceWithMainDm, mainDmSourceKey

Fields the sync leaves alone

  • audio.* - every audio knob
  • context.files
  • features.* - every feature toggle
  • schemaVersion

The sync writes config.json.bak alongside the new file every time, so a bad cloud value is one cp config.json.bak config.json away from being undone.
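
The merge rule (overwrite the cloud-managed fields, leave everything else alone, keep a backup) can be sketched in Python as follows. The field list is taken from this section; `merge_cloud_fields` and the sample dictionaries are hypothetical, and the real sync-config.sh may work differently.

```python
import copy

CLOUD_MANAGED = (
    "callMode", "ownerPhone", "greetingPrompt", "voice", "language",
    "recordingEnabled", "maxCalls", "callTimeoutMinutes", "idleTimeoutSeconds",
    "sharePrivateVoiceWithMainDm", "mainDmSourceKey",
)

def merge_cloud_fields(local, cloud):
    """Return (merged, backup); only cloud-managed keys are copied from cloud."""
    backup = copy.deepcopy(local)  # plays the role of config.json.bak
    merged = copy.deepcopy(local)
    for key in CLOUD_MANAGED:
        if key in cloud:
            merged[key] = cloud[key]
    return merged, backup

local = {"schemaVersion": 1, "voice": "Callirrhoe", "audio": {"vadSilenceMs": 180}}
cloud = {"voice": "ExampleVoice", "audio": {"vadSilenceMs": 999}}  # made-up payload
merged, backup = merge_cloud_fields(local, cloud)
print(merged["voice"], merged["audio"])  # ExampleVoice {'vadSilenceMs': 180}
```

Even if the cloud payload carried an `audio` section, the allow-list approach would ignore it, which is why local tuning survives a sync.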

.env - secrets and identity (don't edit)

~/ninja-talk/.env holds the things you should never edit by hand: deployment credentials, the API base URL, the Cloudflare tunnel token reference, and your installer-bound machine fingerprint. The installer regenerates this file on every re-install; nothing operator-tunable lives here.

If you suspect a credential is wrong, the fix is always “re-run install.sh with a fresh token” - never hand-edit .env.

Quick reference: where to change what

What you want to change | Where
Greeting, voice, language, owner phone, call mode | Voice portal → Settings (then run sync)
Recording on/off, call/idle timeouts, max concurrent calls | Voice portal → Settings
Audio gain, VAD, watchdogs, fade chain | config.json → audio.*, then restart
Which workspace files load into the prompt | config.json → context.files, then restart
CRM auto-capture, KB search, ANI rate limit | config.json → features.*, then restart
What public callers can do | Voice playbook → ## Public Mode section. See the Modes guide.
Email, calendar, custom tools | Configure in OpenClaw - voice gets them automatically. See the Tools guide.
Sub-second direct third-party API access (Wix Bookings, Calendly, Stripe, etc.) | Install a voicebridge - see the Tools guide.
Persona content the agent uses on every call | ~/.openclaw/workspace/SOUL.md, IDENTITY.md, USER.md, MEMORY.md

Troubleshooting

I edited config.json but nothing changed

  • Did you restart the service? config.json is read at service start, not per-call.
  • Check that JSON parses: jq . ~/ninja-talk/config.json. A malformed file will fall through to compiled defaults silently.
  • Confirm the new value reached the process: journalctl --user -u ninja-talk -n 30 --no-pager.
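
If jq isn't installed, a strict parse check is easy to sketch in Python. `check_config_text` is a hypothetical helper that only reports whether the text is valid JSON; it knows nothing about the gateway's schema.

```python
import json

def check_config_text(text):
    """Return (ok, message) for a config.json body passed in as a string."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(data, dict):
        return False, "top level must be a JSON object"
    return True, "parses cleanly"

print(check_config_text('{"schemaVersion": 1}'))  # (True, 'parses cleanly')
```

To check the real file, read ~/ninja-talk/config.json into a string and pass it in. Python's json module is strict, so a trailing comma or a /* ... */ comment is reported as an error with its line and column.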

My audio tuning got wiped after a sync

  • Restore from the automatic backup: cp ~/ninja-talk/config.json.bak ~/ninja-talk/config.json, then restart.
  • On schemaVersion: 1 and later, the sync explicitly preserves audio, context, and features. If your tuning was lost, you're likely on an older install - re-run install.sh with a fresh token to upgrade. Your edits are preserved automatically through the upgrade.

The agent boots but seems to ignore my config values

  • Check for stray env-var overrides: systemctl --user show-environment. Any NINJA_TALK_* set there wins over config.json by design (the emergency-override layer).
  • Range-checked numeric values silently snap back to the default if out of range. Confirm yours fit the documented ranges in the audio section above.

I want to A/B test a new audio setting

  • Make a copy of config.json first: cp ~/ninja-talk/config.json ~/ninja-talk/config.json.preA.
  • Edit one field at a time, restart, place a call, listen. Use a bigger sample of calls for subtle changes - codec interactions can mask small effects.
  • Roll back with one command: cp ~/ninja-talk/config.json.preA ~/ninja-talk/config.json, then restart.

Companion guides

  • Private & Public Mode - how the agent decides who's allowed to do what on a call.
  • Tools & Skills - what the agent can do (email, calendar, direct API integrations) and how to add capabilities.

Questions? Reach out at hello@talktomyagent.io