Knowledge Base Setup

Your voice agent is smart out of the box, but it doesn't know your business. The knowledge base lets you feed it your pricing, policies, FAQs, product details, and team info so it answers from your data instead of generic AI knowledge.

Why this matters

Without a knowledge base, your agent will try to answer caller questions from its general training data. That means it might guess at your business hours, invent prices, or say “I'm not sure” when it should have an instant answer.

With a knowledge base, the same caller hears: “We're open Monday through Friday, 9 AM to 6 PM. A standard consultation is $75, and the first one is free.” Confident, accurate, and immediate.

Two ways to add knowledge

The voice gateway supports two approaches, and you can use both at the same time:

1. Static context sources

Always loaded into the agent's memory at the start of every call. Best for core info the agent should always have on hand: hours, pricing, policies.

2. Searchable KB (kb_search)

Searched on demand when a caller asks something specific. Best for larger document sets: product catalogs, detailed FAQs, technical specs.

Think of static sources as the info pinned to the front of a binder, and kb_search as the full filing cabinet the agent can dig through when needed.

Setting up static context sources

Static sources are markdown files that get loaded directly into the agent's prompt at the start of every call. The agent can answer questions from this content instantly, with no tool call and no delay.

Step 1: Create your data files

Add markdown files to the data/ folder in your workspace:

~/.openclaw/workspace/
  data/
    pricing.md          # Services and pricing
    hours-and-faq.md    # Business hours + common questions
    team.md             # Staff names and roles
    policies.md         # Return policy, cancellation rules, etc.
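
If you would rather script this step, the layout above can be created with a few lines of Python. This is a minimal sketch: the helper name scaffold_data_dir is made up for illustration, and the file names and placeholder headings simply mirror the example tree, so adjust them to your setup.

```python
from pathlib import Path

# File names mirror the example layout above; the headings are placeholders.
STARTER_FILES = {
    "pricing.md": "# Pricing\n",
    "hours-and-faq.md": "# Hours and FAQ\n",
    "team.md": "# Team\n",
    "policies.md": "# Policies\n",
}


def scaffold_data_dir(workspace: Path) -> list[Path]:
    """Create data/ under the workspace and add any missing starter files.

    Existing files are left untouched. Returns the paths that were created.
    """
    data_dir = workspace / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, heading in STARTER_FILES.items():
        target = data_dir / name
        if not target.exists():
            target.write_text(heading, encoding="utf-8")
            created.append(target)
    return created
```

To use it against the default workspace, call scaffold_data_dir(Path.home() / ".openclaw" / "workspace") and then fill in each file with your own content.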

Step 2: Write clear, structured content

Here is an example data/pricing.md:

# Pricing

## Consultations
- Initial consultation: FREE (30 minutes)
- Standard consultation: $75/hour
- Extended session: $120/90 minutes

## Packages
- Starter (5 sessions): $325 (save $50)
- Professional (10 sessions): $600 (save $150)

## Payment
- We accept Visa, Mastercard, and bank transfer
- Payment due at time of booking
- 24-hour cancellation policy (full refund)

Step 3: Configure voice-context-sources.json

Create (or edit) ~/.openclaw/workspace/data/voice-context-sources.json:

{
  "version": 1,
  "static": [
    {
      "label": "Pricing & Services",
      "path": "data/pricing.md",
      "maxChars": 3000,
      "mode": "both"
    },
    {
      "label": "Business Hours & FAQ",
      "path": "data/hours-and-faq.md",
      "maxChars": 2000,
      "mode": "public"
    },
    {
      "label": "Team Directory",
      "path": "data/team.md",
      "maxChars": 1500,
      "mode": "private"
    }
  ],
  "totalBudget": 8000
}

Field reference

label - a human-readable name (shown in logs)
path - relative to the workspace directory
maxChars - character limit for this source (default: 3000). Keeps large files from blowing the context budget.
mode - who gets this context:
- "both" - loaded for every call (owner and public)
- "private" - loaded only when the owner calls
- "public" - loaded only for public callers
totalBudget - total character budget across all sources (default: 8000). Sources are loaded in order until the budget runs out.
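
A typo in this file is easy to miss, so it can help to check it before placing a test call. The sketch below validates a parsed config against the fields described above. The function name validate_config and the error wording are illustrative and not part of the gateway, whose own validation may be stricter or more lenient. Because the field reference does not state a default for mode, this sketch assumes it is required. The version field is not checked.

```python
import json

ALLOWED_MODES = ("both", "private", "public")


def _is_positive_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    sources = config.get("static")
    if not isinstance(sources, list):
        return ["'static' must be a list of source objects"]

    problems = []
    for index, source in enumerate(sources):
        where = f"static[{index}]"
        if not isinstance(source, dict):
            problems.append(f"{where}: expected an object")
            continue
        for key in ("label", "path"):
            value = source.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{where}: '{key}' must be a non-empty string")
        mode = source.get("mode")  # assumption: treated as required here
        if mode not in ALLOWED_MODES:
            allowed = ", ".join(ALLOWED_MODES)
            problems.append(f"{where}: 'mode' must be one of {allowed}; got {mode!r}")
        # 3000 is the documented default for maxChars.
        if not _is_positive_int(source.get("maxChars", 3000)):
            problems.append(f"{where}: 'maxChars' must be a positive integer")

    # 8000 is the documented default for totalBudget.
    if not _is_positive_int(config.get("totalBudget", 8000)):
        problems.append("'totalBudget' must be a positive integer")
    return problems


sample = json.loads("""
{
  "version": 1,
  "static": [
    {"label": "Pricing & Services", "path": "data/pricing.md", "mode": "everyone"}
  ]
}
""")
for problem in validate_config(sample):
    print(problem)
```

The sample deliberately uses an unsupported mode value, so the check reports one problem for static[0].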

Budget tip: 8000 characters is roughly 3-4 pages of text. Put your most important sources first - if the budget runs out, later sources are skipped. For most businesses, pricing + hours + FAQ fits comfortably.
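
To estimate which sources a given caller would receive, you can model the mode and budget rules described above. This is a rough sketch, not the gateway's actual loader. It assumes each source is capped at its maxChars, and that loading stops at the first source that no longer fits in the remaining budget. The gateway may truncate that source instead, since the field reference does not say. The function name plan_sources and the file sizes are made up for the example.

```python
def plan_sources(sources, sizes, caller, total_budget=8000):
    """Return (loaded_labels, chars_used) for caller "owner" or "public".

    sources: entries shaped like the "static" list in voice-context-sources.json
    sizes:   mapping of path -> file length in characters
    """
    wanted_mode = "private" if caller == "owner" else "public"
    loaded, used = [], 0
    for source in sources:  # order matters: earlier sources get budget first
        if source["mode"] not in ("both", wanted_mode):
            continue  # this source is not meant for this kind of caller
        cost = min(sizes[source["path"]], source.get("maxChars", 3000))
        if used + cost > total_budget:
            break  # assumption: stop here, so later sources are skipped
        loaded.append(source["label"])
        used += cost
    return loaded, used


sources = [
    {"label": "Pricing & Services", "path": "data/pricing.md", "maxChars": 3000, "mode": "both"},
    {"label": "Business Hours & FAQ", "path": "data/hours-and-faq.md", "maxChars": 2000, "mode": "public"},
    {"label": "Team Directory", "path": "data/team.md", "maxChars": 1500, "mode": "private"},
]
sizes = {  # hypothetical file lengths in characters
    "data/pricing.md": 2400,
    "data/hours-and-faq.md": 2600,
    "data/team.md": 900,
}

for caller in ("owner", "public"):
    labels, used = plan_sources(sources, sizes, caller)
    print(caller, labels, used)
```

With these made-up sizes, the owner gets pricing plus the team directory, while a public caller gets pricing plus the hours file capped at its 2000-character limit.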

Setting up the searchable KB (kb_search)

For larger document sets that won't fit in the static context budget, the kb_search tool lets the agent search on demand. When a caller asks a detailed question, the agent runs a search query against your documents and answers from the most relevant passages.

How it works under the hood

The gateway searches OpenClaw's memory index using a hybrid approach:

Keyword matching (FTS5) - instant, finds exact terms. If you wrote “30-day return policy” and the caller says “return policy,” it matches.
Semantic vector search - understands meaning. If the caller says “Can I get my money back?” it finds the return policy even without the exact words.

The two scores are combined (70% semantic, 30% keyword), and the top 5 most relevant chunks are returned.
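
To make the 70/30 weighting concrete, here is a small sketch of how two score lists could be blended. It assumes both scores are already normalized to the 0-1 range and that a chunk missing from one list scores 0 there. OpenClaw's actual normalization and ranking details are not described here, so treat hybrid_rank, its parameters, and the chunk ids as illustrative only.

```python
def hybrid_rank(keyword_scores, semantic_scores, top_k=5,
                semantic_weight=0.7, keyword_weight=0.3):
    """Blend per-chunk scores and return the top_k (chunk_id, score) pairs."""
    chunk_ids = set(keyword_scores) | set(semantic_scores)
    blended = {
        cid: semantic_weight * semantic_scores.get(cid, 0.0)
        + keyword_weight * keyword_scores.get(cid, 0.0)
        for cid in chunk_ids
    }
    # Highest score first; ties are broken by chunk id for stable output.
    ranked = sorted(blended.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical scores for the query "Can I get my money back?"
keyword = {"returns#1": 1.0, "shipping#2": 0.4}
semantic = {"returns#1": 0.8, "warranty#3": 0.6}

for chunk_id, score in hybrid_rank(keyword, semantic):
    print(f"{chunk_id}: {score:.2f}")
```

In this example the returns chunk ranks first because it scores well on both signals, and a chunk found only by semantic search still outranks one found only by keywords.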

Where to put your documents

Add markdown files to the memory/kb/ directory in your OpenClaw workspace:

~/.openclaw/workspace/
  memory/
    kb/
      product-catalog.md
      shipping-info.md
      warranty-details.md
      installation-guide.md

OpenClaw indexes these files automatically. After adding or updating files, the index rebuilds on the next scan cycle (typically within a few minutes).

Performance

Keyword-only search: under 1 millisecond
Hybrid search (keyword + semantic): roughly 2 seconds
Instances without the embedding model: automatic fallback to keyword-only search

Both search modes are available in private and public calls. The agent decides whether to search based on the caller's question, so you don't need to configure anything beyond placing the files.

What to put in your knowledge base

Think about every question a caller might ask that your receptionist, salesperson, or support team handles today. Here are the most common categories:

Category	Example content	Recommended source
Business hours	Mon-Fri 9-6, Sat 10-2, closed Sunday	Static (both)
Pricing	Service menu, packages, discounts	Static (both)
FAQs	Top 10-20 questions you get every week	Static (both)
Policies	Cancellation, refunds, late fees	Static (public)
Team directory	Staff names, roles, availability	Static (private)
Product catalog	Full inventory, specs, compatibility	Searchable KB
How-to guides	Setup instructions, troubleshooting steps	Searchable KB
Location / directions	Address, parking, landmarks	Static (both)

Rule of thumb: if a caller asks it at least once a week, it belongs in static sources. If it comes up occasionally or covers a large set of items, put it in the searchable KB.

Writing effective knowledge base content

The quality of your KB content directly affects how well your agent answers. A few guidelines that make a real difference:

One topic per file. Don't put pricing, hours, and policies in one giant document. Split them. The agent retrieves better when each file has a clear focus.
Use clear headings. Markdown headings (##) help the search engine identify what each section covers. “## Return Policy” is much better than an unlabeled paragraph.
Bullet points over paragraphs. The agent reads content aloud. Short, factual bullet points translate to natural speech better than dense prose.
Write as if explaining to a new employee. Skip jargon. Be explicit about things that seem obvious (“The office is on the 3rd floor, turn left from the elevator”).
Include the question, not just the answer. Instead of just “$75/hour,” write “Standard consultation rate: $75/hour.” This helps the search engine match caller questions to answers.
Keep it current. Outdated pricing or hours will erode caller trust quickly. Update your KB files when your business changes.
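
Some of these guidelines can be checked mechanically before you publish a file. The sketch below flags three things: a file with no ## headings, lines long enough to sound dense when read aloud, and bullet lines that contain only a bare price with no label. The 400-character threshold, the price pattern, and the function name lint_kb_text are assumptions chosen for illustration, not rules enforced by the gateway.

```python
import re

# Matches bullets such as "- $75/hour" that carry a price but no label.
BARE_PRICE = re.compile(r"^[-*]\s*\$\d[\d,.]*(?:/\w+)?\s*$")


def lint_kb_text(text: str, max_paragraph_chars: int = 400) -> list[str]:
    """Return warnings for a Markdown KB file; an empty list means no findings."""
    warnings = []
    lines = text.splitlines()
    if not any(line.startswith("## ") for line in lines):
        warnings.append("no '## ' section headings found")
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if BARE_PRICE.match(stripped):
            warnings.append(f"line {number}: bare price without a label: {stripped!r}")
        elif (len(stripped) > max_paragraph_chars
              and not stripped.startswith(("-", "*", "#"))):
            # Each line is treated as one paragraph, which fits files
            # that are not hard-wrapped.
            warnings.append(
                f"line {number}: long paragraph ({len(stripped)} chars); consider bullets"
            )
    return warnings


sample = "# Pricing\n\n- $75/hour\n- Standard consultation rate: $75/hour\n"
for warning in lint_kb_text(sample):
    print(warning)
```

The sample triggers two warnings: it has no ## headings, and line 3 is a bare price. The labeled bullet on line 4 passes.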

Testing your knowledge base

After setting up your KB content, verify it works end to end:

Call your agent and ask a question that can only be answered from your KB: “What are your hours?” or “How much is a consultation?”
Listen for specifics. The agent should respond with your actual data, not a generic “I'd be happy to help you find that information.”
Try edge cases. Ask about something not in your KB to confirm the agent handles gaps gracefully instead of fabricating an answer.
Test both caller modes. If you have mode: "private" content, call from a non-owner number and confirm it is not exposed to public callers, then call from the owner number and confirm the agent can use it.

Checking the logs

# See which static sources were loaded for the last call:
journalctl --user -u ninja-talk -n 200 --no-pager | grep "context"

# Confirm kb_search ran and returned results:
journalctl --user -u ninja-talk -n 200 --no-pager | grep "kb_search"

Healthy output looks like: static sources loaded with their labels, and kb_search returning results with a duration of roughly 2 seconds or less.
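
If you check these logs often, the two commands can be combined into one pass. The sketch below shells out to the same journalctl invocation shown above and groups lines by the same two search strings. It makes no assumption about the log line format beyond those substrings, and the function names are illustrative.

```python
import subprocess

# Same invocation as the shell commands above, without the grep step.
JOURNAL_CMD = ["journalctl", "--user", "-u", "ninja-talk", "-n", "200", "--no-pager"]


def group_lines(lines, keywords=("context", "kb_search")):
    """Map each keyword to the log lines containing it (case-sensitive, like grep)."""
    return {kw: [line for line in lines if kw in line] for kw in keywords}


def read_journal():
    """Run journalctl and return its output as a list of lines."""
    result = subprocess.run(JOURNAL_CMD, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Calling group_lines(read_journal()) on the machine running the gateway returns a dictionary with one list of matching lines per search string. An empty list for "kb_search" after a call that should have searched is a hint that the tool never ran.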

Quick start checklist

Create data/pricing.md and data/hours-and-faq.md in your workspace
Add a data/voice-context-sources.json that points to them with mode: "both"
For larger docs, drop markdown files into memory/kb/ and let OpenClaw index them
Call your agent and ask a KB-specific question to confirm it works

Related guides

Voice Agent Tools and Skills - all available tools including kb_search
Playbook Guide - control what your agent says and does
Custom Tools Reference - add your own HTTP tools

Need help setting up your knowledge base? Email hello@talktomyagent.io