
AI Agent RAG Configuration Guide

Configure knowledge bases to give AI agents context-aware capabilities.

Overview

RAG (Retrieval-Augmented Generation) enables AI agents to answer questions using your documentation, policies, and domain knowledge rather than just their training data.

How RAG Works

User Query → Embedding → Vector Search → Relevant Docs → LLM + Context → Response
  1. User asks a question
  2. Query converted to embedding vector
  3. Similar documents retrieved from vector store
  4. Relevant context passed to LLM
  5. LLM generates answer using context
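The five steps above can be sketched end-to-end. This is an illustrative TypeScript sketch, not production code: `embed`, `vectorSearch`, and `generate` are hypothetical stand-ins for the Workers AI and Vectorize calls shown later in this guide, injected as dependencies so the flow is easy to test.

```typescript
// One document returned from the vector store
type Doc = { id: string; text: string; score: number };

// Injected stand-ins for the real embedding, search, and LLM calls
interface RagDeps {
  embed: (text: string) => Promise<number[]>;
  vectorSearch: (vector: number[], topK: number) => Promise<Doc[]>;
  generate: (prompt: string) => Promise<string>;
}

async function answerWithRag(query: string, deps: RagDeps): Promise<string> {
  const queryVector = await deps.embed(query);             // step 2: embed query
  const docs = await deps.vectorSearch(queryVector, 5);    // step 3: retrieve
  const context = docs.map((d) => d.text).join("\n---\n"); // step 4: build context
  return deps.generate(                                    // step 5: generate
    `Answer using only this context:\n${context}\n\nQuestion: ${query}`
  );
}
```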

AI Agent Knowledge Bases

Production Knowledge Bases

| Knowledge Base | Index Name | Model | Use Case |
| --- | --- | --- | --- |
| Support KB | `support-kb` | T1-T3 | Customer support (Maximus) |
| Sales KB | `sales-kb` | T2-T4 | Sales questions (Minerva) |
| Menu KB | `menu-kb` | T1-T2 | Menu ordering (Menu AI) |
| Voice KB | `voice-kb` | T2-T3 | Voice commands (Hey Maximus) |
| Dev KB | `dev-kb` | T3-T5 | Developer questions (Dev Agent) |
| Docs KB | `docs-embeddings` | T1-T2 | Public documentation |

Model Tier Mapping

| Tier | Model | Cost | Best For |
| --- | --- | --- | --- |
| T1 | Llama 4 Scout (Workers AI) | FREE | Simple lookups, FAQ |
| T2 | Gemini 2.0 Flash | $0.10/M | Fast responses, real-time |
| T3 | Gemini 3 Flash | $0.50/M | Complex queries |
| T4 | Claude Haiku 4.5 | $1.00/M | Nuanced understanding |
| T5 | Claude Sonnet 4.5 | $3.00/M | Analysis, recommendations |

Document Indexing

Supported Document Types

| Type | Format | Processing |
| --- | --- | --- |
| Markdown | `.md` | Parse frontmatter + content |
| HTML | `.html` | Extract text, preserve structure |
| PDF | `.pdf` | OCR + text extraction |
| Plain Text | `.txt` | Direct processing |
| JSON | `.json` | Structured extraction |
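For illustration, the table above can be turned into a simple dispatch on file extension. The strategy names below are our own labels for the Processing column, not an existing API:

```typescript
// Illustrative strategy labels mirroring the Processing column above
type Strategy =
  | "frontmatter_and_content"
  | "extract_text"
  | "ocr_and_text"
  | "direct"
  | "structured";

const strategies: Record<string, Strategy> = {
  ".md": "frontmatter_and_content",
  ".html": "extract_text",
  ".pdf": "ocr_and_text",
  ".txt": "direct",
  ".json": "structured",
};

// Pick a processing strategy from the filename extension
function strategyFor(filename: string): Strategy {
  const ext = filename.slice(filename.lastIndexOf(".")).toLowerCase();
  const strategy = strategies[ext];
  if (!strategy) throw new Error(`Unsupported document type: ${ext}`);
  return strategy;
}
```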

Indexing Pipeline

// Example indexing configuration
const indexConfig = {
  source: 'documentation/',
  index: 'support-kb',
  embedding_model: '@cf/baai/bge-base-en-v1.5',
  chunk_size: 512,
  chunk_overlap: 50,
  metadata_fields: [
    'rag_type',
    'rag_product',
    'rag_module',
    'rag_keywords',
    'rag_audience'
  ]
};

Chunking Strategy

info

Chunk size directly impacts retrieval quality. Smaller chunks improve precision for short answers (FAQs), while larger chunks preserve context for procedural content (runbooks). Choose the appropriate size based on your content type.

Documents are split into chunks for optimal retrieval:

| Content Type | Chunk Size | Overlap | Rationale |
| --- | --- | --- | --- |
| User Guides | 512 tokens | 50 | Balanced detail |
| API Docs | 256 tokens | 25 | Code snippets |
| FAQs | 128 tokens | 10 | Short answers |
| Runbooks | 768 tokens | 100 | Step sequences |
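A minimal sliding-window chunker shows how the size/overlap pairs above interact. This sketch splits on whitespace as a rough token proxy; a production indexer would use the embedding model's actual tokenizer:

```typescript
// Split text into overlapping windows of `size` tokens, advancing by
// (size - overlap) tokens each step. Whitespace splitting is a rough
// stand-in for real tokenization.
function chunk(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than chunk size");
  const tokens = text.split(/\s+/).filter(Boolean);
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size).join(" "));
    if (start + size >= tokens.length) break; // last window reached the end
  }
  return chunks;
}

// FAQs, per the table: 128-token chunks with a 10-token overlap
// const faqChunks = chunk(faqText, 128, 10);
```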

RAG Frontmatter Schema

Required Fields

Every document should include RAG metadata:

---
rag_type: user_guide # Document type
rag_product: restaurant_revolution # Product area
rag_module: staff_payments # Feature module
rag_keywords: [payments, tips] # Search keywords
rag_audience: [staff, manager] # Target audience
---

Field Definitions

| Field | Type | Values |
| --- | --- | --- |
| `rag_type` | string | user_guide, technical_guide, api_reference, troubleshooting, internal_guide |
| `rag_product` | string | restaurant_revolution, olympus_cloud, creators_revolution, nebusai_internal |
| `rag_module` | string | Feature-specific (e.g., staff_payments, manager_menu) |
| `rag_keywords` | array | Relevant search terms |
| `rag_audience` | array | Target users (e.g., staff, manager, admin, developer) |
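One possible TypeScript shape for this schema, with a small completeness check. The interface and the `missingRagFields` helper are illustrative, not part of the platform:

```typescript
// Literal unions mirror the documented rag_type / rag_product values
interface RagFrontmatter {
  rag_type:
    | "user_guide"
    | "technical_guide"
    | "api_reference"
    | "troubleshooting"
    | "internal_guide";
  rag_product:
    | "restaurant_revolution"
    | "olympus_cloud"
    | "creators_revolution"
    | "nebusai_internal";
  rag_module: string;
  rag_keywords: string[];
  rag_audience: string[];
}

// Return the names of any required RAG fields a document is missing
function missingRagFields(meta: Partial<RagFrontmatter>): string[] {
  const required = [
    "rag_type",
    "rag_product",
    "rag_module",
    "rag_keywords",
    "rag_audience",
  ] as const;
  return required.filter((field) => meta[field] === undefined);
}
```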

Vector Store Configuration

Cloudflare Vectorize Setup

// Vectorize indexes are provisioned ahead of time, e.g. via Wrangler:
//   npx wrangler vectorize create support-kb --dimensions=768 --metric=cosine
// (768 dimensions matches the BGE base embedding model)

// Insert embeddings through the Workers binding
await env.VECTORIZE.insert([{
  id: 'doc-123',
  values: embedding,
  metadata: {
    title: 'Payment Processing Guide',
    rag_type: 'user_guide',
    rag_product: 'restaurant_revolution',
    rag_module: 'staff_payments',
    url: '/staff/payments/processing'
  }
}]);

Query Configuration

// RAG query with metadata filtering
const results = await env.VECTORIZE.query(queryEmbedding, {
  topK: 5,
  filter: {
    rag_product: 'restaurant_revolution',
    rag_audience: { $contains: 'staff' }
  },
  returnValues: false,
  returnMetadata: true
});

Agent-Specific Configuration

Maximus (Customer Support)

agent: maximus
knowledge_base: support-kb
model_routing:
  simple_faq: T1
  general_support: T2
  complex_issues: T3
context_window: 4000
max_chunks: 5
filters:
  rag_type: [user_guide, troubleshooting]
  rag_audience: [customer, staff]
system_prompt: |
  You are Maximus, a helpful support assistant for Restaurant Revolution.
  Use the provided documentation to answer questions accurately.
  If unsure, offer to connect with human support.
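The `model_routing` block above maps query intents to model tiers. A hypothetical tier lookup for that routing might look like this; classifying the query into an intent in the first place is out of scope here:

```typescript
type Tier = "T1" | "T2" | "T3" | "T4" | "T5";

// Mirrors the model_routing block in the Maximus config above
const maximusRouting: Record<string, Tier> = {
  simple_faq: "T1",
  general_support: "T2",
  complex_issues: "T3",
};

// Resolve an intent to a tier, falling back to T2 for unknown intents
// (the fallback choice is an assumption, not documented behavior)
function tierFor(intent: string, routing: Record<string, Tier>, fallback: Tier = "T2"): Tier {
  return routing[intent] ?? fallback;
}
```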

Minerva (Sales AI)

agent: minerva
knowledge_base: sales-kb
model_routing:
  product_info: T2
  competitive: T3
  pricing: T4
context_window: 8000
max_chunks: 8
filters:
  rag_type: [sales_guide, product_overview]
  rag_product: [restaurant_revolution, olympus_cloud]
system_prompt: |
  You are Minerva, NebusAI's sales intelligence assistant.
  Help answer product questions and provide competitive insights.
  Focus on value and differentiation.

Menu AI (Ordering)

agent: menu_ai
knowledge_base: menu-kb
model_routing:
  menu_lookup: T1
  recommendations: T2
context_window: 2000
max_chunks: 3
dynamic_content: true # Loads tenant-specific menu
filters:
  rag_type: [menu_content, allergen_info]
system_prompt: |
  You help customers browse the menu and answer food questions.
  Be accurate about allergens and ingredients.
  Suggest items based on preferences.

Dev Agent (Engineering)

agent: dev_agent
knowledge_base: dev-kb
model_routing:
  api_lookup: T3
  architecture: T4
  debugging: T5
context_window: 16000
max_chunks: 10
filters:
  rag_type: [api_reference, technical_guide]
  rag_audience: [developer, engineering]
system_prompt: |
  You are a development assistant for Olympus Cloud platform.
  Provide accurate technical information from documentation.
  Include code examples when relevant.

Embedding Models

Supported Models

| Model | Dimensions | Speed | Quality |
| --- | --- | --- | --- |
| BGE Base | 768 | Fast | Good |
| BGE Large | 1024 | Medium | Better |
| Cohere Embed | 1024 | Medium | Better |
| OpenAI Ada | 1536 | Medium | Best |

Model Selection

// Use BGE base for cost-effective, fast document embeddings
const docEmbedding = await ai.run(
  '@cf/baai/bge-base-en-v1.5',
  { text: documentContent }
);

// A larger model yields higher-quality embeddings, but query embeddings
// must come from the same model (and dimension count) as the one used to
// build the index
const queryEmbedding = await ai.run(
  '@cf/baai/bge-large-en-v1.5',
  { text: queryText }
);

Query Optimization

Semantic Search Tuning

const searchConfig = {
  // Relevance threshold (0-1)
  minScore: 0.7,

  // Number of results
  topK: 5,

  // Diversity (avoid too-similar results)
  diversityBias: 0.3,

  // Metadata boosting
  boosts: {
    'rag_type:troubleshooting': 1.2, // Boost troubleshooting
    'rag_audience:customer': 1.1     // Boost customer content
  }
};

Query Expansion

// Expand query for better recall
async function expandQuery(query: string): Promise<string[]> {
  // Generate synonyms and related terms
  const expanded = await ai.run('@cf/meta/llama-3.1-8b-instruct', {
    prompt: `Generate 3 alternative phrasings for: "${query}"`,
    max_tokens: 100
  });

  return [query, ...parseAlternatives(expanded)];
}

Hybrid Search

Combine semantic and keyword search:

const results = await hybridSearch({
  query: userQuery,
  semantic: {
    index: 'support-kb',
    weight: 0.7
  },
  keyword: {
    fields: ['title', 'rag_keywords'],
    weight: 0.3
  },
  topK: 5
});

Monitoring & Analytics

Track RAG Performance

// Log RAG query metrics
await analytics.log({
  event: 'rag_query',
  agent: 'maximus',
  query_length: query.length,
  results_count: results.length,
  top_score: results[0]?.score,
  model_tier: selectedTier,
  latency_ms: endTime - startTime
});

Key Metrics

| Metric | Target | Alert Threshold |
| --- | --- | --- |
| Query Latency | under 200ms | over 500ms |
| Top Result Score | above 0.75 | below 0.5 |
| Results Returned | 3-5 | 0 |
| User Satisfaction | above 4.0/5 | below 3.5/5 |
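As a sketch, the alert thresholds above can be encoded as simple checks. The `RagMetrics` shape and alert names are illustrative, not an existing monitoring API:

```typescript
// Per-query metrics, matching the table above
interface RagMetrics {
  latencyMs: number;
  topScore: number;
  resultsCount: number;
}

// Return the alerts a single query's metrics should trigger
function ragAlerts(m: RagMetrics): string[] {
  const alerts: string[] = [];
  if (m.latencyMs > 500) alerts.push("query_latency_high");
  if (m.topScore < 0.5) alerts.push("top_score_low");
  if (m.resultsCount === 0) alerts.push("no_results");
  return alerts;
}
```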

Updating Knowledge Bases

Automatic Sync

Documentation changes trigger reindexing:

# .github/workflows/docs-index.yml
on:
  push:
    paths:
      - 'documentation/**/*.md'

jobs:
  reindex:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Index Documentation
        run: |
          npm run index-docs
        env:
          VECTORIZE_API_TOKEN: ${{ secrets.VECTORIZE_TOKEN }}

Manual Reindex

# Reindex specific knowledge base
npm run reindex -- --kb=support-kb

# Reindex all knowledge bases
npm run reindex -- --all

# Preview changes without indexing
npm run reindex -- --kb=support-kb --dry-run

Best Practices

Document Quality

tip

Documents with complete RAG frontmatter (rag_type, rag_product, rag_module, rag_keywords, rag_audience) are significantly more discoverable. Missing metadata means your content may never surface in agent responses.

  1. Clear titles - Descriptive, searchable
  2. Proper frontmatter - Complete RAG metadata
  3. Logical structure - Headers, sections
  4. Actionable content - Steps, examples
  5. Regular updates - Keep current

Query Handling

  1. Fallback gracefully - Handle no-results
  2. Cite sources - Link to docs
  3. Know limits - Escalate when unsure
  4. Learn from feedback - Improve over time

Performance

  1. Right-size chunks - Balance detail vs noise
  2. Filter appropriately - Use metadata
  3. Cache common queries - Reduce latency
  4. Monitor costs - Track token usage
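
Point 3 above ("Cache common queries") can be sketched as a small in-memory TTL cache. A Workers deployment would more likely use the Cache API or KV; this sketch just shows the normalize-then-lookup pattern:

```typescript
// In-memory TTL cache keyed by a normalized query string
class QueryCache<T> {
  private store = new Map<string, { value: T; expires: number }>();
  constructor(private ttlMs: number) {}

  // Normalize so trivially different phrasings share an entry
  private key(query: string): string {
    return query.toLowerCase().trim();
  }

  get(query: string): T | undefined {
    const entry = this.store.get(this.key(query));
    if (!entry || entry.expires < Date.now()) return undefined;
    return entry.value;
  }

  set(query: string, value: T): void {
    this.store.set(this.key(query), {
      value,
      expires: Date.now() + this.ttlMs,
    });
  }
}
```

Check the cache before running the embedding and vector search; on a miss, run the full RAG pipeline and store the answer.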