AI Agent RAG Configuration Guide
Configure knowledge bases to give AI agents context-aware capabilities.
Overview
RAG (Retrieval-Augmented Generation) enables AI agents to answer questions using your documentation, policies, and domain knowledge rather than just their training data.
How RAG Works
User Query → Embedding → Vector Search → Relevant Docs → LLM + Context → Response
1. User asks a question
2. Query converted to embedding vector
3. Similar documents retrieved from vector store
4. Relevant context passed to LLM
5. LLM generates answer using context
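The whole flow fits in a few lines. Below is a minimal sketch assuming Cloudflare Workers AI (env.AI) and Vectorize (env.VECTORIZE) bindings, matching the stack used later in this guide; the function name and prompt format are illustrative.
// Minimal RAG round-trip; names and prompt format are illustrative
async function answerWithRag(env: any, question: string): Promise<string> {
  // Steps 1-2: convert the user query to an embedding vector
  const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [question] });
  // Step 3: retrieve similar documents from the vector store
  const { matches } = await env.VECTORIZE.query(data[0], { topK: 5, returnMetadata: true });
  // Step 4: assemble retrieved context for the LLM
  const context = matches.map((m: any) => m.metadata?.title ?? m.id).join('\n');
  // Step 5: generate an answer grounded in the retrieved context
  const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    prompt: `Answer using this context:\n${context}\n\nQuestion: ${question}`
  });
  return result.response;
}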
AI Agent Knowledge Bases
Production Knowledge Bases
| Knowledge Base | Index Name | Model Tiers | Use Case |
|---|---|---|---|
| Support KB | support-kb | T1-T3 | Customer support (Maximus) |
| Sales KB | sales-kb | T2-T4 | Sales questions (Minerva) |
| Menu KB | menu-kb | T1-T2 | Menu ordering (Menu AI) |
| Voice KB | voice-kb | T2-T3 | Voice commands (Hey Maximus) |
| Dev KB | dev-kb | T3-T5 | Developer questions (Dev Agent) |
| Docs KB | docs-embeddings | T1-T2 | Public documentation |
Model Tier Mapping
| Tier | Model | Cost (per 1M tokens) | Best For |
|---|---|---|---|
| T1 | Llama 4 Scout (Workers AI) | Free | Simple lookups, FAQ |
| T2 | Gemini 2.0 Flash | $0.10 | Fast responses, real-time |
| T3 | Gemini 3 Flash | $0.50 | Complex queries |
| T4 | Claude Haiku 4.5 | $1.00 | Nuanced understanding |
| T5 | Claude Sonnet 4.5 | $3.00 | Analysis, recommendations |
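A sketch of routing a classified query to a tier, mirroring the mapping above and the model_routing blocks in the agent configs later in this guide; the query-type labels are assumptions.
// Illustrative tier router; query types mirror the Maximus config below
type Tier = 'T1' | 'T2' | 'T3' | 'T4' | 'T5';
type QueryType = 'simple_faq' | 'general_support' | 'complex_issues';

const ROUTING: Record<QueryType, Tier> = {
  simple_faq: 'T1',       // free Workers AI model for simple lookups
  general_support: 'T2',  // fast, cheap responses
  complex_issues: 'T3'    // more capable model for complex queries
};

function selectTier(queryType: QueryType): Tier {
  return ROUTING[queryType];
}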
Document Indexing
Supported Document Types
| Type | Format | Processing |
|---|---|---|
| Markdown | .md | Parse frontmatter + content |
| HTML | .html | Extract text, preserve structure |
| PDF | .pdf | OCR + text extraction |
| Plain Text | .txt | Direct processing |
| JSON | .json | Structured extraction |
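A sketch of dispatching on file extension; the inline processors are deliberately crude stand-ins for real parsers, and PDFs are omitted because OCR needs a separate service.
// Illustrative extension → processor map (real parsers would replace these)
const processors: Record<string, (raw: string) => string> = {
  '.md': raw => raw.replace(/^---[\s\S]*?---\n/, ''),       // drop frontmatter, keep content
  '.html': raw => raw.replace(/<[^>]+>/g, ' '),             // crude tag stripping
  '.txt': raw => raw,                                       // direct processing
  '.json': raw => JSON.stringify(JSON.parse(raw), null, 2)  // structured extraction
};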
Indexing Pipeline
// Example indexing configuration
const indexConfig = {
source: 'documentation/',
index: 'support-kb',
embedding_model: '@cf/baai/bge-base-en-v1.5',
chunk_size: 512,
chunk_overlap: 50,
metadata_fields: [
'rag_type',
'rag_product',
'rag_module',
'rag_keywords',
'rag_audience'
]
};
Chunking Strategy
Documents are split into chunks for retrieval, and chunk size directly affects quality: smaller chunks improve precision for short answers (FAQs), while larger chunks preserve context for procedural content (runbooks). Choose a size based on your content type; a sketch of the splitting logic follows the table.
| Content Type | Chunk Size | Overlap | Rationale |
|---|---|---|---|
| User Guides | 512 tokens | 50 | Balanced detail |
| API Docs | 256 tokens | 25 | Code snippets |
| FAQs | 128 tokens | 10 | Short answers |
| Runbooks | 768 tokens | 100 | Step sequences |
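A minimal sketch of chunking with overlap, using whitespace tokens as a rough stand-in for a real tokenizer; the defaults match the User Guides row.
// Fixed-size chunking with overlap (whitespace "tokens" for illustration)
function chunkDocument(text: string, chunkSize = 512, overlap = 50): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += chunkSize - overlap) {
    chunks.push(tokens.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= tokens.length) break; // final chunk reached
  }
  return chunks;
}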
RAG Frontmatter Schema
Required Fields
Every document should include RAG metadata:
---
rag_type: user_guide # Document type
rag_product: restaurant_revolution # Product area
rag_module: staff_payments # Feature module
rag_keywords: [payments, tips] # Search keywords
rag_audience: [staff, manager] # Target audience
---
Field Definitions
| Field | Type | Values |
|---|---|---|
| rag_type | string | user_guide, technical_guide, api_reference, troubleshooting, internal_guide |
| rag_product | string | restaurant_revolution, olympus_cloud, creators_revolution, nebusai_internal |
| rag_module | string | Feature-specific (e.g., staff_payments, manager_menu) |
| rag_keywords | array | Relevant search terms |
| rag_audience | array | Target users (e.g., staff, manager, admin, developer) |
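The schema translates naturally into a type. Below is an illustrative TypeScript interface; the field names and enumerations come from the table above.
// Illustrative shape for validated RAG frontmatter
interface RagFrontmatter {
  rag_type: 'user_guide' | 'technical_guide' | 'api_reference'
    | 'troubleshooting' | 'internal_guide';
  rag_product: 'restaurant_revolution' | 'olympus_cloud'
    | 'creators_revolution' | 'nebusai_internal';
  rag_module: string;        // feature-specific, e.g. 'staff_payments'
  rag_keywords: string[];    // relevant search terms
  rag_audience: string[];    // e.g. ['staff', 'manager']
}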
Vector Store Configuration
Cloudflare Vectorize Setup
// Vectorize indexes are created with Wrangler (or the REST API),
// not through the Workers binding:
//   npx wrangler vectorize create support-kb --dimensions=768 --metric=cosine
// (768 dimensions matches the BGE base embedding model.)
// Insert embeddings through the VECTORIZE binding in a Worker
await env.VECTORIZE.insert([{
id: 'doc-123',
values: embedding,
metadata: {
title: 'Payment Processing Guide',
rag_type: 'user_guide',
rag_product: 'restaurant_revolution',
rag_module: 'staff_payments',
url: '/staff/payments/processing'
}
}]);
Query Configuration
// RAG query with metadata filtering
const results = await env.VECTORIZE.query(queryEmbedding, {
  topK: 5,
  filter: {
    rag_product: 'restaurant_revolution',
    // Vectorize's filter operators (e.g. $eq, $ne, $in, $nin) do not
    // include array containment; index a scalar audience field, or
    // filter array-valued metadata client-side after the query.
    rag_audience: { $eq: 'staff' }
  },
  returnValues: false,
  returnMetadata: true
});
Agent-Specific Configuration
Maximus (Customer Support)
agent: maximus
knowledge_base: support-kb
model_routing:
simple_faq: T1
general_support: T2
complex_issues: T3
context_window: 4000
max_chunks: 5
filters:
rag_type: [user_guide, troubleshooting]
rag_audience: [customer, staff]
system_prompt: |
You are Maximus, a helpful support assistant for Restaurant Revolution.
Use the provided documentation to answer questions accurately.
If unsure, offer to connect with human support.
Minerva (Sales AI)
agent: minerva
knowledge_base: sales-kb
model_routing:
product_info: T2
competitive: T3
pricing: T4
context_window: 8000
max_chunks: 8
filters:
rag_type: [sales_guide, product_overview]
rag_product: [restaurant_revolution, olympus_cloud]
system_prompt: |
You are Minerva, NebusAI's sales intelligence assistant.
Help answer product questions and provide competitive insights.
Focus on value and differentiation.
Menu AI (Ordering)
agent: menu_ai
knowledge_base: menu-kb
model_routing:
menu_lookup: T1
recommendations: T2
context_window: 2000
max_chunks: 3
dynamic_content: true # Loads tenant-specific menu
filters:
rag_type: [menu_content, allergen_info]
system_prompt: |
You help customers browse the menu and answer food questions.
Be accurate about allergens and ingredients.
Suggest items based on preferences.
Dev Agent (Engineering)
agent: dev_agent
knowledge_base: dev-kb
model_routing:
api_lookup: T3
architecture: T4
debugging: T5
context_window: 16000
max_chunks: 10
filters:
rag_type: [api_reference, technical_guide]
rag_audience: [developer, engineering]
system_prompt: |
You are a development assistant for Olympus Cloud platform.
Provide accurate technical information from documentation.
Include code examples when relevant.
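A sketch of how one of these configs could drive retrieval at query time; the config shape mirrors the YAML above, and loading it (plus producing the query embedding) is assumed to happen elsewhere. Array-valued metadata is filtered client-side, per the note in the query example earlier.
// Apply an agent's RAG settings when querying its knowledge base
interface AgentRagConfig {
  knowledge_base: string;
  max_chunks: number;
  filters: Record<string, string[]>;
}

async function retrieveForAgent(env: any, cfg: AgentRagConfig, queryEmbedding: number[]) {
  const { matches } = await env.VECTORIZE.query(queryEmbedding, {
    topK: cfg.max_chunks,   // e.g. 5 for Maximus, 10 for Dev Agent
    returnMetadata: true
  });
  // Keep only chunks whose metadata matches every configured filter
  return matches.filter((m: any) =>
    Object.entries(cfg.filters).every(([field, allowed]) => {
      const value = m.metadata?.[field];
      return Array.isArray(value)
        ? value.some((v: string) => allowed.includes(v))
        : allowed.includes(value);
    })
  );
}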
Embedding Models
Supported Models
| Model | Dimensions | Speed | Quality |
|---|---|---|---|
| BGE Base | 768 | Fast | Good |
| BGE Large | 1024 | Medium | Better |
| Cohere Embed | 1024 | Medium | Better |
| OpenAI Ada | 1536 | Medium | Best |
Model Selection
Note that query embeddings must come from the same model used to index the documents; a 768-dimension query vector cannot be searched against a 1024-dimension index.
// Use BGE base for cost-effective, fast embeddings
const docEmbedding = await ai.run(
  '@cf/baai/bge-base-en-v1.5',
  { text: documentContent }
);
// Use BGE large for indexes where quality matters more than speed
const queryEmbedding = await ai.run(
  '@cf/baai/bge-large-en-v1.5',
  { text: queryText }
);
Query Optimization
Semantic Search Tuning
const searchConfig = {
// Relevance threshold (0-1)
minScore: 0.7,
// Number of results
topK: 5,
// Diversity (avoid too-similar results)
diversityBias: 0.3,
// Metadata boosting
boosts: {
'rag_type:troubleshooting': 1.2, // Boost troubleshooting
'rag_audience:customer': 1.1 // Boost customer content
}
};
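minScore, diversityBias, and boosts are not native vector-store features, so a thin post-processing pass applies them to raw matches. A sketch (diversity handling omitted for brevity):
// Apply threshold and metadata boosts to raw matches, then re-rank
function applySearchConfig(
  matches: { id: string; score: number; metadata: Record<string, any> }[],
  cfg: typeof searchConfig
) {
  return matches
    .map(m => {
      let score = m.score;
      for (const [rule, factor] of Object.entries(cfg.boosts)) {
        const [field, value] = rule.split(':');
        const meta = m.metadata?.[field];
        const hit = Array.isArray(meta) ? meta.includes(value) : meta === value;
        if (hit) score *= factor;
      }
      return { ...m, score };
    })
    .filter(m => m.score >= cfg.minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, cfg.topK);
}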
Query Expansion
// Expand query for better recall
async function expandQuery(query: string): Promise<string[]> {
  // Generate synonyms and related phrasings with a small instruct model
  const expanded = await ai.run('@cf/meta/llama-3.1-8b-instruct', {
    prompt: `Generate 3 alternative phrasings for: "${query}". One per line.`,
    max_tokens: 100
  });
  // Workers AI text generation returns { response: string }
  const alternatives = expanded.response
    .split('\n')
    .map((line: string) => line.trim())
    .filter(Boolean);
  return [query, ...alternatives];
}
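A usage sketch: embed each phrasing, query the store, and merge results deduplicated by document id. The ai and env bindings follow the earlier examples.
// Query with every phrasing and merge results by document id
const phrasings = await expandQuery(userQuery);
const seen = new Set<string>();
const merged: any[] = [];
for (const phrase of phrasings) {
  const { data } = await ai.run('@cf/baai/bge-base-en-v1.5', { text: [phrase] });
  const { matches } = await env.VECTORIZE.query(data[0], { topK: 5 });
  for (const m of matches) {
    if (!seen.has(m.id)) {
      seen.add(m.id);
      merged.push(m);
    }
  }
}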
Hybrid Search
Combine semantic and keyword search:
const results = await hybridSearch({
query: userQuery,
semantic: {
index: 'support-kb',
weight: 0.7
},
keyword: {
fields: ['title', 'rag_keywords'],
weight: 0.3
},
topK: 5
});
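If hybridSearch is built in-house, its core is score fusion. A minimal sketch, assuming both result sets have already been normalized to 0-1 scores:
// Weighted fusion of semantic and keyword scores, highest first
function fuseScores(
  semantic: Map<string, number>,
  keyword: Map<string, number>,
  wSem = 0.7,
  wKey = 0.3
): [string, number][] {
  const ids = new Set([...semantic.keys(), ...keyword.keys()]);
  return [...ids]
    .map(id => [
      id,
      wSem * (semantic.get(id) ?? 0) + wKey * (keyword.get(id) ?? 0)
    ] as [string, number])
    .sort((a, b) => b[1] - a[1]);
}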
Monitoring & Analytics
Track RAG Performance
// Log RAG query metrics
await analytics.log({
event: 'rag_query',
agent: 'maximus',
query_length: query.length,
results_count: results.length,
top_score: results[0]?.score,
model_tier: selectedTier,
latency_ms: endTime - startTime
});
Key Metrics
| Metric | Target | Alert Threshold |
|---|---|---|
| Query Latency | < 200ms | > 500ms |
| Top Result Score | > 0.75 | < 0.5 |
| Results Returned | 3-5 | 0 |
| User Satisfaction | > 4.0/5 | < 3.5/5 |
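An illustrative alert check that mirrors the thresholds above; the metric field names and the surrounding alerting pipeline are assumptions.
// Flag queries that cross the alert thresholds in the table above
function checkRagAlerts(m: { latencyMs: number; topScore: number; resultCount: number }): string[] {
  const alerts: string[] = [];
  if (m.latencyMs > 500) alerts.push('query latency over 500ms');
  if (m.topScore < 0.5) alerts.push('top result score below 0.5');
  if (m.resultCount === 0) alerts.push('no results returned');
  return alerts;
}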
Updating Knowledge Bases
Automatic Sync
Documentation changes trigger reindexing:
# .github/workflows/docs-index.yml
on:
push:
paths:
- 'documentation/**/*.md'
jobs:
reindex:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Index Documentation
run: |
npm run index-docs
env:
VECTORIZE_API_TOKEN: ${{ secrets.VECTORIZE_TOKEN }}
Manual Reindex
# Reindex specific knowledge base
npm run reindex -- --kb=support-kb
# Reindex all knowledge bases
npm run reindex -- --all
# Preview changes without indexing
npm run reindex -- --kb=support-kb --dry-run
Best Practices
Document Quality
Documents with complete RAG frontmatter (rag_type, rag_product, rag_module, rag_keywords, rag_audience) are significantly more discoverable. Missing metadata means your content may never surface in agent responses.
- Clear titles - Descriptive, searchable
- Proper frontmatter - Complete RAG metadata
- Logical structure - Headers, sections
- Actionable content - Steps, examples
- Regular updates - Keep current
Query Handling
- Fallback gracefully - Handle no-results
- Cite sources - Link to docs
- Know limits - Escalate when unsure
- Learn from feedback - Improve over time
Performance
- Right-size chunks - Balance detail vs noise
- Filter appropriately - Use metadata
- Cache common queries - Reduce latency
- Monitor costs - Track token usage