AI Agent RAG Configuration Guide
Configure knowledge bases to give AI agents context-aware capabilities.
Overview
RAG (Retrieval-Augmented Generation) enables AI agents to answer questions using your documentation, policies, and domain knowledge rather than just their training data.
How RAG Works
User Query → Embedding → Vector Search → Relevant Docs → LLM + Context → Response
1. User asks a question
2. Query converted to embedding vector
3. Similar documents retrieved from vector store
4. Relevant context passed to LLM
5. LLM generates answer using context
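The whole flow fits in a few lines. Below is a minimal sketch assuming Cloudflare Workers AI (env.AI) and Vectorize (env.VECTORIZE) bindings, matching the stack used later in this guide; the function name and prompt format are illustrative.
// Minimal RAG round-trip; names and prompt format are illustrative
async function answerWithRag(env: any, question: string): Promise<string> {
  // Steps 1-2: convert the user query to an embedding vector
  const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [question] });
  // Step 3: retrieve similar documents from the vector store
  const { matches } = await env.VECTORIZE.query(data[0], { topK: 5, returnMetadata: true });
  // Step 4: assemble retrieved context for the LLM
  const context = matches.map((m: any) => m.metadata?.title ?? m.id).join('\n');
  // Step 5: generate an answer grounded in the retrieved context
  const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    prompt: `Answer using this context:\n${context}\n\nQuestion: ${question}`
  });
  return result.response;
}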
AI Agent Knowledge Bases
Production Knowledge Bases
| Knowledge Base | Index Name | Model Tiers | Use Case |
|---|---|---|---|
| Support KB | support-kb | T1-T3 | Customer support (Maximus) |
| Sales KB | sales-kb | T2-T4 | Sales questions (Minerva) |
| Menu KB | menu-kb | T1-T2 | Menu ordering (Menu AI) |
| Voice KB | voice-kb | T2-T3 | Voice commands (Hey Maximus) |
| Dev KB | dev-kb | T3-T5 | Developer questions (Dev Agent) |
| Docs KB | docs-embeddings | T1-T2 | Public documentation |
Model Tier Mapping
| Tier | Model | Cost (per 1M tokens) | Best For |
|---|---|---|---|
| T1 | Llama 4 Scout (Workers AI) | Free | Simple lookups, FAQ |
| T2 | Gemini 2.0 Flash | $0.10 | Fast responses, real-time |
| T3 | Gemini 3 Flash | $0.50 | Complex queries |
| T4 | Claude Haiku 4.5 | $1.00 | Nuanced understanding |
| T5 | Claude Sonnet 4.5 | $3.00 | Analysis, recommendations |
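A sketch of routing a classified query to a tier, mirroring the mapping above and the model_routing blocks in the agent configs later in this guide; the query-type labels are assumptions.
// Illustrative tier router; query types mirror the Maximus config below
type Tier = 'T1' | 'T2' | 'T3' | 'T4' | 'T5';
type QueryType = 'simple_faq' | 'general_support' | 'complex_issues';

const ROUTING: Record<QueryType, Tier> = {
  simple_faq: 'T1',       // free Workers AI model for simple lookups
  general_support: 'T2',  // fast, cheap responses
  complex_issues: 'T3'    // more capable model for complex queries
};

function selectTier(queryType: QueryType): Tier {
  return ROUTING[queryType];
}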
Document Indexing
Supported Document Types
| Type | Format | Processing |
|---|---|---|
| Markdown | .md | Parse frontmatter + content |
| HTML | .html | Extract text, preserve structure |
| PDF | .pdf | OCR + text extraction |
| Plain Text | .txt | Direct processing |
| JSON | .json | Structured extraction |
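A sketch of dispatching on file extension; the inline processors are deliberately crude stand-ins for real parsers, and PDFs are omitted because OCR needs a separate service.
// Illustrative extension → processor map (real parsers would replace these)
const processors: Record<string, (raw: string) => string> = {
  '.md': raw => raw.replace(/^---[\s\S]*?---\n/, ''),       // drop frontmatter, keep content
  '.html': raw => raw.replace(/<[^>]+>/g, ' '),             // crude tag stripping
  '.txt': raw => raw,                                       // direct processing
  '.json': raw => JSON.stringify(JSON.parse(raw), null, 2)  // structured extraction
};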
Indexing Pipeline
// Example indexing configuration
const indexConfig = {
source: 'documentation/',
index: 'support-kb',
embedding_model: '@cf/baai/bge-base-en-v1.5',
chunk_size: 512,
chunk_overlap: 50,
metadata_fields: [
'rag_type',
'rag_product',
'rag_module',
'rag_keywords',
'rag_audience'
]
};
Chunking Strategy
Documents are split into chunks for retrieval, and chunk size directly affects quality: smaller chunks improve precision for short answers (FAQs), while larger chunks preserve context for procedural content (runbooks). Choose a size based on your content type; a sketch of the splitting logic follows the table.
| Content Type | Chunk Size | Overlap | Rationale |
|---|---|---|---|
| User Guides | 512 tokens | 50 | Balanced detail |
| API Docs | 256 tokens | 25 | Code snippets |
| FAQs | 128 tokens | 10 | Short answers |
| Runbooks | 768 tokens | 100 | Step sequences |
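A minimal sketch of chunking with overlap, using whitespace tokens as a rough stand-in for a real tokenizer; the defaults match the User Guides row.
// Fixed-size chunking with overlap (whitespace "tokens" for illustration)
function chunkDocument(text: string, chunkSize = 512, overlap = 50): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += chunkSize - overlap) {
    chunks.push(tokens.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= tokens.length) break; // final chunk reached
  }
  return chunks;
}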
RAG Frontmatter Schema
Required Fields
Every document should include RAG metadata:
---
rag_type: user_guide # Document type
rag_product: restaurant_revolution # Product area
rag_module: staff_payments # Feature module
rag_keywords: [payments, tips] # Search keywords
rag_audience: [staff, manager] # Target audience
---
Field Definitions
| Field | Type | Values |
|---|---|---|
| rag_type | string | user_guide, technical_guide, api_reference, troubleshooting, internal_guide |
| rag_product | string | restaurant_revolution, olympus_cloud, creators_revolution, nebusai_internal |
| rag_module | string | Feature-specific (e.g., staff_payments, manager_menu) |
| rag_keywords | array | Relevant search terms |
| rag_audience | array | Target users (e.g., staff, manager, admin, developer) |
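The schema translates naturally into a type. Below is an illustrative TypeScript interface; the field names and enumerations come from the table above.
// Illustrative shape for validated RAG frontmatter
interface RagFrontmatter {
  rag_type: 'user_guide' | 'technical_guide' | 'api_reference'
    | 'troubleshooting' | 'internal_guide';
  rag_product: 'restaurant_revolution' | 'olympus_cloud'
    | 'creators_revolution' | 'nebusai_internal';
  rag_module: string;        // feature-specific, e.g. 'staff_payments'
  rag_keywords: string[];    // relevant search terms
  rag_audience: string[];    // e.g. ['staff', 'manager']
}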
Vector Store Configuration
Cloudflare Vectorize Setup
// Vectorize indexes are created with Wrangler (or the REST API),
// not through the Workers binding:
//   npx wrangler vectorize create support-kb --dimensions=768 --metric=cosine
// (768 dimensions matches the BGE base embedding model.)
// Insert embeddings through the VECTORIZE binding in a Worker
await env.VECTORIZE.insert([{
id: 'doc-123',
values: embedding,
metadata: {
title: 'Payment Processing Guide',
rag_type: 'user_guide',
rag_product: 'restaurant_revolution',
rag_module: 'staff_payments',
url: '/staff/payments/processing'
}
}]);
Query Configuration
// RAG query with metadata filtering
const results = await env.VECTORIZE.query(queryEmbedding, {
  topK: 5,
  filter: {
    rag_product: 'restaurant_revolution',
    // Vectorize's filter operators (e.g. $eq, $ne, $in, $nin) do not
    // include array containment; index a scalar audience field, or
    // filter array-valued metadata client-side after the query.
    rag_audience: { $eq: 'staff' }
  },
  returnValues: false,
  returnMetadata: true
});
Agent-Specific Configuration
Maximus (Customer Support)
agent: maximus
knowledge_base: support-kb
model_routing:
simple_faq: T1
general_support: T2
complex_issues: T3
context_window: 4000
max_chunks: 5
filters:
rag_type: [user_guide, troubleshooting]
rag_audience: [customer, staff]
system_prompt: |
You are Maximus, a helpful support assistant for Restaurant Revolution.
Use the provided documentation to answer questions accurately.
If unsure, offer to connect with human support.
Minerva (Sales AI)
agent: minerva
knowledge_base: sales-kb
model_routing:
product_info: T2
competitive: T3
pricing: T4
context_window: 8000
max_chunks: 8
filters:
rag_type: [sales_guide, product_overview]
rag_product: [restaurant_revolution, olympus_cloud]
system_prompt: |
You are Minerva, NebusAI's sales intelligence assistant.
Help answer product questions and provide competitive insights.
Focus on value and differentiation.
Menu AI (Ordering)
agent: menu_ai
knowledge_base: menu-kb
model_routing:
menu_lookup: T1
recommendations: T2
context_window: 2000
max_chunks: 3
dynamic_content: true # Loads tenant-specific menu
filters:
rag_type: [menu_content, allergen_info]
system_prompt: |
You help customers browse the menu and answer food questions.
Be accurate about allergens and ingredients.
Suggest items based on preferences.
Dev Agent (Engineering)
agent: dev_agent
knowledge_base: dev-kb
model_routing:
api_lookup: T3
architecture: T4
debugging: T5
context_window: 16000
max_chunks: 10
filters:
rag_type: [api_reference, technical_guide]
rag_audience: [developer, engineering]
system_prompt: |
You are a development assistant for Olympus Cloud platform.
Provide accurate technical information from documentation.
Include code examples when relevant.
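A sketch of how one of these configs could drive retrieval at query time; the config shape mirrors the YAML above, and loading it (plus producing the query embedding) is assumed to happen elsewhere. Array-valued metadata is filtered client-side, per the note in the query example earlier.
// Apply an agent's RAG settings when querying its knowledge base
interface AgentRagConfig {
  knowledge_base: string;
  max_chunks: number;
  filters: Record<string, string[]>;
}

async function retrieveForAgent(env: any, cfg: AgentRagConfig, queryEmbedding: number[]) {
  const { matches } = await env.VECTORIZE.query(queryEmbedding, {
    topK: cfg.max_chunks,   // e.g. 5 for Maximus, 10 for Dev Agent
    returnMetadata: true
  });
  // Keep only chunks whose metadata matches every configured filter
  return matches.filter((m: any) =>
    Object.entries(cfg.filters).every(([field, allowed]) => {
      const value = m.metadata?.[field];
      return Array.isArray(value)
        ? value.some((v: string) => allowed.includes(v))
        : allowed.includes(value);
    })
  );
}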
Embedding Models
Supported Models
| Model | Dimensions | Speed | Quality |
|---|---|---|---|
| BGE Base | 768 | Fast | Good |
| BGE Large | 1024 | Medium | Better |
| Cohere Embed | 1024 | Medium | Better |
| OpenAI Ada | 1536 | Medium | Best |
Model Selection
Note that query embeddings must come from the same model used to index the documents; a 768-dimension query vector cannot be searched against a 1024-dimension index.
// Use BGE base for cost-effective, fast embeddings
const docEmbedding = await ai.run(
  '@cf/baai/bge-base-en-v1.5',
  { text: documentContent }
);
// Use BGE large for indexes where quality matters more than speed
const queryEmbedding = await ai.run(
  '@cf/baai/bge-large-en-v1.5',
  { text: queryText }
);
Query Optimization
Semantic Search Tuning
const searchConfig = {
// Relevance threshold (0-1)
minScore: 0.7,
// Number of results
topK: 5,
// Diversity (avoid too-similar results)
diversityBias: 0.3,
// Metadata boosting
boosts: {
'rag_type:troubleshooting': 1.2, // Boost troubleshooting
'rag_audience:customer': 1.1 // Boost customer content
}
};
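minScore, diversityBias, and boosts are not native vector-store features, so a thin post-processing pass applies them to raw matches. A sketch (diversity handling omitted for brevity):
// Apply threshold and metadata boosts to raw matches, then re-rank
function applySearchConfig(
  matches: { id: string; score: number; metadata: Record<string, any> }[],
  cfg: typeof searchConfig
) {
  return matches
    .map(m => {
      let score = m.score;
      for (const [rule, factor] of Object.entries(cfg.boosts)) {
        const [field, value] = rule.split(':');
        const meta = m.metadata?.[field];
        const hit = Array.isArray(meta) ? meta.includes(value) : meta === value;
        if (hit) score *= factor;
      }
      return { ...m, score };
    })
    .filter(m => m.score >= cfg.minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, cfg.topK);
}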
Query Expansion
// Expand query for better recall
async function expandQuery(query: string): Promise<string[]> {
  // Generate synonyms and related phrasings with a small instruct model
  const expanded = await ai.run('@cf/meta/llama-3.1-8b-instruct', {
    prompt: `Generate 3 alternative phrasings for: "${query}". One per line.`,
    max_tokens: 100
  });
  // Workers AI text generation returns { response: string }
  const alternatives = expanded.response
    .split('\n')
    .map((line: string) => line.trim())
    .filter(Boolean);
  return [query, ...alternatives];
}
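A usage sketch: embed each phrasing, query the store, and merge results deduplicated by document id. The ai and env bindings follow the earlier examples.
// Query with every phrasing and merge results by document id
const phrasings = await expandQuery(userQuery);
const seen = new Set<string>();
const merged: any[] = [];
for (const phrase of phrasings) {
  const { data } = await ai.run('@cf/baai/bge-base-en-v1.5', { text: [phrase] });
  const { matches } = await env.VECTORIZE.query(data[0], { topK: 5 });
  for (const m of matches) {
    if (!seen.has(m.id)) {
      seen.add(m.id);
      merged.push(m);
    }
  }
}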
Hybrid Search
Combine semantic and keyword search:
const results = await hybridSearch({
query: userQuery,
semantic: {
index: 'support-kb',
weight: 0.7
},
keyword: {
fields: ['title', 'rag_keywords'],
weight: 0.3
},
topK: 5
});
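If hybridSearch is built in-house, its core is score fusion. A minimal sketch, assuming both result sets have already been normalized to 0-1 scores:
// Weighted fusion of semantic and keyword scores, highest first
function fuseScores(
  semantic: Map<string, number>,
  keyword: Map<string, number>,
  wSem = 0.7,
  wKey = 0.3
): [string, number][] {
  const ids = new Set([...semantic.keys(), ...keyword.keys()]);
  return [...ids]
    .map(id => [
      id,
      wSem * (semantic.get(id) ?? 0) + wKey * (keyword.get(id) ?? 0)
    ] as [string, number])
    .sort((a, b) => b[1] - a[1]);
}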
Monitoring & Analytics
Track RAG Performance
// Log RAG query metrics
await analytics.log({
event: 'rag_query',
agent: 'maximus',
query_length: query.length,
results_count: results.length,
top_score: results[0]?.score,
model_tier: selectedTier,
latency_ms: endTime - startTime
});
Key Metrics
| Metric | Target | Alert Threshold |
|---|---|---|
| Query Latency | < 200ms | > 500ms |
| Top Result Score | > 0.75 | < 0.5 |
| Results Returned | 3-5 | 0 |
| User Satisfaction | > 4.0/5 | < 3.5/5 |
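An illustrative alert check that mirrors the thresholds above; the metric field names and the surrounding alerting pipeline are assumptions.
// Flag queries that cross the alert thresholds in the table above
function checkRagAlerts(m: { latencyMs: number; topScore: number; resultCount: number }): string[] {
  const alerts: string[] = [];
  if (m.latencyMs > 500) alerts.push('query latency over 500ms');
  if (m.topScore < 0.5) alerts.push('top result score below 0.5');
  if (m.resultCount === 0) alerts.push('no results returned');
  return alerts;
}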
Updating Knowledge Bases
Automatic Sync
Documentation changes trigger reindexing:
# .github/workflows/docs-index.yml
on:
push:
paths:
- 'documentation/**/*.md'
jobs:
reindex:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Index Documentation
run: |
npm run index-docs
env:
VECTORIZE_API_TOKEN: ${{ secrets.VECTORIZE_TOKEN }}
Manual Reindex
# Reindex specific knowledge base
npm run reindex -- --kb=support-kb
# Reindex all knowledge bases
npm run reindex -- --all
# Preview changes without indexing
npm run reindex -- --kb=support-kb --dry-run
Best Practices
Document Quality
Documents with complete RAG frontmatter (rag_type, rag_product, rag_module, rag_keywords, rag_audience) are significantly more discoverable. Missing metadata means your content may never surface in agent responses.
- Clear titles - Descriptive, searchable
- Proper frontmatter - Complete RAG metadata
- Logical structure - Headers, sections
- Actionable content - Steps, examples
- Regular updates - Keep current
Query Handling
- Fallback gracefully - Handle no-results
- Cite sources - Link to docs
- Know limits - Escalate when unsure
- Learn from feedback - Improve over time
Performance
- Right-size chunks - Balance detail vs noise
- Filter appropriately - Use metadata
- Cache common queries - Reduce latency
- Monitor costs - Track token usage