RAG Indexing
How documents are processed, chunked, embedded, and stored in Vectorize indexes for RAG retrieval.
RAG Architecture
The Olympus Cloud RAG system uses a hybrid architecture:
```
┌─────────────────────────────────────────────────────────────────────┐
│                          RAG Architecture                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────┐      ┌──────────────┐      ┌──────────────┐       │
│  │   Document   │      │   Chunking   │      │  Embedding   │       │
│  │  Ingestion   │─────▶│   Pipeline   │─────▶│   Service    │       │
│  └──────────────┘      └──────────────┘      └──────────────┘       │
│         │                     │                     │               │
│         ▼                     ▼                     ▼               │
│  ┌──────────────────────────────────────────────────────────┐       │
│  │                  Cloudflare Vectorize                    │       │
│  │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐          │       │
│  │  │menu-rag│  │support │  │sales-kb│  │ops-kb  │          │       │
│  │  └────────┘  └────────┘  └────────┘  └────────┘          │       │
│  └──────────────────────────────────────────────────────────┘       │
│                               │                                     │
│                               ▼                                     │
│  ┌──────────────────────────────────────────────────────────┐       │
│  │                      Query Router                        │       │
│  │     score_threshold → top_k → re-ranking → response      │       │
│  └──────────────────────────────────────────────────────────┘       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
Vectorize vs Vertex AI
| Feature | Cloudflare Vectorize | GCP Vertex AI |
|---|---|---|
| Latency | Under 20ms (edge) | 50-100ms |
| Cost | Included with Workers | Per query |
| Scaling | Automatic | Manual configuration |
| Max Dimensions | 1536 | 3072 |
| Max Vectors | 5M per index | Unlimited |
| Best For | Real-time queries | Large-scale analytics |
Recommendation: Use Vectorize for real-time agent queries, Vertex AI for batch processing and training.
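This recommendation can be expressed as a small dispatch rule. The sketch below is illustrative only: the helper name and use-case labels are assumptions, not part of the Olympus Cloud codebase.

```python
def select_vector_backend(use_case: str) -> str:
    """Route a workload to Vectorize (edge, real-time) or Vertex AI (batch)."""
    real_time = {"agent-query", "chat", "voice"}
    batch = {"batch-indexing", "training", "analytics"}
    if use_case in real_time:
        return "vectorize"
    if use_case in batch:
        return "vertex-ai"
    raise ValueError(f"unknown use case: {use_case}")
```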
Vectorize Index Management
Available Indexes
| Index Name | Content | Agents | Update Frequency |
|---|---|---|---|
| menu-rag | Menu items, ingredients, allergens | Menu Assistant, Voice AI | Real-time |
| support-rag | FAQs, troubleshooting, docs | Support Agent | Daily |
| sales-rag | Pricing, ROI, competitors | Minerva | Weekly |
| ops-rag | Runbooks, monitoring, alerts | Maximus | On change |
| training-rag | Internal docs, procedures | All internal agents | Weekly |
| policy-rag | Business rules, compliance | Scheduling, Analytics | On change |
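The agent-to-index mapping above can be sketched as a routing table. The dictionary mirrors the table; the helper function and its fallback to support-rag are assumptions for illustration, not the real query router.

```python
# Illustrative routing table mirroring the index table above.
AGENT_INDEXES: dict[str, list[str]] = {
    "Menu Assistant": ["menu-rag"],
    "Voice AI": ["menu-rag"],
    "Support Agent": ["support-rag"],
    "Minerva": ["sales-rag"],
    "Maximus": ["ops-rag"],
    "Scheduling": ["policy-rag"],
    "Analytics": ["policy-rag"],
}

def indexes_for(agent: str) -> list[str]:
    """Return the Vectorize indexes an agent should query (assumed fallback: support-rag)."""
    return AGENT_INDEXES.get(agent, ["support-rag"])
```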
Index Configuration
```typescript
// Create new index
const index = await vectorize.createIndex({
  name: 'support-rag',
  dimensions: 768, // BGE-base dimensions
  metric: 'cosine',
  metadata_fields: {
    doc_type: 'String',
    category: 'String',
    tenant_id: 'String',
    updated_at: 'Number',
  }
});
```
Metadata Schema
| Field | Type | Purpose |
|---|---|---|
| doc_type | String | Content classification (faq, guide, runbook) |
| category | String | Content category (orders, payments, scheduling) |
| tenant_id | String | Tenant isolation for multi-tenant queries |
| updated_at | Number | Timestamp for freshness filtering |
| language | String | Content language (en, es, fr) |
| audience | String | Target audience (staff, manager, customer) |
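A minimal sketch of assembling this metadata payload for an upserted vector. Field names follow the schema above; the allowed doc_type set, defaults, and millisecond timestamp are assumptions.

```python
import time

# Assumed classification set, taken from the doc_type description above.
ALLOWED_DOC_TYPES = {"faq", "guide", "runbook"}

def build_metadata(doc_type: str, category: str, tenant_id: str,
                   language: str = "en", audience: str = "staff") -> dict:
    """Assemble the per-vector metadata payload defined by the schema above."""
    if doc_type not in ALLOWED_DOC_TYPES:
        raise ValueError(f"unknown doc_type: {doc_type}")
    return {
        "doc_type": doc_type,
        "category": category,
        "tenant_id": tenant_id,
        "updated_at": int(time.time() * 1000),  # timestamp for freshness filtering
        "language": language,
        "audience": audience,
    }
```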
Document Ingestion Pipeline
Supported Formats
| Format | Processing | Chunking Strategy |
|---|---|---|
| Markdown | Native | Header-based semantic |
| PDF | PyMuPDF extraction | Page + paragraph |
| HTML | BeautifulSoup | Section-based |
| Video | Whisper transcription | Time-segment |
| FAQ JSON | Direct import | Q&A pairs |
Ingestion Workflow
```yaml
# GitHub Actions workflow for doc ingestion
name: RAG Document Ingestion

on:
  push:
    paths:
      - 'documentation/**/*.md'
      - 'docs/**/*.md'

jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Process documents
        run: |
          python scripts/rag/process_docs.py \
            --source documentation/ \
            --index support-rag \
            --chunk-strategy semantic
      - name: Upload to Vectorize
        run: |
          python scripts/rag/upload_vectors.py \
            --index support-rag \
            --vectors output/vectors.json
```
Chunking Strategies
Strategy Comparison
| Strategy | Best For | Chunk Size | Overlap |
|---|---|---|---|
| Semantic | Documentation | Variable | 50 tokens |
| Fixed | Code, logs | 500 tokens | 100 tokens |
| Paragraph | Articles | Variable | 1 sentence |
| Q&A | FAQs | Question + Answer | None |
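For contrast with the semantic strategy shown next, the fixed strategy from the table can be sketched as follows. Whitespace-split words stand in for model tokens here, which is only an approximation; a real pipeline would use the embedding model's tokenizer.

```python
def fixed_chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Fixed-size chunking with overlap (table defaults: 500 tokens, 100 overlap)."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    tokens = text.split()  # crude token proxy
    chunks = []
    step = size - overlap  # each chunk starts `step` tokens after the last
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # final window reached the end of the text
    return chunks
```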
Semantic Chunking (Recommended)
```python
from dataclasses import dataclass

@dataclass
class Chunk:
    content: str
    headers: list[str]
    metadata: dict

def semantic_chunk(markdown_content: str) -> list[Chunk]:
    """Split by headers, respecting document structure."""
    chunks = []
    current_chunk = []
    current_headers = []
    for line in markdown_content.split('\n'):
        if line.startswith('#'):
            # Save previous chunk
            if current_chunk:
                chunks.append(Chunk(
                    content='\n'.join(current_chunk),
                    headers=current_headers.copy(),
                    metadata={'type': 'section'}
                ))
            # Start new chunk with header context
            level = len(line.split()[0])  # number of '#' marks = header depth
            current_headers = current_headers[:level - 1] + [line]
            current_chunk = [line]
        else:
            current_chunk.append(line)
    # Flush the final chunk
    if current_chunk:
        chunks.append(Chunk(
            content='\n'.join(current_chunk),
            headers=current_headers.copy(),
            metadata={'type': 'section'}
        ))
    return chunks
```
Content-Type Specific Strategies
| Content Type | Strategy | Rationale |
|---|---|---|
| API docs | Endpoint-based | One chunk per endpoint |
| Runbooks | Step-based | One chunk per procedure |
| FAQs | Q&A pairs | Question + answer together |
| Tutorials | Section-based | Logical learning units |
| Reference | Term-based | Definition + examples |
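The "Q&A pairs" strategy from both tables can be sketched as a direct import from FAQ JSON: each question plus its answer becomes one chunk. The input shape (`{"faqs": [{"question": ..., "answer": ...}]}`) and the id format are assumptions for illustration.

```python
import json

def chunk_faq_json(raw: str) -> list[dict]:
    """Turn an FAQ JSON document into one chunk per question/answer pair."""
    data = json.loads(raw)
    chunks = []
    for i, item in enumerate(data["faqs"]):
        chunks.append({
            "id": f"faq-{i}",  # hypothetical id scheme
            "content": f"Q: {item['question']}\nA: {item['answer']}",
            "metadata": {"doc_type": "faq"},
        })
    return chunks
```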
Embedding Models
Model Comparison
| Model | Provider | Dimensions | Speed | Quality | Cost |
|---|---|---|---|---|---|
| BGE-base-en-v1.5 | Workers AI | 768 | Fast | Good | FREE |
| BGE-small-en-v1.5 | Workers AI | 384 | Fastest | Acceptable | FREE |
| text-embedding-004 | Vertex AI | 768 | Medium | Excellent | $0.025/1K |
| text-embedding-3-large | OpenAI | 3072 | Medium | Excellent | $0.13/1K |
Recommendation by Use Case
| Use Case | Recommended Model | Rationale |
|---|---|---|
| Real-time chat | BGE-small | Lowest latency |
| Support queries | BGE-base | Balance of speed/quality |
| Sales/complex | text-embedding-004 | Highest accuracy |
| Batch indexing | text-embedding-004 | Quality over speed |
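For batch indexing decisions, a back-of-envelope cost estimate helps. The rates below come from the model comparison table above; treating them as per-1K-token prices is an assumption, as is the helper itself.

```python
# Per-1K rates from the model comparison table (unit assumed to be tokens).
PRICE_PER_1K = {
    "bge-base-en-v1.5": 0.0,        # Workers AI, free
    "bge-small-en-v1.5": 0.0,       # Workers AI, free
    "text-embedding-004": 0.025,    # Vertex AI
    "text-embedding-3-large": 0.13, # OpenAI
}

def embedding_cost(model: str, tokens: int) -> float:
    """Estimated embedding cost in dollars for a given token count."""
    return (tokens / 1000) * PRICE_PER_1K[model]
```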
Embedding Code Example
```typescript
// Workers AI embedding
const embeddings = await ai.run('@cf/baai/bge-base-en-v1.5', {
  text: [chunk.content]
});

// Insert into Vectorize
await index.upsert([{
  id: chunk.id,
  values: embeddings.data[0],
  metadata: {
    doc_type: chunk.type,
    category: chunk.category,
    tenant_id: tenantId,
    updated_at: Date.now()
  }
}]);
```
Related Pages
- Querying - Query patterns and retrieval strategies
- Maintenance - Index maintenance and monitoring
- Overview - RAG Knowledge Base overview