
RAG Indexing

How documents are processed, chunked, embedded, and stored in Vectorize indexes for RAG retrieval.

RAG Architecture

The Olympus Cloud RAG system uses a hybrid architecture:

┌─────────────────────────────────────────────────────────────────────────┐
│ RAG Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Document │ │ Chunking │ │ Embedding │ │
│ │ Ingestion │───▶│ Pipeline │───▶│ Service │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Cloudflare Vectorize │ │
│ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │ │
│ │ │menu-rag│ │support │ │sales-kb│ │ops-kb │ │ │
│ │ └────────┘ └────────┘ └────────┘ └────────┘ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Query Router │ │
│ │ score_threshold → top_k → re-ranking → response │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘

Vectorize vs Vertex AI

| Feature | Cloudflare Vectorize | GCP Vertex AI |
|---|---|---|
| Latency | Under 20ms (edge) | 50-100ms |
| Cost | Included with Workers | Per query |
| Scaling | Automatic | Manual configuration |
| Max Dimensions | 1536 | 3072 |
| Max Vectors | 5M per index | Unlimited |
| Best For | Real-time queries | Large-scale analytics |

Recommendation: Use Vectorize for real-time agent queries, Vertex AI for batch processing and training.


Vectorize Index Management

Available Indexes

| Index Name | Content | Agents | Update Frequency |
|---|---|---|---|
| menu-rag | Menu items, ingredients, allergens | Menu Assistant, Voice AI | Real-time |
| support-rag | FAQs, troubleshooting, docs | Support Agent | Daily |
| sales-rag | Pricing, ROI, competitors | Minerva | Weekly |
| ops-rag | Runbooks, monitoring, alerts | Maximus | On change |
| training-rag | Internal docs, procedures | All internal agents | Weekly |
| policy-rag | Business rules, compliance | Scheduling, Analytics | On change |

Index Configuration

```javascript
// Create a new index
const index = await vectorize.createIndex({
  name: 'support-rag',
  dimensions: 768, // BGE-base dimensions
  metric: 'cosine',
  metadata_fields: {
    doc_type: 'String',
    category: 'String',
    tenant_id: 'String',
    updated_at: 'Number',
  },
});
```

Metadata Schema

| Field | Type | Purpose |
|---|---|---|
| doc_type | String | Content classification (faq, guide, runbook) |
| category | String | Content category (orders, payments, scheduling) |
| tenant_id | String | Tenant isolation for multi-tenant queries |
| updated_at | Number | Timestamp for freshness filtering |
| language | String | Content language (en, es, fr) |
| audience | String | Target audience (staff, manager, customer) |

Document Ingestion Pipeline

Supported Formats

| Format | Processing | Chunking Strategy |
|---|---|---|
| Markdown | Native | Header-based semantic |
| PDF | PyMuPDF extraction | Page + paragraph |
| HTML | BeautifulSoup | Section-based |
| Video | Whisper transcription | Time-segment |
| FAQ JSON | Direct import | Q&A pairs |
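
In the pipeline, the format-to-strategy mapping above amounts to a dispatch on file type. A sketch under assumed names (the strategy strings and extension list are illustrative, not the pipeline's actual identifiers):

```python
# Dispatch table mirroring the format/strategy mapping above.
CHUNKERS = {
    ".md": "header_semantic",
    ".pdf": "page_paragraph",
    ".html": "section",
    ".mp4": "time_segment",
    ".json": "qa_pairs",
}

def select_chunker(filename: str) -> str:
    """Return the chunking strategy name for a file, defaulting to fixed-size
    chunking for formats with no structural cues."""
    for suffix, strategy in CHUNKERS.items():
        if filename.lower().endswith(suffix):
            return strategy
    return "fixed"
```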

Ingestion Workflow

```yaml
# GitHub Actions workflow for doc ingestion
name: RAG Document Ingestion
on:
  push:
    paths:
      - 'documentation/**/*.md'
      - 'docs/**/*.md'

jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Process documents
        run: |
          python scripts/rag/process_docs.py \
            --source documentation/ \
            --index support-rag \
            --chunk-strategy semantic

      - name: Upload to Vectorize
        run: |
          python scripts/rag/upload_vectors.py \
            --index support-rag \
            --vectors output/vectors.json
```
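
Because this workflow re-runs on every push to the doc paths, vector IDs need to be deterministic so a re-ingested chunk upserts over its stale predecessor rather than accumulating duplicates. One way to get that (a sketch; the actual ID scheme used by `upload_vectors.py` is not specified here):

```python
import hashlib

def chunk_id(source_path: str, chunk_index: int) -> str:
    """Deterministic vector ID: the same document position always maps to
    the same ID, so re-running ingestion overwrites in place."""
    digest = hashlib.sha256(f"{source_path}#{chunk_index}".encode()).hexdigest()
    return digest[:32]
```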

Chunking Strategies

Strategy Comparison

| Strategy | Best For | Chunk Size | Overlap |
|---|---|---|---|
| Semantic | Documentation | Variable | 50 tokens |
| Fixed | Code, logs | 500 tokens | 100 tokens |
| Paragraph | Articles | Variable | 1 sentence |
| Q&A | FAQs | Question + Answer | None |
Semantic Chunking Implementation

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    content: str
    headers: list[str]
    metadata: dict

def semantic_chunk(markdown_content: str) -> list[Chunk]:
    """Split by headers, respecting document structure."""
    chunks = []
    current_chunk = []
    current_headers = []

    for line in markdown_content.split('\n'):
        if line.startswith('#'):
            # Save the previous chunk before starting a new section
            if current_chunk:
                chunks.append(Chunk(
                    content='\n'.join(current_chunk),
                    headers=current_headers.copy(),
                    metadata={'type': 'section'}
                ))
            # Start a new chunk, tracking the header hierarchy
            level = len(line.split()[0])  # number of leading '#' characters
            current_headers = current_headers[:level - 1] + [line]
            current_chunk = [line]
        else:
            current_chunk.append(line)

    # Don't drop the final section
    if current_chunk:
        chunks.append(Chunk(
            content='\n'.join(current_chunk),
            headers=current_headers.copy(),
            metadata={'type': 'section'}
        ))

    return chunks
```
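
The fixed strategy from the comparison table (500-token chunks, 100-token overlap) can be sketched the same way. This version approximates model tokens with whitespace-split words, which is a simplification; a production pipeline would use the embedding model's tokenizer:

```python
def fixed_chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Fixed-size chunking with overlap, using whitespace tokens as a
    stand-in for model tokens. Defaults match the strategy table above."""
    tokens = text.split()
    if not tokens:
        return []
    step = size - overlap  # each window starts `overlap` tokens before the previous one ends
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break
    return chunks
```

The overlap ensures a sentence falling on a chunk boundary is fully contained in at least one chunk, at the cost of ~25% more vectors at the default settings.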

Content-Type Specific Strategies

| Content Type | Strategy | Rationale |
|---|---|---|
| API docs | Endpoint-based | One chunk per endpoint |
| Runbooks | Step-based | One chunk per procedure |
| FAQs | Q&A pairs | Question + answer together |
| Tutorials | Section-based | Logical learning units |
| Reference | Term-based | Definition + examples |
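
For the FAQ row, "Q&A pairs" means each question is embedded together with its answer so a query matching the question retrieves the answer in the same chunk. A sketch, assuming the FAQ JSON export is a list of records with `question`, `answer`, and optional `category` keys (the record shape is an assumption, not a documented schema):

```python
import json

def qa_chunks(faq_json: str) -> list[dict]:
    """Turn a FAQ JSON export into one chunk per Q&A pair, keeping the
    question and answer together in a single embedded unit."""
    chunks = []
    for i, item in enumerate(json.loads(faq_json)):
        chunks.append({
            "id": f"faq-{i}",
            "content": f"Q: {item['question']}\nA: {item['answer']}",
            "metadata": {"doc_type": "faq", "category": item.get("category", "general")},
        })
    return chunks
```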

Embedding Models

Model Comparison

| Model | Provider | Dimensions | Speed | Quality | Cost |
|---|---|---|---|---|---|
| BGE-base-en-v1.5 | Workers AI | 768 | Fast | Good | FREE |
| BGE-small-en-v1.5 | Workers AI | 384 | Fastest | Acceptable | FREE |
| text-embedding-004 | Vertex AI | 768 | Medium | Excellent | $0.025/1K |
| text-embedding-3-large | OpenAI | 3072 | Medium | Excellent | $0.13/1K |

Recommendation by Use Case

| Use Case | Recommended Model | Rationale |
|---|---|---|
| Real-time chat | BGE-small | Lowest latency |
| Support queries | BGE-base | Balance of speed/quality |
| Sales/complex | text-embedding-004 | Highest accuracy |
| Batch indexing | text-embedding-004 | Quality over speed |
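
The per-1K prices in the model table make cost planning simple arithmetic. A rough estimator (the billed unit differs by provider, tokens vs characters, so treat results as planning numbers, not invoices):

```python
# Per-1K-unit prices copied from the model comparison table.
PRICE_PER_1K = {
    "bge-base-en-v1.5": 0.0,
    "bge-small-en-v1.5": 0.0,
    "text-embedding-004": 0.025,
    "text-embedding-3-large": 0.13,
}

def embedding_cost(model: str, units: int) -> float:
    """Estimated embedding cost in dollars for `units` tokens/characters."""
    return PRICE_PER_1K[model] / 1000 * units
```

For example, embedding a 1M-token corpus with text-embedding-004 comes to about $25, versus $0 on Workers AI, which is why batch re-indexing jobs are the place to weigh quality against cost.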

Embedding Code Example

```javascript
// Workers AI embedding
const embeddings = await ai.run('@cf/baai/bge-base-en-v1.5', {
  text: [chunk.content]
});

// Insert into Vectorize
await index.upsert([{
  id: chunk.id,
  values: embeddings.data[0],
  metadata: {
    doc_type: chunk.type,
    category: chunk.category,
    tenant_id: tenantId,
    updated_at: Date.now()
  }
}]);
```

  • Querying - Query patterns and retrieval strategies
  • Maintenance - Index maintenance and monitoring
  • Overview - RAG Knowledge Base overview