Tenant-Specific RAG Platform
Build AI-powered knowledge bases scoped to individual tenants, enabling each organization to upload, index, and query their own documents through the Knowledge Hub.
Overview
The Tenant-Specific RAG Platform provides every tenant in Olympus Cloud with an isolated, AI-powered knowledge base. Unlike the platform-wide RAG system that indexes shared documentation (support articles, release notes, user guides), the tenant RAG platform lets each organization maintain its own private knowledge corpus.
Platform-Wide RAG vs. Tenant RAG
| Feature | Platform-Wide RAG | Tenant RAG |
|---|---|---|
| Scope | Shared across all tenants | Private per tenant |
| Content | Olympus docs, guides, FAQs | Tenant-uploaded documents |
| Index naming | support-kb, sales-kb, docs-embeddings | minerva-knowledge-base-{tenant_id} |
| Management | Platform team maintains | Tenant admins manage |
| Use case | Agent context (Maximus, Minerva) | Tenant-specific Q&A, support, operations |
| Configuration | See RAG Configuration | This document |
Key Capabilities
- Multi-format document ingestion -- PDF, Markdown, video transcripts, support tickets, FAQs, release notes, articles, and troubleshooting guides
- Hybrid search -- Combines semantic vector search with BM25-style keyword matching via Reciprocal Rank Fusion (RRF)
- Per-tenant isolation -- Each tenant gets a dedicated Vectorize index with tenant-scoped queries
- Answer generation with citations -- LLM-powered answers that cite source documents with confidence scoring
- Usage metering -- Track document counts, chunk counts, word counts, and query volume per tenant
Knowledge Hub Architecture
The Knowledge Hub processes documents through a multi-stage pipeline before they become queryable:
Knowledge Hub Pipeline
+------------------------------------------------------------------+
| |
| Upload/API Extract Text Chunk Document |
| +---------+ +-----------+ +-------------+ |
| | PDF | | pypdf | | Semantic | |
| | Markdown| ---> | Markdown | ---> | Fixed-size | |
| | Ticket | | Cleanup | | Paragraph | |
| | Video | | Normalize | | (configurable)| |
| +---------+ +-----------+ +------+------+ |
| | |
| v |
| Query Pipeline Index Embed |
| +-----------+ +-----------+ +----------+ |
| | Hybrid | | Vectorize | <--- | BGE Base | |
| | Search | <--- | + Keyword | | 768-dim | |
| | + Answer | | Index | | Workers | |
| | Generation| +-----------+ | AI | |
| +-----------+ +----------+ |
| |
+------------------------------------------------------------------+
Component Responsibilities
| Component | Class | Source File |
|---|---|---|
| Ingestion Pipeline | DocumentIngester | backend/python/app/services/knowledge_base/service.py |
| Chunking Strategy | ChunkingStrategy | backend/python/app/services/knowledge_base/service.py |
| Hybrid Search | HybridSearchEngine | backend/python/app/services/knowledge_base/service.py |
| Answer Generator | AnswerGenerator | backend/python/app/services/knowledge_base/service.py |
| API Routes | router | backend/python/app/api/knowledge_base_routes.py |
| Vectorize Client | VectorizeClient | backend/python/app/clients/vectorize_client.py |
| Minerva KB Service | MinervaKnowledgeBaseService | backend/python/app/services/minerva/knowledge_base.py |
Document Ingestion
Supported Document Types
The DocumentType enum defines the content types that the Knowledge Hub can ingest:
| Type | Enum Value | Processing Method | Notes |
|---|---|---|---|
| Markdown | markdown | Frontmatter extraction, content cleanup | Default type; strips YAML frontmatter |
| PDF | pdf | Text extraction via pypdf | Multi-page support with per-page extraction |
| Video Transcript | video_transcript | Timestamp removal, speaker label cleanup | Normalizes raw transcription output |
| Support Ticket | support_ticket | Issue/resolution formatting | Formats resolved tickets as knowledge articles |
| Release Notes | release_notes | Version tagging | Automatically titles with version number |
| FAQ | faq | Direct processing | Short-answer optimized |
| Troubleshooting | troubleshooting | Direct processing | Step-by-step resolution content |
| Article | article | Direct processing | General knowledge content |
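The enum values in the table above can be mirrored in a minimal sketch (the real DocumentType lives in backend/python/app/services/knowledge_base/service.py and may carry additional members or helpers):

```python
from enum import Enum

class DocumentType(str, Enum):
    """Content types the Knowledge Hub can ingest (sketch of the enum)."""
    MARKDOWN = "markdown"
    PDF = "pdf"
    VIDEO_TRANSCRIPT = "video_transcript"
    SUPPORT_TICKET = "support_ticket"
    RELEASE_NOTES = "release_notes"
    FAQ = "faq"
    TROUBLESHOOTING = "troubleshooting"
    ARTICLE = "article"
```

Because the enum subclasses str, the raw string values round-trip cleanly, which is what the API's doc_type request field relies on.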
Ingestion Pipeline Details
Each document goes through the following stages:
1. Content Extraction
The DocumentIngester class handles format-specific text extraction:
# Markdown: strips frontmatter, normalizes whitespace
document = await ingester.ingest_markdown(
content="# Getting Started\n...",
tenant_id="tenant-123",
title="Getting Started Guide",
category="onboarding",
tags=["setup", "quickstart"],
)
# PDF: extracts text from all pages using pypdf
document = await ingester.ingest_pdf(
pdf_content=pdf_bytes,
tenant_id="tenant-123",
title="Operations Manual",
)
# Support Ticket: formats issue + resolution as knowledge article
document = await ingester.ingest_support_ticket(
ticket_content={
"title": "POS Not Printing Receipts",
"issue": "Receipts stopped printing after firmware update...",
"resolution": "Reset printer spooler and re-pair Bluetooth...",
"category": "hardware",
"tags": ["printer", "pos", "bluetooth"],
},
tenant_id="tenant-123",
)
2. Content Hashing
Each document receives a SHA-256 content hash (first 16 hex characters) for change detection, enabling efficient re-indexing when content is updated.
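The hashing step is small enough to sketch directly (an illustration; the helper name is ours, not necessarily the one in DocumentIngester):

```python
import hashlib

def content_hash(content: str) -> str:
    """Return the first 16 hex characters of the SHA-256 digest.

    Used for change detection: if the hash of incoming content matches
    the stored hash, re-chunking and re-embedding can be skipped.
    """
    return hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]

# Identical content always produces the same hash; any edit changes it.
unchanged = content_hash("# Getting Started\n...") == content_hash("# Getting Started\n...")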
3. Chunking
Documents are split into chunks using one of three strategies:
| Strategy | Description | Best For |
|---|---|---|
| Semantic (default) | Splits on markdown headers and paragraphs, respects section boundaries | Structured documents with headers |
| Fixed | Fixed character window with overlap, avoids mid-word splits | Unstructured text, transcripts |
| Paragraph | Splits on double newlines | Simple documents, articles |
Chunking parameters:
| Parameter | Default | Range | Purpose |
|---|---|---|---|
| chunk_size | 512 | -- | Target characters per chunk |
| chunk_overlap | 50 | -- | Overlap between consecutive chunks |
| min_chunk_size | 100 | -- | Discard chunks smaller than this |
| max_chunk_size | 1024 | -- | Split sections exceeding this |
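The fixed strategy can be sketched as a sliding character window with overlap (an illustration using the defaults above; the real ChunkingStrategy additionally avoids mid-word splits and implements the semantic and paragraph strategies):

```python
def fixed_chunks(text: str, chunk_size: int = 512, overlap: int = 50,
                 min_chunk_size: int = 100) -> list[str]:
    """Split text into overlapping fixed-size windows.

    Chunks shorter than min_chunk_size (typically the trailing
    remainder) are discarded, mirroring the defaults in the table.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if len(piece) >= min_chunk_size:
            chunks.append(piece)
    return chunks
```

Each chunk shares its first `overlap` characters with the tail of the previous chunk, so a sentence cut by a window boundary still appears whole in at least one chunk.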
4. Embedding
Chunks are embedded using Workers AI's BGE model (@cf/baai/bge-base-en-v1.5) producing 768-dimensional vectors. The AI Gateway client handles batch embedding for efficiency.
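Batching can be as simple as grouping chunk texts before each Workers AI call; a sketch (the batch size of 100 is illustrative, not a documented limit of the model endpoint):

```python
from typing import Iterator

def batched(texts: list[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield successive batches of chunk texts for embedding calls.

    Grouping chunks reduces round trips to the @cf/baai/bge-base-en-v1.5
    endpoint; each call returns one 768-dimensional vector per input.
    """
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]
```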
5. Indexing
Embedded chunks are upserted into Cloudflare Vectorize with metadata:
# Metadata stored with each vector
{
"document_id": "doc-uuid",
"tenant_id": "tenant-123",
"doc_type": "markdown",
"title": "Getting Started Guide",
"section_title": "Installation",
"category": "onboarding",
"tags": ["setup", "quickstart"],
}
A parallel keyword index is built in-memory for BM25-style retrieval, with stopword removal and term-frequency scoring.
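The in-memory keyword index can be sketched as a term-frequency posting list (illustrative only: the stopword set here is abbreviated, and the service's actual scoring is BM25-style rather than raw term frequency):

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "how", "do", "i"}

def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]

def build_index(chunks: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each term to a posting list of {chunk_id: term_frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for chunk_id, text in chunks.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][chunk_id] = tf
    return index

def keyword_search(index: dict[str, dict[str, int]], query: str,
                   top_k: int = 5) -> list[tuple[str, int]]:
    """Score chunks by summed term frequency across query terms."""
    scores: Counter = Counter()
    for term in tokenize(query):
        for chunk_id, tf in index.get(term, {}).items():
            scores[chunk_id] += tf
    return scores.most_common(top_k)
```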
External Data Connectors
The platform supports ingesting data from external systems through the Universal Data Ingestion Engine (Epic #1400) and the Knowledge Hub API. External data reaches the knowledge base through three patterns: ETL pipelines, CRM integrations, and push-based webhooks.
Supported Connector Types
| Connector | Pattern | Data Flow |
|---|---|---|
| PostgreSQL | ETL pipeline | Extract rows, transform to documents, ingest via KB API |
| MySQL | ETL pipeline | Extract rows, transform to documents, ingest via KB API |
| Salesforce | CRM integration | Sync CRM records as knowledge articles |
| HubSpot | CRM integration | Sync contacts, deals, and knowledge content |
| Webhook | Push-based | External systems push documents to the ingestion endpoint |
Webhook Integration
External systems can push documents directly to the Knowledge Hub API:
# Webhook-style document ingestion
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/ingest \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"title": "Product Update Q1 2026",
"content": "## New Features\n\n- Inventory auto-reorder...",
"doc_type": "article",
"category": "product-updates",
"tags": ["product", "q1-2026"],
"chunking_strategy": "semantic"
}'
Bulk Import Pattern
For large-scale data migration from external databases:
# Bulk ingest multiple documents
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/ingest/bulk \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"documents": [
{
"title": "SOP: Opening Procedures",
"content": "## Opening Checklist\n...",
"doc_type": "article",
"category": "operations",
"tags": ["sop", "opening"]
},
{
"title": "SOP: Closing Procedures",
"content": "## Closing Checklist\n...",
"doc_type": "article",
"category": "operations",
"tags": ["sop", "closing"]
}
]
}'
The bulk endpoint returns aggregate success/failure counts along with a per-document status:
{
"total": 2,
"success": 2,
"failed": 0,
"results": [
{"document_id": "abc-123", "status": "success", "title": "SOP: Opening Procedures"},
{"document_id": "def-456", "status": "success", "title": "SOP: Closing Procedures"}
]
}
Hybrid Search
The HybridSearchEngine combines two retrieval strategies and fuses their results using Reciprocal Rank Fusion (RRF).
Search Methods
| Method | When Used | Strengths |
|---|---|---|
| Vector | Semantic similarity via Vectorize | Finds conceptually similar content even with different wording |
| Keyword | BM25-style term matching | Precise matches for specific terms, product names, codes |
| Hybrid (default) | Both vector + keyword, fused via RRF | Best overall recall and precision |
How Hybrid Search Works
User Query: "How do I reset the printer?"
|
+---> Vector Search (Vectorize)
| Query embedding --> cosine similarity
| Returns: top_k * 2 results with scores
|
+---> Keyword Search (BM25-style)
| Tokenize, remove stopwords, TF-IDF scoring
| Returns: top_k * 2 results with scores
|
+---> Reciprocal Rank Fusion (k=60)
Merge & deduplicate results
RRF score = sum(1 / (k + rank_i)) for each method
Results appearing in both methods get "hybrid" tag
Return top_k final results
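The fusion step above can be sketched directly from the formula, with k=60 and 1-based ranks (illustrative helper; the actual HybridSearchEngine carries full result objects rather than IDs):

```python
def rrf_fuse(vector_ids: list[str], keyword_ids: list[str],
             k: int = 60, top_k: int = 5) -> list[tuple[str, float, str]]:
    """Merge two ranked result lists with Reciprocal Rank Fusion.

    Each result contributes 1 / (k + rank) per list it appears in;
    results found by both methods are tagged "hybrid".
    """
    scores: dict[str, float] = {}
    seen: dict[str, set[str]] = {}
    for method, ids in (("vector", vector_ids), ("keyword", keyword_ids)):
        for rank, chunk_id in enumerate(ids, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
            seen.setdefault(chunk_id, set()).add(method)
    fused = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(cid, score, "hybrid" if len(seen[cid]) == 2 else seen[cid].pop())
            for cid, score in fused]
```

Note that RRF scores are small by construction (roughly 1/60 per appearance), which is why the search response examples later in this document show scores on the order of 0.03 rather than cosine similarities near 1.0.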
Search Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Natural language search query |
| tenant_id | string | required | Scopes search to tenant's documents |
| top_k | integer | 5 | Maximum results (1-20) |
| category | string | null | Filter by document category |
| doc_types | array | null | Filter by DocumentType values |
| search_method | string | "hybrid" | One of: vector, keyword, hybrid |
Search Example
from app.services.knowledge_base.service import KnowledgeBaseService, DocumentType
service = await get_knowledge_base_service()
results = await service.search(
query="How do I configure inventory alerts?",
tenant_id="tenant-123",
top_k=5,
category="operations",
doc_types=[DocumentType.ARTICLE, DocumentType.TROUBLESHOOTING],
search_method="hybrid",
)
for result in results:
print(f"[{result.search_method}] {result.document_title}")
print(f" Section: {result.section_title}")
print(f" Score: {result.score:.4f}")
print(f" Content: {result.content[:200]}...")
Per-Tenant Isolation
Tenant isolation is enforced at multiple levels to ensure data privacy and prevent cross-tenant data leakage.
Isolation Architecture
Tenant A Tenant B
+------------------+ +------------------+
| Documents | | Documents |
| Chunks | | Chunks |
| Keyword Index | | Keyword Index |
+--------+---------+ +--------+---------+
| |
v v
+------------------+ +------------------+
| Vectorize Index | | Vectorize Index |
| minerva-kb- | | minerva-kb- |
| tenant-a-uuid | | tenant-b-uuid |
+------------------+ +------------------+
Isolation Mechanisms
| Layer | Mechanism | Implementation |
|---|---|---|
| API Layer | require_tenant_auth() dependency | All Knowledge Hub routes require tenant authentication |
| Data Model | tenant_id field on Document and DocumentChunk | Every record is tagged with the owning tenant |
| Vector Store | Tenant-scoped index names | minerva-knowledge-base-{tenant_id} naming convention |
| Query Filtering | Tenant ID filter on all searches | Both vector and keyword searches filter by tenant_id |
| Vectorize Metadata | tenant_id in vector metadata | Stored alongside each embedding for secondary filtering |
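Two of these layers are simple enough to sketch: the naming convention and the defense-in-depth metadata filter (helper names are illustrative, not the actual client API):

```python
INDEX_PREFIX = "minerva-knowledge-base"

def tenant_index_name(tenant_id: str) -> str:
    """One Vectorize index per tenant, named by convention."""
    return f"{INDEX_PREFIX}-{tenant_id}"

def filter_matches(matches: list[dict], tenant_id: str) -> list[dict]:
    """Secondary guard: even inside a tenant-scoped index, drop any
    vector whose metadata tenant_id does not match the caller."""
    return [m for m in matches
            if m.get("metadata", {}).get("tenant_id") == tenant_id]
```

The metadata filter should never remove anything in practice, since each tenant queries only its own index; it exists so that a misrouted query still cannot leak another tenant's chunks.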
Index Provisioning
Each tenant's Vectorize index is created on demand using the ensure_tenant_index method:
from app.clients.vectorize_client import VectorizeClient
client = VectorizeClient()
# Creates index if it doesn't exist, returns existing if it does
index = await client.ensure_tenant_index(
tenant_id="550e8400-e29b-41d4-a716-446655449100",
index_prefix="minerva-knowledge-base", # default
dimensions=768, # matches the 768-dim BGE base embeddings
)
# Index name: minerva-knowledge-base-550e8400-e29b-41d4-a716-446655449100
Index Lifecycle
| Operation | Endpoint | When |
|---|---|---|
| Provision | POST /minerva/knowledge-base/{tenant_id}/provision | Tenant onboarding / Minerva addon enabled |
| Status check | GET /minerva/knowledge-base/{tenant_id}/status | Health monitoring |
| Rebuild | POST /minerva/knowledge-base/{tenant_id}/rebuild | Full re-index with fresh data |
| Delete | DELETE /minerva/knowledge-base/{tenant_id} | Tenant offboarding / addon disabled |
Usage Metering and Billing
The Knowledge Hub tracks usage metrics per tenant for billing and capacity planning.
Tracked Metrics
| Metric | Source | Description |
|---|---|---|
| total_documents | StatsResponse | Number of documents ingested |
| total_chunks | StatsResponse | Total chunks across all documents |
| total_words | StatsResponse | Aggregate word count |
| documents_by_type | StatsResponse | Breakdown by DocumentType |
| documents_by_category | StatsResponse | Breakdown by category label |
| indexed_documents | StatsResponse | Documents successfully embedded and indexed |
| pending_documents | StatsResponse | Documents awaiting processing |
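The breakdown counters can be derived from document records with a simple aggregation (a sketch; the chunk_count and word_count field names are our assumptions, not confirmed model fields):

```python
from collections import Counter

def compute_stats(documents: list[dict]) -> dict:
    """Aggregate StatsResponse-style counters from document records.

    Documents without a category are bucketed as "uncategorized",
    matching the stats response shown below.
    """
    return {
        "total_documents": len(documents),
        "total_chunks": sum(d.get("chunk_count", 0) for d in documents),
        "total_words": sum(d.get("word_count", 0) for d in documents),
        "documents_by_type": dict(Counter(d["doc_type"] for d in documents)),
        "documents_by_category": dict(
            Counter(d.get("category") or "uncategorized" for d in documents)
        ),
    }
```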
Querying Usage Stats
# Get tenant knowledge base statistics
curl -X GET "https://dev.api.olympuscloud.ai/v1/knowledge-base/stats?tenant_id=550e8400-e29b-41d4-a716-446655449100" \
-H "Authorization: Bearer $TOKEN"
Response:
{
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"total_documents": 47,
"total_chunks": 312,
"total_words": 89450,
"documents_by_type": {
"markdown": 20,
"pdf": 12,
"support_ticket": 10,
"faq": 5
},
"documents_by_category": {
"operations": 15,
"menu": 12,
"training": 10,
"policies": 7,
"uncategorized": 3
},
"indexed_documents": 45,
"pending_documents": 2
}
Ingestion Metering
Each ingestion response includes processing metrics for cost attribution:
| Field | Type | Description |
|---|---|---|
| document_id | string | Unique ID of the ingested document |
| status | string | "success" or "failed" |
| chunks_created | integer | Number of chunks produced |
| chunks_indexed | integer | Number successfully embedded and stored |
| processing_time_ms | integer | Total ingestion latency |
| error_message | string (nullable) | Error details if status is "failed" |
API Endpoints
All Knowledge Hub endpoints are mounted under /v1/knowledge-base and require tenant authentication.
Endpoint Reference
| Method | Path | Summary | Request Body |
|---|---|---|---|
| GET | /knowledge-base/health | Health check | -- |
| POST | /knowledge-base/ingest | Ingest a text document | IngestDocumentRequest |
| POST | /knowledge-base/ingest/pdf | Upload and ingest a PDF | Multipart file + query params |
| POST | /knowledge-base/ingest/support-ticket | Ingest a resolved support ticket | IngestSupportTicketRequest |
| POST | /knowledge-base/ingest/bulk | Bulk ingest multiple documents | BulkIngestRequest |
| POST | /knowledge-base/search | Search the knowledge base | SearchRequest |
| POST | /knowledge-base/answer | Generate an answer with citations | AnswerRequest |
| GET | /knowledge-base/documents | List documents (paginated) | Query params |
| GET | /knowledge-base/documents/{document_id} | Get document details | -- |
| DELETE | /knowledge-base/documents/{document_id} | Delete a document | -- |
| GET | /knowledge-base/stats | Get tenant statistics | Query params |
Ingest Document
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/ingest \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"title": "Employee Handbook",
"content": "# Employee Handbook\n\n## Attendance Policy\n...",
"doc_type": "markdown",
"category": "hr",
"tags": ["handbook", "policies", "hr"],
"source_url": "https://internal.example.com/handbook",
"chunking_strategy": "semantic"
}'
Ingest PDF
curl -X POST "https://dev.api.olympuscloud.ai/v1/knowledge-base/ingest/pdf?tenant_id=550e8400-e29b-41d4-a716-446655449100&title=Safety%20Manual&category=compliance" \
-H "Authorization: Bearer $TOKEN" \
-F "file=@safety-manual.pdf"
Search Knowledge Base
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/search \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "What is the dress code policy?",
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"top_k": 5,
"category": "hr",
"search_method": "hybrid"
}'
Response:
{
"query": "What is the dress code policy?",
"results": [
{
"chunk_id": "chunk-uuid-1",
"document_id": "doc-uuid-1",
"content": "All staff must wear the approved uniform during shifts...",
"score": 0.032,
"document_title": "Employee Handbook",
"doc_type": "markdown",
"section_title": "Dress Code",
"source_url": "https://internal.example.com/handbook",
"search_method": "hybrid"
}
],
"total": 1,
"search_method": "hybrid"
}
Generate Answer with Citations
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/answer \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "What should I wear to work?",
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"top_k": 5,
"category": "hr"
}'
Response:
{
"answer": "According to the Employee Handbook [Source 1], all staff must wear the approved uniform during shifts. This includes...",
"confidence": 0.82,
"sources": [
{
"source_number": 1,
"document_id": "doc-uuid-1",
"title": "Employee Handbook",
"section": "Dress Code",
"url": "https://internal.example.com/handbook",
"relevance_score": 0.89
}
],
"context_chunks": 5,
"model_tier": "T3",
"generation_latency_ms": 1250,
"search_latency_ms": 85,
"has_direct_answer": true,
"needs_human_review": false
}
List Documents
curl -X GET "https://dev.api.olympuscloud.ai/v1/knowledge-base/documents?tenant_id=550e8400-e29b-41d4-a716-446655449100&category=hr&limit=10&offset=0" \
-H "Authorization: Bearer $TOKEN"
Delete Document
curl -X DELETE https://dev.api.olympuscloud.ai/v1/knowledge-base/documents/doc-uuid-1 \
-H "Authorization: Bearer $TOKEN"
Deleting a document removes it from both the Vectorize index and the keyword index. The operation cascades to all associated chunks.
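The cascade over the in-memory structures can be sketched as follows (illustrative; the real deletion also issues a delete call to Vectorize for the same chunk IDs, and the structure names here are assumptions):

```python
def delete_document(document_id: str,
                    chunks_by_doc: dict[str, list[str]],
                    keyword_index: dict[str, dict[str, int]]) -> int:
    """Remove a document's chunks from the keyword index.

    Returns the number of chunks removed; the same chunk IDs would be
    passed to the vector store's delete operation.
    """
    chunk_ids = set(chunks_by_doc.pop(document_id, []))
    for postings in keyword_index.values():
        # set & dict_keys yields a fresh set, so deleting while
        # iterating the intersection is safe.
        for cid in chunk_ids & postings.keys():
            del postings[cid]
    return len(chunk_ids)
```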
Related Documentation
- AI Agent RAG Configuration -- Agent-level RAG config for Maximus, Minerva, Menu AI, and Dev Agent
- ACP AI Router -- Smart model routing and the T1-T6 tier system used for answer generation
- Agent Contexts and Personas -- Persona definitions for agents that consume Knowledge Hub data