Tenant-Specific RAG Platform
Build AI-powered knowledge bases scoped to individual tenants, enabling each organization to upload, index, and query their own documents through the Knowledge Hub.
Overview
The Tenant-Specific RAG Platform provides every tenant in Olympus Cloud with an isolated, AI-powered knowledge base. Unlike the platform-wide RAG system that indexes shared documentation (support articles, release notes, user guides), the tenant RAG platform lets each organization maintain its own private knowledge corpus.
Platform-Wide RAG vs. Tenant RAG
| Feature | Platform-Wide RAG | Tenant RAG |
|---|---|---|
| Scope | Shared across all tenants | Private per tenant |
| Content | Olympus docs, guides, FAQs | Tenant-uploaded documents |
| Index naming | support-kb, sales-kb, docs-embeddings | minerva-knowledge-base-{tenant_id} |
| Management | Platform team maintains | Tenant admins manage |
| Use case | Agent context (Maximus, Minerva) | Tenant-specific Q&A, support, operations |
| Configuration | See RAG Configuration | This document |
Key Capabilities
- Multi-format document ingestion -- PDF, Markdown, video transcripts, support tickets, FAQs, release notes, articles, and troubleshooting guides
- Hybrid search -- Combines semantic vector search with BM25-style keyword matching via Reciprocal Rank Fusion (RRF)
- Per-tenant isolation -- Each tenant gets a dedicated Vectorize index with tenant-scoped queries
- Answer generation with citations -- LLM-powered answers that cite source documents with confidence scoring
- Usage metering -- Track document counts, chunk counts, word counts, and query volume per tenant
Knowledge Hub Architecture
The Knowledge Hub processes documents through a multi-stage pipeline before they become queryable:
Knowledge Hub Pipeline
+------------------------------------------------------------------+
| |
| Upload/API Extract Text Chunk Document |
| +---------+ +-----------+ +-------------+ |
| | PDF | | pypdf | | Semantic | |
| | Markdown| ---> | Markdown | ---> | Fixed-size | |
| | Ticket | | Cleanup | | Paragraph | |
| | Video | | Normalize | | (configurable)| |
| +---------+ +-----------+ +------+------+ |
| | |
| v |
| Query Pipeline Index Embed |
| +-----------+ +-----------+ +----------+ |
| | Hybrid | | Vectorize | <--- | BGE Base | |
| | Search | <--- | + Keyword | | 768-dim | |
| | + Answer | | Index | | Workers | |
| | Generation| +-----------+ | AI | |
| +-----------+ +----------+ |
| |
+------------------------------------------------------------------+
Component Responsibilities
| Component | Class | Source File |
|---|---|---|
| Ingestion Pipeline | DocumentIngester | backend/python/app/services/knowledge_base/service.py |
| Chunking Strategy | ChunkingStrategy | backend/python/app/services/knowledge_base/service.py |
| Hybrid Search | HybridSearchEngine | backend/python/app/services/knowledge_base/service.py |
| Answer Generator | AnswerGenerator | backend/python/app/services/knowledge_base/service.py |
| API Routes | router | backend/python/app/api/knowledge_base_routes.py |
| Vectorize Client | VectorizeClient | backend/python/app/clients/vectorize_client.py |
| Minerva KB Service | MinervaKnowledgeBaseService | backend/python/app/services/minerva/knowledge_base.py |
Document Ingestion
Supported Document Types
The DocumentType enum defines the content types that the Knowledge Hub can ingest:
| Type | Enum Value | Processing Method | Notes |
|---|---|---|---|
| Markdown | markdown | Frontmatter extraction, content cleanup | Default type; strips YAML frontmatter |
| PDF | pdf | Text extraction via pypdf | Multi-page support with per-page extraction |
| Video Transcript | video_transcript | Timestamp removal, speaker label cleanup | Normalizes raw transcription output |
| Support Ticket | support_ticket | Issue/resolution formatting | Formats resolved tickets as knowledge articles |
| Release Notes | release_notes | Version tagging | Automatically titles with version number |
| FAQ | faq | Direct processing | Short-answer optimized |
| Troubleshooting | troubleshooting | Direct processing | Step-by-step resolution content |
| Article | article | Direct processing | General knowledge content |
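The enum values in the table above can be mirrored in a minimal sketch (the real DocumentType lives in backend/python/app/services/knowledge_base/service.py and may carry additional members or helpers):

```python
from enum import Enum

class DocumentType(str, Enum):
    """Content types the Knowledge Hub can ingest (sketch of the enum)."""
    MARKDOWN = "markdown"
    PDF = "pdf"
    VIDEO_TRANSCRIPT = "video_transcript"
    SUPPORT_TICKET = "support_ticket"
    RELEASE_NOTES = "release_notes"
    FAQ = "faq"
    TROUBLESHOOTING = "troubleshooting"
    ARTICLE = "article"
```

Because the enum subclasses str, the raw string values round-trip cleanly, which is what the API's doc_type request field relies on.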
Ingestion Pipeline Details
Each document goes through the following stages:
1. Content Extraction
The DocumentIngester class handles format-specific text extraction:
# Markdown: strips frontmatter, normalizes whitespace
document = await ingester.ingest_markdown(
content="# Getting Started\n...",
tenant_id="tenant-123",
title="Getting Started Guide",
category="onboarding",
tags=["setup", "quickstart"],
)
# PDF: extracts text from all pages using pypdf
document = await ingester.ingest_pdf(
pdf_content=pdf_bytes,
tenant_id="tenant-123",
title="Operations Manual",
)
# Support Ticket: formats issue + resolution as knowledge article
document = await ingester.ingest_support_ticket(
ticket_content={
"title": "POS Not Printing Receipts",
"issue": "Receipts stopped printing after firmware update...",
"resolution": "Reset printer spooler and re-pair Bluetooth...",
"category": "hardware",
"tags": ["printer", "pos", "bluetooth"],
},
tenant_id="tenant-123",
)
2. Content Hashing
Each document receives a SHA-256 content hash (first 16 hex characters) for change detection, enabling efficient re-indexing when content is updated.
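The hashing step is small enough to sketch directly (an illustration; the helper name is ours, not necessarily the one in DocumentIngester):

```python
import hashlib

def content_hash(content: str) -> str:
    """Return the first 16 hex characters of the SHA-256 digest.

    Used for change detection: if the hash of incoming content matches
    the stored hash, re-chunking and re-embedding can be skipped.
    """
    return hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]

# Identical content always produces the same hash; any edit changes it.
unchanged = content_hash("# Getting Started\n...") == content_hash("# Getting Started\n...")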
3. Chunking
Documents are split into chunks using one of three strategies:
| Strategy | Description | Best For |
|---|---|---|
| Semantic (default) | Splits on markdown headers and paragraphs, respects section boundaries | Structured documents with headers |
| Fixed | Fixed character window with overlap, avoids mid-word splits | Unstructured text, transcripts |
| Paragraph | Splits on double newlines | Simple documents, articles |
Chunking parameters:
| Parameter | Default | Range | Purpose |
|---|---|---|---|
| chunk_size | 512 | -- | Target characters per chunk |
| chunk_overlap | 50 | -- | Overlap between consecutive chunks |
| min_chunk_size | 100 | -- | Discard chunks smaller than this |
| max_chunk_size | 1024 | -- | Split sections exceeding this |
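The fixed strategy can be sketched as a sliding character window with overlap (an illustration using the defaults above; the real ChunkingStrategy additionally avoids mid-word splits and implements the semantic and paragraph strategies):

```python
def fixed_chunks(text: str, chunk_size: int = 512, overlap: int = 50,
                 min_chunk_size: int = 100) -> list[str]:
    """Split text into overlapping fixed-size windows.

    Chunks shorter than min_chunk_size (typically the trailing
    remainder) are discarded, mirroring the defaults in the table.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if len(piece) >= min_chunk_size:
            chunks.append(piece)
    return chunks
```

Each chunk shares its first `overlap` characters with the tail of the previous chunk, so a sentence cut by a window boundary still appears whole in at least one chunk.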
4. Embedding
Chunks are embedded using Workers AI's BGE model (@cf/baai/bge-base-en-v1.5) producing 768-dimensional vectors. The AI Gateway client handles batch embedding for efficiency.
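Batching can be as simple as grouping chunk texts before each Workers AI call; a sketch (the batch size of 100 is illustrative, not a documented limit of the model endpoint):

```python
from typing import Iterator

def batched(texts: list[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield successive batches of chunk texts for embedding calls.

    Grouping chunks reduces round trips to the @cf/baai/bge-base-en-v1.5
    endpoint; each call returns one 768-dimensional vector per input.
    """
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]
```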
5. Indexing
Embedded chunks are upserted into Cloudflare Vectorize with metadata:
# Metadata stored with each vector
{
"document_id": "doc-uuid",
"tenant_id": "tenant-123",
"doc_type": "markdown",
"title": "Getting Started Guide",
"section_title": "Installation",
"category": "onboarding",
"tags": ["setup", "quickstart"],
}
A parallel keyword index is built in-memory for BM25-style retrieval, with stopword removal and term-frequency scoring.
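The in-memory keyword index can be sketched as a term-frequency posting list (illustrative only: the stopword set here is abbreviated, and the service's actual scoring is BM25-style rather than raw term frequency):

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "how", "do", "i"}

def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]

def build_index(chunks: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each term to a posting list of {chunk_id: term_frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for chunk_id, text in chunks.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][chunk_id] = tf
    return index

def keyword_search(index: dict[str, dict[str, int]], query: str,
                   top_k: int = 5) -> list[tuple[str, int]]:
    """Score chunks by summed term frequency across query terms."""
    scores: Counter = Counter()
    for term in tokenize(query):
        for chunk_id, tf in index.get(term, {}).items():
            scores[chunk_id] += tf
    return scores.most_common(top_k)
```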
External Data Connectors
The platform supports ingesting data from external systems through the Universal Data Ingestion Engine (Epic #1400) and the Knowledge Hub API. External data reaches the knowledge base through three patterns: ETL pipelines, CRM integrations, and push-based webhooks.
Supported Connector Types
| Connector | Pattern | Data Flow |
|---|---|---|
| PostgreSQL | ETL pipeline | Extract rows, transform to documents, ingest via KB API |
| MySQL | ETL pipeline | Extract rows, transform to documents, ingest via KB API |
| Salesforce | CRM integration | Sync CRM records as knowledge articles |
| HubSpot | CRM integration | Sync contacts, deals, and knowledge content |
| Webhook | Push-based | External systems push documents to the ingestion endpoint |
Webhook Integration
External systems can push documents directly to the Knowledge Hub API:
# Webhook-style document ingestion
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/ingest \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"title": "Product Update Q1 2026",
"content": "## New Features\n\n- Inventory auto-reorder...",
"doc_type": "article",
"category": "product-updates",
"tags": ["product", "q1-2026"],
"chunking_strategy": "semantic"
}'
Bulk Import Pattern
For large-scale data migration from external databases:
# Bulk ingest multiple documents
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/ingest/bulk \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"documents": [
{
"title": "SOP: Opening Procedures",
"content": "## Opening Checklist\n...",
"doc_type": "article",
"category": "operations",
"tags": ["sop", "opening"]
},
{
"title": "SOP: Closing Procedures",
"content": "## Closing Checklist\n...",
"doc_type": "article",
"category": "operations",
"tags": ["sop", "closing"]
}
]
}'
The bulk endpoint returns aggregate success/failure counts along with a per-document status:
{
"total": 2,
"success": 2,
"failed": 0,
"results": [
{"document_id": "abc-123", "status": "success", "title": "SOP: Opening Procedures"},
{"document_id": "def-456", "status": "success", "title": "SOP: Closing Procedures"}
]
}
Hybrid Search
The HybridSearchEngine combines two retrieval strategies and fuses their results using Reciprocal Rank Fusion (RRF).
Search Methods
| Method | When Used | Strengths |
|---|---|---|
| Vector | Semantic similarity via Vectorize | Finds conceptually similar content even with different wording |
| Keyword | BM25-style term matching | Precise matches for specific terms, product names, codes |
| Hybrid (default) | Both vector + keyword, fused via RRF | Best overall recall and precision |
How Hybrid Search Works
User Query: "How do I reset the printer?"
|
+---> Vector Search (Vectorize)
| Query embedding --> cosine similarity
| Returns: top_k * 2 results with scores
|
+---> Keyword Search (BM25-style)
| Tokenize, remove stopwords, TF-IDF scoring
| Returns: top_k * 2 results with scores
|
+---> Reciprocal Rank Fusion (k=60)
Merge & deduplicate results
RRF score = sum(1 / (k + rank_i)) for each method
Results appearing in both methods get "hybrid" tag
Return top_k final results
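The fusion step above can be sketched directly from the formula, with k=60 and 1-based ranks (illustrative helper; the actual HybridSearchEngine carries full result objects rather than IDs):

```python
def rrf_fuse(vector_ids: list[str], keyword_ids: list[str],
             k: int = 60, top_k: int = 5) -> list[tuple[str, float, str]]:
    """Merge two ranked result lists with Reciprocal Rank Fusion.

    Each result contributes 1 / (k + rank) per list it appears in;
    results found by both methods are tagged "hybrid".
    """
    scores: dict[str, float] = {}
    seen: dict[str, set[str]] = {}
    for method, ids in (("vector", vector_ids), ("keyword", keyword_ids)):
        for rank, chunk_id in enumerate(ids, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
            seen.setdefault(chunk_id, set()).add(method)
    fused = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(cid, score, "hybrid" if len(seen[cid]) == 2 else seen[cid].pop())
            for cid, score in fused]
```

Note that RRF scores are small by construction (roughly 1/60 per appearance), which is why the search response examples later in this document show scores on the order of 0.03 rather than cosine similarities near 1.0.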
Search Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Natural language search query |
| tenant_id | string | required | Scopes search to tenant's documents |
| top_k | integer | 5 | Maximum results (1-20) |
| category | string | null | Filter by document category |
| doc_types | array | null | Filter by DocumentType values |
| search_method | string | "hybrid" | One of: vector, keyword, hybrid |
Search Example
from app.services.knowledge_base.service import KnowledgeBaseService, DocumentType
service = await get_knowledge_base_service()
results = await service.search(
query="How do I configure inventory alerts?",
tenant_id="tenant-123",
top_k=5,
category="operations",
doc_types=[DocumentType.ARTICLE, DocumentType.TROUBLESHOOTING],
search_method="hybrid",
)
for result in results:
print(f"[{result.search_method}] {result.document_title}")
print(f" Section: {result.section_title}")
print(f" Score: {result.score:.4f}")
print(f" Content: {result.content[:200]}...")
Per-Tenant Isolation
Tenant isolation is enforced at multiple levels to ensure data privacy and prevent cross-tenant data leakage.
Isolation Architecture
Tenant A Tenant B
+------------------+ +------------------+
| Documents | | Documents |
| Chunks | | Chunks |
| Keyword Index | | Keyword Index |
+--------+---------+ +--------+---------+
| |
v v
+------------------+ +------------------+
| Vectorize Index | | Vectorize Index |
| minerva-kb- | | minerva-kb- |
| tenant-a-uuid | | tenant-b-uuid |
+------------------+ +------------------+
Isolation Mechanisms
| Layer | Mechanism | Implementation |
|---|---|---|
| API Layer | require_tenant_auth() dependency | All Knowledge Hub routes require tenant authentication |
| Data Model | tenant_id field on Document and DocumentChunk | Every record is tagged with the owning tenant |
| Vector Store | Tenant-scoped index names | minerva-knowledge-base-{tenant_id} naming convention |
| Query Filtering | Tenant ID filter on all searches | Both vector and keyword searches filter by tenant_id |
| Vectorize Metadata | tenant_id in vector metadata | Stored alongside each embedding for secondary filtering |
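Two of these layers are simple enough to sketch: the naming convention and the defense-in-depth metadata filter (helper names are illustrative, not the actual client API):

```python
INDEX_PREFIX = "minerva-knowledge-base"

def tenant_index_name(tenant_id: str) -> str:
    """One Vectorize index per tenant, named by convention."""
    return f"{INDEX_PREFIX}-{tenant_id}"

def filter_matches(matches: list[dict], tenant_id: str) -> list[dict]:
    """Secondary guard: even inside a tenant-scoped index, drop any
    vector whose metadata tenant_id does not match the caller."""
    return [m for m in matches
            if m.get("metadata", {}).get("tenant_id") == tenant_id]
```

The metadata filter should never remove anything in practice, since each tenant queries only its own index; it exists so that a misrouted query still cannot leak another tenant's chunks.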
Index Provisioning
Each tenant's Vectorize index is created on demand using the ensure_tenant_index method:
from app.clients.vectorize_client import VectorizeClient
client = VectorizeClient()
# Creates index if it doesn't exist, returns existing if it does
index = await client.ensure_tenant_index(
tenant_id="550e8400-e29b-41d4-a716-446655449100",
index_prefix="minerva-knowledge-base", # default
dimensions=768, # matches the 768-dim BGE base embeddings
)
# Index name: minerva-knowledge-base-550e8400-e29b-41d4-a716-446655449100
Index Lifecycle
| Operation | Endpoint | When |
|---|---|---|
| Provision | POST /minerva/knowledge-base/{tenant_id}/provision | Tenant onboarding / Minerva addon enabled |
| Status check | GET /minerva/knowledge-base/{tenant_id}/status | Health monitoring |
| Rebuild | POST /minerva/knowledge-base/{tenant_id}/rebuild | Full re-index with fresh data |
| Delete | DELETE /minerva/knowledge-base/{tenant_id} | Tenant offboarding / addon disabled |
Usage Metering and Billing
The Knowledge Hub tracks usage metrics per tenant for billing and capacity planning.
Tracked Metrics
| Metric | Source | Description |
|---|---|---|
| total_documents | StatsResponse | Number of documents ingested |
| total_chunks | StatsResponse | Total chunks across all documents |
| total_words | StatsResponse | Aggregate word count |
| documents_by_type | StatsResponse | Breakdown by DocumentType |
| documents_by_category | StatsResponse | Breakdown by category label |
| indexed_documents | StatsResponse | Documents successfully embedded and indexed |
| pending_documents | StatsResponse | Documents awaiting processing |
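The breakdown counters can be derived from document records with a simple aggregation (a sketch; the chunk_count and word_count field names are our assumptions, not confirmed model fields):

```python
from collections import Counter

def compute_stats(documents: list[dict]) -> dict:
    """Aggregate StatsResponse-style counters from document records.

    Documents without a category are bucketed as "uncategorized",
    matching the stats response shown below.
    """
    return {
        "total_documents": len(documents),
        "total_chunks": sum(d.get("chunk_count", 0) for d in documents),
        "total_words": sum(d.get("word_count", 0) for d in documents),
        "documents_by_type": dict(Counter(d["doc_type"] for d in documents)),
        "documents_by_category": dict(
            Counter(d.get("category") or "uncategorized" for d in documents)
        ),
    }
```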
Querying Usage Stats
# Get tenant knowledge base statistics
curl -X GET "https://dev.api.olympuscloud.ai/v1/knowledge-base/stats?tenant_id=550e8400-e29b-41d4-a716-446655449100" \
-H "Authorization: Bearer $TOKEN"
Response:
{
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"total_documents": 47,
"total_chunks": 312,
"total_words": 89450,
"documents_by_type": {
"markdown": 20,
"pdf": 12,
"support_ticket": 10,
"faq": 5
},
"documents_by_category": {
"operations": 15,
"menu": 12,
"training": 10,
"policies": 7,
"uncategorized": 3
},
"indexed_documents": 45,
"pending_documents": 2
}
Ingestion Metering
Each ingestion response includes processing metrics for cost attribution:
| Field | Type | Description |
|---|---|---|
| document_id | string | Unique ID of the ingested document |
| status | string | "success" or "failed" |
| chunks_created | integer | Number of chunks produced |
| chunks_indexed | integer | Number successfully embedded and stored |
| processing_time_ms | integer | Total ingestion latency |
| error_message | string (nullable) | Error details if status is "failed" |
API Endpoints
All Knowledge Hub endpoints are mounted under /v1/knowledge-base and require tenant authentication.
Endpoint Reference
| Method | Path | Summary | Request Body |
|---|---|---|---|
| GET | /knowledge-base/health | Health check | -- |
| POST | /knowledge-base/ingest | Ingest a text document | IngestDocumentRequest |
| POST | /knowledge-base/ingest/pdf | Upload and ingest a PDF | Multipart file + query params |
| POST | /knowledge-base/ingest/support-ticket | Ingest a resolved support ticket | IngestSupportTicketRequest |
| POST | /knowledge-base/ingest/bulk | Bulk ingest multiple documents | BulkIngestRequest |
| POST | /knowledge-base/search | Search the knowledge base | SearchRequest |
| POST | /knowledge-base/answer | Generate an answer with citations | AnswerRequest |
| GET | /knowledge-base/documents | List documents (paginated) | Query params |
| GET | /knowledge-base/documents/{document_id} | Get document details | -- |
| DELETE | /knowledge-base/documents/{document_id} | Delete a document | -- |
| GET | /knowledge-base/stats | Get tenant statistics | Query params |
Ingest Document
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/ingest \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"title": "Employee Handbook",
"content": "# Employee Handbook\n\n## Attendance Policy\n...",
"doc_type": "markdown",
"category": "hr",
"tags": ["handbook", "policies", "hr"],
"source_url": "https://internal.example.com/handbook",
"chunking_strategy": "semantic"
}'
Ingest PDF
curl -X POST "https://dev.api.olympuscloud.ai/v1/knowledge-base/ingest/pdf?tenant_id=550e8400-e29b-41d4-a716-446655449100&title=Safety%20Manual&category=compliance" \
-H "Authorization: Bearer $TOKEN" \
-F "file=@safety-manual.pdf"
Search Knowledge Base
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/search \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "What is the dress code policy?",
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"top_k": 5,
"category": "hr",
"search_method": "hybrid"
}'
Response:
{
"query": "What is the dress code policy?",
"results": [
{
"chunk_id": "chunk-uuid-1",
"document_id": "doc-uuid-1",
"content": "All staff must wear the approved uniform during shifts...",
"score": 0.032,
"document_title": "Employee Handbook",
"doc_type": "markdown",
"section_title": "Dress Code",
"source_url": "https://internal.example.com/handbook",
"search_method": "hybrid"
}
],
"total": 1,
"search_method": "hybrid"
}
Generate Answer with Citations
curl -X POST https://dev.api.olympuscloud.ai/v1/knowledge-base/answer \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "What should I wear to work?",
"tenant_id": "550e8400-e29b-41d4-a716-446655449100",
"top_k": 5,
"category": "hr"
}'
Response:
{
"answer": "According to the Employee Handbook [Source 1], all staff must wear the approved uniform during shifts. This includes...",
"confidence": 0.82,
"sources": [
{
"source_number": 1,
"document_id": "doc-uuid-1",
"title": "Employee Handbook",
"section": "Dress Code",
"url": "https://internal.example.com/handbook",
"relevance_score": 0.89
}
],
"context_chunks": 5,
"model_tier": "T3",
"generation_latency_ms": 1250,
"search_latency_ms": 85,
"has_direct_answer": true,
"needs_human_review": false
}
List Documents
curl -X GET "https://dev.api.olympuscloud.ai/v1/knowledge-base/documents?tenant_id=550e8400-e29b-41d4-a716-446655449100&category=hr&limit=10&offset=0" \
-H "Authorization: Bearer $TOKEN"
Delete Document
curl -X DELETE https://dev.api.olympuscloud.ai/v1/knowledge-base/documents/doc-uuid-1 \
-H "Authorization: Bearer $TOKEN"
Deleting a document removes it from both the Vectorize index and the keyword index. The operation cascades to all associated chunks.
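The cascade over the in-memory structures can be sketched as follows (illustrative; the real deletion also issues a delete call to Vectorize for the same chunk IDs, and the structure names here are assumptions):

```python
def delete_document(document_id: str,
                    chunks_by_doc: dict[str, list[str]],
                    keyword_index: dict[str, dict[str, int]]) -> int:
    """Remove a document's chunks from the keyword index.

    Returns the number of chunks removed; the same chunk IDs would be
    passed to the vector store's delete operation.
    """
    chunk_ids = set(chunks_by_doc.pop(document_id, []))
    for postings in keyword_index.values():
        # set & dict_keys yields a fresh set, so deleting while
        # iterating the intersection is safe.
        for cid in chunk_ids & postings.keys():
            del postings[cid]
    return len(chunk_ids)
```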
Related Documentation
- AI Agent RAG Configuration -- Agent-level RAG config for Maximus, Minerva, Menu AI, and Dev Agent
- ACP AI Router -- Smart model routing and the T1-T6 tier system used for answer generation
- Agent Contexts and Personas -- Persona definitions for agents that consume Knowledge Hub data