ACP Foundation
The AI Cost-optimized Platform (ACP) is the foundation layer for all AI operations in Olympus Cloud. It provides tiered model routing, cost tracking, safety controls, and agent orchestration across multiple backend services.
Overview
ACP Foundation spans three service layers (Python analytics, Go, and the Cloudflare edge) across five components:
- AI Gateway (Python FastAPI) -- Routes AI inference requests across six model tiers, from FREE (T1) to ENTERPRISE (T6)
- AI Cost Analytics (Python FastAPI) -- Tracks usage, manages budgets, and provides cost optimization recommendations
- AI Safety Controls (Python FastAPI) -- Content moderation, PII detection, bias monitoring, hallucination detection, prompt injection prevention, kill switch, and incident management
- ACP Server (Go GraphQL) -- Workspace management, session handling, tool execution, and RAG search for AI agents
- AI Proxy (Cloudflare Worker) -- OpenAI-compatible edge proxy for low-latency inference
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                  Flutter Shells / API Clients                   │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│                      Cloudflare Edge Layer                      │
│               ai-proxy Worker (OpenAI-compatible)               │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│                   Go API Gateway (port 8080)                    │
│                     GraphQL / REST routing                      │
└────────┬───────────────┬────────────────────────┬───────────────┘
         │               │                        │
┌────────▼────────┐ ┌────▼────────────┐ ┌─────────▼───────────┐
│ Python          │ │ Python          │ │ Go ACP Server       │
│ Analytics 8004  │ │ ML 8005         │ │ (port 8090)         │
│                 │ │                 │ │                     │
│ AI Gateway      │ │ LangGraph       │ │ GraphQL + WS        │
│ AI Cost         │ │ Agents          │ │ Workspace Mgmt      │
│ AI Safety       │ │                 │ │ Session Handling    │
│ Conversational  │ │                 │ │ Tool Registry       │
│ Analytics       │ │                 │ │ RAG Search          │
└─────────────────┘ └─────────────────┘ └─────────────────────┘
         │               │                        │
         └───────────────┴────────────────────────┘
                         │
          ┌──────────────┴──────────────┐
          │    ACP AI Router (T1-T6)    │
          │ Model Selection + Fallback  │
          └──────────────┬──────────────┘
                         │
     ┌───────────┬───────┴───────┬───────────┐
     │           │               │           │
Workers AI   Anthropic        Google      OpenAI
 (Llama 4) (Claude 4.5)      (Gemini)    (GPT-4o)
Service Locations
| Component | Service | Port | Location |
|---|---|---|---|
| AI Gateway | Python Analytics | 8004 | backend/python/app/api/ai_gateway_routes.py |
| AI Cost Analytics | Python Analytics | 8004 | backend/python/app/api/ai_cost_routes.py |
| AI Safety Controls | Python Analytics | 8004 | backend/python/app/api/ai_safety_routes.py |
| LangGraph Agents | Python ML | 8005 | backend/python/app/api/agent_routes.py |
| ACP Server | Go ACP Server | 8090 | backend/go/cmd/acp-server/main.go |
| AI Proxy | Cloudflare Worker | -- | workers/ai-proxy/ |
AI Gateway (Python FastAPI)
The AI Gateway is the primary inference routing service, implemented as a FastAPI router at backend/python/app/api/ai_gateway_routes.py. It provides cost-optimized chat completion by routing requests to the appropriate model based on a 6-tier system.
Model Tiers
| Tier | Name | Model | Provider | Input Cost/M | Output Cost/M |
|---|---|---|---|---|---|
| T1 | FREE | Llama 4 Scout | Workers AI | $0.00 | $0.00 |
| T2 | BUDGET | Gemini 2.0 Flash | Google | $0.10 | $0.40 |
| T3 | STANDARD | Gemini 3 Flash | Google | $0.50 | $3.00 |
| T4 | QUALITY | Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 |
| T5 | PREMIUM | Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 |
| T6 | ENTERPRISE | Claude Opus 4.5 | Anthropic | $5.00 | $25.00 |
TTS Tiers
| Tier | Model | Cost (per 1k chars) | Use For |
|---|---|---|---|
| TTS-T0 | Cloudflare Workers AI (Deepgram Aura) | FREE | Standard voice output |
| TTS-T1 | ElevenLabs Turbo v2.5 | $0.18 | Fast, basic quality |
| TTS-T2 | ElevenLabs STS v2 | $0.24 | Speech-to-speech |
| TTS-T3 | ElevenLabs v3 | $0.30 | Premium, most advanced |
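TTS billing is per 1,000 characters, so cost scales linearly with text length. A quick sketch of the arithmetic, using the rates from the table above (the helper function is illustrative, not part of the gateway API):

```python
# Per-1,000-character rates from the TTS tier table (USD).
TTS_RATES = {"tts-t0": 0.00, "tts-t1": 0.18, "tts-t2": 0.24, "tts-t3": 0.30}

def tts_cost(text: str, tier: str) -> float:
    """Estimate the cost of synthesizing `text` at the given TTS tier."""
    return len(text) / 1000 * TTS_RATES[tier]

# 5,000 characters on ElevenLabs Turbo v2.5 (TTS-T1): 5 x $0.18
print(f"${tts_cost('x' * 5000, 'tts-t1'):.2f}")  # $0.90
```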
Gateway Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /ai/chat | POST | Tier-routed chat completion |
| /ai/chat/direct | POST | Direct model access by provider and model ID |
| /ai/tts | POST | Text-to-speech via ElevenLabs |
| /ai/models | GET | List available models with pricing |
| /ai/tiers | GET | Get tier definitions and routing strategy |
| /ai/usage/{tenant_id} | GET | Usage statistics per tenant |
| /ai/health | GET | Gateway and provider health status |
Chat Completion Example
POST /ai/chat

    {
      "messages": [{"role": "user", "content": "Hello"}],
      "tenant_id": "restaurant-1",
      "tier": "t1",
      "temperature": 0.7,
      "max_tokens": 1024,
      "stream": false,
      "cache_enabled": true
    }
Response:

    {
      "content": "Hello! How can I help you today?",
      "model": "llama-4-scout",
      "provider": "workers-ai",
      "tier": "t1",
      "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 9,
        "total_tokens": 21
      },
      "cached": false,
      "latency_ms": 245,
      "estimated_cost": 0.0
    }
Routing Strategy
The gateway uses cost-optimized routing with automatic fallback. It selects the lowest-cost model capable of handling the request complexity:
- Client specifies a tier (T1-T6) in the request
- Gateway routes to the corresponding model and provider
- If the primary model is unavailable, it falls back to the next available tier
- Response includes the actual model used, latency, and estimated cost
AI Cost Analytics
The AI Cost Analytics service (backend/python/app/api/ai_cost_routes.py) provides comprehensive cost tracking, budget management, and optimization recommendations.
Cost Dashboard Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /ai-cost/summary | GET | Cost summary with breakdowns by tier, agent, and tenant |
| /ai-cost/tokens | GET | Token usage analytics with efficiency metrics |
| /ai-cost/performance | GET | Model performance metrics (latency, success rate, cache hit rate) |
| /ai-cost/router | GET | Router analytics (tier distribution, cost savings) |
| /ai-cost/optimizations | GET | Cost optimization recommendations |
Budget Management
| Endpoint | Method | Description |
|---|---|---|
| /ai-cost/budgets | POST | Set budget for an agent or tenant |
| /ai-cost/budgets/{entity_id} | GET | Get budget status with projections |
| /ai-cost/budgets | GET | List all budget statuses |
Budget features include:
- Monthly and daily budget limits
- Configurable alert thresholds (default: 80% of budget)
- Auto-throttle thresholds (default: 95% of budget)
- Projected monthly spend and days-until-exhaustion forecasts
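The alert and auto-throttle thresholds above compose into a simple spend check. A minimal sketch using the stated defaults (80% alert, 95% throttle); the function name and return values are illustrative, not service code:

```python
def budget_action(spent: float, limit: float,
                  alert_pct: float = 0.80, throttle_pct: float = 0.95) -> str:
    """Classify current spend: 'throttle' at >= 95% of budget,
    'alert' at >= 80%, otherwise 'ok'."""
    ratio = spent / limit
    if ratio >= throttle_pct:
        return "throttle"
    if ratio >= alert_pct:
        return "alert"
    return "ok"

print(budget_action(850.0, 1000.0))  # alert (85% of budget)
```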
Reporting and Export
| Endpoint | Method | Description |
|---|---|---|
| /ai-cost/reports/allocation | GET | Cost allocation report for chargeback |
| /ai-cost/reports/chargeback/{tenant_id} | GET | Detailed tenant chargeback report |
| /ai-cost/reports/monthly | GET | Monthly summary with period-over-period comparison |
| /ai-cost/export/csv | GET | Export usage data as CSV |
| /ai-cost/export/excel | GET | Export usage data as Excel |
Usage Tracking
POST /ai-cost/track

    {
      "tenant_id": "restaurant-1",
      "agent_id": "business_assistant",
      "tier": "t2",
      "input_tokens": 500,
      "output_tokens": 200,
      "latency_ms": 342.5,
      "success": true,
      "cached": false,
      "routing_decision": "direct"
    }
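Given the T2 rates from the model tier table ($0.10/M input, $0.40/M output), the tracked request above works out to $0.00013. A sketch of the arithmetic (the pricing table is copied from this document; the helper function is illustrative):

```python
# (input, output) cost per million tokens, from the model tier table (USD).
TIER_PRICES = {
    "t1": (0.00, 0.00),
    "t2": (0.10, 0.40),
    "t3": (0.50, 3.00),
    "t4": (1.00, 5.00),
    "t5": (3.00, 15.00),
    "t6": (5.00, 25.00),
}

def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Token cost of a single request at the given tier."""
    in_price, out_price = TIER_PRICES[tier]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# The /ai-cost/track example above: 500 input + 200 output tokens on T2.
print(f"{request_cost('t2', 500, 200):.5f}")  # 0.00013
```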
Model Configuration
Per-tenant model configuration allows controlling which tiers are enabled, setting default and fallback tiers, and defining rate limits:
| Endpoint | Method | Description |
|---|---|---|
| /ai-cost/config/{tenant_id} | GET | Get model configuration |
| /ai-cost/config/{tenant_id} | PUT | Update model configuration |
| /ai-cost/models/status | GET | Get enabled/disabled status per tier |
| /ai-cost/models/{tier}/enabled | PUT | Enable or disable a specific tier |
| /ai-cost/models/bulk-enable | PUT | Bulk enable/disable multiple tiers |
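As an illustration, a PUT body for /ai-cost/config/{tenant_id} might look like the following. The field names here are assumptions for the sake of the example; check the route implementation for the actual schema:

```json
{
  "enabled_tiers": ["t1", "t2", "t4"],
  "default_tier": "t1",
  "fallback_tier": "t2",
  "rate_limit_per_minute": 120
}
```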
AI Safety Controls
The AI Safety service (backend/python/app/api/ai_safety_routes.py) provides multiple layers of protection for AI operations.
Safety Check Types
| Check | Description | Endpoint |
|---|---|---|
| Content Moderation | Detect harmful content, PII, policy violations | POST /ai-safety/analyze/content |
| Bias Detection | Detect demographic bias in AI responses | POST /ai-safety/analyze/bias |
| Hallucination Detection | Verify AI responses against source documents | POST /ai-safety/analyze/hallucination |
| Prompt Injection Prevention | Detect jailbreak attempts and malicious patterns | POST /ai-safety/analyze/prompt |
| Comprehensive Analysis | Run all checks on input and/or output text | POST /ai-safety/analyze/comprehensive |
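A comprehensive analysis request would typically carry both the user input and the model output. The field names below are illustrative and not confirmed against the route schema:

```json
{
  "tenant_id": "restaurant-1",
  "input_text": "What are your opening hours?",
  "output_text": "We are open 9am-9pm daily.",
  "checks": ["content", "bias", "hallucination", "prompt_injection"]
}
```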
Kill Switch
The kill switch provides emergency controls to halt AI operations at various scopes:
| Scope | Description | Example |
|---|---|---|
| Global | Stops ALL AI operations | Emergency shutdown |
| Tenant | Stops AI for a specific tenant | Tenant policy violation |
| Agent | Stops a specific agent | Agent misbehavior |
| Location | Stops AI at a specific location | Location-specific issue |
| Model Tier | Stops a specific model tier | Model quality issue |
Kill switch endpoints:
| Endpoint | Method | Description |
|---|---|---|
| /ai-safety/kill-switch/activate | POST | Activate at specified scope |
| /ai-safety/kill-switch/deactivate | POST | Deactivate at specified scope |
| /ai-safety/kill-switch/emergency | POST | Global emergency shutdown |
| /ai-safety/kill-switch/check | POST | Check if operations are blocked |
| /ai-safety/kill-switch/status | GET | Get all active kill switches |
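Conceptually, /ai-safety/kill-switch/check answers whether any active switch matches the operation's scope, with a global switch overriding everything. A sketch under an assumed data model (the real service's schema may differ):

```python
def is_blocked(active_switches: list[dict], tenant: str, agent: str,
               location: str, tier: str) -> bool:
    """True if any active kill switch applies to this operation.
    A global switch blocks everything; scoped switches block on match."""
    for switch in active_switches:
        scope, target = switch["scope"], switch.get("target")
        if scope == "global":
            return True
        if (scope, target) in {("tenant", tenant), ("agent", agent),
                               ("location", location), ("tier", tier)}:
            return True
    return False

switches = [{"scope": "tenant", "target": "restaurant-2"}]
print(is_blocked(switches, "restaurant-1", "assistant", "loc-1", "t2"))  # False
```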
Incident Management
| Endpoint | Method | Description |
|---|---|---|
| /ai-safety/incidents | POST | Create a safety incident |
| /ai-safety/incidents/{id} | GET | Get incident details |
| /ai-safety/incidents/{id} | PATCH | Update incident (status, severity, remediation) |
| /ai-safety/incidents | GET | List incidents with filtering |
| /ai-safety/incidents/stats | GET | Get incident statistics |
Incidents track severity (low, medium, high, critical), status lifecycle, timeline of events, and optional kill switch activation.
ACP Server (Go GraphQL)
The ACP Server is a standalone Go service (backend/go/cmd/acp-server/main.go) that provides workspace management, session handling, and tool execution for AI agents via a GraphQL API with WebSocket subscriptions.
Key Capabilities
| Capability | Description |
|---|---|
| Workspace Management | Register and index codebases for AI access |
| Session Handling | Persistent context across AI agent interactions |
| Tool Registry | Unified tool execution framework (file_read, file_write, bash, git, semantic_search) |
| RAG Search | Vertex AI-powered semantic search over indexed workspaces |
| Real-Time Updates | WebSocket-based streaming of tool output and session events |
| Security | JWT authentication, role-based access, audit logging |
Configuration
| Flag | Description | Default |
|---|---|---|
| --port | Server port | 8090 |
| --workspace | Default workspace directory | -- |
| --playground | Enable GraphQL playground | true |
| --spanner-project | GCP project ID for Spanner | SPANNER_PROJECT_ID env var |
| --vertex-project | GCP project for Vertex AI RAG | GOOGLE_CLOUD_PROJECT env var |
The ACP Server integrates with Cloud Spanner for state persistence, Vertex AI for RAG search, and the AI Router for model selection.
Integration with AI Router
All ACP components use the AI Router for model selection. The router implements cost-optimized selection with automatic fallback:
Request with tier hint
         │
         ▼
┌───────────────────┐
│     AI Router     │
│                   │
│ 1. Select model   │
│    by tier        │
│ 2. Check provider │
│    health         │
│ 3. Fallback if    │
│    unavailable    │
└────────┬──────────┘
         │
   ┌─────┴─────┬───────────┐
   │           │           │
   ▼           ▼           ▼
Workers AI  Anthropic    Google
 (T1 FREE)   (T4-T6)    (T2-T3)
Cost Optimization Approach
The system targets 95%+ cost savings, relative to routing all traffic to premium tiers, through:
- T1 FREE tier -- Simple queries, greetings, and basic FAQ use Llama 4 Scout at zero cost
- Tier matching -- Each task type routes to the lowest capable tier
- Response caching -- Repeated similar queries return cached results
- Budget controls -- Per-tenant and per-agent budgets prevent cost overruns
- Usage tracking -- Every request is tracked for cost analysis and optimization
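The savings figure follows from the tier pricing itself: work routed to T1 costs nothing, and T2 is 30-40x cheaper than T5 per token. A back-of-envelope comparison (the 80/20 traffic split is an illustrative assumption, not a measured distribution):

```python
# Per-million-token (input, output) rates from the model tier table (USD).
T5_IN, T5_OUT = 3.00, 15.00   # T5 PREMIUM (Claude Sonnet 4.5)
T2_IN, T2_OUT = 0.10, 0.40    # T2 BUDGET (Gemini 2.0 Flash)

# Hypothetical month: 100M input + 20M output tokens, all on T5.
all_t5 = 100 * T5_IN + 20 * T5_OUT             # $600.00

# Same workload: 80% routed to T1 (free), the rest to T2.
optimized = 0.2 * (100 * T2_IN + 20 * T2_OUT)  # $3.60

print(f"{1 - optimized / all_t5:.1%}")  # 99.4%
```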
Epic and Issue References
| Component | Issue | Description |
|---|---|---|
| ACP AI Router Epic | #944 | AI Router and Orchestration |
| Smart Router | #945 | Cost-optimized model selection |
| AI Gateway Integration | #946 | Cloudflare AI Gateway |
| LangGraph Agent Orchestrator | #947 | Python LangGraph agents |
| Vectorize RAG | #948 | Cloudflare Vectorize for RAG |
| AI Cost Analytics | #1114 | Cockpit AI cost dashboard |
| AI Safety Controls | #1113 | Safety controls and incident management |
Related Documentation
- ACP AI Router -- Model routing and cost optimization details
- LangGraph Agent Workflows -- Stateful agent architecture
- Cockpit Operations -- ACP monitoring in the Cockpit UI