Skip to main content

ACP Foundation

The AI Cost-optimized Platform (ACP) is the foundation layer for all AI operations in Olympus Cloud. It provides tiered model routing, cost tracking, safety controls, and agent orchestration across multiple backend services.

Overview

ACP Foundation spans three service layers:

  • AI Gateway (Python FastAPI) -- Routes AI inference requests to 6 model tiers from FREE to enterprise
  • AI Cost Analytics (Python FastAPI) -- Tracks usage, manages budgets, and provides cost optimization recommendations
  • AI Safety Controls (Python FastAPI) -- Content moderation, PII detection, bias monitoring, hallucination detection, prompt injection prevention, kill switch, and incident management
  • ACP Server (Go GraphQL) -- Workspace management, session handling, tool execution, and RAG search for AI agents
  • AI Proxy (Cloudflare Worker) -- OpenAI-compatible edge proxy for low-latency inference

Architecture

┌─────────────────────────────────────────────────────────────────┐
│ Flutter Shells / API Clients │
└────────────────────────┬────────────────────────────────────────┘

┌────────────────────────▼────────────────────────────────────────┐
│ Cloudflare Edge Layer │
│ ai-proxy Worker (OpenAI-compatible) │
└────────────────────────┬────────────────────────────────────────┘

┌────────────────────────▼────────────────────────────────────────┐
│ Go API Gateway (port 8080) │
│ GraphQL / REST routing │
└────────┬───────────────┬────────────────────────┬───────────────┘
│ │ │
┌────────▼────────┐ ┌────▼────────────┐ ┌────────▼────────────┐
│ Python │ │ Python │ │ Go ACP Server │
│ Analytics 8004 │ │ ML 8005 │ │ (port 8090) │
│ │ │ │ │ │
│ AI Gateway │ │ LangGraph │ │ GraphQL + WS │
│ AI Cost │ │ Agents │ │ Workspace Mgmt │
│ AI Safety │ │ │ │ Session Handling │
│ Conversational │ │ │ │ Tool Registry │
│ Analytics │ │ │ │ RAG Search │
└─────────────────┘ └─────────────────┘ └─────────────────────┘
│ │ │
└───────────────────┴────────────────────┘

┌──────────────┴──────────────┐
│ ACP AI Router (T1-T6) │
│ Model Selection + Fallback │
└──────────────────────────────┘

┌───────────┬───────┴───────┬───────────┐
│ │ │ │
Workers AI Anthropic Google OpenAI
(Llama 4) (Claude 4.5) (Gemini) (GPT-4o)

Service Locations

ComponentServicePortLocation
AI GatewayPython Analytics8004backend/python/app/api/ai_gateway_routes.py
AI Cost AnalyticsPython Analytics8004backend/python/app/api/ai_cost_routes.py
AI Safety ControlsPython Analytics8004backend/python/app/api/ai_safety_routes.py
LangGraph AgentsPython ML8005backend/python/app/api/agent_routes.py
ACP ServerGo ACP Server8090backend/go/cmd/acp-server/main.go
AI ProxyCloudflare Worker--workers/ai-proxy/

AI Gateway (Python FastAPI)

The AI Gateway is the primary inference routing service, implemented as a FastAPI router at backend/python/app/api/ai_gateway_routes.py. It provides cost-optimized chat completion by routing requests to the appropriate model based on a 6-tier system.

Model Tiers

TierNameModelProviderInput Cost/MOutput Cost/M
T1FREELlama 4 ScoutWorkers AI$0.00$0.00
T2BUDGETGemini 2.0 FlashGoogle$0.10$0.40
T3STANDARDGemini 3 FlashGoogle$0.50$3.00
T4QUALITYClaude Haiku 4.5Anthropic$1.00$5.00
T5PREMIUMClaude Sonnet 4.5Anthropic$3.00$15.00
T6ENTERPRISEClaude Opus 4.5Anthropic$5.00$25.00

TTS Tiers

TierModelCost (per 1k chars)Use For
TTS-T0Cloudflare Workers AI (Deepgram Aura)FREEStandard voice output
TTS-T1ElevenLabs Turbo v2.5$0.18Fast, basic quality
TTS-T2ElevenLabs STS v2$0.24Speech-to-speech
TTS-T3ElevenLabs v3$0.30Premium, most advanced

Gateway Endpoints

EndpointMethodDescription
/ai/chatPOSTTier-routed chat completion
/ai/chat/directPOSTDirect model access by provider and model ID
/ai/ttsPOSTText-to-speech via ElevenLabs
/ai/modelsGETList available models with pricing
/ai/tiersGETGet tier definitions and routing strategy
/ai/usage/{tenant_id}GETUsage statistics per tenant
/ai/healthGETGateway and provider health status

Chat Completion Example

POST /ai/chat
{
"messages": [{"role": "user", "content": "Hello"}],
"tenant_id": "restaurant-1",
"tier": "t1",
"temperature": 0.7,
"max_tokens": 1024,
"stream": false,
"cache_enabled": true
}

Response:

{
"content": "Hello! How can I help you today?",
"model": "llama-4-scout",
"provider": "workers-ai",
"tier": "t1",
"usage": {
"prompt_tokens": 12,
"completion_tokens": 9,
"total_tokens": 21
},
"cached": false,
"latency_ms": 245,
"estimated_cost": 0.0
}

Routing Strategy

The gateway uses cost-optimized routing with automatic fallback. It selects the lowest-cost model capable of handling the request complexity:

  1. Client specifies a tier (T1-T6) in the request
  2. Gateway routes to the corresponding model and provider
  3. If the primary model is unavailable, it falls back to the next available tier
  4. Response includes the actual model used, latency, and estimated cost

AI Cost Analytics

The AI Cost Analytics service (backend/python/app/api/ai_cost_routes.py) provides comprehensive cost tracking, budget management, and optimization recommendations.

Cost Dashboard Endpoints

EndpointMethodDescription
/ai-cost/summaryGETCost summary with breakdowns by tier, agent, and tenant
/ai-cost/tokensGETToken usage analytics with efficiency metrics
/ai-cost/performanceGETModel performance metrics (latency, success rate, cache hit rate)
/ai-cost/routerGETRouter analytics (tier distribution, cost savings)
/ai-cost/optimizationsGETCost optimization recommendations

Budget Management

EndpointMethodDescription
/ai-cost/budgetsPOSTSet budget for an agent or tenant
/ai-cost/budgets/{entity_id}GETGet budget status with projections
/ai-cost/budgetsGETList all budget statuses

Budget features include:

  • Monthly and daily budget limits
  • Configurable alert thresholds (default: 80% of budget)
  • Auto-throttle thresholds (default: 95% of budget)
  • Projected monthly spend and days-until-exhaustion forecasts

Reporting and Export

EndpointMethodDescription
/ai-cost/reports/allocationGETCost allocation report for chargeback
/ai-cost/reports/chargeback/{tenant_id}GETDetailed tenant chargeback report
/ai-cost/reports/monthlyGETMonthly summary with period-over-period comparison
/ai-cost/export/csvGETExport usage data as CSV
/ai-cost/export/excelGETExport usage data as Excel

Usage Tracking

POST /ai-cost/track
{
"tenant_id": "restaurant-1",
"agent_id": "business_assistant",
"tier": "t2",
"input_tokens": 500,
"output_tokens": 200,
"latency_ms": 342.5,
"success": true,
"cached": false,
"routing_decision": "direct"
}

Model Configuration

Per-tenant model configuration allows controlling which tiers are enabled, setting default and fallback tiers, and defining rate limits:

EndpointMethodDescription
/ai-cost/config/{tenant_id}GETGet model configuration
/ai-cost/config/{tenant_id}PUTUpdate model configuration
/ai-cost/models/statusGETGet enabled/disabled status per tier
/ai-cost/models/{tier}/enabledPUTEnable or disable a specific tier
/ai-cost/models/bulk-enablePUTBulk enable/disable multiple tiers

AI Safety Controls

The AI Safety service (backend/python/app/api/ai_safety_routes.py) provides multiple layers of protection for AI operations.

Safety Check Types

CheckDescriptionEndpoint
Content ModerationDetect harmful content, PII, policy violationsPOST /ai-safety/analyze/content
Bias DetectionDetect demographic bias in AI responsesPOST /ai-safety/analyze/bias
Hallucination DetectionVerify AI responses against source documentsPOST /ai-safety/analyze/hallucination
Prompt Injection PreventionDetect jailbreak attempts and malicious patternsPOST /ai-safety/analyze/prompt
Comprehensive AnalysisRun all checks on input and/or output textPOST /ai-safety/analyze/comprehensive

Kill Switch

The kill switch provides emergency controls to halt AI operations at various scopes:

ScopeDescriptionExample
GlobalStops ALL AI operationsEmergency shutdown
TenantStops AI for a specific tenantTenant policy violation
AgentStops a specific agentAgent misbehavior
LocationStops AI at a specific locationLocation-specific issue
Model TierStops a specific model tierModel quality issue

Kill switch endpoints:

EndpointMethodDescription
/ai-safety/kill-switch/activatePOSTActivate at specified scope
/ai-safety/kill-switch/deactivatePOSTDeactivate at specified scope
/ai-safety/kill-switch/emergencyPOSTGlobal emergency shutdown
/ai-safety/kill-switch/checkPOSTCheck if operations are blocked
/ai-safety/kill-switch/statusGETGet all active kill switches

Incident Management

EndpointMethodDescription
/ai-safety/incidentsPOSTCreate a safety incident
/ai-safety/incidents/{id}GETGet incident details
/ai-safety/incidents/{id}PATCHUpdate incident (status, severity, remediation)
/ai-safety/incidentsGETList incidents with filtering
/ai-safety/incidents/statsGETGet incident statistics

Incidents track severity (low, medium, high, critical), status lifecycle, timeline of events, and optional kill switch activation.


ACP Server (Go GraphQL)

The ACP Server is a standalone Go service (backend/go/cmd/acp-server/main.go) that provides workspace management, session handling, and tool execution for AI agents via a GraphQL API with WebSocket subscriptions.

Key Capabilities

CapabilityDescription
Workspace ManagementRegister and index codebases for AI access
Session HandlingPersistent context across AI agent interactions
Tool RegistryUnified tool execution framework (file_read, file_write, bash, git, semantic_search)
RAG SearchVertex AI-powered semantic search over indexed workspaces
Real-Time UpdatesWebSocket-based streaming of tool output and session events
SecurityJWT authentication, role-based access, audit logging

Configuration

VariableDescriptionDefault
--portServer port8090
--workspaceDefault workspace directory--
--playgroundEnable GraphQL playgroundtrue
--spanner-projectGCP Project ID for SpannerSPANNER_PROJECT_ID
--vertex-projectGCP Project for Vertex AI RAGGOOGLE_CLOUD_PROJECT

The ACP Server integrates with Cloud Spanner for state persistence, Vertex AI for RAG search, and the AI Router for model selection.


Integration with AI Router

All ACP components use the AI Router for model selection. The router implements cost-optimized selection with automatic fallback:

Request with tier hint

v
┌───────────────────┐
│ AI Router │
│ │
│ 1. Select model │
│ by tier │
│ 2. Check provider │
│ health │
│ 3. Fallback if │
│ unavailable │
└────────┬──────────┘

┌─────┼─────────────────┐
│ │ │
v v v
Workers AI Anthropic Google
(T1 FREE) (T4-T6) (T2-T3)

Cost Optimization Approach

The system achieves 95%+ cost savings through:

  1. T1 FREE tier -- Simple queries, greetings, and basic FAQ use Llama 4 Scout at zero cost
  2. Tier matching -- Each task type routes to the lowest capable tier
  3. Response caching -- Repeated similar queries return cached results
  4. Budget controls -- Per-tenant and per-agent budgets prevent cost overruns
  5. Usage tracking -- Every request is tracked for cost analysis and optimization

Epic and Issue References

ComponentIssueDescription
ACP AI Router Epic#944AI Router and Orchestration
Smart Router#945Cost-optimized model selection
AI Gateway Integration#946Cloudflare AI Gateway
LangGraph Agent Orchestrator#947Python LangGraph agents
Vectorize RAG#948Cloudflare Vectorize for RAG
AI Cost Analytics#1114Cockpit AI cost dashboard
AI Safety Controls#1113Safety controls and incident management