ACP Foundation

The AI Cost-optimized Platform (ACP) is the foundation layer for all AI operations in Olympus Cloud. It provides tiered model routing, cost tracking, safety controls, and agent orchestration across multiple backend services.

Overview

ACP Foundation spans three service layers:

AI Gateway (Python FastAPI) -- Routes AI inference requests to 6 model tiers from FREE to enterprise
AI Cost Analytics (Python FastAPI) -- Tracks usage, manages budgets, and provides cost optimization recommendations
AI Safety Controls (Python FastAPI) -- Content moderation, PII detection, bias monitoring, hallucination detection, prompt injection prevention, kill switch, and incident management
ACP Server (Go GraphQL) -- Workspace management, session handling, tool execution, and RAG search for AI agents
AI Proxy (Cloudflare Worker) -- OpenAI-compatible edge proxy for low-latency inference

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                  Flutter Shells / API Clients                     │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│                 Cloudflare Edge Layer                             │
│              ai-proxy Worker (OpenAI-compatible)                 │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│                   Go API Gateway (port 8080)                     │
│                GraphQL / REST routing                            │
└────────┬───────────────┬────────────────────────┬───────────────┘
         │               │                        │
┌────────▼────────┐ ┌────▼────────────┐ ┌────────▼────────────┐
│  Python         │ │  Python         │ │  Go ACP Server      │
│  Analytics 8004 │ │  ML 8005        │ │  (port 8090)        │
│                 │ │                 │ │                     │
│  AI Gateway     │ │  LangGraph      │ │  GraphQL + WS       │
│  AI Cost        │ │  Agents         │ │  Workspace Mgmt     │
│  AI Safety      │ │                 │ │  Session Handling    │
│  Conversational │ │                 │ │  Tool Registry       │
│  Analytics      │ │                 │ │  RAG Search          │
└─────────────────┘ └─────────────────┘ └─────────────────────┘
         │                   │                    │
         └───────────────────┴────────────────────┘
                             │
              ┌──────────────┴──────────────┐
              │   ACP AI Router (T1-T6)      │
              │   Model Selection + Fallback │
              └──────────────────────────────┘
                             │
         ┌───────────┬───────┴───────┬───────────┐
         │           │               │           │
    Workers AI   Anthropic       Google       OpenAI
    (Llama 4)   (Claude 4.5)   (Gemini)     (GPT-4o)

Service Locations

Component	Service	Port	Location
AI Gateway	Python Analytics	8004	`backend/python/app/api/ai_gateway_routes.py`
AI Cost Analytics	Python Analytics	8004	`backend/python/app/api/ai_cost_routes.py`
AI Safety Controls	Python Analytics	8004	`backend/python/app/api/ai_safety_routes.py`
LangGraph Agents	Python ML	8005	`backend/python/app/api/agent_routes.py`
ACP Server	Go ACP Server	8090	`backend/go/cmd/acp-server/main.go`
AI Proxy	Cloudflare Worker	--	`workers/ai-proxy/`

AI Gateway (Python FastAPI)

The AI Gateway is the primary inference routing service, implemented as a FastAPI router at backend/python/app/api/ai_gateway_routes.py. It provides cost-optimized chat completion by routing requests to the appropriate model based on a 6-tier system.

Model Tiers

Tier	Name	Model	Provider	Input Cost/M	Output Cost/M
T1	FREE	Llama 4 Scout	Workers AI	$0.00	$0.00
T2	BUDGET	Gemini 2.0 Flash	Google	$0.10	$0.40
T3	STANDARD	Gemini 3 Flash	Google	$0.50	$3.00
T4	QUALITY	Claude Haiku 4.5	Anthropic	$1.00	$5.00
T5	PREMIUM	Claude Sonnet 4.5	Anthropic	$3.00	$15.00
T6	ENTERPRISE	Claude Opus 4.5	Anthropic	$5.00	$25.00

TTS Tiers

Tier	Model	Cost (per 1k chars)	Use For
TTS-T0	Cloudflare Workers AI (Deepgram Aura)	FREE	Standard voice output
TTS-T1	ElevenLabs Turbo v2.5	$0.18	Fast, basic quality
TTS-T2	ElevenLabs STS v2	$0.24	Speech-to-speech
TTS-T3	ElevenLabs v3	$0.30	Premium, most advanced

Gateway Endpoints

Endpoint	Method	Description
`/ai/chat`	POST	Tier-routed chat completion
`/ai/chat/direct`	POST	Direct model access by provider and model ID
`/ai/tts`	POST	Text-to-speech via ElevenLabs
`/ai/models`	GET	List available models with pricing
`/ai/tiers`	GET	Get tier definitions and routing strategy
`/ai/usage/{tenant_id}`	GET	Usage statistics per tenant
`/ai/health`	GET	Gateway and provider health status

Chat Completion Example

POST /ai/chat
{
  "messages": [{"role": "user", "content": "Hello"}],
  "tenant_id": "restaurant-1",
  "tier": "t1",
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false,
  "cache_enabled": true
}

Response:

{
  "content": "Hello! How can I help you today?",
  "model": "llama-4-scout",
  "provider": "workers-ai",
  "tier": "t1",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  },
  "cached": false,
  "latency_ms": 245,
  "estimated_cost": 0.0
}

Routing Strategy

The gateway uses cost-optimized routing with automatic fallback. It selects the lowest-cost model capable of handling the request complexity:

Client specifies a tier (T1-T6) in the request
Gateway routes to the corresponding model and provider
If the primary model is unavailable, it falls back to the next available tier
Response includes the actual model used, latency, and estimated cost

AI Cost Analytics

The AI Cost Analytics service (backend/python/app/api/ai_cost_routes.py) provides comprehensive cost tracking, budget management, and optimization recommendations.

Cost Dashboard Endpoints

Endpoint	Method	Description
`/ai-cost/summary`	GET	Cost summary with breakdowns by tier, agent, and tenant
`/ai-cost/tokens`	GET	Token usage analytics with efficiency metrics
`/ai-cost/performance`	GET	Model performance metrics (latency, success rate, cache hit rate)
`/ai-cost/router`	GET	Router analytics (tier distribution, cost savings)
`/ai-cost/optimizations`	GET	Cost optimization recommendations

Budget Management

Endpoint	Method	Description
`/ai-cost/budgets`	POST	Set budget for an agent or tenant
`/ai-cost/budgets/{entity_id}`	GET	Get budget status with projections
`/ai-cost/budgets`	GET	List all budget statuses

Budget features include:

Monthly and daily budget limits
Configurable alert thresholds (default: 80% of budget)
Auto-throttle thresholds (default: 95% of budget)
Projected monthly spend and days-until-exhaustion forecasts

Reporting and Export

Endpoint	Method	Description
`/ai-cost/reports/allocation`	GET	Cost allocation report for chargeback
`/ai-cost/reports/chargeback/{tenant_id}`	GET	Detailed tenant chargeback report
`/ai-cost/reports/monthly`	GET	Monthly summary with period-over-period comparison
`/ai-cost/export/csv`	GET	Export usage data as CSV
`/ai-cost/export/excel`	GET	Export usage data as Excel

Usage Tracking

POST /ai-cost/track
{
  "tenant_id": "restaurant-1",
  "agent_id": "business_assistant",
  "tier": "t2",
  "input_tokens": 500,
  "output_tokens": 200,
  "latency_ms": 342.5,
  "success": true,
  "cached": false,
  "routing_decision": "direct"
}

Model Configuration

Per-tenant model configuration allows controlling which tiers are enabled, setting default and fallback tiers, and defining rate limits:

Endpoint	Method	Description
`/ai-cost/config/{tenant_id}`	GET	Get model configuration
`/ai-cost/config/{tenant_id}`	PUT	Update model configuration
`/ai-cost/models/status`	GET	Get enabled/disabled status per tier
`/ai-cost/models/{tier}/enabled`	PUT	Enable or disable a specific tier
`/ai-cost/models/bulk-enable`	PUT	Bulk enable/disable multiple tiers

AI Safety Controls

The AI Safety service (backend/python/app/api/ai_safety_routes.py) provides multiple layers of protection for AI operations.

Safety Check Types

Check	Description	Endpoint
Content Moderation	Detect harmful content, PII, policy violations	`POST /ai-safety/analyze/content`
Bias Detection	Detect demographic bias in AI responses	`POST /ai-safety/analyze/bias`
Hallucination Detection	Verify AI responses against source documents	`POST /ai-safety/analyze/hallucination`
Prompt Injection Prevention	Detect jailbreak attempts and malicious patterns	`POST /ai-safety/analyze/prompt`
Comprehensive Analysis	Run all checks on input and/or output text	`POST /ai-safety/analyze/comprehensive`

Kill Switch

The kill switch provides emergency controls to halt AI operations at various scopes:

Scope	Description	Example
Global	Stops ALL AI operations	Emergency shutdown
Tenant	Stops AI for a specific tenant	Tenant policy violation
Agent	Stops a specific agent	Agent misbehavior
Location	Stops AI at a specific location	Location-specific issue
Model Tier	Stops a specific model tier	Model quality issue

Kill switch endpoints:

Endpoint	Method	Description
`/ai-safety/kill-switch/activate`	POST	Activate at specified scope
`/ai-safety/kill-switch/deactivate`	POST	Deactivate at specified scope
`/ai-safety/kill-switch/emergency`	POST	Global emergency shutdown
`/ai-safety/kill-switch/check`	POST	Check if operations are blocked
`/ai-safety/kill-switch/status`	GET	Get all active kill switches

Incident Management

Endpoint	Method	Description
`/ai-safety/incidents`	POST	Create a safety incident
`/ai-safety/incidents/{id}`	GET	Get incident details
`/ai-safety/incidents/{id}`	PATCH	Update incident (status, severity, remediation)
`/ai-safety/incidents`	GET	List incidents with filtering
`/ai-safety/incidents/stats`	GET	Get incident statistics

Incidents track severity (low, medium, high, critical), status lifecycle, timeline of events, and optional kill switch activation.

ACP Server (Go GraphQL)

The ACP Server is a standalone Go service (backend/go/cmd/acp-server/main.go) that provides workspace management, session handling, and tool execution for AI agents via a GraphQL API with WebSocket subscriptions.

Key Capabilities

Capability	Description
Workspace Management	Register and index codebases for AI access
Session Handling	Persistent context across AI agent interactions
Tool Registry	Unified tool execution framework (file_read, file_write, bash, git, semantic_search)
RAG Search	Vertex AI-powered semantic search over indexed workspaces
Real-Time Updates	WebSocket-based streaming of tool output and session events
Security	JWT authentication, role-based access, audit logging

Configuration

Variable	Description	Default
`--port`	Server port	8090
`--workspace`	Default workspace directory	--
`--playground`	Enable GraphQL playground	true
`--spanner-project`	GCP Project ID for Spanner	`SPANNER_PROJECT_ID`
`--vertex-project`	GCP Project for Vertex AI RAG	`GOOGLE_CLOUD_PROJECT`

The ACP Server integrates with Cloud Spanner for state persistence, Vertex AI for RAG search, and the AI Router for model selection.

Integration with AI Router

All ACP components use the AI Router for model selection. The router implements cost-optimized selection with automatic fallback:

Request with tier hint
        │
        v
┌───────────────────┐
│    AI Router      │
│                   │
│ 1. Select model   │
│    by tier        │
│ 2. Check provider │
│    health         │
│ 3. Fallback if    │
│    unavailable    │
└────────┬──────────┘
         │
   ┌─────┼─────────────────┐
   │     │                 │
   v     v                 v
Workers AI  Anthropic    Google
(T1 FREE)   (T4-T6)    (T2-T3)

Cost Optimization Approach

The system achieves 95%+ cost savings through:

T1 FREE tier -- Simple queries, greetings, and basic FAQ use Llama 4 Scout at zero cost
Tier matching -- Each task type routes to the lowest capable tier
Response caching -- Repeated similar queries return cached results
Budget controls -- Per-tenant and per-agent budgets prevent cost overruns
Usage tracking -- Every request is tracked for cost analysis and optimization

Epic and Issue References

Component	Issue	Description
ACP AI Router Epic	#944	AI Router and Orchestration
Smart Router	#945	Cost-optimized model selection
AI Gateway Integration	#946	Cloudflare AI Gateway
LangGraph Agent Orchestrator	#947	Python LangGraph agents
Vectorize RAG	#948	Cloudflare Vectorize for RAG
AI Cost Analytics	#1114	Cockpit AI cost dashboard
AI Safety Controls	#1113	Safety controls and incident management

ACP AI Router -- Model routing and cost optimization details
LangGraph Agent Workflows -- Stateful agent architecture
Cockpit Operations -- ACP monitoring in the Cockpit UI

Overview​

Architecture​

Service Locations​

AI Gateway (Python FastAPI)​

Model Tiers​

TTS Tiers​

Gateway Endpoints​

Chat Completion Example​

Routing Strategy​

AI Cost Analytics​

Cost Dashboard Endpoints​

Budget Management​

Reporting and Export​

Usage Tracking​

Model Configuration​

AI Safety Controls​

Safety Check Types​

Kill Switch​

Incident Management​

ACP Server (Go GraphQL)​

Key Capabilities​

Configuration​

Integration with AI Router​

Cost Optimization Approach​

Epic and Issue References​

Related Documentation​