Operations Guide

This section covers deployment, monitoring, and operational procedures for the Olympus Cloud platform.

Infrastructure Overview

Services

Backend Services

Service	Technology	Cloud Run	Purpose
Auth	Rust (Axum)	`auth-service`	Authentication
Platform	Rust (GraphQL)	`platform-service`	Multi-tenancy
Commerce	Rust (Axum)	`commerce-service`	Orders, payments
Gateway	Go (Chi)	`api-gateway`	Unified routing
AI/ML	Python (FastAPI)	`ai-ml-service`	ML models, analytics

Edge Services

Service	Technology	Worker	Purpose
Main Worker	TypeScript	`olympus-edge`	Auth, routing, caching
AI Gateway	TypeScript	`ai-gateway`	Model routing
Restaurant AI	TypeScript	`restaurant-ai-chat`	Support agents

Deployment

Cloud Run

All backend services deploy via Cloud Build:

# Deploy a service
gcloud run deploy auth-service \
  --image gcr.io/olympuscloud/auth-service:latest \
  --region us-central1 \
  --platform managed

Cloudflare Workers

# Deploy worker
cd infrastructure/cloudflare-workers
wrangler deploy --env production

Monitoring

Key Metrics

Latency: P50, P95, P99 response times
Error Rate: 4xx, 5xx responses
Throughput: Requests per second
Saturation: CPU, memory utilization

Dashboards

Runbooks

Operational procedures and emergency response guides:

Emergency Response

Incident Response - First response procedures, severity levels, escalation
Disaster Recovery - DR procedures, failover, data recovery
On-Call Guide - On-call responsibilities, shift handoffs, escalation paths

Infrastructure Operations

Database Operations - Spanner operations, backup/restore, query optimization
Scaling - Auto-scaling, manual scaling, capacity planning
Deployment - Deployment procedures, rollback, blue-green deployments

Monitoring & Alerting

Monitoring & Alerts - Alert handling, threshold tuning, false positive management
Production Security - Security incident response, access reviews, audit procedures

Specialized Systems

Vision AI Operations - Vision AI system operations, model updates, camera troubleshooting
Data Sync & IoT - Edge sync issues, IoT device management, OlympusEdge operations

Environments

Environment	API URL	Purpose
Production	`api.olympuscloud.ai`	Live traffic
Staging	`staging.api.olympuscloud.ai`	Pre-production
Development	`dev.api.olympuscloud.ai`	Development

On-Call

For production issues:

Check status page
Review Cloud Monitoring alerts
Follow runbook for incident type
Escalate if needed

Infrastructure Overview​

Services​

Backend Services​

Edge Services​

Deployment​

Cloud Run​

Cloudflare Workers​

Monitoring​

Key Metrics​

Dashboards​

Runbooks​

Emergency Response​

Infrastructure Operations​

Monitoring & Alerting​

Specialized Systems​

Environments​

On-Call​