Skip to main content

Operations Guide

This section covers deployment, monitoring, and operational procedures for the Olympus Cloud platform.

Infrastructure Overview

Services

Backend Services

ServiceTechnologyCloud RunPurpose
AuthRust (Axum)auth-serviceAuthentication
PlatformRust (GraphQL)platform-serviceMulti-tenancy
CommerceRust (Axum)commerce-serviceOrders, payments
GatewayGo (Chi)api-gatewayUnified routing
AI/MLPython (FastAPI)ai-ml-serviceML models, analytics

Edge Services

ServiceTechnologyWorkerPurpose
Main WorkerTypeScriptolympus-edgeAuth, routing, caching
AI GatewayTypeScriptai-gatewayModel routing
Restaurant AITypeScriptrestaurant-ai-chatSupport agents

Deployment

Cloud Run

All backend services deploy via Cloud Build:

# Deploy a service
gcloud run deploy auth-service \
--image gcr.io/olympuscloud/auth-service:latest \
--region us-central1 \
--platform managed

Cloudflare Workers

# Deploy worker
cd infrastructure/cloudflare-workers
wrangler deploy --env production

Monitoring

Key Metrics

  • Latency: P50, P95, P99 response times
  • Error Rate: 4xx, 5xx responses
  • Throughput: Requests per second
  • Saturation: CPU, memory utilization

Dashboards

Runbooks

Operational procedures and emergency response guides:

Emergency Response

Infrastructure Operations

  • Database Operations - Spanner operations, backup/restore, query optimization
  • Scaling - Auto-scaling, manual scaling, capacity planning
  • Deployment - Deployment procedures, rollback, blue-green deployments

Monitoring & Alerting

Specialized Systems

Environments

EnvironmentAPI URLPurpose
Productionapi.olympuscloud.aiLive traffic
Stagingstaging.api.olympuscloud.aiPre-production
Developmentdev.api.olympuscloud.aiDevelopment

On-Call

For production issues:

  1. Check status page
  2. Review Cloud Monitoring alerts
  3. Follow runbook for incident type
  4. Escalate if needed