# Cloud Run Deployment

Deploy and manage Olympus Cloud services on Google Cloud Run.

## Overview

Cloud Run hosts all core backend services:
| Service | Language | Port | CPU | Memory | Min Instances |
|---|---|---|---|---|---|
| API Gateway | Go | 8080 | 1 | 512Mi | 2 |
| Auth Service | Rust | 8001 | 1 | 512Mi | 1 |
| Platform Service | Rust | 8002 | 2 | 1Gi | 2 |
| Commerce Service | Rust | 8003 | 2 | 1Gi | 2 |
| Creator Service | Rust | 8004 | 1 | 512Mi | 1 |
| CMS Service | Rust | 8005 | 1 | 512Mi | 1 |
| Chat Service | Rust | 8007 | 1 | 512Mi | 1 |
| Alerting Service | Rust | 8080 | 1 | 512Mi | 1 |
| Distribution Service | Go | 8011 | 1 | 512Mi | 1 |
| ACP Server | Go | 8090 | 1 | 512Mi | 1 |
| IoT Service | Go (gRPC) | 50052 | 1 | 512Mi | 1 |
| Analytics Service | Python | 8004 | 2 | 2Gi | 1 |
| ML Service | Python | 8005 | 4 | 4Gi | 1 |
## Service Configuration

### Basic Deployment
```yaml
# service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: platform-service
  annotations:
    run.googleapis.com/ingress: internal-and-cloud-load-balancing
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "2"
        autoscaling.knative.dev/maxScale: "100"
        run.googleapis.com/cpu-throttling: "false"
        run.googleapis.com/startup-cpu-boost: "true"
    spec:
      containerConcurrency: 80
      timeoutSeconds: 300
      serviceAccountName: platform-service@olympuscloud-prod.iam.gserviceaccount.com
      containers:
        - image: gcr.io/olympuscloud-prod/platform-service:latest
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "2"
              memory: 1Gi
          env:
            - name: ENVIRONMENT
              value: production
            - name: SPANNER_INSTANCE
              value: olympus-prod
          startupProbe:
            httpGet:
              path: /health/startup
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 1
            failureThreshold: 30
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            periodSeconds: 5
```
### Deploy Command
```bash
# Deploy service
gcloud run deploy platform-service \
  --image gcr.io/olympuscloud-prod/platform-service:v1.2.3 \
  --region us-central1 \
  --platform managed \
  --allow-unauthenticated \
  --service-account platform-service@olympuscloud-prod.iam.gserviceaccount.com \
  --set-env-vars "ENVIRONMENT=production,SPANNER_INSTANCE=olympus-prod" \
  --min-instances 2 \
  --max-instances 100 \
  --cpu 2 \
  --memory 1Gi \
  --concurrency 80 \
  --timeout 300
```
## Auto-Scaling Configuration

### Scaling Parameters
| Parameter | Default | Production | Description |
|---|---|---|---|
| `minScale` | 0 | 2 | Minimum instances |
| `maxScale` | 100 | 100 | Maximum instances |
| `concurrency` | 80 | 80 | Requests per instance |
| `cpu-throttling` | true | false | CPU allocation |
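The `concurrency` setting drives how many instances a given load needs. As a rough capacity check, steady-state instance count can be estimated with Little's law; a minimal sketch (the traffic numbers below are illustrative, not measured from these services):

```python
import math

def estimate_instances(rps: float, avg_latency_s: float, concurrency: int) -> int:
    """Estimate steady-state Cloud Run instances: in-flight requests
    = arrival rate * average latency (Little's law), divided by the
    per-instance concurrency limit."""
    in_flight = rps * avg_latency_s
    return max(1, math.ceil(in_flight / concurrency))

# e.g. 2000 req/s at 200 ms average latency with concurrency 80:
# 2000 * 0.2 = 400 in-flight requests -> 400 / 80 = 5 instances
print(estimate_instances(2000, 0.2, 80))
```

Keep `maxScale` comfortably above this estimate so traffic spikes have headroom.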
### CPU Allocation

```yaml
# Always-on CPU (recommended for production)
annotations:
  run.googleapis.com/cpu-throttling: "false"
```
**Always-on CPU Benefits:**
- Consistent performance
- Background task processing
- WebSocket connections
- Lower cold start latency
### Scaling Behavior

```text
Requests → Load Balancer → Cloud Run
                               │
                               ├── Instance 1 (80 concurrent)
                               ├── Instance 2 (80 concurrent)
                               ├── Instance 3 (80 concurrent)
                               └── ... up to maxScale
```
## Traffic Management

### Blue-Green Deployments

**Best Practice:** Always deploy new revisions with `--no-traffic` first and test the tagged canary endpoint before shifting any production traffic. Start with a 10% traffic split, monitor error rates for at least 5 minutes, then proceed to full rollout. This approach catches most regressions before they reach the majority of users.
```bash
# Deploy new revision without traffic
gcloud run deploy platform-service \
  --image gcr.io/olympuscloud-prod/platform-service:v1.3.0 \
  --no-traffic \
  --tag canary

# Test canary endpoint
curl https://canary---platform-service-xyz.run.app/health

# Gradually shift traffic
gcloud run services update-traffic platform-service \
  --to-tags canary=10

# Full rollout
gcloud run services update-traffic platform-service \
  --to-latest

# Rollback if needed
gcloud run services update-traffic platform-service \
  --to-revisions platform-service-v1-2-3=100
```
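The gradual traffic shift can be scripted around these commands. A hypothetical sketch of the promote-or-rollback decision only (it assumes the canary error rate is fetched from Cloud Monitoring elsewhere, and the step sizes are our own choice, not a Cloud Run default):

```python
def next_traffic_step(current_pct, error_rate, threshold=0.01,
                      steps=(10, 25, 50, 100)):
    """Decide the next canary traffic percentage.
    Returns (action, pct): roll back to 0 if the error rate exceeds
    the threshold, otherwise promote to the next step, or hold at 100.
    """
    if error_rate > threshold:
        return ("rollback", 0)
    for step in steps:
        if step > current_pct:
            return ("promote", step)
    return ("hold", 100)

print(next_traffic_step(10, 0.002))  # healthy canary at 10% -> promote to 25%
print(next_traffic_step(25, 0.05))   # error rate above 1% -> roll back
```

Each `promote` result would map to a `gcloud run services update-traffic --to-tags canary=<pct>` call.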
### Traffic Splitting

```bash
# Split traffic between revisions
gcloud run services update-traffic platform-service \
  --to-revisions platform-service-v1-2-3=90,platform-service-v1-3-0=10
```
## Environment Variables

### Configuration
```yaml
env:
  # Application config
  - name: ENVIRONMENT
    value: production
  - name: LOG_LEVEL
    value: info
  - name: RUST_LOG
    value: info,tower_http=debug
  # Database
  - name: SPANNER_PROJECT
    value: olympuscloud-prod
  - name: SPANNER_INSTANCE
    value: olympus-prod
  - name: SPANNER_DATABASE
    value: olympus
  # Service discovery
  - name: COMMERCE_SERVICE_URL
    value: https://commerce-service-xyz.run.app
  - name: AI_SERVICE_URL
    value: https://ai-service-xyz.run.app
```
### Secrets
```yaml
env:
  - name: DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: database-password
        key: latest
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: api-key
        key: latest
```
```bash
# Create secret
echo -n "secret-value" | gcloud secrets create api-key --data-file=-

# Grant access
gcloud secrets add-iam-policy-binding api-key \
  --member serviceAccount:platform-service@olympuscloud-prod.iam.gserviceaccount.com \
  --role roles/secretmanager.secretAccessor
```
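Services can also read secrets at runtime instead of via env injection. The Secret Manager version resource name follows a fixed pattern; a minimal sketch of building it (the `google-cloud-secret-manager` client call is shown only as a comment, since it needs GCP credentials to run):

```python
def secret_version_name(project: str, secret: str, version: str = "latest") -> str:
    """Build the Secret Manager resource name that access_secret_version
    expects -- the same secret/version pair the secretKeyRef above maps to."""
    return f"projects/{project}/secrets/{secret}/versions/{version}"

# Runtime access would look roughly like:
#   from google.cloud import secretmanager
#   client = secretmanager.SecretManagerServiceClient()
#   name = secret_version_name("olympuscloud-prod", "api-key")
#   payload = client.access_secret_version(name=name).payload.data

print(secret_version_name("olympuscloud-prod", "api-key"))
```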
## Health Checks

### Probe Configuration
```rust
// src/health.rs
use axum::{
    extract::State,
    http::StatusCode,
    routing::get,
    Json, Router,
};
use serde::Serialize;

// AppState, check_database, and check_cache are defined elsewhere
// in the service.
use crate::{check_cache, check_database, AppState};

#[derive(Serialize)]
struct HealthResponse {
    status: String,
    version: String,
    checks: Vec<HealthCheck>,
}

#[derive(Serialize)]
struct HealthCheck {
    name: String,
    status: String,
    latency_ms: Option<u64>,
}

pub fn health_routes() -> Router<AppState> {
    Router::new()
        .route("/health/startup", get(startup_check))
        .route("/health/live", get(liveness_check))
        .route("/health/ready", get(readiness_check))
}

// Startup and liveness probes only confirm the process is serving.
async fn startup_check() -> &'static str {
    "OK"
}

async fn liveness_check() -> &'static str {
    "OK"
}

// Readiness verifies downstream dependencies, so an instance only
// receives traffic when it can actually serve it.
async fn readiness_check(
    State(state): State<AppState>,
) -> Result<Json<HealthResponse>, StatusCode> {
    let db_check = check_database(&state.db).await;
    let cache_check = check_cache(&state.cache).await;
    let all_healthy = db_check.status == "healthy"
        && cache_check.status == "healthy";

    let response = HealthResponse {
        status: (if all_healthy { "healthy" } else { "degraded" }).into(),
        version: env!("CARGO_PKG_VERSION").into(),
        checks: vec![db_check, cache_check],
    };

    if all_healthy {
        Ok(Json(response))
    } else {
        Err(StatusCode::SERVICE_UNAVAILABLE)
    }
}
```
## CI/CD Pipeline

### Cloud Build Configuration
```yaml
# cloudbuild.yaml
steps:
  # Build container
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'gcr.io/$PROJECT_ID/platform-service:$SHORT_SHA'
      - '-t'
      - 'gcr.io/$PROJECT_ID/platform-service:latest'
      - '-f'
      - 'services/platform/Dockerfile'
      - '.'

  # Push to registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', '--all-tags', 'gcr.io/$PROJECT_ID/platform-service']

  # Deploy to Cloud Run
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'deploy'
      - 'platform-service'
      - '--image'
      - 'gcr.io/$PROJECT_ID/platform-service:$SHORT_SHA'
      - '--region'
      - 'us-central1'
      - '--platform'
      - 'managed'

options:
  logging: CLOUD_LOGGING_ONLY

substitutions:
  _ENVIRONMENT: production
```
### GitHub Actions
```yaml
# .github/workflows/deploy.yml
name: Deploy to Cloud Run

on:
  push:
    branches: [main]
    paths:
      - 'services/platform/**'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - uses: google-github-actions/setup-gcloud@v2

      - name: Build and Push
        run: |
          # Pass a 7-char sha so image tags match Cloud Build's SHORT_SHA
          gcloud builds submit \
            --config cloudbuild.yaml \
            --substitutions SHORT_SHA=${GITHUB_SHA::7}
```
## Monitoring

### Key Metrics
| Metric | Alert Threshold | Description |
|---|---|---|
| `request_count` | - | Total requests |
| `request_latencies` | p99 > 500ms | Response time |
| `instance_count` | > 80% of max | Scaling headroom |
| `billable_instance_time` | Budget alerts | Cost tracking |
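The p99 alert threshold above is evaluated over raw latency samples. A minimal illustration of the nearest-rank percentile math behind such an alert (the sample values are made up):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile over raw latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [120, 95, 480, 210, 150, 90, 620, 130, 110, 140]
p99 = percentile(latencies_ms, 99)
print(p99, "alert!" if p99 > 500 else "ok")  # 620 alert!
```

In production this aggregation is done by Cloud Monitoring (`ALIGN_PERCENTILE_99` in the dashboard below), not by your own code; the sketch just shows why a single slow outlier can trip a p99 alert.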
### Cloud Monitoring Dashboard
```json
{
  "displayName": "Cloud Run Services",
  "gridLayout": {
    "widgets": [
      {
        "title": "Request Latency (p99)",
        "xyChart": {
          "dataSets": [{
            "timeSeriesQuery": {
              "timeSeriesFilter": {
                "filter": "resource.type=\"cloud_run_revision\" AND metric.type=\"run.googleapis.com/request_latencies\"",
                "aggregation": {
                  "perSeriesAligner": "ALIGN_PERCENTILE_99"
                }
              }
            }
          }]
        }
      }
    ]
  }
}
```
## Troubleshooting

### Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Cold start latency | Min instances = 0 | Set minScale: 2 |
| Request timeouts | Long operations | Increase timeout, use async |
| OOM errors | Memory limit | Increase memory allocation |
| Connection limits | Too many DB connections | Use connection pooling |
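The connection-limit row is worth a worked number: each Cloud Run instance keeps its own pool, so worst-case database connections scale with `maxScale`. A quick sanity check, assuming a hypothetical per-instance pool size of 10:

```python
def max_db_connections(max_instances: int, pool_size: int) -> int:
    """Worst-case connections if every instance fills its pool."""
    return max_instances * pool_size

# 100 instances * 10 connections each = 1000 -- compare against the
# database's connection limit before raising maxScale or pool size.
print(max_db_connections(100, 10))
```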
### Debug Commands
```bash
# View logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=platform-service" \
  --limit 100 \
  --format "table(timestamp,textPayload)"

# List revisions
gcloud run revisions list --service platform-service

# Describe service
gcloud run services describe platform-service --format yaml

# View current traffic
gcloud run services describe platform-service \
  --format "value(status.traffic)"
```
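When building tooling around these services, the log filter in the first command can be composed programmatically; a small sketch (the helper name is ours, not a gcloud or Cloud Logging API):

```python
def run_log_filter(service, severity=None):
    """Compose a Cloud Logging filter string for a Cloud Run service,
    matching the shape used in the `gcloud logging read` example above."""
    parts = [
        "resource.type=cloud_run_revision",
        f"resource.labels.service_name={service}",
    ]
    if severity:
        parts.append(f"severity>={severity}")
    return " AND ".join(parts)

print(run_log_filter("platform-service", "ERROR"))
```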
## Related Documentation
- Environments - Environment configuration
- Edge Workers - Cloudflare deployment
- Metrics - Monitoring setup