Admin API

These endpoints require an admin-level role (platform_admin, tenant_admin, or system_admin) and are accessible via the API gateway at /v1/platform/*.

Alerting & On-Call API

A custom alerting and on-call management platform that replaces PagerDuty, built on an edge-first architecture with AIOps capabilities.

Overview

Attribute	Value
Base Path	`/api/v1/alerting`
Authentication	Bearer Token
Required Roles	`platform_admin`, `system_admin`, `super_admin`, `support_agent`, `tenant_admin`, `restaurant_manager`, `manager`

Key Features

Edge-First Architecture - Primary on Cloudflare Workers, GCP Cloud Run failover
Multi-Source Ingestion - Prometheus, Cloud Monitoring, Sentry, custom webhooks
AIOps Engine - Anomaly detection, alert correlation, predictive alerting
On-Call Management - Rotations, overrides, escalation policies
Multi-Channel Notifications - SMS, voice, email, push, Slack, Teams

Alert Ingestion

Create Alert

POST /api/v1/alerting/alerts

Create a new alert from any source.

Request Body

{
  "source": "prometheus",
  "title": "High CPU Usage on api-gateway-prod",
  "description": "CPU usage exceeded 90% for 5 minutes",
  "severity": "P2",
  "service": "api-gateway",
  "environment": "production",
  "labels": {
    "pod": "api-gateway-prod-7b8f9",
    "region": "us-central1"
  },
  "annotations": {
    "runbook_url": "https://docs.olympuscloud.ai/runbooks/high-cpu",
    "dashboard_url": "https://grafana.olympuscloud.ai/d/cpu"
  },
  "fingerprint": "hash_abc123"
}

Severity Levels

Level	Description	Default Timeout
`P1`	Critical - Immediate attention	2 minutes
`P2`	High - Urgent response needed	5 minutes
`P3`	Medium - Normal priority	15 minutes
`P4`	Low - Can wait	1 hour
`P5`	Info - Notification only	N/A
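
The severity table can be mirrored client-side to validate a payload before sending it. The following is a minimal sketch: the names `DEFAULT_TIMEOUTS` and `validate_severity` are hypothetical and are not part of the API.

```python
from datetime import timedelta
from typing import Optional

# Default timeouts copied from the Severity Levels table; P5 has none (N/A).
DEFAULT_TIMEOUTS: dict[str, Optional[timedelta]] = {
    "P1": timedelta(minutes=2),
    "P2": timedelta(minutes=5),
    "P3": timedelta(minutes=15),
    "P4": timedelta(hours=1),
    "P5": None,
}


def validate_severity(severity: str) -> str:
    """Return the severity unchanged, or raise if it is not P1 through P5."""
    if severity not in DEFAULT_TIMEOUTS:
        raise ValueError(f"invalid severity: {severity!r}")
    return severity
```

Checking locally avoids a round trip that would otherwise end in a 400 `invalid_severity` error.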

Response

{
  "alert_id": "alert_001",
  "status": "firing",
  "severity": "P2",
  "title": "High CPU Usage on api-gateway-prod",
  "service": "api-gateway",
  "assigned_to": {
    "user_id": "usr_oncall_001",
    "name": "John Smith",
    "notification_sent": true
  },
  "escalation_policy_id": "esc_api_team",
  "aiops": {
    "correlated_alerts": ["alert_002", "alert_003"],
    "root_cause_probability": 0.85,
    "suggested_action": "Scale api-gateway deployment"
  },
  "created_at": "2026-01-24T21:00:00Z"
}
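
As an illustration of calling this endpoint, the sketch below builds the request with Python's standard library but does not send it. The host `https://api.example.com`, the token placeholder, and the `Content-Type` header are assumptions. The trimmed payload is for brevity only, since the source does not say which fields are required.

```python
import json
import urllib.request

BASE_PATH = "/api/v1/alerting"  # base path from the Overview table


def build_create_alert_request(host: str, token: str, alert: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Create Alert endpoint."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{BASE_PATH}/alerts",
        data=json.dumps(alert).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",   # Bearer Token authentication
            "Content-Type": "application/json",   # assumed; not stated in the source
        },
    )


request = build_create_alert_request(
    "https://api.example.com",  # hypothetical host
    "YOUR_TOKEN",               # placeholder bearer token
    {
        "source": "prometheus",
        "title": "High CPU Usage on api-gateway-prod",
        "severity": "P2",
        "service": "api-gateway",
        "environment": "production",
    },
)
# To send: urllib.request.urlopen(request), then json.load() the response body.
```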

Prometheus AlertManager Webhook

POST /api/v1/alerting/webhooks/prometheus

Receives alerts in the native Prometheus AlertManager webhook format.

Request Body (AlertManager format)

{
  "version": "4",
  "groupKey": "...",
  "status": "firing",
  "receiver": "olympus-alerting",
  "groupLabels": {},
  "commonLabels": {
    "alertname": "HighCPU",
    "severity": "critical"
  },
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighCPU",
        "instance": "api-gateway:9090"
      },
      "annotations": {
        "summary": "High CPU on api-gateway"
      },
      "startsAt": "2026-01-24T21:00:00Z"
    }
  ]
}
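
How the service processes this payload is not documented here. For inspecting or testing a payload on the sending side, the sketch below flattens a grouped AlertManager message into one record per alert, merging `commonLabels` into each alert's own labels. The function name and the output field names are hypothetical.

```python
def flatten_alertmanager_payload(payload: dict) -> list[dict]:
    """Return one record per alert, with group-level commonLabels merged in."""
    common = payload.get("commonLabels", {})
    records = []
    for alert in payload.get("alerts", []):
        # Per-alert labels take precedence over the group's common labels.
        labels = {**common, **alert.get("labels", {})}
        records.append({
            "status": alert.get("status", payload.get("status")),
            "labels": labels,
            "summary": alert.get("annotations", {}).get("summary"),
            "starts_at": alert.get("startsAt"),
        })
    return records


records = flatten_alertmanager_payload({
    "status": "firing",
    "commonLabels": {"alertname": "HighCPU", "severity": "critical"},
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "HighCPU", "instance": "api-gateway:9090"},
        "annotations": {"summary": "High CPU on api-gateway"},
        "startsAt": "2026-01-24T21:00:00Z",
    }],
})
```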

Cloud Monitoring Webhook

POST /api/v1/alerting/webhooks/gcp

Google Cloud Monitoring integration.

Sentry Webhook

POST /api/v1/alerting/webhooks/sentry

Sentry error tracking integration.

Alert Management

List Alerts

GET /api/v1/alerting/alerts

Query Parameters

Parameter	Type	Description
`status`	string	`firing`, `acknowledged`, `resolved`
`severity`	string	`P1`, `P2`, `P3`, `P4`, `P5`
`service`	string	Filter by service name
`assigned_to`	uuid	Filter by assignee
`since`	datetime	Alerts since timestamp

Response

{
  "alerts": [
    {
      "alert_id": "alert_001",
      "status": "firing",
      "severity": "P2",
      "title": "High CPU Usage",
      "service": "api-gateway",
      "created_at": "2026-01-24T21:00:00Z",
      "acknowledged_at": null,
      "assigned_to": "usr_oncall_001"
    }
  ],
  "total": 15,
  "firing": 3,
  "acknowledged": 5,
  "resolved": 7
}
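
The query parameters combine into an ordinary query string. A small sketch of building the URL, assuming a hypothetical host and a helper name (`list_alerts_url`) that is not part of the API:

```python
from urllib.parse import urlencode

# Filter names taken from the Query Parameters table.
ALLOWED_FILTERS = {"status", "severity", "service", "assigned_to", "since"}


def list_alerts_url(host: str, **filters: str) -> str:
    """Build the List Alerts URL, rejecting filters the table does not list."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    url = f"{host.rstrip('/')}/api/v1/alerting/alerts"
    query = urlencode(filters)  # percent-encodes values such as timestamps
    return f"{url}?{query}" if query else url


url = list_alerts_url("https://api.example.com", status="firing", severity="P1")
```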

Get Alert

GET /api/v1/alerting/alerts/{alert_id}

Acknowledge Alert

POST /api/v1/alerting/alerts/{alert_id}/acknowledge

Request Body

{
  "message": "Investigating the issue",
  "snooze_minutes": 30
}

Response

{
  "alert_id": "alert_001",
  "status": "acknowledged",
  "acknowledged_by": "usr_001",
  "acknowledged_at": "2026-01-24T21:05:00Z",
  "snooze_until": "2026-01-24T21:35:00Z"
}
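
In the sample above, `snooze_until` is `acknowledged_at` plus `snooze_minutes`. The arithmetic can be reproduced as follows. This only illustrates the relationship between the fields in the example and makes no claim about the server's implementation.

```python
from datetime import datetime, timedelta, timezone


def snooze_until(acknowledged_at: str, snooze_minutes: int) -> str:
    """Add the snooze duration to an ISO 8601 timestamp and return it in UTC."""
    # Older Python versions reject a trailing "Z" in fromisoformat, so normalize it.
    acked = datetime.fromisoformat(acknowledged_at.replace("Z", "+00:00"))
    until = (acked + timedelta(minutes=snooze_minutes)).astimezone(timezone.utc)
    return until.strftime("%Y-%m-%dT%H:%M:%SZ")


result = snooze_until("2026-01-24T21:05:00Z", 30)
```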

Resolve Alert

POST /api/v1/alerting/alerts/{alert_id}/resolve

Request Body

{
  "resolution_notes": "Scaled deployment to 5 replicas",
  "root_cause": "Traffic spike from marketing campaign"
}

Escalate Alert

POST /api/v1/alerting/alerts/{alert_id}/escalate

Request Body

{
  "reason": "Unable to access production systems",
  "skip_levels": 1
}

On-Call Schedules

Create Schedule

POST /api/v1/alerting/schedules

Request Body

{
  "name": "API Team Primary",
  "description": "Primary on-call for API services",
  "timezone": "America/New_York",
  "rotation_type": "weekly",
  "handoff_time": "09:00",
  "handoff_day": "monday",
  "participants": [
    {"user_id": "usr_001", "order": 1},
    {"user_id": "usr_002", "order": 2},
    {"user_id": "usr_003", "order": 3},
    {"user_id": "usr_004", "order": 4}
  ],
  "layers": [
    {
      "name": "primary",
      "priority": 1
    },
    {
      "name": "secondary",
      "priority": 2,
      "participants": ["usr_005", "usr_006"]
    }
  ]
}

Rotation Types

Type	Description
`weekly`	Rotate every week
`daily`	Rotate every day
`custom`	Custom rotation period

Response

{
  "schedule_id": "sch_api_primary",
  "name": "API Team Primary",
  "current_oncall": {
    "user_id": "usr_001",
    "name": "John Smith",
    "until": "2026-01-27T09:00:00-05:00"
  },
  "next_oncall": {
    "user_id": "usr_002",
    "name": "Jane Doe",
    "starts": "2026-01-27T09:00:00-05:00"
  },
  "created_at": "2026-01-24T21:00:00Z"
}
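
To show how a weekly rotation could map onto the `current_oncall` and `next_oncall` values above, here is a sketch that picks a participant from the number of whole shifts elapsed. The rotation start (one week before the `until` value in the sample) and the round-robin rule are assumptions. The service may compute shifts differently, for example around daylight-saving changes, which this fixed-offset arithmetic ignores.

```python
from datetime import datetime, timedelta


def oncall_at(participants, rotation_start, moment, shift_length=timedelta(weeks=1)):
    """Return the user_id on call at `moment` under a simple round-robin rotation."""
    if moment < rotation_start:
        raise ValueError("moment precedes the start of the rotation")
    ordered = sorted(participants, key=lambda p: p["order"])
    shifts_elapsed = (moment - rotation_start) // shift_length  # whole shifts, as int
    return ordered[shifts_elapsed % len(ordered)]["user_id"]


participants = [
    {"user_id": "usr_001", "order": 1},
    {"user_id": "usr_002", "order": 2},
    {"user_id": "usr_003", "order": 3},
    {"user_id": "usr_004", "order": 4},
]
# Assumed start of the first shift: one week before the sample's "until" value.
start = datetime.fromisoformat("2026-01-20T09:00:00-05:00")
current = oncall_at(participants, start, datetime.fromisoformat("2026-01-24T21:00:00+00:00"))
```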

List Schedules

GET /api/v1/alerting/schedules

Get Schedule

GET /api/v1/alerting/schedules/{schedule_id}

Get Current On-Call

GET /api/v1/alerting/schedules/{schedule_id}/oncall

Response

{
  "schedule_id": "sch_api_primary",
  "current": {
    "layer": "primary",
    "user_id": "usr_001",
    "name": "John Smith",
    "email": "john@nebusai.com",
    "phone": "+1234567890",
    "since": "2026-01-20T09:00:00-05:00",
    "until": "2026-01-27T09:00:00-05:00"
  },
  "secondary": {
    "user_id": "usr_005",
    "name": "Mike Johnson"
  }
}

Create Override

POST /api/v1/alerting/schedules/{schedule_id}/overrides

Request Body

{
  "user_id": "usr_003",
  "start": "2026-01-25T09:00:00-05:00",
  "end": "2026-01-26T09:00:00-05:00",
  "reason": "Covering for John's PTO"
}
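
An override temporarily replaces whoever the rotation would otherwise select. A minimal sketch of that lookup, assuming the `start` bound is inclusive and the `end` bound exclusive (the source does not specify this) and using a hypothetical function name:

```python
from datetime import datetime


def effective_oncall(scheduled_user: str, overrides: list[dict], moment: datetime) -> str:
    """Return the overriding user if an override covers `moment`, else the scheduled user."""
    for override in overrides:
        start = datetime.fromisoformat(override["start"])
        end = datetime.fromisoformat(override["end"])
        if start <= moment < end:  # assumed half-open interval
            return override["user_id"]
    return scheduled_user


overrides = [{
    "user_id": "usr_003",
    "start": "2026-01-25T09:00:00-05:00",
    "end": "2026-01-26T09:00:00-05:00",
}]
during = effective_oncall("usr_001", overrides, datetime.fromisoformat("2026-01-25T12:00:00-05:00"))
```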

Escalation Policies

Create Escalation Policy

POST /api/v1/alerting/escalation-policies

Request Body

{
  "name": "API Services Escalation",
  "description": "Escalation policy for all API services",
  "services": ["api-gateway", "auth-service", "commerce-service"],
  "levels": [
    {
      "level": 1,
      "timeout_minutes": 5,
      "targets": [
        {"type": "schedule", "id": "sch_api_primary"}
      ],
      "notification_channels": ["sms", "push"]
    },
    {
      "level": 2,
      "timeout_minutes": 10,
      "targets": [
        {"type": "schedule", "id": "sch_api_secondary"},
        {"type": "user", "id": "usr_manager_001"}
      ],
      "notification_channels": ["sms", "voice", "email"]
    },
    {
      "level": 3,
      "timeout_minutes": 15,
      "targets": [
        {"type": "user", "id": "usr_director_001"},
        {"type": "user", "id": "usr_cto"}
      ],
      "notification_channels": ["voice", "sms", "slack"]
    }
  ],
  "repeat_policy": {
    "enabled": true,
    "repeat_after_minutes": 30,
    "max_repeats": 3
  }
}

Response

{
  "policy_id": "esc_api_services",
  "name": "API Services Escalation",
  "services": ["api-gateway", "auth-service", "commerce-service"],
  "levels": 3,
  "created_at": "2026-01-24T21:00:00Z"
}
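
To reason about when each level would be paged, the sketch below accumulates `timeout_minutes` across levels. It assumes each level waits its own timeout without acknowledgment before the next level is notified, which the source does not state explicitly. The function name is hypothetical.

```python
def escalation_offsets(levels: list[dict]) -> dict[int, int]:
    """Map each level number to the minutes after alert creation when it is first notified."""
    offsets = {}
    elapsed = 0
    for level in sorted(levels, key=lambda item: item["level"]):
        offsets[level["level"]] = elapsed
        elapsed += level["timeout_minutes"]  # wait this long before escalating further
    return offsets


offsets = escalation_offsets([
    {"level": 1, "timeout_minutes": 5},
    {"level": 2, "timeout_minutes": 10},
    {"level": 3, "timeout_minutes": 15},
])
```

Under that assumption, the policy above would notify level 1 immediately, level 2 after 5 minutes, and level 3 after 15 minutes.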

List Escalation Policies

GET /api/v1/alerting/escalation-policies

Get Escalation Policy

GET /api/v1/alerting/escalation-policies/{policy_id}

Notification Channels

Available Channels

Channel	Provider	Configuration
`sms`	Twilio	Phone number
`voice`	Twilio	Phone number + TTS message
`email`	SendGrid	Email address
`slack`	Slack API	Channel webhook URL
`teams`	MS Teams	Channel webhook URL
`push`	FCM/APNs	Device token
`in_app`	WebSocket	Real-time Cockpit

Configure User Notifications

PUT /api/v1/alerting/users/{user_id}/notifications

Request Body

{
  "channels": {
    "sms": {
      "enabled": true,
      "phone": "+1234567890"
    },
    "voice": {
      "enabled": true,
      "phone": "+1234567890",
      "voice": "en-US-Neural2-F"
    },
    "email": {
      "enabled": true,
      "address": "john@nebusai.com"
    },
    "slack": {
      "enabled": true,
      "user_id": "U12345678"
    },
    "push": {
      "enabled": true,
      "devices": ["device_token_1", "device_token_2"]
    }
  },
  "quiet_hours": {
    "enabled": true,
    "start": "22:00",
    "end": "07:00",
    "timezone": "America/New_York",
    "bypass_severity": ["P1"]
  }
}
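
The `quiet_hours` window in this example spans midnight (22:00 to 07:00), which is easy to get wrong when checking it by hand. The sketch below shows one way to evaluate it, assuming the caller has already converted the current time to the configured timezone and that severities in `bypass_severity` are always delivered. The function name is hypothetical.

```python
from datetime import time


def should_notify(severity: str, local_time: time, quiet_hours: dict) -> bool:
    """Return False when a notification falls inside quiet hours and is not bypassed."""
    if not quiet_hours.get("enabled"):
        return True
    if severity in quiet_hours.get("bypass_severity", []):
        return True
    start = time.fromisoformat(quiet_hours["start"])
    end = time.fromisoformat(quiet_hours["end"])
    if start <= end:
        in_quiet = start <= local_time < end
    else:
        # Window wraps past midnight, e.g. 22:00 to 07:00.
        in_quiet = local_time >= start or local_time < end
    return not in_quiet


quiet_hours = {"enabled": True, "start": "22:00", "end": "07:00", "bypass_severity": ["P1"]}
late_p2 = should_notify("P2", time(23, 30), quiet_hours)
```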

AIOps Engine

Get AIOps Insights

GET /api/v1/alerting/aiops/insights

Response

{
  "insights": [
    {
      "type": "correlation",
      "confidence": 0.92,
      "description": "3 alerts appear to have the same root cause",
      "related_alerts": ["alert_001", "alert_002", "alert_003"],
      "suggested_root_cause": "Database connection pool exhaustion",
      "suggested_action": "Increase connection pool size"
    },
    {
      "type": "anomaly",
      "confidence": 0.87,
      "description": "Error rate anomaly detected",
      "metric": "error_rate",
      "expected": 0.01,
      "actual": 0.15,
      "suggested_action": "Review recent deployments"
    },
    {
      "type": "prediction",
      "confidence": 0.78,
      "description": "Predicted disk full in 4 hours",
      "metric": "disk_usage",
      "current": 0.85,
      "predicted": 1.0,
      "prediction_time": "2026-01-25T01:00:00Z"
    }
  ]
}
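
Each insight carries a `confidence` value, so a client may want to surface only the stronger ones. A small sketch of that filtering, where the 0.8 threshold and the function name are arbitrary choices for illustration:

```python
def confident_insights(insights: list[dict], min_confidence: float = 0.8) -> list[dict]:
    """Keep insights at or above the threshold, highest confidence first."""
    kept = [item for item in insights if item["confidence"] >= min_confidence]
    return sorted(kept, key=lambda item: item["confidence"], reverse=True)


top = confident_insights([
    {"type": "prediction", "confidence": 0.78},
    {"type": "anomaly", "confidence": 0.87},
    {"type": "correlation", "confidence": 0.92},
])
```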

Get Alert Correlation

GET /api/v1/alerting/aiops/correlate/{alert_id}

Response

{
  "alert_id": "alert_001",
  "correlated_alerts": [
    {
      "alert_id": "alert_002",
      "correlation_score": 0.95,
      "correlation_reason": "Same service, temporal proximity"
    },
    {
      "alert_id": "alert_003",
      "correlation_score": 0.82,
      "correlation_reason": "Downstream dependency"
    }
  ],
  "root_cause_analysis": {
    "probable_cause": "Database connection timeout",
    "confidence": 0.89,
    "evidence": [
      "Connection pool metrics show exhaustion",
      "Database latency increased 5x"
    ],
    "suggested_remediation": [
      "Increase connection pool size",
      "Scale database read replicas"
    ]
  }
}

Suppress Similar Alerts

POST /api/v1/alerting/aiops/suppress

Request Body

{
  "alert_id": "alert_001",
  "suppress_similar": true,
  "duration_minutes": 60,
  "reason": "Known issue, fix in progress"
}

Maintenance Windows

Create Maintenance Window

POST /api/v1/alerting/maintenance

Request Body

{
  "name": "Database Maintenance",
  "description": "Scheduled database upgrade",
  "services": ["database", "api-gateway"],
  "start": "2026-01-25T02:00:00Z",
  "end": "2026-01-25T04:00:00Z",
  "suppress_all": false,
  "suppress_severity": ["P3", "P4", "P5"]
}
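
The interaction between `services`, the time range, `suppress_all`, and `suppress_severity` can be sketched as a single predicate. This is an interpretation of the request fields, not the service's documented behavior: it assumes an alert is suppressed only when its service is listed, its `created_at` falls inside the window, and either `suppress_all` is true or its severity is listed.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing "Z" for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_suppressed(alert: dict, window: dict) -> bool:
    """Return True if the maintenance window would suppress this alert (assumed semantics)."""
    if alert["service"] not in window["services"]:
        return False
    created = parse_ts(alert["created_at"])
    if not (parse_ts(window["start"]) <= created < parse_ts(window["end"])):
        return False
    return window.get("suppress_all", False) or alert["severity"] in window.get("suppress_severity", [])


window = {
    "services": ["database", "api-gateway"],
    "start": "2026-01-25T02:00:00Z",
    "end": "2026-01-25T04:00:00Z",
    "suppress_all": False,
    "suppress_severity": ["P3", "P4", "P5"],
}
p3_alert = {"service": "api-gateway", "severity": "P3", "created_at": "2026-01-25T03:00:00Z"}
suppressed = is_suppressed(p3_alert, window)
```

With this configuration, P1 and P2 alerts still page during the window while lower severities stay quiet.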

List Maintenance Windows

GET /api/v1/alerting/maintenance

Incident Management

Create Incident

POST /api/v1/alerting/incidents

Request Body

{
  "title": "API Gateway Outage",
  "severity": "P1",
  "description": "Complete API gateway failure affecting all services",
  "related_alerts": ["alert_001", "alert_002"],
  "commander": "usr_001"
}

Response

{
  "incident_id": "inc_001",
  "title": "API Gateway Outage",
  "severity": "P1",
  "status": "investigating",
  "commander": {
    "user_id": "usr_001",
    "name": "John Smith"
  },
  "chat_channel": {
    "id": "ch_inc_001",
    "name": "#incident-001-api-gateway"
  },
  "timeline": [
    {
      "timestamp": "2026-01-24T21:00:00Z",
      "type": "created",
      "message": "Incident created"
    }
  ],
  "created_at": "2026-01-24T21:00:00Z"
}

Update Incident Status

PUT /api/v1/alerting/incidents/{incident_id}/status

Request Body

{
  "status": "identified",
  "update": "Root cause identified: memory leak in auth service"
}

Status Values

Status	Description
`investigating`	Initial investigation
`identified`	Root cause identified
`monitoring`	Fix deployed, monitoring
`resolved`	Incident resolved
`postmortem`	Postmortem in progress
`closed`	Fully closed
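
A client can guard against sending an unknown or out-of-order status before calling the endpoint. The sketch below assumes incidents only move forward through the statuses in the order listed. The source does not say whether backward transitions (for example, reopening) are allowed, so treat this rule as hypothetical.

```python
# Statuses in the order given by the Status Values table.
STATUS_ORDER = ["investigating", "identified", "monitoring", "resolved", "postmortem", "closed"]


def can_transition(current: str, new: str) -> bool:
    """Return True if `new` comes later than `current` in the assumed lifecycle."""
    for status in (current, new):
        if status not in STATUS_ORDER:
            raise ValueError(f"unknown status: {status!r}")
    return STATUS_ORDER.index(new) > STATUS_ORDER.index(current)


allowed = can_transition("investigating", "identified")
```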

Webhooks

Event	Description
`alert.created`	New alert created
`alert.acknowledged`	Alert acknowledged
`alert.resolved`	Alert resolved
`alert.escalated`	Alert escalated to next level
`incident.created`	Incident created
`incident.updated`	Incident status updated
`oncall.handoff`	On-call rotation handoff
`schedule.override`	Schedule override created

Error Responses

Status	Code	Description
400	`invalid_severity`	Invalid severity level
400	`invalid_schedule`	Invalid schedule configuration
404	`alert_not_found`	Alert ID not found
404	`schedule_not_found`	Schedule ID not found
409	`already_acknowledged`	Alert already acknowledged
409	`already_resolved`	Alert already resolved
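
When retrying acknowledge or resolve calls, the two 409 codes usually mean the desired state has already been reached. A sketch of a lookup built from the table, with a helper that flags those conflicts as safe to ignore. How the code is carried in the error response body is not documented here, so the helper simply takes the status and code as arguments. Both names are hypothetical.

```python
# Error code -> HTTP status, copied from the Error Responses table.
ERROR_STATUS = {
    "invalid_severity": 400,
    "invalid_schedule": 400,
    "alert_not_found": 404,
    "schedule_not_found": 404,
    "already_acknowledged": 409,
    "already_resolved": 409,
}


def is_benign_conflict(status: int, code: str) -> bool:
    """Return True for 409 conflicts that indicate the alert is already in the target state."""
    return status == 409 and code in {"already_acknowledged", "already_resolved"}


ignorable = is_benign_conflict(409, "already_acknowledged")
```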

Related Documentation

Notifications API - Notification delivery
Cockpit Guide - Real-time alerting dashboard
Olympus Chat Guide - Incident chat channels
AIOps Guide - ML-powered alert intelligence
