Admin API
This endpoint requires an admin-level role (platform_admin, tenant_admin, or system_admin) and is accessible via the API gateway at /v1/platform/*.
Alerting & On-Call API
A custom alerting and on-call management platform that replaces PagerDuty, built on an edge-first architecture with AIOps capabilities.
Overview
| Attribute | Value |
|---|---|
| Base Path | /api/v1/alerting |
| Authentication | Bearer Token |
| Required Roles | platform_admin, system_admin, super_admin, support_agent, tenant_admin, restaurant_manager, manager |
Key Features
- Edge-First Architecture - Primary on Cloudflare Workers, GCP Cloud Run failover
- Multi-Source Ingestion - Prometheus, Cloud Monitoring, Sentry, custom webhooks
- AIOps Engine - Anomaly detection, alert correlation, predictive alerting
- On-Call Management - Rotations, overrides, escalation policies
- Multi-Channel Notifications - SMS, voice, email, push, Slack, Teams
Alert Ingestion
Create Alert
POST /api/v1/alerting/alerts
Create a new alert from any source.
Request Body
{
"source": "prometheus",
"title": "High CPU Usage on api-gateway-prod",
"description": "CPU usage exceeded 90% for 5 minutes",
"severity": "P2",
"service": "api-gateway",
"environment": "production",
"labels": {
"pod": "api-gateway-prod-7b8f9",
"region": "us-central1"
},
"annotations": {
"runbook_url": "https://docs.olympuscloud.ai/runbooks/high-cpu",
"dashboard_url": "https://grafana.olympuscloud.ai/d/cpu"
},
"fingerprint": "hash_abc123"
}
Severity Levels
| Level | Description | Default Timeout |
|---|---|---|
| P1 | Critical - Immediate attention | 2 minutes |
| P2 | High - Urgent response needed | 5 minutes |
| P3 | Medium - Normal priority | 15 minutes |
| P4 | Low - Can wait | 1 hour |
| P5 | Info - Notification only | N/A |
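For client-side validation, the severity table can be mirrored as a small lookup. This is a minimal sketch: `DEFAULT_TIMEOUTS` and `default_timeout` are hypothetical names, and the reference does not say what the timeout governs, so the values are simply carried over from the table.

```python
from datetime import timedelta
from typing import Optional

# Values copied from the Severity Levels table; P5 has no timeout (N/A).
DEFAULT_TIMEOUTS = {
    "P1": timedelta(minutes=2),
    "P2": timedelta(minutes=5),
    "P3": timedelta(minutes=15),
    "P4": timedelta(hours=1),
    "P5": None,
}


def default_timeout(severity: str) -> Optional[timedelta]:
    """Return the documented default timeout, or None for notification-only alerts."""
    if severity not in DEFAULT_TIMEOUTS:
        # Mirrors the documented invalid_severity error code.
        raise ValueError(f"invalid_severity: {severity!r}")
    return DEFAULT_TIMEOUTS[severity]
```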
Response
{
"alert_id": "alert_001",
"status": "firing",
"severity": "P2",
"title": "High CPU Usage on api-gateway-prod",
"service": "api-gateway",
"assigned_to": {
"user_id": "usr_oncall_001",
"name": "John Smith",
"notification_sent": true
},
"escalation_policy_id": "esc_api_team",
"aiops": {
"correlated_alerts": ["alert_002", "alert_003"],
"root_cause_probability": 0.85,
"suggested_action": "Scale api-gateway deployment"
},
"created_at": "2026-01-24T21:00:00Z"
}
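The request above could be assembled from Python roughly as follows. This is a sketch using only the standard library: the host in `BASE_URL` and the helper name `build_create_alert_request` are assumptions, while the path, the Bearer scheme, and the body fields come from this reference.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with your gateway


def build_create_alert_request(token: str, alert: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/alerting/alerts request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/alerting/alerts",
        data=json.dumps(alert).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_alert_request(
    token="YOUR_TOKEN",
    alert={
        "source": "prometheus",
        "title": "High CPU Usage on api-gateway-prod",
        "severity": "P2",
        "service": "api-gateway",
        "environment": "production",
        "fingerprint": "hash_abc123",
    },
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```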
Prometheus Alertmanager Webhook
POST /api/v1/alerting/webhooks/prometheus
Native Prometheus Alertmanager integration. The endpoint accepts the standard Alertmanager webhook payload.
Request Body (Alertmanager format)
{
"version": "4",
"groupKey": "...",
"status": "firing",
"receiver": "olympus-alerting",
"groupLabels": {},
"commonLabels": {
"alertname": "HighCPU",
"severity": "critical"
},
"alerts": [
{
"status": "firing",
"labels": {
"alertname": "HighCPU",
"instance": "api-gateway:9090"
},
"annotations": {
"summary": "High CPU on api-gateway"
},
"startsAt": "2026-01-24T21:00:00Z"
}
]
}
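To illustrate how a payload like the one above relates to the Create Alert body, the sketch below flattens it into one record per alert. The reference does not document how the service maps the fields, so `SEVERITY_MAP`, the P3 fallback, and the function name are assumptions made for illustration only.

```python
# Hypothetical mapping from Prometheus severity labels to P-levels.
SEVERITY_MAP = {"critical": "P1", "warning": "P3", "info": "P5"}


def alerts_from_alertmanager(payload: dict) -> list:
    """Turn an Alertmanager webhook payload into Create Alert style dicts."""
    common = payload.get("commonLabels", {})
    result = []
    for item in payload.get("alerts", []):
        # Per-alert labels take precedence over the group's common labels.
        labels = {**common, **item.get("labels", {})}
        annotations = item.get("annotations", {})
        result.append({
            "source": "prometheus",
            "title": annotations.get("summary", labels.get("alertname", "")),
            "severity": SEVERITY_MAP.get(labels.get("severity", ""), "P3"),
            "labels": labels,
        })
    return result
```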
Cloud Monitoring Webhook
POST /api/v1/alerting/webhooks/gcp
Google Cloud Monitoring integration.
Sentry Webhook
POST /api/v1/alerting/webhooks/sentry
Sentry error tracking integration.
Alert Management
List Alerts
GET /api/v1/alerting/alerts
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| status | string | firing, acknowledged, resolved |
| severity | string | P1, P2, P3, P4, P5 |
| service | string | Filter by service name |
| assigned_to | uuid | Filter by assignee |
| since | datetime | Alerts since timestamp |
Response
{
"alerts": [
{
"alert_id": "alert_001",
"status": "firing",
"severity": "P2",
"title": "High CPU Usage",
"service": "api-gateway",
"created_at": "2026-01-24T21:00:00Z",
"acknowledged_at": null,
"assigned_to": "usr_oncall_001"
}
],
"total": 15,
"firing": 3,
"acknowledged": 5,
"resolved": 7
}
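The query parameters can be encoded with the standard library. This is a sketch: `list_alerts_path` is a hypothetical helper, and rejecting unknown filters on the client is a design choice, not documented server behavior.

```python
from urllib.parse import urlencode

ALLOWED_FILTERS = {"status", "severity", "service", "assigned_to", "since"}


def list_alerts_path(**filters) -> str:
    """Return the List Alerts path with the given filters as a query string."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    # Drop filters explicitly set to None, then percent-encode the rest.
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    path = "/api/v1/alerting/alerts"
    return f"{path}?{query}" if query else path


path = list_alerts_path(status="firing", severity="P1", since="2026-01-24T00:00:00Z")
```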
Get Alert
GET /api/v1/alerting/alerts/{alert_id}
Acknowledge Alert
POST /api/v1/alerting/alerts/{alert_id}/acknowledge
Request Body
{
"message": "Investigating the issue",
"snooze_minutes": 30
}
Response
{
"alert_id": "alert_001",
"status": "acknowledged",
"acknowledged_by": "usr_001",
"acknowledged_at": "2026-01-24T21:05:00Z",
"snooze_until": "2026-01-24T21:35:00Z"
}
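In the example, `snooze_until` equals `acknowledged_at` plus `snooze_minutes`. The sketch below reproduces that arithmetic; the helper function is hypothetical, and the server may compute the value differently.

```python
from datetime import datetime, timedelta


def snooze_until(acknowledged_at: str, snooze_minutes: int) -> str:
    """Add the snooze duration to an ISO 8601 UTC timestamp."""
    # Older Python versions do not parse a trailing "Z", so normalize it first.
    ts = datetime.fromisoformat(acknowledged_at.replace("Z", "+00:00"))
    until = ts + timedelta(minutes=snooze_minutes)
    return until.isoformat().replace("+00:00", "Z")


print(snooze_until("2026-01-24T21:05:00Z", 30))  # 2026-01-24T21:35:00Z
```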
Resolve Alert
POST /api/v1/alerting/alerts/{alert_id}/resolve
Request Body
{
"resolution_notes": "Scaled deployment to 5 replicas",
"root_cause": "Traffic spike from marketing campaign"
}
Escalate Alert
POST /api/v1/alerting/alerts/{alert_id}/escalate
Request Body
{
"reason": "Unable to access production systems",
"skip_levels": 1
}
On-Call Schedules
Create Schedule
POST /api/v1/alerting/schedules
Request Body
{
"name": "API Team Primary",
"description": "Primary on-call for API services",
"timezone": "America/New_York",
"rotation_type": "weekly",
"handoff_time": "09:00",
"handoff_day": "tuesday",
"participants": [
{"user_id": "usr_001", "order": 1},
{"user_id": "usr_002", "order": 2},
{"user_id": "usr_003", "order": 3},
{"user_id": "usr_004", "order": 4}
],
"layers": [
{
"name": "primary",
"priority": 1
},
{
"name": "secondary",
"priority": 2,
"participants": ["usr_005", "usr_006"]
}
]
}
Rotation Types
| Type | Description |
|---|---|
| weekly | Rotate every week |
| daily | Rotate every day |
| custom | Custom rotation period |
Response
{
"schedule_id": "sch_api_primary",
"name": "API Team Primary",
"current_oncall": {
"user_id": "usr_001",
"name": "John Smith",
"until": "2026-01-27T09:00:00-05:00"
},
"next_oncall": {
"user_id": "usr_002",
"name": "Jane Doe",
"starts": "2026-01-27T09:00:00-05:00"
},
"created_at": "2026-01-24T21:00:00Z"
}
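A weekly rotation like the one above can be modeled as modular arithmetic over the ordered participants. The sketch below is one possible client-side model, not the service's scheduling algorithm: `oncall_at` and the `rotation_start` anchor are assumptions, and fixed seven-day periods ignore daylight-saving shifts.

```python
from datetime import datetime, timedelta


def oncall_at(participants, rotation_start, when, period=timedelta(weeks=1)):
    """Return the user_id on call at `when` for a fixed-period rotation."""
    if when < rotation_start:
        raise ValueError("`when` precedes the start of the rotation")
    ordered = sorted(participants, key=lambda p: p["order"])
    # Whole periods elapsed since the first handoff selects the participant.
    shifts_elapsed = (when - rotation_start) // period
    return ordered[shifts_elapsed % len(ordered)]["user_id"]


participants = [
    {"user_id": "usr_001", "order": 1},
    {"user_id": "usr_002", "order": 2},
    {"user_id": "usr_003", "order": 3},
    {"user_id": "usr_004", "order": 4},
]
# Assumed anchor: one week before the `until` value in the response above.
start = datetime.fromisoformat("2026-01-20T09:00:00-05:00")
current = oncall_at(participants, start, datetime.fromisoformat("2026-01-24T21:00:00+00:00"))
upcoming = oncall_at(participants, start, datetime.fromisoformat("2026-01-27T09:00:00-05:00"))
```

With these inputs `current` is `usr_001` and `upcoming` is `usr_002`, matching `current_oncall` and `next_oncall` in the response.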
List Schedules
GET /api/v1/alerting/schedules
Get Schedule
GET /api/v1/alerting/schedules/{schedule_id}
Get Current On-Call
GET /api/v1/alerting/schedules/{schedule_id}/oncall
Response
{
"schedule_id": "sch_api_primary",
"current": {
"layer": "primary",
"user_id": "usr_001",
"name": "John Smith",
"email": "john@nebusai.com",
"phone": "+1234567890",
"since": "2026-01-20T09:00:00-05:00",
"until": "2026-01-27T09:00:00-05:00"
},
"secondary": {
"user_id": "usr_005",
"name": "Mike Johnson"
}
}
Create Override
POST /api/v1/alerting/schedules/{schedule_id}/overrides
Request Body
{
"user_id": "usr_003",
"start": "2026-01-25T09:00:00-05:00",
"end": "2026-01-26T09:00:00-05:00",
"reason": "Covering for John's PTO"
}
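An override temporarily replaces whoever the rotation would select. The sketch below resolves the effective on-call user under two assumptions: the override window is treated as half-open (start inclusive, end exclusive), and the first matching override wins. Neither is specified in this reference, and `effective_oncall` is a hypothetical helper.

```python
from datetime import datetime


def effective_oncall(scheduled_user: str, overrides: list, when: datetime) -> str:
    """Return the override user if `when` falls inside an override window."""
    for override in overrides:
        start = datetime.fromisoformat(override["start"])
        end = datetime.fromisoformat(override["end"])
        if start <= when < end:
            return override["user_id"]
    return scheduled_user


overrides = [{
    "user_id": "usr_003",
    "start": "2026-01-25T09:00:00-05:00",
    "end": "2026-01-26T09:00:00-05:00",
}]
during = effective_oncall("usr_001", overrides, datetime.fromisoformat("2026-01-25T12:00:00-05:00"))
after = effective_oncall("usr_001", overrides, datetime.fromisoformat("2026-01-26T09:00:00-05:00"))
```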
Escalation Policies
Create Escalation Policy
POST /api/v1/alerting/escalation-policies
Request Body
{
"name": "API Services Escalation",
"description": "Escalation policy for all API services",
"services": ["api-gateway", "auth-service", "commerce-service"],
"levels": [
{
"level": 1,
"timeout_minutes": 5,
"targets": [
{"type": "schedule", "id": "sch_api_primary"}
],
"notification_channels": ["sms", "push"]
},
{
"level": 2,
"timeout_minutes": 10,
"targets": [
{"type": "schedule", "id": "sch_api_secondary"},
{"type": "user", "id": "usr_manager_001"}
],
"notification_channels": ["sms", "voice", "email"]
},
{
"level": 3,
"timeout_minutes": 15,
"targets": [
{"type": "user", "id": "usr_director_001"},
{"type": "user", "id": "usr_cto"}
],
"notification_channels": ["voice", "sms", "slack"]
}
],
"repeat_policy": {
"enabled": true,
"repeat_after_minutes": 30,
"max_repeats": 3
}
}
Response
{
"policy_id": "esc_api_services",
"name": "API Services Escalation",
"services": ["api-gateway", "auth-service", "commerce-service"],
"levels": 3,
"created_at": "2026-01-24T21:00:00Z"
}
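To reason about when each level is paged, the per-level timeouts can be accumulated into offsets from the moment the alert fires. This sketch assumes `timeout_minutes` is the time an alert may stay unacknowledged at a level before moving to the next one; the reference does not state this explicitly, and `escalation_offsets` is a hypothetical helper.

```python
def escalation_offsets(levels: list) -> list:
    """Return (level, minutes_after_trigger) pairs for an unacknowledged alert."""
    offsets = []
    elapsed = 0
    for level in sorted(levels, key=lambda item: item["level"]):
        offsets.append((level["level"], elapsed))
        elapsed += level["timeout_minutes"]
    return offsets


levels = [
    {"level": 1, "timeout_minutes": 5},
    {"level": 2, "timeout_minutes": 10},
    {"level": 3, "timeout_minutes": 15},
]
print(escalation_offsets(levels))  # [(1, 0), (2, 5), (3, 15)]
```

Under this reading, level 2 is notified 5 minutes after the alert fires and level 3 after 15 minutes.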
List Escalation Policies
GET /api/v1/alerting/escalation-policies
Get Escalation Policy
GET /api/v1/alerting/escalation-policies/{policy_id}
Notification Channels
Available Channels
| Channel | Provider | Configuration |
|---|---|---|
| sms | Twilio | Phone number |
| voice | Twilio | Phone number + TTS message |
| email | SendGrid | Email address |
| slack | Slack API | Channel webhook URL |
| teams | MS Teams | Channel webhook URL |
| push | FCM/APNs | Device token |
| in_app | WebSocket | Real-time Cockpit |
Configure User Notifications
PUT /api/v1/alerting/users/{user_id}/notifications
Request Body
{
"channels": {
"sms": {
"enabled": true,
"phone": "+1234567890"
},
"voice": {
"enabled": true,
"phone": "+1234567890",
"voice": "en-US-Neural2-F"
},
"email": {
"enabled": true,
"address": "john@nebusai.com"
},
"slack": {
"enabled": true,
"user_id": "U12345678"
},
"push": {
"enabled": true,
"devices": ["device_token_1", "device_token_2"]
}
},
"quiet_hours": {
"enabled": true,
"start": "22:00",
"end": "07:00",
"timezone": "America/New_York",
"bypass_severity": ["P1"]
}
}
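The `quiet_hours` window in the example wraps past midnight (22:00 to 07:00), which is easy to get wrong. The sketch below shows one way to evaluate it; `is_quiet` is a hypothetical helper, `local_time` is assumed to be already converted to the configured timezone, and treating the end time as exclusive is an assumption.

```python
from datetime import time


def is_quiet(quiet_hours: dict, local_time: time, severity: str) -> bool:
    """Return True if a notification of this severity should be held back."""
    if not quiet_hours.get("enabled"):
        return False
    if severity in quiet_hours.get("bypass_severity", []):
        return False
    start = time.fromisoformat(quiet_hours["start"])
    end = time.fromisoformat(quiet_hours["end"])
    if start <= end:
        # Window within a single day, e.g. 01:00 to 05:00.
        return start <= local_time < end
    # Window wraps midnight, e.g. 22:00 to 07:00.
    return local_time >= start or local_time < end


quiet_hours = {"enabled": True, "start": "22:00", "end": "07:00", "bypass_severity": ["P1"]}
held = is_quiet(quiet_hours, time(23, 30), "P2")      # inside the window
bypassed = is_quiet(quiet_hours, time(23, 30), "P1")  # P1 bypasses quiet hours
```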
AIOps Engine
Get AIOps Insights
GET /api/v1/alerting/aiops/insights
Response
{
"insights": [
{
"type": "correlation",
"confidence": 0.92,
"description": "3 alerts appear to have the same root cause",
"related_alerts": ["alert_001", "alert_002", "alert_003"],
"suggested_root_cause": "Database connection pool exhaustion",
"suggested_action": "Increase connection pool size"
},
{
"type": "anomaly",
"confidence": 0.87,
"description": "Error rate anomaly detected",
"metric": "error_rate",
"expected": 0.01,
"actual": 0.15,
"suggested_action": "Review recent deployments"
},
{
"type": "prediction",
"confidence": 0.78,
"description": "Predicted disk full in 4 hours",
"metric": "disk_usage",
"current": 0.85,
"predicted": 1.0,
"prediction_time": "2026-01-25T01:00:00Z"
}
]
}
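A client will often want only the higher-confidence insights. The sketch below filters and ranks the `insights` array; the 0.8 threshold and the helper name are arbitrary choices for illustration, not values recommended by the API.

```python
def actionable_insights(insights: list, min_confidence: float = 0.8) -> list:
    """Keep insights at or above the threshold, highest confidence first."""
    selected = [item for item in insights if item["confidence"] >= min_confidence]
    return sorted(selected, key=lambda item: item["confidence"], reverse=True)


insights = [
    {"type": "prediction", "confidence": 0.78},
    {"type": "anomaly", "confidence": 0.87},
    {"type": "correlation", "confidence": 0.92},
]
top = actionable_insights(insights)
```

With the confidence values from the response above, `top` holds the correlation and anomaly insights, and the 0.78 prediction is dropped.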
Get Alert Correlation
GET /api/v1/alerting/aiops/correlate/{alert_id}
Response
{
"alert_id": "alert_001",
"correlated_alerts": [
{
"alert_id": "alert_002",
"correlation_score": 0.95,
"correlation_reason": "Same service, temporal proximity"
},
{
"alert_id": "alert_003",
"correlation_score": 0.82,
"correlation_reason": "Downstream dependency"
}
],
"root_cause_analysis": {
"probable_cause": "Database connection timeout",
"confidence": 0.89,
"evidence": [
"Connection pool metrics show exhaustion",
"Database latency increased 5x"
],
"suggested_remediation": [
"Increase connection pool size",
"Scale database read replicas"
]
}
}
Suppress Similar Alerts
POST /api/v1/alerting/aiops/suppress
Request Body
{
"alert_id": "alert_001",
"suppress_similar": true,
"duration_minutes": 60,
"reason": "Known issue, fix in progress"
}
Maintenance Windows
Create Maintenance Window
POST /api/v1/alerting/maintenance
Request Body
{
"name": "Database Maintenance",
"description": "Scheduled database upgrade",
"services": ["database", "api-gateway"],
"start": "2026-01-25T02:00:00Z",
"end": "2026-01-25T04:00:00Z",
"suppress_all": false,
"suppress_severity": ["P3", "P4", "P5"]
}
List Maintenance Windows
GET /api/v1/alerting/maintenance
Incident Management
Create Incident
POST /api/v1/alerting/incidents
Request Body
{
"title": "API Gateway Outage",
"severity": "P1",
"description": "Complete API gateway failure affecting all services",
"related_alerts": ["alert_001", "alert_002"],
"commander": "usr_001"
}
Response
{
"incident_id": "inc_001",
"title": "API Gateway Outage",
"severity": "P1",
"status": "investigating",
"commander": {
"user_id": "usr_001",
"name": "John Smith"
},
"chat_channel": {
"id": "ch_inc_001",
"name": "#incident-001-api-gateway"
},
"timeline": [
{
"timestamp": "2026-01-24T21:00:00Z",
"type": "created",
"message": "Incident created"
}
],
"created_at": "2026-01-24T21:00:00Z"
}
Update Incident Status
PUT /api/v1/alerting/incidents/{incident_id}/status
Request Body
{
"status": "identified",
"update": "Root cause identified: memory leak in auth service"
}
Status Values
| Status | Description |
|---|---|
| investigating | Initial investigation |
| identified | Root cause identified |
| monitoring | Fix deployed, monitoring |
| resolved | Incident resolved |
| postmortem | Postmortem in progress |
| closed | Fully closed |
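A client can validate a status before sending the update. The sketch below assumes that incidents move forward through the statuses in the order listed; the reference does not say whether backward transitions (for example, reopening) are allowed, so treat the ordering check as a hypothetical policy.

```python
# Order copied from the Status Values table.
INCIDENT_STATUSES = [
    "investigating",
    "identified",
    "monitoring",
    "resolved",
    "postmortem",
    "closed",
]


def is_forward_transition(current: str, new: str) -> bool:
    """Return True if `new` comes later than `current` in the listed order."""
    for status in (current, new):
        if status not in INCIDENT_STATUSES:
            raise ValueError(f"unknown incident status: {status!r}")
    return INCIDENT_STATUSES.index(new) > INCIDENT_STATUSES.index(current)
```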
Webhooks
| Event | Description |
|---|---|
| alert.created | New alert created |
| alert.acknowledged | Alert acknowledged |
| alert.resolved | Alert resolved |
| alert.escalated | Alert escalated to next level |
| incident.created | Incident created |
| incident.updated | Incident status updated |
| oncall.handoff | On-call rotation handoff |
| schedule.override | Schedule override created |
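A receiver for these events can route on the event name. The delivery payload is not documented here, so the sketch below assumes a hypothetical envelope of the form `{"event": "<name>", "data": {...}}`; only the event names come from the table above.

```python
from typing import Callable, Dict, Optional

HANDLERS: Dict[str, Callable[[dict], str]] = {}


def on(event: str):
    """Register a handler function for one event name."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        HANDLERS[event] = func
        return func
    return register


@on("alert.created")
def handle_alert_created(data: dict) -> str:
    return f"new alert {data.get('alert_id')}"


@on("oncall.handoff")
def handle_handoff(data: dict) -> str:
    return f"handoff on {data.get('schedule_id')}"


def dispatch(payload: dict) -> Optional[str]:
    """Route a webhook payload to its handler; ignore unknown events."""
    handler = HANDLERS.get(payload.get("event", ""))
    if handler is None:
        return None
    return handler(payload.get("data", {}))
```

Returning `None` for unregistered events lets the receiver acknowledge deliveries it does not care about without raising an error.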
Error Responses
| Status | Code | Description |
|---|---|---|
| 400 | invalid_severity | Invalid severity level |
| 400 | invalid_schedule | Invalid schedule configuration |
| 404 | alert_not_found | Alert ID not found |
| 404 | schedule_not_found | Schedule ID not found |
| 409 | already_acknowledged | Alert already acknowledged |
| 409 | already_resolved | Alert already resolved |
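One way to handle these codes on the client is to treat the two 409 conflicts as benign, since the alert is already in the requested state, and raise for everything else. This is a design choice sketched under assumptions: the error body format is not documented, so the function takes the status and code as already-extracted values, and `AlertingError` and `check_response` are hypothetical names.

```python
from typing import Optional

# 409 codes meaning the alert is already in the requested state.
BENIGN_CONFLICTS = {"already_acknowledged", "already_resolved"}


class AlertingError(Exception):
    """Raised for error responses the caller needs to handle."""

    def __init__(self, status: int, code: Optional[str]):
        super().__init__(f"HTTP {status}: {code}")
        self.status = status
        self.code = code


def check_response(status: int, code: Optional[str] = None) -> None:
    """Return quietly on success or a benign conflict, otherwise raise."""
    if 200 <= status < 300:
        return
    if status == 409 and code in BENIGN_CONFLICTS:
        return
    raise AlertingError(status, code)
```

Treating the conflicts as success makes acknowledge and resolve calls safe to retry.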
Related Documentation
- Notifications API - Notification delivery
- Cockpit Guide - Real-time alerting dashboard
- Olympus Chat Guide - Incident chat channels
- AIOps Guide - ML-powered alert intelligence