Admin API

This endpoint requires an admin-level role (platform_admin, tenant_admin, or system_admin). It is accessible via the API gateway at /v1/platform/*.

Alerting & On-Call API

A custom alerting and on-call management platform that replaces PagerDuty, built on an edge-first architecture with AIOps capabilities.

Overview

| Attribute | Value |
|---|---|
| Base Path | /api/v1/alerting |
| Authentication | Bearer Token |
| Required Roles | platform_admin, system_admin, super_admin, support_agent, tenant_admin, restaurant_manager, manager |

Key Features

  • Edge-First Architecture - Primary on Cloudflare Workers, GCP Cloud Run failover
  • Multi-Source Ingestion - Prometheus, Cloud Monitoring, Sentry, custom webhooks
  • AIOps Engine - Anomaly detection, alert correlation, predictive alerting
  • On-Call Management - Rotations, overrides, escalation policies
  • Multi-Channel Notifications - SMS, voice, email, push, Slack, Teams

Alert Ingestion

Create Alert

POST /api/v1/alerting/alerts

Create a new alert from any source.

Request Body

{
  "source": "prometheus",
  "title": "High CPU Usage on api-gateway-prod",
  "description": "CPU usage exceeded 90% for 5 minutes",
  "severity": "P2",
  "service": "api-gateway",
  "environment": "production",
  "labels": {
    "pod": "api-gateway-prod-7b8f9",
    "region": "us-central1"
  },
  "annotations": {
    "runbook_url": "https://docs.olympuscloud.ai/runbooks/high-cpu",
    "dashboard_url": "https://grafana.olympuscloud.ai/d/cpu"
  },
  "fingerprint": "hash_abc123"
}

Severity Levels

| Level | Description | Default Timeout |
|---|---|---|
| P1 | Critical - Immediate attention | 2 minutes |
| P2 | High - Urgent response needed | 5 minutes |
| P3 | Medium - Normal priority | 15 minutes |
| P4 | Low - Can wait | 1 hour |
| P5 | Info - Notification only | N/A |
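
A client may want to validate a severity and look up its default timeout before sending an alert. The sketch below simply encodes the table above as a lookup; the function name `default_timeout` is illustrative and not part of the API.

```python
from datetime import timedelta
from typing import Optional

# Default timeouts copied from the severity table; P5 has none (N/A).
DEFAULT_TIMEOUTS = {
    "P1": timedelta(minutes=2),
    "P2": timedelta(minutes=5),
    "P3": timedelta(minutes=15),
    "P4": timedelta(hours=1),
    "P5": None,
}


def default_timeout(severity: str) -> Optional[timedelta]:
    """Return the default timeout for a severity, or None for P5."""
    if severity not in DEFAULT_TIMEOUTS:
        # Mirrors the documented `invalid_severity` error code.
        raise ValueError(f"invalid_severity: {severity!r}")
    return DEFAULT_TIMEOUTS[severity]
```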

Response

{
  "alert_id": "alert_001",
  "status": "firing",
  "severity": "P2",
  "title": "High CPU Usage on api-gateway-prod",
  "service": "api-gateway",
  "assigned_to": {
    "user_id": "usr_oncall_001",
    "name": "John Smith",
    "notification_sent": true
  },
  "escalation_policy_id": "esc_api_team",
  "aiops": {
    "correlated_alerts": ["alert_002", "alert_003"],
    "root_cause_probability": 0.85,
    "suggested_action": "Scale api-gateway deployment"
  },
  "created_at": "2026-01-24T21:00:00Z"
}
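
The following is a minimal sketch of how a client might assemble the Create Alert request using only the Python standard library. The host `https://api.example.com` and the token are placeholders. The `make_fingerprint` helper is a hypothetical scheme, since this page does not specify how fingerprints are computed.

```python
import hashlib
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not from the docs


def make_fingerprint(source: str, service: str, labels: dict) -> str:
    """Hypothetical fingerprint: SHA-256 over source, service and sorted labels."""
    canonical = json.dumps(
        [source, service, sorted(labels.items())], separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def build_create_alert_request(token: str, alert: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/v1/alerting/alerts."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/alerting/alerts",
        data=json.dumps(alert).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


labels = {"pod": "api-gateway-prod-7b8f9", "region": "us-central1"}
alert = {
    "source": "prometheus",
    "title": "High CPU Usage on api-gateway-prod",
    "severity": "P2",
    "service": "api-gateway",
    "environment": "production",
    "labels": labels,
    "fingerprint": make_fingerprint("prometheus", "api-gateway", labels),
}
request = build_create_alert_request("YOUR_TOKEN", alert)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the payload easy to inspect and test.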

Prometheus Alertmanager Webhook

POST /api/v1/alerting/webhooks/prometheus

Native Prometheus Alertmanager integration. The endpoint accepts the standard Alertmanager webhook payload.

Request Body (Alertmanager format)

{
  "version": "4",
  "groupKey": "...",
  "status": "firing",
  "receiver": "olympus-alerting",
  "groupLabels": {},
  "commonLabels": {
    "alertname": "HighCPU",
    "severity": "critical"
  },
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighCPU",
        "instance": "api-gateway:9090"
      },
      "annotations": {
        "summary": "High CPU on api-gateway"
      },
      "startsAt": "2026-01-24T21:00:00Z"
    }
  ]
}
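
When debugging an integration, it can help to inspect what an Alertmanager payload contains before pointing a receiver at the webhook. The sketch below flattens the payload into one summary per alert. It is only an inspection aid under the field names shown above; how the service itself maps these fields to alerts is not documented here.

```python
def summarize_alertmanager_payload(payload: dict) -> list:
    """Return one small summary dict per entry in the payload's `alerts` list."""
    summaries = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        summaries.append(
            {
                "alertname": labels.get("alertname"),
                "instance": labels.get("instance"),
                "status": alert.get("status"),
                "summary": alert.get("annotations", {}).get("summary"),
                "starts_at": alert.get("startsAt"),
            }
        )
    return summaries


sample_payload = {
    "version": "4",
    "status": "firing",
    "receiver": "olympus-alerting",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "HighCPU", "instance": "api-gateway:9090"},
            "annotations": {"summary": "High CPU on api-gateway"},
            "startsAt": "2026-01-24T21:00:00Z",
        }
    ],
}
summaries = summarize_alertmanager_payload(sample_payload)
```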

Cloud Monitoring Webhook

POST /api/v1/alerting/webhooks/gcp

Google Cloud Monitoring integration.

Sentry Webhook

POST /api/v1/alerting/webhooks/sentry

Sentry error tracking integration.


Alert Management

List Alerts

GET /api/v1/alerting/alerts

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| status | string | firing, acknowledged, resolved |
| severity | string | P1, P2, P3, P4, P5 |
| service | string | Filter by service name |
| assigned_to | uuid | Filter by assignee |
| since | datetime | Alerts since timestamp |

Response

{
  "alerts": [
    {
      "alert_id": "alert_001",
      "status": "firing",
      "severity": "P2",
      "title": "High CPU Usage",
      "service": "api-gateway",
      "created_at": "2026-01-24T21:00:00Z",
      "acknowledged_at": null,
      "assigned_to": "usr_oncall_001"
    }
  ],
  "total": 15,
  "firing": 3,
  "acknowledged": 5,
  "resolved": 7
}
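
The query parameters above can be combined into a single filtered request. The sketch below builds the URL with `urllib.parse.urlencode`; the host is a placeholder and the helper name `build_list_alerts_url` is illustrative.

```python
from urllib.parse import urlencode

# Parameter names taken from the Query Parameters table.
ALLOWED_FILTERS = {"status", "severity", "service", "assigned_to", "since"}


def build_list_alerts_url(base_url: str, **filters: str) -> str:
    """Build the List Alerts URL, rejecting parameters the table does not list."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    # Sort keys so the resulting URL is deterministic.
    query = urlencode(sorted(filters.items()))
    path = f"{base_url}/api/v1/alerting/alerts"
    return f"{path}?{query}" if query else path


url = build_list_alerts_url(
    "https://api.example.com",  # placeholder host
    status="firing",
    severity="P2",
)
```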

Get Alert

GET /api/v1/alerting/alerts/{alert_id}

Acknowledge Alert

POST /api/v1/alerting/alerts/{alert_id}/acknowledge

Request Body

{
  "message": "Investigating the issue",
  "snooze_minutes": 30
}

Response

{
  "alert_id": "alert_001",
  "status": "acknowledged",
  "acknowledged_by": "usr_001",
  "acknowledged_at": "2026-01-24T21:05:00Z",
  "snooze_until": "2026-01-24T21:35:00Z"
}
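
In the sample above, `snooze_until` equals `acknowledged_at` plus `snooze_minutes`. The sketch below reproduces that arithmetic, assuming the relationship holds in general (the page shows it only by example).

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def snooze_until(acknowledged_at: str, snooze_minutes: int) -> str:
    """Add the snooze duration to a UTC acknowledgement timestamp."""
    acknowledged = datetime.strptime(acknowledged_at, TIMESTAMP_FORMAT).replace(
        tzinfo=timezone.utc
    )
    return (acknowledged + timedelta(minutes=snooze_minutes)).strftime(
        TIMESTAMP_FORMAT
    )
```

With the sample values, `snooze_until("2026-01-24T21:05:00Z", 30)` returns `"2026-01-24T21:35:00Z"`, matching the response.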

Resolve Alert

POST /api/v1/alerting/alerts/{alert_id}/resolve

Request Body

{
  "resolution_notes": "Scaled deployment to 5 replicas",
  "root_cause": "Traffic spike from marketing campaign"
}

Escalate Alert

POST /api/v1/alerting/alerts/{alert_id}/escalate

Request Body

{
  "reason": "Unable to access production systems",
  "skip_levels": 1
}

On-Call Schedules

Create Schedule

POST /api/v1/alerting/schedules

Request Body

{
  "name": "API Team Primary",
  "description": "Primary on-call for API services",
  "timezone": "America/New_York",
  "rotation_type": "weekly",
  "handoff_time": "09:00",
  "handoff_day": "monday",
  "participants": [
    {"user_id": "usr_001", "order": 1},
    {"user_id": "usr_002", "order": 2},
    {"user_id": "usr_003", "order": 3},
    {"user_id": "usr_004", "order": 4}
  ],
  "layers": [
    {
      "name": "primary",
      "priority": 1
    },
    {
      "name": "secondary",
      "priority": 2,
      "participants": ["usr_005", "usr_006"]
    }
  ]
}

Rotation Types

| Type | Description |
|---|---|
| weekly | Rotate every week |
| daily | Rotate every day |
| custom | Custom rotation period |

Response

{
  "schedule_id": "sch_api_primary",
  "name": "API Team Primary",
  "current_oncall": {
    "user_id": "usr_001",
    "name": "John Smith",
    "until": "2026-01-27T09:00:00-05:00"
  },
  "next_oncall": {
    "user_id": "usr_002",
    "name": "Jane Doe",
    "starts": "2026-01-27T09:00:00-05:00"
  },
  "created_at": "2026-01-24T21:00:00Z"
}
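
To illustrate how a rotation could map participants to shifts, the sketch below picks the on-call participant by counting whole rotation periods since an assumed first handoff. The page does not describe the service's actual algorithm. The `rotation_start` input is an assumption, and daylight-saving transitions are ignored for simplicity.

```python
from datetime import datetime, timedelta


def current_oncall(
    participants: list,
    rotation_start: datetime,
    now: datetime,
    period: timedelta = timedelta(weeks=1),
) -> dict:
    """Return the participant on call at `now`, plus the shift boundaries."""
    if now < rotation_start:
        raise ValueError("now is before the first handoff")
    ordered = sorted(participants, key=lambda p: p["order"])
    # Number of completed rotation periods since the first handoff.
    shift_index = (now - rotation_start) // period
    shift_start = rotation_start + shift_index * period
    return {
        "user_id": ordered[shift_index % len(ordered)]["user_id"],
        "since": shift_start,
        "until": shift_start + period,
    }


participants = [
    {"user_id": "usr_001", "order": 1},
    {"user_id": "usr_002", "order": 2},
    {"user_id": "usr_003", "order": 3},
    {"user_id": "usr_004", "order": 4},
]
rotation_start = datetime.fromisoformat("2026-01-20T09:00:00-05:00")  # assumed
shift = current_oncall(
    participants, rotation_start, datetime.fromisoformat("2026-01-24T16:00:00-05:00")
)
```

With these inputs the sketch selects `usr_001` with a shift ending at `2026-01-27T09:00:00-05:00`, consistent with the sample response.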

List Schedules

GET /api/v1/alerting/schedules

Get Schedule

GET /api/v1/alerting/schedules/{schedule_id}

Get Current On-Call

GET /api/v1/alerting/schedules/{schedule_id}/oncall

Response

{
  "schedule_id": "sch_api_primary",
  "current": {
    "layer": "primary",
    "user_id": "usr_001",
    "name": "John Smith",
    "email": "john@nebusai.com",
    "phone": "+1234567890",
    "since": "2026-01-20T09:00:00-05:00",
    "until": "2026-01-27T09:00:00-05:00"
  },
  "secondary": {
    "user_id": "usr_005",
    "name": "Mike Johnson"
  }
}

Create Override

POST /api/v1/alerting/schedules/{schedule_id}/overrides

Request Body

{
  "user_id": "usr_003",
  "start": "2026-01-25T09:00:00-05:00",
  "end": "2026-01-26T09:00:00-05:00",
  "reason": "Covering for John's PTO"
}
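
An override presumably takes precedence over the scheduled participant while it is active. The sketch below shows one way to resolve the effective on-call user. It assumes the `start` bound is inclusive and the `end` bound exclusive, which this page does not state.

```python
from datetime import datetime


def effective_oncall(scheduled_user_id: str, overrides: list, now: datetime) -> str:
    """Return the override user if an override covers `now`, else the scheduled user."""
    for override in overrides:
        start = datetime.fromisoformat(override["start"])
        end = datetime.fromisoformat(override["end"])
        if start <= now < end:  # assumed: start inclusive, end exclusive
            return override["user_id"]
    return scheduled_user_id


overrides = [
    {
        "user_id": "usr_003",
        "start": "2026-01-25T09:00:00-05:00",
        "end": "2026-01-26T09:00:00-05:00",
    }
]
during = effective_oncall(
    "usr_001", overrides, datetime.fromisoformat("2026-01-25T12:00:00-05:00")
)
after = effective_oncall(
    "usr_001", overrides, datetime.fromisoformat("2026-01-26T09:00:00-05:00")
)
```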

Escalation Policies

Create Escalation Policy

POST /api/v1/alerting/escalation-policies

Request Body

{
  "name": "API Services Escalation",
  "description": "Escalation policy for all API services",
  "services": ["api-gateway", "auth-service", "commerce-service"],
  "levels": [
    {
      "level": 1,
      "timeout_minutes": 5,
      "targets": [
        {"type": "schedule", "id": "sch_api_primary"}
      ],
      "notification_channels": ["sms", "push"]
    },
    {
      "level": 2,
      "timeout_minutes": 10,
      "targets": [
        {"type": "schedule", "id": "sch_api_secondary"},
        {"type": "user", "id": "usr_manager_001"}
      ],
      "notification_channels": ["sms", "voice", "email"]
    },
    {
      "level": 3,
      "timeout_minutes": 15,
      "targets": [
        {"type": "user", "id": "usr_director_001"},
        {"type": "user", "id": "usr_cto"}
      ],
      "notification_channels": ["voice", "sms", "slack"]
    }
  ],
  "repeat_policy": {
    "enabled": true,
    "repeat_after_minutes": 30,
    "max_repeats": 3
  }
}

Response

{
  "policy_id": "esc_api_services",
  "name": "API Services Escalation",
  "services": ["api-gateway", "auth-service", "commerce-service"],
  "levels": 3,
  "created_at": "2026-01-24T21:00:00Z"
}
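
To reason about when each level of a policy is reached, the sketch below computes the active level for an unacknowledged alert. It assumes each level's `timeout_minutes` starts counting when that level is notified, so the timeouts accumulate. This is an interpretation of the request fields, not confirmed behavior.

```python
from typing import Optional


def active_level(levels: list, minutes_since_fired: float) -> Optional[int]:
    """Return the escalation level active after the given number of minutes.

    Returns None once every level has timed out; at that point the
    policy's repeat settings would presumably take over.
    """
    boundary = 0.0
    for level in sorted(levels, key=lambda item: item["level"]):
        boundary += level["timeout_minutes"]
        if minutes_since_fired < boundary:
            return level["level"]
    return None


levels = [
    {"level": 1, "timeout_minutes": 5},
    {"level": 2, "timeout_minutes": 10},
    {"level": 3, "timeout_minutes": 15},
]
```

Under this reading, level 1 covers minutes 0 to 5, level 2 covers 5 to 15, and level 3 covers 15 to 30.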

List Escalation Policies

GET /api/v1/alerting/escalation-policies

Get Escalation Policy

GET /api/v1/alerting/escalation-policies/{policy_id}

Notification Channels

Available Channels

| Channel | Provider | Configuration |
|---|---|---|
| sms | Twilio | Phone number |
| voice | Twilio | Phone number + TTS message |
| email | SendGrid | Email address |
| slack | Slack API | Channel webhook URL |
| teams | MS Teams | Channel webhook URL |
| push | FCM/APNs | Device token |
| in_app | WebSocket | Real-time Cockpit |

Configure User Notifications

PUT /api/v1/alerting/users/{user_id}/notifications

Request Body

{
  "channels": {
    "sms": {
      "enabled": true,
      "phone": "+1234567890"
    },
    "voice": {
      "enabled": true,
      "phone": "+1234567890",
      "voice": "en-US-Neural2-F"
    },
    "email": {
      "enabled": true,
      "address": "john@nebusai.com"
    },
    "slack": {
      "enabled": true,
      "user_id": "U12345678"
    },
    "push": {
      "enabled": true,
      "devices": ["device_token_1", "device_token_2"]
    }
  },
  "quiet_hours": {
    "enabled": true,
    "start": "22:00",
    "end": "07:00",
    "timezone": "America/New_York",
    "bypass_severity": ["P1"]
  }
}
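
The `quiet_hours` block above spans midnight (22:00 to 07:00) and lets P1 alerts through. The sketch below shows how such a rule could be evaluated. It assumes the caller has already converted the current time into the configured timezone and that `bypass_severity` means those severities always notify.

```python
from datetime import time


def in_quiet_hours(local_time: time, start: time, end: time) -> bool:
    """True if local_time falls in [start, end), handling windows that cross midnight."""
    if start <= end:
        return start <= local_time < end
    return local_time >= start or local_time < end


def should_notify(severity: str, local_time: time, quiet_hours: dict) -> bool:
    """Decide whether a notification goes out under the quiet-hours settings."""
    if not quiet_hours.get("enabled"):
        return True
    if severity in quiet_hours.get("bypass_severity", []):
        return True
    start = time.fromisoformat(quiet_hours["start"])
    end = time.fromisoformat(quiet_hours["end"])
    return not in_quiet_hours(local_time, start, end)


quiet_hours = {
    "enabled": True,
    "start": "22:00",
    "end": "07:00",
    "bypass_severity": ["P1"],
}
```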

AIOps Engine

Get AIOps Insights

GET /api/v1/alerting/aiops/insights

Response

{
  "insights": [
    {
      "type": "correlation",
      "confidence": 0.92,
      "description": "3 alerts appear to have the same root cause",
      "related_alerts": ["alert_001", "alert_002", "alert_003"],
      "suggested_root_cause": "Database connection pool exhaustion",
      "suggested_action": "Increase connection pool size"
    },
    {
      "type": "anomaly",
      "confidence": 0.87,
      "description": "Error rate anomaly detected",
      "metric": "error_rate",
      "expected": 0.01,
      "actual": 0.15,
      "suggested_action": "Review recent deployments"
    },
    {
      "type": "prediction",
      "confidence": 0.78,
      "description": "Predicted disk full in 4 hours",
      "metric": "disk_usage",
      "current": 0.85,
      "predicted": 1.0,
      "prediction_time": "2026-01-25T01:00:00Z"
    }
  ]
}
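
Because each insight carries a `confidence` score, a consumer might choose to act only on the stronger ones. The sketch below filters and ranks insights client-side. The 0.8 threshold is an arbitrary illustration, not a documented default.

```python
def actionable_insights(response: dict, min_confidence: float = 0.8) -> list:
    """Keep insights at or above the threshold, highest confidence first."""
    selected = [
        insight
        for insight in response.get("insights", [])
        if insight.get("confidence", 0.0) >= min_confidence
    ]
    return sorted(selected, key=lambda insight: insight["confidence"], reverse=True)


sample_response = {
    "insights": [
        {"type": "prediction", "confidence": 0.78},
        {"type": "anomaly", "confidence": 0.87},
        {"type": "correlation", "confidence": 0.92},
    ]
}
selected_types = [item["type"] for item in actionable_insights(sample_response)]
```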

Get Alert Correlation

GET /api/v1/alerting/aiops/correlate/{alert_id}

Response

{
  "alert_id": "alert_001",
  "correlated_alerts": [
    {
      "alert_id": "alert_002",
      "correlation_score": 0.95,
      "correlation_reason": "Same service, temporal proximity"
    },
    {
      "alert_id": "alert_003",
      "correlation_score": 0.82,
      "correlation_reason": "Downstream dependency"
    }
  ],
  "root_cause_analysis": {
    "probable_cause": "Database connection timeout",
    "confidence": 0.89,
    "evidence": [
      "Connection pool metrics show exhaustion",
      "Database latency increased 5x"
    ],
    "suggested_remediation": [
      "Increase connection pool size",
      "Scale database read replicas"
    ]
  }
}

Suppress Similar Alerts

POST /api/v1/alerting/aiops/suppress

Request Body

{
  "alert_id": "alert_001",
  "suppress_similar": true,
  "duration_minutes": 60,
  "reason": "Known issue, fix in progress"
}

Maintenance Windows

Create Maintenance Window

POST /api/v1/alerting/maintenance

Request Body

{
  "name": "Database Maintenance",
  "description": "Scheduled database upgrade",
  "services": ["database", "api-gateway"],
  "start": "2026-01-25T02:00:00Z",
  "end": "2026-01-25T04:00:00Z",
  "suppress_all": false,
  "suppress_severity": ["P3", "P4", "P5"]
}
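
The sketch below illustrates how the window fields above could combine to decide whether an alert is suppressed. It assumes that an alert is suppressed only when its service is listed and the window is active, and that `suppress_all` overrides `suppress_severity`. The page does not spell out these semantics.

```python
from datetime import datetime, timezone


def _parse_utc(timestamp: str) -> datetime:
    """Parse a 'Z'-suffixed ISO 8601 timestamp into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def is_suppressed(alert: dict, window: dict, now: datetime) -> bool:
    """Return True if the maintenance window suppresses this alert at `now`."""
    if alert["service"] not in window["services"]:
        return False
    if not (_parse_utc(window["start"]) <= now < _parse_utc(window["end"])):
        return False
    if window.get("suppress_all"):
        return True
    return alert["severity"] in window.get("suppress_severity", [])


window = {
    "services": ["database", "api-gateway"],
    "start": "2026-01-25T02:00:00Z",
    "end": "2026-01-25T04:00:00Z",
    "suppress_all": False,
    "suppress_severity": ["P3", "P4", "P5"],
}
during_window = datetime(2026, 1, 25, 3, 0, tzinfo=timezone.utc)
```

Under these assumptions a P3 alert for `api-gateway` at 03:00 UTC is suppressed, while a P1 alert for the same service still fires.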

List Maintenance Windows

GET /api/v1/alerting/maintenance

Incident Management

Create Incident

POST /api/v1/alerting/incidents

Request Body

{
  "title": "API Gateway Outage",
  "severity": "P1",
  "description": "Complete API gateway failure affecting all services",
  "related_alerts": ["alert_001", "alert_002"],
  "commander": "usr_001"
}

Response

{
  "incident_id": "inc_001",
  "title": "API Gateway Outage",
  "severity": "P1",
  "status": "investigating",
  "commander": {
    "user_id": "usr_001",
    "name": "John Smith"
  },
  "chat_channel": {
    "id": "ch_inc_001",
    "name": "#incident-001-api-gateway"
  },
  "timeline": [
    {
      "timestamp": "2026-01-24T21:00:00Z",
      "type": "created",
      "message": "Incident created"
    }
  ],
  "created_at": "2026-01-24T21:00:00Z"
}

Update Incident Status

PUT /api/v1/alerting/incidents/{incident_id}/status

Request Body

{
  "status": "identified",
  "update": "Root cause identified: memory leak in auth service"
}

Status Values

| Status | Description |
|---|---|
| investigating | Initial investigation |
| identified | Root cause identified |
| monitoring | Fix deployed, monitoring |
| resolved | Incident resolved |
| postmortem | Postmortem in progress |
| closed | Fully closed |
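
A client can check the status value against the list above before calling Update Incident Status. The sketch below only checks membership and builds the request body. It makes no assumption about which transitions between statuses the API permits.

```python
# Status values copied from the table above.
INCIDENT_STATUSES = (
    "investigating",
    "identified",
    "monitoring",
    "resolved",
    "postmortem",
    "closed",
)


def build_status_update(status: str, update: str) -> dict:
    """Build the request body for PUT /incidents/{incident_id}/status."""
    if status not in INCIDENT_STATUSES:
        raise ValueError(f"unknown incident status: {status!r}")
    return {"status": status, "update": update}


body = build_status_update(
    "identified", "Root cause identified: memory leak in auth service"
)
```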

Webhooks

| Event | Description |
|---|---|
| alert.created | New alert created |
| alert.acknowledged | Alert acknowledged |
| alert.resolved | Alert resolved |
| alert.escalated | Alert escalated to next level |
| incident.created | Incident created |
| incident.updated | Incident status updated |
| oncall.handoff | On-call rotation handoff |
| schedule.override | Schedule override created |
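
A receiver for these events typically routes each delivery to a handler keyed by event name. The sketch below shows one such dispatcher. The payload shape (`{"event": ..., "data": ...}`) is hypothetical, since this page lists the event names but not the delivery format.

```python
from typing import Callable, Dict, Optional

# Event names copied from the table above.
KNOWN_EVENTS = {
    "alert.created",
    "alert.acknowledged",
    "alert.resolved",
    "alert.escalated",
    "incident.created",
    "incident.updated",
    "oncall.handoff",
    "schedule.override",
}

_handlers: Dict[str, Callable[[dict], str]] = {}


def on(event: str):
    """Decorator that registers a handler for one known event name."""
    if event not in KNOWN_EVENTS:
        raise ValueError(f"unknown event: {event!r}")

    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        _handlers[event] = func
        return func

    return register


def dispatch(payload: dict) -> Optional[str]:
    """Route a payload to its handler; return None if no handler is registered."""
    handler = _handlers.get(payload.get("event"))
    if handler is None:
        return None
    return handler(payload.get("data", {}))  # "data" key is an assumed field


@on("alert.created")
def handle_alert_created(data: dict) -> str:
    return f"new alert {data.get('alert_id')}"
```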

Error Responses

| Status | Code | Description |
|---|---|---|
| 400 | invalid_severity | Invalid severity level |
| 400 | invalid_schedule | Invalid schedule configuration |
| 404 | alert_not_found | Alert ID not found |
| 404 | schedule_not_found | Schedule ID not found |
| 409 | already_acknowledged | Alert already acknowledged |
| 409 | already_resolved | Alert already resolved |
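
Clients often want to treat some of these errors differently. For example, a retry of an acknowledge call that hits `already_acknowledged` can usually be ignored. The sketch below classifies a status and code pair into a suggested client action. The action names are illustrative, and how the code is carried in the error body is not documented here.

```python
# 409 conflicts that indicate the desired state has already been reached.
IDEMPOTENT_CONFLICTS = {"already_acknowledged", "already_resolved"}


def classify_error(status: int, code: str) -> str:
    """Map a documented (status, code) pair to a suggested client action."""
    if status == 409 and code in IDEMPOTENT_CONFLICTS:
        return "ignore"  # the alert is already in the requested state
    if status == 404:
        return "not_found"  # alert or schedule ID does not exist
    if status == 400:
        return "fix_request"  # e.g. invalid_severity, invalid_schedule
    return "unexpected"
```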