Gateway Documentation

This page documents the API gateway architecture and behavior. These are not callable endpoints; they describe how the gateway processes requests.

Rate Limiting & Throttling

The Olympus Cloud API Gateway implements a two-layer rate limiting system to ensure fair usage and protect platform stability.

Always Monitor Rate Limit Headers

Every successful API response includes X-Rate-Limit-Remaining and X-Rate-Limit-Reset headers. Monitor these proactively to avoid hitting 429 errors. Implement exponential backoff and respect the Retry-After header when rate limited.

Overview

Rate limiting is applied at two levels:

Layer	Scope	Backend	Algorithm
Global	Per-endpoint, per-role	In-memory or Redis	Token bucket
Tenant	Per-tenant	Redis	Sliding window

┌──────────────────────────────────────────────────────────────────┐
│                         API REQUEST FLOW                         │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Request → [Global Rate Limit] → [Auth] → [Tenant Rate Limit]    │
│                    │                              │              │
│                    ▼                              ▼              │
│              Check endpoint/role           Check tenant tier     │
│              limits & burst                limits & quota        │
│                    │                              │              │
│                    ▼                              ▼              │
│               ┌─────────┐                    ┌─────────┐         │
│               │  Allow  │                    │  Allow  │ → API   │
│               │   or    │                    │   or    │         │
│               │   429   │                    │   429   │         │
│               └─────────┘                    └─────────┘         │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
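The ordering in the flow diagram can be sketched in JavaScript as follows. This is a minimal illustration, not the gateway's code: `handleRequest`, `globalLimiter`, `authenticate`, `tenantLimiter`, and `api` are hypothetical stand-ins, and returning 401 when authentication fails is an assumption the diagram does not cover.

```javascript
// Hypothetical sketch of the two-layer flow; none of these names are gateway APIs.
function handleRequest(req, { globalLimiter, authenticate, tenantLimiter, api }) {
  // Layer 1: endpoint/role limits and burst, checked before authentication
  if (!globalLimiter.allow(req)) {
    return { status: 429, layer: 'global' };
  }

  // Authentication identifies the tenant needed by the second layer
  const identity = authenticate(req);
  if (!identity) {
    return { status: 401 }; // assumption: auth failure is not shown in the diagram
  }

  // Layer 2: tenant tier limits and quota
  if (!tenantLimiter.allow(identity.tenantId)) {
    return { status: 429, layer: 'tenant' };
  }

  return api(req, identity);
}
```

Because the global check runs first in this ordering, a request rejected by the global layer never reaches authentication.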

Rate Limit Tiers

Endpoint-Specific Limits

Endpoint	Limit	Burst	Notes
`/api/v1/auth/login`	60/min	10	Security-critical
`/api/v1/auth/register`	30/min	5	Prevent abuse
`/api/v1/graphql`	2000/min	100	High-volume queries
`/api/v1/restaurant/orders`	1000/min	50	Order operations
`/healthz`, `/health`	300/min	30	Health checks
`/metrics`	60/min	10	Monitoring

User Role Limits

Role	Limit	Burst	Description
`super_admin`	5000/min	200	Platform administrators
`admin`	5000/min	200	Tenant administrators
`manager`	2000/min	100	Restaurant managers
`staff`	1000/min	50	Restaurant staff
`customer`	500/min	25	End customers
`guest`	100/min	10	Unauthenticated users

Subscription Tier Limits

Tier	Limit	Burst	Monthly Quota
Free	50/min	10	100,000
Trial	100/min	20	500,000
Standard	100/min	20	1,000,000
Professional	500/min	100	10,000,000
Enterprise	Unlimited	Unlimited	Unlimited

Per-Tenant Production Limits (RPS)

Tier	Production	Dev/Staging
Standard	100 RPS	500 RPS
Premium	500 RPS	2500 RPS
Enterprise	2000 RPS	10000 RPS

Response Headers

Successful Requests

All successful responses include rate limit headers:

HTTP/1.1 200 OK
X-Rate-Limit-Limit: 1000
X-Rate-Limit-Remaining: 847
X-Rate-Limit-Reset: 1706108460
X-Rate-Limit-Tier: professional
X-RateLimit-Tenant: tenant-xyz789

Header	Description
`X-Rate-Limit-Limit`	Maximum requests allowed per window
`X-Rate-Limit-Remaining`	Requests remaining in current window
`X-Rate-Limit-Reset`	Unix timestamp when limit resets
`X-Rate-Limit-Tier`	Applied subscription tier
`X-RateLimit-Tenant`	Tenant identifier

Rate Limited Responses

When rate limited, you'll receive a 429 Too Many Requests response:

HTTP/1.1 429 Too Many Requests
X-Rate-Limit-Limit: 100
X-Rate-Limit-Remaining: 0
X-Rate-Limit-Reset: 1706108520
Retry-After: 60
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Tenant rate limit exceeded. Please try again later.",
  "limit": 100,
  "reset_at": 1706108520,
  "retry_after": 60
}
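A client can turn this body into a wait time before retrying. The sketch below assumes the fields shown in the example (`retry_after` in seconds, `reset_at` as a Unix timestamp in seconds); `waitMsFrom429` is a hypothetical helper name.

```javascript
// Hypothetical helper: milliseconds to wait after a 429, based on the JSON body.
function waitMsFrom429(body, nowMs = Date.now()) {
  // Prefer the explicit retry_after hint (seconds)
  if (Number.isFinite(body.retry_after)) {
    return body.retry_after * 1000;
  }
  // Otherwise wait until the window resets; never return a negative wait
  if (Number.isFinite(body.reset_at)) {
    return Math.max(0, body.reset_at * 1000 - nowMs);
  }
  return null; // no hint: caller should fall back to its own backoff
}
```

Typical use is `const waitMs = waitMsFrom429(await response.json());` inside a retry loop.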

Rate Limit Key Strategy

Each request is counted against a single key, chosen in the following priority order:

User ID (authenticated) - user:{user_id}
API Key (if provided) - api:{api_key}
IP Address (fallback) - ip:{client_ip}

┌──────────────────────────────────────────────────────────────────┐
│                     RATE LIMIT KEY SELECTION                     │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Request received                                                │
│       │                                                          │
│       ▼                                                          │
│  Has user_id? ──Yes──▶ Use "user:{user_id}"                      │
│       │                                                          │
│       No                                                         │
│       ▼                                                          │
│  Has API key? ──Yes──▶ Use "api:{api_key}"                       │
│       │                                                          │
│       No                                                         │
│       ▼                                                          │
│  Use "ip:{client_ip}"                                            │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
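The key priority can be expressed as a small function. The request shape (`{ userId, apiKey, clientIp }`) and the name `rateLimitKey` are assumptions for illustration; the key formats are taken from the list above.

```javascript
// Hypothetical sketch of the key priority: user ID, then API key, then client IP.
function rateLimitKey({ userId, apiKey, clientIp }) {
  if (userId) return `user:${userId}`; // authenticated user wins
  if (apiKey) return `api:${apiKey}`;  // then an API key, if provided
  return `ip:${clientIp}`;             // finally fall back to the client IP
}
```

One consequence of this order: unauthenticated clients without an API key that sit behind the same NAT or proxy share a single `ip:` bucket.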

Implementation Details

Token Bucket Algorithm (Global)

The global rate limiter uses the token bucket algorithm:

Bucket capacity: Equal to the burst limit
Refill rate: The per-minute limit divided by 60, in tokens per second
Request cost: 1 token per request

Example: 60 req/min with burst of 10
- Refill rate: 1 token/second
- Bucket capacity: 10 tokens
- Initial tokens: 10 (allows burst)
- Sustained rate: 60 requests/minute
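The parameters above can be sketched as a small token bucket. This is an illustration under stated assumptions, not the gateway's implementation: the class name `TokenBucket` is hypothetical, refill is modeled as continuous (fractional tokens accumulate), and time is passed in as milliseconds so behavior is deterministic.

```javascript
// Hypothetical token bucket: capacity = burst, refill = limitPerMin / 60 tokens/s.
class TokenBucket {
  constructor(limitPerMin, burst, nowMs = 0) {
    this.capacity = burst;
    this.refillPerMs = limitPerMin / 60000; // tokens per millisecond
    this.tokens = burst;                    // starts full, allowing the initial burst
    this.lastMs = nowMs;
  }

  // Returns true and spends 1 token if the request is allowed
  tryConsume(nowMs) {
    // Add tokens for the elapsed time, capped at the bucket capacity
    const elapsed = Math.max(0, nowMs - this.lastMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With `new TokenBucket(60, 10)`, ten calls at the same instant succeed, the eleventh fails, and one further request is admitted for each second that passes, matching the 60 req/min, burst 10 example.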

Sliding Window Algorithm (Tenant)

The tenant rate limiter uses a sliding window:

Window size: 1 second (for RPS limits)
Storage: Redis sorted sets
Atomic operations: Lua scripts for check-and-increment
Automatic cleanup: TTL-based expiration
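A sliding window on Redis sorted sets is commonly built from ZREMRANGEBYSCORE (trim entries older than the window), ZCARD (count what remains), and ZADD (record the new request), run together in one Lua script. The sketch below mimics those steps in memory so it runs without Redis; it is not the gateway's script, `SlidingWindowLimiter` is a hypothetical name, and a single synchronous method stands in for the atomicity the Lua script provides.

```javascript
// Hypothetical in-memory sliding-window log, mirroring the sorted-set steps.
class SlidingWindowLimiter {
  constructor(limit, windowMs = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = new Map(); // key -> timestamps (ms) of accepted requests
  }

  allow(key, nowMs) {
    const cutoff = nowMs - this.windowMs;
    // Like ZREMRANGEBYSCORE: drop entries that fell out of the window
    const entries = (this.log.get(key) || []).filter((t) => t > cutoff);
    // Like ZCARD: reject if the window is already full
    if (entries.length >= this.limit) {
      this.log.set(key, entries);
      return false;
    }
    entries.push(nowMs); // like ZADD with the current timestamp
    this.log.set(key, entries);
    return true;
  }
}
```

Unlike the Redis version, this map never expires idle keys; in Redis the TTL-based cleanup is what bounds memory.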

Handling Rate Limits

Best Practices

Monitor headers: Track X-Rate-Limit-Remaining proactively
Implement backoff: Use exponential backoff on 429 responses
Respect Retry-After: Wait the specified duration before retrying
Batch requests: Combine multiple operations where possible
Cache responses: Reduce unnecessary API calls

Retry Strategy

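// Promise-based delay; JavaScript has no built-in sleep()
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function apiRequestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    // No point sleeping after the final attempt
    if (attempt === maxRetries - 1) {
      break;
    }

    // Retry-After arrives as a string of seconds; if it is missing or not
    // numeric, fall back to exponential backoff (1s, 2s, 4s, ...)
    const retryAfter = Number.parseInt(response.headers.get('Retry-After'), 10);
    const waitSeconds = Number.isNaN(retryAfter) ? 2 ** attempt : retryAfter;
    console.log(`Rate limited. Retrying in ${waitSeconds}s`);
    await sleep(waitSeconds * 1000);
  }
  throw new Error('Max retries exceeded');
}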

Monitoring Rate Limits

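function checkRateLimitHeaders(response) {
  // Header values are strings; parse them as base-10 integers
  const remaining = Number.parseInt(response.headers.get('X-Rate-Limit-Remaining'), 10);
  const limit = Number.parseInt(response.headers.get('X-Rate-Limit-Limit'), 10);
  const reset = Number.parseInt(response.headers.get('X-Rate-Limit-Reset'), 10);

  // Skip if headers are missing or malformed (avoids NaN and divide-by-zero)
  if (Number.isNaN(remaining) || Number.isNaN(limit) || limit <= 0) {
    return null;
  }

  const usagePercent = ((limit - remaining) / limit) * 100;

  if (usagePercent > 80) {
    console.warn(`Rate limit warning: ${usagePercent.toFixed(1)}% used`);
    // X-Rate-Limit-Reset is a Unix timestamp in seconds
    if (!Number.isNaN(reset)) {
      console.warn(`Resets at: ${new Date(reset * 1000).toISOString()}`);
    }
  }

  return { remaining, limit, reset, usagePercent };
}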

Special Cases

Admin Override

Admin users (super_admin, admin) receive higher limits regardless of subscription tier:

Minimum 5000 req/min even on Free tier
Burst allowance of 200 requests
Useful for administrative operations
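The admin override can be read as a floor on whatever the subscription tier allows. The sketch assumes that reading (`Math.max` against the tier's own values, applied to burst as well as rate); `effectiveLimit`, `ADMIN_ROLES`, and the `{ perMin, burst }` shape are hypothetical.

```javascript
// Hypothetical sketch: admin roles get at least 5000 req/min and a burst of 200.
const ADMIN_ROLES = new Set(['super_admin', 'admin']);

function effectiveLimit(role, tierLimit) {
  if (!ADMIN_ROLES.has(role)) {
    return tierLimit; // non-admins keep their tier's limits
  }
  return {
    perMin: Math.max(tierLimit.perMin, 5000), // floor, so higher tiers are not lowered
    burst: Math.max(tierLimit.burst, 200),
  };
}
```

Using a floor rather than a fixed value means an admin on a tier with higher limits (for example, Enterprise's unlimited rate, modeled here as `Infinity`) is never reduced to 5000/min.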

Security Endpoints

Security Endpoint Limits Cannot Be Overridden

Authentication endpoints (/auth/login, /auth/register, /auth/forgot-password) have hard rate limits that apply to all tiers, including Enterprise. These limits exist to prevent brute force attacks and abuse, and cannot be increased via subscription upgrades or support requests.

The hard limits are:

Endpoint	Hard Limit	Reason
`/auth/login`	60/min	Brute force protection
`/auth/register`	30/min	Account spam prevention
`/auth/forgot-password`	10/min	Abuse prevention
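One way to guarantee these limits cannot be raised is to apply them after every tier, role, and admin adjustment. In the sketch below the paths and values come from the table; `limitFor` is hypothetical, and treating each hard limit as a ceiling (`Math.min`) is an assumption. Whether the gateway matches these short paths or the `/api/v1/...` forms is not specified here.

```javascript
// Hard limits from the table above (requests per minute)
const HARD_LIMITS_PER_MIN = new Map([
  ['/auth/login', 60],
  ['/auth/register', 30],
  ['/auth/forgot-password', 10],
]);

// Hypothetical: applied last, so no tier, role, or admin override can exceed it.
function limitFor(path, adjustedLimitPerMin) {
  const hard = HARD_LIMITS_PER_MIN.get(path);
  return hard === undefined ? adjustedLimitPerMin : Math.min(hard, adjustedLimitPerMin);
}
```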

Fail-Open Behavior

If the Redis rate limiter is unavailable, the system fails open:

Requests are allowed through without the Redis-backed check
A warning is logged for monitoring
A Redis outage degrades rate limiting rather than taking the API down
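Failing open amounts to treating a limiter error as "allow". In the sketch below, `checkTenantLimit` is a hypothetical async function that resolves to `true`/`false` and rejects when Redis cannot be reached; `allowWithFailOpen` and the injectable `logger` are also illustrative.

```javascript
// Hypothetical fail-open wrapper around a Redis-backed limit check.
async function allowWithFailOpen(checkTenantLimit, tenantId, logger = console) {
  try {
    return await checkTenantLimit(tenantId); // normal path: limiter decides
  } catch (err) {
    // Limiter unavailable: log for monitoring and let the request through
    logger.warn(`Rate limiter unavailable, failing open for ${tenantId}: ${err.message}`);
    return true;
  }
}
```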

Increasing Limits

Upgrade Subscription

Contact sales to upgrade your subscription tier for higher limits:

Current Tier	Upgrade Options
Free	Trial, Standard
Standard	Professional
Professional	Enterprise
Enterprise	Custom limits available

Request Limit Increase

For Enterprise customers, custom limits can be configured:

Open a support ticket
Provide use case justification
Specify required limits (RPS, burst, monthly quota)
Limits applied within 24 hours

Monitoring & Alerting

Metrics Available

Metric	Description
`rate_limit_requests_total`	Total requests by endpoint
`rate_limit_exceeded_total`	429 responses by tenant
`rate_limit_remaining`	Current remaining quota
`rate_limit_latency_ms`	Rate check latency

Recommended Alerts

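groups:
  - name: rate-limit-alerts  # group name is arbitrary
    rules:
      # Alert when approaching limit (assumes a rate_limit_limit gauge is
      # exported alongside rate_limit_remaining; it is not listed above)
      - alert: RateLimitWarning
        expr: rate_limit_remaining < (rate_limit_limit * 0.2)
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Rate limit usage above 80%"

      # Alert on frequent 429s (more than 10 per second, averaged over 5m)
      - alert: RateLimitExceeded
        expr: rate(rate_limit_exceeded_total[5m]) > 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High rate of 429 responses"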

FAQ

Why am I getting 429 errors?

Check your current tier limits
Review request patterns for spikes
Implement caching to reduce calls
Consider upgrading your subscription

How do burst limits work?

Burst allows temporary spikes above the sustained rate:

Standard: 100/min with burst of 20
You can send 20 requests instantly
Then you must wait for tokens to refill (100/60, about 1.67 per second)
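The Standard example can be worked through numerically. This sketch assumes a bucket that starts full and refills continuously at limit / 60 tokens per second; `maxRequestsBy` and `secondsPerToken` are hypothetical helper names.

```javascript
// Most requests a greedy client can send by time `seconds`: the full burst
// at t = 0, then every token as soon as it is refilled.
function maxRequestsBy(limitPerMin, burst, seconds) {
  return Math.floor(burst + (limitPerMin * seconds) / 60);
}

// Spacing between requests once the bucket is empty
function secondsPerToken(limitPerMin) {
  return 60 / limitPerMin;
}

console.log(maxRequestsBy(100, 20, 0));  // 20: the instant burst
console.log(maxRequestsBy(100, 20, 60)); // 120: burst plus one minute of refill
console.log(secondsPerToken(100));       // 0.6
```

So under these assumptions a Standard client can make at most 120 requests in its first minute, then settles to the sustained 100/min.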

Can I get unlimited API access?

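Enterprise tier has no per-minute subscription limit or monthly quota. Contact sales for pricing. The per-tenant RPS limits and the hard limits on security endpoints still apply to Enterprise tenants.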

Do WebSocket connections count against limits?

WebSocket connections are rate-limited separately:

Connection establishment: 10/min per user
Messages: 100/min per connection

Related Documentation

Authentication - API authentication
WebSocket Connection - Real-time APIs
Error Handling - Error responses
