Gateway Documentation

This page documents the API gateway's architecture and behavior. The items described here are not callable endpoints; they describe how the gateway processes requests.

Rate Limiting & Throttling

The Olympus Cloud API Gateway implements a two-layer rate limiting system to ensure fair usage and protect platform stability.

Always Monitor Rate Limit Headers

Every successful API response includes X-Rate-Limit-Remaining and X-Rate-Limit-Reset headers. Monitor these proactively to avoid hitting 429 errors. Implement exponential backoff and respect the Retry-After header when rate limited.

Overview

Rate limiting is applied at two levels:

| Layer | Scope | Backend | Algorithm |
|-------|-------|---------|-----------|
| Global | Per-endpoint, per-role | In-memory or Redis | Token bucket |
| Tenant | Per-tenant | Redis | Sliding window |

┌─────────────────────────────────────────────────────────────────┐
│                        API REQUEST FLOW                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Request → [Global Rate Limit] → [Auth] → [Tenant Rate Limit]   │
│                     │                              │            │
│                     ▼                              ▼            │
│            Check endpoint/role             Check tenant tier    │
│            limits & burst                  limits & quota       │
│                     │                              │            │
│                     ▼                              ▼            │
│                ┌─────────┐                    ┌─────────┐       │
│                │  Allow  │                    │  Allow  │ → API │
│                │   or    │                    │   or    │       │
│                │   429   │                    │   429   │       │
│                └─────────┘                    └─────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
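
The flow above can be read as a short pipeline. The following is a minimal sketch, not the gateway's actual code: `processRequest`, the three injected functions, the returned object shape, and the 401 status for failed authentication are all assumptions made for illustration.

```javascript
// Sketch of the two-layer flow: global check, then auth, then tenant check.
// All names here are hypothetical; the gateway's internals are not documented.
function processRequest(request, { globalAllows, authenticate, tenantAllows }) {
  // Layer 1: endpoint/role limits are checked before authentication.
  if (!globalAllows(request)) {
    return { allowed: false, status: 429, layer: 'global' };
  }

  // Authentication sits between the two layers (401 is an assumed status).
  const identity = authenticate(request);
  if (!identity) {
    return { allowed: false, status: 401, layer: 'auth' };
  }

  // Layer 2: tenant tier limits and quota need the authenticated tenant.
  if (!tenantAllows(identity.tenantId)) {
    return { allowed: false, status: 429, layer: 'tenant' };
  }

  return { allowed: true, tenantId: identity.tenantId };
}
```

Running the global check first means an unauthenticated flood is rejected before any authentication work is done.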

Rate Limit Tiers

Endpoint-Specific Limits

| Endpoint | Limit | Burst | Notes |
|----------|-------|-------|-------|
| /api/v1/auth/login | 60/min | 10 | Security-critical |
| /api/v1/auth/register | 30/min | 5 | Prevent abuse |
| /api/v1/graphql | 2000/min | 100 | High-volume queries |
| /api/v1/restaurant/orders | 1000/min | 50 | Order operations |
| /healthz, /health | 300/min | 30 | Health checks |
| /metrics | 60/min | 10 | Monitoring |

User Role Limits

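| Role | Limit | Burst | Description |
|------|-------|-------|-------------|
| super_admin | 5000/min | 200 | Platform administrators |
| admin | 5000/min | 200 | Tenant administrators |
| manager | 2000/min | 100 | Restaurant managers |
| staff | 1000/min | 50 | Restaurant staff |
| customer | 500/min | 25 | End customers |
| guest | 100/min | 10 | Unauthenticated users |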

Subscription Tier Limits

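| Tier | Limit | Burst | Monthly Quota |
|------|-------|-------|---------------|
| Free | 50/min | 10 | 100,000 |
| Trial | 100/min | 20 | 500,000 |
| Standard | 100/min | 20 | 1,000,000 |
| Professional | 500/min | 100 | 10,000,000 |
| Enterprise | Unlimited | Unlimited | Unlimited |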

Per-Tenant Production Limits (RPS)

| Tier | Production | Dev/Staging |
|------|------------|-------------|
| Standard | 100 RPS | 500 RPS |
| Premium | 500 RPS | 2500 RPS |
| Enterprise | 2000 RPS | 10000 RPS |

Response Headers

Successful Requests

All successful responses include rate limit headers:

HTTP/1.1 200 OK
X-Rate-Limit-Limit: 1000
X-Rate-Limit-Remaining: 847
X-Rate-Limit-Reset: 1706108460
X-Rate-Limit-Tier: professional
X-RateLimit-Tenant: tenant-xyz789

| Header | Description |
|--------|-------------|
| X-Rate-Limit-Limit | Maximum requests allowed per window |
| X-Rate-Limit-Remaining | Requests remaining in current window |
| X-Rate-Limit-Reset | Unix timestamp when limit resets |
| X-Rate-Limit-Tier | Applied subscription tier |
| X-RateLimit-Tenant | Tenant identifier |

Rate Limited Responses

When rate limited, you'll receive a 429 Too Many Requests response:

HTTP/1.1 429 Too Many Requests
X-Rate-Limit-Limit: 100
X-Rate-Limit-Remaining: 0
X-Rate-Limit-Reset: 1706108520
Retry-After: 60
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Tenant rate limit exceeded. Please try again later.",
  "limit": 100,
  "reset_at": 1706108520,
  "retry_after": 60
}
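
The example above sends `Retry-After` as a number of seconds. HTTP also permits an HTTP-date in this header, so a defensive client may want to handle both forms. The helper below is a sketch under that assumption; `retryAfterToMs` is a hypothetical name, and nothing in this page states that the gateway ever sends the date form.

```javascript
// Convert a Retry-After header value into a wait time in milliseconds.
// Returns null when the header is missing or cannot be parsed.
function retryAfterToMs(value, nowMs = Date.now()) {
  if (value == null || value.trim() === '') {
    return null;
  }
  // Delay-seconds form, e.g. "60".
  if (/^\d+$/.test(value.trim())) {
    return Number(value.trim()) * 1000;
  }
  // HTTP-date form, e.g. "Wed, 24 Jan 2024 15:02:00 GMT".
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) {
    return null;
  }
  // A date in the past means the client may retry immediately.
  return Math.max(0, dateMs - nowMs);
}
```

Callers can fall back to the `reset_at` field of the response body, or to a fixed default, when the helper returns `null`.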

Rate Limit Key Strategy

Rate limits are applied based on the following key priority:

  1. User ID (authenticated) - user:{user_id}
  2. API Key (if provided) - api:{api_key}
  3. IP Address (fallback) - ip:{client_ip}

┌─────────────────────────────────────────────────────────────────┐
│                    RATE LIMIT KEY SELECTION                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Request received                                               │
│        │                                                        │
│        ▼                                                        │
│  Has user_id? ──Yes──▶ Use "user:{user_id}"                     │
│        │                                                        │
│        No                                                       │
│        ▼                                                        │
│  Has API key? ──Yes──▶ Use "api:{api_key}"                      │
│        │                                                        │
│        No                                                       │
│        ▼                                                        │
│  Use "ip:{client_ip}"                                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
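
The priority order above maps directly onto a small function. This is an illustrative sketch: the function name and the request fields (`userId`, `apiKey`, `clientIp`) are assumptions, while the key formats are the ones listed above.

```javascript
// Pick the rate limit key in priority order: user ID, then API key, then IP.
// The request shape { userId, apiKey, clientIp } is hypothetical.
function selectRateLimitKey(request) {
  if (request.userId) {
    return `user:${request.userId}`;
  }
  if (request.apiKey) {
    return `api:${request.apiKey}`;
  }
  return `ip:${request.clientIp}`;
}
```

Because the user ID wins over the API key, an authenticated user who also sends an API key is counted against the `user:` key only.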

Implementation Details

Token Bucket Algorithm (Global)

The global rate limiter uses the token bucket algorithm:

  • Bucket capacity: equal to the burst limit
  • Refill rate: the sustained limit expressed in tokens per second (limit per minute ÷ 60)
  • Request cost: 1 token per request

Example: 60 req/min with burst of 10
- Refill rate: 1 token/second
- Bucket capacity: 10 tokens
- Initial tokens: 10 (allows burst)
- Sustained rate: 60 requests/minute
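
To make the example concrete, here is a minimal in-memory token bucket. It is a sketch of the general technique and not the gateway's implementation; the class name, method names, and the choice to pass the current time explicitly (in milliseconds) are assumptions made so the behavior is easy to follow.

```javascript
// Minimal token bucket: capacity = burst, refill = limit per minute / 60.
class TokenBucket {
  constructor(limitPerMinute, burst, now = Date.now()) {
    this.refillPerSecond = limitPerMinute / 60;
    this.capacity = burst;
    this.tokens = burst; // start full so an initial burst is allowed
    this.lastRefill = now;
  }

  // Returns true and spends one token if the request is allowed.
  tryConsume(now = Date.now()) {
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    // Add the tokens earned since the last call, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 60 req/min with a burst of 10, as in the example above.
const bucket = new TokenBucket(60, 10, 0);
const burstResults = Array.from({ length: 11 }, () => bucket.tryConsume(0));
// The first 10 calls succeed; the 11th fails until a token is refilled.
```

With these settings one token is refilled per second, so a single further request succeeds one second after the burst is exhausted.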

Sliding Window Algorithm (Tenant)

The tenant rate limiter uses a sliding window:

  • Window size: 1 second (for RPS limits)
  • Storage: Redis sorted sets
  • Atomic operations: Lua scripts for check-and-increment
  • Automatic cleanup: TTL-based expiration
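
The sliding window can be illustrated without Redis. The sketch below keeps a per-tenant list of request timestamps in memory, standing in for the sorted set; in a Redis-backed limiter the same trim, count, and add steps would typically run inside one Lua script so they stay atomic. The class and method names are hypothetical.

```javascript
// In-memory sliding window log: allow at most `limit` requests per window.
class SlidingWindowLimiter {
  constructor(limit, windowMs = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = new Map(); // tenantId -> timestamps (ms) of allowed requests
  }

  allow(tenantId, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Drop entries that have slid out of the window.
    const recent = (this.log.get(tenantId) || []).filter((t) => t > cutoff);

    const allowed = recent.length < this.limit;
    if (allowed) {
      recent.push(now); // only allowed requests count against the window
    }
    this.log.set(tenantId, recent);
    return allowed;
  }
}
```

Unlike a fixed window that resets on the second boundary, this approach counts the requests in the last `windowMs` milliseconds relative to each incoming request, so a tenant cannot double its rate by straddling a boundary.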

Handling Rate Limits

Best Practices

  1. Monitor headers: Track X-Rate-Limit-Remaining proactively
  2. Implement backoff: Use exponential backoff on 429 responses
  3. Respect Retry-After: Wait the specified duration before retrying
  4. Batch requests: Combine multiple operations where possible
  5. Cache responses: Reduce unnecessary API calls
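
Practices 2 and 3 can be combined in one delay calculation: grow the wait exponentially with each attempt, add jitter so clients do not retry in lockstep, and never wait less than the server asked for. The function below is a sketch; its name, the default base and cap, and the "half fixed, half random" jitter scheme are assumptions rather than gateway requirements.

```javascript
// Compute how long to wait before retry number `attempt` (0-based).
// retryAfterSeconds is the parsed Retry-After value, or NaN if absent.
function backoffDelayMs(
  attempt,
  retryAfterSeconds,
  { baseMs = 1000, maxMs = 60000, random = Math.random } = {}
) {
  // Exponential growth: baseMs, 2x, 4x, ... capped at maxMs.
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  // Jitter: wait between 50% and 100% of the exponential delay.
  const jittered = exponential / 2 + random() * (exponential / 2);
  // Respect Retry-After as a lower bound when the server provides it.
  const serverMinimum = Number.isFinite(retryAfterSeconds)
    ? retryAfterSeconds * 1000
    : 0;
  return Math.max(jittered, serverMinimum);
}
```

Passing `random` as an option makes the function deterministic in tests; production callers can rely on the `Math.random` default.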

Retry Strategy

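// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function apiRequestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status === 429) {
      // Retry-After is read as whole seconds; fall back to 60s if it is
      // missing, zero, or not a number.
      const retryAfter = parseInt(response.headers.get('Retry-After'), 10) || 60;
      console.log(`Rate limited. Retrying in ${retryAfter}s`);
      await sleep(retryAfter * 1000);
      continue;
    }

    return response;
  }
  throw new Error('Max retries exceeded');
}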

Monitoring Rate Limits

function checkRateLimitHeaders(response) {
  const remaining = parseInt(response.headers.get('X-Rate-Limit-Remaining'), 10);
  const limit = parseInt(response.headers.get('X-Rate-Limit-Limit'), 10);
  const reset = parseInt(response.headers.get('X-Rate-Limit-Reset'), 10);

  // Skip the check when a header is missing or the limit is not numeric.
  if ([remaining, limit, reset].some(Number.isNaN) || limit <= 0) {
    return null;
  }

  const usagePercent = ((limit - remaining) / limit) * 100;

  if (usagePercent > 80) {
    console.warn(`Rate limit warning: ${usagePercent.toFixed(1)}% used`);
    console.warn(`Resets at: ${new Date(reset * 1000).toISOString()}`);
  }

  return { remaining, limit, reset, usagePercent };
}

Special Cases

Admin Override

Admin users (super_admin, admin) receive higher limits regardless of subscription tier:

  • Minimum 5000 req/min even on Free tier
  • Burst allowance of 200 requests
  • Useful for administrative operations
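
One way to read the override is as a floor applied on top of the subscription tier. The sketch below encodes that reading using the tier values from the Subscription Tier Limits table. How the gateway actually combines role and tier limits is not specified here, so taking the maximum of the two is an assumption, as are all the names.

```javascript
// Tier values copied from the Subscription Tier Limits table (per minute).
const TIER_LIMITS = {
  free: { limit: 50, burst: 10 },
  trial: { limit: 100, burst: 20 },
  standard: { limit: 100, burst: 20 },
  professional: { limit: 500, burst: 100 },
  enterprise: { limit: Infinity, burst: Infinity },
};

const ADMIN_ROLES = new Set(['super_admin', 'admin']);
const ADMIN_FLOOR = { limit: 5000, burst: 200 };

// Assumed rule: admins get at least the admin floor, never less than the tier.
function effectiveLimit(tier, role) {
  const base = TIER_LIMITS[tier];
  if (!base) {
    throw new Error(`Unknown tier: ${tier}`);
  }
  if (!ADMIN_ROLES.has(role)) {
    return { ...base };
  }
  return {
    limit: Math.max(base.limit, ADMIN_FLOOR.limit),
    burst: Math.max(base.burst, ADMIN_FLOOR.burst),
  };
}
```

Under this reading an admin on the Free tier gets 5000 req/min with a burst of 200, while a staff user on the same tier stays at the tier's 50 req/min.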

Security Endpoints

Security Endpoint Limits Cannot Be Overridden

Authentication endpoints (/auth/login, /auth/register, /auth/forgot-password) have hard rate limits that apply to all tiers, including Enterprise. These limits exist to prevent brute force attacks and abuse, and cannot be increased via subscription upgrades or support requests.

The hard limits are:

| Endpoint | Hard Limit | Reason |
|----------|------------|--------|
| /auth/login | 60/min | Brute force protection |
| /auth/register | 30/min | Account spam prevention |
| /auth/forgot-password | 10/min | Abuse prevention |

Fail-Open Behavior

If the Redis rate limiter is unavailable, the system fails open:

  • Requests are allowed through
  • Warning logged for monitoring
  • Graceful degradation prevents outages
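
The fail-open pattern amounts to treating a limiter error as "allow" while recording that it happened. A minimal sketch follows; `checkWithFailOpen` and the `checkLimit` callback are hypothetical names, and the real gateway's logging and error handling are not described in this page.

```javascript
// Wrap a rate limit check so that backend failures let the request through.
// checkLimit(key) is assumed to return a Promise<boolean> and to reject
// when the backing store (for example Redis) is unreachable.
async function checkWithFailOpen(checkLimit, key, logger = console) {
  try {
    return await checkLimit(key);
  } catch (err) {
    // Fail open: log for monitoring, then allow the request.
    logger.warn(`Rate limiter unavailable, allowing ${key}: ${err.message}`);
    return true;
  }
}
```

The trade-off is that limits are not enforced while the backend is down, which is why the warning log matters: it is the only signal that requests are passing unchecked.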

Increasing Limits

Upgrade Subscription

Contact sales to upgrade your subscription tier for higher limits:

| Current Tier | Upgrade Options |
|--------------|-----------------|
| Free | Trial, Standard |
| Standard | Professional |
| Professional | Enterprise |
| Enterprise | Custom limits available |

Request Limit Increase

For Enterprise customers, custom limits can be configured:

  1. Open a support ticket
  2. Provide use case justification
  3. Specify required limits (RPS, burst, monthly quota)
  4. Limits applied within 24 hours

Monitoring & Alerting

Metrics Available

| Metric | Description |
|--------|-------------|
| rate_limit_requests_total | Total requests by endpoint |
| rate_limit_exceeded_total | 429 responses by tenant |
| rate_limit_remaining | Current remaining quota |
| rate_limit_latency_ms | Rate check latency |

# Alert when approaching limit
- alert: RateLimitWarning
  expr: rate_limit_remaining < (rate_limit_limit * 0.2)
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Rate limit usage above 80%"

# Alert on frequent 429s
- alert: RateLimitExceeded
  expr: rate(rate_limit_exceeded_total[5m]) > 10
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "High rate of 429 responses"

FAQ

Why am I getting 429 errors?

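A 429 means a request exceeded either a global (endpoint or role) limit or your tenant's limit. To resolve it:

  1. Check your current tier limits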
  2. Review request patterns for spikes
  3. Implement caching to reduce calls
  4. Consider upgrading your subscription

How do burst limits work?

Burst allows temporary spikes above the sustained rate:

  • Standard: 100/min with burst of 20
  • You can send 20 requests instantly
  • Then must wait for token refill (1.67/sec)
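
The refill figure follows from simple arithmetic: 100 tokens per minute is 100 / 60 ≈ 1.67 tokens per second, so one token takes 0.6 seconds and a full burst of 20 takes 12 seconds to rebuild. The helper below just encodes that calculation; its name is hypothetical.

```javascript
// Seconds needed to refill `tokens` tokens at a sustained per-minute limit.
function refillSeconds(limitPerMinute, tokens) {
  return (tokens * 60) / limitPerMinute;
}

const oneToken = refillSeconds(100, 1);   // 0.6 seconds per token
const fullBurst = refillSeconds(100, 20); // 12 seconds for the whole burst
```

In practice this means that after spending the full burst, a Standard-tier client can send one request roughly every 0.6 seconds, or wait 12 seconds to regain the whole burst allowance.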

Can I get unlimited API access?

Enterprise tier includes unlimited rate limits, with one exception: the hard limits on authentication endpoints still apply. Contact sales for pricing.

Do WebSocket connections count against limits?

WebSocket connections are rate-limited separately:

  • Connection establishment: 10/min per user
  • Messages: 100/min per connection