This page documents the API gateway's architecture and behavior. It does not describe callable endpoints. It describes how the gateway processes requests.
Rate Limiting & Throttling
The Olympus Cloud API Gateway implements a two-layer rate limiting system to ensure fair usage and protect platform stability.
Every successful API response includes `X-Rate-Limit-Remaining` and `X-Rate-Limit-Reset` headers. Monitor these proactively to avoid hitting 429 errors. When you are rate limited, implement exponential backoff and respect the `Retry-After` header.
Overview
Rate limiting is applied at two levels:
| Layer | Scope | Backend | Algorithm |
|---|---|---|---|
| Global | Per-endpoint, per-role | In-memory or Redis | Token bucket |
| Tenant | Per-tenant | Redis | Sliding window |
```text
┌─────────────────────────────────────────────────────────────────┐
│                        API REQUEST FLOW                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Request → [Global Rate Limit] → [Auth] → [Tenant Rate Limit]   │
│                     │                              │            │
│                     ▼                              ▼            │
│            Check endpoint/role             Check tenant tier    │
│            limits & burst                  limits & quota       │
│                     │                              │            │
│                     ▼                              ▼            │
│                ┌─────────┐                    ┌─────────┐       │
│                │  Allow  │                    │  Allow  │ → API │
│                │   or    │                    │   or    │       │
│                │   429   │                    │   429   │       │
│                └─────────┘                    └─────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
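The ordering in the request flow can be sketched as a small pipeline. This is a hypothetical illustration, not gateway code. `processRequest`, the limiter objects with an `allow(...)` method, `authenticate`, and `handler` are all names invented for this example. It shows three things:

- The global check runs first.
- A global 429 short-circuits before authentication.
- The tenant check uses the tenant that authentication resolved.

```javascript
// Hypothetical sketch of the two-layer flow; these names are illustrative, not gateway APIs.
// Limiters expose allow(subject) -> boolean; authenticate(req) returns { tenantId }.
// How unauthenticated (guest) requests are resolved is outside the scope of this sketch.
function processRequest(req, { globalLimiter, authenticate, tenantLimiter, handler }) {
  // Layer 1: global endpoint/role limit, evaluated before authentication.
  if (!globalLimiter.allow(req)) {
    return { status: 429, layer: 'global' };
  }
  // Authentication identifies the tenant needed by the second layer.
  const { tenantId } = authenticate(req);
  // Layer 2: tenant tier limit and quota.
  if (!tenantLimiter.allow(tenantId)) {
    return { status: 429, layer: 'tenant' };
  }
  return handler(req, { tenantId });
}
```

Because the global layer rejects first, abusive traffic is turned away before it costs an authentication lookup.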
Rate Limit Tiers
Endpoint-Specific Limits
| Endpoint | Limit | Burst | Notes |
|---|---|---|---|
| `/api/v1/auth/login` | 60/min | 10 | Security-critical |
| `/api/v1/auth/register` | 30/min | 5 | Prevent abuse |
| `/api/v1/graphql` | 2000/min | 100 | High-volume queries |
| `/api/v1/restaurant/orders` | 1000/min | 50 | Order operations |
| `/healthz`, `/health` | 300/min | 30 | Health checks |
| `/metrics` | 60/min | 10 | Monitoring |
User Role Limits
| Role | Limit | Burst | Description |
|---|---|---|---|
| `super_admin` | 5000/min | 200 | Platform administrators |
| `admin` | 5000/min | 200 | Tenant administrators |
| `manager` | 2000/min | 100 | Restaurant managers |
| `staff` | 1000/min | 50 | Restaurant staff |
| `customer` | 500/min | 25 | End customers |
| `guest` | 100/min | 10 | Unauthenticated users |
Subscription Tier Limits
| Tier | Limit | Burst | Monthly Quota |
|---|---|---|---|
| Free | 50/min | 10 | 100,000 |
| Trial | 100/min | 20 | 500,000 |
| Standard | 100/min | 20 | 1,000,000 |
| Professional | 500/min | 100 | 10,000,000 |
| Enterprise | Unlimited | Unlimited | Unlimited |
Per-Tenant Limits by Environment (RPS)
| Tier | Production | Dev/Staging |
|---|---|---|
| Standard | 100 RPS | 500 RPS |
| Premium | 500 RPS | 2500 RPS |
| Enterprise | 2000 RPS | 10000 RPS |
Response Headers
Successful Requests
All successful responses include rate limit headers:
```http
HTTP/1.1 200 OK
X-Rate-Limit-Limit: 1000
X-Rate-Limit-Remaining: 847
X-Rate-Limit-Reset: 1706108460
X-Rate-Limit-Tier: professional
X-RateLimit-Tenant: tenant-xyz789
```
| Header | Description |
|---|---|
| `X-Rate-Limit-Limit` | Maximum requests allowed per window |
| `X-Rate-Limit-Remaining` | Requests remaining in the current window |
| `X-Rate-Limit-Reset` | Unix timestamp (seconds) when the current window resets |
| `X-Rate-Limit-Tier` | Applied subscription tier |
| `X-RateLimit-Tenant` | Tenant identifier |
Rate Limited Responses
When rate limited, you'll receive a 429 Too Many Requests response:
```http
HTTP/1.1 429 Too Many Requests
X-Rate-Limit-Limit: 100
X-Rate-Limit-Remaining: 0
X-Rate-Limit-Reset: 1706108520
Retry-After: 60
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Tenant rate limit exceeded. Please try again later.",
  "limit": 100,
  "reset_at": 1706108520,
  "retry_after": 60
}
```
Rate Limit Key Strategy
Rate limits are keyed on the first identifier available, in this priority order:
- User ID (authenticated): `user:{user_id}`
- API key (if provided): `api:{api_key}`
- IP address (fallback): `ip:{client_ip}`
```text
┌─────────────────────────────────────────────────────────────────┐
│                    RATE LIMIT KEY SELECTION                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Request received                                              │
│         │                                                       │
│         ▼                                                       │
│   Has user_id? ──Yes──▶ Use "user:{user_id}"                    │
│         │                                                       │
│         No                                                      │
│         ▼                                                       │
│   Has API key? ──Yes──▶ Use "api:{api_key}"                     │
│         │                                                       │
│         No                                                      │
│         ▼                                                       │
│   Use "ip:{client_ip}"                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
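The key selection order can be expressed as a short function. This is a minimal sketch. The request shape `{ userId, apiKey, clientIp }` and the name `selectRateLimitKey` are assumptions made for illustration. Only the key prefixes and their priority come from the description above.

```javascript
// Sketch of the documented key priority: user ID, then API key, then client IP.
// The shape of `req` ({ userId, apiKey, clientIp }) is a hypothetical assumption.
function selectRateLimitKey(req) {
  if (req.userId) {
    return `user:${req.userId}`;
  }
  if (req.apiKey) {
    return `api:${req.apiKey}`;
  }
  return `ip:${req.clientIp}`;
}

console.log(selectRateLimitKey({ userId: 'u_42', apiKey: 'k_1', clientIp: '203.0.113.7' })); // user:u_42
console.log(selectRateLimitKey({ apiKey: 'k_1', clientIp: '203.0.113.7' }));                 // api:k_1
console.log(selectRateLimitKey({ clientIp: '203.0.113.7' }));                                // ip:203.0.113.7
```

One consequence of this order is that an authenticated user who also sends an API key is counted against their user key, not the API key.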
Implementation Details
Token Bucket Algorithm (Global)
The global rate limiter uses the token bucket algorithm:
- Bucket capacity: Equal to burst limit
- Refill rate: the per-minute limit divided by 60, in tokens added per second
- Request cost: 1 token per request
Example: 60 req/min with burst of 10
- Refill rate: 1 token/second
- Bucket capacity: 10 tokens
- Initial tokens: 10 (allows burst)
- Sustained rate: 60 requests/minute
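The parameters above map directly onto a small in-memory token bucket. This is an illustrative sketch, not the gateway's implementation. The class name and the convention of passing the current time in seconds are assumptions chosen to keep the example deterministic.

```javascript
// Minimal in-memory token bucket using the parameters described above (illustrative only).
// Time is passed in explicitly, in seconds, so the behavior is deterministic.
class TokenBucket {
  constructor(limitPerMinute, burst, nowSeconds = 0) {
    this.capacity = burst;                       // bucket capacity = burst limit
    this.refillPerSecond = limitPerMinute / 60;  // e.g. 60/min -> 1 token per second
    this.tokens = burst;                         // start full, so an initial burst is allowed
    this.lastRefill = nowSeconds;
  }

  tryConsume(nowSeconds, cost = 1) {
    // Add tokens for the elapsed time, never exceeding capacity.
    const elapsed = Math.max(0, nowSeconds - this.lastRefill);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = nowSeconds;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;  // request allowed
    }
    return false;   // request would receive a 429
  }
}

// 60 req/min with burst 10: ten immediate requests pass and the eleventh is rejected.
// One second later, exactly one token has been refilled.
const bucket = new TokenBucket(60, 10, 0);
const burstResults = Array.from({ length: 11 }, () => bucket.tryConsume(0));
console.log(burstResults.filter(Boolean).length); // 10
console.log(bucket.tryConsume(1)); // true
console.log(bucket.tryConsume(1)); // false
```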
Sliding Window Algorithm (Tenant)
The tenant rate limiter uses a sliding window:
- Window size: 1 second (for RPS limits)
- Storage: Redis sorted sets
- Atomic operations: Lua scripts for check-and-increment
- Automatic cleanup: TTL-based expiration
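A sliding-window log can be illustrated in a single process with one array of timestamps per tenant. This sketch is an in-memory stand-in, not the gateway's Redis/Lua code. The class and method names are hypothetical. The comments mapping each step to Redis sorted-set commands (`ZREMRANGEBYSCORE`, `ZCARD`, `ZADD`) describe the usual way this pattern is built on sorted sets, which is an assumption about the gateway's script.

```javascript
// In-memory stand-in for a sliding-window log (single process, illustrative only).
// With Redis sorted sets, each step would typically run inside one atomic Lua script.
class SlidingWindowLimiter {
  constructor(limit, windowMs = 1000) {
    this.limit = limit;        // e.g. 100 for a 100 RPS tier
    this.windowMs = windowMs;  // 1-second window for RPS limits
    this.log = new Map();      // tenantId -> request timestamps (ms) still in the window
  }

  allow(tenantId, nowMs) {
    const cutoff = nowMs - this.windowMs;
    // Drop timestamps that have slid out of the window (ZREMRANGEBYSCORE-style cleanup).
    const recent = (this.log.get(tenantId) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {  // count what remains (ZCARD-style)
      this.log.set(tenantId, recent);
      return false;
    }
    recent.push(nowMs);                 // record this request (ZADD-style)
    this.log.set(tenantId, recent);
    return true;
  }
}

const limiter = new SlidingWindowLimiter(3, 1000);
console.log([0, 100, 200, 300].map((t) => limiter.allow('tenant-a', t))); // [ true, true, true, false ]
console.log(limiter.allow('tenant-a', 1001)); // true: the request at t=0 has left the window
```

Unlike a fixed window, this approach never lets a tenant send a full window's worth of requests on both sides of a window boundary.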
Handling Rate Limits
Best Practices
- Monitor headers: Track `X-Rate-Limit-Remaining` proactively
- Implement backoff: Use exponential backoff on 429 responses
- Respect Retry-After: Wait the specified duration before retrying
- Batch requests: Combine multiple operations where possible
- Cache responses: Reduce unnecessary API calls
Retry Strategy
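```javascript
// JavaScript has no built-in sleep, so wrap setTimeout in a Promise.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function apiRequestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    if (attempt === maxRetries - 1) {
      break; // no attempts left, so don't sleep before giving up
    }
    // Honor Retry-After (in seconds) when present; otherwise back off exponentially (1s, 2s, 4s, ...).
    const retryAfter = Number.parseInt(response.headers.get('Retry-After'), 10);
    const delaySeconds = Number.isNaN(retryAfter) ? 2 ** attempt : retryAfter;
    console.log(`Rate limited. Retrying in ${delaySeconds}s`);
    await sleep(delaySeconds * 1000);
  }
  throw new Error('Max retries exceeded');
}
```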
Monitoring Rate Limits
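```javascript
function checkRateLimitHeaders(response) {
  const remaining = Number.parseInt(response.headers.get('X-Rate-Limit-Remaining'), 10);
  const limit = Number.parseInt(response.headers.get('X-Rate-Limit-Limit'), 10);
  const reset = Number.parseInt(response.headers.get('X-Rate-Limit-Reset'), 10);
  // Skip the calculation if a header is missing or malformed, instead of reporting NaN%.
  if (Number.isNaN(remaining) || Number.isNaN(limit) || limit <= 0) {
    return null;
  }
  const usagePercent = ((limit - remaining) / limit) * 100;
  if (usagePercent > 80) {
    console.warn(`Rate limit warning: ${usagePercent.toFixed(1)}% used`);
    // X-Rate-Limit-Reset is a Unix timestamp in seconds; Date expects milliseconds.
    if (!Number.isNaN(reset)) {
      console.warn(`Resets at: ${new Date(reset * 1000).toISOString()}`);
    }
  }
  return { remaining, limit, reset, usagePercent };
}
```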
Special Cases
Admin Override
Admin users (super_admin, admin) receive higher limits regardless of subscription tier:
- Minimum 5000 req/min even on Free tier
- Burst allowance of 200 requests
- Useful for administrative operations
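The admin override amounts to a floor applied to the tier limit. This is a hypothetical helper. The function name, the `{ perMinute, burst }` shape, and the choice to take the larger of the two values are assumptions. The page does not specify exactly how role and tier limits are combined.

```javascript
// Hypothetical helper for the admin override: admin roles get at least 5000 req/min
// and a burst of 200 regardless of tier. Taking the maximum is an assumption.
const ADMIN_ROLES = new Set(['super_admin', 'admin']);
const ADMIN_FLOOR = { perMinute: 5000, burst: 200 };

function effectiveLimit(role, tierLimit) {
  if (!ADMIN_ROLES.has(role)) {
    return tierLimit;
  }
  return {
    perMinute: Math.max(tierLimit.perMinute, ADMIN_FLOOR.perMinute),
    burst: Math.max(tierLimit.burst, ADMIN_FLOOR.burst),
  };
}

console.log(effectiveLimit('admin', { perMinute: 50, burst: 10 }));    // { perMinute: 5000, burst: 200 }
console.log(effectiveLimit('customer', { perMinute: 50, burst: 10 })); // { perMinute: 50, burst: 10 }
```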
Security Endpoints
Authentication endpoints (/auth/login, /auth/register, /auth/forgot-password) have hard rate limits that apply to all tiers, including Enterprise. These limits exist to prevent brute force attacks and abuse, and cannot be increased via subscription upgrades or support requests.
The hard limits are:
| Endpoint | Hard Limit | Reason |
|---|---|---|
| `/auth/login` | 60/min | Brute force protection |
| `/auth/register` | 30/min | Account spam prevention |
| `/auth/forgot-password` | 10/min | Abuse prevention |
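Because these limits ignore tier, role, and support overrides, a lookup like the following would check them before anything else. This is a hypothetical sketch. The paths and values come from the table, but the function name is invented. It also assumes the table values are fixed limits that replace any configured value, whether that value is higher or lower.

```javascript
// Hypothetical hard-limit lookup; paths and values come from the table above.
// A Map avoids false hits on inherited object keys such as "constructor".
const HARD_LIMITS_PER_MINUTE = new Map([
  ['/auth/login', 60],
  ['/auth/register', 30],
  ['/auth/forgot-password', 10],
]);

function resolvePerMinuteLimit(path, configuredPerMinute) {
  // A hard limit wins even when the configured limit is higher (e.g. an unlimited Enterprise tier).
  return HARD_LIMITS_PER_MINUTE.has(path) ? HARD_LIMITS_PER_MINUTE.get(path) : configuredPerMinute;
}

console.log(resolvePerMinuteLimit('/auth/login', Infinity)); // 60
console.log(resolvePerMinuteLimit('/api/v1/graphql', 2000)); // 2000
```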
Fail-Open Behavior
If the Redis rate limiter is unavailable, the system fails open:
- Requests are allowed through
- Warning logged for monitoring
- Graceful degradation prevents outages
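Fail-open behavior is a try/catch around the limiter call that allows the request and logs a warning. This is a sketch under assumptions. `allowWithFailOpen` and `checkLimit` are hypothetical names. `checkLimit` stands for any async check that resolves to `true` (allow) or `false` (reject) and throws when its store, such as Redis, is unreachable.

```javascript
// Sketch of fail-open: if the limiter's backing store throws, allow the request and log a warning.
// `checkLimit` is a hypothetical async function resolving to true (allow) or false (reject).
async function allowWithFailOpen(checkLimit, key, logger = console) {
  try {
    return await checkLimit(key);
  } catch (err) {
    logger.warn(`Rate limiter unavailable, failing open for ${key}: ${err.message}`);
    return true;
  }
}

// Usage with a stand-in limiter whose store is down:
const brokenLimiter = async () => { throw new Error('ECONNREFUSED'); };
allowWithFailOpen(brokenLimiter, 'tenant:tenant-xyz789').then((allowed) => {
  console.log(allowed); // true
});
```

The trade-off is deliberate. During a limiter outage, traffic is briefly unthrottled rather than the whole API going down.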
Increasing Limits
Upgrade Subscription
Contact sales to upgrade your subscription tier for higher limits:
| Current Tier | Upgrade Options |
|---|---|
| Free | Trial, Standard |
| Standard | Professional |
| Professional | Enterprise |
| Enterprise | Custom limits available |
Request Limit Increase
For Enterprise customers, custom limits can be configured:
- Open a support ticket
- Provide use case justification
- Specify required limits (RPS, burst, monthly quota)
- Limits are applied within 24 hours
Monitoring & Alerting
Metrics Available
| Metric | Description |
|---|---|
| `rate_limit_requests_total` | Total requests by endpoint |
| `rate_limit_exceeded_total` | 429 responses by tenant |
| `rate_limit_remaining` | Current remaining quota |
| `rate_limit_latency_ms` | Rate check latency |
Recommended Alerts
```yaml
# Alert when approaching limit
- alert: RateLimitWarning
  expr: rate_limit_remaining < (rate_limit_limit * 0.2)
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Rate limit usage above 80%"

# Alert on frequent 429s
- alert: RateLimitExceeded
  expr: rate(rate_limit_exceeded_total[5m]) > 10
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "High rate of 429 responses"
```
FAQ
Why am I getting 429 errors?
- Check your current tier limits
- Review request patterns for spikes
- Implement caching to reduce calls
- Consider upgrading your subscription
How do burst limits work?
Burst allows temporary spikes above the sustained rate. For example, on the Standard tier (100/min with a burst of 20):
- You can send 20 requests instantly
- After that, you must wait for tokens to refill at 100/60 ≈ 1.67 tokens per second
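The burst arithmetic can be checked with a one-line formula. Starting from a full token bucket, the most requests you can send within a given number of seconds is the burst plus the tokens refilled in that time. The helper name is hypothetical. It assumes the token-bucket model described under Implementation Details and integer-second intervals.

```javascript
// Worked arithmetic for burst behavior, assuming a token bucket that starts full.
// Multiplying before dividing keeps whole-minute cases exact (100 * 60 / 60 === 100).
function maxRequestsFromFullBucket(limitPerMinute, burst, seconds) {
  return burst + Math.floor((limitPerMinute * seconds) / 60);
}

console.log(100 / 60);                               // 1.6666666666666667 tokens per second
console.log(maxRequestsFromFullBucket(100, 20, 0));  // 20  (the instant burst)
console.log(maxRequestsFromFullBucket(100, 20, 3));  // 25  (3 s of refill adds 5)
console.log(maxRequestsFromFullBucket(100, 20, 60)); // 120 (burst plus one minute of refill)
```

So within the first minute a Standard client can send up to 120 requests, but only if it starts with a full bucket. Sustained traffic settles at 100 per minute.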
Can I get unlimited API access?
Enterprise tier includes unlimited rate limits. Contact sales for pricing.
Do WebSocket connections count against limits?
WebSocket connections are rate-limited separately:
- Connection establishment: 10/min per user
- Messages: 100/min per connection
Related Documentation
- Authentication - API authentication
- WebSocket Connection - Real-time APIs
- Error Handling - Error responses