Authenticated API
Voice AI endpoints require a valid JWT Bearer token that carries a staff role. They are exposed through the API gateway at /v1/voice-ai/* and /v1/speech/*.
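A minimal sketch of assembling an authenticated request, assuming a hypothetical gateway host (`https://api.example.com`) and a hypothetical `/v1/voice-ai/sessions` route; the real route names are on the Voice Sessions page. The helper only checks that the token has the three dot-separated segments of a compact JWT. It does not verify the signature or the staff role, which remains the gateway's job.

```typescript
// Hypothetical gateway host: replace with your deployment's API gateway URL.
const GATEWAY_URL = "https://api.example.com";

// Gateway prefixes documented for Voice AI traffic.
const VOICE_PREFIXES = ["/v1/voice-ai/", "/v1/speech/"];

interface VoiceRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Builds request options for a Voice AI gateway route (hypothetical helper).
// The JWT is only shape-checked here; signature and role checks happen server-side.
function buildVoiceRequest(path: string, jwt: string, payload?: unknown): VoiceRequest {
  if (!VOICE_PREFIXES.some((prefix) => path.indexOf(prefix) === 0)) {
    throw new Error(`Not a Voice AI gateway path: ${path}`);
  }
  // A compact-serialized JWT has exactly three segments: header.payload.signature.
  if (jwt.split(".").length !== 3) {
    throw new Error("Expected a compact JWT (header.payload.signature)");
  }
  const headers: Record<string, string> = { Authorization: `Bearer ${jwt}` };
  if (payload === undefined) {
    return { url: GATEWAY_URL + path, method: "GET", headers };
  }
  headers["Content-Type"] = "application/json";
  return { url: GATEWAY_URL + path, method: "POST", headers, body: JSON.stringify(payload) };
}
```

The result can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Rejecting non-Voice paths early keeps a misconfigured client from sending staff tokens to unrelated routes.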
Voice AI API
End-to-end voice ordering for drive-thru and phone systems, with multi-language NLU, semantic menu search, and offline resilience.
Base Path: /api/v1/voice
Overview
The Voice AI API supports ordering through natural conversation:
| Feature | Description |
|---|---|
| Speech Recognition | Convert audio to text with noise handling |
| Multi-Language NLU | 6 languages: English, Spanish, French, German, Portuguese, Chinese |
| Menu RAG Search | Semantic menu matching via Cloudflare Vectorize |
| Intent Parsing | 20+ intent types, including drive-thru-specific intents |
| Modifier Extraction | Add, remove, substitute, quantity, and preparation modifiers |
| ACP AI Router | Cost-optimized tiered inference (95%+ savings) |
| Offline Queue | Redis-backed command queuing with retry |
| Drive-Thru Lanes | Lane-aware sessions with speed metrics |
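To make the five modifier kinds concrete, here is a sketch of how an extracted modifier might be represented and applied to a cart line. The `Modifier` union, the `CartLine` shape, and `applyModifier` are hypothetical illustrations, not the API's actual schema.

```typescript
// Hypothetical shapes for illustration only; the API's real schema may differ.
type Modifier =
  | { kind: "add"; ingredient: string }
  | { kind: "remove"; ingredient: string }
  | { kind: "substitute"; from: string; to: string }
  | { kind: "quantity"; value: number }
  | { kind: "preparation"; instruction: string };

interface CartLine {
  item: string;
  quantity: number;
  ingredients: string[];
  preparation: string[];
}

// Returns a new cart line with one modifier applied; the input line is not mutated.
function applyModifier(line: CartLine, mod: Modifier): CartLine {
  switch (mod.kind) {
    case "add":
      return { ...line, ingredients: [...line.ingredients, mod.ingredient] };
    case "remove":
      return { ...line, ingredients: line.ingredients.filter((i) => i !== mod.ingredient) };
    case "substitute":
      return { ...line, ingredients: line.ingredients.map((i) => (i === mod.from ? mod.to : i)) };
    case "quantity":
      // Quantities must be positive whole numbers.
      if (Math.floor(mod.value) !== mod.value || mod.value < 1) {
        throw new Error(`Invalid quantity: ${mod.value}`);
      }
      return { ...line, quantity: mod.value };
    case "preparation":
      return { ...line, preparation: [...line.preparation, mod.instruction] };
  }
}
```

An utterance such as "no onion, add pickles, make it two" would yield a list of modifiers that can be folded over the line with `mods.reduce(applyModifier, line)`. Keeping each modifier pure makes it easy to undo a misheard one by replaying the list without it.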
Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│                            Voice AI Pipeline                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Customer Audio → STT → NLU Engine → Order Builder → TTS → Audio        │
│                    │         │             │                            │
│                    ↓         ↓             ↓                            │
│               Transcript  Intent      Cart Update                       │
│                    │         │             │                            │
│                    └─────────┼─────────────┘                            │
│                              ↓                                          │
│                     ┌──────────────────┐                                │
│                     │    ACP Router    │                                │
│                     │ (Tier Selection) │                                │
│                     └────────┬─────────┘                                │
│                              │                                          │
│          ┌───────────────────┼───────────────────┐                      │
│          ↓                   ↓                   ↓                      │
│     T1: Llama 4       T2: Gemini 2.0     T4: Claude Haiku               │
│       (FREE)             ($0.10/M)           ($1.00/M)                  │
│   Simple queries      Complex orders         Ambiguity                  │
└─────────────────────────────────────────────────────────────────────────┘
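The pipeline in the diagram can be read as a chain of stage functions, each consuming the previous stage's output. The sketch below wires hypothetical, synchronous stand-ins for STT, the NLU engine, the order builder, and TTS; the real stages are network calls, the stage interfaces and the `add_item` intent name are assumptions, and ACP Router tier selection is left out.

```typescript
// Hypothetical stage signatures mirroring the diagram; real stages are async services.
interface Intent { name: string; item?: string }
interface CartUpdate { added: string[] }

interface Pipeline {
  stt: (audio: Uint8Array) => string;         // customer audio -> transcript
  nlu: (transcript: string) => Intent;        // transcript -> intent
  buildOrder: (intent: Intent) => CartUpdate; // intent -> cart update
  tts: (text: string) => Uint8Array;          // reply text -> audio
}

interface TurnResult {
  transcript: string;
  intent: Intent;
  cart: CartUpdate;
  replyAudio: Uint8Array;
}

// Runs one conversational turn: Customer Audio -> STT -> NLU -> Order Builder -> TTS.
function runTurn(p: Pipeline, audio: Uint8Array): TurnResult {
  const transcript = p.stt(audio);
  const intent = p.nlu(transcript);
  const cart = p.buildOrder(intent);
  // Confirm what was added, or ask the customer to repeat when nothing matched.
  const reply = cart.added.length > 0
    ? `Added ${cart.added.join(", ")}. Anything else?`
    : "Sorry, could you repeat that?";
  return { transcript, intent, cart, replyAudio: p.tts(reply) };
}
```

Passing the stages in as a `Pipeline` object keeps `runTurn` testable with stubs and lets each stage be swapped (for example, a different STT provider) without touching the turn logic.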
AI Model Tiers
Voice AI uses the ACP AI Router for cost-optimized inference:
| Tier | Model | Cost per 1M tokens (input/output) | Use Case |
|---|---|---|---|
| T1 | Llama 4 Scout (Workers AI) | FREE | Greetings, simple queries |
| T2 | Gemini 2.0 Flash | $0.10/$0.40 | Order parsing, modifiers |
| T3 | Gemini 3 Flash | $0.50/$3.00 | Complex conversations |
| T4 | Claude Haiku 4.5 | $1.00/$5.00 | Ambiguity resolution |
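This page does not specify the exact rules the ACP AI Router applies. The sketch below is one plausible rule set consistent with the use cases in the table, assuming the router sees an intent category and an NLU confidence score; the category names, the 0.6 threshold, and `selectTier` are all hypothetical.

```typescript
type Tier = "T1" | "T2" | "T3" | "T4";

// Hypothetical request features the router might inspect.
interface RouteInput {
  category: "greeting" | "simple_query" | "order" | "conversation";
  confidence: number; // NLU confidence in [0, 1]
}

// Hypothetical rules: escalate low-confidence (ambiguous) requests to T4,
// otherwise pick the cheapest tier whose use case in the table matches.
function selectTier(req: RouteInput, ambiguityThreshold = 0.6): Tier {
  if (req.confidence < ambiguityThreshold) return "T4"; // ambiguity resolution
  switch (req.category) {
    case "greeting":
    case "simple_query":
      return "T1"; // greetings, simple queries
    case "order":
      return "T2"; // order parsing, modifiers
    case "conversation":
      return "T3"; // complex conversations
  }
}
```

Checking confidence before category means an ambiguous request goes to the ambiguity tier even when it looks like a simple order, which trades a higher per-call cost for fewer wrong cart updates.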
Cost Savings
Routing each request to the lowest-cost tier suited to it yields 95%+ cost savings compared with sending every request to a single high-tier model.
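The realized savings depend on how traffic splits across tiers and on which model serves as the single-model baseline. A sketch of the arithmetic using the prices from the tier table, assuming a hypothetical traffic mix, hypothetical per-request token counts, and T4 as the baseline (the page does not name the baseline model):

```typescript
type Tier = "T1" | "T2" | "T3" | "T4";

// USD per 1M tokens as [input, output], from the AI Model Tiers table.
const PRICE: Record<Tier, [number, number]> = {
  T1: [0, 0],
  T2: [0.1, 0.4],
  T3: [0.5, 3.0],
  T4: [1.0, 5.0],
};

// Cost in USD of a single request on one tier.
function requestCost(tier: Tier, inTokens: number, outTokens: number): number {
  const [inPrice, outPrice] = PRICE[tier];
  return (inTokens * inPrice + outTokens * outPrice) / 1e6;
}

// Average cost per request when traffic is split across tiers by `mix` (fractions summing to 1).
function blendedCost(mix: Record<Tier, number>, inTokens: number, outTokens: number): number {
  const tiers = Object.keys(mix) as Tier[];
  const total = tiers.reduce((sum, t) => sum + mix[t], 0);
  if (Math.abs(total - 1) > 1e-9) throw new Error("Mix fractions must sum to 1");
  return tiers.reduce((sum, t) => sum + mix[t] * requestCost(t, inTokens, outTokens), 0);
}

// Fractional savings versus sending every request to `baseline`.
function savings(mix: Record<Tier, number>, baseline: Tier, inTokens: number, outTokens: number): number {
  return 1 - blendedCost(mix, inTokens, outTokens) / requestCost(baseline, inTokens, outTokens);
}

// Hypothetical traffic mix, for illustration only.
const exampleMix: Record<Tier, number> = { T1: 0.6, T2: 0.3, T3: 0.07, T4: 0.03 };
```

Calling `savings(exampleMix, "T4", 1000, 200)` with your own measured mix and token counts gives the figure for your deployment; shifting even a few percent of traffic between T1 and T3/T4 moves the result noticeably.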
Error Responses
Session Not Found (404)
{
  "error": {
    "code": "SESSION_NOT_FOUND",
    "message": "Voice session does not exist or has expired"
  }
}
Audio Processing Error (400)
{
  "error": {
    "code": "AUDIO_PROCESSING_ERROR",
    "message": "Could not process audio",
    "details": "Audio format not supported"
  }
}
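Both errors share one envelope: an `error` object with `code`, `message`, and an optional `details` string. A client-side sketch of parsing that envelope and choosing a recovery step follows; the `UNKNOWN_ERROR` fallback code and the recovery actions are hypothetical, and only the two codes documented here are handled explicitly.

```typescript
// Envelope shape as shown in the error examples; `details` is optional.
interface ErrorBody {
  error: { code: string; message: string; details?: string };
}

interface VoiceApiError {
  status: number;
  code: string;
  message: string;
  details?: string;
}

// Hypothetical recovery actions a client might take.
type Recovery = "start_new_session" | "resend_supported_audio" | "surface_to_staff";

// Parses an error body; falls back to a hypothetical UNKNOWN_ERROR code if it is not the envelope.
function parseVoiceError(status: number, rawBody: string): VoiceApiError {
  try {
    const body = JSON.parse(rawBody) as Partial<ErrorBody>;
    if (body.error && typeof body.error.code === "string") {
      return { status, ...body.error };
    }
  } catch {
    // Not JSON (for example, a proxy error page): use the generic error below.
  }
  return { status, code: "UNKNOWN_ERROR", message: rawBody };
}

// Hypothetical recovery policy for the two documented codes.
function recoveryFor(err: VoiceApiError): Recovery {
  switch (err.code) {
    case "SESSION_NOT_FOUND":
      return "start_new_session"; // session missing or expired: open a fresh one
    case "AUDIO_PROCESSING_ERROR":
      return "resend_supported_audio"; // e.g. re-encode to a supported format
    default:
      return "surface_to_staff";
  }
}
```

Switching on `code` rather than the HTTP status keeps the client stable if two different errors ever share a status, as the 400 and 404 responses could with future codes.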
What's in This Section
| Page | Description |
|---|---|
| Voice Sessions | Start, process, get state, cancel, clarify, and complete voice ordering sessions |
| NLU & Intent (Hey Maximus) | Intent types, multi-language NLU, menu RAG search, modifiers, noise preprocessing, allergen detection |
| Drive-Thru | Drive-thru lane management, session phases, lane types, and drive-thru specific endpoints |
| Streaming & Configuration | WebSocket streaming audio, voice configuration, and analytics |
| Offline Queue | Offline command queuing, priority levels, sync, and dead letter queue |
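The production queue is Redis-backed and documented on the Offline Queue page. As a rough, in-memory illustration of the same pattern (priority ordering, a capped retry count, and a dead letter queue), the sketch below uses hypothetical names (`OfflineQueue`, `maxAttempts`, numeric priorities where lower runs first) and is not the service's actual code.

```typescript
interface QueuedCommand {
  id: string;
  priority: number; // lower value = higher priority (hypothetical convention)
  attempts: number;
  payload: unknown;
}

class OfflineQueue {
  private pending: QueuedCommand[] = [];
  readonly deadLetters: QueuedCommand[] = [];
  private readonly maxAttempts: number;

  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts;
  }

  enqueue(id: string, priority: number, payload: unknown): void {
    this.pending.push({ id, priority, attempts: 0, payload });
    // Array.prototype.sort is stable (ES2019+), so FIFO order holds within a priority.
    this.pending.sort((a, b) => a.priority - b.priority);
  }

  // Tries every pending command once, in priority order, when connectivity returns.
  // Failures are re-queued until maxAttempts, then moved to the dead letter queue.
  sync(send: (cmd: QueuedCommand) => boolean): number {
    const batch = this.pending;
    this.pending = [];
    let delivered = 0;
    for (const cmd of batch) {
      cmd.attempts += 1;
      if (send(cmd)) {
        delivered += 1;
      } else if (cmd.attempts >= this.maxAttempts) {
        this.deadLetters.push(cmd);
      } else {
        this.pending.push(cmd);
      }
    }
    return delivered;
  }

  get size(): number {
    return this.pending.length;
  }
}
```

A real implementation would also space out retries (for example with exponential backoff) and persist state so queued orders survive a restart; the dead letter queue keeps commands that never succeed available for staff review instead of dropping them.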
Related Documentation
- AI Gateway - AI infrastructure
- Voice AI Manager Guide - Voice AI setup
- Orders API - Order processing
- Drive-Thru API - Drive-thru operations
- Drive-Thru Voice AI Guide - Staff guide