Authenticated API

Voice AI endpoints require a valid JWT Bearer token with staff roles. They are accessible via the API gateway at /v1/voice-ai/* and /v1/speech/*.
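
As a rough illustration of the authentication requirement, the sketch below builds (without sending) a request that carries the JWT in a standard `Authorization: Bearer` header. The host `api.example.com`, the `/v1/voice-ai/sessions` path, and the helper name `build_voice_request` are assumptions made for this example and are not taken from the API itself.

```python
import urllib.request

# Hypothetical gateway host; substitute the real one for your deployment.
GATEWAY_URL = "https://api.example.com"

# Gateway prefixes named in the note above.
ALLOWED_PREFIXES = ("/v1/voice-ai/", "/v1/speech/")


def build_voice_request(path: str, token: str) -> urllib.request.Request:
    """Build, but do not send, a GET request with the JWT as a Bearer token."""
    if not path.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"not a Voice AI gateway path: {path}")
    return urllib.request.Request(
        GATEWAY_URL + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# "/v1/voice-ai/sessions" is an illustrative path, not a documented endpoint.
request = build_voice_request("/v1/voice-ai/sessions", "<staff-jwt>")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. The token must belong to a user with staff roles, otherwise the gateway is expected to reject the call.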

Voice AI API

Complete voice-enabled ordering for drive-thru and phone systems with multi-language NLU, semantic menu search, and offline resilience.

Base Path: /api/v1/voice

Overview

The Voice AI API enables natural-conversation ordering through the following features:

| Feature | Description |
| --- | --- |
| Speech Recognition | Convert audio to text with noise handling |
| Multi-Language NLU | 6 languages: English, Spanish, French, German, Portuguese, Chinese |
| Menu RAG Search | Semantic menu matching via Cloudflare Vectorize |
| Intent Parsing | 20+ intent types including drive-thru specific |
| Modifier Extraction | Add, remove, substitute, quantity, preparation |
| ACP AI Router | Cost-optimized tiered inference (95%+ savings) |
| Offline Queue | Redis-backed command queuing with retry |
| Drive-Thru Lanes | Lane-aware sessions with speed metrics |

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                            Voice AI Pipeline                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Customer Audio → STT → NLU Engine → Order Builder → TTS → Audio        │
│                    │         │             │                            │
│                    ↓         ↓             ↓                            │
│                Transcript  Intent     Cart Update                       │
│                    │         │             │                            │
│                    └─────────┼─────────────┘                            │
│                              ↓                                          │
│                     ┌─────────────────┐                                 │
│                     │   ACP Router    │                                 │
│                     │ (Tier Selection)│                                 │
│                     └────────┬────────┘                                 │
│                              │                                          │
│         ┌────────────────────┼────────────────────┐                     │
│         ↓                    ↓                    ↓                     │
│    T1: Llama 4        T2: Gemini 2.0      T4: Claude Haiku              │
│      (FREE)              ($0.10/M)            ($1.00/M)                 │
│  Simple queries       Complex orders          Ambiguity                 │
└─────────────────────────────────────────────────────────────────────────┘
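
The tier-selection step in the diagram can be pictured with a small routing function. This is only a sketch of one plausible policy, not the ACP Router's actual logic: the `Utterance` fields, the intent labels in `SIMPLE_INTENTS`, and the turn-count threshold are all assumptions, and T3 comes from the tier table on this page rather than from the diagram.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    intent: str              # hypothetical intent label produced by the NLU engine
    ambiguous: bool = False  # hypothetical flag: the NLU could not settle on one reading
    turn_count: int = 1      # conversation turns so far in this session


# Hypothetical labels for requests cheap enough for the free tier.
SIMPLE_INTENTS = {"greeting", "hours_query"}

# Assumed threshold after which a conversation counts as "complex".
COMPLEX_TURN_THRESHOLD = 3


def select_tier(utterance: Utterance) -> str:
    """Pick the cheapest tier that plausibly handles the utterance."""
    if utterance.ambiguous:
        return "T4"  # ambiguity resolution
    if utterance.intent in SIMPLE_INTENTS:
        return "T1"  # greetings, simple queries
    if utterance.turn_count > COMPLEX_TURN_THRESHOLD:
        return "T3"  # complex conversations
    return "T2"      # order parsing, modifiers


print(select_tier(Utterance(intent="greeting")))
print(select_tier(Utterance(intent="add_item", turn_count=2)))
print(select_tier(Utterance(intent="add_item", ambiguous=True)))
```

Checking the ambiguity flag first means an unclear request is escalated even when its nominal intent looks simple, which is one way to trade a little cost for fewer misheard orders.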

AI Model Tiers

Voice AI uses the ACP AI Router for cost-optimized inference:

| Tier | Model | Cost (per M tokens, input/output) | Use Case |
| --- | --- | --- | --- |
| T1 | Llama 4 Scout (Workers AI) | FREE | Greetings, simple queries |
| T2 | Gemini 2.0 Flash | $0.10/$0.40 | Order parsing, modifiers |
| T3 | Gemini 3 Flash | $0.50/$3.00 | Complex conversations |
| T4 | Claude Haiku 4.5 | $1.00/$5.00 | Ambiguity resolution |

Cost Savings

The tiered approach provides 95%+ cost savings compared to using a single high-tier model for all requests.
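
To see how a figure of this size can arise, the sketch below computes the blended cost of a traffic mix against sending everything to T4. The prices come from the tier table, read here as input/output prices per million tokens (an assumption about the table's notation). The traffic mix and the per-request token counts are invented for illustration, so the result describes this example only and is not a measured figure.

```python
# USD per million tokens as (input, output), copied from the tier table.
PRICES = {
    "T1": (0.00, 0.00),
    "T2": (0.10, 0.40),
    "T3": (0.50, 3.00),
    "T4": (1.00, 5.00),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request on the given tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def savings_vs_single_tier(mix, input_tokens, output_tokens, baseline="T4"):
    """Fractional savings of a tier mix versus routing everything to `baseline`."""
    blended = sum(
        share * request_cost(tier, input_tokens, output_tokens)
        for tier, share in mix.items()
    )
    return 1 - blended / request_cost(baseline, input_tokens, output_tokens)


# Hypothetical share of requests handled by each tier (sums to 1.0).
ASSUMED_MIX = {"T1": 0.80, "T2": 0.15, "T3": 0.03, "T4": 0.02}

# Hypothetical request size: 500 input tokens, 100 output tokens.
savings = savings_vs_single_tier(ASSUMED_MIX, input_tokens=500, output_tokens=100)
print(f"{savings:.1%}")  # prints 95.0%
```

The result is sensitive to the mix: moving a few percent of traffic from T1 to T3 or T4 lowers the savings noticeably, so the real figure depends on how often requests need escalation.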


Error Responses

Session Not Found (404)

{
  "error": {
    "code": "SESSION_NOT_FOUND",
    "message": "Voice session does not exist or has expired"
  }
}

Audio Processing Error (400)

{
  "error": {
    "code": "AUDIO_PROCESSING_ERROR",
    "message": "Could not process audio",
    "details": "Audio format not supported"
  }
}
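
Both responses share the same `error` envelope, so a client can handle them in one place. The sketch below is one possible way to do that. The `VoiceApiError` class, the `parse_error` and `should_restart_session` helpers, and the `"UNKNOWN"` fallback code are hypothetical names introduced for this example. Only the field names and the two error codes come from the responses above.

```python
import json
from typing import Optional


class VoiceApiError(Exception):
    """Client-side wrapper around the API's error envelope (hypothetical class)."""

    def __init__(self, status: int, code: str, message: str,
                 details: Optional[str] = None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details  # optional; present on the audio error above


def parse_error(status: int, body: str) -> VoiceApiError:
    """Turn an error response body into a VoiceApiError."""
    error = json.loads(body).get("error", {})
    return VoiceApiError(
        status,
        error.get("code", "UNKNOWN"),  # "UNKNOWN" is an assumed fallback
        error.get("message", ""),
        error.get("details"),
    )


def should_restart_session(err: VoiceApiError) -> bool:
    """An expired or missing session cannot be resumed, so start a new one."""
    return err.code == "SESSION_NOT_FOUND"


body = '{"error": {"code": "SESSION_NOT_FOUND", "message": "Voice session does not exist or has expired"}}'
err = parse_error(404, body)
print(err.code, should_restart_session(err))
```

Branching on the machine-readable `code` rather than on the HTTP status or the message text keeps the client stable if the wording of messages changes.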
}

What's in This Section

| Page | Description |
| --- | --- |
| Voice Sessions | Start, process, get state, cancel, clarify, and complete voice ordering sessions |
| NLU & Intent (Hey Maximus) | Intent types, multi-language NLU, menu RAG search, modifiers, noise preprocessing, allergen detection |
| Drive-Thru | Drive-thru lane management, session phases, lane types, and drive-thru specific endpoints |
| Streaming & Configuration | WebSocket streaming audio, voice configuration, and analytics |
| Offline Queue | Offline command queuing, priority levels, sync, and dead letter queue |