Authenticated API

Voice AI endpoints require a valid JWT Bearer token with staff roles. They are accessible through the API gateway at /v1/voice-ai/* and /v1/speech/*.
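As a minimal sketch, the snippet below shows how a client might attach the Bearer token to a gateway request. The gateway host, the `sessions` path segment, and the helper name `build_voice_request` are assumptions made for illustration; only the `/v1/voice-ai/` and `/v1/speech/` prefixes and the Bearer scheme come from this page.

```python
import urllib.request

# Hypothetical gateway host; replace with your deployment's base URL.
GATEWAY_URL = "https://api.example.com"

# Path prefixes listed on this page for the API gateway.
ALLOWED_PREFIXES = ("/v1/voice-ai/", "/v1/speech/")


def build_voice_request(path: str, jwt_token: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the JWT as a Bearer token."""
    if not path.startswith(ALLOWED_PREFIXES):
        raise ValueError("path must start with /v1/voice-ai/ or /v1/speech/")
    return urllib.request.Request(
        GATEWAY_URL + path,
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Accept": "application/json",
        },
    )


# "sessions" is an illustrative resource name, not a documented endpoint.
request = build_voice_request("/v1/voice-ai/sessions", "<staff-jwt>")
```

Passing the resulting object to `urllib.request.urlopen` would send it. The token itself must be issued with staff roles by your identity provider, which is outside the scope of this sketch.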

Voice AI API

Complete voice-enabled ordering for drive-thru and phone systems with multi-language NLU, semantic menu search, and offline resilience.

Base Path: /api/v1/voice

Overview

The Voice AI API supports natural-conversation ordering through the following features:

Feature	Description
Speech Recognition	Convert audio to text with noise handling
Multi-Language NLU	6 languages: English, Spanish, French, German, Portuguese, Chinese
Menu RAG Search	Semantic menu matching via Cloudflare Vectorize
Intent Parsing	20+ intent types, including drive-thru-specific intents
Modifier Extraction	Add, remove, substitute, quantity, preparation
ACP AI Router	Cost-optimized tiered inference (95%+ savings versus a single high-tier model)
Offline Queue	Redis-backed command queuing with retry
Drive-Thru Lanes	Lane-aware sessions with speed metrics

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                           Voice AI Pipeline                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Customer Audio → STT → NLU Engine → Order Builder → TTS → Audio        │
│                    │         │              │                            │
│                    ↓         ↓              ↓                            │
│              Transcript   Intent      Cart Update                        │
│                    │         │              │                            │
│                    └─────────┼──────────────┘                            │
│                              ↓                                           │
│                    ┌─────────────────┐                                   │
│                    │   ACP Router    │                                   │
│                    │ (Tier Selection)│                                   │
│                    └────────┬────────┘                                   │
│                             │                                            │
│        ┌────────────────────┼────────────────────┐                       │
│        ↓                    ↓                    ↓                       │
│   T1: Llama 4          T2: Gemini 2.0       T4: Claude Haiku            │
│   (FREE)               ($0.10/M)            ($1.00/M)                   │
│   Simple queries       Complex orders       Ambiguity                   │
└─────────────────────────────────────────────────────────────────────────┘

AI Model Tiers

Voice AI uses the ACP AI Router for cost-optimized inference:

Tier	Model	Cost (input/output, USD per M tokens)	Use Case
T1	Llama 4 Scout (Workers AI)	FREE	Greetings, simple queries
T2	Gemini 2.0 Flash	$0.10/$0.40	Order parsing, modifiers
T3	Gemini 3 Flash	$0.50/$3.00	Complex conversations
T4	Claude Haiku 4.5	$1.00/$5.00	Ambiguity resolution
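The mapping from use case to tier in the table above can be sketched as a small routing function. This is not the ACP AI Router's actual logic, which this page does not describe: the `Utterance` fields, the `kind` labels, and the turn-count threshold are all hypothetical, chosen only to illustrate "cheapest tier that covers the use case".

```python
from dataclasses import dataclass

# Tier names and models are taken from the table above.
TIER_MODELS = {
    "T1": "Llama 4 Scout (Workers AI)",
    "T2": "Gemini 2.0 Flash",
    "T3": "Gemini 3 Flash",
    "T4": "Claude Haiku 4.5",
}


@dataclass
class Utterance:
    kind: str                # hypothetical label, e.g. "greeting", "simple_query", "order"
    turn_count: int = 1      # turns so far in the conversation
    ambiguous: bool = False  # True when NLU could not settle on one interpretation


def select_tier(u: Utterance, complex_after_turns: int = 6) -> str:
    """Pick the cheapest tier whose listed use case covers the utterance."""
    if u.ambiguous:
        return "T4"  # ambiguity resolution
    if u.kind in {"greeting", "simple_query"}:
        return "T1"  # greetings, simple queries
    if u.turn_count >= complex_after_turns:
        return "T3"  # complex conversations (threshold is an assumption)
    return "T2"      # order parsing, modifiers
```

For example, `select_tier(Utterance("order"))` returns `"T2"`, while the same utterance flagged `ambiguous=True` is escalated to `"T4"`.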

Cost Savings

The tiered approach provides 95%+ cost savings compared to using a single high-tier model for all requests.
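To check a savings figure against your own traffic, the blended cost of a tier mix can be compared with the cost of sending every request to one tier. The sketch below reads each price pair in the tier table as input/output prices per million tokens, which is an assumption; the function names, the traffic mix, and the token counts are illustrative inputs, not measured values.

```python
# (input, output) price in USD per million tokens, read from the tier table.
# Treating each pair as input/output prices is an assumption.
TIER_PRICES = {
    "T1": (0.00, 0.00),
    "T2": (0.10, 0.40),
    "T3": (0.50, 3.00),
    "T4": (1.00, 5.00),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request on the given tier."""
    in_price, out_price = TIER_PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def savings_vs_single_tier(traffic_mix: dict, baseline_tier: str,
                           input_tokens: int, output_tokens: int) -> float:
    """Fractional savings of a tiered mix versus routing everything to one tier.

    traffic_mix maps tier name to its share of requests; shares must sum to 1.
    """
    if abs(sum(traffic_mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    baseline = request_cost(baseline_tier, input_tokens, output_tokens)
    if baseline == 0:
        raise ValueError("baseline tier is free; savings are undefined")
    blended = sum(
        share * request_cost(tier, input_tokens, output_tokens)
        for tier, share in traffic_mix.items()
    )
    return 1.0 - blended / baseline
```

The result depends heavily on how much traffic the free tier absorbs and on which model is taken as the single-tier baseline, so it is worth recomputing with measured shares rather than relying on a fixed percentage.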

Error Responses

Session Not Found (404)

{
  "error": {
    "code": "SESSION_NOT_FOUND",
    "message": "Voice session does not exist or has expired"
  }
}

Audio Processing Error (400)

{
  "error": {
    "code": "AUDIO_PROCESSING_ERROR",
    "message": "Could not process audio",
    "details": "Audio format not supported"
  }
}
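Both error bodies above share the same envelope: an `error` object with `code`, `message`, and an optional `details` field. A client could parse that envelope as sketched below; the `VoiceApiError` class, the `parse_error` helper, and the `UNKNOWN_ERROR` fallback code are hypothetical names, not part of the API.

```python
import json


class VoiceApiError(Exception):
    """Client-side wrapper for the error envelope shown above (hypothetical)."""

    def __init__(self, status, code, message, details=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details


def parse_error(status: int, body: str) -> VoiceApiError:
    """Turn an HTTP status and response body into a VoiceApiError."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return VoiceApiError(
        status,
        error.get("code", "UNKNOWN_ERROR"),  # fallback code is an assumption
        error.get("message", ""),
        error.get("details"),                # absent in the 404 example
    )
```

A caller might branch on `err.code`, for instance starting a new session on `SESSION_NOT_FOUND` and re-encoding the audio on `AUDIO_PROCESSING_ERROR`; those recovery steps are suggestions rather than documented behavior.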

What's in This Section

Page	Description
Voice Sessions	Start, process, get state, cancel, clarify, and complete voice ordering sessions
NLU & Intent (Hey Maximus)	Intent types, multi-language NLU, menu RAG search, modifiers, noise preprocessing, allergen detection
Drive-Thru	Drive-thru lane management, session phases, lane types, and drive-thru specific endpoints
Streaming & Configuration	WebSocket streaming audio, voice configuration, and analytics
Offline Queue	Offline command queuing, priority levels, sync, and dead letter queue

Related Documentation

AI Gateway - AI infrastructure
Voice AI Manager Guide - Voice AI setup
Orders API - Order processing
Drive-Thru API - Drive-thru operations
Drive-Thru Voice AI Guide - Staff guide
