Internal Service API

These endpoints are called internally by the Go API Gateway's chef_mode_voice.go handler. Client applications connect via WebSocket at /api/v1/chef-mode/voice/ws through the gateway rather than calling these endpoints directly.

Voice AI Chef Mode API

REST endpoints powering the Chef Mode voice assistant. Chef Mode lets kitchen staff interact with the AI through voice commands, keeping their hands free during food preparation and service.

The system supports recipe guidance, ingredient substitutions, cooking timers, plating suggestions, food safety information, and station-specific assistance (grill, saute, fry, pastry, prep).

Issue References: #492 (Chef Mode Voice WebSocket), Epic #705 (Conversational AI Interface)

Overview

Attribute	Value
Base Path	`/api/ai/voice`
Router Tag	`voice-ai`
Authentication	Internal service-to-service (Go Gateway to Python Analytics)
AI Model Tier	T2 (Gemini 2.0 Flash) for fast, low-latency responses
Max Context	Last 10 conversation turns
Max Response Tokens	500 (concise kitchen-friendly answers)

Architecture Flow

Flutter Client (KDS Shell)
        |
        | WebSocket: /api/v1/chef-mode/voice/ws
        v
Go API Gateway (chef_mode_voice.go)
        |
        | REST: POST /api/ai/voice/stream
        | REST: POST /api/ai/voice/query
        v
Python Analytics Service (voice_ai_chef_routes.py)
        |
        | Speech-to-Text / Text-to-Speech
        | ACP AI Router (T2 tier)
        v
AI Response returned to client

Health Check

Check if the Voice AI Chef Mode service is healthy and available.

GET /api/ai/voice/health

Response

{
  "status": "healthy",
  "service": "voice-ai-chef"
}

Field	Type	Description
`status`	string	Service health status: `"healthy"` or `"unhealthy"`
`service`	string	Service identifier
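
As a rough illustration of how a caller might consume this endpoint, the sketch below polls the health route and interprets the documented `status` field. The `base_url` argument and the function names are assumptions made for the example; only the path and the response fields come from this page.

```python
import json
import urllib.request


def is_healthy(payload: dict) -> bool:
    """Interpret the documented health payload: only "healthy" counts as up."""
    return payload.get("status") == "healthy"


def check_health(base_url: str, timeout_s: float = 2.0) -> bool:
    """Call GET /api/ai/voice/health and report whether the service is up.

    base_url is a hypothetical internal address, e.g. "http://analytics:8000".
    Any network or parsing failure is treated as "not healthy".
    """
    url = base_url.rstrip("/") + "/api/ai/voice/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        # URLError subclasses OSError; JSON decode errors subclass ValueError.
        return False
```

Treating every failure as unhealthy keeps the caller's logic to a single boolean, which is usually enough for a readiness probe.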

Stream Audio for Transcription

Send a base64-encoded audio chunk for speech-to-text transcription. The Go API Gateway calls this endpoint for each audio chunk received over the WebSocket connection from the Flutter client.

POST /api/ai/voice/stream
Content-Type: application/json

Request Body

{
  "session_id": "voice-1708012345678",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEA...",
  "encoding": "LINEAR16",
  "sample_rate_hertz": 16000
}

Field	Type	Required	Default	Description
`session_id`	string	Yes	--	Voice session identifier
`audio`	string	Yes	--	Base64-encoded audio data
`encoding`	string	No	`"LINEAR16"`	Audio encoding format
`sample_rate_hertz`	integer	No	`16000`	Audio sample rate in Hz
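
The request body above can be assembled as in the following sketch. The helper name `build_stream_request` is hypothetical, and the example assumes that `LINEAR16` means raw 16-bit PCM samples, as it does in Google Cloud Speech-to-Text; this page does not state that explicitly.

```python
import base64
import json


def build_stream_request(session_id: str, audio_bytes: bytes,
                         encoding: str = "LINEAR16",
                         sample_rate_hertz: int = 16000) -> dict:
    """Build the JSON body for POST /api/ai/voice/stream."""
    return {
        "session_id": session_id,
        # The audio field carries base64 text, never raw bytes.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "encoding": encoding,
        "sample_rate_hertz": sample_rate_hertz,
    }


if __name__ == "__main__":
    # Assumed chunk: 100 ms of 16 kHz, 16-bit mono silence (1600 samples * 2 bytes).
    chunk = b"\x00\x00" * 1600
    body = build_stream_request("voice-1708012345678", chunk)
    print(len(json.dumps(body)), "bytes of JSON")
```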

Response

{
  "session_id": "voice-1708012345678",
  "transcript": "How long should I sear the ribeye?",
  "is_final": true,
  "confidence": 0.94,
  "latency_ms": 185
}

Field	Type	Description
`session_id`	string	Voice session identifier
`transcript`	string or null	Transcribed text, or `null` if no speech detected
`is_final`	boolean	Whether this is a final transcription result
`confidence`	float	Transcription confidence score (0.0 to 1.0)
`latency_ms`	integer	Processing time in milliseconds
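
One way a caller could decide whether a transcription result is ready to be forwarded to the query endpoint is sketched below. The `min_confidence` threshold is an assumption for illustration; this page does not define a cutoff.

```python
from typing import Optional


def transcript_to_forward(stream_response: dict,
                          min_confidence: float = 0.5) -> Optional[str]:
    """Return the transcript if it should be sent on to /query, else None.

    min_confidence is a hypothetical threshold, not a documented value.
    """
    # Interim results are skipped; only final transcripts are forwarded.
    if not stream_response.get("is_final"):
        return None
    transcript = stream_response.get("transcript")
    # transcript is null when no speech was detected.
    if not transcript or not transcript.strip():
        return None
    if stream_response.get("confidence", 0.0) < min_confidence:
        return None
    return transcript.strip()
```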

Process Text Query

Send a text query to the Chef Mode AI assistant and receive a contextual response. This endpoint is called by the Go API Gateway after transcription completes, or when the user sends a text query directly.

The AI uses a kitchen-specialized system prompt and the T2 model tier (Gemini 2.0 Flash) for fast, low-latency responses optimized for service environments.

POST /api/ai/voice/query
Content-Type: application/json

Request Body

{
  "session_id": "voice-1708012345678",
  "text": "How long should I sear the ribeye?",
  "context": [
    {
      "role": "user",
      "content": "What temp for medium-rare ribeye?",
      "timestamp": "2026-02-20T18:30:00Z"
    },
    {
      "role": "assistant",
      "content": "For medium-rare ribeye, pull it off the heat at 130F internal. It will carry over to about 135F while resting.",
      "timestamp": "2026-02-20T18:30:01Z"
    }
  ],
  "tenant_id": "550e8400-e29b-41d4-a716-446655449100",
  "location_id": "550e8400-e29b-41d4-a716-446655449110",
  "station": "grill"
}

Field	Type	Required	Description
`session_id`	string	Yes	Voice session identifier
`text`	string	Yes	Text query from the user
`context`	array	No	Previous conversation turns (max 10 retained)
`context[].role`	string	Yes	`"user"` or `"assistant"`
`context[].content`	string	Yes	Message content
`context[].timestamp`	string	No	ISO 8601 timestamp
`tenant_id`	string	Yes	Tenant identifier
`location_id`	string	No	Location identifier
`station`	string	No	Kitchen station: `"grill"`, `"saute"`, `"fry"`, `"pastry"`, `"prep"`, etc.
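
The sketch below builds a request body from the fields in the table, keeping only the most recent 10 context turns and omitting optional fields that are unset. The helper name and the choice to trim on the caller side are assumptions; the table says only that at most 10 turns are retained.

```python
from typing import List, Optional

MAX_CONTEXT_TURNS = 10  # matches the "Max Context" value in the Overview


def build_query_request(session_id: str, text: str, tenant_id: str,
                        context: Optional[List[dict]] = None,
                        location_id: Optional[str] = None,
                        station: Optional[str] = None) -> dict:
    """Build the JSON body for POST /api/ai/voice/query."""
    body = {"session_id": session_id, "text": text, "tenant_id": tenant_id}
    if context:
        # Keep the newest turns; older ones are dropped first.
        body["context"] = list(context)[-MAX_CONTEXT_TURNS:]
    if location_id is not None:
        body["location_id"] = location_id
    if station is not None:
        body["station"] = station
    return body
```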

Response

{
  "session_id": "voice-1708012345678",
  "text": "For a 1-inch ribeye, sear 3-4 minutes per side on high heat. Use the hand test or a thermometer to check doneness. Let it rest 5 minutes before plating.",
  "audio": null,
  "latency_ms": 320
}

Field	Type	Description
`session_id`	string	Voice session identifier
`text`	string	AI response text
`audio`	string or null	Base64-encoded audio response (reserved for future TTS integration)
`latency_ms`	integer	Processing time in milliseconds

Station Context

When a station value is provided, the AI system prompt is augmented with station-specific context, improving the relevance of responses for that kitchen area.

Station	Guidance Focus
`grill`	Searing, temperatures, timing, grill marks
`saute`	Pan techniques, heat control, sauce building
`fry`	Oil temperatures, breading, drain times
`pastry`	Baking temps, dough handling, decoration
`prep`	Knife work, mise en place, batch prep
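
To make the augmentation concrete, here is a minimal sketch of how a station value could be folded into a system prompt. The base prompt text and the function name are invented for the example; the service's actual prompt is not shown on this page. The focus strings are taken from the table above.

```python
from typing import Optional

# Hypothetical base prompt; the real kitchen-specialized prompt is not documented here.
BASE_PROMPT = "You are a concise kitchen assistant."

# Guidance focus per station, copied from the Station Context table.
STATION_FOCUS = {
    "grill": "searing, temperatures, timing, grill marks",
    "saute": "pan techniques, heat control, sauce building",
    "fry": "oil temperatures, breading, drain times",
    "pastry": "baking temps, dough handling, decoration",
    "prep": "knife work, mise en place, batch prep",
}


def build_system_prompt(station: Optional[str] = None) -> str:
    """Append station-specific focus to the base prompt when the station is known."""
    key = (station or "").strip().lower()
    focus = STATION_FOCUS.get(key)
    if focus is None:
        # Unknown or missing station: fall back to the generic prompt.
        return BASE_PROMPT
    return f"{BASE_PROMPT} The cook is at the {key} station; focus on {focus}."
```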

Graceful Degradation

If the AI service is unavailable, the endpoint returns a fallback response with HTTP 200 instead of an error status, so the voice session can continue:

{
  "session_id": "voice-1708012345678",
  "text": "I'm sorry, I couldn't process your request at this time. Please try again.",
  "audio": null,
  "latency_ms": 5
}
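
The fallback behavior can be sketched as a wrapper around the model call. The `call_model` parameter stands in for whatever client the service uses to reach the AI router and is purely hypothetical; the fallback text and response fields are taken from the example above.

```python
import time
from typing import Callable

FALLBACK_TEXT = ("I'm sorry, I couldn't process your request at this time. "
                 "Please try again.")


def answer_with_fallback(session_id: str, text: str,
                         call_model: Callable[[str], str]) -> dict:
    """Return a VoiceQueryResponse-shaped dict, even if the model call fails."""
    started = time.monotonic()
    try:
        reply = call_model(text)
    except Exception:
        # Any failure degrades to the canned reply instead of an error status.
        reply = FALLBACK_TEXT
    latency_ms = int((time.monotonic() - started) * 1000)
    return {
        "session_id": session_id,
        "text": reply,
        "audio": None,
        "latency_ms": latency_ms,
    }
```

Catching the broad `Exception` is deliberate in this sketch: the goal described above is that no upstream failure interrupts the voice session.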

Transcribe and Respond

Combined one-shot endpoint that transcribes audio and processes the result with AI in a single request. Useful for simpler interaction flows where streaming is not needed.

POST /api/ai/voice/transcribe-and-respond
Content-Type: application/json

Request Body

{
  "session_id": "voice-1708012345678",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEA...",
  "context": [],
  "tenant_id": "550e8400-e29b-41d4-a716-446655449100",
  "station": "grill"
}

Field	Type	Required	Description
`session_id`	string	Yes	Voice session identifier
`audio`	string	Yes	Base64-encoded audio data
`context`	array	No	Previous conversation turns
`tenant_id`	string	No	Tenant identifier
`station`	string	No	Kitchen station identifier

Response

Returns a VoiceQueryResponse with the AI response to the transcribed audio:

{
  "session_id": "voice-1708012345678",
  "text": "For a 1-inch ribeye, sear 3-4 minutes per side on high heat. Let it rest 5 minutes before plating.",
  "audio": null,
  "latency_ms": 520
}

If transcription fails or produces no usable text, the endpoint returns a prompt to retry:

{
  "session_id": "voice-1708012345678",
  "text": "I didn't catch that. Could you please repeat?",
  "audio": null,
  "latency_ms": 150
}
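
The two outcomes above suggest a flow along the following lines. This is a sketch under stated assumptions: `transcribe` and `answer` are hypothetical callables standing in for the speech-to-text and AI steps, and the real handler may be structured differently.

```python
import time
from typing import Callable, Optional

RETRY_TEXT = "I didn't catch that. Could you please repeat?"


def transcribe_and_respond(session_id: str, audio_b64: str,
                           transcribe: Callable[[str], Optional[str]],
                           answer: Callable[[str], str]) -> dict:
    """Transcribe audio, then answer it; ask for a repeat if nothing usable came back."""
    started = time.monotonic()
    try:
        transcript = transcribe(audio_b64)
    except Exception:
        # A failed transcription is handled like an empty one.
        transcript = None
    if transcript and transcript.strip():
        text = answer(transcript.strip())
    else:
        text = RETRY_TEXT
    return {
        "session_id": session_id,
        "text": text,
        "audio": None,
        "latency_ms": int((time.monotonic() - started) * 1000),
    }
```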

Get Session Context

Retrieve conversation context for an active voice session. In the current implementation the returned context list is always empty.

GET /api/ai/voice/sessions/{session_id}/context

Path Parameters

Parameter	Type	Description
`session_id`	string	Voice session identifier

Response

{
  "session_id": "voice-1708012345678",
  "context": [],
  "message": "Session context is managed client-side"
}

Field	Type	Description
`session_id`	string	Voice session identifier
`context`	array	Conversation turns (currently empty; context is managed client-side)
`message`	string	Status message

Note: Session context is currently managed client-side from this service's point of view, in the Go API Gateway's VoiceSession struct. The gateway maintains up to 10 conversation turns per session and passes them to the /query endpoint with each request. This endpoint is reserved for future server-side session persistence via Redis or Cloud Spanner.
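
The bounded history kept by the gateway can be illustrated with a fixed-size queue. The gateway itself is written in Go and its VoiceSession struct is not shown on this page, so the class below is only a Python analogue with invented names.

```python
from collections import deque
from typing import Deque, List


class VoiceSessionHistory:
    """Keeps the most recent conversation turns for one voice session."""

    def __init__(self, session_id: str, max_turns: int = 10) -> None:
        self.session_id = session_id
        # A deque with maxlen silently discards the oldest turn when full.
        self._turns: Deque[dict] = deque(maxlen=max_turns)

    def add(self, role: str, content: str) -> None:
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        self._turns.append({"role": role, "content": content})

    def context(self) -> List[dict]:
        """Return the turns, oldest first, ready for the /query context field."""
        return list(self._turns)
```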

Error Handling

HTTP Status Codes

Status	Description
200	Success
400	Invalid request (e.g., malformed base64 audio)
500	Internal server error

Error Response Format

{
  "detail": "Invalid base64 audio data"
}

Common Errors

Error	Endpoint	Cause
`Invalid base64 audio data`	`/stream`	The `audio` field contains data that is not valid base64
`I'm sorry, I couldn't process your request at this time. Please try again.`	`/query`	AI Gateway unreachable or returned an error (returned as 200 with fallback text)
`I didn't catch that. Could you please repeat?`	`/transcribe-and-respond`	Transcription failed or produced no usable text (returned as 200 with fallback text)
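
The 400 case for `/stream` could be produced by a validation step like the one sketched here. The function name and the `(status, payload)` return shape are assumptions for illustration; only the status code and the `detail` message come from this page.

```python
import base64
from typing import Tuple, Union


def decode_audio(audio_b64: str) -> Tuple[int, Union[bytes, dict]]:
    """Decode the audio field, or return a 400 status with the documented error body."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently skipping them.
        raw = base64.b64decode(audio_b64, validate=True)
    except ValueError:
        # binascii.Error (bad padding or alphabet) is a ValueError subclass.
        return 400, {"detail": "Invalid base64 audio data"}
    return 200, raw
```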

Resilience Behavior

The /query and /transcribe-and-respond endpoints are designed to return fallback text responses (HTTP 200) rather than error status codes when the AI service is unavailable. This ensures the WebSocket voice session remains active and the user receives feedback, even during partial outages.

Data Models

VoiceStreamRequest

Field	Type	Required	Default	Description
`session_id`	string	Yes	--	Voice session ID
`audio`	string	Yes	--	Base64-encoded audio data
`encoding`	string	No	`"LINEAR16"`	Audio encoding format
`sample_rate_hertz`	integer	No	`16000`	Sample rate in Hz

VoiceStreamResponse

Field	Type	Description
`session_id`	string	Voice session ID
`transcript`	string or null	Transcribed text
`is_final`	boolean	Whether the transcription is final
`confidence`	float	Confidence score (0.0-1.0)
`latency_ms`	integer	Processing latency in milliseconds

VoiceQueryRequest

Field	Type	Required	Description
`session_id`	string	Yes	Voice session ID
`text`	string	Yes	Text query from user
`context`	array of ConversationTurn	No	Conversation history
`tenant_id`	string	Yes	Tenant identifier
`location_id`	string	No	Location identifier
`station`	string	No	Kitchen station (grill, saute, fry, pastry, prep)

VoiceQueryResponse

Field	Type	Description
`session_id`	string	Voice session ID
`text`	string	AI response text
`audio`	string or null	Base64-encoded audio (reserved for future TTS)
`latency_ms`	integer	Processing latency in milliseconds

ConversationTurn

Field	Type	Required	Description
`role`	string	Yes	`"user"` or `"assistant"`
`content`	string	Yes	Message content
`timestamp`	string	No	ISO 8601 timestamp
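
For readers who prefer types to tables, the query models can be approximated with standard-library dataclasses. This is an illustrative mirror of the tables above, not the service's own definitions, and the `to_json_dict` helper is an invented convenience.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ConversationTurn:
    role: str                       # "user" or "assistant"
    content: str
    timestamp: Optional[str] = None  # ISO 8601


@dataclass
class VoiceQueryRequest:
    session_id: str
    text: str
    tenant_id: str
    context: List[ConversationTurn] = field(default_factory=list)
    location_id: Optional[str] = None
    station: Optional[str] = None

    def to_json_dict(self) -> dict:
        """Serialize to a plain dict, leaving out optional fields that are unset."""
        raw = asdict(self)  # also converts nested ConversationTurn objects
        raw["context"] = [
            {k: v for k, v in turn.items() if v is not None}
            for turn in raw["context"]
        ]
        return {k: v for k, v in raw.items() if v is not None}


@dataclass
class VoiceQueryResponse:
    session_id: str
    text: str
    latency_ms: int
    audio: Optional[str] = None     # reserved for future TTS
```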

Related Resources

Voice AI API - Full Voice AI API reference (Hey Maximus)
Voice Sessions - Voice session lifecycle management
AI Gateway - ACP AI Router and model tiers
KDS API - Kitchen Display System integration
