Authenticated API
These endpoints require a valid JWT Bearer token and are accessible via the API gateway at /v1/ai/*.
ML Training API
Train machine learning models using historical data from ClickHouse, with automated feature engineering and Vertex AI integration.
Overview
| Attribute | Value |
|---|---|
| Base Path | /api/v1/ml/training |
| Authentication | Bearer Token |
| Required Roles | ml_engineer, data_scientist, analytics_admin, platform_admin, system_admin, super_admin |
Key Features
- Data Extraction - Pull training data from ClickHouse analytics
- Feature Engineering - Automated time, lag, and rolling features
- Dataset Preparation - Train/validation/test splits with scaling
- Vertex AI Integration - Deploy trained models to GCP
Model Types
Available Models
| Model Type | Description | Input Data |
|---|---|---|
| demand_forecast | Predict hourly/daily demand | Sales history |
| staffing_forecast | Optimal staff scheduling | Labor + Sales history |
| inventory_forecast | Inventory reorder prediction | Inventory history |
| churn_prediction | Customer churn risk | Customer activity |
| menu_optimization | Item performance prediction | Sales + Menu data |
List Model Types
GET /api/v1/ml/training/model-types
Response
{
"model_types": [
{
"id": "demand_forecast",
"name": "Demand Forecasting",
"description": "Predict customer demand by hour/day",
"required_data": ["sales_history"],
"minimum_records": 1000,
"recommended_records": 10000,
"features": ["time_features", "lag_features", "rolling_features"]
},
{
"id": "staffing_forecast",
"name": "Staffing Optimization",
"description": "Predict optimal staff levels",
"required_data": ["sales_history", "labor_history"],
"minimum_records": 500,
"recommended_records": 5000
}
]
}
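The `minimum_records` and `recommended_records` fields can be compared against a known record count before requesting a training run. The following is a minimal client-side sketch; the function name and the three return labels are illustrative, and treating the thresholds as inclusive lower bounds is an assumption, not documented behavior.

```python
def data_sufficiency(model_type: dict, record_count: int) -> str:
    """Classify a record count against one entry of the model_types list.

    Assumption: a count equal to a threshold satisfies that threshold.
    """
    if record_count < model_type["minimum_records"]:
        return "insufficient"
    if record_count < model_type["recommended_records"]:
        return "minimum_met"
    return "recommended_met"


# Thresholds copied from the demand_forecast entry in the response above.
demand_forecast = {
    "id": "demand_forecast",
    "minimum_records": 1000,
    "recommended_records": 10000,
}
```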
Data Availability
Check Data Availability
GET /api/v1/ml/training/data-availability
Check what historical data is available for training.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| location_id | uuid | Filter by location |
| model_type | string | Check for specific model type |
Response
{
"location_id": "loc_001",
"data_availability": {
"sales_history": {
"available": true,
"record_count": 45230,
"date_range": {
"start": "2024-01-01",
"end": "2025-12-31"
},
"completeness": 0.98
},
"labor_history": {
"available": true,
"record_count": 12450,
"date_range": {
"start": "2024-03-01",
"end": "2025-12-31"
},
"completeness": 0.95
},
"inventory_history": {
"available": true,
"record_count": 28340,
"date_range": {
"start": "2024-01-01",
"end": "2025-12-31"
},
"completeness": 0.92
}
},
"trainable_models": [
"demand_forecast",
"staffing_forecast",
"inventory_forecast"
]
}
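The response already lists `trainable_models`, but a client may want to know which data sources are blocking a model that is not listed. The sketch below cross-checks a model type's `required_data` (from the model types endpoint) against the `data_availability` object; the helper name `missing_sources` is hypothetical, and treating an absent source as unavailable is an assumption.

```python
def missing_sources(required_data: list, availability: dict) -> list:
    """Return the required data sources that are absent or not available."""
    sources = availability["data_availability"]
    return [
        name
        for name in required_data
        if not sources.get(name, {}).get("available", False)
    ]


# Trimmed copy of the response above, keeping only the fields this check reads.
availability = {
    "data_availability": {
        "sales_history": {"available": True},
        "labor_history": {"available": True},
        "inventory_history": {"available": True},
    }
}
```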
Data Extraction
Extract Training Data
POST /api/v1/ml/training/extract
Extract and prepare raw training data from ClickHouse.
Request Body
{
"location_id": "loc_001",
"model_type": "demand_forecast",
"date_range": {
"start": "2024-01-01",
"end": "2025-12-31"
},
"granularity": "hourly",
"include_features": true
}
Response
{
"extraction_id": "ext_abc123",
"status": "completed",
"records_extracted": 17520,
"features_generated": 45,
"feature_groups": {
"time_features": ["hour_of_day", "day_of_week", "month", "is_weekend", "is_holiday"],
"lag_features": ["demand_lag_1h", "demand_lag_24h", "demand_lag_7d"],
"rolling_features": ["demand_rolling_mean_24h", "demand_rolling_std_24h"]
},
"output_location": "gs://olympus-ml-data/extractions/ext_abc123.parquet"
}
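As an illustration of how the extraction request might be assembled, the sketch below builds (but does not send) the POST using only the Python standard library. The host `https://api.example.com` and the token value are placeholders, and the helper name `build_extract_request` is hypothetical; the path, header scheme, and body fields follow this page.

```python
import json
import urllib.request

BASE_PATH = "/api/v1/ml/training"  # base path from the Overview table


def build_extract_request(host, token, location_id, model_type,
                          start, end, granularity="hourly",
                          include_features=True):
    """Build the POST request for the extract endpoint without sending it."""
    body = {
        "location_id": location_id,
        "model_type": model_type,
        "date_range": {"start": start, "end": end},
        "granularity": granularity,
        "include_features": include_features,
    }
    return urllib.request.Request(
        url=f"{host}{BASE_PATH}/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the request easy to inspect in tests.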
Dataset Preparation
Prepare Dataset
POST /api/v1/ml/training/datasets/prepare
Prepare a training dataset with proper splits and scaling.
Request Body
{
"extraction_id": "ext_abc123",
"dataset_name": "demand_forecast_loc001_2024",
"split_config": {
"train_ratio": 0.7,
"validation_ratio": 0.15,
"test_ratio": 0.15,
"shuffle": true,
"random_seed": 42
},
"scaling": {
"method": "standard",
"fit_on": "train"
},
"target_column": "demand"
}
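The `"method": "standard"` and `"fit_on": "train"` settings describe standard scaling whose parameters come from the training split only, so that validation and test statistics do not leak into the transform. A minimal pure-Python sketch of that idea follows; the function names are illustrative, and the use of the population standard deviation and the fallback of 1.0 for constant columns are assumptions about details this page does not specify.

```python
from statistics import fmean, pstdev


def fit_standard_scaler(train_values):
    """Compute mean and standard deviation from the training split only."""
    mean = fmean(train_values)
    std = pstdev(train_values)
    # Assumption: avoid division by zero for constant columns.
    return mean, (std if std > 0 else 1.0)


def transform(values, mean, std):
    """Apply (x - mean) / std using parameters fitted on the train split."""
    return [(v - mean) / std for v in values]


# Fit on train, then reuse the same parameters for the other splits.
mean, std = fit_standard_scaler([10.0, 20.0, 30.0])
scaled_validation = transform([20.0, 40.0], mean, std)
```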
Response
{
"dataset_id": "ds_xyz789",
"name": "demand_forecast_loc001_2024",
"status": "ready",
"splits": {
"train": {
"records": 12264,
"location": "gs://olympus-ml-data/datasets/ds_xyz789/train.parquet"
},
"validation": {
"records": 2628,
"location": "gs://olympus-ml-data/datasets/ds_xyz789/validation.parquet"
},
"test": {
"records": 2628,
"location": "gs://olympus-ml-data/datasets/ds_xyz789/test.parquet"
}
},
"feature_stats": {
"total_features": 45,
"numeric_features": 42,
"categorical_features": 3
},
"scaling_params": {
"method": "standard",
"mean": {...},
"std": {...}
},
"created_at": "2026-01-24T20:30:00Z"
}
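The split sizes in this response are consistent with applying the requested ratios to the 17,520 extracted records. The sketch below reproduces that arithmetic; rounding the train and validation counts and assigning the remainder to the test split is an assumption about how the service resolves fractional counts, and `split_counts` is a hypothetical helper.

```python
def split_counts(total, train_ratio, validation_ratio, test_ratio):
    """Return per-split record counts for the given ratios."""
    if abs(train_ratio + validation_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1.0")
    train = round(total * train_ratio)
    validation = round(total * validation_ratio)
    # Assumption: the test split takes whatever remains after rounding.
    test = total - train - validation
    return {"train": train, "validation": validation, "test": test}


counts = split_counts(17520, 0.7, 0.15, 0.15)
# counts == {"train": 12264, "validation": 2628, "test": 2628}
```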
List Datasets
GET /api/v1/ml/training/datasets
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| location_id | uuid | Filter by location |
| model_type | string | Filter by model type |
| status | string | ready, preparing, failed |
Response
{
"datasets": [
{
"dataset_id": "ds_xyz789",
"name": "demand_forecast_loc001_2024",
"model_type": "demand_forecast",
"location_id": "loc_001",
"status": "ready",
"total_records": 17520,
"created_at": "2026-01-24T20:30:00Z"
}
],
"total": 5
}
Get Dataset
GET /api/v1/ml/training/datasets/{dataset_id}
Get Dataset Split Data
GET /api/v1/ml/training/datasets/{dataset_id}/data/{split}
Path Parameters
| Parameter | Description |
|---|---|
| split | train, validation, or test |
Training Jobs
Start Training Job
POST /api/v1/ml/training/start
Start a model training job on Vertex AI.
Request Body
{
"dataset_id": "ds_xyz789",
"model_type": "demand_forecast",
"model_name": "demand_loc001_v1",
"hyperparameters": {
"learning_rate": 0.001,
"epochs": 100,
"batch_size": 32,
"hidden_layers": [128, 64, 32]
},
"training_config": {
"machine_type": "n1-standard-4",
"accelerator_type": "NVIDIA_TESLA_T4",
"accelerator_count": 1
},
"auto_deploy": true
}
Response
{
"job_id": "job_train_001",
"status": "queued",
"model_name": "demand_loc001_v1",
"dataset_id": "ds_xyz789",
"vertex_job_id": "projects/olympus/locations/us-central1/trainingPipelines/12345",
"estimated_duration_minutes": 45,
"created_at": "2026-01-24T21:00:00Z"
}
Get Training Job Status
GET /api/v1/ml/training/{job_id}/status
Response
{
"job_id": "job_train_001",
"status": "training",
"progress": 0.65,
"current_epoch": 65,
"total_epochs": 100,
"metrics": {
"train_loss": 0.0234,
"validation_loss": 0.0312,
"train_mae": 12.5,
"validation_mae": 15.2
},
"started_at": "2026-01-24T21:05:00Z",
"estimated_completion": "2026-01-24T21:45:00Z"
}
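A client that is not using webhooks can poll this endpoint until the job leaves the `queued` and `training` states. The sketch below is one way to structure that loop: `fetch_status` is a caller-supplied function that returns the parsed status response, and the polling interval, poll limit, and the choice of `completed` and `failed` as terminal states are assumptions rather than documented guidance.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}  # assumed terminal statuses


def wait_for_job(fetch_status: Callable[[], dict],
                 interval_s: float = 30.0,
                 max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal state or the poll limit is hit."""
    for _ in range(max_polls):
        status = fetch_status()
        if status["status"] in TERMINAL_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"job still running after {max_polls} polls")
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP client and lets it be exercised without real delays.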
List Training Jobs
GET /api/v1/ml/training/jobs
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| status | string | queued, training, completed, failed |
| location_id | uuid | Filter by location |
| model_type | string | Filter by model type |
Trained Models
List Trained Models
GET /api/v1/ml/training/models
Response
{
"models": [
{
"model_id": "model_demand_001",
"name": "demand_loc001_v1",
"model_type": "demand_forecast",
"location_id": "loc_001",
"status": "deployed",
"metrics": {
"test_mae": 14.3,
"test_rmse": 18.7,
"test_r2": 0.89
},
"vertex_model_id": "projects/olympus/locations/us-central1/models/demand_001",
"endpoint_id": "projects/olympus/locations/us-central1/endpoints/ep_001",
"created_at": "2026-01-24T22:00:00Z"
}
],
"total": 3
}
Get Model Details
GET /api/v1/ml/training/models/{model_id}
Response
{
"model_id": "model_demand_001",
"name": "demand_loc001_v1",
"model_type": "demand_forecast",
"location_id": "loc_001",
"status": "deployed",
"training_job_id": "job_train_001",
"dataset_id": "ds_xyz789",
"hyperparameters": {
"learning_rate": 0.001,
"epochs": 100,
"batch_size": 32,
"hidden_layers": [128, 64, 32]
},
"metrics": {
"train": {
"mae": 10.2,
"rmse": 13.5,
"r2": 0.94
},
"validation": {
"mae": 13.1,
"rmse": 16.8,
"r2": 0.91
},
"test": {
"mae": 14.3,
"rmse": 18.7,
"r2": 0.89
}
},
"feature_importance": [
{"feature": "hour_of_day", "importance": 0.23},
{"feature": "day_of_week", "importance": 0.18},
{"feature": "demand_lag_24h", "importance": 0.15}
],
"vertex_model_id": "projects/olympus/locations/us-central1/models/demand_001",
"endpoint_id": "projects/olympus/locations/us-central1/endpoints/ep_001",
"created_at": "2026-01-24T22:00:00Z",
"deployed_at": "2026-01-24T22:15:00Z"
}
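The per-split `metrics` and the `feature_importance` list lend themselves to a quick client-side sanity check. The sketch below computes the gap between test and train metrics (a large gap can indicate overfitting) and ranks features by importance; both helper names are hypothetical, and no particular gap threshold is implied by this API.

```python
def generalization_gap(metrics: dict, name: str) -> float:
    """Return the test-minus-train difference for one metric, e.g. 'mae'."""
    return metrics["test"][name] - metrics["train"][name]


def top_features(feature_importance: list, n: int = 3) -> list:
    """Return the names of the n most important features, highest first."""
    ranked = sorted(feature_importance,
                    key=lambda item: item["importance"],
                    reverse=True)
    return [item["feature"] for item in ranked[:n]]


# Values copied from the response above.
metrics = {
    "train": {"mae": 10.2, "rmse": 13.5, "r2": 0.94},
    "test": {"mae": 14.3, "rmse": 18.7, "r2": 0.89},
}
mae_gap = generalization_gap(metrics, "mae")  # approximately 4.1
```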
Feature Engineering
Time Features
| Feature | Description |
|---|---|
| hour_of_day | Hour (0-23) |
| day_of_week | Day (0-6, Monday=0) |
| day_of_month | Day (1-31) |
| month | Month (1-12) |
| quarter | Quarter (1-4) |
| is_weekend | Boolean |
| is_holiday | Boolean (US holidays) |
Lag Features
| Feature | Description |
|---|---|
| demand_lag_1h | Demand 1 hour ago |
| demand_lag_24h | Demand 24 hours ago |
| demand_lag_7d | Demand 7 days ago |
| demand_lag_14d | Demand 14 days ago |
Rolling Features
| Feature | Description |
|---|---|
| demand_rolling_mean_24h | 24-hour rolling mean |
| demand_rolling_std_24h | 24-hour rolling std |
| demand_rolling_mean_7d | 7-day rolling mean |
| demand_rolling_max_7d | 7-day rolling max |
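To make the three feature families concrete, here is a small pure-Python sketch that computes time, lag, and rolling-mean features for an evenly spaced hourly series. It is an illustration of the definitions in the tables above, not the service's implementation: `is_holiday` is omitted because it needs a holiday calendar, and whether the rolling window includes the current observation is an assumption (here it does).

```python
from datetime import datetime
from statistics import fmean


def time_features(ts: datetime) -> dict:
    """Calendar features for one timestamp (Monday=0, as in the table)."""
    return {
        "hour_of_day": ts.hour,
        "day_of_week": ts.weekday(),
        "day_of_month": ts.day,
        "month": ts.month,
        "quarter": (ts.month - 1) // 3 + 1,
        "is_weekend": ts.weekday() >= 5,
    }


def lag(series: list, steps: int) -> list:
    """Value from `steps` rows earlier; None where no history exists."""
    return [series[i - steps] if i >= steps else None
            for i in range(len(series))]


def rolling_mean(series: list, window: int) -> list:
    """Trailing mean over `window` rows ending at the current row."""
    return [fmean(series[i - window + 1:i + 1]) if i >= window - 1 else None
            for i in range(len(series))]
```

For hourly data, `lag(demand, 24)` corresponds to `demand_lag_24h` and `rolling_mean(demand, 24)` to `demand_rolling_mean_24h` under these assumptions.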
Webhooks
| Event | Description |
|---|---|
| training.started | Training job started |
| training.progress | Training progress update |
| training.completed | Training completed successfully |
| training.failed | Training failed |
| model.deployed | Model deployed to endpoint |
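A webhook receiver typically maps each event name to a follow-up action. The sketch below shows one possible mapping for the five events listed above; the action labels are purely illustrative, the function name is hypothetical, and the payload format is not specified on this page, so the function takes only the event name.

```python
# Hypothetical follow-up actions keyed by the documented event names.
EVENT_ACTIONS = {
    "training.started": "wait",
    "training.progress": "wait",
    "training.completed": "fetch_model",
    "training.failed": "inspect_failure",
    "model.deployed": "ready_for_inference",
}


def next_action(event_name: str) -> str:
    """Return the follow-up action for an event, or 'ignore' if unknown."""
    return EVENT_ACTIONS.get(event_name, "ignore")
```

Returning `"ignore"` for unrecognized names keeps the receiver tolerant of event types added later.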
Error Responses
| Status | Code | Description |
|---|---|---|
| 400 | insufficient_data | Not enough historical data |
| 400 | invalid_hyperparameters | Invalid hyperparameter values |
| 404 | dataset_not_found | Dataset ID not found |
| 409 | training_in_progress | Training already in progress for location |
| 500 | vertex_error | Vertex AI error |
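Clients usually need to decide which of these errors are worth retrying. The sketch below encodes one reasonable policy: retry `training_in_progress` and `vertex_error` after a delay, and treat the remaining codes as problems with the request itself. This split is a judgment call rather than documented behavior, and `should_retry` is a hypothetical helper.

```python
# Assumption: these codes describe transient conditions.
RETRYABLE_CODES = {"training_in_progress", "vertex_error"}

# Assumption: these codes require changing the request before resubmitting.
CLIENT_FIX_CODES = {"insufficient_data", "invalid_hyperparameters",
                    "dataset_not_found"}


def should_retry(code: str) -> bool:
    """Return True if the same request may succeed when sent again later."""
    if code in RETRYABLE_CODES:
        return True
    if code in CLIENT_FIX_CODES:
        return False
    # Unknown codes are not retried automatically.
    return False
```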
Related Documentation
- Data Ingestion API - Import historical data
- ML Forecasting API - Use trained models
- Recommendations API - ML-powered recommendations
- Analytics Dashboard - View predictions