Authenticated API

These endpoints require a valid JWT Bearer token. They are accessible via the API gateway at /v1/ai/*.

ML Training API

Train machine learning models using historical data from ClickHouse, with automated feature engineering and Vertex AI integration.

Overview

| Attribute | Value |
|---|---|
| Base Path | /api/v1/ml/training |
| Authentication | Bearer Token |
| Required Roles | ml_engineer, data_scientist, analytics_admin, platform_admin, system_admin, super_admin |
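
As an illustration of the authentication row above, here is a minimal sketch of a client attaching the Bearer token to a request under the base path. The host `https://api.example.com`, the placeholder token, and the helper name `build_request` are assumptions for illustration; only the base path and the Bearer scheme come from this page.

```python
import urllib.request

BASE_PATH = "/api/v1/ml/training"  # base path from the Overview table


def build_request(host: str, token: str, path: str = "/model-types") -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a JWT Bearer token."""
    url = f"{host.rstrip('/')}{BASE_PATH}{path}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


# Hypothetical host and token, for illustration only.
req = build_request("https://api.example.com", "YOUR_JWT")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without network access.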

Key Features

  • Data Extraction - Pull training data from ClickHouse analytics
  • Feature Engineering - Automated time, lag, and rolling features
  • Dataset Preparation - Train/validation/test splits with scaling
  • Vertex AI Integration - Deploy trained models to GCP

Model Types

Available Models

| Model Type | Description | Input Data |
|---|---|---|
| demand_forecast | Predict hourly/daily demand | Sales history |
| staffing_forecast | Optimal staff scheduling | Labor + Sales history |
| inventory_forecast | Inventory reorder prediction | Inventory history |
| churn_prediction | Customer churn risk | Customer activity |
| menu_optimization | Item performance prediction | Sales + Menu data |

List Model Types

GET /api/v1/ml/training/model-types

Response

{
  "model_types": [
    {
      "id": "demand_forecast",
      "name": "Demand Forecasting",
      "description": "Predict customer demand by hour/day",
      "required_data": ["sales_history"],
      "minimum_records": 1000,
      "recommended_records": 10000,
      "features": ["time_features", "lag_features", "rolling_features"]
    },
    {
      "id": "staffing_forecast",
      "name": "Staffing Optimization",
      "description": "Predict optimal staff levels",
      "required_data": ["sales_history", "labor_history"],
      "minimum_records": 500,
      "recommended_records": 5000
    }
  ]
}
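
The `minimum_records` and `recommended_records` fields lend themselves to a simple client-side check before requesting an extraction. The sketch below is one way to do it; the function name and the three labels it returns are hypothetical and not part of the API.

```python
def data_sufficiency(model_type: dict, record_count: int) -> str:
    """Classify a record count against a model type's thresholds.

    The returned labels are illustrative, not values defined by the API.
    """
    if record_count < model_type["minimum_records"]:
        return "insufficient"
    if record_count < model_type["recommended_records"]:
        return "minimum_met"
    return "recommended_met"


# Thresholds copied from the demand_forecast entry in the example response.
demand = {"id": "demand_forecast", "minimum_records": 1000, "recommended_records": 10000}
print(data_sufficiency(demand, 45230))
```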

Data Availability

Check Data Availability

GET /api/v1/ml/training/data-availability

Check what historical data is available for training.

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| location_id | uuid | Filter by location |
| model_type | string | Check for specific model type |

Response

{
  "location_id": "loc_001",
  "data_availability": {
    "sales_history": {
      "available": true,
      "record_count": 45230,
      "date_range": {
        "start": "2024-01-01",
        "end": "2025-12-31"
      },
      "completeness": 0.98
    },
    "labor_history": {
      "available": true,
      "record_count": 12450,
      "date_range": {
        "start": "2024-03-01",
        "end": "2025-12-31"
      },
      "completeness": 0.95
    },
    "inventory_history": {
      "available": true,
      "record_count": 28340,
      "date_range": {
        "start": "2024-01-01",
        "end": "2025-12-31"
      },
      "completeness": 0.92
    }
  },
  "trainable_models": [
    "demand_forecast",
    "staffing_forecast",
    "inventory_forecast"
  ]
}
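
A client may want a stricter completeness bar than whatever the service applies when it fills `trainable_models`. The following sketch filters the `data_availability` block by a caller-chosen threshold. The 0.9 default and both function names are assumptions; how the service itself decides trainability is not documented here.

```python
def usable_sources(response: dict, min_completeness: float = 0.9) -> set:
    """Return data sources that are available and complete enough."""
    return {
        name
        for name, info in response["data_availability"].items()
        if info["available"] and info["completeness"] >= min_completeness
    }


def can_train(required_data: list, response: dict, min_completeness: float = 0.9) -> bool:
    """True when every required source passes the client-side threshold."""
    return set(required_data) <= usable_sources(response, min_completeness)


# Trimmed copy of the example response above.
response = {
    "data_availability": {
        "sales_history": {"available": True, "completeness": 0.98},
        "labor_history": {"available": True, "completeness": 0.95},
        "inventory_history": {"available": True, "completeness": 0.92},
    }
}
print(can_train(["sales_history", "labor_history"], response))
```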

Data Extraction

Extract Training Data

POST /api/v1/ml/training/extract

Extract and prepare raw training data from ClickHouse.

Request Body

{
  "location_id": "loc_001",
  "model_type": "demand_forecast",
  "date_range": {
    "start": "2024-01-01",
    "end": "2025-12-31"
  },
  "granularity": "hourly",
  "include_features": true
}

Response

{
  "extraction_id": "ext_abc123",
  "status": "completed",
  "records_extracted": 17520,
  "features_generated": 45,
  "feature_groups": {
    "time_features": ["hour_of_day", "day_of_week", "month", "is_weekend", "is_holiday"],
    "lag_features": ["demand_lag_1h", "demand_lag_24h", "demand_lag_7d"],
    "rolling_features": ["demand_rolling_mean_24h", "demand_rolling_std_24h"]
  },
  "output_location": "gs://olympus-ml-data/extractions/ext_abc123.parquet"
}
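
The extraction request body can be assembled and sanity-checked before it is sent. This is a minimal sketch: the helper name is hypothetical, and the set of accepted `granularity` values is an assumption (only "hourly" appears in the example; "daily" is inferred from the demand_forecast description).

```python
from datetime import date

# Assumption: only "hourly" is shown on this page; "daily" is inferred.
ASSUMED_GRANULARITIES = {"hourly", "daily"}


def build_extract_request(location_id: str, model_type: str, start: date, end: date,
                          granularity: str = "hourly", include_features: bool = True) -> dict:
    """Build the JSON body for POST /api/v1/ml/training/extract."""
    if end < start:
        raise ValueError("date_range.end must not precede date_range.start")
    if granularity not in ASSUMED_GRANULARITIES:
        raise ValueError(f"unsupported granularity: {granularity}")
    return {
        "location_id": location_id,
        "model_type": model_type,
        "date_range": {"start": start.isoformat(), "end": end.isoformat()},
        "granularity": granularity,
        "include_features": include_features,
    }


payload = build_extract_request("loc_001", "demand_forecast", date(2024, 1, 1), date(2025, 12, 31))
print(payload["date_range"])
```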

Dataset Preparation

Prepare Dataset

POST /api/v1/ml/training/datasets/prepare

Prepare a training dataset with proper splits and scaling.

Request Body

{
  "extraction_id": "ext_abc123",
  "dataset_name": "demand_forecast_loc001_2024",
  "split_config": {
    "train_ratio": 0.7,
    "validation_ratio": 0.15,
    "test_ratio": 0.15,
    "shuffle": true,
    "random_seed": 42
  },
  "scaling": {
    "method": "standard",
    "fit_on": "train"
  },
  "target_column": "demand"
}

Response

{
  "dataset_id": "ds_xyz789",
  "name": "demand_forecast_loc001_2024",
  "status": "ready",
  "splits": {
    "train": {
      "records": 12264,
      "location": "gs://olympus-ml-data/datasets/ds_xyz789/train.parquet"
    },
    "validation": {
      "records": 2628,
      "location": "gs://olympus-ml-data/datasets/ds_xyz789/validation.parquet"
    },
    "test": {
      "records": 2628,
      "location": "gs://olympus-ml-data/datasets/ds_xyz789/test.parquet"
    }
  },
  "feature_stats": {
    "total_features": 45,
    "numeric_features": 42,
    "categorical_features": 3
  },
  "scaling_params": {
    "method": "standard",
    "mean": {...},
    "std": {...}
  },
  "created_at": "2026-01-24T20:30:00Z"
}
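
The split sizes in the example response follow directly from the ratios in the request (17520 records at 0.7 / 0.15 / 0.15), and `"fit_on": "train"` means the scaling statistics come from the training split only. The sketch below reproduces both ideas. The rounding rule and the use of the population standard deviation are assumptions; the service's exact behavior is not documented here.

```python
import statistics


def split_sizes(total: int, train_ratio: float, validation_ratio: float, test_ratio: float):
    """Return (train, validation, test) record counts for the given ratios."""
    if abs(train_ratio + validation_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")
    train = round(total * train_ratio)
    validation = round(total * validation_ratio)
    # Test takes the remainder so the three counts always sum to total.
    return train, validation, total - train - validation


def standard_scale(train_values: list, values: list) -> list:
    """Scale values with mean/std fitted on the training split only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against a constant column
    return [(v - mean) / std for v in values]


print(split_sizes(17520, 0.7, 0.15, 0.15))
```

Fitting on the training split alone keeps validation and test statistics out of the model's inputs. For forecasting targets with lag features, it may also be worth considering whether `"shuffle": true` is appropriate, since shuffled splits can place future observations in the training set.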

List Datasets

GET /api/v1/ml/training/datasets

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| location_id | uuid | Filter by location |
| model_type | string | Filter by model type |
| status | string | ready, preparing, failed |

Response

{
  "datasets": [
    {
      "dataset_id": "ds_xyz789",
      "name": "demand_forecast_loc001_2024",
      "model_type": "demand_forecast",
      "location_id": "loc_001",
      "status": "ready",
      "total_records": 17520,
      "created_at": "2026-01-24T20:30:00Z"
    }
  ],
  "total": 5
}
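
The list endpoint's filters are plain query parameters. A small sketch of building the request path, with the `status` value checked against the documented options, might look like this; the helper name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

DATASET_STATUSES = {"ready", "preparing", "failed"}  # values from the table above


def datasets_url(location_id: Optional[str] = None, model_type: Optional[str] = None,
                 status: Optional[str] = None) -> str:
    """Build the path and query string for GET /api/v1/ml/training/datasets."""
    if status is not None and status not in DATASET_STATUSES:
        raise ValueError(f"unknown status: {status}")
    candidates = (("location_id", location_id), ("model_type", model_type), ("status", status))
    params = {key: value for key, value in candidates if value is not None}
    base = "/api/v1/ml/training/datasets"
    return f"{base}?{urlencode(params)}" if params else base


print(datasets_url(model_type="demand_forecast", status="ready"))
```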

Get Dataset

GET /api/v1/ml/training/datasets/{dataset_id}

Get Dataset Split Data

GET /api/v1/ml/training/datasets/{dataset_id}/data/{split}

Path Parameters

| Parameter | Description |
|---|---|
| split | train, validation, or test |
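
Because `split` accepts only three values, a client can reject typos before making the call. A minimal sketch, with a hypothetical helper name:

```python
from urllib.parse import quote

VALID_SPLITS = ("train", "validation", "test")


def split_data_path(dataset_id: str, split: str) -> str:
    """Build the path for GET .../datasets/{dataset_id}/data/{split}."""
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")
    # Percent-encode the ID so unexpected characters cannot alter the path.
    return f"/api/v1/ml/training/datasets/{quote(dataset_id, safe='')}/data/{split}"


print(split_data_path("ds_xyz789", "validation"))
```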

Training Jobs

Start Training Job

POST /api/v1/ml/training/start

Start a model training job on Vertex AI.

Request Body

{
  "dataset_id": "ds_xyz789",
  "model_type": "demand_forecast",
  "model_name": "demand_loc001_v1",
  "hyperparameters": {
    "learning_rate": 0.001,
    "epochs": 100,
    "batch_size": 32,
    "hidden_layers": [128, 64, 32]
  },
  "training_config": {
    "machine_type": "n1-standard-4",
    "accelerator_type": "NVIDIA_TESLA_T4",
    "accelerator_count": 1
  },
  "auto_deploy": true
}

Response

{
  "job_id": "job_train_001",
  "status": "queued",
  "model_name": "demand_loc001_v1",
  "dataset_id": "ds_xyz789",
  "vertex_job_id": "projects/olympus/locations/us-central1/trainingPipelines/12345",
  "estimated_duration_minutes": 45,
  "created_at": "2026-01-24T21:00:00Z"
}
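
Checking hyperparameters locally can catch obvious mistakes before the request is submitted. The rules in this sketch are assumptions chosen for illustration; the server's actual validation is not documented on this page.

```python
def validate_hyperparameters(hp: dict) -> list:
    """Return a list of problems found; an empty list means none were found.

    These bounds are illustrative assumptions, not the API's rules.
    """
    errors = []
    learning_rate = hp.get("learning_rate")
    if not isinstance(learning_rate, (int, float)) or not 0 < learning_rate < 1:
        errors.append("learning_rate must be a number between 0 and 1")
    for key in ("epochs", "batch_size"):
        value = hp.get(key)
        if not isinstance(value, int) or value <= 0:
            errors.append(f"{key} must be a positive integer")
    layers = hp.get("hidden_layers")
    if not layers or any(not isinstance(n, int) or n <= 0 for n in layers):
        errors.append("hidden_layers must be a non-empty list of positive integers")
    return errors


# Hyperparameters copied from the example request body.
example = {"learning_rate": 0.001, "epochs": 100, "batch_size": 32, "hidden_layers": [128, 64, 32]}
print(validate_hyperparameters(example))
```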

Get Training Job Status

GET /api/v1/ml/training/{job_id}/status

Response

{
  "job_id": "job_train_001",
  "status": "training",
  "progress": 0.65,
  "current_epoch": 65,
  "total_epochs": 100,
  "metrics": {
    "train_loss": 0.0234,
    "validation_loss": 0.0312,
    "train_mae": 12.5,
    "validation_mae": 15.2
  },
  "started_at": "2026-01-24T21:05:00Z",
  "estimated_completion": "2026-01-24T21:45:00Z"
}
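
A client that does not use webhooks can poll the status endpoint until the job reaches a terminal state. The sketch below takes the fetch step as a callable so it stays independent of any HTTP library. The polling interval, the poll limit, and the assumption that `completed` and `failed` are the only terminal states are illustrative choices.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}  # assumed terminal states


def wait_for_job(fetch_status: Callable[[], dict], interval_s: float = 30.0,
                 max_polls: int = 240, sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job is completed or failed, then return the last status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
    raise TimeoutError("training job did not finish within the polling budget")


# Canned responses stand in for GET /api/v1/ml/training/{job_id}/status.
responses = iter([
    {"status": "queued"},
    {"status": "training", "progress": 0.65},
    {"status": "completed", "progress": 1.0},
])
final = wait_for_job(lambda: next(responses), interval_s=0, sleep=lambda _: None)
print(final["status"])
```

In real use, `fetch_status` would issue the authenticated GET request and return the parsed JSON body.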

List Training Jobs

GET /api/v1/ml/training/jobs

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| status | string | queued, training, completed, failed |
| location_id | uuid | Filter by location |
| model_type | string | Filter by model type |

Trained Models

List Trained Models

GET /api/v1/ml/training/models

Response

{
  "models": [
    {
      "model_id": "model_demand_001",
      "name": "demand_loc001_v1",
      "model_type": "demand_forecast",
      "location_id": "loc_001",
      "status": "deployed",
      "metrics": {
        "test_mae": 14.3,
        "test_rmse": 18.7,
        "test_r2": 0.89
      },
      "vertex_model_id": "projects/olympus/locations/us-central1/models/demand_001",
      "endpoint_id": "projects/olympus/locations/us-central1/endpoints/ep_001",
      "created_at": "2026-01-24T22:00:00Z"
    }
  ],
  "total": 3
}

Get Model Details

GET /api/v1/ml/training/models/{model_id}

Response

{
  "model_id": "model_demand_001",
  "name": "demand_loc001_v1",
  "model_type": "demand_forecast",
  "location_id": "loc_001",
  "status": "deployed",
  "training_job_id": "job_train_001",
  "dataset_id": "ds_xyz789",
  "hyperparameters": {
    "learning_rate": 0.001,
    "epochs": 100,
    "batch_size": 32,
    "hidden_layers": [128, 64, 32]
  },
  "metrics": {
    "train": {
      "mae": 10.2,
      "rmse": 13.5,
      "r2": 0.94
    },
    "validation": {
      "mae": 13.1,
      "rmse": 16.8,
      "r2": 0.91
    },
    "test": {
      "mae": 14.3,
      "rmse": 18.7,
      "r2": 0.89
    }
  },
  "feature_importance": [
    {"feature": "hour_of_day", "importance": 0.23},
    {"feature": "day_of_week", "importance": 0.18},
    {"feature": "demand_lag_24h", "importance": 0.15}
  ],
  "vertex_model_id": "projects/olympus/locations/us-central1/models/demand_001",
  "endpoint_id": "projects/olympus/locations/us-central1/endpoints/ep_001",
  "created_at": "2026-01-24T22:00:00Z",
  "deployed_at": "2026-01-24T22:15:00Z"
}
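
The per-split metrics make it straightforward to estimate how much a model degrades on unseen data, and `feature_importance` can be ranked client-side. Both helpers below are hypothetical conveniences, not part of the API.

```python
def generalization_gap(metrics: dict, metric: str = "mae") -> float:
    """Difference between test and train error; larger means more overfitting."""
    return metrics["test"][metric] - metrics["train"][metric]


def top_features(feature_importance: list, k: int = 2) -> list:
    """Return the names of the k most important features."""
    ranked = sorted(feature_importance, key=lambda item: item["importance"], reverse=True)
    return [item["feature"] for item in ranked[:k]]


# Values copied from the example response above.
metrics = {"train": {"mae": 10.2}, "validation": {"mae": 13.1}, "test": {"mae": 14.3}}
importance = [
    {"feature": "hour_of_day", "importance": 0.23},
    {"feature": "day_of_week", "importance": 0.18},
    {"feature": "demand_lag_24h", "importance": 0.15},
]
print(round(generalization_gap(metrics), 2), top_features(importance))
```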

Feature Engineering

Time Features

| Feature | Description |
|---|---|
| hour_of_day | Hour (0-23) |
| day_of_week | Day (0-6, Monday=0) |
| day_of_month | Day (1-31) |
| month | Month (1-12) |
| quarter | Quarter (1-4) |
| is_weekend | Boolean |
| is_holiday | Boolean (US holidays) |
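
To make the definitions concrete, the sketch below derives the time features from a single timestamp. It is not the service's implementation. `is_holiday` is left out because it requires a holiday calendar, and treating Saturday and Sunday as the weekend is an assumption.

```python
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Derive the calendar features from the table for one timestamp."""
    return {
        "hour_of_day": ts.hour,               # 0-23
        "day_of_week": ts.weekday(),          # 0-6, Monday=0 (matches the table)
        "day_of_month": ts.day,               # 1-31
        "month": ts.month,                    # 1-12
        "quarter": (ts.month - 1) // 3 + 1,   # 1-4
        "is_weekend": ts.weekday() >= 5,      # assumption: Saturday/Sunday
    }


print(time_features(datetime(2024, 1, 6, 13, 0)))
```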

Lag Features

| Feature | Description |
|---|---|
| demand_lag_1h | Demand 1 hour ago |
| demand_lag_24h | Demand 24 hours ago |
| demand_lag_7d | Demand 7 days ago |
| demand_lag_14d | Demand 14 days ago |
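
On an hourly series, each lag feature is simply the value a fixed number of steps back. A minimal sketch follows, assuming a gap-free hourly series and using `None` where there is not enough history. How the service handles missing history is not documented here.

```python
from typing import Sequence

# Lag sizes expressed in hours, assuming one observation per hour.
LAGS_HOURS = {
    "demand_lag_1h": 1,
    "demand_lag_24h": 24,
    "demand_lag_7d": 7 * 24,
    "demand_lag_14d": 14 * 24,
}


def lag_features(demand: Sequence[float], i: int) -> dict:
    """Return lag features for position i; None when history is too short."""
    return {
        name: (demand[i - lag] if i - lag >= 0 else None)
        for name, lag in LAGS_HOURS.items()
    }


demand = list(range(400))  # synthetic series where value == hour index
print(lag_features(demand, 336))
```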

Rolling Features

| Feature | Description |
|---|---|
| demand_rolling_mean_24h | 24-hour rolling mean |
| demand_rolling_std_24h | 24-hour rolling std |
| demand_rolling_mean_7d | 7-day rolling mean |
| demand_rolling_max_7d | 7-day rolling max |
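
Rolling features summarize a trailing window of the series. The sketch below assumes an hourly series and a window that ends just before the current hour, so the value being predicted never enters its own features. Whether the service includes the current observation, and whether it uses population or sample standard deviation, is not stated on this page.

```python
import statistics
from typing import Sequence


def rolling_features(demand: Sequence[float], i: int) -> dict:
    """Rolling statistics over the hours strictly before position i."""
    day = demand[max(0, i - 24):i]        # previous 24 hours
    week = demand[max(0, i - 7 * 24):i]   # previous 7 days
    if not day:
        # No history yet: every feature is undefined.
        return {name: None for name in (
            "demand_rolling_mean_24h", "demand_rolling_std_24h",
            "demand_rolling_mean_7d", "demand_rolling_max_7d")}
    return {
        "demand_rolling_mean_24h": statistics.fmean(day),
        "demand_rolling_std_24h": statistics.pstdev(day),  # assumption: population std
        "demand_rolling_mean_7d": statistics.fmean(week),
        "demand_rolling_max_7d": max(week),
    }


demand = [float(h) for h in range(200)]  # synthetic series where value == hour index
print(rolling_features(demand, 168))
```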

Webhooks

| Event | Description |
|---|---|
| training.started | Training job started |
| training.progress | Training progress update |
| training.completed | Training completed successfully |
| training.failed | Training failed |
| model.deployed | Model deployed to endpoint |
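
A webhook receiver typically routes each event name to an action. The payload shape is not documented on this page, so the `event` key used below is an assumption, as are the action labels the function returns.

```python
KNOWN_EVENTS = {
    "training.started",
    "training.progress",
    "training.completed",
    "training.failed",
    "model.deployed",
}


def route_event(payload: dict) -> str:
    """Map a webhook payload to an action label (labels are illustrative)."""
    event = payload.get("event")  # assumed field name
    if event not in KNOWN_EVENTS:
        return "ignore"
    if event == "training.failed":
        return "alert"
    if event in ("training.completed", "model.deployed"):
        return "notify"
    return "log"


print(route_event({"event": "training.failed", "data": {"job_id": "job_train_001"}}))
```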

Error Responses

| Status | Code | Description |
|---|---|---|
| 400 | insufficient_data | Not enough historical data |
| 400 | invalid_hyperparameters | Invalid hyperparameter values |
| 404 | dataset_not_found | Dataset ID not found |
| 409 | training_in_progress | Training already in progress for location |
| 500 | vertex_error | Vertex AI error |
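
Clients usually treat these errors differently: some call for fixing the request, others for waiting or retrying. The mapping below is one possible policy, not behavior prescribed by the API, and the action labels are hypothetical.

```python
# (HTTP status, error code) -> suggested client action. The policy is an assumption.
ERROR_ACTIONS = {
    (400, "insufficient_data"): "fix_request",        # widen the date range or pick another model
    (400, "invalid_hyperparameters"): "fix_request",
    (404, "dataset_not_found"): "fix_request",
    (409, "training_in_progress"): "wait",            # another job is running for this location
    (500, "vertex_error"): "retry_with_backoff",
}


def action_for(status: int, code: str) -> str:
    """Return the suggested action, or 'raise' for anything unrecognized."""
    return ERROR_ACTIONS.get((status, code), "raise")


print(action_for(409, "training_in_progress"))
```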