Analytics Platform (ClickHouse Cloud)

Real-time analytics infrastructure powered by ClickHouse Cloud with edge aggregation for sub-second query performance across all Olympus products.

Overview

The analytics platform provides:

  • Sub-second queries for real-time dashboards
  • High concurrency (1000+ queries/second per node)
  • Edge aggregation via Cloudflare Durable Objects
  • Cost efficiency (70%+ savings vs BigQuery)
  • Unified analytics across all products

┌──────────────────────────────────────────────────────────┐
│                  ANALYTICS ARCHITECTURE                  │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────────────── Edge Layer ────────────────────┐  │
│  │  ┌────────────┐   ┌────────────┐   ┌────────────┐  │  │
│  │  │ Analytics  │   │ Analytics  │   │ Analytics  │  │  │
│  │  │ Aggregator │   │ Aggregator │   │ Aggregator │  │  │
│  │  │    (DO)    │   │    (DO)    │   │    (DO)    │  │  │
│  │  └─────┬──────┘   └─────┬──────┘   └─────┬──────┘  │  │
│  │        └────────────────┼────────────────┘         │  │
│  └─────────────────────────┼──────────────────────────┘  │
│                            ▼                             │
│  ┌───────────────── ClickHouse Cloud ─────────────────┐  │
│  │  Core Analytics        │  Creator Analytics        │  │
│  │  - Orders              │  - Engagement             │  │
│  │  - Payments            │  - Revenue                │  │
│  │  - Inventory           │  - Audience               │  │
│  │  - Labor               │  - Social                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
└──────────────────────────────────────────────────────────┘

Architecture Components

Edge Aggregation Layer

Cloudflare Durable Objects provide real-time pre-aggregation:

| Component | Function | Latency |
|---|---|---|
| Analytics Aggregator DO | Pre-aggregates metrics by tenant | under 10ms |
| Dashboard Cache | Caches common dashboard queries | under 5ms |
| Real-time Counter | Live order/revenue counters | under 1ms |

Edge aggregation benefits:

  • Reduces ClickHouse query volume by 80%
  • Enables sub-100ms dashboard refresh
  • Handles burst traffic (peak hours)
  • Geographic distribution for global users
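
The dashboard cache in the table above is essentially a short-TTL key-value layer in front of ClickHouse. A minimal stdlib sketch of the idea (the production cache runs inside a Cloudflare Durable Object, so this is illustrative only, not the deployed code):

```python
import time

class DashboardCache:
    """Tiny TTL cache illustrating the edge dashboard-cache idea."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic(), value)

cache = DashboardCache(ttl_seconds=60.0)
cache.set("tenant-123:sales_today", {"orders": 42, "sales": 1234.50})
hit = cache.get("tenant-123:sales_today")
```

Keys would typically combine tenant, dashboard, and query parameters so expiry and invalidation stay per-tenant.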

ClickHouse Cloud

OLAP database optimized for analytics workloads:

| Feature | Specification |
|---|---|
| Query Performance | Sub-second for most queries |
| Concurrency | 1000+ concurrent queries/node |
| Storage | Columnar compression (10-20x) |
| Retention | 90 days hot, 2 years cold |
| Replication | 3x replication for durability |

Analytics REST API

Query analytics data via the REST API:

# Get sales summary
curl -s "https://dev.api.olympuscloud.ai/v1/analytics/sales/summary?start_date=2026-01-01&end_date=2026-01-31&granularity=day" \
  -H "Authorization: Bearer $TOKEN" | jq .

# Get revenue analytics
curl -s "https://dev.api.olympuscloud.ai/v1/analytics/revenue?period=monthly" \
  -H "Authorization: Bearer $TOKEN" | jq .

The Python analytics service uses FastAPI route modules internally to serve these endpoints. ClickHouse queries are executed server-side with automatic caching.
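The curl calls above can be reproduced from any HTTP client. A small stdlib sketch that builds the authenticated request (the endpoint path and parameter names come from the examples above; the helper itself is hypothetical, not part of an official SDK):

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://dev.api.olympuscloud.ai/v1"

def sales_summary_request(token: str, start_date: str, end_date: str,
                          granularity: str = "day") -> Request:
    """Build an authenticated GET request for the sales summary endpoint."""
    params = urlencode({
        "start_date": start_date,
        "end_date": end_date,
        "granularity": granularity,
    })
    return Request(
        f"{BASE_URL}/analytics/sales/summary?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )

req = sales_summary_request("TOKEN", "2026-01-01", "2026-01-31")
```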


Database Schema

Core Analytics Tables

orders_analytics

CREATE TABLE orders_analytics (
    tenant_id String,
    location_id String,
    order_id String,
    order_date DateTime,
    order_type Enum('dine_in', 'takeout', 'delivery', 'drive_thru'),
    subtotal Decimal(10, 2),
    tax Decimal(10, 2),
    tip Decimal(10, 2),
    total Decimal(10, 2),
    item_count UInt16,
    guest_count UInt8,
    payment_method String,
    employee_id String,
    -- Materialized columns for fast aggregation
    order_hour UInt8 MATERIALIZED toHour(order_date),
    order_day Date MATERIALIZED toDate(order_date),
    order_week Date MATERIALIZED toStartOfWeek(order_date)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(order_date)
ORDER BY (tenant_id, location_id, order_date, order_id)
TTL order_date + INTERVAL 90 DAY
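
Rows destined for this table serialize naturally as JSONEachRow, ClickHouse's newline-delimited JSON input format. A sketch with illustrative values; note the MATERIALIZED columns (order_hour, order_day, order_week) are computed by ClickHouse and must not be supplied by the client:

```python
import json

# One illustrative row shaped like the orders_analytics schema above.
rows = [
    {
        "tenant_id": "tenant-123",
        "location_id": "loc-1",
        "order_id": "ord-1001",
        "order_date": "2026-01-15 12:30:00",
        "order_type": "dine_in",
        "subtotal": "24.00",
        "tax": "1.92",
        "tip": "4.00",
        "total": "29.92",
        "item_count": 3,
        "guest_count": 2,
        "payment_method": "card",
        "employee_id": "emp-7",
    },
]

# JSONEachRow is one JSON object per line.
payload = "\n".join(json.dumps(row) for row in rows)
```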

labor_analytics

CREATE TABLE labor_analytics (
    tenant_id String,
    location_id String,
    employee_id String,
    shift_date Date,
    clock_in DateTime,
    clock_out Nullable(DateTime),
    scheduled_hours Decimal(5, 2),
    actual_hours Decimal(5, 2),
    overtime_hours Decimal(5, 2),
    hourly_rate Decimal(8, 2),
    labor_cost Decimal(10, 2),
    role String,
    department String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(shift_date)
ORDER BY (tenant_id, location_id, shift_date, employee_id)

inventory_analytics

CREATE TABLE inventory_analytics (
    tenant_id String,
    location_id String,
    item_id String,
    recorded_at DateTime,
    quantity_on_hand Decimal(10, 3),
    quantity_reserved Decimal(10, 3),
    unit_cost Decimal(10, 4),
    total_value Decimal(12, 2),
    reorder_point Decimal(10, 3),
    days_of_stock Decimal(5, 1)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(recorded_at)
ORDER BY (tenant_id, location_id, recorded_at, item_id)

Creator Analytics Tables

engagement_analytics

CREATE TABLE engagement_analytics (
    tenant_id String,
    creator_id String,
    persona_id String,
    interaction_time DateTime,
    interaction_type Enum('chat', 'voice', 'video', 'tip', 'subscription'),
    duration_seconds UInt32,
    message_count UInt16,
    sentiment_score Float32,
    revenue Decimal(10, 2),
    platform String,
    country_code String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(interaction_time)
ORDER BY (tenant_id, creator_id, interaction_time)

Query Patterns

Common Dashboard Queries

Sales Summary

SELECT
    toStartOfDay(order_date) AS date,
    count() AS order_count,
    sum(total) AS total_sales,
    avg(total) AS avg_ticket,
    sum(tip) AS total_tips
FROM orders_analytics
WHERE tenant_id = {tenant_id:String}
  AND order_date >= {start_date:DateTime}
  AND order_date < {end_date:DateTime}
GROUP BY date
ORDER BY date
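
The {name:Type} placeholders are ClickHouse query parameters. Over the HTTP interface they bind as param_<name> URL arguments, which keeps tenant IDs and dates out of the SQL string itself. A sketch of building such a request URL (host and port are placeholders):

```python
from urllib.parse import urlencode

QUERY = """
SELECT toStartOfDay(order_date) AS date,
       count() AS order_count,
       sum(total) AS total_sales
FROM orders_analytics
WHERE tenant_id = {tenant_id:String}
  AND order_date >= {start_date:DateTime}
  AND order_date < {end_date:DateTime}
GROUP BY date ORDER BY date
"""

def build_query_url(host: str, tenant_id: str, start: str, end: str) -> str:
    """Bind query parameters via ClickHouse's param_<name> HTTP convention."""
    args = urlencode({
        "query": QUERY,
        "param_tenant_id": tenant_id,
        "param_start_date": start,
        "param_end_date": end,
    })
    return f"https://{host}:8443/?{args}"

url = build_query_url("example.clickhouse.cloud", "tenant-123",
                      "2026-01-01 00:00:00", "2026-02-01 00:00:00")
```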

Labor Cost Analysis

SELECT
    department,
    sum(labor_cost) AS total_labor_cost,
    sum(actual_hours) AS total_hours,
    avg(hourly_rate) AS avg_hourly_rate,
    sum(overtime_hours) AS overtime_hours
FROM labor_analytics
WHERE tenant_id = {tenant_id:String}
  AND shift_date >= {start_date:Date}
  AND shift_date <= {end_date:Date}
GROUP BY department

Real-time Order Count (Edge Cached)

SELECT
    count() AS orders_today,
    sum(total) AS sales_today
FROM orders_analytics
WHERE tenant_id = {tenant_id:String}
  AND order_date >= today()

Performance Optimization

| Optimization | Technique |
|---|---|
| Partitioning | Monthly partitions for time-range queries |
| Ordering | tenant_id first for multi-tenant isolation |
| Materialized views | Pre-aggregated hourly/daily summaries |
| Projections | Alternative sort orders for specific queries |
| TTL | Automatic data expiration (90 days hot) |

Data Ingestion

Real-time Events

Events flow through edge aggregators:

App Events → Cloudflare Worker → Analytics Aggregator DO → ClickHouse
                                 (local aggregation, 1-minute windows)
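
The aggregator's 1-minute windows amount to a tumbling-window roll-up per tenant. The real aggregator is a Cloudflare Durable Object; this stdlib sketch only illustrates the windowing logic:

```python
from collections import defaultdict

def aggregate_events(events: list[dict]) -> dict:
    """Roll raw order events up into 1-minute windows per tenant before
    flushing to ClickHouse (illustrative sketch, not the Worker code)."""
    windows: dict = defaultdict(lambda: {"orders": 0, "sales": 0.0})
    for event in events:
        window_start = event["ts"] - event["ts"] % 60  # floor to the minute
        key = (event["tenant_id"], window_start)
        windows[key]["orders"] += 1
        windows[key]["sales"] += event["total"]
    return dict(windows)

events = [
    {"tenant_id": "t1", "ts": 1_700_000_005, "total": 10.0},
    {"tenant_id": "t1", "ts": 1_700_000_030, "total": 5.0},
    {"tenant_id": "t1", "ts": 1_700_000_065, "total": 7.5},
]
agg = aggregate_events(events)
```

The first two events land in one window and the third in the next, so only two pre-aggregated rows reach ClickHouse instead of three raw events.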

Batch Processing

Historical data and corrections:

async def ingest_batch(records: list[dict]):
    """Batch-insert records into ClickHouse."""
    # `client` is the service's ClickHouse client instance (assumed in scope).
    async with client.get_connection() as conn:
        await conn.execute(
            "INSERT INTO orders_analytics FORMAT JSONEachRow",
            records,
        )

Data Quality

| Check | Frequency | Action |
|---|---|---|
| Duplicate detection | Per-insert | Dedup by order_id |
| Schema validation | Per-batch | Reject invalid records |
| Latency monitoring | Continuous | Alert if > 5 min lag |
| Completeness check | Hourly | Compare event counts |
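
The per-insert duplicate check can be as simple as dropping repeated order_ids before the batch insert (a sketch; ClickHouse-side alternatives such as a ReplacingMergeTree engine also exist but are not described on this page):

```python
def dedup_by_order_id(records: list[dict]) -> list[dict]:
    """Drop repeated order_ids, keeping the first occurrence of each."""
    seen: set[str] = set()
    out = []
    for rec in records:
        if rec["order_id"] in seen:
            continue
        seen.add(rec["order_id"])
        out.append(rec)
    return out

batch = [{"order_id": "a", "total": 10}, {"order_id": "a", "total": 10},
         {"order_id": "b", "total": 5}]
clean = dedup_by_order_id(batch)
```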

Dashboards & Visualizations

Built-in Dashboards

| Dashboard | Metrics | Refresh |
|---|---|---|
| Sales Overview | Revenue, orders, avg ticket | 1 min |
| Labor Dashboard | Hours, costs, productivity | 5 min |
| Inventory Health | Stock levels, turnover | 15 min |
| Real-time Feed | Live orders, payments | Real-time |

Custom Reports

# Generate custom report
report = await client.generate_report(
    report_type="sales_by_category",
    tenant_id="tenant-123",
    parameters={
        "start_date": "2026-01-01",
        "end_date": "2026-01-31",
        "group_by": ["category", "day"],
        "include_comparisons": True,
    },
    format="csv",
)

Export Formats

| Format | Use Case |
|---|---|
| CSV | Spreadsheet analysis |
| JSON | API consumption |
| Parquet | Data warehouse integration |
| PDF | Management reports |

Cost Structure

ClickHouse Cloud Pricing

| Component | Cost | Notes |
|---|---|---|
| Compute | $0.30/GB scanned | Query-based |
| Storage | $0.024/GB/month | Compressed |
| Ingestion | Included | No streaming surcharge |
| Backups | $0.012/GB/month | Automatic |
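
The line items above combine into a simple back-of-the-envelope estimate (illustrative arithmetic from this table only; actual bills depend on tier and usage patterns):

```python
def monthly_cost(scanned_gb: float, storage_gb: float, backup_gb: float) -> float:
    """Estimate monthly cost from the pricing table: compute $0.30/GB scanned,
    storage $0.024/GB/mo, backups $0.012/GB/mo; ingestion is included."""
    return round(scanned_gb * 0.30 + storage_gb * 0.024 + backup_gb * 0.012, 2)

est = monthly_cost(scanned_gb=200, storage_gb=100, backup_gb=100)
```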

Estimated Monthly Costs

| Tier | Data Volume | Queries/day | Estimated Cost |
|---|---|---|---|
| Starter | 10 GB | 1,000 | $50-75/mo |
| Growth | 100 GB | 10,000 | $100-150/mo |
| Enterprise | 1 TB | 100,000 | $500-800/mo |

Cost Optimization

Cost Savings

Edge caching alone reduces ClickHouse query volume by 80%. Combine it with materialized views and TTL-based data expiration to keep analytics costs under control as data volume grows.

  1. Edge caching - Reduces query volume 80%
  2. Materialized views - Pre-compute expensive aggregations
  3. Data retention - Auto-expire old data (TTL)
  4. Query optimization - Use appropriate granularity

Security & Compliance

Access Control

| Level | Mechanism |
|---|---|
| Network | Private Link, IP allowlisting |
| Authentication | Service accounts, API keys |
| Authorization | Row-level security by tenant_id |
| Encryption | TLS in-transit, AES-256 at-rest |
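
Row-level security by tenant_id only works if every query actually binds the tenant. A hypothetical guard that refuses unscoped SQL and returns the bound parameter (param_tenant_id follows ClickHouse's HTTP query-parameter convention):

```python
def scoped_query(base_query: str, tenant_id: str) -> tuple[str, dict]:
    """Reject SQL that lacks a tenant filter; bind the tenant as a query
    parameter instead of interpolating it into the string."""
    if "{tenant_id:String}" not in base_query:
        raise ValueError("query must filter on {tenant_id:String}")
    return base_query, {"param_tenant_id": tenant_id}

query = "SELECT count() FROM orders_analytics WHERE tenant_id = {tenant_id:String}"
q, params = scoped_query(query, "tenant-123")
```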

Audit Logging

-- Query audit log
SELECT
    query_id,
    user,
    query,
    query_duration_ms,
    read_rows,
    read_bytes
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 HOUR
ORDER BY event_time DESC

Data Retention

warning

TTL policies automatically expire data from hot storage. Ensure cold storage archival is configured before relying on TTL, or data older than the retention period will be permanently lost.

| Data Type | Hot Storage | Cold Storage | Total |
|---|---|---|---|
| Orders | 90 days | 2 years | 2+ years |
| Labor | 90 days | 7 years | 7+ years |
| Inventory | 30 days | 1 year | 1+ year |

Monitoring & Alerting

Key Metrics

| Metric | Threshold | Alert |
|---|---|---|
| Query latency p99 | > 2s | Warning |
| Query latency p99 | > 5s | Critical |
| Ingestion lag | > 5 min | Warning |
| Disk usage | > 80% | Warning |
| Error rate | > 1% | Critical |

Health Checks

async def health_check():
    """Check ClickHouse cluster health."""
    result = await client.execute("SELECT 1")
    return {
        "status": "healthy",
        "latency_ms": result.elapsed_ms,
        "cluster_status": await get_cluster_status(),
    }

Migration Guide

From BigQuery

If migrating from BigQuery:

  1. Export data using BigQuery export to GCS
  2. Transform schema to ClickHouse format
  3. Load data using ClickHouse s3 table function
  4. Validate row counts and aggregates
  5. Switch traffic via feature flag

Schema Mapping

| BigQuery Type | ClickHouse Type |
|---|---|
| STRING | String |
| INT64 | Int64 |
| FLOAT64 | Float64 |
| NUMERIC | Decimal(38, 9) |
| TIMESTAMP | DateTime64(3) |
| DATE | Date |
| BOOL | UInt8 |
| ARRAY | Array(T) |
| STRUCT | Tuple(...) or Nested |
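
The mapping table can be sketched as a lookup for generating ClickHouse DDL from a BigQuery schema dump (hypothetical helper; parameterized types like STRUCT need recursive handling that is only hinted at here):

```python
# Scalar mappings from the table above.
BQ_TO_CLICKHOUSE = {
    "STRING": "String",
    "INT64": "Int64",
    "FLOAT64": "Float64",
    "NUMERIC": "Decimal(38, 9)",
    "TIMESTAMP": "DateTime64(3)",
    "DATE": "Date",
    "BOOL": "UInt8",
}

def map_type(bq_type: str) -> str:
    """Map a BigQuery type name to its ClickHouse equivalent.
    ARRAY<T> is resolved recursively; other parameterized types
    (e.g. STRUCT) are not handled in this sketch."""
    if bq_type.startswith("ARRAY<") and bq_type.endswith(">"):
        return f"Array({map_type(bq_type[6:-1])})"
    return BQ_TO_CLICKHOUSE[bq_type]

col_type = map_type("ARRAY<INT64>")
```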