Analytics Platform (ClickHouse Cloud)

Real-time analytics infrastructure powered by ClickHouse Cloud with edge aggregation for sub-second query performance across all Olympus products.

Overview

The analytics platform provides:

  • Sub-second queries for real-time dashboards
  • High concurrency (1000+ queries/second per node)
  • Edge aggregation via Cloudflare Durable Objects
  • Cost efficiency (70%+ savings vs BigQuery)
  • Unified analytics across all products

┌──────────────────────────────────────────────────────────┐
│                  ANALYTICS ARCHITECTURE                  │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────────────── Edge Layer ────────────────────┐  │
│  │  ┌────────────┐   ┌────────────┐   ┌────────────┐  │  │
│  │  │ Analytics  │   │ Analytics  │   │ Analytics  │  │  │
│  │  │ Aggregator │   │ Aggregator │   │ Aggregator │  │  │
│  │  │    (DO)    │   │    (DO)    │   │    (DO)    │  │  │
│  │  └─────┬──────┘   └─────┬──────┘   └─────┬──────┘  │  │
│  │        └────────────────┼────────────────┘         │  │
│  └─────────────────────────┼──────────────────────────┘  │
│                            ▼                             │
│  ┌───────────────── ClickHouse Cloud ─────────────────┐  │
│  │  Core Analytics        │  Creator Analytics        │  │
│  │  - Orders              │  - Engagement             │  │
│  │  - Payments            │  - Revenue                │  │
│  │  - Inventory           │  - Audience               │  │
│  │  - Labor               │  - Social                 │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
└──────────────────────────────────────────────────────────┘

Architecture Components

Edge Aggregation Layer

Cloudflare Durable Objects provide real-time pre-aggregation:

| Component | Function | Latency |
|---|---|---|
| Analytics Aggregator DO | Pre-aggregates metrics by tenant | under 10ms |
| Dashboard Cache | Caches common dashboard queries | under 5ms |
| Real-time Counter | Live order/revenue counters | under 1ms |

Edge aggregation benefits:

  • Reduces ClickHouse query volume by 80%
  • Enables sub-100ms dashboard refresh
  • Handles burst traffic (peak hours)
  • Geographic distribution for global users
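
The dashboard cache in the table above is essentially a short-TTL key-value layer in front of ClickHouse. A minimal stdlib sketch of the idea (the production cache runs inside a Cloudflare Durable Object, so this is illustrative only, not the deployed code):

```python
import time

class DashboardCache:
    """Tiny TTL cache illustrating the edge dashboard-cache idea."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic(), value)

cache = DashboardCache(ttl_seconds=60.0)
cache.set("tenant-123:sales_today", {"orders": 42, "sales": 1234.50})
hit = cache.get("tenant-123:sales_today")
```

Keys would typically combine tenant, dashboard, and query parameters so expiry and invalidation stay per-tenant.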

ClickHouse Cloud

OLAP database optimized for analytics workloads:

| Feature | Specification |
|---|---|
| Query Performance | Sub-second for most queries |
| Concurrency | 1000+ concurrent queries/node |
| Storage | Columnar compression (10-20x) |
| Retention | 90 days hot, 2 years cold |
| Replication | 3x replication for durability |

Analytics REST API

Query analytics data via the REST API:

# Get sales summary
curl -s "https://dev.api.olympuscloud.ai/v1/analytics/sales/summary?start_date=2026-01-01&end_date=2026-01-31&granularity=day" \
  -H "Authorization: Bearer $TOKEN" | jq .

# Get revenue analytics
curl -s "https://dev.api.olympuscloud.ai/v1/analytics/revenue?period=monthly" \
  -H "Authorization: Bearer $TOKEN" | jq .

The Python analytics service uses FastAPI route modules internally to serve these endpoints. ClickHouse queries are executed server-side with automatic caching.
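The curl calls above can be reproduced from any HTTP client. A small stdlib sketch that builds the authenticated request (the endpoint path and parameter names come from the examples above; the helper itself is hypothetical, not part of an official SDK):

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://dev.api.olympuscloud.ai/v1"

def sales_summary_request(token: str, start_date: str, end_date: str,
                          granularity: str = "day") -> Request:
    """Build an authenticated GET request for the sales summary endpoint."""
    params = urlencode({
        "start_date": start_date,
        "end_date": end_date,
        "granularity": granularity,
    })
    return Request(
        f"{BASE_URL}/analytics/sales/summary?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )

req = sales_summary_request("TOKEN", "2026-01-01", "2026-01-31")
```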


Database Schema

Core Analytics Tables

orders_analytics

CREATE TABLE orders_analytics (
    tenant_id String,
    location_id String,
    order_id String,
    order_date DateTime,
    order_type Enum('dine_in', 'takeout', 'delivery', 'drive_thru'),
    subtotal Decimal(10, 2),
    tax Decimal(10, 2),
    tip Decimal(10, 2),
    total Decimal(10, 2),
    item_count UInt16,
    guest_count UInt8,
    payment_method String,
    employee_id String,
    -- Materialized columns for fast aggregation
    order_hour UInt8 MATERIALIZED toHour(order_date),
    order_day Date MATERIALIZED toDate(order_date),
    order_week Date MATERIALIZED toStartOfWeek(order_date)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(order_date)
ORDER BY (tenant_id, location_id, order_date, order_id)
TTL order_date + INTERVAL 90 DAY
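
Rows destined for this table serialize naturally as JSONEachRow, ClickHouse's newline-delimited JSON input format. A sketch with illustrative values; note the MATERIALIZED columns (order_hour, order_day, order_week) are computed by ClickHouse and must not be supplied by the client:

```python
import json

# One illustrative row shaped like the orders_analytics schema above.
rows = [
    {
        "tenant_id": "tenant-123",
        "location_id": "loc-1",
        "order_id": "ord-1001",
        "order_date": "2026-01-15 12:30:00",
        "order_type": "dine_in",
        "subtotal": "24.00",
        "tax": "1.92",
        "tip": "4.00",
        "total": "29.92",
        "item_count": 3,
        "guest_count": 2,
        "payment_method": "card",
        "employee_id": "emp-7",
    },
]

# JSONEachRow is one JSON object per line.
payload = "\n".join(json.dumps(row) for row in rows)
```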

labor_analytics

CREATE TABLE labor_analytics (
    tenant_id String,
    location_id String,
    employee_id String,
    shift_date Date,
    clock_in DateTime,
    clock_out Nullable(DateTime),
    scheduled_hours Decimal(5, 2),
    actual_hours Decimal(5, 2),
    overtime_hours Decimal(5, 2),
    hourly_rate Decimal(8, 2),
    labor_cost Decimal(10, 2),
    role String,
    department String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(shift_date)
ORDER BY (tenant_id, location_id, shift_date, employee_id)

inventory_analytics

CREATE TABLE inventory_analytics (
    tenant_id String,
    location_id String,
    item_id String,
    recorded_at DateTime,
    quantity_on_hand Decimal(10, 3),
    quantity_reserved Decimal(10, 3),
    unit_cost Decimal(10, 4),
    total_value Decimal(12, 2),
    reorder_point Decimal(10, 3),
    days_of_stock Decimal(5, 1)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(recorded_at)
ORDER BY (tenant_id, location_id, recorded_at, item_id)

Creator Analytics Tables

engagement_analytics

CREATE TABLE engagement_analytics (
    tenant_id String,
    creator_id String,
    persona_id String,
    interaction_time DateTime,
    interaction_type Enum('chat', 'voice', 'video', 'tip', 'subscription'),
    duration_seconds UInt32,
    message_count UInt16,
    sentiment_score Float32,
    revenue Decimal(10, 2),
    platform String,
    country_code String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(interaction_time)
ORDER BY (tenant_id, creator_id, interaction_time)

Query Patterns

Common Dashboard Queries

Sales Summary

SELECT
    toStartOfDay(order_date) AS date,
    count() AS order_count,
    sum(total) AS total_sales,
    avg(total) AS avg_ticket,
    sum(tip) AS total_tips
FROM orders_analytics
WHERE tenant_id = {tenant_id:String}
  AND order_date >= {start_date:DateTime}
  AND order_date < {end_date:DateTime}
GROUP BY date
ORDER BY date
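
The {name:Type} placeholders are ClickHouse query parameters. Over the HTTP interface they bind as param_<name> URL arguments, which keeps tenant IDs and dates out of the SQL string itself. A sketch of building such a request URL (host and port are placeholders):

```python
from urllib.parse import urlencode

QUERY = """
SELECT toStartOfDay(order_date) AS date,
       count() AS order_count,
       sum(total) AS total_sales
FROM orders_analytics
WHERE tenant_id = {tenant_id:String}
  AND order_date >= {start_date:DateTime}
  AND order_date < {end_date:DateTime}
GROUP BY date ORDER BY date
"""

def build_query_url(host: str, tenant_id: str, start: str, end: str) -> str:
    """Bind query parameters via ClickHouse's param_<name> HTTP convention."""
    args = urlencode({
        "query": QUERY,
        "param_tenant_id": tenant_id,
        "param_start_date": start,
        "param_end_date": end,
    })
    return f"https://{host}:8443/?{args}"

url = build_query_url("example.clickhouse.cloud", "tenant-123",
                      "2026-01-01 00:00:00", "2026-02-01 00:00:00")
```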

Labor Cost Analysis

SELECT
    department,
    sum(labor_cost) AS total_labor_cost,
    sum(actual_hours) AS total_hours,
    avg(hourly_rate) AS avg_hourly_rate,
    sum(overtime_hours) AS overtime_hours
FROM labor_analytics
WHERE tenant_id = {tenant_id:String}
  AND shift_date >= {start_date:Date}
  AND shift_date <= {end_date:Date}
GROUP BY department

Real-time Order Count (Edge Cached)

SELECT
    count() AS orders_today,
    sum(total) AS sales_today
FROM orders_analytics
WHERE tenant_id = {tenant_id:String}
  AND order_date >= today()

Performance Optimization

| Optimization | Technique |
|---|---|
| Partitioning | Monthly partitions for time-range queries |
| Ordering | tenant_id first for multi-tenant isolation |
| Materialized views | Pre-aggregated hourly/daily summaries |
| Projections | Alternative sort orders for specific queries |
| TTL | Automatic data expiration (90 days hot) |

Data Ingestion

Real-time Events

Events flow through edge aggregators:

App Events → Cloudflare Worker → Analytics Aggregator DO → ClickHouse
                                 (local aggregation, 1-minute windows)
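
The aggregator's 1-minute windows amount to a tumbling-window roll-up per tenant. The real aggregator is a Cloudflare Durable Object; this stdlib sketch only illustrates the windowing logic:

```python
from collections import defaultdict

def aggregate_events(events: list[dict]) -> dict:
    """Roll raw order events up into 1-minute windows per tenant before
    flushing to ClickHouse (illustrative sketch, not the Worker code)."""
    windows: dict = defaultdict(lambda: {"orders": 0, "sales": 0.0})
    for event in events:
        window_start = event["ts"] - event["ts"] % 60  # floor to the minute
        key = (event["tenant_id"], window_start)
        windows[key]["orders"] += 1
        windows[key]["sales"] += event["total"]
    return dict(windows)

events = [
    {"tenant_id": "t1", "ts": 1_700_000_005, "total": 10.0},
    {"tenant_id": "t1", "ts": 1_700_000_030, "total": 5.0},
    {"tenant_id": "t1", "ts": 1_700_000_065, "total": 7.5},
]
agg = aggregate_events(events)
```

The first two events land in one window and the third in the next, so only two pre-aggregated rows reach ClickHouse instead of three raw events.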

Batch Processing

Historical data and corrections:

async def ingest_batch(records: list[dict]):
    """Batch-insert records into ClickHouse."""
    # `client` is the service's ClickHouse client instance (assumed in scope).
    async with client.get_connection() as conn:
        await conn.execute(
            "INSERT INTO orders_analytics FORMAT JSONEachRow",
            records,
        )

Data Quality

| Check | Frequency | Action |
|---|---|---|
| Duplicate detection | Per-insert | Dedup by order_id |
| Schema validation | Per-batch | Reject invalid records |
| Latency monitoring | Continuous | Alert if > 5 min lag |
| Completeness check | Hourly | Compare event counts |
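
The per-insert duplicate check can be as simple as dropping repeated order_ids before the batch insert (a sketch; ClickHouse-side alternatives such as a ReplacingMergeTree engine also exist but are not described on this page):

```python
def dedup_by_order_id(records: list[dict]) -> list[dict]:
    """Drop repeated order_ids, keeping the first occurrence of each."""
    seen: set[str] = set()
    out = []
    for rec in records:
        if rec["order_id"] in seen:
            continue
        seen.add(rec["order_id"])
        out.append(rec)
    return out

batch = [{"order_id": "a", "total": 10}, {"order_id": "a", "total": 10},
         {"order_id": "b", "total": 5}]
clean = dedup_by_order_id(batch)
```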

Dashboards & Visualizations

Built-in Dashboards

| Dashboard | Metrics | Refresh |
|---|---|---|
| Sales Overview | Revenue, orders, avg ticket | 1 min |
| Labor Dashboard | Hours, costs, productivity | 5 min |
| Inventory Health | Stock levels, turnover | 15 min |
| Real-time Feed | Live orders, payments | Real-time |

Custom Reports

# Generate custom report
report = await client.generate_report(
    report_type="sales_by_category",
    tenant_id="tenant-123",
    parameters={
        "start_date": "2026-01-01",
        "end_date": "2026-01-31",
        "group_by": ["category", "day"],
        "include_comparisons": True,
    },
    format="csv",
)

Export Formats

| Format | Use Case |
|---|---|
| CSV | Spreadsheet analysis |
| JSON | API consumption |
| Parquet | Data warehouse integration |
| PDF | Management reports |

Cost Structure

ClickHouse Cloud Pricing

| Component | Cost | Notes |
|---|---|---|
| Compute | $0.30/GB scanned | Query-based |
| Storage | $0.024/GB/month | Compressed |
| Ingestion | Included | No streaming surcharge |
| Backups | $0.012/GB/month | Automatic |
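
The line items above combine into a simple back-of-the-envelope estimate (illustrative arithmetic from this table only; actual bills depend on tier and usage patterns):

```python
def monthly_cost(scanned_gb: float, storage_gb: float, backup_gb: float) -> float:
    """Estimate monthly cost from the pricing table: compute $0.30/GB scanned,
    storage $0.024/GB/mo, backups $0.012/GB/mo; ingestion is included."""
    return round(scanned_gb * 0.30 + storage_gb * 0.024 + backup_gb * 0.012, 2)

est = monthly_cost(scanned_gb=200, storage_gb=100, backup_gb=100)
```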

Estimated Monthly Costs

| Tier | Data Volume | Queries/day | Estimated Cost |
|---|---|---|---|
| Starter | 10 GB | 1,000 | $50-75/mo |
| Growth | 100 GB | 10,000 | $100-150/mo |
| Enterprise | 1 TB | 100,000 | $500-800/mo |

Cost Optimization

Cost Savings

Edge caching alone reduces ClickHouse query volume by 80%. Combine it with materialized views and TTL-based data expiration to keep analytics costs under control as data volume grows.

  1. Edge caching - Reduces query volume 80%
  2. Materialized views - Pre-compute expensive aggregations
  3. Data retention - Auto-expire old data (TTL)
  4. Query optimization - Use appropriate granularity

Security & Compliance

Access Control

| Level | Mechanism |
|---|---|
| Network | Private Link, IP allowlisting |
| Authentication | Service accounts, API keys |
| Authorization | Row-level security by tenant_id |
| Encryption | TLS in-transit, AES-256 at-rest |
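
Row-level security by tenant_id only works if every query actually binds the tenant. A hypothetical guard that refuses unscoped SQL and returns the bound parameter (param_tenant_id follows ClickHouse's HTTP query-parameter convention):

```python
def scoped_query(base_query: str, tenant_id: str) -> tuple[str, dict]:
    """Reject SQL that lacks a tenant filter; bind the tenant as a query
    parameter instead of interpolating it into the string."""
    if "{tenant_id:String}" not in base_query:
        raise ValueError("query must filter on {tenant_id:String}")
    return base_query, {"param_tenant_id": tenant_id}

query = "SELECT count() FROM orders_analytics WHERE tenant_id = {tenant_id:String}"
q, params = scoped_query(query, "tenant-123")
```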

Audit Logging

-- Query audit log
SELECT
    query_id,
    user,
    query,
    query_duration_ms,
    read_rows,
    read_bytes
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 HOUR
ORDER BY event_time DESC

Data Retention

warning

TTL policies automatically expire data from hot storage. Ensure cold storage archival is configured before relying on TTL, or data older than the retention period will be permanently lost.

| Data Type | Hot Storage | Cold Storage | Total |
|---|---|---|---|
| Orders | 90 days | 2 years | 2+ years |
| Labor | 90 days | 7 years | 7+ years |
| Inventory | 30 days | 1 year | 1+ year |

Monitoring & Alerting

Key Metrics

| Metric | Threshold | Alert |
|---|---|---|
| Query latency p99 | > 2s | Warning |
| Query latency p99 | > 5s | Critical |
| Ingestion lag | > 5 min | Warning |
| Disk usage | > 80% | Warning |
| Error rate | > 1% | Critical |

Health Checks

async def health_check():
    """Check ClickHouse cluster health."""
    result = await client.execute("SELECT 1")
    return {
        "status": "healthy",
        "latency_ms": result.elapsed_ms,
        "cluster_status": await get_cluster_status(),
    }

Migration Guide

From BigQuery

If migrating from BigQuery:

  1. Export data using BigQuery export to GCS
  2. Transform schema to ClickHouse format
  3. Load data using ClickHouse s3 table function
  4. Validate row counts and aggregates
  5. Switch traffic via feature flag

Schema Mapping

| BigQuery Type | ClickHouse Type |
|---|---|
| STRING | String |
| INT64 | Int64 |
| FLOAT64 | Float64 |
| NUMERIC | Decimal(38, 9) |
| TIMESTAMP | DateTime64(3) |
| DATE | Date |
| BOOL | UInt8 |
| ARRAY | Array(T) |
| STRUCT | Tuple(...) or Nested |
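
The mapping table can be sketched as a lookup for generating ClickHouse DDL from a BigQuery schema dump (hypothetical helper; parameterized types like STRUCT need recursive handling that is only hinted at here):

```python
# Scalar mappings from the table above.
BQ_TO_CLICKHOUSE = {
    "STRING": "String",
    "INT64": "Int64",
    "FLOAT64": "Float64",
    "NUMERIC": "Decimal(38, 9)",
    "TIMESTAMP": "DateTime64(3)",
    "DATE": "Date",
    "BOOL": "UInt8",
}

def map_type(bq_type: str) -> str:
    """Map a BigQuery type name to its ClickHouse equivalent.
    ARRAY<T> is resolved recursively; other parameterized types
    (e.g. STRUCT) are not handled in this sketch."""
    if bq_type.startswith("ARRAY<") and bq_type.endswith(">"):
        return f"Array({map_type(bq_type[6:-1])})"
    return BQ_TO_CLICKHOUSE[bq_type]

col_type = map_type("ARRAY<INT64>")
```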