Analytics Platform (ClickHouse Cloud)
Real-time analytics infrastructure powered by ClickHouse Cloud with edge aggregation for sub-second query performance across all Olympus products.
Overview
The analytics platform provides:
- Sub-second queries for real-time dashboards
- High concurrency (1000+ queries/second per node)
- Edge aggregation via Cloudflare Durable Objects
- Cost efficiency (70%+ savings vs BigQuery)
- Unified analytics across all products
┌─────────────────────────────────────────────────────────────────┐
│ ANALYTICS ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Edge Layer │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Analytics │ │ Analytics │ │ Analytics │ │ │
│ │ │ Aggregator │ │ Aggregator │ │ Aggregator │ │ │
│ │ │ (DO) │ │ (DO) │ │ (DO) │ │ │
│ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │
│ │ │ │ │ │ │
│ │ └────────────────┼────────────────┘ │ │
│ │ │ │ │
│ └──────────────────────────┼───────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ ClickHouse Cloud │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ Core Analytics │ Creator Analytics │ │ │
│ │ │ - Orders │ - Engagement │ │ │
│ │ │ - Payments │ - Revenue │ │ │
│ │ │ - Inventory │ - Audience │ │ │
│ │ │ - Labor │ - Social │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Architecture Components
Edge Aggregation Layer
Cloudflare Durable Objects provide real-time pre-aggregation:
| Component | Function | Latency |
|---|---|---|
| Analytics Aggregator DO | Pre-aggregates metrics by tenant | under 10ms |
| Dashboard Cache | Caches common dashboard queries | under 5ms |
| Real-time Counter | Live order/revenue counters | under 1ms |
Edge aggregation benefits:
- Reduces ClickHouse query volume by 80%
- Enables sub-100ms dashboard refresh
- Handles burst traffic (peak hours)
- Geographic distribution for global users
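The Dashboard Cache component can be pictured as a small TTL cache sitting in front of ClickHouse. Below is a minimal sketch in Python; the real component runs inside a Cloudflare Durable Object, and the class and method names here are illustrative, not the production API:

```python
import time

class DashboardCache:
    """Illustrative sketch of the Dashboard Cache: repeated dashboard
    queries are answered from an in-memory cache with a short TTL, so
    refreshes within the window never reach ClickHouse."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store = {}  # cache_key -> (expires_at, result)

    def get(self, key: str):
        """Return the cached result, or None if missing or expired."""
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        return None

    def put(self, key: str, result) -> None:
        """Cache a query result until the TTL elapses."""
        self._store[key] = (time.monotonic() + self.ttl, result)
```

Cache keys would typically encode tenant, query name, and parameters, so each tenant's dashboards are cached independently.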
ClickHouse Cloud
OLAP database optimized for analytics workloads:
| Feature | Specification |
|---|---|
| Query Performance | Sub-second for most queries |
| Concurrency | 1000+ concurrent queries/node |
| Storage | Columnar compression (10-20x) |
| Retention | 90 days hot, 2 years cold |
| Replication | 3x replication for durability |
Analytics REST API
Query analytics data via the REST API:
# Get sales summary
curl -s "https://dev.api.olympuscloud.ai/v1/analytics/sales/summary?start_date=2026-01-01&end_date=2026-01-31&granularity=day" \
-H "Authorization: Bearer $TOKEN" | jq .
# Get revenue analytics
curl -s "https://dev.api.olympuscloud.ai/v1/analytics/revenue?period=monthly" \
-H "Authorization: Bearer $TOKEN" | jq .
The Python analytics service uses FastAPI route modules internally to serve these endpoints. ClickHouse queries are executed server-side with automatic caching.
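The same request can be constructed programmatically. The helper below is hypothetical (not part of a shipped client library); it simply rebuilds the URL used in the curl example:

```python
from urllib.parse import urlencode

BASE_URL = "https://dev.api.olympuscloud.ai/v1"

def sales_summary_url(start_date: str, end_date: str,
                      granularity: str = "day") -> str:
    """Build the sales-summary request URL from the curl example above."""
    query = urlencode({
        "start_date": start_date,
        "end_date": end_date,
        "granularity": granularity,
    })
    return f"{BASE_URL}/analytics/sales/summary?{query}"
```

Pair the URL with an `Authorization: Bearer` header, exactly as in the curl examples.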
Database Schema
Core Analytics Tables
orders_analytics
CREATE TABLE orders_analytics (
tenant_id String,
location_id String,
order_id String,
order_date DateTime,
order_type Enum('dine_in', 'takeout', 'delivery', 'drive_thru'),
subtotal Decimal(10, 2),
tax Decimal(10, 2),
tip Decimal(10, 2),
total Decimal(10, 2),
item_count UInt16,
guest_count UInt8,
payment_method String,
employee_id String,
-- Materialized columns for fast aggregation
order_hour UInt8 MATERIALIZED toHour(order_date),
order_day Date MATERIALIZED toDate(order_date),
order_week Date MATERIALIZED toStartOfWeek(order_date)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(order_date)
ORDER BY (tenant_id, location_id, order_date, order_id)
TTL order_date + INTERVAL 90 DAY
labor_analytics
CREATE TABLE labor_analytics (
tenant_id String,
location_id String,
employee_id String,
shift_date Date,
clock_in DateTime,
clock_out Nullable(DateTime),
scheduled_hours Decimal(5, 2),
actual_hours Decimal(5, 2),
overtime_hours Decimal(5, 2),
hourly_rate Decimal(8, 2),
labor_cost Decimal(10, 2),
role String,
department String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(shift_date)
ORDER BY (tenant_id, location_id, shift_date, employee_id)
inventory_analytics
CREATE TABLE inventory_analytics (
tenant_id String,
location_id String,
item_id String,
recorded_at DateTime,
quantity_on_hand Decimal(10, 3),
quantity_reserved Decimal(10, 3),
unit_cost Decimal(10, 4),
total_value Decimal(12, 2),
reorder_point Decimal(10, 3),
days_of_stock Decimal(5, 1)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(recorded_at)
ORDER BY (tenant_id, location_id, recorded_at, item_id)
Creator Analytics Tables
engagement_analytics
CREATE TABLE engagement_analytics (
tenant_id String,
creator_id String,
persona_id String,
interaction_time DateTime,
interaction_type Enum('chat', 'voice', 'video', 'tip', 'subscription'),
duration_seconds UInt32,
message_count UInt16,
sentiment_score Float32,
revenue Decimal(10, 2),
platform String,
country_code String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(interaction_time)
ORDER BY (tenant_id, creator_id, interaction_time)
Query Patterns
Common Dashboard Queries
Sales Summary
SELECT
toStartOfDay(order_date) AS date,
count() AS order_count,
sum(total) AS total_sales,
avg(total) AS avg_ticket,
sum(tip) AS total_tips
FROM orders_analytics
WHERE tenant_id = {tenant_id:String}
AND order_date >= {start_date:DateTime}
AND order_date < {end_date:DateTime}
GROUP BY date
ORDER BY date
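What this query computes can be sketched in pure Python over in-memory rows. The `sales_summary` helper below is hypothetical, useful mainly for unit-testing dashboard logic against fixtures without a live ClickHouse instance:

```python
from collections import defaultdict
from datetime import datetime

def sales_summary(orders: list[dict]) -> list[dict]:
    """Pure-Python equivalent of the Sales Summary query: group orders
    by calendar day, then count rows and aggregate totals and tips."""
    by_day = defaultdict(list)
    for o in orders:
        by_day[o["order_date"].date()].append(o)
    summary = []
    for day in sorted(by_day):
        rows = by_day[day]
        total = sum(r["total"] for r in rows)
        summary.append({
            "date": day,
            "order_count": len(rows),
            "total_sales": total,
            "avg_ticket": total / len(rows),
            "total_tips": sum(r["tip"] for r in rows),
        })
    return summary
```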
Labor Cost Analysis
SELECT
department,
sum(labor_cost) AS total_labor_cost,
sum(actual_hours) AS total_hours,
avg(hourly_rate) AS avg_hourly_rate,
sum(overtime_hours) AS overtime_hours
FROM labor_analytics
WHERE tenant_id = {tenant_id:String}
AND shift_date >= {start_date:Date}
AND shift_date <= {end_date:Date}
GROUP BY department
Real-time Order Count (Edge Cached)
SELECT
count() AS orders_today,
sum(total) AS sales_today
FROM orders_analytics
WHERE tenant_id = {tenant_id:String}
AND order_date >= today()
Performance Optimization
| Optimization | Technique |
|---|---|
| Partitioning | Monthly partitions for time-range queries |
| Ordering | tenant_id first for multi-tenant isolation |
| Materialized views | Pre-aggregated hourly/daily summaries |
| Projections | Alternative sort orders for specific queries |
| TTL | Automatic data expiration (90 days hot) |
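As an illustration of the materialized-view row above, a daily pre-aggregation over orders_analytics might look like the following. This is a sketch only: the view name, engine choice, and column list are assumptions, not the deployed definition.

```sql
CREATE MATERIALIZED VIEW orders_daily_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(order_day)
ORDER BY (tenant_id, location_id, order_day)
AS SELECT
    tenant_id,
    location_id,
    toDate(order_date) AS order_day,
    count() AS order_count,
    sum(total) AS total_sales,
    sum(tip) AS total_tips
FROM orders_analytics
GROUP BY tenant_id, location_id, order_day
```

Dashboard queries that only need daily granularity can then read the compact view instead of scanning raw orders.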
Data Ingestion
Real-time Events
Events flow through edge aggregators:
App Events → Cloudflare Worker → Analytics Aggregator DO → ClickHouse
                                          ↓
                                 Local aggregation
                                 (1-minute windows)
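The one-minute local aggregation can be sketched as a tumbling-window accumulator. This is illustrative Python; the production aggregator runs inside a Durable Object, and the class below is not its actual interface:

```python
from collections import defaultdict

class MinuteWindowAggregator:
    """Sketch of the aggregator's one-minute tumbling windows: events
    are bucketed by (minute, tenant); closed windows are emitted as
    single compact rows for ClickHouse instead of one row per event."""

    def __init__(self):
        self.windows = defaultdict(lambda: defaultdict(float))

    def add(self, ts_epoch: float, tenant_id: str, amount: float) -> None:
        """Fold one event into its minute bucket."""
        minute = int(ts_epoch // 60)
        self.windows[minute][tenant_id] += amount

    def close_before(self, ts_epoch: float) -> list[tuple]:
        """Emit (minute, tenant_id, total) rows for windows that have ended."""
        cutoff = int(ts_epoch // 60)
        rows = []
        for minute in sorted(m for m in self.windows if m < cutoff):
            for tenant, total in self.windows.pop(minute).items():
                rows.append((minute, tenant, total))
        return rows
```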
Batch Processing
Historical data and corrections:
async def ingest_batch(records: list[dict]) -> None:
    """Batch-insert order records into ClickHouse.

    `client` is the service's ClickHouse connection pool. Rows are
    serialized as JSONEachRow, so each dict maps to one table row.
    """
    async with client.get_connection() as conn:
        await conn.execute(
            "INSERT INTO orders_analytics FORMAT JSONEachRow",
            records,
        )
Data Quality
| Check | Frequency | Action |
|---|---|---|
| Duplicate detection | Per-insert | Dedup by order_id |
| Schema validation | Per-batch | Reject invalid records |
| Latency monitoring | Continuous | Alert if > 5min lag |
| Completeness check | Hourly | Compare event counts |
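The per-insert duplicate check can be sketched as a first-writer-wins filter over the incoming batch. This is a minimal Python version; in practice deduplication may instead lean on a ReplacingMergeTree keyed on order_id:

```python
def dedupe_orders(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each order_id, drop repeats."""
    seen = set()
    unique = []
    for rec in records:
        if rec["order_id"] in seen:
            continue
        seen.add(rec["order_id"])
        unique.append(rec)
    return unique
```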
Dashboards & Visualizations
Built-in Dashboards
| Dashboard | Metrics | Refresh |
|---|---|---|
| Sales Overview | Revenue, orders, avg ticket | 1 min |
| Labor Dashboard | Hours, costs, productivity | 5 min |
| Inventory Health | Stock levels, turnover | 15 min |
| Real-time Feed | Live orders, payments | Real-time |
Custom Reports
# Generate custom report
report = await client.generate_report(
report_type="sales_by_category",
tenant_id="tenant-123",
parameters={
"start_date": "2026-01-01",
"end_date": "2026-01-31",
"group_by": ["category", "day"],
"include_comparisons": True
},
format="csv"
)
Export Formats
| Format | Use Case |
|---|---|
| CSV | Spreadsheet analysis |
| JSON | API consumption |
| Parquet | Data warehouse integration |
| PDF | Management reports |

Cost Structure
ClickHouse Cloud Pricing
| Component | Cost | Notes |
|---|---|---|
| Compute | $0.30/GB scanned | Query-based |
| Storage | $0.024/GB/month | Compressed |
| Ingestion | Included | No streaming surcharge |
| Backups | $0.012/GB/month | Automatic |
Estimated Monthly Costs
| Tier | Data Volume | Queries/day | Estimated Cost |
|---|---|---|---|
| Starter | 10 GB | 1,000 | $50-75/mo |
| Growth | 100 GB | 10,000 | $100-150/mo |
| Enterprise | 1 TB | 100,000 | $500-800/mo |
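These estimates can be reproduced from the per-GB rates in the pricing table. `estimated_monthly_cost` below is a hypothetical back-of-envelope helper; the average scan volume per day is an assumption, not a measured figure:

```python
def estimated_monthly_cost(storage_gb: float, gb_scanned_per_day: float,
                           compute_per_gb: float = 0.30,
                           storage_per_gb_month: float = 0.024) -> float:
    """Rough monthly cost: compute billed per GB scanned (30-day month)
    plus storage billed per compressed GB-month."""
    compute = gb_scanned_per_day * 30 * compute_per_gb
    storage = storage_gb * storage_per_gb_month
    return round(compute + storage, 2)
```

For example, a Starter-tier tenant storing 10 GB and scanning roughly 5 GB/day (1,000 queries at ~5 MB each after edge caching) lands around $45/month, consistent with the $50-75 estimate above.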
Cost Optimization
Edge caching alone reduces ClickHouse query volume by roughly 80%. Combine it with materialized views and TTL-based data expiration to keep analytics costs under control as data volume grows.
- Edge caching - Serves repeat dashboard queries without touching ClickHouse
- Materialized views - Pre-compute expensive aggregations
- Data retention - Auto-expire old data (TTL)
- Query optimization - Use appropriate granularity
Security & Compliance
Access Control
| Level | Mechanism |
|---|---|
| Network | Private Link, IP allowlisting |
| Authentication | Service accounts, API keys |
| Authorization | Row-level security by tenant_id |
| Encryption | TLS in-transit, AES-256 at-rest |
Audit Logging
-- Query audit log
SELECT
query_id,
user,
query,
query_duration_ms,
read_rows,
read_bytes
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 HOUR
ORDER BY event_time DESC
Data Retention
TTL policies automatically expire data from hot storage. Ensure cold storage archival is configured before relying on TTL, or data older than the retention period will be permanently lost.
| Data Type | Hot Storage | Cold Storage | Total |
|---|---|---|---|
| Orders | 90 days | 2 years | 2+ years |
| Labor | 90 days | 7 years | 7+ years |
| Inventory | 30 days | 1 year | 1+ year |
Monitoring & Alerting
Key Metrics
| Metric | Threshold | Alert |
|---|---|---|
| Query latency p99 | > 2s | Warning |
| Query latency p99 | > 5s | Critical |
| Ingestion lag | > 5 min | Warning |
| Disk usage | > 80% | Warning |
| Error rate | > 1% | Critical |
Health Checks
async def health_check():
"""Check ClickHouse cluster health."""
result = await client.execute("SELECT 1")
return {
"status": "healthy",
"latency_ms": result.elapsed_ms,
"cluster_status": await get_cluster_status()
}
Migration Guide
From BigQuery
If migrating from BigQuery:
- Export data using BigQuery export to GCS
- Transform schema to ClickHouse format
- Load data using the ClickHouse s3 table function
- Validate row counts and aggregates
- Switch traffic via feature flag
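The validation step can be sketched as a comparison of summary aggregates computed on both sides. The helper below is hypothetical; in practice the two dicts would come from count/sum queries run against BigQuery and ClickHouse respectively:

```python
def aggregates_match(bq: dict, ch: dict, tolerance: float = 1e-6) -> bool:
    """Compare per-metric aggregates (row counts, summed totals) from
    the BigQuery source and the ClickHouse load, within a tolerance."""
    if bq.keys() != ch.keys():
        return False
    return all(abs(bq[k] - ch[k]) <= tolerance for k in bq)
```

A float tolerance is needed because NUMERIC-to-Decimal conversion and float summation order can introduce tiny rounding differences.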
Schema Mapping
| BigQuery Type | ClickHouse Type |
|---|---|
| STRING | String |
| INT64 | Int64 |
| FLOAT64 | Float64 |
| NUMERIC | Decimal(38, 9) |
| TIMESTAMP | DateTime64(3) |
| DATE | Date |
| BOOL | UInt8 |
| ARRAY | Array(T) |
| STRUCT | Tuple(...) or Nested |
Related Documentation
- Architecture Overview - System architecture
- Data Sync & IoT - Real-time data flows
- API Analytics - Analytics API documentation
- Reports API - Reports API reference