GoTech Demo
02

Distributed Monitoring

Live RED metrics across 5 microservices. Anomaly injected at ~8s to demonstrate alert escalation.

Scale: 100K
DAU: 60K
Peak QPS: 30K
WS Conns: 10K
Data/Year: 10 TB
Go Instances: 6
Service Health (RED Metrics)
Service | Rate | Error % | p50 | p95 | p99 | CPU | Memory
Distributed Trace (POST /api/pages)
Error Budget (99.9% SLO)
Monthly allowance: 43.2 minutes of downtime. When the budget runs out, freeze deployments and focus on reliability.
Active Alerts
No active alerts. System healthy.

Observability Stack

RED Method

Rate (requests/s), Errors (%), Duration (latency). The three signals that tell you if a service is healthy. Every API endpoint exposes these via Prometheus.
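
As a rough illustration of how the three RED signals can be derived from raw request data, here is a minimal Go sketch using only the standard library. It does not use the Prometheus client; the names Observation, RED, and Summarize are hypothetical, and the nearest-rank percentile is one simple choice among several.

```go
package main

import (
	"sort"
	"time"
)

// Observation is one completed request (hypothetical type for this sketch).
type Observation struct {
	Duration time.Duration
	Failed   bool
}

// RED holds the three signals computed over one measurement window.
type RED struct {
	Rate          float64 // requests per second
	ErrorPct      float64 // percentage of requests that failed
	P50, P95, P99 time.Duration
}

// percentile returns the nearest-rank percentile (p in 1..100) of a
// slice that is already sorted in ascending order.
func percentile(sorted []time.Duration, p int) time.Duration {
	if len(sorted) == 0 {
		return 0
	}
	rank := (p*len(sorted) + 99) / 100 // integer ceil(p/100 * n)
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// Summarize computes the RED signals for the observations collected
// during window. It returns a zero RED for empty input.
func Summarize(obs []Observation, window time.Duration) RED {
	if len(obs) == 0 || window <= 0 {
		return RED{}
	}
	durations := make([]time.Duration, 0, len(obs))
	failed := 0
	for _, o := range obs {
		durations = append(durations, o.Duration)
		if o.Failed {
			failed++
		}
	}
	sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
	return RED{
		Rate:     float64(len(obs)) / window.Seconds(),
		ErrorPct: 100 * float64(failed) / float64(len(obs)),
		P50:      percentile(durations, 50),
		P95:      percentile(durations, 95),
		P99:      percentile(durations, 99),
	}
}
```

In a real service these values would more likely come from counters and histograms scraped by Prometheus than from an in-memory slice; the sketch only shows what each column of the health table measures.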

Distributed Tracing

OpenTelemetry propagates trace_id across all services. One request = one trace with spans from every service it touches. Find the bottleneck in seconds.
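
To make the propagation idea concrete, here is a small standard-library Go sketch of the W3C Trace Context `traceparent` header, the format OpenTelemetry propagators use by default. It is not the OpenTelemetry SDK: TraceContext, ParseTraceparent, Child, and Header are hypothetical names, and the sketch accepts only version 00 of the header.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"strings"
)

// TraceContext is the part of a traceparent header that travels
// between services (hypothetical type for this sketch).
type TraceContext struct {
	TraceID string // 32 lowercase hex chars, shared by every span in the trace
	SpanID  string // 16 lowercase hex chars, unique per span
	Flags   string // 2 lowercase hex chars, e.g. "01" = sampled
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

// ParseTraceparent parses "00-<trace-id>-<span-id>-<flags>".
func ParseTraceparent(h string) (TraceContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceContext{}, errors.New("traceparent: unsupported format or version")
	}
	tc := TraceContext{TraceID: parts[1], SpanID: parts[2], Flags: parts[3]}
	if !isLowerHex(tc.TraceID, 32) || !isLowerHex(tc.SpanID, 16) || !isLowerHex(tc.Flags, 2) {
		return TraceContext{}, errors.New("traceparent: malformed field")
	}
	// All-zero IDs are defined as invalid by the specification.
	if tc.TraceID == strings.Repeat("0", 32) || tc.SpanID == strings.Repeat("0", 16) {
		return TraceContext{}, errors.New("traceparent: all-zero id")
	}
	return tc, nil
}

// Child starts a new span in the same trace: the trace ID is kept,
// and a fresh random span ID is generated.
func (tc TraceContext) Child() (TraceContext, error) {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return TraceContext{}, err
	}
	return TraceContext{TraceID: tc.TraceID, SpanID: hex.EncodeToString(b), Flags: tc.Flags}, nil
}

// Header renders the context for the outgoing request.
func (tc TraceContext) Header() string {
	return "00-" + tc.TraceID + "-" + tc.SpanID + "-" + tc.Flags
}
```

Each service would parse the incoming header, call Child for its own span, and send the result of Header downstream. Because the trace ID never changes, a tracing backend can stitch the spans from all services into one trace.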

Error Budget

SLO 99.9% = 43.2 min/month downtime budget (0.1% of a 30-day month). When the budget is consumed, freeze features and fix reliability. This prevents the "move fast and break things" trap.
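
The budget arithmetic can be sketched in a few lines of Go. This is an illustrative calculation only, assuming a 30-day window and a simple "freeze when nothing is left" rule; BudgetStatus and Evaluate are hypothetical names, and the SLO is passed in basis points (9990 = 99.90%) to avoid floating-point rounding.

```go
package main

import "time"

// BudgetStatus summarizes an error budget for one SLO window
// (hypothetical type for this sketch).
type BudgetStatus struct {
	Budget        time.Duration // total downtime the SLO allows in the window
	Remaining     time.Duration // budget left; negative when overspent
	FreezeDeploys bool          // true once the budget is fully consumed
}

// Evaluate computes the error budget for an availability SLO given in
// basis points (9990 = 99.90%) over window, after downtime has been spent.
func Evaluate(sloBasisPoints int64, window, downtime time.Duration) BudgetStatus {
	// Allowed unavailability is (10000 - slo) / 10000 of the window.
	budget := window * time.Duration(10000-sloBasisPoints) / 10000
	remaining := budget - downtime
	return BudgetStatus{
		Budget:        budget,
		Remaining:     remaining,
		FreezeDeploys: remaining <= 0,
	}
}
```

For example, Evaluate(9990, 30*24*time.Hour, 10*time.Minute) gives a budget of 43m12s, which is the 43.2 minutes quoted above, with 33m12s remaining and no freeze.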