Troubleshooting playbook

Troubleshooting decision paths for Nextbase platform incidents

Structured runbook guiding on-call engineers through common outage scenarios with explicit escalation gates.

Owner: Platform Reliability Guild Last reviewed July 28, 2025 #incidents-auth

Select a decision path to jump to the relevant troubleshooting branch.

Step 1

Verify SLO breach

Confirm authentication requests exceed error budget thresholds.

If error rate > 10% for 3 minutes, initiate incident response.

Query failed auth attempts

kubectl logs deploy/auth-proxy --tail=200 | grep -i "status=500"

Check for rate limiter failures or upstream timeouts.

Step 2

Invalidate stale cache

Flush rogue tokens if edge cache desync caused auth drift.

Purge CDN auth cache

destructive

curl -X POST https://api.nextbase.com/admin/cache/purge -H "Authorization: Bearer $TOKEN" -d '{"surface":"auth"}'

Expect 2-3 minute warmup after purge.

Authentication service outage