
System Status

L7AI Content Intelligence Pipeline — live infrastructure monitoring


3-Layer Monitoring Architecture

No single point of failure in alerting. Each layer monitors the layer above it.

LAYER 1

n8n Health Monitor

Every 5 min

n8n workflow calls /api/health/monitor every 5 minutes. If any service is down → immediate Telegram alert with severity level and response times.

↓ But what if n8n is down? ↓
LAYER 2

Coolify Container Health

Always on

Coolify monitors all Docker containers on both servers. If n8n's container crashes → Coolify auto-restarts it and logs the event. Container health checks ensure crashed services recover without manual intervention.

↓ But what if the whole server is down? ↓
LAYER 3

External Uptime Monitor

External

External service (UptimeRobot) pings leonelulloa.com/api/health from outside. If the entire server is unreachable → email + push notification. This catches scenarios where everything on the server has failed.
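Uptime monitors typically require several consecutive failed probes before declaring an outage, so a transient network blip doesn't page anyone. A minimal sketch of that pattern (the `probe` callable stands in for an HTTP GET with a short timeout):

```python
import time


def probe_with_retries(probe, attempts: int = 3, delay_s: float = 0.0) -> bool:
    """Report up/down only after repeated probes, to filter out blips.

    `probe` is any zero-argument callable returning True when the site
    answered successfully.
    """
    for i in range(attempts):
        if probe():
            return True          # site answered: up, no alert
        if i < attempts - 1:
            time.sleep(delay_s)  # wait before re-probing
    return False                 # every attempt failed: treat as down, notify
```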

🤖 Telegram Bot — Interactive Commands

Two-way communication: the bot alerts me when something breaks, and I can ask it for status anytime from my phone.

/status → Check all services now, get response times
/help → List available commands
status → Also works without the slash — fuzzy matching

Incident Response Protocol

01

Detection

  • Telegram Bot alerts on workflow failures
  • n8n execution logs show failed runs with error context
  • This health check page (auto-refreshes every 30s)
  • Coolify dashboard shows container health
02

Diagnosis

  • SSH into affected server
  • docker ps — check container states
  • docker logs <container> — read error output
  • Check Coolify dashboard for resource usage
  • Review n8n execution history for failure chain
03

Resolution

  • Restart failed container via Coolify or Docker CLI
  • Fix configuration and redeploy via Coolify
  • Scale resources if capacity issue
  • Rollback to previous version if code regression
  • Manual workflow re-run in n8n for data recovery
04

Prevention

  • Document root cause in incident log
  • Add monitoring for the failure pattern
  • Update n8n workflow with better error handling
  • Add retry logic or fallback where missing
  • Update this protocol if new pattern discovered
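The "add retry logic or fallback where missing" step usually amounts to wrapping a fragile workflow step in exponential backoff. A minimal sketch (decorator name and defaults are illustrative, not from the actual workflows):

```python
import time
from functools import wraps


def with_retries(attempts: int = 3, base_delay_s: float = 1.0):
    """Retry a flaky step with exponential backoff before letting it fail."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # exhausted: let the alerting layers see it
                    time.sleep(base_delay_s * 2 ** attempt)
        return wrapper
    return decorate
```

Wrapped this way, a step that fails transiently recovers silently; a step that keeps failing still surfaces to the monitoring layers above.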

🔧 Real Incident: Hetzner IPv4 CDN Blocking

Hetzner servers were blocked by CDN providers that require IPv4. Instead of migrating infrastructure, I built Cloudflare Worker proxies to route requests through Cloudflare's network, and configured Docker IPv6 bridges for container-to-container communication. Total fix time: 4 hours. Cost: $0/month (Cloudflare Workers free tier). Result: all services restored with better resilience than before — traffic now routes through Cloudflare's global network instead of direct server connections.

All checks are read-only. No sensitive data (IPs, keys, credentials) is exposed through this page.