
System Status

L7AI Content Intelligence Pipeline — live infrastructure monitoring


3-Layer Monitoring Architecture

No single point of failure in alerting. Each layer monitors the layer above it.

LAYER 1

n8n Health Monitor

Every 5 min

n8n workflow calls /api/health/monitor every 5 minutes. If any service is down → immediate Telegram alert with severity level and response times.

↓ But what if n8n is down? ↓
LAYER 2

Coolify Container Health

Always on

Coolify monitors all Docker containers on both servers. If n8n's container crashes → Coolify auto-restarts it and logs the event. Container health checks ensure crashed services recover without manual intervention.

↓ But what if the whole server is down? ↓
LAYER 3

External Uptime Monitor

External

External service (UptimeRobot) pings leonelulloa.com/api/health from outside. If the entire server is unreachable → email + push notification. This catches scenarios where everything on the server has failed.
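Uptime monitors typically require several consecutive failed probes before declaring an outage, so a transient network blip doesn't page anyone. A minimal sketch of that pattern (the `probe` callable stands in for an HTTP GET with a short timeout):

```python
import time


def probe_with_retries(probe, attempts: int = 3, delay_s: float = 0.0) -> bool:
    """Report up/down only after repeated probes, to filter out blips.

    `probe` is any zero-argument callable returning True when the site
    answered successfully.
    """
    for i in range(attempts):
        if probe():
            return True          # site answered: up, no alert
        if i < attempts - 1:
            time.sleep(delay_s)  # wait before re-probing
    return False                 # every attempt failed: treat as down, notify
```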

🤖 Telegram Bot — Interactive Commands

Two-way communication: the bot alerts me when something breaks, and I can ask it for status anytime from my phone.

/status → Check all services now, get response times
/help → List available commands
status → Also works without the slash — fuzzy matching

Incident Response Protocol

01

Detection

  • Telegram Bot alerts on workflow failures
  • n8n execution logs show failed runs with error context
  • This health check page (auto-refreshes every 30s)
  • Coolify dashboard shows container health
02

Diagnosis

  • SSH into affected server
  • docker ps — check container states
  • docker logs <container> — read error output
  • Check Coolify dashboard for resource usage
  • Review n8n execution history for failure chain
03

Resolution

  • Restart failed container via Coolify or Docker CLI
  • Fix configuration and redeploy via Coolify
  • Scale resources if capacity issue
  • Rollback to previous version if code regression
  • Manual workflow re-run in n8n for data recovery
04

Prevention

  • Document root cause in incident log
  • Add monitoring for the failure pattern
  • Update n8n workflow with better error handling
  • Add retry logic or fallback where missing
  • Update this protocol if new pattern discovered
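The "add retry logic or fallback where missing" step usually amounts to wrapping a fragile workflow step in exponential backoff. A minimal sketch (decorator name and defaults are illustrative, not from the actual workflows):

```python
import time
from functools import wraps


def with_retries(attempts: int = 3, base_delay_s: float = 1.0):
    """Retry a flaky step with exponential backoff before letting it fail."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # exhausted: let the alerting layers see it
                    time.sleep(base_delay_s * 2 ** attempt)
        return wrapper
    return decorate
```

Wrapped this way, a step that fails transiently recovers silently; a step that keeps failing still surfaces to the monitoring layers above.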

🔧 Real Incident: Hetzner IPv4 CDN Blocking

Hetzner servers were blocked by CDN providers that require IPv4. Instead of migrating infrastructure, I built Cloudflare Worker proxies to route requests through Cloudflare's network, and configured Docker IPv6 bridges for container-to-container communication. Total fix time: 4 hours. Cost: $0/month (Cloudflare Workers free tier). Result: all services restored with better resilience than before — traffic now routes through Cloudflare's global network instead of direct server connections.

All checks are read-only. No sensitive data (IPs, keys, credentials) is exposed through this page.