PagerDuty Integration for Missed Cron Heartbeat

In the world of backend systems, cron jobs are the silent workhorses. They handle everything from daily data synchronizations and database backups to report generation and cache invalidation. When these jobs run reliably, all is well. But what happens when they don't? Often, nothing immediately obvious. A daily backup job might fail silently for days, only for you to discover the issue when you actually need to restore data. A critical data import might stop running, leading to stale information and poor business decisions.

Traditional monitoring often focuses on what's up: is the server running? Is the database reachable? Cron jobs pose a different challenge: verifying that something ran successfully and on schedule. This is where heartbeat monitoring shines. Combined with a robust on-call system like PagerDuty, it turns silent failures into actionable alerts.

Why Monitor Cron Jobs (and Why PagerDuty)?

Imagine a scenario where your nightly data aggregation script fails. If you're only looking at logs, you might eventually spot an error message – if you're diligent about log review. But what if the script didn't even start? What if the cron daemon itself crashed? In these cases, there's no error log entry from the script because it never executed. This is a "failure to run" scenario, and it's notoriously difficult to detect with traditional log-based monitoring.

Heartbeat monitoring flips the script: instead of looking for errors, you confirm success. Your job "pings" a unique URL when it completes successfully. If that ping doesn't arrive within the expected timeframe, something went wrong: the job failed, hung, or never started.
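
To make the idea concrete, here is a minimal Python sketch of both halves of a heartbeat: the job side, which pings only after a successful run, and the monitor side, which decides whether a ping is overdue. The URL, the function names, and the one-minute grace period are all hypothetical placeholders chosen for illustration; they are not taken from any particular monitoring service.

```python
import subprocess
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical placeholder; a real heartbeat URL comes from your monitoring service.
HEARTBEAT_URL = "https://heartbeat.example.com/ping/your-monitor-id"


def _http_get(url, timeout=10):
    """Send a plain HTTP GET and discard the response body."""
    urllib.request.urlopen(url, timeout=timeout).close()


def run_and_ping(command, heartbeat_url=HEARTBEAT_URL, ping=_http_get):
    """Run `command` and ping the heartbeat URL only if it exits with status 0.

    Returns True if the heartbeat was sent, False otherwise.
    """
    result = subprocess.run(command)
    if result.returncode != 0:
        # No ping on failure: the missing heartbeat is the failure signal.
        return False
    try:
        ping(heartbeat_url)
    except OSError:
        # Network errors (urllib's URLError is an OSError) mean the ping was lost.
        return False
    return True


def is_overdue(last_ping, interval, grace=timedelta(minutes=1), now=None):
    """Monitor side: True if no ping arrived within interval plus grace.

    The grace period is an assumption of this sketch, added to tolerate
    small variations in job run time.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_ping > interval + grace
```

A cron entry would then call a small script that invokes `run_and_ping` with the real job command. Because the ping happens only after a zero exit status, a crash, a hang, and a job that never starts all look the same to the monitor: no heartbeat.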

For critical cron jobs, a missed heartbeat isn't just an informational alert – it's an incident. This is precisely why PagerDuty integration is crucial. PagerDuty isn't just another notification system; it's designed for incident management. It ensures that the right person (or team) is notified, escalates if necessary, and tracks the incident through to resolution. For an engineer on call, receiving a PagerDuty alert means "drop what you're doing, this needs attention."
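
For a sense of what "a missed heartbeat becomes an incident" looks like on the wire, the sketch below builds a trigger event in the shape accepted by the PagerDuty Events API v2 and posts it to the public enqueue endpoint. It illustrates the event format only and makes no claim about how any monitoring service implements its integration; the function names, the dedup key scheme, and the summary wording are assumptions made for this example.

```python
import json
import urllib.request

# Public ingestion endpoint of the PagerDuty Events API v2.
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_missed_heartbeat_event(routing_key, monitor_name, severity="critical"):
    """Build an Events API v2 trigger event for a missed heartbeat.

    `routing_key` is the integration key of a PagerDuty service.
    The dedup key format is an assumption of this sketch: reusing the same
    key groups repeated misses of one monitor into a single incident.
    """
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": f"missed-heartbeat-{monitor_name}",
        "payload": {
            "summary": f"Missed heartbeat for cron job '{monitor_name}'",
            "source": monitor_name,
            "severity": severity,  # one of: critical, error, warning, info
        },
    }


def send_event(event, timeout=10):
    """POST the event as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Sending a later event with the same `dedup_key` and `event_action` set to `"resolve"` would close the incident once the job pings again, so recovery is tracked alongside the failure.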

The Heartfly Approach to Heartbeats

Heartfly provides simple, reliable cron and scheduled job monitoring. The core concept is straightforward:

  1. You create a monitor in Heartfly for a specific job, defining its expected schedule (e.g., "every 5 minutes," "daily at 03:00 UTC").
  2. Heartfly provides a unique heartbeat URL for that monitor.
  3. You integrate this URL into your cron job. The job, upon successful completion, makes a simple HTTP GET request to this URL.
  4. Heartfly expects to receive this ping within the configured interval. If