Bash cron monitoring without external dependencies

Cron jobs are the silent workhorses of many systems, handling everything from daily backups and log rotations to critical data synchronization and certificate renewals. When they run smoothly, you barely notice them. But when they fail silently, the consequences can range from minor annoyances to catastrophic data loss or service outages. That's why monitoring your cron jobs is non-negotiable for any robust system.

You might be thinking: "I need to monitor my cron jobs, but I want to keep things simple. Can I do it without introducing new tools or external services? Just plain old Bash and the tools already on my server?"

The answer is yes, you can implement a basic form of cron job monitoring using only Bash and standard Unix utilities. However, it comes with significant caveats and limitations, especially when it comes to detecting the most insidious failure: when a job simply doesn't run at all. This article will walk you through several dependency-free approaches, highlighting their strengths, weaknesses, and the inherent trade-offs.

Why Monitor Cron Jobs?

Before diving into the "how," let's quickly reiterate the "why." You monitor cron jobs to ensure:

  • Data Integrity: Critical data pipelines complete successfully.
  • Service Availability: Regular tasks like cache invalidation or service restarts happen as expected.
  • Security: Certificate renewal jobs run before expiration, keeping your services secure.
  • Resource Management: Cleanup scripts prevent disk space exhaustion.
  • Operational Efficiency: Catching issues early prevents larger problems and reduces manual intervention.

A cron job that fails silently is a ticking time bomb. You need to know when something goes wrong, and ideally, before it impacts your users or data.

The Core Idea: Self-Reporting Jobs

The fundamental principle behind dependency-free cron monitoring is that the job itself, or a closely related script, is responsible for reporting its status. There's no external observer actively checking in on the job; instead, the job signals its state using local files, logs, or simple network checks.
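A minimal sketch of this idea is a generic wrapper that runs any command and leaves its own status behind. The function name run_and_report and the .status file format (exit code, start time, end time on one line) are hypothetical choices for this illustration, not a cron convention. logger is the standard command-line syslog client on most Linux systems, and the sketch keeps working if it is absent.

```bash
# Hypothetical helper: run a command and record its outcome in a state file.
# Usage: run_and_report STATE_DIR JOB_NAME COMMAND [ARGS...]
run_and_report() {
    local state_dir=$1 job_name=$2
    shift 2
    local status_file="$state_dir/${job_name}.status"
    local started ended rc

    mkdir -p "$state_dir" || return 1
    started=$(date +%s)

    # Run the real work and capture its exit code.
    "$@"
    rc=$?
    ended=$(date +%s)

    # One line per job: exit code, start and end as Unix timestamps.
    # Write to a temporary file and rename it so a reader never sees half a line.
    printf '%s %s %s\n' "$rc" "$started" "$ended" > "$status_file.tmp" &&
        mv -f "$status_file.tmp" "$status_file"

    # Also leave a trace in syslog; ignore failures if logger or syslog is missing.
    logger -t "cron-$job_name" "exit=$rc duration=$((ended - started))s" 2>/dev/null || true

    return "$rc"
}
```

A job script could source this function and run its real work through it, for example run_and_report /var/run/cron_monitor my_queue_processor /usr/local/bin/process_queue.sh. A monitor then only has to read one small file per job to see both when the job last ran and whether it succeeded.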

Method 1: File-Based Heartbeats

One of the simplest ways to implement a "heartbeat" is by having your cron job touch a file. A separate monitoring script then checks the modification time of this file to determine if the job is running or has recently completed.
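The modification-time check can be wrapped in a small function. This sketch assumes either GNU coreutils stat, where -c %Y prints the modification time in seconds since the epoch, or BSD/macOS stat, where the equivalent is -f %m. The function names file_mtime and file_age_seconds are hypothetical.

```bash
# Print a file's modification time as seconds since the epoch.
# Tries GNU stat first, then falls back to BSD/macOS stat.
file_mtime() {
    stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null
}

# Print how many seconds ago the file was last modified.
# Returns non-zero (and prints nothing) if the file does not exist.
file_age_seconds() {
    local mtime
    mtime=$(file_mtime "$1") || return 1
    echo $(( $(date +%s) - mtime ))
}
```

Keeping the stat call in one place means a monitoring script written on Linux does not silently break when it is copied to a BSD or macOS host.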

How it works:

  1. Your cron job starts by touching a "start" file.
  2. It then performs its work.
  3. Upon successful completion, it touches a "success" file.
  4. A separate cron job (the monitor) periodically checks the "success" file's age. If it's too old, it triggers an alert.
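Step 4 has a blind spot. If a job has never succeeded, there is no success file to be "too old", so a monitor that only looks at existing files stays silent. The same happens if the state directory has been wiped, for example because /var/run is a tmpfs cleared at boot on many Linux systems. A minimal sketch that checks an explicit list of expected jobs and treats a missing file as a failure follows. The function name check_heartbeat and its arguments are hypothetical, and it assumes GNU stat.

```bash
# Hypothetical helper: check one job's success heartbeat.
# Usage: check_heartbeat STATE_DIR JOB_NAME MAX_AGE_MINUTES
# Prints a one-line verdict; returns 0 (fresh), 1 (stale) or 2 (missing).
check_heartbeat() {
    local state_dir=$1 job_name=$2 max_age_min=$3
    local file="$state_dir/${job_name}.success"
    local now mtime age

    if [ ! -e "$file" ]; then
        # No file at all: the job never succeeded, or the state was lost.
        echo "MISSING: $job_name has no heartbeat file ($file)"
        return 2
    fi

    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 2   # GNU coreutils stat (assumed)
    age=$(( now - mtime ))

    if [ "$age" -gt $(( max_age_min * 60 )) ]; then
        echo "STALE: $job_name last succeeded $(( age / 60 )) minutes ago"
        return 1
    fi
    echo "OK: $job_name last succeeded $(( age / 60 )) minutes ago"
    return 0
}
```

A monitor would call this once per job it expects to exist, for example in a loop over a hard-coded list such as my_queue_processor plus any other (hypothetical) job names, and raise an alert whenever the return code is non-zero. Because the list lives in the monitor rather than being discovered from the directory, a job that never runs at all is still reported.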

Example Implementation:

Let's say you have a cron job that runs every 10 minutes to process a queue, and you expect it to finish within 5 minutes.

Your job (/usr/local/bin/process_queue.sh):

```bash
#!/bin/bash

JOB_NAME="my_queue_processor"
STATE_DIR="/var/run/cron_monitor"
LOG_FILE="/var/log/${JOB_NAME}.log"
mkdir -p "$STATE_DIR"

START_FILE="$STATE_DIR/${JOB_NAME}.start"
TOUCH_FILE="$STATE_DIR/${JOB_NAME}.success"
LOCK_FILE="$STATE_DIR/${JOB_NAME}.lock"

# Simple lock to prevent multiple instances: with noclobber set,
# the redirection fails if the lock file already exists.
if ( set -o noclobber; echo "$$" > "$LOCK_FILE" ) 2> /dev/null; then
    # Remove the lock whenever the script exits, and turn INT/TERM
    # into a normal exit so the EXIT trap still runs.
    trap 'rm -f "$LOCK_FILE"' EXIT
    trap 'exit 1' INT TERM

    touch "$START_FILE" # Record that the job started
    echo "$(date): $JOB_NAME started." >> "$LOG_FILE"

    # Perform the actual work
    /usr/bin/php /var/www/my_app/process_queue.php >> "$LOG_FILE" 2>&1
    STATUS=$?

    if [ "$STATUS" -eq 0 ]; then
        echo "$(date): $JOB_NAME finished successfully." >> "$LOG_FILE"
        touch "$TOUCH_FILE" # Update success timestamp
    else
        echo "$(date): $JOB_NAME failed with status $STATUS." >> "$LOG_FILE"
        # Optionally, touch a separate failure file or send an immediate alert
    fi

    # Propagate the job's exit status to cron (the EXIT trap removes the lock)
    exit "$STATUS"
else
    echo "$(date): $JOB_NAME is already running. Exiting." >> "$LOG_FILE"
    exit 1
fi
```
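One weakness of the noclobber lock is that only the script itself removes it. If the process is killed with SIGKILL or the machine loses power, the lock file survives, and every later run exits with "already running". The heartbeat monitor will eventually notice this, but the job can also recover on its own by checking whether the PID stored in the lock is still alive.

The sketch below assumes the lock file contains only a PID, as written by the script above, and acquire_lock is a hypothetical name. It has two caveats. First, kill -0 only tests whether a signal could be delivered, so a recycled PID can make a dead lock look alive. Second, the check-then-remove step is not atomic, so two instances that find the same stale lock at the same moment could, rarely, both proceed.

```bash
# Hypothetical helper: take a noclobber lock, clearing it first if the PID
# recorded in it no longer exists. Returns 0 if the lock was acquired.
acquire_lock() {
    local lock_file=$1 old_pid

    # Fast path: create the lock atomically ($$ is the parent shell's PID
    # even inside the subshell).
    if ( set -o noclobber; echo "$$" > "$lock_file" ) 2>/dev/null; then
        return 0
    fi

    # Lock exists: is the recorded process still alive?
    old_pid=$(cat "$lock_file" 2>/dev/null)
    if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
        return 1    # Another instance appears to be running
    fi

    # Stale lock: remove it and try once more (not race-free, see above).
    rm -f "$lock_file"
    ( set -o noclobber; echo "$$" > "$lock_file" ) 2>/dev/null
}
```

In the job script, the first if test would then become if acquire_lock "$LOCK_FILE"; then, with the rest of the logic unchanged.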

Your monitoring cron job (/etc/cron.d/cron_monitor):

```
# Run every 5 minutes to check jobs
*/5 * * * * root /usr/local/bin/check_cron_heartbeats.sh
```
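The monitor's numbers have to be chosen together with the job's schedule. Suppose the job starts every 10 minutes and takes at most 5. One run could finish almost immediately and the next could use its full 5 minutes, so two consecutive successes can be up to 10 + 5 = 15 minutes apart. That is where the 15-minute threshold in the monitoring script comes from.

Because the monitor only wakes every 5 minutes, an alert can arrive up to about threshold + 5 minutes after the last success. The hypothetical helper below just encodes that arithmetic, with an optional safety margin.

```bash
# Hypothetical helper: derive an alert threshold and worst-case detection
# delay (all values in minutes) from a job's schedule.
# Usage: heartbeat_budget JOB_INTERVAL MAX_RUNTIME MONITOR_INTERVAL [MARGIN]
heartbeat_budget() {
    local interval=$1 max_runtime=$2 monitor_interval=$3 margin=${4:-0}

    # Longest normal gap between two successes: one run finishes instantly,
    # the next starts one interval later and uses its full runtime.
    local threshold=$(( interval + max_runtime + margin ))

    # The monitor only looks every monitor_interval minutes, so an alert can
    # fire up to that much later than the threshold itself.
    local worst_detection=$(( threshold + monitor_interval ))

    echo "threshold=${threshold} worst_detection=${worst_detection}"
}
```

With the values above, heartbeat_budget 10 5 5 prints threshold=15 worst_detection=20. A threshold exactly equal to the longest normal gap leaves no slack, so passing a small margin as the fourth argument helps avoid false alarms when a run occasionally overshoots its expected runtime.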

Your monitoring script (/usr/local/bin/check_cron_heartbeats.sh):

```bash
#!/bin/bash

STATE_DIR="/var/run/cron_monitor"
ALERT_EMAIL="admin@example.com"
ALERT_THRESHOLD_MINUTES=15 # If a job hasn't touched its file in this many minutes

find "$STATE_DIR" -maxdepth 1 -name "*.success" -print0 | while IFS= read -r -d $'\0' file; do
    JOB_NAME=$(basename "$file" .success)
    LAST_TOUCH=$(stat -c %Y "$file")
    CURRENT_TIME=$(date +%s)
    AGE_SECONDS=$((CURRENT