Restic Cron Monitoring with Curl

Restic is an efficient backup program with built-in deduplication and encryption, and it is easy to use. However, the best backup tool in the world is useless if your backups aren't actually running or, worse, are silently failing. You've automated your Restic jobs with cron, which is great, but automation doesn't mean "set it and forget it."

The cold, hard truth is that automated jobs fail. Disks fill up, network connections drop, credentials expire, permissions change, or the Restic process itself might hang. When a backup job fails silently, you're left with a false sense of security until the day you desperately need to restore data, only to find your backups are weeks or months out of date.

This article will guide you through a practical, engineer-centric approach to monitoring your Restic cron jobs using simple curl commands and a heartbeat monitoring service. We'll show you how to get alerted when your Restic jobs don't run, or don't complete successfully, turning silent failures into actionable alerts.

Why Monitor Restic Backups?

You've chosen Restic because it's robust and reliable. But that reliability applies to the tool itself, not necessarily the environment it runs in. Here's why you absolutely need to monitor your Restic cron jobs:

  • Silent Failures: This is the biggest threat. A cron job never runs, or runs but exits with a non-zero status, and you're none the wiser. No email, no log entry that anyone checks regularly, just a gaping hole in your backup strategy.
  • Resource Exhaustion: Source disk full, destination repository full, insufficient memory for Restic to operate, or network bandwidth saturation can all prevent a successful backup.
  • Connectivity Issues: The server hosting your Restic job might lose network access to the remote repository (S3, Backblaze B2, SFTP, etc.).
  • Authentication Problems: API keys rotate, passwords change, or IAM policies are updated, leading to Restic being unable to authenticate with the backend.
  • Permissions: File system permissions might change, preventing Restic from reading necessary files or writing temporary data.
  • Cron Daemon Issues: The cron daemon itself might fail, or your specific cron job might be misconfigured and never even execute.

In any of these scenarios, without active monitoring, you're flying blind. The goal isn't just to run backups, but to know they're running successfully.

The Heartbeat Monitoring Concept

Heartbeat monitoring is an "inverse" monitoring approach perfectly suited for scheduled tasks like cron jobs. Instead of actively checking a service's status, the service (or your script) reports its status to a monitoring service.

Here's how it works:

  1. You configure a unique "heartbeat URL" for each job you want to monitor in your monitoring service (like Heartfly).
  2. You tell the monitoring service the expected frequency of this job (e.g., "every 24 hours").
  3. Your cron job, when it runs, sends a simple HTTP request (a "ping" or "heartbeat") to that unique URL.
  4. If the monitoring service doesn't receive a ping within the expected timeframe (plus a grace period), it assumes the job failed to run or complete, and triggers an alert (Slack, Discord, email, PagerDuty, etc.).

This method is lightweight, highly effective, and alerts you precisely when a scheduled task doesn't happen as expected.
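
As a rough illustration of step 3, here is a minimal sketch of a wrapper that pings only when the wrapped command succeeds. The URL is a placeholder and the function names `send_heartbeat` and `run_with_heartbeat` are hypothetical, chosen for this example; substitute the URL your monitoring service actually issues.

```bash
# Placeholder URL (assumption): replace with the heartbeat URL from your monitoring service.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://monitor.example.com/heartbeat/YOUR_UNIQUE_ID}"

# Send a single ping.
#   -f          treat HTTP error responses as failures
#   -sS         stay quiet, but still print errors
#   --max-time  keep a hung request from stalling the job
#   --retry     ride out brief network blips
send_heartbeat() {
  curl -fsS --max-time 10 --retry 3 "$1" >/dev/null
}

# Run the given command; ping only if it exits 0.
# The command's own exit status is always passed back to the caller.
run_with_heartbeat() {
  local status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    send_heartbeat "$HEARTBEAT_URL" || echo "warning: heartbeat ping failed" >&2
  fi
  return "$status"
}
```

Calling `run_with_heartbeat some_command arg1 arg2` runs the command and sends the ping afterwards only on success. A failed or missing run produces no ping, which is exactly the condition the monitoring service alerts on in step 4.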

Integrating Heartbeats with Restic and Cron

To integrate heartbeat monitoring, you'll modify your Restic backup script to include curl commands that hit your monitoring service's heartbeat URLs.

There are a few strategies, each offering different levels of granularity:

  • Simple Success Ping: The script pings the heartbeat URL only after Restic completes successfully. If the ping isn't received, it means the job failed or didn't run.
  • Start and End Pings: The script pings at the beginning of execution, and again at the end (on success). This helps differentiate between a job that didn't start at all and one that started but failed midway.
  • Success and Failure Pings: The script pings a "success" URL on completion or a "failure" URL if Restic exits with an error. This provides immediate, specific feedback on job outcomes.

For Restic, which can be a long-running process, a combination of start and end pings is often ideal.
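
The strategies above can be combined in one small wrapper. The sketch below is an illustration, not a definitive implementation: it assumes the monitoring service accepts `/start` and `/fail` suffixes on the heartbeat URL (check your provider's documentation), and the URL and the function names `ping_url` and `run_monitored` are hypothetical.

```bash
# Placeholder URL (assumption): replace with the heartbeat URL from your monitoring service.
# The /start and /fail suffixes used below are also assumptions about the service's API.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://monitor.example.com/heartbeat/YOUR_UNIQUE_ID}"

# Best-effort ping: a monitoring outage should never fail the backup itself,
# so any curl error is swallowed with "|| true".
ping_url() {
  curl -fsS --max-time 10 --retry 3 --retry-delay 5 "$1" >/dev/null || true
}

# Ping at start, run the command, then ping success or failure.
# The wrapped command's exit status is returned unchanged.
run_monitored() {
  local status=0
  ping_url "$HEARTBEAT_URL/start"
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    ping_url "$HEARTBEAT_URL"        # success ping
  else
    ping_url "$HEARTBEAT_URL/fail"   # explicit failure ping
  fi
  return "$status"
}
```

With `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` exported in the environment, `run_monitored restic backup /var/www` would cover all three cases: a missing start ping means the job never ran, a start ping without a follow-up means it hung or was killed, and a failure ping means Restic exited with an error.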

Real-World Example 1: Basic Success Monitoring

Let's start with a straightforward example: a daily Restic backup of /var/www to a remote S3 bucket. We want to be alerted if this job doesn't complete successfully within 25 hours (the 24-hour schedule plus an hour of grace).

First, you'd set up a new monitor in Heartfly, giving it a name like "Daily Restic /var/www Backup" and setting its expected interval to "Daily" or "Every 24 hours". Heartfly will provide you with a unique heartbeat URL (e.g., https://cron2.91-99-176-101.nip.io/api/v1/heartbeat/YOUR_UNIQUE_ID).

Next, create a shell script, say /usr/local/bin/backup_www.sh:

```bash

#!/bin/bash

set -euo pipefail

# --- Configuration ---

RESTIC_REPO="s3:s3.amazonaws.com/my-restic-bucket"
RESTIC_PASSWORD_FILE="/etc/restic/password.txt"
BACKUP_PATH="/var/www"
HEARTBEAT_URL="https://cron2.91-99-176-101.nip.io/api/v1/heartbeat/YOUR_UNIQUE_ID" # Replace with your actual URL

# --- Logging ---

# Send all output (stdout and stderr) to both the terminal and the log file.
LOG_FILE="/var/log/restic_www_backup.log"
exec > >(tee -a "$LOG_FILE") 2>&1
echo "--- Starting Restic Backup for $BACKUP_PATH at $(date) ---"

# --- Restic Backup ---

# Use an ERR trap so a failure ping is sent if restic (or any other command) exits non-zero.
trap 'echo "Restic backup failed or interrupted at $(date)"; curl -fsS --retry 3 --retry-delay 5 "$HEARTBEAT_URL/fail" || true' ERR

# You might want a "start" ping here for very long jobs,
# but for basic monitoring, an end-of-script ping is often sufficient.
curl -fsS --retry 3 --retry-delay 5 "$HEARTBEAT_URL/start" || true

restic backup \
  --repo "$RESTIC_REPO" \
  --password-file "$RESTIC_PASSWORD_FILE