Borg Backup Monitoring with a Heartbeat URL
BorgBackup is an excellent, deduplicating backup program that's a staple for many engineers managing Linux servers and workstations. Its efficiency, encryption capabilities, and ease of use make it a powerful tool for safeguarding your data. However, as with any critical system, the biggest challenge isn't setting it up, but ensuring it's actually running successfully, consistently. This is where monitoring comes in, and for scheduled jobs like Borg backups, a heartbeat URL provides a robust and straightforward solution.
The Problem with Silent Backup Failures
You've set up your cron job: 0 3 * * * /usr/local/bin/backup_script.sh. It runs every night at 3 AM. You're confident. Days turn into weeks, weeks into months. Then, disaster strikes. You need to restore a file, only to discover your last successful backup was three months ago. The cron job was running, but something inside the script was failing silently. Maybe the disk filled up, the network share disappeared, or Borg itself encountered an error.
The "no news is good news" approach simply doesn't cut it for backups. Here's why:
- Log Overload: While you might log stdout and stderr to files, actively reviewing these logs daily for every single backup job quickly becomes impractical. Log rotation might even delete critical error messages before you see them.
- Human Error: Relying on manual checks introduces inconsistency. Someone forgets to check, or misses a subtle error message amidst pages of successful output.
- Partial Successes: A script might run to completion, but a critical step within it (like borg create) might fail, while subsequent steps (like borg prune) might still execute or simply exit cleanly, masking the initial failure.
- Systemic Issues: The backup machine itself might go offline, preventing the cron job from even starting. In this scenario, there are no logs to check.
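The "Partial Successes" failure mode above is easy to reproduce. In this minimal sketch, false stands in for a failing borg create and true for a borg prune that still runs afterwards:

```shell
#!/bin/bash
# Without 'set -e', a failure early in a script is masked when a later
# command succeeds: the script's exit status is that of its LAST command.
backup_without_guard() {
  false   # stand-in for a failing 'borg create'...
  true    # ...followed by a 'borg prune' that still runs,
}         # so the function's exit status is 0.
backup_without_guard
echo "exit status: $?"   # prints 0 — the failure is invisible to cron
```

From cron's point of view, this script "succeeded" every night, which is exactly why exit-status discipline matters before wiring up a heartbeat.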
What you need is active confirmation that the job not only attempted to run but succeeded according to its definition. This is precisely what a heartbeat URL helps you achieve.
How Heartbeat URLs Work for Monitoring
A heartbeat URL is a unique endpoint that your scheduled job "pings" to signal its successful completion. Think of it like a pulse. A monitoring service, like Heartfly, expects to receive this pulse within a pre-defined interval.
Here's the basic flow:
- You configure a new "monitor" in Heartfly, specifying a unique heartbeat URL and an expected interval (e.g., "every 24 hours").
- Your backup script, upon successful completion, makes a simple HTTP
GETrequest to this unique URL. This is typically done usingcurl. - Heartfly receives the ping and resets its internal timer for that monitor.
- If Heartfly does not receive a ping within the configured interval (plus any grace period), it assumes the job failed to run or complete successfully and sends an alert via Slack, Discord, email, or other configured channels.
This flips the monitoring paradigm: instead of constantly looking for errors, you're looking for the absence of success. It's a powerful and efficient way to monitor critical scheduled tasks.
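The service-side logic behind this "absence of success" model is essentially a dead-man's switch. A rough sketch in Python (the class and method names here are illustrative, not Heartfly's actual implementation):

```python
import time

class HeartbeatMonitor:
    """Dead-man's-switch timer: alert when an expected ping never arrives."""

    def __init__(self, interval_s: float, grace_s: float = 0.0):
        self.interval_s = interval_s
        self.grace_s = grace_s
        # The job must ping before this deadline passes.
        self.deadline = time.monotonic() + interval_s + grace_s

    def ping(self) -> None:
        # Each successful job ping pushes the deadline forward one interval.
        self.deadline = time.monotonic() + self.interval_s + self.grace_s

    def is_overdue(self) -> bool:
        # True once no ping arrived in time: the service should alert.
        return time.monotonic() > self.deadline
```

Note that the monitor never inspects your logs or your script: the only signal it consumes is whether the ping arrived on time.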
Integrating Heartbeats into Your Borg Backup Script
The key to effective heartbeat monitoring is to ensure the curl command for the success ping is executed only if your entire backup process has genuinely completed without errors.
Let's start with a simple example. Imagine a very basic Borg backup command:
#!/bin/bash
set -euo pipefail # Exit immediately if a command exits with a non-zero status. Exit if an unset variable is used. Fail if any command in a pipeline fails.
HEARTFLY_URL_SUCCESS="https://cron2.91-99-176-101.nip.io/api/v1/heartbeat/YOUR_UNIQUE_ID/ping"
# Your Borg backup command
borg create --stats /mnt/backup/borg_repo::myhost-{now} /home/user/data
# If borg create succeeds, send the heartbeat
curl -fsS --retry 3 "${HEARTFLY_URL_SUCCESS}" || echo "Warning: Failed to send success heartbeat."
In this snippet:
* set -euo pipefail is crucial. If borg create fails, set -e will cause the script to exit immediately, preventing the success heartbeat from ever being sent — so Heartfly alerts you once the interval (plus any grace period) elapses.
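If you prefer not to lean on set -e alone, an explicit branch makes the success condition visible at a glance. A sketch, where run_with_heartbeat is a hypothetical wrapper (the URL and Borg invocation would be the ones from the snippet above):

```shell
#!/bin/bash
# Send the heartbeat only when the given backup command succeeds;
# on failure, withhold it so the monitor's timer expires and alerts.
run_with_heartbeat() {
  local url="$1"; shift
  if "$@"; then
    curl -fsS --retry 3 "$url" || echo "Warning: failed to send heartbeat." >&2
  else
    echo "Backup failed; heartbeat withheld so the monitor will alert." >&2
    return 1
  fi
}

# Example usage (hypothetical wrapper around the article's command):
# run_with_heartbeat "${HEARTFLY_URL_SUCCESS}" \
#   borg create --stats /mnt/backup/borg_repo::myhost-{now} /home/user/data
```

Either style works; the invariant to preserve is the same — the ping must be unreachable on any failure path.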