Fly.io Scheduled Task Alerts
Fly.io offers a fantastic platform for deploying applications and services, making it easy to scale and run your code globally. From long-running web servers to worker processes, it handles a lot of the infrastructure heavy lifting. Many applications, however, rely on scheduled tasks to perform critical background operations: daily database backups, hourly data synchronization, weekly report generation, or periodic cleanup jobs.
While Fly.io excels at keeping your applications running, it doesn't provide out-of-the-box alerting for the completion or failure of these one-off or internally scheduled tasks. This is a significant blind spot. A silently failing scheduled task can lead to stale data, missed deadlines, corrupted backups, or even complete system breakdown without you ever knowing until a user complains or a downstream process breaks.
This article will walk you through how to use Heartfly to add robust, reliable monitoring and alerting to your Fly.io scheduled tasks, ensuring you're immediately notified if something goes wrong.
The Challenge of Monitoring Scheduled Tasks on Fly.io
Fly.io provides several ways to run tasks:
- Long-running processes: Defined in `fly.toml` under `[processes]`, these are your typical web servers, API services, or background workers (e.g., a Redis queue consumer).
- One-off commands: You can use `fly machine run` or `fly ssh console -C "your_command"` to execute a command on a new or existing machine. This is often used for administrative scripts or batch jobs.
- Internal schedulers: Within a long-running process, you might use a library like `cron` (Node.js), `APScheduler` (Python), `Sidekiq-cron` (Ruby), or `Celery Beat` (Python) to schedule tasks that run at specific intervals or times.
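For reference, the first mechanism looks roughly like this in `fly.toml`. This is a minimal sketch, not a complete config; the process names and commands are placeholders for illustration:

```toml
# Minimal sketch of a fly.toml [processes] section.
# "web", "worker", and their commands are placeholder examples.
app = "my-app"

[processes]
  web = "bundle exec puma"        # typical long-running web server
  worker = "bundle exec sidekiq"  # background worker that might host an internal scheduler
```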
The challenge arises because Fly.io's primary focus is on the health of your application processes. If your web server crashes, Fly.io can restart it and alert you via health checks. But what if a daily data import script, run via `fly machine run`, completes with an exit code 0 but actually failed to import any data? Or what if an hourly cleanup task within your worker process simply stops running due to an internal bug, while the worker process itself remains "healthy" according to Fly.io's checks?
These are silent failures, and they're insidious. You need a mechanism to confirm that a scheduled task not only started but also successfully completed its intended work.
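One useful habit is to make the task verify its own output before exiting, so an "exit 0 but did nothing" run fails loudly. Here's a hedged sketch: the `imports` table, its `imported_at` column, and the `DATABASE_URL` variable are hypothetical names for illustration:

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical post-import sanity check: count today's imported rows.
# 'imports' and 'imported_at' are illustrative, not a real schema.
ROWS=$(psql "$DATABASE_URL" -tAc \
  "SELECT count(*) FROM imports WHERE imported_at::date = current_date")

if [ "$ROWS" -eq 0 ]; then
  echo "Import exited 0 but wrote no rows today" >&2
  exit 1  # a non-zero exit means the success ping never fires
fi
```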
How Heartfly Solves This
Heartfly is a SaaS tool designed specifically for this problem. It works on the "dead man's switch" principle:
- You create a "monitor" in Heartfly for each scheduled task.
- Heartfly provides a unique "heartbeat URL" for that monitor.
- At the very end of your scheduled task, if it completes successfully, you instruct it to "ping" this heartbeat URL (typically with a simple HTTP GET request).
- You configure the expected frequency of these pings (e.g., daily, hourly, every 15 minutes) and a "grace period."
- If Heartfly doesn't receive a ping within the expected frequency plus the grace period, it assumes the task failed or stopped running and sends an alert via Slack, Discord, email, or other integrations.
This means you're alerted when a task doesn't complete as expected, rather than trying to parse logs for error messages. It's a proactive approach to monitoring task completion.
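In its simplest form, wiring a task to Heartfly is a single extra line at the end of the job. A minimal sketch, assuming your task is a script that exits non-zero on failure (the script name is a placeholder; the URL follows the placeholder pattern shown by Heartfly):

```bash
# Ping Heartfly only if the task succeeded; '&&' short-circuits on failure.
./run_hourly_sync.sh && curl -fsS --retry 3 "https://cron2.91-99-176-101.nip.io/ping/your-unique-id"
```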
Example 1: Monitoring a Daily Database Backup Script
Let's say you have a critical daily task that backs up your Fly.io PostgreSQL database to an S3 bucket. You run this using a `fly machine run` command that executes a shell script.
The Setup
- Create a Heartfly Monitor: Go to Heartfly and create a new monitor. Name it "Daily DB Backup", set the expected frequency to "Daily", and choose a suitable grace period (e.g., 30 minutes, to account for variable backup times). Heartfly will give you a unique URL, something like `https://cron2.91-99-176-101.nip.io/ping/your-unique-id`.
- Modify Your Backup Script: Ensure your backup script pings Heartfly only if the backup process itself was successful.
Here's a simplified example of a `backup.sh` script:
```bash
#!/bin/bash
# Exit immediately if a command fails, an unset variable is referenced,
# or any command in a pipeline fails (so a failed pg_dump isn't masked
# by a successful gzip).
set -euo pipefail

# --- Configuration (ideally from environment variables) ---
DB_NAME="${DB_NAME:-your_database_name}"
S3_BUCKET="${S3_BUCKET:-your-s3-bucket}"
HEARTFLY_URL="${HEARTFLY_URL:-https://cron2.91-99-176-101.nip.io/ping/your-unique-id}" # Your Heartfly URL
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="/tmp/${DB_NAME}_${TIMESTAMP}.sql.gz"

echo "Starting database backup for ${DB_NAME} at ${TIMESTAMP}..."

# 1. Dump the database (plain-format dump, gzipped to match the .sql.gz name)
# Replace 'pg_dump' with your actual database dump command and credentials
# Ensure your Fly.io machine has access to the database (e.g., via private network)
pg_dump -d "$DB_NAME" | gzip > "$BACKUP_FILE"
echo "Database dump created: $BACKUP_FILE"

# 2. Upload to S3
# Ensure your Fly.io machine has 'awscli' installed and configured with appropriate IAM roles/credentials
aws s3 cp "$BACKUP_FILE" "s3://${S3_BUCKET}/db_backups/${DB_NAME}/${BACKUP_FILE##*/}"
echo "Backup uploaded to S3."

# 3. Clean up local backup file
rm "$BACKUP_FILE"
echo "Local backup file removed."

# 4. Ping Heartfly ONLY if all previous commands were successful
# Because of 'set -euo pipefail', execution never reaches this point
# if any earlier command failed.
echo "Pinging Heartfly..."
curl -fsS --retry 3 --retry-max-time 30 "$HEARTFLY_URL"
echo "Daily DB Backup completed successfully."
```
Running the Task on Fly.io
You would typically run this script daily using a tool like `cron` on a dedicated Fly.io machine, or more simply, orchestrate it externally (e.g., via GitHub Actions, a dedicated scheduler VM, or another Fly.io app) to execute it as a `fly machine run` command.
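If you go the `cron` route, the crontab entry itself is one line. A minimal sketch, assuming the script lives at the illustrative path `/app/backup.sh`:

```bash
# Hypothetical crontab entry: run the backup at 03:00 UTC daily,
# appending stdout and stderr to a log file. Paths are placeholders.
0 3 * * * /app/backup.sh >> /var/log/db-backup.log 2>&1
```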
Let's assume you've containerized this script into an image `my-org/db-backup-script:latest`. You could then schedule it like this:
```bash
# Example of running it manually, or from an external orchestrator
# Ensure DB_NAME, S3_BUCKET, and HEARTFLY_URL are passed as environment variables
fly machine run \
  --env DB_NAME="my_app_db" \
  --env S3_BUCKET="my-fly-