Monitoring a Go Scheduled Job with a Heartbeat
Go applications are often long-running services, handling everything from APIs to background processing. Within these services, it's common to implement scheduled tasks directly in Go using constructs like time.NewTicker or specialized libraries. This approach offers flexibility and efficiency, but it introduces a critical monitoring challenge: how do you know if your internal scheduled job is actually running and completing successfully?
Traditional cron job monitoring works by wrapping your script and reporting its exit status. But when your Go application manages its own scheduling internally, an external cron entry only tells you if the application itself started, not if its internal tasks are firing as expected. This is where heartbeat monitoring becomes indispensable.
Why Heartbeats for Go Scheduled Jobs?
Imagine a Go service responsible for nightly data synchronization, generating daily reports, or cleaning up old records. If this critical task silently stops running due to an unhandled error, resource exhaustion, or a logic bug, you might not notice until days or weeks later, when the consequences have become severe.
Here's why heartbeats are particularly well-suited for Go scheduled jobs:
- Internal Scheduling: Go applications often contain their own scheduling logic (e.g., time.NewTicker, goroutines, or libraries like go-co-op/gocron). An external systemd unit or docker-compose entry might confirm your Go app is up, but not if its internal scheduler is healthy.
- Silent Failures: A panic might crash your entire application (which your service orchestrator might detect), but a subtle bug could cause a specific goroutine or scheduled function to hang or exit prematurely without bringing down the whole service. These are the silent failures that heartbeats catch.
- Proactive Alerts: Instead of waiting for users or downstream systems to report problems, a heartbeat system can alert you the moment a scheduled task misses its expected check-in.
- Simple Integration: Adding a simple HTTP call at the end of a scheduled task is straightforward in Go.
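To make the panic case concrete, here is a minimal sketch of a wrapper that turns a panic inside a job into an ordinary error, so the scheduler keeps running and the heartbeat for that run is simply skipped. The helper name runGuarded and the three sample jobs are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// runGuarded is a hypothetical helper: it runs job and converts a panic
// into an ordinary error so the calling goroutine keeps running.
func runGuarded(job func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("job panicked: %v", r)
		}
	}()
	return job()
}

func main() {
	// Illustrative jobs: one succeeds, one returns an error, one panics.
	jobs := []func() error{
		func() error { return nil },
		func() error { return errors.New("upstream unavailable") },
		func() error { panic("unexpected nil value") },
	}
	for i, job := range jobs {
		if err := runGuarded(job); err != nil {
			// No heartbeat on failure: the monitor notices the missed check-in.
			log.Printf("job %d failed, skipping heartbeat: %v", i, err)
			continue
		}
		log.Printf("job %d succeeded, heartbeat would be sent here", i)
	}
}
```

Recovering like this keeps one bad run from taking down the whole service, while the missing heartbeat still surfaces the failure. Whether recovering is appropriate depends on the job; some teams prefer to let the process crash and restart.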
How Heartbeat Monitoring Works
The concept is simple:
1. You configure a monitoring service with an expected schedule for your job (e.g., "this job runs every 5 minutes").
2. The monitoring service provides you with a unique "heartbeat URL."
3. At the successful completion of your scheduled job, your Go application makes an HTTP GET or POST request to this heartbeat URL.
4. If the monitoring service doesn't receive a heartbeat within the expected time window (e.g., 5 minutes + a grace period), it considers the job to have failed or stopped running and sends an alert via Slack, Discord, email, PagerDuty, etc.
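This "dead man's switch" design, in which the monitoring service alerts on the absence of an expected signal, is more robust than trying to report failures from every possible failure point in a complex Go application. A job that hangs, crashes, or never starts cannot report its own failure, but it also cannot send a heartbeat.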
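The time-window rule a monitoring service applies can be sketched in a few lines. This is only an illustration of the idea, not how any particular service implements it; the function name overdue and the one-minute grace period are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// overdue reports whether a job should be treated as missed: the time since
// the last heartbeat exceeds the expected interval plus a grace period.
// (Hypothetical helper; real services may use different rules.)
func overdue(sinceLast, interval, grace time.Duration) bool {
	return sinceLast > interval+grace
}

func main() {
	interval := 5 * time.Minute // expected schedule: every 5 minutes
	grace := 1 * time.Minute    // assumed grace period

	for _, since := range []time.Duration{4 * time.Minute, 6 * time.Minute, 7 * time.Minute} {
		fmt.Printf("last heartbeat %v ago: overdue=%t\n", since, overdue(since, interval, grace))
	}
}
```

The grace period absorbs normal variation in job duration and network latency, so a run that finishes a little late does not trigger an alert.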
Implementing a Heartbeat in Go
Integrating a heartbeat into your Go scheduled job is typically a matter of adding a single HTTP request. Let's look at a couple of common scenarios.
Example 1: Simple time.NewTicker Loop
Many Go applications implement simple recurring tasks using time.NewTicker.
```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"time"
)

// simulateWork represents the actual task your Go application needs to perform.
func simulateWork(taskID int) error {
	log.Printf("Task %d: Starting work...", taskID)
	// Simulate some work that might take variable time or fail
	time.Sleep(time.Duration(taskID%3+1) * time.Second) // Work takes 1-3 seconds
	if taskID%5 == 0 {
		return fmt.Errorf("Task %d: failed due to a simulated error", taskID)
	}
	log.Printf("Task %d: Work finished successfully.", taskID)
	return nil
}

// sendHeartbeat makes an HTTP GET request to the provided URL.
func sendHeartbeat(url string, jobName string) {
	if url == "" {
		log.Printf("Heartbeat URL for %s is not set. Skipping heartbeat.", jobName)
		return
	}
	resp, err := http.Get(url)
	if err != nil {
		log.Printf("ERROR: Failed to send heartbeat for %s: %v", jobName, err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Printf("ERROR: Heartbeat for %s returned non-200 status: %d", jobName, resp.StatusCode)
	} else {
		log.Printf("Heartbeat sent successfully for %s.", jobName)
	}
}

func main() {
	heartbeatURL := os.Getenv("HEARTBEAT_URL_DAILY_REPORT")
	if heartbeatURL == "" {
		log.Println("WARNING: HEARTBEAT_URL_DAILY_REPORT environment variable not set. Heartbeats will be skipped.")
	}

	ticker := time.NewTicker(5 * time.Second) // Run every 5 seconds for demonstration
	defer ticker.Stop()

	taskID := 0
	for range ticker.C {
		taskID++
		log.Printf("----------------------------------")
		log.Printf("Scheduler: Kicking off Daily Report task %d", taskID)
		err := simulateWork(taskID)
		if err != nil {
			log.Printf("Scheduler: Daily Report task %d FAILED: %v", taskID, err)
			// Optionally, send a failure heartbeat if your monitoring service supports it
			// or just rely on the absence of a success heartbeat.
		} else {
			log.Printf("Scheduler: Daily Report task %d completed successfully.", taskID)
			sendHeartbeat(heartbeatURL, "Daily Report")
		}
	}
}
```
In this example, the sendHeartbeat function is called after simulateWork completes successfully. If simulateWork panics or hangs, the heartbeat won't be sent, and your monitoring service will detect the missed check-in.
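One caveat with a plain http.Get is that Go's default HTTP client has no timeout, so a stalled connection to the monitoring service could block the scheduling loop itself. The sketch below shows one way to bound the request with a context deadline. The function name pingHeartbeat and the 10-second timeout are assumptions for illustration, and it treats any 2xx status as success, which may or may not match your monitoring service.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

// pingHeartbeat (hypothetical helper) sends one GET request to url and
// gives up once timeout has elapsed. It returns an error instead of logging
// so the caller can decide how to react.
func pingHeartbeat(url string, timeout time.Duration) error {
	if url == "" {
		return errors.New("heartbeat URL is empty")
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return fmt.Errorf("build heartbeat request: %w", err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return fmt.Errorf("send heartbeat: %w", err)
	}
	defer resp.Body.Close()
	// Drain the body so the underlying connection can be reused.
	_, _ = io.Copy(io.Discard, resp.Body)

	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("heartbeat returned status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	url := os.Getenv("HEARTBEAT_URL_DAILY_REPORT")
	// Assumed timeout; tune it to your job interval and network conditions.
	if err := pingHeartbeat(url, 10*time.Second); err != nil {
		log.Printf("heartbeat not delivered: %v", err)
	}
}
```

A failed heartbeat is only logged here. Because the monitor alerts on a missing check-in anyway, a delivery failure still surfaces without extra handling.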
Example 2: Using a Third-Party Scheduler like go-co-op/gocron
For more complex scheduling needs, libraries like go-co-op/gocron provide a robust framework. Integrating heartbeats here is just as straightforward.
First, install the library:
go get github.com/go-co-op/gocron/v2
Then, integrate the heartbeat:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"os"
	"time"

	"github.com/go-co-op/gocron/v2"
)

// sendHeartbeat is the same as in Example 1
func sendHeartbeat(url string, jobName string) {
	if url == "" {
		log.Printf("Heartbeat URL for %s is not set. Skipping heartbeat.", jobName)
		return
	}
	resp, err := http.Get(url)
	if err != nil {
		log.Printf("ERROR: Failed to send heartbeat for %s: %v", jobName, err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Printf("ERROR: Heartbeat for %s returned non-200 status: %d", jobName, resp.StatusCode)
	} else {
		log.Printf("Heartbeat sent successfully for %s.", jobName)
	}
}

// myScheduledJob is the function that gocron will execute.
func myScheduledJob(jobName, heartbeatURL string) {
	log.Printf("Job '%s': Starting work...", jobName)
	// Simulate some work
	time.Sleep(2 * time.Second)
	// Simulate a potential failure: fails when the current second is a multiple of 10
	if time.Now().Second()%10 == 0 {
		log.Printf("Job '%s': Simulated failure!", jobName)
		// No heartbeat on failure
		return
	}
	log.Printf("Job '%s': Work finished successfully.", jobName)
	sendHeartbeat(heartbeatURL, jobName)
}

func main() {
	heartbeatURL := os.Getenv("HEARTBEAT_URL_DATA_SYNC")
	if heartbeatURL == "" {
		log.Println("WARNING: HEARTBEAT_URL_DATA_SYNC environment variable not set. Heartbeats will be skipped.")
	}

	s, err := gocron.NewScheduler()
	if err != nil {
		log.Fatalf("Failed to create scheduler: %v", err)
	}

	// Schedule the job to run every 10 seconds.
	// gocron v2 defines a fixed-interval job with DurationJob.
	_, err = s.NewJob(
		gocron.DurationJob(10*time.Second),
		gocron.NewTask(
			func() {
				myScheduledJob("Data Sync", heartbeatURL)
			},
		),
	)
if