Reliable GitLab CI Scheduled Pipeline Monitoring

GitLab CI is a powerful tool for automating virtually any task. For many teams, it's the backbone of their continuous integration and continuous delivery workflows. But beyond the typical "push code, run tests, deploy" cycle, GitLab CI excels at scheduled tasks: daily data cleanups, weekly report generations, hourly cache warming, or nightly database backups. These scheduled pipelines often operate in the background, out of sight and out of mind – until something goes wrong.

You've probably been there. A critical scheduled job silently fails to run for days, or worse, stops running entirely, and you only discover it much later when a downstream process breaks or a stakeholder asks for missing data. This "silent failure" or "job absence" problem is a significant blind spot for many teams relying on automated processes.

The Challenge of Monitoring Scheduled Pipelines

GitLab CI's built-in monitoring is excellent for what it does: it tells you if a pipeline ran and succeeded or failed. You can see the status in the UI, get notifications for failures, and review logs. However, this primarily addresses the problem of a job failing to complete successfully.

What happens if:

* The pipeline schedule itself is accidentally deleted or disabled? GitLab won't trigger anything, and you'll get no notification because no pipeline started to fail.
* A GitLab CI runner is misconfigured or becomes unavailable? Your scheduled pipeline might be queued indefinitely or never even picked up, again leading to no "failure" notification in the traditional sense.
* Network issues prevent the scheduler from triggering? A rare but possible scenario.
* The project containing the schedule is archived or deleted? All its schedules go with it.

In all these scenarios, the critical issue isn't that a job failed, but that a job didn't run at all. GitLab CI, by design, isn't inherently equipped to tell you when something didn't happen. This is where an external heartbeat monitoring solution becomes indispensable.

Heartbeat Monitoring to the Rescue

Heartbeat monitoring is a simple yet powerful concept: instead of waiting for a system to tell you it's broken, you configure a critical process to "check in" regularly with an external monitoring service. If the check-in (the "heartbeat") doesn't arrive within an expected timeframe, the monitoring service assumes the process is dead or stalled and triggers an alert.

For GitLab CI scheduled pipelines, this means:

1. Upon successful completion of your scheduled pipeline, it sends a signal (a "heartbeat") to a monitoring service like Heartfly.
2. Heartfly expects this signal at a predefined interval (e.g., every 24 hours).
3. If the signal arrives as expected, Heartfly resets its timer and everything is good.
4. If the signal doesn't arrive within the expected interval (plus an optional grace period), Heartfly sends an alert via Slack, Discord, email, or other channels.
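The timer logic on the monitoring side is essentially a dead man's switch: compare the age of the last ping against the expected interval plus the grace period. As a minimal sketch of the idea (heartbeat_check is a hypothetical function; Heartfly's actual internals aren't public):

```shell
#!/bin/sh
# Dead-man's-switch check: alert when the last heartbeat is older than the
# expected interval plus the grace period.
heartbeat_check() {
  last_ping=$1   # unix timestamp of the last received heartbeat
  interval=$2    # expected heartbeat interval, in seconds
  grace=$3       # extra tolerance before alerting, in seconds
  now=$(date +%s)
  age=$((now - last_ping))
  if [ "$age" -gt $((interval + grace)) ]; then
    echo "ALERT: last heartbeat ${age}s ago (limit $((interval + grace))s)"
    return 1
  fi
  echo "OK: last heartbeat ${age}s ago"
}

# A monitor expecting a ping every 24 hours with a 30-minute grace period:
heartbeat_check "$(date +%s)" $((24 * 3600)) $((30 * 60))
```

The service runs this check continuously; your pipeline's only responsibility is to keep resetting the timer.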

This mechanism effectively closes the "job absence" monitoring gap. It tells you not just when a job failed, but crucially, when it didn't run as expected.

Integrating Heartfly with GitLab CI Scheduled Pipelines

Integrating Heartfly is straightforward. You'll add a simple curl command to your .gitlab-ci.yml file.

Example 1: Basic Success Heartbeat

Let's say you have a daily data synchronization job that runs every night. You want to be alerted if it misses a run.

First, in Heartfly, you'd create a new monitor. Let's call it "Daily Data Sync". Heartfly will provide you with a unique URL for this monitor, something like https://cron2.91-99-176-101.nip.io/heartbeat/YOUR_UNIQUE_ID_HERE.

Now, modify your .gitlab-ci.yml:

```yaml
# .gitlab-ci.yml

stages:
  - sync
  - monitor

variables:
  HEARTFLY_URL: https://cron2.91-99-176-101.nip.io/heartbeat/YOUR_UNIQUE_ID_HERE

daily_data_sync_job:
  stage: sync
  script:
    - echo "Starting daily data synchronization..."
    # Your actual data sync commands here
    - python my_sync_script.py --daily
    - echo "Daily data synchronization complete."
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule" # Only run when triggered by a schedule
      when: always

# This job will only run if daily_data_sync_job succeeds
send_heartbeat:
  stage: monitor
  needs: ["daily_data_sync_job"] # Ensure this job only runs after the sync job
  script:
    - echo "Sending Heartfly heartbeat..."
    - curl -fsS -m 10 --retry 5 --retry-delay 2 "${HEARTFLY_URL}"
    - echo "Heartfly heartbeat sent successfully."
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
      when: on_success # Only send heartbeat if the previous job succeeded
```
Explanation:

* We've defined a send_heartbeat job in a separate monitor stage. This is good practice: it decouples the monitoring call from your core logic.
* needs: ["daily_data_sync_job"] ensures the heartbeat job waits for the sync job to finish.
* The rules: ... when: on_success is crucial: the heartbeat is only sent if daily_data_sync_job completes without errors. If the sync job fails, send_heartbeat won't run, Heartfly won't receive the ping, and an alert is triggered.
* The curl command includes:
  * -f: exit with a non-zero code on HTTP errors, so a rejected ping fails the job visibly.
  * -sS: silent mode (no progress meter), but still show errors.
  * -m 10: time out after 10 seconds.
  * --retry 5 --retry-delay 2: retry up to 5 times with a 2-second delay if the network call fails. This adds robustness against transient network issues.
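To see why -f matters, here is a quick local illustration (assuming nothing is listening on 127.0.0.1:9, which is a deliberately unreachable endpoint): with -f, curl turns delivery problems into a non-zero exit code instead of succeeding silently, which is what lets the CI job fail when the ping can't get through.

```shell
# With -f, curl converts delivery problems (connection refused, HTTP >= 400)
# into a non-zero exit code rather than exiting 0 with an error page as output.
# http://127.0.0.1:9 is assumed unreachable for this demonstration.
if curl -fsS -m 5 "http://127.0.0.1:9/heartbeat/test" 2>/dev/null; then
  echo "heartbeat delivered"
else
  echo "heartbeat delivery failed with exit code $?"
fi
```

Without -f, an HTTP 4xx or 5xx response from the monitoring endpoint would leave curl's exit code at zero, and the job would report success even though no heartbeat was registered.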

Example 2: Robust Monitoring with Start and Failure Pings

For more critical or long-running jobs, you might want to know not just if the job succeeded, but also if it started and if it failed explicitly. Heartfly supports this with "start" and "fail" URLs.

Let's imagine a weekly report generation pipeline that can take hours.

In Heartfly, when you create a monitor, you'll get three URLs:

* Success URL: https://cron2.91-99-176-101.nip.io/heartbeat/YOUR_UNIQUE_ID/success
* Start URL: https://cron2.91-99-176-101.nip.io/heartbeat/YOUR_UNIQUE_ID/start
* Fail URL: https://cron2.91-99-176-101.nip.io/heartbeat/YOUR_UNIQUE_ID/fail
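The same three-ping pattern can also be wrapped around any command, which is handy for testing locally before wiring it into CI. A minimal shell sketch, assuming the endpoint layout above (run_with_heartbeats is a hypothetical helper, not part of Heartfly):

```shell
#!/bin/sh
# Wrap a command with start/success/fail pings. Pings are made non-fatal with
# "|| true" so a monitoring outage never masks the wrapped command's own result.
run_with_heartbeats() {
  base=$1; shift                          # $1: monitor base URL; rest: command
  curl -fsS -m 10 "${base}/start" 2>/dev/null || true
  if "$@"; then
    curl -fsS -m 10 "${base}/success" 2>/dev/null || true
    return 0
  else
    rc=$?
    curl -fsS -m 10 "${base}/fail" 2>/dev/null || true
    return "$rc"
  fi
}
```

Usage would look like run_with_heartbeats "${HEARTFLY_BASE_URL}" python generate_report.py; the wrapper preserves the wrapped command's exit code, so the CI job still fails when the report generation fails.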

```yaml

# .gitlab-ci.yml

stages:
  - setup
  - report
  - cleanup

variables:
  HEARTFLY_BASE_URL: https://cron2.91-99-176-101.nip.io/heartbeat/YOUR_UNIQUE_ID

weekly_report_pipeline:
  stage: report
  before_script:
    - echo "Sending Heartfly START heartbeat..."
    - curl -fsS -m 10 --retry 3 --retry-delay 1 "${HEARTFLY_BASE_URL}/start"
    - echo "Heartfly