What is cron job monitoring?

Cron job monitoring is the practice of confirming that your scheduled tasks actually run and succeed on time. Instead of trusting that cron did its job, a monitor expects a signal from each run and alerts you when that signal is late or missing, which is how it catches a job that errored or never ran at all.

How do I get alerted when a cron job fails?

Have the job send an HTTP ping when it finishes successfully, and give a monitor your cron schedule so it knows when the next ping is due. If the ping does not arrive within the schedule plus a grace period, the monitor marks the check down and sends an alert by email, Slack, Discord or webhook.

How is this different from cron emailing me on error?

Cron's MAILTO only sends mail when a job produces output. It stays silent when a job exits non-zero without printing anything, and it tells you nothing when the job never runs because the crontab entry was dropped or the server was down. Heartbeat monitoring catches both cases because the expected success ping simply does not arrive.

Will a single slow run page me?

No. You set a grace period that covers a normal slow run, and flap detection ignores a single missed window followed by a recovery. You only get paged when a job is genuinely stuck or gone, not when it runs a few minutes long.

Can I monitor cron jobs and website uptime in one place?

Yes. In Cronaut, cron and heartbeat checks run on the same engine as HTTP, keyword and SSL checks, so a silent backup failure and a downed API show up on the same dashboard and the same status page.

Cron Job Monitoring

The problem with cron

Cron is excellent at starting jobs on a schedule and completely indifferent to whether they work. It runs your script, mails or discards whatever it prints unless you capture it, and never checks that the job did what it was meant to do. When a nightly backup, a billing sync or a cleanup task stops working, you usually find out days later, when something downstream is missing.

There are three distinct ways a scheduled job fails, and they are not equally visible:

It ran and errored. The job exited non-zero or threw an exception. This one is easy to catch.
It ran but did the wrong thing. It exited zero while silently producing nothing, because of a bug or an empty input.
It never ran at all. A bad deploy dropped the crontab line, the schedule was wrong, or the server was down. There is no error and no log, so most setups miss it entirely.

That third case is the dangerous one, and it is exactly the case that cron's own MAILTO cannot see. There is no output to email when nothing runs.

How cron failure detection works

Heartbeat monitoring flips the logic around. Instead of waiting for a job to report a failure, it waits for the job to report success, and treats the absence of that report as the failure. Your job pings a URL when it finishes:

0 2 * * *  /usr/local/bin/backup.sh && curl -fsS -m 10 --retry 5 -o /dev/null https://ping.cronaut.dev/your-check-id

The && is the whole trick. The ping fires only if backup.sh exits zero, so a failed run sends nothing and a missing crontab entry sends nothing. Either way the expected signal does not arrive, and that absence is what gets caught.
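If several jobs need a heartbeat, the same rule fits in a small reusable shell function. This is a minimal sketch. The names `heartbeat_wrap` and `send_ping` are hypothetical and are not part of cron or Cronaut. The curl timeout and retry values are illustrative choices.

```shell
# Hypothetical helper: run a command, send the success ping only if it exits 0,
# and pass the command's own exit status through unchanged.

# send_ping URL: ping with a timeout (-m) and retries so a slow network cannot hang the job;
# -o /dev/null discards the response body so cron has no output to mail.
send_ping() {
  curl -fsS -m 10 --retry 5 -o /dev/null "$1"
}

# heartbeat_wrap URL COMMAND [ARGS...]
heartbeat_wrap() {
  ping_url=$1
  shift
  "$@"                      # run the job itself
  job_status=$?
  if [ "$job_status" -eq 0 ]; then
    send_ping "$ping_url"   # success: the monitor receives the expected heartbeat
  fi
  return "$job_status"      # failure: no ping is sent, and the caller still sees the exit code
}
```

A wrapper script would source this and call `heartbeat_wrap https://ping.cronaut.dev/your-check-id /usr/local/bin/backup.sh`. The function returns the job's exit status rather than curl's. That way anything else that inspects the result still sees a failure, even though no ping was sent.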

For more detail, have a small wrapper script signal the start, the success and the failure of each run separately. That lets you see how long the job took and tell a crash from a job that never started:

URL=https://ping.cronaut.dev/your-check-id
curl -fsS -m 10 --retry 5 -o /dev/null "$URL/start"   # run started
if /usr/local/bin/backup.sh; then
  curl -fsS -m 10 --retry 5 -o /dev/null "$URL"       # success
else
  curl -fsS -m 10 --retry 5 -o /dev/null "$URL/fail"  # failure
fi
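These three signals are what let a monitor tell the failure modes apart. The sketch below illustrates that reasoning under stated assumptions and is not Cronaut's actual logic. `classify_run` and `run_duration` are made-up names. The sketch assumes the monitor knows two things about one scheduled window: whether a `/start` ping arrived, and whether a success ping, a `/fail` ping or nothing followed it.

```shell
# Hypothetical monitor-side sketch.
# classify_run START END
#   START: yes|no                (did a /start ping arrive in this window?)
#   END:   success|fail|none     (what, if anything, followed it?)
classify_run() {
  start=$1
  end=$2
  case "$end" in
    success) echo "ok" ;;
    fail) echo "failed" ;;
    none)
      if [ "$start" = "yes" ]; then
        echo "started but never finished (crashed or hung)"
      else
        echo "never started"
      fi
      ;;
    *)
      echo "unknown end signal: $end" >&2
      return 2
      ;;
  esac
}

# run_duration START_TS END_TS: seconds between the /start ping and the success ping.
# Both arguments are Unix epoch seconds.
run_duration() {
  echo $(( $2 - $1 ))
}
```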

Deadlines and grace periods

A heartbeat is only useful if something knows when the ping was due. Cronaut reads your cron expression, so it knows the schedule and can tell a late run from a dead job. When the ping does not arrive within the schedule plus a grace period you set, the check moves to DOWN.
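The deadline rule can be written out as a small function. This is a sketch under stated assumptions, not Cronaut's implementation:

- `check_state` is a hypothetical name.
- All times are Unix epoch seconds.
- `due` is the most recent time at or before now when the cron expression says a run should have started.
- `GRACE` is an illustrative label for "late but not yet DOWN".

```shell
# Hypothetical sketch of the deadline check.
# check_state LAST_PING DUE GRACE NOW   (all in epoch seconds)
#   DUE   = most recent scheduled run time at or before NOW
#   GRACE = configured grace period in seconds
check_state() {
  last_ping=$1 due=$2 grace=$3 now=$4
  if [ "$last_ping" -ge "$due" ]; then
    echo UP        # a ping has arrived for the current run
  elif [ "$now" -le $((due + grace)) ]; then
    echo GRACE     # late, but still inside the grace period
  else
    echo DOWN      # deadline passed with no ping: alert
  fi
}
```

Take `0 2 * * *` with a 30-minute grace period (1800 seconds). If the last ping came from the previous night, the check reports GRACE from 02:00 until 02:30 and DOWN after that, unless a ping arrives first.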

The grace period is how you avoid being paged for a job that just ran long. Set it to cover a normal slow run, and let anything past that alert you. Flap detection handles the job that succeeds, misses one window, then recovers, so a single blip never wakes you up.
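One simple way to get that behavior is to require more than one consecutive missed window before declaring a check DOWN. The sketch assumes exactly that rule, and Cronaut's flap detection may work differently. `flap_filter` and its threshold parameter are hypothetical.

```shell
# Hypothetical flap-suppression sketch.
# flap_filter THRESHOLD RESULT...
#   RESULT is "ok" for a window whose ping arrived; any other word counts as a miss.
# Prints the check state after each window, space-separated.
flap_filter() {
  threshold=$1
  shift
  misses=0
  state=UP
  out=""
  for result in "$@"; do
    if [ "$result" = "ok" ]; then
      misses=0          # any success resets the streak and recovers the check
      state=UP
    else
      misses=$((misses + 1))
      if [ "$misses" -ge "$threshold" ]; then
        state=DOWN      # enough misses in a row: treat it as a real outage
      fi
    fi
    out="${out:+$out }$state"
  done
  echo "$out"
}
```

With a threshold of 2, `flap_filter 2 ok miss ok` stays UP throughout. By contrast, `flap_filter 2 ok miss miss ok` goes DOWN on the second miss and back UP on the recovery. The trade-off is that a job that has really stopped is reported one window later.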

Alerting that means something

A check changing state is the only thing that triggers an alert, so you are notified when a job genuinely stops, not on every routine run. When a cron check goes down you get a notification by email, Slack, Discord or webhook, and the same state change opens an incident on your public status page automatically. When the next ping arrives on time, the check recovers and the incident closes itself.
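Alerting only on state changes comes down to comparing each new state with the previous one. In this hypothetical sketch, `alerts_for` is an illustrative name and the check is assumed to start UP. The printed lines stand in for the notification and the status-page incident a real system would open or close. A job that stays down for ten evaluations therefore produces one alert, not ten.

```shell
# Hypothetical sketch: emit one line per UP/DOWN transition, nothing for repeats.
# alerts_for STATE...   (the check's state after each evaluation)
alerts_for() {
  prev=UP                         # assumption: the check starts healthy
  for state in "$@"; do
    if [ "$state" != "$prev" ]; then
      case "$state" in
        DOWN) echo "ALERT: check down, open incident" ;;
        UP) echo "RECOVERED: check up, close incident" ;;
      esac
      prev=$state
    fi
  done
}
```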

One engine for cron, uptime and SSL

A cron job failing is the same kind of event as your API going down: something you rely on stopped working. In Cronaut the cron heartbeat runs on the same check engine as active uptime monitoring and SSL certificate monitoring, so a silent backup failure, a downed endpoint and an expiring certificate all show up on one dashboard and one status page. There is a single place to look when something feels off.

Cron job monitoring that catches the job that never ran

