The problem with cron
Cron is excellent at starting jobs on a schedule and completely indifferent to whether they work. It runs your script, throws away the output unless you capture it, and never checks that the job did what it was meant to do. When a nightly backup, a billing sync or a cleanup task stops working, you usually find out days later, when something downstream is missing.
There are three distinct ways a scheduled job fails, and they are not equally visible:
- It ran and errored. The job exited non-zero or threw an exception. This one is easy to catch.
- It ran but did the wrong thing. It exited zero while silently producing nothing, because of a bug or an empty input.
- It never ran at all. A bad deploy dropped the crontab line, the schedule was wrong, or the server was down. There is no error and no log, so most setups miss it entirely.
That third case is the dangerous one, and it is exactly the case that cron's own MAILTO cannot see. There is no output to email when nothing runs.
How cron failure detection works
Heartbeat monitoring flips the logic around. Instead of waiting for a job to report a failure, it waits for the job to report success, and treats the absence of that report as the failure. Your job pings a URL when it finishes:
0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://ping.cronaut.dev/your-check-id
The && is the whole trick. The ping fires only if backup.sh exits zero, so a failed run sends nothing and a missing crontab entry sends nothing. Either way the expected signal does not arrive, and that absence is what gets caught.
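The && guard covers a non-zero exit, but not the second failure mode from the list above: a run that exits zero and produces nothing. One way to cover it is to verify the result before pinging. The following is a minimal sketch, assuming the job writes a single output file. The function names, the PING_URL variable and the example path are illustrative and not part of Cronaut.

```shell
#!/bin/sh
# Hypothetical wrapper: ping only when the job exits zero AND its output
# file exists and is non-empty. All names and paths here are assumptions.

PING_URL="${PING_URL:-https://ping.cronaut.dev/your-check-id}"

# Succeeds (status 0) only if the file exists and its size is greater than zero.
output_looks_valid() {
    [ -s "$1" ]
}

# Usage: run_and_ping OUTPUT_FILE COMMAND [ARGS...]
# Runs the command, then pings only if the command and the file check both pass.
run_and_ping() {
    output_file="$1"
    shift
    "$@" && output_looks_valid "$output_file" && curl -fsS "$PING_URL"
}
```

Called as run_and_ping /var/backups/latest.tar.gz /usr/local/bin/backup.sh (the path is an example), an empty or missing archive sends no ping, so the check goes late just as it would for a crashed run.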
For more detail, signal the three states separately so you can see duration and tell a crash from a job that never started:
curl -fsS https://ping.cronaut.dev/your-check-id/start
if backup.sh; then
  curl -fsS https://ping.cronaut.dev/your-check-id        # success
else
  curl -fsS https://ping.cronaut.dev/your-check-id/fail   # failure
fi
Deadlines and grace periods
A heartbeat is only useful if something knows when the ping was due. Cronaut reads your cron expression, so it knows the schedule and can tell a late run from a dead job. When the ping has not arrived by the scheduled time plus the grace period you set, the check moves to DOWN.
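The deadline rule can be written down as a small comparison. This is a sketch of the logic as described here, not Cronaut's actual implementation. It assumes the scheduled time and the last ping are already available as Unix timestamps (deriving the scheduled time from the cron expression is left out), and the function name check_state and its arguments are hypothetical.

```shell
#!/bin/sh
# Illustrative only: decide whether a check is UP or DOWN.
# Arguments (all assumptions, in seconds since the epoch or plain seconds):
#   $1 expected   time the current run was scheduled for
#   $2 grace      grace period in seconds
#   $3 last_ping  time of the most recent success ping
#   $4 now        current time
check_state() {
    expected="$1"; grace="$2"; last_ping="$3"; now="$4"
    deadline=$((expected + grace))
    if [ "$last_ping" -ge "$expected" ]; then
        echo "UP"      # a ping arrived for this scheduled run
    elif [ "$now" -le "$deadline" ]; then
        echo "UP"      # no ping yet, but still inside the grace period
    else
        echo "DOWN"    # deadline passed with no ping
    fi
}

check_state 7200 600 3600 7500   # late but inside grace: prints UP
check_state 7200 600 3600 8000   # past the deadline:     prints DOWN
check_state 7200 600 7300 8000   # ping arrived:          prints UP
```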
The grace period is how you avoid being paged for a job that just ran long. Set it to cover a normal slow run, and let anything past that alert you. Flap detection handles the job that succeeds, misses one window, then recovers, so a single blip never wakes you up.
Alerting that means something
A check changing state is the only thing that triggers an alert, so you are notified when a job genuinely stops, not on every routine run. When a cron check goes down you get a notification by email, Slack, Discord or webhook, and the same state change opens an incident on your public status page automatically. When the next ping arrives on time, the check recovers and the incident closes itself.
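Alerting on transitions rather than on every evaluation comes down to comparing the new state with the previous one. A minimal sketch follows. The function name notify_on_change and the message wording are assumptions for illustration, not Cronaut's notification format.

```shell
#!/bin/sh
# Illustrative only: print a notification line when the state changes,
# and stay silent when it does not.
#   $1 previous state (UP or DOWN)
#   $2 current state  (UP or DOWN)
notify_on_change() {
    previous="$1"; current="$2"
    if [ "$previous" = "$current" ]; then
        return 0    # routine result, nothing to send
    fi
    case "$current" in
        DOWN) echo "ALERT: check went DOWN, incident opened" ;;
        UP)   echo "RESOLVED: check recovered, incident closed" ;;
    esac
}
```

With this shape, a job that pings on time every night produces no messages at all. Only the UP to DOWN and DOWN to UP transitions print a line.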
One engine for cron, uptime and SSL
A cron job failing is the same kind of event as your API going down: something you rely on stopped working. In Cronaut the cron heartbeat runs on the same check engine as active uptime monitoring and SSL certificate monitoring, so a silent backup failure, a downed endpoint and an expiring certificate all show up on one dashboard and one status page. There is a single place to look when something feels off.