Cron MTTR (Mean Time To Recover)

How long failed scheduled jobs take to recover.

Last updated 05/27/26

What this report shows

MTTR (Mean Time To Recover) for each scheduled job (cron). A failure group is one or more consecutive failed runs ended by a success. The time to recover for a group is the gap from its first failure to that recovering success, and a cron's MTTR is the average of those gaps across its failure groups.

This tells you: when a cron breaks, how long does it stay broken before the next green run fixes it? Lower is better.
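The failure-group definition above can be sketched in a few lines of Python. This is a minimal illustration, not the report's actual implementation: the `Run` and `FailureGroup` types and their field names are hypothetical, and the real `cron_run_logs` schema may differ.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Run(NamedTuple):
    at: datetime  # when the run was logged (hypothetical field name)
    ok: bool      # True for a success, False for a failed status


class FailureGroup(NamedTuple):
    first_failure: datetime
    recovered_at: Optional[datetime]  # None means no later success yet
    failed_runs: int

    @property
    def minutes_to_recover(self) -> Optional[float]:
        """Gap from the first failure to the recovering success, in minutes."""
        if self.recovered_at is None:
            return None
        return (self.recovered_at - self.first_failure).total_seconds() / 60


def failure_groups(runs: list[Run]) -> list[FailureGroup]:
    """Split one cron's runs into groups of consecutive failures."""
    groups = []
    first_failure = None
    failed = 0
    for run in sorted(runs, key=lambda r: r.at):
        if not run.ok:
            if first_failure is None:
                first_failure = run.at  # a new failure group starts here
            failed += 1
        elif first_failure is not None:
            # The first success after a failure streak closes the group.
            groups.append(FailureGroup(first_failure, run.at, failed))
            first_failure, failed = None, 0
    if first_failure is not None:
        # The streak never recovered inside the window: an open incident.
        groups.append(FailureGroup(first_failure, None, failed))
    return groups


# Hourly cron: ok, fail, fail, ok, fail
start = datetime(2026, 5, 1, 0, 0)
runs = [
    Run(start, True),
    Run(start + timedelta(hours=1), False),
    Run(start + timedelta(hours=2), False),
    Run(start + timedelta(hours=3), True),
    Run(start + timedelta(hours=4), False),
]
groups = failure_groups(runs)
```

In this example the first group contains two failed runs and recovers 120 minutes after its first failure. The second group has no later success, so its time to recover is `None`.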

How to read it

MTTR (minutes) per cron path. The lower the value, the faster a failure gets resolved, whether the system self-heals or someone steps in.
Failure groups — how many separate failure-and-recover incidents in the window. More groups = more flakiness.
Open incidents — failure groups with no later success yet. These are excluded from the aggregate so a single ongoing outage doesn't dominate the average.

A cron with high MTTR is dangerous: it fails silently for hours or days before recovering, leaving downstream consumers with stale data.
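As a rough sketch of how the three columns relate, the function below summarizes one cron from a list of per-group recovery times. It rests on assumptions: the aggregate is taken to be a plain mean, the failure-group count covers recovered groups only, and a cron with no recovered groups reports 0. The function name and dictionary keys are hypothetical.

```python
from statistics import mean
from typing import Optional


def summarize(group_minutes: list[Optional[float]]) -> dict:
    """Summarize one cron's failure groups.

    Each entry is the minutes-to-recover for one failure group, or None
    when the group is still open (no later success yet).
    """
    closed = [m for m in group_minutes if m is not None]
    return {
        # Assumption: only failure-and-recover incidents are counted here.
        "failure_groups": len(closed),
        "open_incidents": len(group_minutes) - len(closed),
        # Open incidents are left out of the average. Reporting 0 when
        # nothing has recovered is an assumption of this sketch.
        "mttr_minutes": mean(closed) if closed else 0.0,
    }


summary = summarize([120.0, 30.0, None])
# summary == {"failure_groups": 2, "open_incidents": 1, "mttr_minutes": 75.0}
```

The open incident contributes to the open-incident count but not to the 75-minute average, so an outage that is still running cannot inflate the figure.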

Common questions

What counts as a "failure"? Any run whose status was logged as failed in `cron_run_logs`.
Why does a cron show MTTR = 0? Either it had no failures in the window, or every failure was followed by a success on the very next run. In the second case the MTTR equals the run interval, which may round down to 0 for crons that run very frequently.
Why is a cron missing entirely? It has no run logs in the window: it is either disabled or has not fired yet.

What to do if numbers look wrong

Open the cron detail page, or `/admin/operations/crons`, and confirm the recent run history.
For high-MTTR crons, look at the failure reason in the run-log detail.
If a cron with known recent failures isn't showing up in the report, check whether its name in the logs matches the report's expected path.
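One way to spot a name mismatch is to compare the paths the report expects against the names that appear in the logs and list near misses. The sketch below uses Python's standard `difflib.get_close_matches`. The function name, the cron paths, and the 0.8 similarity cutoff are all hypothetical choices for illustration.

```python
from difflib import get_close_matches


def find_name_mismatches(expected_paths, logged_names):
    """Map each expected path missing from the logs to similar logged names."""
    expected = set(expected_paths)
    logged = set(logged_names)
    # Only logged names that match no expected path can be misspellings.
    unmatched_logged = sorted(logged - expected)
    return {
        path: get_close_matches(path, unmatched_logged, n=3, cutoff=0.8)
        for path in sorted(expected - logged)
    }


# Hypothetical paths: the log uses an underscore where the report expects a hyphen.
expected = ["/api/cron/sync-orders", "/api/cron/send-digest"]
logged = ["/api/cron/sync_orders", "/api/cron/send-digest"]
mismatches = find_name_mismatches(expected, logged)
# mismatches == {"/api/cron/sync-orders": ["/api/cron/sync_orders"]}
```

An expected path that maps to an empty list has no similar name in the logs, which points to a disabled cron or one that has not fired rather than a naming problem.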