Cron MTTR (Mean Time To Recover)
How long failed scheduled jobs take to recover.
Last updated 05/27/26
What this report shows
MTTR (Mean Time To Recover) for each scheduled job (cron). A failure group is one or more consecutive failed runs ended by a success. The MTTR of a group is the time from its first failed run to that recovering success, and a cron's MTTR is the average over its recovered groups in the window.
This tells you: when a cron breaks, how long does it stay broken before the next green run fixes it? Lower is better.
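The grouping rule can be sketched in a few lines of Python. This is a minimal illustration of the definition above, not the report's actual query. The `FailureGroup` class, the `failure_groups` function, the status strings `"failed"` and `"success"`, and the sample timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass
class FailureGroup:
    first_failure: datetime
    recovered_at: Optional[datetime]  # None: no later success yet (open incident)

    @property
    def mttr(self) -> Optional[timedelta]:
        """Gap from the first failure to the recovering success, if any."""
        if self.recovered_at is None:
            return None
        return self.recovered_at - self.first_failure


def failure_groups(runs: Iterable[Tuple[datetime, str]]) -> List[FailureGroup]:
    """Split one cron's runs into failure groups.

    `runs` holds (timestamp, status) pairs; the status values "failed" and
    "success" are assumed here, and any other status is ignored.
    """
    groups: List[FailureGroup] = []
    first_failure: Optional[datetime] = None
    for ts, status in sorted(runs, key=lambda run: run[0]):
        if status == "failed":
            # Only the first failure in a consecutive streak starts a group.
            if first_failure is None:
                first_failure = ts
        elif status == "success" and first_failure is not None:
            # The first success after a streak closes the group.
            groups.append(FailureGroup(first_failure, ts))
            first_failure = None
    if first_failure is not None:
        # Trailing failures with no later success: an open incident.
        groups.append(FailureGroup(first_failure, None))
    return groups


day = datetime(2026, 5, 1)
runs = [
    (day.replace(hour=10), "success"),
    (day.replace(hour=11), "failed"),
    (day.replace(hour=12), "failed"),
    (day.replace(hour=13), "success"),
    (day.replace(hour=14), "failed"),
]
groups = failure_groups(runs)
print(groups[0].mttr)  # 2:00:00
print(groups[1].mttr)  # None
```

In this sample the two failures at 11:00 and 12:00 form one group that recovers at 13:00, so its MTTR is two hours. The failure at 14:00 has no later success and stays open.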
How to read it
- MTTR (minutes) per cron path. A lower value means the cron recovers sooner, either because it self-heals or because someone investigates quickly.
- Failure groups — how many separate failure-and-recover incidents in the window. More groups = more flakiness.
- Open incidents — failure groups with no later success yet. These are excluded from the aggregate so a single ongoing outage doesn't dominate the average.
A cron with high MTTR is dangerous: it fails silently for hours or days before recovering, leaving downstream consumers with stale data.
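To make the three columns concrete, here is a hedged sketch of how they could be derived from a list of failure groups. It assumes the aggregate is a plain mean, that the failure-group count covers recovered groups only, and that open incidents are counted separately. The `summarize` function, the tuple layout, and the path `/jobs/sync-orders` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def summarize(groups):
    """Build per-cron report rows.

    `groups` holds (cron_path, first_failure, recovered_at) tuples, where
    recovered_at is None for an open incident (assumed layout).
    """
    closed_minutes = defaultdict(list)
    open_counts = defaultdict(int)
    paths = set()
    for path, first_failure, recovered_at in groups:
        paths.add(path)
        if recovered_at is None:
            # Open incidents are counted but kept out of the average.
            open_counts[path] += 1
        else:
            gap = recovered_at - first_failure
            closed_minutes[path].append(gap.total_seconds() / 60)

    report = {}
    for path in sorted(paths):
        minutes = closed_minutes[path]
        report[path] = {
            "mttr_minutes": sum(minutes) / len(minutes) if minutes else None,
            "failure_groups": len(minutes),
            "open_incidents": open_counts[path],
        }
    return report


sample = [
    ("/jobs/sync-orders", datetime(2026, 5, 1, 11, 0), datetime(2026, 5, 1, 13, 0)),
    ("/jobs/sync-orders", datetime(2026, 5, 2, 9, 0), datetime(2026, 5, 2, 9, 30)),
    ("/jobs/sync-orders", datetime(2026, 5, 3, 9, 0), None),
]
report = summarize(sample)
print(report["/jobs/sync-orders"])
```

The two recovered groups last 120 and 30 minutes, so the mean is 75 minutes. The third group is still open, so it shows up under open incidents and does not affect the mean.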
Common questions
- What counts as a "failure"? Any run whose status was logged as failed in `cron_run_logs`.
- Why does a cron show MTTR = 0? Either it had no failures in the window, or every failure was followed by a success on the very next run. In the second case the MTTR equals the run interval, so it is close to zero only for crons that run very frequently.
- Why is a cron missing entirely? It has no run logs in the window — either it's disabled or hasn't fired yet.
What to do if numbers look wrong
- Open the cron detail view or `/admin/operations/crons` and confirm that the recent run history matches what the report shows.
- For high-MTTR crons, look at the failure reason in the run-log detail.
- If a cron with known recent failures is missing from the report, check whether its name in the logs matches the path the report expects.