POLITIFAST

Cron MTTR (Mean Time To Recover)

How long failed scheduled jobs take to recover.

Last updated 05/27/26

What this report shows

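MTTR (Mean Time To Recover) for each scheduled job (cron). A failure group is one or more consecutive failed runs ended by a success. A group's recovery time is the gap from its first failure to that recovering success, and MTTR is the mean of those recovery times across the failure groups in the window.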

This tells you: when a cron breaks, how long does it stay broken before the next green run fixes it? Lower is better.
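The definition above can be written as a formula. Here $N$ is the number of closed failure groups in the window, $f_i$ is the time of the first failed run in group $i$, and $s_i$ is the time of the success that ends it. These symbols are introduced only for illustration.

```latex
\mathrm{MTTR} = \frac{1}{N} \sum_{i=1}^{N} \left( s_i - f_i \right)
```

For example, with two hypothetical groups, one failing at 02:00 and recovering at 02:30 (30 minutes) and one failing at 14:00 and recovering at 15:30 (90 minutes), MTTR = (30 + 90) / 2 = 60 minutes.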

How to read it

  • MTTR (minutes): the mean recovery time per cron path. Lower means the job recovers faster, whether it self-heals or someone steps in to fix it.
  • Failure groups: how many separate failure-and-recovery incidents occurred in the window. More groups means a flakier job.
  • Open incidents: failure groups with no later success yet. These are excluded from the aggregate so a single ongoing outage doesn't dominate the average.
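A minimal Python sketch of how these three columns could be derived from raw run records. It assumes each run carries a cron path, a start time (used as the run's timestamp) and a failed/succeeded flag. The `Run` and `CronMttr` types and the `summarize` function are hypothetical, not the report's actual code. The sketch counts only closed groups in `failure_groups` and reports 0 when there is nothing to average, mirroring how the report shows MTTR = 0 for a cron with no failures.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from statistics import mean


@dataclass
class Run:
    cron_path: str
    started_at: datetime
    failed: bool


@dataclass
class CronMttr:
    cron_path: str
    mttr_minutes: float   # mean recovery time of closed groups (0.0 if none)
    failure_groups: int   # closed failure-and-recovery incidents
    open_incidents: int   # failure group still waiting for a success


def summarize(runs: list[Run]) -> list[CronMttr]:
    rows = []
    # groupby needs input sorted by its key; also order runs by time within each path.
    ordered = sorted(runs, key=lambda r: (r.cron_path, r.started_at))
    for path, path_runs in groupby(ordered, key=lambda r: r.cron_path):
        recovery_minutes = []
        first_failure = None  # start time of the current failure group, if any
        for run in path_runs:
            if run.failed:
                if first_failure is None:
                    first_failure = run.started_at  # a new failure group begins
            elif first_failure is not None:
                # This success closes the group: record the first-failure -> recovery gap.
                gap = run.started_at - first_failure
                recovery_minutes.append(gap.total_seconds() / 60)
                first_failure = None
        rows.append(
            CronMttr(
                cron_path=path,
                mttr_minutes=mean(recovery_minutes) if recovery_minutes else 0.0,
                failure_groups=len(recovery_minutes),
                # Failures with no later success form an open group, excluded from MTTR.
                open_incidents=1 if first_failure is not None else 0,
            )
        )
    return rows
```

Because a group can only stay open if it is the trailing streak of failures, each cron has at most one open incident in this sketch.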

A cron with high MTTR is a warning sign: it fails silently for hours or days before recovering, leaving downstream consumers working with stale data.

Common questions

  • What counts as a "failure"? Any run whose status was logged as failed in `cron_run_logs`.
  • Why does a cron show MTTR = 0? It had no failures in the window. A failure that is immediately followed by a success on the very next run does not produce zero: that group's recovery time is the gap between the two runs, roughly one run interval.
  • Why is a cron missing entirely? It has no run logs in the window: either it's disabled or it hasn't fired yet.
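To see which runs count as failures for each cron, a query along these lines could pull them from `cron_run_logs`. The column names (`cron_path`, `started_at`, `status`), the `'failed'` status value and the ISO-8601 text timestamps are assumptions for illustration, as is the helper name `failed_run_counts`. sqlite3 is used only so the sketch runs on its own.

```python
import sqlite3


def failed_run_counts(conn: sqlite3.Connection, window_start: str, window_end: str) -> dict[str, int]:
    """Count runs logged as failed per cron path in [window_start, window_end)."""
    # Assumed schema: cron_run_logs(cron_path TEXT, started_at TEXT, status TEXT).
    # ISO-8601 strings compare correctly as text, so a plain range filter works.
    query = """
        SELECT cron_path, COUNT(*)
        FROM cron_run_logs
        WHERE status = 'failed'
          AND started_at >= ? AND started_at < ?
        GROUP BY cron_path
    """
    return dict(conn.execute(query, (window_start, window_end)).fetchall())
```

A cron that logged no runs at all in the window simply does not appear in the result, which matches why it would be missing from the report.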

What to do if numbers look wrong

  1. Open the cron's detail view or `/admin/operations/crons` and confirm the recent run history.
  2. For high-MTTR crons, look at the failure reason in the run-log detail.
  3. If a cron with known recent failures isn't showing, check whether its name in the logs matches the report's expected path.
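For step 3, a quick set comparison can surface naming mismatches. This sketch assumes the report matches cron paths to logged names by exact string equality; `reconcile` and its inputs are hypothetical.

```python
from typing import Iterable


def reconcile(expected_paths: Iterable[str], logged_names: Iterable[str]) -> dict[str, list[str]]:
    expected = set(expected_paths)
    logged = set(logged_names)
    return {
        # Expected crons with no log rows: disabled, not fired yet, or logged under another name.
        "no_logs": sorted(expected - logged),
        # Logged names that do not exactly match any expected cron path.
        "unmatched_logs": sorted(logged - expected),
    }
```

Names in `unmatched_logs` that look like a near miss of an entry in `no_logs` (for instance a missing leading slash or different casing) are likely candidates for the mismatch.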