Scheduling & confirmations

Every monitor — whatever its type — shares a handful of settings that control how often it’s checked, how patient Aloft is before declaring trouble, and how slow counts as “not quite down.” Get these right and your alerts stay both timely and trustworthy.

Check interval

The check interval is how often Aloft runs the monitor. It ranges from 60 seconds to 24 hours, and defaults to 5 minutes.

Shorter intervals catch problems faster but generate more checks; longer intervals are fine for things that change slowly (a domain registration doesn’t lapse between two 5-minute checks). Pick the cadence that matches how quickly you’d actually want to know.
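To put numbers on that trade-off, the checks a monitor runs per day follow directly from its interval. The sketch below is plain arithmetic over the documented 60-second to 24-hour range; the function name `checks_per_day` is illustrative and not part of Aloft.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def checks_per_day(interval_seconds: int) -> float:
    """Number of checks one monitor runs in 24 hours at a fixed interval."""
    # Documented range for the check interval: 60 seconds to 24 hours.
    if not 60 <= interval_seconds <= SECONDS_PER_DAY:
        raise ValueError("interval must be between 60 seconds and 24 hours")
    return SECONDS_PER_DAY / interval_seconds


for label, seconds in [("1 minute", 60), ("5 minutes (default)", 300), ("24 hours", 86_400)]:
    print(f"{label}: {checks_per_day(seconds):g} checks per day")
```

Going from the 5-minute default to a 1-minute interval multiplies the check volume by five, from 288 to 1,440 checks a day.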

Timeout

The timeout is how long a single check waits for a response before giving up and recording a failure. It defaults to 30 seconds (range 1 second to 2 minutes).

Set it comfortably above your service’s normal worst-case response time. A timeout that’s too tight will flag a merely slow service as down; one that’s too loose means a truly hung service takes longer to detect.

Confirmation re-checks

By default, a single failed check is not enough to wake you up. The confirmation setting is the number of consecutive failures Aloft must see before it opens an incident and alerts you. It defaults to 2, and can range from 1 to 10.

This is your false-alarm filter. A lone failed check is often noise — a dropped packet, a momentary DNS hiccup, a load balancer recycling a backend, a deploy blipping for a second. Requiring two (or more) failures in a row means a real, sustained problem trips the alert while transient blips are quietly ignored.

The trade-off is detection speed: with the default of 2 and a 1-minute interval, a genuine outage is confirmed about a minute after the first failure. Raise the confirmation count if a particular monitor is flappy; lower it to 1 only for endpoints where you want to know about even a single miss instantly.
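The same arithmetic works for any combination of interval and confirmation count. The sketch below assumes that confirmation re-checks run on the monitor's normal interval (which is what the 1-minute example above implies) and ignores the time each check itself takes; `confirmation_delay_seconds` is an illustrative name, not an Aloft function.

```python
def confirmation_delay_seconds(interval_seconds: int, confirmations: int = 2) -> int:
    """Approximate time between the first failed check and the confirming one.

    Assumption: every re-check happens one full interval after the previous
    check, so N consecutive failures span (N - 1) intervals.
    """
    # Documented range for the confirmation count: 1 to 10.
    if not 1 <= confirmations <= 10:
        raise ValueError("confirmations must be between 1 and 10")
    return (confirmations - 1) * interval_seconds


for interval, count in [(60, 2), (300, 2), (300, 3), (60, 1)]:
    delay = confirmation_delay_seconds(interval, count)
    print(f"interval={interval}s confirmations={count} -> confirmed {delay}s after first failure")
```

With a confirmation count of 1 the delay is zero: the first failure is also the confirming one. At the default 5-minute interval, each extra confirmation adds another five minutes before an incident opens.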

Response-time threshold (degraded)

By default a monitor is either up or down. If you set a response-time threshold (in milliseconds), you add a middle state: a check that succeeds but comes back slower than the threshold is marked degraded rather than up.

Degraded means “answering, but not healthy” — useful for catching a service that’s struggling under load before it tips over into a hard outage. The threshold is optional; leave it blank (or set it to 0) and the degraded state is simply never used.
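If you manage monitor settings in your own tooling, it can help to keep the four shared settings and their documented defaults and ranges in one place. The sketch below is hypothetical: the `MonitorSettings` class and its field names are made up for illustration and are not Aloft's API, and rejecting a negative threshold is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonitorSettings:
    """Hypothetical container for the shared settings described above."""

    interval_seconds: int = 300                   # default: 5 minutes
    timeout_seconds: int = 30                     # default: 30 seconds
    confirmations: int = 2                        # default: 2 consecutive failures
    response_threshold_ms: Optional[int] = None   # blank (None) or 0 disables "degraded"

    def __post_init__(self) -> None:
        if not 60 <= self.interval_seconds <= 24 * 60 * 60:
            raise ValueError("interval must be between 60 seconds and 24 hours")
        if not 1 <= self.timeout_seconds <= 120:
            raise ValueError("timeout must be between 1 second and 2 minutes")
        if not 1 <= self.confirmations <= 10:
            raise ValueError("confirmations must be between 1 and 10")
        # Assumption: a negative threshold is treated as invalid input.
        if self.response_threshold_ms is not None and self.response_threshold_ms < 0:
            raise ValueError("response-time threshold cannot be negative")

    @property
    def degraded_enabled(self) -> bool:
        """True only when a non-zero response-time threshold is set."""
        return bool(self.response_threshold_ms)


defaults = MonitorSettings()
strict = MonitorSettings(interval_seconds=60, confirmations=3, response_threshold_ms=800)
print(defaults, defaults.degraded_enabled)
print(strict, strict.degraded_enabled)
```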

The state machine, in plain terms

At any moment a monitor sits in one of these states:

Up — the last confirmed check succeeded (and, if a threshold is set, was fast enough).
Degraded — the check succeeded but was slower than your response-time threshold. The service is reachable but underperforming. For uptime calculations, degraded still counts as “available.”
Down — checks have failed for the full confirmation count in a row. This is what opens an incident and fires your alerts.
Paused — you’ve manually paused the monitor; Aloft stops checking it and it won’t alert.
Pending — a brand-new monitor that hasn’t completed its first check yet.

The flow in practice: a healthy monitor sits at up. A slow-but-working check nudges it to degraded. Failures begin, and once they reach the confirmation count the monitor flips to down and an incident opens. When successful checks return, the monitor recovers to up (or degraded) and the incident closes.
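The flow above can be sketched as a small state machine. This is a simplified model for building intuition, not Aloft's implementation: the `Monitor` class and its method names are hypothetical, and it makes three assumptions the text does not spell out. An unconfirmed failure leaves the current state unchanged, a single successful check is enough to recover, and a resumed monitor goes back to pending until its next check.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    PENDING = "pending"
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"
    PAUSED = "paused"


class Monitor:
    """Toy model of the monitor states (hypothetical, for illustration only)."""

    def __init__(self, confirmations: int = 2, threshold_ms: Optional[int] = None) -> None:
        self.confirmations = confirmations
        self.threshold_ms = threshold_ms  # None or 0 means "degraded" is never used
        self.state = State.PENDING        # no check has completed yet
        self.consecutive_failures = 0
        self.incident_open = False

    def pause(self) -> None:
        self.state = State.PAUSED

    def resume(self) -> None:
        # Assumption: a resumed monitor waits for its next check, like a new one.
        self.state = State.PENDING
        self.consecutive_failures = 0

    def record(self, ok: bool, elapsed_ms: float = 0.0) -> State:
        """Feed one check result into the monitor and return the resulting state."""
        if self.state is State.PAUSED:
            return self.state  # paused monitors are not checked and do not alert
        if ok:
            # Assumption: one success resets the failure streak and closes the incident.
            self.consecutive_failures = 0
            self.incident_open = False
            slow = bool(self.threshold_ms) and elapsed_ms > self.threshold_ms
            self.state = State.DEGRADED if slow else State.UP
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.confirmations:
                self.state = State.DOWN
                self.incident_open = True
            # Otherwise the failure is unconfirmed and the state stays as it was.
        return self.state


monitor = Monitor(confirmations=2, threshold_ms=500)
checks = [(True, 120), (True, 900), (False, 0), (True, 150), (False, 0), (False, 0), (True, 130)]
for ok, elapsed in checks:
    state = monitor.record(ok, elapsed)
    print(f"ok={ok!s:<5} elapsed={elapsed:>4}ms -> {state.value} (incident open: {monitor.incident_open})")
```

In this run the lone failure after the slow check never opens an incident, while the two failures in a row do.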

Incidents: what an open incident looks like and how alerts go out.
Heartbeats: a monitor type that uses the interval together with a grace period instead of confirmation re-checks.
SSL & domain expiry: monitor types where the response-time threshold doesn’t apply.