Troubleshooting

Each section below pairs a symptom with its likely cause and a fix. Work top to bottom; most issues turn out to be one of the first two.

My monitor shows “Unknown” forever

Symptom. A monitor never leaves Unknown (shown as Pending on the dashboard filter); no checks ever land.

Cause. The background worker that runs checks isn’t running. Aloft’s web app and its check worker are separate processes — the UI can be up while no checks execute.

Fix. Start (or restart) the worker process. The Monitors dashboard shows a “Worker is not reporting” card with how long it’s been since the worker was last seen — use it to confirm the worker is the problem. Once the worker is running, the monitor’s first check lands within seconds and the status updates.

No alerts are arriving

Symptom. A monitor goes down but nobody is notified.

Causes and fixes, in order of likelihood:

No channel attached. Alerts only go to channels attached to the monitor. Open the monitor’s edit form, find Alert via, and tick the channels that should be notified. See Alerting overview.
The channel is disabled. Disabled channels are silently skipped. Re-enable it on the Channels tab under Alerts. See Channels.
A maintenance window is active. Active windows suppress incidents and alerts. Check Maintenance for a window currently covering the monitor.
Email isn’t configured. If you’re relying on email and the deployment has no email provider set up, email alerts can’t be delivered (they may only be logged to stdout). Use a Slack or Discord channel, or ask your admin to configure email.

Use the Channels tab’s test action to confirm a channel works end to end.
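
The routing rules above can be summarized in a few lines of code. This is only an illustrative sketch: the `Channel` class and `channels_to_notify` function are hypothetical names, not part of Aloft, and the email-provider case is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    # Hypothetical stand-in for an alert channel attached to a monitor.
    name: str
    enabled: bool = True


def channels_to_notify(attached: list[Channel], maintenance_active: bool) -> list[str]:
    """Names of the channels that would be notified under the rules described above."""
    if maintenance_active:
        # An active maintenance window suppresses incidents and alerts.
        return []
    # Disabled channels are skipped silently; unattached channels never appear here.
    return [c.name for c in attached if c.enabled]


attached = [Channel("slack-oncall"), Channel("discord-ops", enabled=False)]
notified = channels_to_notify(attached, maintenance_active=False)
print(notified)  # ['slack-oncall']
```

An empty result for a monitor that is down points to one of the first three causes: nothing attached, everything disabled, or a window in effect.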

My webhook receiver rejects Aloft’s calls

Symptom. Aloft posts to your webhook but your receiver returns an error or drops the request.

Cause. Almost always signature verification. Aloft HMAC-signs every webhook delivery; if your receiver verifies the signature with the wrong secret, the wrong algorithm, or over a re-serialized body, it will reject a legitimate call.

Fix. Verify the HMAC exactly as documented — over the raw request body, with the channel’s signing secret. See Webhooks and signing for the payload shape and verification recipe. The delivery log shows the response your receiver returned, which is the fastest way to see why it rejected the call.
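
As a rough illustration of why the raw body matters, here is a minimal verification sketch. It is not Aloft’s documented recipe: it assumes HMAC-SHA256 with a hex-encoded digest, and the function name `verify_signature`, the secret, and the payload are all made up for the example. Check Webhooks and signing for the actual algorithm, encoding, and header name.

```python
import hashlib
import hmac
import json


def verify_signature(raw_body: bytes, received_sig: str, secret: str) -> bool:
    """Return True if received_sig matches the HMAC of the raw request body.

    Assumptions (not confirmed by this page): HMAC-SHA256, hex-encoded digest.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, received_sig)


# Made-up values: the sender signs the exact bytes it transmits.
secret = "example-signing-secret"
raw_body = b'{"monitor":"api","status":"down"}'
signature = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()

ok_raw = verify_signature(raw_body, signature, secret)

# Parsing and re-serializing the JSON changes the bytes (json.dumps adds spaces
# after separators by default), so the same signature no longer verifies.
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")
ok_reserialized = verify_signature(reserialized, signature, secret)

print(ok_raw, ok_reserialized)  # True False
```

In most web frameworks this means reading the request body as bytes before any JSON parsing middleware touches it, and verifying against those bytes.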

My heartbeat monitor keeps going down

Symptom. A heartbeat monitor flaps to Down even though the job seems to be running.

Causes and fixes:

Interval + grace are too tight. A heartbeat goes down when no check-in arrives within the interval plus the grace period. If the job’s real cadence has jitter, widen the grace period so a slightly late check-in doesn’t trip an incident.
The job isn’t posting. Confirm the job actually calls its ingest URL on success, and that the call isn’t in a code path that gets skipped when the job exits early. Test the URL by hand with curl -X POST. See Heartbeat monitors.
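
The timing rule in the first item can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `is_overdue` and the interval and grace values are made up, and it models the rule as stated here (down when no check-in arrives within interval plus grace), not Aloft’s internal implementation.

```python
from datetime import datetime, timedelta


def is_overdue(last_check_in: datetime, now: datetime,
               interval: timedelta, grace: timedelta) -> bool:
    """True when no check-in has arrived within interval + grace."""
    return now - last_check_in > interval + grace


last = datetime(2024, 1, 1, 12, 0, 0)
interval = timedelta(minutes=5)

# The job is expected every 5 minutes, but this run finishes 90 seconds late.
late_run = last + interval + timedelta(seconds=90)

tight = is_overdue(last, late_run, interval, grace=timedelta(seconds=30))
wide = is_overdue(last, late_run, interval, grace=timedelta(minutes=2))
print(tight, wide)  # True False
```

With a 30-second grace period the 90-second delay trips an incident; with a 2-minute grace period it does not.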

I can’t create an API key

Symptom. The API page won’t let you mint a key.

Cause. Managing API keys is admin-only — a key grants the whole org’s rights to whoever holds it.

Fix. Ask an admin (or org owner) to create the key, or have your role raised. See Teams & roles and API and integrations.

A status page returns 404

Symptom. Opening a shared status page link returns “not found”.

Causes and fixes:

Wrong or stale link. Re-copy the public URL from the status page’s settings in Status pages.
The page was deleted or its slug changed. A renamed page gets a new URL, so the old link stops working; re-share the current link.
Wrong organization. A status page belongs to the org that created it. Confirm you’re sharing the link generated in the right organization.

Everything is down at once

Symptom. Every monitor flips to down (or stops updating) simultaneously.

Cause. This is rarely your endpoints — it’s almost always the worker or the database. A stalled worker stops recording checks; a database problem breaks both the worker and the UI.

Fix. Start at the Monitors dashboard and look for the “Worker is not reporting” card — if it’s there, restart the worker. If the worker is healthy but checks still fail across the board, check the database connection and the worker’s logs. A genuine mass outage (one upstream dependency everything shares) is possible, but rule out the worker and DB first.