
See what caused your incident before the war room starts.

When an incident happens, Incidentary tells you where the failure first appeared, when it happened, and how it spread. Ready the moment the alert fires. No guesswork. No archaeology.

Incidentary trace view showing the causal chain, truth cards, and inspector for a synthetic Redis cluster failover incident

The audit

Don't take our word for it.

Especially not in marketing copy. Inspect these three things before you install. Everything else on this page is downstream.

And one more thing we won't do: guess. Incidentary makes no inferences about your incidents. Everything in the artifact is something one of your services actually reported.

The artifact

Open one artifact.
Read four answers.

Every Incidentary artifact ships with the same four-answer header. Not because the answers are easy — because the questions never change.

INC-2444 · checkout-service · 14:22 UTC · partial
where the failure broke

session-service DB_QUERY 500

Redis GET timed out after 1.5s × 3 retries — cluster failover in progress.

where it spread

cdn-edge → checkout-service → session-service → redis

what we don't know

Whether the retry budget amplified load on the failing primary.

1 gap at warehouse-api (outside the critical path)

what to look at next

Inspect Redis cluster state at redis-node-3.prod.internal:6379.

Verify session-service retry policy: 3 attempts, 1.5s each.

We don't tell you why it broke. That's still your job. We just make sure the question starts in the right place.
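
The same header, as data. Here is a minimal sketch of the four-answer record; the field names are illustrative for this page, not Incidentary's actual schema.

from dataclasses import dataclass

@dataclass
class ArtifactHeader:
    # Illustrative shape only; the real artifact schema is not published here.
    incident_id: str                 # "INC-2444"
    service: str                     # "checkout-service"
    opened_at_utc: str               # "14:22"
    impact: str                      # "partial"
    where_it_broke: str              # first failing call, as reported
    where_it_spread: list[str]       # causal chain, upstream to downstream
    what_we_dont_know: list[str]     # gaps the capture could not cover
    what_to_look_at_next: list[str]  # concrete follow-ups, no inference

header = ArtifactHeader(
    incident_id="INC-2444",
    service="checkout-service",
    opened_at_utc="14:22",
    impact="partial",
    where_it_broke="session-service DB_QUERY 500: Redis GET timed out after 1.5s × 3 retries",
    where_it_spread=["cdn-edge", "checkout-service", "session-service", "redis"],
    what_we_dont_know=["Whether the retry budget amplified load on the failing primary"],
    what_to_look_at_next=[
        "Inspect Redis cluster state at redis-node-3.prod.internal:6379",
        "Verify session-service retry policy: 3 attempts, 1.5s each",
    ],
)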

Recent incidents

Five incidents.
Five different first sentences.

When the artifact lands, the first thing every responder reads is one sentence. Incidentary writes it. It says exactly what happened, in the order it happened, in the language your team already uses.

Written by code, not by a model — every word maps to an event your services actually reported. No LLM. No inference. No "we think the issue is..."

  1. sev-1 · checkout-svc · INC-2412 · 14:22 UTC
     checkout-svc called payments, which returned 503 after 1247ms. The error propagated to api-gateway.
  2. sev-3 · orders-api · INC-2287 · 09:08 UTC
     orders-api retried inventory 47 times within 3 seconds before timing out. The retry pattern matched a previous incident (INC-1903).
  3. sev-2 · checkout-svc · INC-2511 · 09:14 UTC
     checkout-svc deployed at 09:14:00 UTC. First confirmed break at 09:14:42, 42 seconds after rollout completed. Reverting the deploy correlated with alert resolution.
  4. sev-2 · api-gateway · INC-2615 · 22:41 UTC
     api-gateway hit a 30-second timeout calling search-svc. search-svc was unreachable from the gateway's region. 6 services visible, 2 gaps in the network path.
  5. sev-1 · users-svc · INC-2701 · 03:17 UTC
     users-svc started returning 500s 4 minutes after a database migration began. The migration locked the users table for read traffic. 3 services affected, all visible.

The same shape, every time. The channel opens to a sentence — not to "wait, which database?"
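
What "written by code, not by a model" could look like in practice: a fixed template per event type, filled only with fields a service actually reported. The event shape and template names below are illustrative, not Incidentary's real pipeline.

# Illustrative only: a deterministic first sentence assembled from reported fields.
# No model, no inference; if a field was never reported, the sentence is never written.
def first_sentence(event: dict) -> str:
    templates = {
        "downstream_error": (
            "{caller} called {callee}, which returned {status} after {latency_ms}ms. "
            "The error propagated to {propagated_to}."
        ),
        "retry_storm": (
            "{caller} retried {callee} {attempts} times within {window_s} seconds "
            "before timing out."
        ),
    }
    return templates[event["kind"]].format(**event)

print(first_sentence({
    "kind": "downstream_error",
    "caller": "checkout-svc",
    "callee": "payments",
    "status": 503,
    "latency_ms": 1247,
    "propagated_to": "api-gateway",
}))
# -> checkout-svc called payments, which returned 503 after 1247ms. The error propagated to api-gateway.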

The mechanism

How it works. (Spoiler: there's no AI involved. On purpose.)

Four steps. Plain mechanics. The trick isn't intelligence; it's timing. Rough sketches of what each step could look like follow the list.

  1. Step 01

    Capture continuously

    The SDK records every outbound call, every error, every slow query — the moment they happen. By default we capture the skeleton: timing, status, causal shape. When the pre-arm signals trip, we elevate to full detail. Events buffer locally and flush in the background. Your services keep running. We keep listening.

    flush every 1s · skeleton (timing + causal shape) by default
  2. Step 02

    Correlate as events arrive

    The correlator builds the causal graph in real time, not on demand. By the time anything goes wrong, the graph already exists. We're not assembling at alert time — we're waiting to be asked.

    causal graph: streaming, not on demand
  3. Step 03

    Lock the window when something looks off

    Anomaly thresholds (latency spikes, error bursts, retry storms) trip the pre-arm sequence. The surrounding causal window locks the moment something looks wrong — so when the alert fires, the lead-up is already preserved.

    pre-arm window: 60s–5min, while signals stay hot
  4. Step 04

    Deliver at the alert

    When PagerDuty (or OpsGenie, or your custom webhook) fires, Incidentary assembles the artifact within seconds. The link lands in Slack with the Truth Cards already populated. You open one URL. The room opens to evidence.

    Incidentary · 14:14:23 UTC

    Pre-arm captured · payment-api p99 +320%

    The 2m36s window before this alert is preserved.

    where
    session-service · DB_QUERY 500
    when
    14:13:47 UTC · T−12s
    what
    Redis cluster failover in progress
    Open the artifact →
    How Incidentary's message appears in Slack the moment an alert fires.
    artifact ready: ≤2s after webhook
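
To make steps 01 and 02 concrete, here is a minimal sketch of a local capture buffer feeding a streaming correlator. The class names, event fields, and flush mechanics are assumptions for illustration, not the actual SDK.

import threading
from collections import defaultdict, deque

class CaptureBuffer:
    # Step 01, illustrative: record skeleton events locally, flush in the background.
    def __init__(self):
        self._events = deque()
        self._lock = threading.Lock()

    def record(self, event: dict) -> None:
        # Skeleton by default: timing, status, and causal shape only.
        with self._lock:
            self._events.append(event)

    def drain(self) -> list:
        with self._lock:
            batch = list(self._events)
            self._events.clear()
        return batch

class StreamingCorrelator:
    # Step 02, illustrative: the causal graph is built as events arrive, not at alert time.
    def __init__(self):
        self.edges = defaultdict(set)  # caller -> set of callees

    def ingest(self, batch: list) -> None:
        for e in batch:
            self.edges[e["caller"]].add(e["callee"])

buffer, graph = CaptureBuffer(), StreamingCorrelator()
buffer.record({"caller": "checkout-svc", "callee": "payments", "status": 503, "ms": 1247})
graph.ingest(buffer.drain())  # in the SDK this would run on a ~1s background timer
print(dict(graph.edges))      # {'checkout-svc': {'payments'}}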
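
Step 03 is thresholds plus a lock on the surrounding window. A rough sketch, with made-up threshold values:

import time

# Illustrative thresholds; the real pre-arm signals and their values are not these.
THRESHOLDS = {
    "p99_latency_increase_pct": 200,  # latency spike
    "errors_per_s": 5,                # error burst
    "retries_per_s": 25,              # retry storm
}

class PreArmWindow:
    # When any signal trips, lock the surrounding causal window (60s to 5min)
    # so the lead-up is already preserved by the time the alert fires.
    def __init__(self, min_s: int = 60, max_s: int = 300):
        self.min_s, self.max_s = min_s, max_s
        self.locked_at = None

    def check(self, metrics: dict) -> None:
        if any(metrics.get(name, 0) >= limit for name, limit in THRESHOLDS.items()):
            self.lock()

    def lock(self) -> None:
        if self.locked_at is None:
            self.locked_at = time.time()  # from here on, capture elevates to full detail

window = PreArmWindow()
window.check({"p99_latency_increase_pct": 320})  # e.g. payment-api p99 +320%
print("pre-arm locked:", window.locked_at is not None)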
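
And step 04, the delivery: the artifact already exists, so answering the webhook is a lookup plus a post to the channel. The endpoint, payload fields, and URLs below are assumptions, not Incidentary's actual integration.

import json
import urllib.request

def on_alert_webhook(alert: dict, slack_webhook_url: str) -> None:
    # Illustrative: no assembly happens here, just lookup and delivery.
    artifact_url = f"https://app.incidentary.example/artifacts/{alert['incident_id']}"
    message = {
        "text": (
            f"Pre-arm captured · {alert['service']} {alert['signal']}\n"
            f"where: {alert['where']}\n"
            f"when: {alert['when']}\n"
            f"what: {alert['what']}\n"
            f"Open the artifact → {artifact_url}"
        )
    }
    request = urllib.request.Request(
        slack_webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)

# Example call, mirroring the truth cards above (the webhook URL is a placeholder):
# on_alert_webhook(
#     {
#         "incident_id": "INC-2444", "service": "payment-api", "signal": "p99 +320%",
#         "where": "session-service · DB_QUERY 500",
#         "when": "14:13:47 UTC · T−12s",
#         "what": "Redis cluster failover in progress",
#     },
#     slack_webhook_url="https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
# )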

We're not replacing your APM.

You read Incidentary first, then go to Datadog knowing exactly what you're looking for. Think of it as the index for the rest of your stack — the page you read before you start scrolling Loki.

The job

The demo starts in 9 min. The page just fired.

Or it's 2:14am. Or you're three messages behind in standup and your phone has been buzzing for forty seconds. The hour decides who's watching. It doesn't decide what you need from the next thirty seconds.

You need the cause, named, in the language your services already use. You need an artifact you can paste in the channel without a paragraph of context. You need it before the ETA pings start, before the Slack guesses start, before the customer email gets escalated to your VP.

  1. Stop the "wait — did anyone deploy?" message.

  2. Stop the every-five-minutes "any update?" ping from the room.

  3. Stop pasting screenshots from four tools into one war-room thread.

  4. Stop telling Marketing "we don’t have an answer yet" for the third time.

  5. Stop forecasting an ETA you can’t forecast because you don’t know the cause.

  6. Stop reconstructing the timeline from scrollback the next morning.

  7. Stop writing "it appears that…" in the postmortem.

  8. Stop saying "I’m not sure" in the customer-facing email.

  9. Stop the second war room because the first one didn’t reach a conclusion.

  10. Stop the on-call rotation feeling like a tax the senior engineers pay.

The artifact is the alert. The cause is in the first frame. The chain was already assembled by the time you opened the link.

You make the call from evidence. You go back to whatever the page interrupted.

Open a real artifact

incidentary.com/demo · no signup, no throwaway email

The install

Five minutes to your first artifact.
One destination.

Already running OpenTelemetry? Add Incidentary as one more exporter. Don't have OTel yet? Install the SDK on one service. Either door, same artifact.

Add to your existing collector

No package to install. No agent to run. A few lines of YAML in the pipeline you already maintain.

full quickstart guide →
otel-collector.yaml
exporters:
  otlp/incidentary:
    endpoint: api.incidentary.com:4317
    headers:
      authorization: "Bearer ${INCIDENTARY_API_KEY}"

service:
  pipelines:
    traces:
      exporters: [otlp/incidentary]
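
The other door, for a service that doesn't run OTel yet, is an SDK in the service itself. Incidentary's own SDK isn't shown here; as a sketch of the shape, the standard OpenTelemetry Python SDK pointed at the same endpoint looks like this:

import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans to the same endpoint the collector config above uses.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="api.incidentary.com:4317",
            headers={"authorization": f"Bearer {os.environ['INCIDENTARY_API_KEY']}"},
        )
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")
with tracer.start_as_current_span("checkout"):
    pass  # your existing request handling runs here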

The next incident is already on the calendar.

Open one. See for yourself. Then decide whether the next one belongs here too.

We'd rather you opened one before you signed up. Signup is for after.