Relvy – AI On-Call Runbooks That Slash MTTR

TL;DR

Relvy turns static on-call runbooks into AI-executable workflows. It aims to cut MTTR by automating SOP execution during incidents, and it is backed by YC (F24 batch).

Source and Accuracy Notes

Official site: https://relvy.ai
HN Launch: https://news.ycombinator.com/item?id= (search Relvy on HN)
Founded: YC F24 batch, 2024

What Is Relvy?

Relvy is an on-call runbook automation platform built for SRE and incident response teams. Instead of relying on engineers to manually follow runbook steps during a high-pressure outage, Relvy lets AI execute those steps — automatically opening dashboards, running diagnostics, and escalating when human judgment is needed.

The core loop: your runbook is a structured document (markdown, YAML, or natural language), and Relvy’s agent interprets it at incident time, taking action without waiting for an engineer to read through it step by step.

Setup Workflow

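Step 1: Sign Up and Connect Your Stack

Sign up, then connect the tools your team already uses. The listed integrations include:

PagerDuty, OpsGenie, and other on-call schedulers
Datadog, Grafana, and Prometheus for metrics
Slack and Microsoft Teams for alerting
GitHub for runbook versioning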

Step 2: Import Your Runbooks

Import existing runbooks from Notion, Confluence, GitHub, or plain markdown. Relvy parses the structure and converts each step into a machine-executable action with a confidence threshold.

# Example: import from GitHub repo
relvy import --source github --repo your-org/runbooks --branch main
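
To make the parsing step more concrete, here is a minimal Python sketch of how numbered markdown steps could be turned into step objects that each carry a confidence threshold. This is not Relvy's parser: the RunbookStep class, the parse_runbook function, the sample runbook, and the 0.8 default threshold are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class RunbookStep:
    """Hypothetical representation of one parsed runbook step."""
    number: int
    text: str
    confidence_threshold: float


# Matches lines such as "1. Do something" or "2) Do something".
STEP_PATTERN = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def parse_runbook(markdown: str, default_threshold: float = 0.8) -> list[RunbookStep]:
    """Collect numbered steps from a markdown runbook, ignoring all other lines."""
    steps = []
    for line in markdown.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            steps.append(
                RunbookStep(int(match.group(1)), match.group(2), default_threshold)
            )
    return steps


# Made-up runbook used only to exercise the parser.
example = """# High API latency

1. Open the latency dashboard
2. Fetch logs for the api pods
3. Restart the api service if error rate stays high
"""

steps = parse_runbook(example)
for step in steps:
    print(step.number, step.text, step.confidence_threshold)
```

A sketch like this also shows why structure matters: headers and free-form paragraphs are skipped, so only runbooks with clearly numbered steps yield usable actions.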

Step 3: Define Action Permissions

Set which actions Relvy is allowed to take autonomously vs. which require human approval. For example:

Auto: Open Grafana dashboard, fetch pod logs, restart a known-failing service
Approval required: Scale down a deployment, change DNS records, modify load balancer config
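
One way to think about this split is as a lookup table with a safe default. The Python sketch below is an assumption about how such a policy could be modeled, not Relvy's configuration format; the action names and the mode_for helper are hypothetical and simply mirror the examples above.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "auto"
    APPROVAL = "approval_required"


# Hypothetical action names mirroring the examples in this section.
POLICY = {
    "open_grafana_dashboard": Mode.AUTO,
    "fetch_pod_logs": Mode.AUTO,
    "restart_known_failing_service": Mode.AUTO,
    "scale_down_deployment": Mode.APPROVAL,
    "change_dns_records": Mode.APPROVAL,
    "modify_load_balancer_config": Mode.APPROVAL,
}


def mode_for(action: str) -> Mode:
    """Return the permission mode for an action.

    Unknown actions fall back to approval, so a newly added action
    is never run autonomously by accident.
    """
    return POLICY.get(action, Mode.APPROVAL)


print(mode_for("fetch_pod_logs"))
print(mode_for("drop_database"))
```

Defaulting unknown actions to approval is a deliberate design choice in this sketch: the allow-list has to be extended explicitly before anything new runs on its own.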

Step 4: Configure Alert Triggers

Connect your monitoring tool to trigger Relvy runbooks automatically when a threshold is breached. A typical flow:

Alert fires in PagerDuty → on-call engineer is paged
Relvy detects the alert → starts executing the relevant runbook
AI runs through diagnostic steps → posts updates to Slack in real time
If automation succeeds, the incident is resolved without further engineer involvement
If confidence is low, Relvy pauses and pages the engineer with context
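
The flow above can be sketched as a small loop that runs steps in order and stops as soon as confidence drops below a threshold or a step fails. This is an illustrative Python model, not Relvy's implementation; the Step class, the execute_runbook function, the confidence scores, and the notify callback (standing in for a Slack update or a page) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    confidence: float        # hypothetical score in [0, 1]
    run: Callable[[], bool]  # returns True when the step succeeded


def execute_runbook(steps, threshold, notify):
    """Run steps in order; pause and escalate on low confidence or failure."""
    for step in steps:
        if step.confidence < threshold:
            notify(
                f"Paused before '{step.name}': confidence "
                f"{step.confidence:.2f} is below {threshold:.2f}"
            )
            return "escalated"
        ok = step.run()
        notify(f"{step.name}: {'ok' if ok else 'failed'}")
        if not ok:
            return "escalated"
    return "resolved"


messages = []  # stands in for a chat channel
steps = [
    Step("fetch pod logs", 0.95, lambda: True),
    Step("restart service", 0.60, lambda: True),
]
outcome = execute_runbook(steps, threshold=0.8, notify=messages.append)
print(outcome)  # prints "escalated"
for message in messages:
    print(message)
```

In this run the first step executes and reports success, while the second is never run because its score is below the threshold, so the engineer receives the pause message along with the earlier update as context.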

Deeper Analysis

Where it excels:

High-pressure incidents where engineers waste time reading through long runbooks
Repetitive on-call patterns where the same 10-step diagnostic always runs
Reducing MTTR by cutting the time between alert and first meaningful action

Where it struggles:

Highly environment-specific runbooks that need judgment calls on every step
Security-sensitive actions that can’t be delegated to AI without extensive guardrails
Teams without mature enough runbook documentation to import

Pricing: Free tier available. See relvy.ai for paid tiers.

Practical Evaluation Checklist

Does your team have documented runbooks for top incident types?
Are your runbooks structured enough to be parsed (markdown headers, numbered steps)?
Do you have clear human/AI permission boundaries for production actions?
Is your monitoring stack supported (Datadog, Prometheus, PagerDuty)?
Would auto-execution actually save time vs. manual steps?
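
For the last question, a rough back-of-the-envelope estimate can help before starting a trial. The Python sketch below is only an arithmetic aid; the function name and every number in the example are hypothetical placeholders, not measurements from Relvy or any real team.

```python
def minutes_saved_per_month(
    incidents_per_month: int,
    manual_minutes: float,
    automated_minutes: float,
    automatable_fraction: float,
) -> float:
    """Estimate monthly time saved on diagnostic steps.

    automatable_fraction is the share of incidents whose runbook
    can realistically be executed without a human.
    """
    if not 0.0 <= automatable_fraction <= 1.0:
        raise ValueError("automatable_fraction must be between 0 and 1")
    per_incident = manual_minutes - automated_minutes
    return incidents_per_month * automatable_fraction * per_incident


# Placeholder inputs: 20 incidents, 15 min manual vs 3 min automated, half automatable.
print(minutes_saved_per_month(20, 15.0, 3.0, 0.5))  # 20 * 0.5 * 12 = 120.0
```

Replace the placeholder inputs with figures from your own incident history; if the result is small, auto-execution may not justify the setup effort.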

Security Notes

Action permissions are fully configurable — you control what AI can do without approval
Runbook access can be scoped to specific teams
Audit log of all AI-executed actions is available
Self-hosted deployment option for air-gapped environments (check with Relvy team)

FAQ

Q: Does Relvy replace the on-call engineer?
A: No. Relvy automates routine diagnostic and remediation steps, but pauses and escalates when confidence is low. The engineer is still in the loop for judgment calls.

Q: What happens if the AI takes a wrong action?
A: Relvy maintains a human-in-the-loop model. Each action class has a confidence threshold. Below it, the AI pauses and requests approval before proceeding.

Q: How does it connect to existing monitoring?
A: Relvy has native integrations with Datadog, Grafana, Prometheus, PagerDuty, and OpsGenie. You can also use webhooks for custom setups.

Q: Is there a self-hosted option?
A: Contact the Relvy team for enterprise/air-gapped deployments. Standard offering is SaaS.

Conclusion

Relvy targets a real pain point: on-call engineers spending critical incident minutes manually following runbooks that a machine could execute. If your team has well-documented runbooks and wants to cut MTTR without hiring more engineers, it’s worth evaluating. The YC F24 backing signals the team is serious about building in the incident response space.

Site: https://relvy.ai
