Reliability monitoring

Monitoring turns a collection into a watched routing chain. When a collection is monitored, the gateway runs a tiny server-side canary against every step on a schedule (roughly every 5 minutes) and records whether each step is up and how fast it answered. That history drives the per-step uptime and latency shown on the collection page, and it is what your configured alerts are evaluated against.

Free during beta. Canaries run inside the gateway as a single one-token ping per step, and they never debit your wallet.

Turn monitoring on

Owners and admins can toggle monitoring for a collection. The first canary runs immediately so the page isn't empty; scheduled checks follow.

POST /me/collections/:slug/monitor
{ "monitored": true }                 // omit the field or send false to turn off
→ { "ok": true, "monitored": true,
    "note": "Monitoring on — first canary running now; scheduled checks follow." }
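
As a minimal client-side sketch, the toggle can be built with Python's standard library. The base URL `https://api.example.invalid` and the bearer-token header are assumptions for illustration; this section does not specify either, so substitute whatever your account actually uses.

```python
import json
import urllib.request

# Hypothetical base URL; the real gateway host is not given in this section.
BASE_URL = "https://api.example.invalid"


def build_monitor_request(slug: str, monitored: bool, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that turns monitoring on or off."""
    body = json.dumps({"monitored": monitored}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/me/collections/{slug}/monitor",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


if __name__ == "__main__":
    req = build_monitor_request("india-first-llama", True, token="YOUR_TOKEN")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the sketch testable without network access.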

Per-step health

Each canary picks the same route a real request would and sends a one-token ping to the step's model/provider, recording up/down and latency. GET /me/collections/:slug/health returns, per step, the uptime and p95 latency over a window, plus the time of the last canary. The window defaults to 7 days; pass ?days= with a value from 1 to 90 to change it. Any member can read it.

GET /me/collections/india-first-llama/health?days=7
→ { "monitored": true,
    "health": [
      { "model": "llama-3.1-8b-instruct", "provider": "vllm",
        "uptime": 0.998, "p95_ms": 820, "checks": 2014,
        "last_check": "2026-06-15T09:35:00Z" },
      { "model": "llama-3.1-8b-instruct", "provider": "krutrim",
        "uptime": 1.0, "p95_ms": 540, "checks": 2014, "last_check": "..." }
    ] }

uptime is a fraction 0–1 (successful canaries ÷ total) and p95_ms the 95th-percentile latency over the window; both are null until the first checks land. Run a check on demand with POST /me/collections/:slug/check, which canaries every step now, evaluates alerts, and returns fresh health.
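
To make the two definitions concrete, here is a small sketch of how uptime and p95 could be derived from raw canary results. The `(succeeded, latency_ms)` tuple shape, the nearest-rank percentile method, and the choice to include failed canaries' latencies are all assumptions; the gateway's actual computation is not documented here.

```python
from typing import Sequence, Tuple

# One canary result: (succeeded, latency in milliseconds). Hypothetical shape.
Check = Tuple[bool, float]


def summarize(checks: Sequence[Check]) -> dict:
    """Return uptime (0-1) and p95 latency; both are None when there are no checks."""
    if not checks:
        return {"uptime": None, "p95_ms": None, "checks": 0}
    total = len(checks)
    successes = sum(1 for ok, _ in checks if ok)
    latencies = sorted(ms for _, ms in checks)
    # Nearest-rank percentile: rank = ceil(0.95 * total), in integer arithmetic.
    rank = -(-95 * total // 100)
    return {
        "uptime": successes / total,   # successful canaries / total
        "p95_ms": latencies[rank - 1],
        "checks": total,
    }
```

With 20 checks whose latencies run from 100 ms to 2000 ms in steps of 100 and one failure, this yields an uptime of 0.95 and a p95 of 1900 ms.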

Alerts

An alert fires when a metric crosses your threshold over a recent window. On breach BharatRouter sends an email and/or POSTs a webhook, then stays quiet for that alert for 24 hours (dedupe). Owners/admins manage alerts. Every alert needs at least one destination: notify_email, webhook_url, or both.

Metric	Threshold	Fires when
`error_rate`	fraction 0–1	failed canaries ÷ total over the window ≥ threshold.
`latency_p95`	ms	p95 latency over the window ≥ threshold.
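
The breach rule and the 24-hour quiet period can be sketched as one decision function. How the gateway tracks the last firing time, and whether the quiet period ends at exactly 24 hours, are assumptions; `should_notify` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

QUIET_PERIOD = timedelta(hours=24)  # dedupe window described above


def should_notify(value: float, threshold: float,
                  last_fired: Optional[datetime], now: datetime) -> bool:
    """True when the metric breaches (value >= threshold) outside the quiet period."""
    if value < threshold:
        return False  # no breach
    if last_fired is not None and now - last_fired < QUIET_PERIOD:
        return False  # this alert already fired within the last 24 hours
    return True


if __name__ == "__main__":
    now = datetime(2026, 6, 15, 9, 40, tzinfo=timezone.utc)
    # 22% error rate against a 10% threshold, never fired before.
    print(should_notify(0.22, 0.1, None, now))
```

The same function serves both metrics, because each fires on "value at or above threshold": pass an error fraction for `error_rate`, or milliseconds for `latency_p95`.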

POST /me/collections/:slug/alerts
{
  "metric": "error_rate",         // or "latency_p95"
  "threshold": 0.1,               // error_rate: 0–1   |   latency_p95: ms
  "window_min": 60,               // window in minutes, 5–1440 (default 60)
  "notify_email": "oncall@acme.in",
  "webhook_url": "https://hooks.acme.in/br"   // optional, public http(s) only
}
→ { "ok": true, "alert": { "id": 3, "metric": "error_rate", "threshold": 0.1,
      "window_min": 60, "notify_email": "oncall@acme.in",
      "webhook_url": "https://hooks.acme.in/br" } }
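
Before sending the request, a client could check the documented constraints locally. This is a sketch only. `validate_alert` is a hypothetical helper, the requirement that a `latency_p95` threshold be positive is an assumption, and the function checks only the URL scheme, without replicating any host filtering the gateway applies to `webhook_url`.

```python
from typing import List, Optional
from urllib.parse import urlparse


def validate_alert(metric: str, threshold: float, window_min: int = 60,
                   notify_email: Optional[str] = None,
                   webhook_url: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the payload looks acceptable."""
    problems: List[str] = []
    if metric not in ("error_rate", "latency_p95"):
        problems.append("metric must be error_rate or latency_p95")
    elif metric == "error_rate" and not 0 <= threshold <= 1:
        problems.append("error_rate threshold must be a fraction 0-1")
    elif metric == "latency_p95" and threshold <= 0:
        problems.append("latency_p95 threshold must be a positive number of ms")  # assumed
    if not 5 <= window_min <= 1440:
        problems.append("window_min must be between 5 and 1440")
    if not notify_email and not webhook_url:
        problems.append("supply notify_email and/or webhook_url")
    if webhook_url and urlparse(webhook_url).scheme not in ("http", "https"):
        problems.append("webhook_url must use http or https")
    return problems
```

For example, `validate_alert("error_rate", 0.1, 60, notify_email="oncall@acme.in")` returns an empty list, while a threshold of 1.5 with a 3-minute window reports two problems.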

The webhook_url must be a public http(s) endpoint — the same SSRF guard as BYOE rejects loopback, private and cluster-internal hosts. On breach the gateway POSTs a compact JSON event you can route to Slack, Discord or a custom ingest:

POST <your webhook>   Content-Type: application/json
{ "event": "alert", "collection": "india-first-llama", "name": "India-first Llama",
  "metric": "error_rate", "threshold": 0.1,
  "breach": "error rate 22% ≥ 10% (last 60m)",
  "at": "2026-06-15T09:40:00Z",
  "text": "⚠ BharatRouter: \"India-first Llama\" — error rate 22% ≥ 10% (last 60m)" }
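
On the receiving side, a handler could turn this event into a chat message. The sketch below assumes the payload shape shown above. Slack incoming webhooks read a `text` field and Discord webhooks read `content`; `to_chat_payload` is a hypothetical helper name.

```python
import json


def to_chat_payload(raw_body: bytes, target: str = "slack") -> dict:
    """Convert a BharatRouter alert event into a Slack- or Discord-style message body."""
    event = json.loads(raw_body)
    if event.get("event") != "alert":
        raise ValueError("not an alert event")
    # The event already carries a human-readable summary in "text".
    key = "content" if target == "discord" else "text"
    return {key: event["text"]}
```

Forwarding is then a matter of POSTing the returned dictionary as JSON to your Slack or Discord webhook URL.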

Endpoint	What it does
`GET /me/collections/:slug/alerts`	List the collection's alerts.
`POST /me/collections/:slug/alerts`	Create an alert (owner/admin).
`DELETE /me/collections/:slug/alerts/:id`	Remove an alert (owner/admin).

From agents (MCP)

The same surface is available over MCP for agents:

Tool	What it does
`get_collection_health`	Per-step 7-day uptime, p95 latency and last-canary time (read-only).
`set_monitoring`	Turn monitoring on/off (write; requires `user_confirmed: true`).
`run_monitor_check`	Canary every step now and return fresh health (write; requires `user_confirmed: true`).
`set_monitor_alert`	Add an alert on `error_rate` or `latency_p95` (write; requires `user_confirmed: true`).
`remove_monitor_alert`	Remove an alert by id (write; requires `user_confirmed: true`).
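
For orientation, an MCP client invokes a tool with a JSON-RPC `tools/call` request. The sketch below builds one for `set_monitoring`. The argument names `slug` and `monitored` are assumptions, since the tool's input schema is not listed here. Only `user_confirmed: true` comes from the table above.

```python
import json


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


if __name__ == "__main__":
    print(mcp_tool_call(1, "set_monitoring", {
        "slug": "india-first-llama",   # assumed argument name
        "monitored": True,             # assumed argument name
        "user_confirmed": True,        # write tools are marked with this flag
    }))
```

In practice an MCP client library usually builds this envelope for you; the point is that write tools carry an explicit confirmation flag in their arguments.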