Routing & data residency

Most models in the catalog have more than one provider route. On every request the gateway picks among the healthy, eligible routes — by price unless you say otherwise — then fails over automatically if the chosen provider errors. You steer this with four optional request fields, accepted on both /v1/chat/completions and /v1/embeddings. They are BharatRouter extensions: the router consumes them and strips them before anything is forwarded upstream, so providers never see them.

Request extensions

Field	Values	What it does
`optimize`	`price` (default) · `latency` · `uptime` · `auto`	Route-selection preference among eligible providers. `price` ranks by input + output ₹/Mtok summed (not input alone). `latency` uses a moving average of observed latency per route; `uptime` sorts by observed failure rate; `auto` blends all three (reliability first, then a latency/price trade-off). Whatever the mode, routes whose circuit is open are always tried last, which is why a `price` request may not land on the catalog's cheapest provider while that route is unhealthy.
`provider`	a provider id, e.g. `krutrim`, `sarvam`, `vllm`	Pin one provider and skip dynamic routing entirely. Provider ids are listed at `GET /v1/providers`.
`data_policy`	`india_only`	Only India-resident routes are eligible. If none exists for the model, the request fails with `no_route` — it never silently leaves India.
`upstream_key`	your provider API key	Per-request BYOK: the call runs on your key and your provider billing. Never stored or logged. See BYOK.
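The `optimize` ordering rules can be pictured as a sort key. The sketch below models only what the table states and is not the gateway's code. The `Route` fields (prices in ₹/Mtok, a latency moving average, an observed failure rate, a circuit flag) are hypothetical names. `auto` is left out because its blend weights are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Route:
    # Hypothetical per-route stats; field names are illustrative only.
    provider: str
    input_inr_per_mtok: float
    output_inr_per_mtok: float
    latency_ms: float          # moving average of observed latency
    failure_rate: float        # observed failure rate, 0.0 to 1.0
    circuit_open: bool = False


def rank_routes(routes, optimize="price"):
    """Order eligible routes for one request (price, latency or uptime only)."""
    metrics = {
        # price ranks on input + output summed, not on input alone
        "price": lambda r: r.input_inr_per_mtok + r.output_inr_per_mtok,
        "latency": lambda r: r.latency_ms,
        "uptime": lambda r: r.failure_rate,
    }
    if optimize not in metrics:
        raise ValueError(f"mode not modelled in this sketch: {optimize!r}")
    metric = metrics[optimize]
    # False sorts before True, so open-circuit routes always land last
    return sorted(routes, key=lambda r: (r.circuit_open, metric(r)))
```

With three routes where the cheapest one has an open circuit, `rank_routes(routes, "price")` puts the two healthy routes first (cheaper of the two leading) and the unhealthy cheapest route at the end.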

The route actually used is reported in the x-br-provider response header on every reply, streamed or not.

Example: cheapest India-resident route

from openai import OpenAI
client = OpenAI(base_url="https://api.bharatrouter.com/v1", api_key="br-...")

r = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Summarise this complaint in Hindi: ..."}],
    extra_body={"optimize": "price", "data_policy": "india_only"},
)
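To see which route served a call from Python, you need the response headers as well as the parsed body. openai-python's `with_raw_response` accessor exposes both. The helper name `create_with_route` is hypothetical; it takes the `client` from the example above and passes your request arguments through unchanged, including `extra_body`.

```python
def create_with_route(client, **request):
    """Send a chat request; return (completion, provider that served it).

    `client` is an openai.OpenAI instance pointed at the gateway. The
    raw-response wrapper gives access to HTTP headers before parsing.
    """
    raw = client.chat.completions.with_raw_response.create(**request)
    provider = raw.headers.get("x-br-provider")  # route actually used
    completion = raw.parse()                      # the usual ChatCompletion object
    return completion, provider
```

Call it with the same arguments as the example above, e.g. `create_with_route(client, model="qwen2.5-7b-instruct", messages=[...], extra_body={"optimize": "price", "data_policy": "india_only"})`.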

Example: pin a provider

curl https://api.bharatrouter.com/v1/chat/completions \
  -H "Authorization: Bearer br-..." -H "Content-Type: application/json" \
  -d '{
    "model": "krutrim-2",
    "provider": "krutrim",
    "messages": [{"role": "user", "content": "namaste"}]
  }'

How india_only works

Every route in the catalog carries a residency tag. data_policy: "india_only" filters the candidate set to India-resident routes before selection and failover, so the guarantee holds even mid-failover: a request can fail with no_route (HTTP 400) but it cannot be served from outside India. This is the enforcement point for DPDP-sensitive workloads — put it in the request, not in a policy document.
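The ordering matters: the residency filter runs first, and selection and failover only ever see what survives it. A minimal sketch of that step follows. The residency tag value `"IN"` and the names `CatalogRoute`, `eligible_routes` and `NoRoute` are assumptions for illustration.

```python
from dataclasses import dataclass


class NoRoute(Exception):
    """Pre-flight failure; the gateway reports this as no_route (HTTP 400)."""


@dataclass
class CatalogRoute:
    provider: str
    residency: str  # hypothetical residency tag, e.g. "IN"


def eligible_routes(routes, data_policy=None):
    """Filter the candidate set before any selection or failover happens."""
    if data_policy == "india_only":
        routes = [r for r in routes if r.residency == "IN"]
    if not routes:
        # nothing left to dial: fail pre-flight instead of leaving India
        raise NoRoute("no eligible route for this request")
    return routes
```

Because failover only walks the list this function returns, a later failure can end in an error but never in a non-India route.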

Failover & circuit breakers

If the selected provider fails, the gateway retries the remaining eligible routes in preference order before giving up with all_routes_failed (HTTP 502). Health is tracked per route:

What counts as a failure (triggers failover): a connection/network error, or an upstream 5xx, 429 (rate limit), 401 or 403. The gateway moves on to the next eligible route.
What does NOT trigger failover: any other 4xx (e.g. 400 malformed request, 404, 422). That is a problem with your request, not with the route, so it is returned to you unchanged and failover stops. A bad request is not silently retried against a different provider, where it would fail the same way and only cost you latency.
A moving average of latency and failure rate per route feeds the optimize: latency and optimize: uptime modes.
After 3 consecutive failures a route's circuit opens and it stops receiving traffic; after 30 seconds a half-open probe lets one request through, and success closes the circuit again.
Failures on a BYOK key are attributed to that key, not the route — your expired key doesn't mark a healthy provider as down for everyone.
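The circuit-breaker rules above (open after 3 consecutive failures, one half-open probe after 30 seconds, success closes) can be modelled as a small state machine. This is a sketch, not the gateway's implementation. One behaviour is an assumption because the text does not specify it: a failed probe re-opens the circuit and restarts the 30-second cooldown. Per-key attribution of BYOK failures is not modelled.

```python
import time


class CircuitBreaker:
    """Per-route breaker: opens after 3 consecutive failures, probes after 30 s."""

    def __init__(self, threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at = None        # None means the circuit is closed
        self.probe_in_flight = False

    def allow_request(self):
        """Whether this route should take a request right now."""
        if self.opened_at is None:
            return True
        # half-open: once the cooldown has passed, let exactly one probe through
        if not self.probe_in_flight and self.clock() - self.opened_at >= self.cooldown_s:
            self.probe_in_flight = True
            return True
        return False

    def record_success(self):
        # any success, including the half-open probe, closes the circuit
        self.consecutive_failures = 0
        self.opened_at = None
        self.probe_in_flight = False

    def record_failure(self):
        self.consecutive_failures += 1
        if self.probe_in_flight or self.consecutive_failures >= self.threshold:
            # assumption: a failed probe re-opens and restarts the cooldown
            self.opened_at = self.clock()
            self.probe_in_flight = False
```

Injecting `clock` keeps the sketch testable: a fake clock can jump straight past the cooldown without sleeping.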

Three different "it didn't work" responses

These are distinct on purpose — they tell you where the request stopped:

no_route (HTTP 400) — pre-flight: no eligible route could even be resolved, so nothing was dialed. Causes: a model with no configured provider, data_policy: india_only with no India route, a pinned provider that doesn't serve the model, or a chain whose every step is unresolvable.
all_routes_failed (HTTP 502) — runtime: routes existed and were tried, but every one failed (per the failure rules above).
model_not_found (HTTP 404) — the model id isn't in the catalog and isn't a discoverable provider/model BYOK id.
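Putting the failure rules and the first two error codes together, the request path can be sketched as a loop. This is an illustrative model under stated assumptions: `send(route)` is a hypothetical callable that returns the upstream HTTP status, or `None` for a connection/network error. `model_not_found` is raised earlier, during catalog lookup, so it is not modelled here.

```python
class GatewayError(Exception):
    def __init__(self, code, status):
        super().__init__(code)
        self.code = code
        self.status = status


# 4xx statuses that count against the route rather than the request
FAILOVER_4XX = {401, 403, 429}


def is_route_failure(status):
    """True when an upstream result should trigger failover (None = network error)."""
    return status is None or status >= 500 or status in FAILOVER_4XX


def route_request(routes, send):
    """Try routes in preference order and stop at the first non-failure."""
    if not routes:
        raise GatewayError("no_route", 400)       # pre-flight: nothing was dialed
    for route in routes:
        status = send(route)
        if not is_route_failure(status):
            # success, or a client 4xx returned as-is with no further failover
            return route, status
    raise GatewayError("all_routes_failed", 502)  # runtime: every route failed
```

Note the asymmetry this encodes: a 503 from the first route moves on to the second, but a 422 from the first route is returned immediately.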

Reasoning models & `max_tokens`

Reasoning models (tagged reasoning in the catalog — e.g. gpt-oss-120b, the qwen3 family, gpt-5) spend part of the completion budget on hidden reasoning before the visible answer. A very small max_tokens can therefore be consumed entirely by reasoning, leaving content empty. To avoid that footgun, the gateway raises a too-small max_tokens to a floor of 512 for reasoning models only, and reports it in the x-br-reasoning-min-tokens response header. An unset max_tokens is left alone (the provider default is already reasoning-aware). The thinking trace, when a provider returns one, arrives in a separate reasoning_content field, not in content.
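The `max_tokens` rule reduces to a small function. The sketch below mirrors the three cases in the paragraph: unset, too small, and large enough. The function name is hypothetical. It returns a flag rather than a header value, because the exact value carried in `x-br-reasoning-min-tokens` is not specified here.

```python
REASONING_MIN_TOKENS = 512


def apply_reasoning_floor(max_tokens, is_reasoning_model):
    """Return (effective max_tokens, whether the floor was applied)."""
    if not is_reasoning_model or max_tokens is None:
        # non-reasoning models and an unset max_tokens are left alone
        return max_tokens, False
    if max_tokens < REASONING_MIN_TOKENS:
        # too small: reasoning could use the whole budget, so raise it
        return REASONING_MIN_TOKENS, True
    return max_tokens, False
```

When the flag would be true, expect the `x-br-reasoning-min-tokens` header on the response.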

Live circuit state is public at GET /health, and per-model 7-day stats are at GET /v1/models/:id/stats.

Saved fallback chains

Beyond per-request routing you can save a fallback chain for a model: an ordered list of steps that replaces that model's default routing for your whole org. A step is { model, provider? } (a bare string is shorthand for { model }), and a chain can mix different models and providers, so "my own GPU → Krutrim → OpenRouter" is a valid chain. The same JSON shape is used everywhere: REST, MCP, and the dashboard.

1–10 steps. model is a catalog id or a provider/model-id BYOK id.
provider is a catalog provider id (see GET /v1/providers) or a byoe:<slug> custom endpoint.
Steps that don't resolve yet (a BYOK key not saved, a BYOE endpoint not registered) are skipped at request time rather than failing the chain.
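Before saving a chain you may want to check its shape on the client side. The sketch below covers only the structural rules listed above: 1 to 10 steps, bare strings expanded to { model }, and only `model`/`provider` keys. It does not check whether ids exist in the catalog; the gateway does its own validation. The helper name `normalize_chain` is hypothetical.

```python
def normalize_chain(steps):
    """Client-side shape check for a fallback chain (hypothetical helper)."""
    if not 1 <= len(steps) <= 10:
        raise ValueError("a fallback chain needs 1 to 10 steps")
    normalized = []
    for step in steps:
        if isinstance(step, str):
            step = {"model": step}  # bare string is shorthand for { model }
        if not isinstance(step, dict) or not isinstance(step.get("model"), str):
            raise ValueError(f"step needs a string 'model': {step!r}")
        unknown = set(step) - {"model", "provider"}
        if unknown:
            raise ValueError(f"unknown step keys: {sorted(unknown)}")
        normalized.append(dict(step))
    return normalized
```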

PUT /me/routing/llama-3.1-8b-instruct
{ "steps": [
    { "model": "llama-3.1-8b-instruct", "provider": "vllm" },
    { "model": "llama-3.1-8b-instruct", "provider": "krutrim" },
    { "model": "mistral/mistral-large-latest" }
] }
→ { "ok": true, "model": "llama-3.1-8b-instruct", "steps": [ ... ] }   // effective within a minute

Endpoint	What it does
`GET /me/routing`	List your org's saved chains.
`GET /me/routing/:model`	Get the chain for one model.
`PUT /me/routing/:model`	Save or replace the chain (owner/admin).
`DELETE /me/routing/:model`	Remove it — routing returns to default (owner/admin).

A per-request fallbacks array (same step shape, on the chat/embeddings body) overrides the saved chain for that single call. Chains can be shared and reused as collections, and steps can point at your own registered endpoints. Agents manage chains over MCP with get_fallback_chains, set_fallback_chain and clear_fallback_chain.
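The precedence between a per-request `fallbacks` array and a saved chain, combined with the rule that unresolvable steps are skipped, can be sketched as follows. `effective_steps` and the `is_resolvable` predicate are hypothetical: the predicate stands in for the gateway's own check (BYOK key saved, BYOE endpoint registered). An empty result corresponds to the no_route case for a chain whose every step is unresolvable.

```python
def effective_steps(request_fallbacks, saved_chain, is_resolvable):
    """Pick the chain for one call and drop steps that can't resolve yet.

    A per-request `fallbacks` array beats the org's saved chain. With
    neither, None is returned and the model's default routing applies.
    """
    chain = request_fallbacks if request_fallbacks is not None else saved_chain
    if chain is None:
        return None
    # expand bare-string shorthand to {"model": ...}
    steps = [{"model": s} if isinstance(s, str) else s for s in chain]
    return [s for s in steps if is_resolvable(s)]
```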

Streaming & metering

For streamed requests the gateway injects stream_options.include_usage on providers that support it, and parses the usage block from the final SSE chunk — so streamed and non-streamed requests are metered identically, and your credit debits always reflect real token counts.
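If you consume the SSE stream yourself, you can read the same final usage block the gateway meters on. In OpenAI-compatible streams with `stream_options.include_usage`, intermediate chunks carry `"usage": null`. The last data chunk before `[DONE]` has an empty `choices` list and a populated `usage` object. The function below is a minimal sketch of that parsing, not the gateway's code.

```python
import json


def usage_from_sse(lines):
    """Return the usage dict from a chat-completions SSE stream, or None."""
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and event names
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):  # null on every chunk except the last one
            usage = chunk["usage"]
    return usage
```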