
Routing & data residency

Most models in the catalog have more than one provider route. On every request the gateway picks among the healthy, eligible routes (by price unless you say otherwise), then fails over automatically if the chosen provider errors. You steer this with four optional request fields, accepted on both /v1/chat/completions and /v1/embeddings. They are BharatRouter extensions: the router consumes them and strips them before anything is forwarded upstream, so providers never see them.

Request extensions

| Field | Values | What it does |
| --- | --- | --- |
| optimize | price (default) · latency · uptime · auto | Route-selection preference among eligible providers. price ranks by input + output ₹/Mtok summed (not input alone). latency uses a moving average of observed latency per route; uptime sorts by observed failure rate; auto blends all three (reliability first, then a latency/price trade-off). Note: whatever the mode, routes whose circuit is open are always tried last, so the catalog's cheapest route can be skipped when it is currently unhealthy. This is why a price request may not land on the single cheapest provider. |
| provider | a provider id, e.g. krutrim, sarvam, vllm | Pin one provider and skip dynamic routing entirely. Provider ids are listed at GET /v1/providers. |
| data_policy | india_only | Only India-resident routes are eligible. If none exists for the model, the request fails with no_route; it never silently leaves India. |
| upstream_key | your provider API key | Per-request BYOK: the call runs on your key and your provider billing. Never stored or logged. See BYOK. |
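The price ordering described for optimize can be pictured with a small simulation. This is a sketch under stated assumptions, not the gateway's implementation: the Route class, its field names, the provider names and the prices are all hypothetical. It only illustrates the two rules stated above, that price is input plus output summed and that open-circuit routes go last.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    # Hypothetical shape for illustration; not the gateway's data model.
    provider: str
    input_price: float   # ₹ per million input tokens
    output_price: float  # ₹ per million output tokens
    circuit_open: bool = False


def rank_by_price(routes):
    """Order routes as optimize="price" is described:
    healthy routes first, then by input + output price summed."""
    return sorted(
        routes,
        key=lambda r: (r.circuit_open, r.input_price + r.output_price),
    )


routes = [
    Route("a", input_price=10.0, output_price=10.0, circuit_open=True),  # cheapest, but unhealthy
    Route("b", input_price=5.0, output_price=40.0),   # cheapest input, pricier overall
    Route("c", input_price=15.0, output_price=20.0),
]
order = [r.provider for r in rank_by_price(routes)]
print(order)  # ['c', 'b', 'a']
```

Route "a" is the cheapest in the catalog yet lands last because its circuit is open, and "b" loses to "c" despite the lower input price because the sum is what counts.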

The route actually used is reported in the x-br-provider response header on every reply, streamed or not.
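To make the header check concrete, here is a minimal standard-library sketch that builds a request carrying the routing extensions and reads x-br-provider from a set of response headers. The helper names build_chat_request and provider_used are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.bharatrouter.com/v1"


def build_chat_request(api_key, model, messages, **extensions):
    """Build (but do not send) a chat request; routing extensions such as
    optimize or data_policy sit at the top level of the JSON body."""
    body = {"model": model, "messages": messages, **extensions}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def provider_used(headers):
    """Return the x-br-provider value from a mapping of response headers,
    matching the header name case-insensitively; None if absent."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-br-provider")


req = build_chat_request(
    "br-...",
    "qwen2.5-7b-instruct",
    [{"role": "user", "content": "namaste"}],
    optimize="latency",
    data_policy="india_only",
)
# To send it for real:
#   with urllib.request.urlopen(req) as resp:
#       print(provider_used(resp.headers))
```

Logging the header value next to each request is a cheap way to see how often failover moves traffic away from the route you expected.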

Example: cheapest India-resident route

from openai import OpenAI
client = OpenAI(base_url="https://api.bharatrouter.com/v1", api_key="br-...")

r = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Summarise this complaint in Hindi: ..."}],
    extra_body={"optimize": "price", "data_policy": "india_only"},
)

Example: pin a provider

curl https://api.bharatrouter.com/v1/chat/completions \
  -H "Authorization: Bearer br-..." -H "Content-Type: application/json" \
  -d '{
    "model": "krutrim-2",
    "provider": "krutrim",
    "messages": [{"role": "user", "content": "namaste"}]
  }'

How india_only works

Every route in the catalog carries a residency tag. data_policy: "india_only" filters the candidate set to India-resident routes before selection and failover, so the guarantee holds even mid-failover: a request can fail with no_route (HTTP 400) but it cannot be served from outside India. This is the enforcement point for DPDP-sensitive workloads: put it in the request, not in a policy document.
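The filter-before-selection idea can be sketched as below. The Route class, the india_resident field and the NoRoute exception are hypothetical stand-ins for the residency tag and the no_route error. The point is only the ordering: because filtering happens first, anything downstream (ranking, failover) can never see a non-resident route.

```python
from dataclasses import dataclass


class NoRoute(Exception):
    """Stands in for the no_route (HTTP 400) error."""


@dataclass(frozen=True)
class Route:
    # Hypothetical shape for illustration only.
    provider: str
    india_resident: bool


def eligible_routes(routes, data_policy=None):
    """Apply the residency filter before any selection or failover."""
    if data_policy == "india_only":
        routes = [r for r in routes if r.india_resident]
    if not routes:
        raise NoRoute("no_route")
    return list(routes)


catalog = [Route("in-1", True), Route("abroad-1", False), Route("in-2", True)]
print([r.provider for r in eligible_routes(catalog, "india_only")])  # ['in-1', 'in-2']
```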

Failover & circuit breakers

If the selected provider fails, the gateway retries the remaining eligible routes in preference order before giving up with all_routes_failed (HTTP 502). Health is tracked per route:
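The retry order described above can be pictured with a small simulation. It is a sketch, not the gateway's code: call_with_failover, fake_send and the route names are hypothetical, and circuit-breaker bookkeeping is deliberately left out.

```python
class AllRoutesFailed(Exception):
    """Stands in for the all_routes_failed (HTTP 502) error."""


def call_with_failover(routes, send):
    """Try each eligible route in preference order and return
    (route, result) from the first one that succeeds."""
    errors = {}
    for route in routes:
        try:
            return route, send(route)
        except Exception as exc:  # a real gateway would be pickier about what is retryable
            errors[route] = exc
    raise AllRoutesFailed(f"all_routes_failed: tried {list(errors)}")


def fake_send(route):
    # Hypothetical upstream call: the first route always times out.
    if route == "primary":
        raise TimeoutError("upstream timed out")
    return f"served by {route}"


used, result = call_with_failover(["primary", "secondary"], fake_send)
print(used, "->", result)  # secondary -> served by secondary
```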

Two different "it didn't work" responses

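These are distinct on purpose; they tell you where the request stopped: no_route (HTTP 400) means no eligible route existed for the request, for example an india_only request for a model with no India-resident route, while all_routes_failed (HTTP 502) means eligible routes existed and were tried, but every one failed.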
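On the client side, the two errors named on this page, no_route (HTTP 400) and all_routes_failed (HTTP 502), call for different reactions. The sketch below is one possible mapping. The function name and the suggested actions are assumptions, and how the error code is carried in the response body is not specified here, so the code takes it as a plain argument.

```python
def next_step(status, error_code):
    """Suggest a client-side reaction to a failed request.

    The returned labels are illustrative, not part of the API.
    """
    if status == 400 and error_code == "no_route":
        # Nothing eligible to try: retrying unchanged will fail the same way.
        return "fix_request"
    if status == 502 and error_code == "all_routes_failed":
        # Routes exist but every attempt failed: likely transient.
        return "retry_later"
    return "unhandled"


print(next_step(400, "no_route"))           # fix_request
print(next_step(502, "all_routes_failed"))  # retry_later
```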

Reasoning models & max_tokens

Reasoning models (tagged reasoning in the catalog, e.g. gpt-oss-120b, the qwen3 family, gpt-5) spend part of the completion budget on hidden reasoning before the visible answer. A very small max_tokens can therefore be consumed entirely by reasoning, leaving content empty. To avoid that footgun, the gateway raises a too-small max_tokens to a floor of 512 for reasoning models only, and reports it in the x-br-reasoning-min-tokens response header. An unset max_tokens is left alone (the provider default is already reasoning-aware). The thinking trace, when a provider returns one, arrives in a separate reasoning_content field, not in content.
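The floor rule reduces to a few lines. This is a sketch of the behaviour as described, with a hypothetical function name; it assumes "too small" means any value below 512.

```python
REASONING_MIN_TOKENS = 512  # the floor stated above


def effective_max_tokens(requested, is_reasoning_model):
    """Return the max_tokens value that would apply after the floor.

    None (unset) is passed through untouched, and non-reasoning
    models are never adjusted.
    """
    if requested is None or not is_reasoning_model:
        return requested
    return max(requested, REASONING_MIN_TOKENS)


print(effective_max_tokens(16, True))     # 512
print(effective_max_tokens(16, False))    # 16
print(effective_max_tokens(None, True))   # None
print(effective_max_tokens(2048, True))   # 2048
```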

Live circuit state is public at GET /health, and per-model 7-day stats at GET /v1/models/:id/stats.

Saved fallback chains

Beyond per-request routing you can save a fallback chain for a model: an ordered list of steps that replaces that model's default routing for your whole org. A step is { model, provider? } (a bare string is shorthand for { model }), and cross-model chains are first-class: "my own GPU → Krutrim → OpenRouter" is a valid chain. The same JSON shape is used everywhere: REST, MCP, and the dashboard.

PUT /me/routing/llama-3.1-8b-instruct
{ "steps": [
    { "model": "llama-3.1-8b-instruct", "provider": "vllm" },
    { "model": "llama-3.1-8b-instruct", "provider": "krutrim" },
    { "model": "mistral/mistral-large-latest" }
] }
→ { "ok": true, "model": "llama-3.1-8b-instruct", "steps": [ ... ] }   // effective within a minute

| Endpoint | What it does |
| --- | --- |
| GET /me/routing | List your org's saved chains. |
| GET /me/routing/:model | Get the chain for one model. |
| PUT /me/routing/:model | Save or replace the chain (owner/admin). |
| DELETE /me/routing/:model | Remove it; routing returns to default (owner/admin). |

A per-request fallbacks array (same step shape, on the chat/embeddings body) overrides the saved chain for that single call. Chains can be shared and reused as collections, and steps can point at your own registered endpoints. Agents manage chains over MCP with get_fallback_chains, set_fallback_chain and clear_fallback_chain.
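The step shorthand and the override order can be sketched as follows. The function names are hypothetical, and treating an empty fallbacks array the same as an absent one is an assumption, not documented behaviour.

```python
def normalize_step(step):
    """Expand the bare-string shorthand into the { model, provider? } shape."""
    if isinstance(step, str):
        return {"model": step}
    if "model" not in step:
        raise ValueError("a step needs a model")
    normalized = {"model": step["model"]}
    if step.get("provider") is not None:
        normalized["provider"] = step["provider"]
    return normalized


def resolve_chain(request_fallbacks=None, saved_chain=None):
    """Per-request fallbacks win over the saved chain.

    Returns None when neither is present, meaning default routing.
    Assumption: an empty fallbacks list counts as absent.
    """
    chosen = request_fallbacks or saved_chain
    if not chosen:
        return None
    return [normalize_step(s) for s in chosen]


saved = [
    {"model": "llama-3.1-8b-instruct", "provider": "vllm"},
    "mistral/mistral-large-latest",
]
print(resolve_chain(saved_chain=saved))
print(resolve_chain(["llama-3.1-8b-instruct"], saved))
```

The first call prints the saved chain with the bare string expanded; the second shows a single-call override replacing it entirely.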

Streaming & metering

For streamed requests the gateway injects stream_options.include_usage on providers that support it, and parses the usage block from the final SSE chunk, so streamed and non-streamed requests are metered identically and your credit debits always reflect real token counts.
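As an illustration of what parsing the final usage block involves, the sketch below scans SSE lines for the last chunk carrying a usage object. It assumes the OpenAI-compatible streaming shape (data: lines holding JSON chunks, terminated by data: [DONE]). The function name and the sample stream are made up for the example.

```python
import json


def usage_from_sse(lines):
    """Return the usage object from a stream of SSE lines, or None.

    Only "data:" lines are inspected; the last chunk with a non-empty
    usage field before [DONE] wins.
    """
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


stream = [
    'data: {"choices": [{"delta": {"content": "nam"}}], "usage": null}',
    'data: {"choices": [{"delta": {"content": "aste"}}], "usage": null}',
    'data: {"choices": [], "usage": {"prompt_tokens": 3, "completion_tokens": 2, "total_tokens": 5}}',
    "data: [DONE]",
]
print(usage_from_sse(stream))
```

A client that wants to reconcile its own token accounting with its credit debits can run the same scan on the stream it receives.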