Most models in the catalog have more than one provider route.
On every request the gateway picks among the healthy, eligible routes (by price unless
you say otherwise), then fails over automatically if the chosen provider errors. You
steer this with four optional request fields, accepted on both
/v1/chat/completions and /v1/embeddings. They are BharatRouter
extensions: the router consumes them and strips them before anything is forwarded
upstream, so providers never see them.
| Field | Values | What it does |
|---|---|---|
| optimize | price (default) · latency · uptime · auto | Route-selection preference among eligible providers. price ranks by input + output ₹/Mtok summed (not input alone); latency uses a moving average of observed latency per route; uptime sorts by observed failure rate; auto blends all three (reliability first, then a latency/price trade-off). Whatever the mode, routes whose circuit is open are always tried last, so the catalog's cheapest route can be skipped while it is unhealthy. This is why a price request may not land on the single cheapest provider. |
| provider | a provider id, e.g. krutrim, sarvam, vllm | Pin one provider and skip dynamic routing entirely. Provider ids are listed at GET /v1/providers. |
| data_policy | india_only | Only India-resident routes are eligible. If none exists for the model, the request fails with no_route; it never silently leaves India. |
| upstream_key | your provider API key | Per-request BYOK: the call runs on your key and your provider billing. Never stored or logged. See BYOK. |
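
The ranking rules in the optimize row can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the gateway's implementation: the Route fields and the provider names are hypothetical, and auto is left out because the exact blend is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Route:
    # All field names are hypothetical; they only mirror the quantities named in the table.
    provider: str
    input_price: float      # ₹ per million input tokens
    output_price: float     # ₹ per million output tokens
    avg_latency_ms: float   # moving average of observed latency
    failure_rate: float     # observed failure rate, 0.0 to 1.0
    circuit_open: bool = False


SORT_KEYS = {
    "price": lambda r: r.input_price + r.output_price,  # input + output, summed
    "latency": lambda r: r.avg_latency_ms,
    "uptime": lambda r: r.failure_rate,
}


def rank_routes(routes, optimize="price"):
    """Order routes by the chosen preference, with open-circuit routes last."""
    key = SORT_KEYS[optimize]
    # False sorts before True, so healthy routes always precede open-circuit ones.
    return sorted(routes, key=lambda r: (r.circuit_open, key(r)))


routes = [
    Route("provider-a", 10.0, 30.0, 900.0, 0.01, circuit_open=True),
    Route("provider-b", 20.0, 25.0, 400.0, 0.05),
    Route("provider-c", 15.0, 40.0, 300.0, 0.02),
]
price_order = [r.provider for r in rank_routes(routes, "price")]
```

Here provider-a has the lowest summed price, but its circuit is open, so it ranks last in every mode. That is the case the table describes: a price request that does not land on the cheapest provider.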
The route actually used is reported in the x-br-provider response header
on every reply, streamed or not.
Python (OpenAI SDK):

    from openai import OpenAI

    client = OpenAI(base_url="https://api.bharatrouter.com/v1", api_key="br-...")
    r = client.chat.completions.create(
        model="qwen2.5-7b-instruct",
        messages=[{"role": "user", "content": "Summarise this complaint in Hindi: ..."}],
        # Routing fields are not OpenAI parameters, so they travel in extra_body.
        extra_body={"optimize": "price", "data_policy": "india_only"},
    )

curl, pinning a provider:

    curl https://api.bharatrouter.com/v1/chat/completions \
      -H "Authorization: Bearer br-..." -H "Content-Type: application/json" \
      -d '{
        "model": "krutrim-2",
        "provider": "krutrim",
        "messages": [{"role": "user", "content": "namaste"}]
      }'

Every route in the catalog carries a residency tag. data_policy: "india_only"
filters the candidate set to India-resident routes before selection and failover,
so the guarantee holds even mid-failover: a request can fail with
no_route (HTTP 400) but it cannot be served from outside India. This is the
enforcement point for DPDP-sensitive workloads: put it in the request, not in a policy
document.
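
The ordering matters here: filtering happens before selection, so failover can only ever walk India-resident routes. A minimal sketch of that filter follows, assuming routes are plain dicts with hypothetical provider and residency keys and that "IN" is the tag value for India (the actual tag format is not documented in this section).

```python
class NoRoute(Exception):
    """Stands in for the gateway's no_route error (HTTP 400)."""


def eligible_routes(routes, data_policy=None):
    """Apply the residency filter before any ranking or failover takes place."""
    if data_policy == "india_only":
        # "IN" is an assumed tag value, used here for illustration only.
        routes = [r for r in routes if r["residency"] == "IN"]
    if not routes:
        # Nothing is dialed: the request fails rather than leaving India.
        raise NoRoute("no_route")
    return routes


catalog = [
    {"provider": "provider-a", "residency": "IN"},
    {"provider": "provider-b", "residency": "US"},
]
india_routes = eligible_routes(catalog, data_policy="india_only")
```

Because the candidate list is narrowed first, every later step (ranking, retries) operates on the filtered list and cannot reintroduce a non-resident route.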
If the selected provider fails, the gateway retries the remaining eligible routes
in preference order before giving up with all_routes_failed (HTTP 502). Health
is tracked per route:
- A client error (400 malformed request, 404, 422) is your request's problem, not the route's: it is returned to you unchanged, and the chain stops. A bad request won't be silently retried against a different provider (which would just fail the same way and cost you latency).
- Observed per-route latency and failure rate feed the optimize: latency and optimize: uptime modes.

The three error codes are distinct on purpose, so you can tell where the request stopped:

- no_route (HTTP 400), pre-flight: no eligible route could even be resolved, so nothing was dialed. Causes: a model with no configured provider, data_policy: india_only with no India route, a pinned provider that doesn't serve the model, or a chain whose every step is unresolvable.
- all_routes_failed (HTTP 502), runtime: routes existed and were tried, but every one failed (per the failure rules above).
- model_not_found (HTTP 404): the model id isn't in the catalog and isn't a discoverable provider/model BYOK id.

max_tokens

Reasoning models (tagged reasoning in the catalog, e.g.
gpt-oss-120b, the qwen3 family, gpt-5) spend part of the completion budget on
hidden reasoning before the visible answer. A very small max_tokens
can therefore be consumed entirely by reasoning, leaving content empty. To
avoid that footgun, the gateway raises a too-small max_tokens to a floor of
512 for reasoning models only, and reports it in the
x-br-reasoning-min-tokens response header. An unset max_tokens is
left alone (the provider default is already reasoning-aware). The thinking trace, when a
provider returns one, arrives in a separate reasoning_content field, not in
content.
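
The floor rule is small enough to write out. The sketch below is an assumption-laden model of it: the function name is hypothetical, and the source does not say what value the x-br-reasoning-min-tokens header carries, so the applied floor is used as a placeholder.

```python
REASONING_MIN_TOKENS = 512  # floor stated in the text


def apply_reasoning_floor(body, is_reasoning_model):
    """Return (possibly adjusted body, extra response headers)."""
    max_tokens = body.get("max_tokens")
    if not is_reasoning_model or max_tokens is None or max_tokens >= REASONING_MIN_TOKENS:
        # Non-reasoning model, unset max_tokens, or already large enough: leave alone.
        return body, {}
    raised = {**body, "max_tokens": REASONING_MIN_TOKENS}
    # Header value is an assumption; only the header name comes from the docs.
    return raised, {"x-br-reasoning-min-tokens": str(REASONING_MIN_TOKENS)}


adjusted, headers = apply_reasoning_floor({"model": "gpt-oss-120b", "max_tokens": 16}, True)
```

On the client side, checking for that header is a cheap way to notice that a requested budget was too small for the model.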
Live circuit state is public at GET /health, and per-model 7-day stats at
GET /v1/models/:id/stats.
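
The failover and error rules described earlier can be read as one loop. The following is a sketch, not the gateway's code: send is a hypothetical transport callback, the error body shape is invented for illustration, and treating every status outside 2xx and the listed client errors as a route failure is an assumption.

```python
CLIENT_ERRORS = {400, 404, 422}  # listed in the text: returned as-is, never retried


def call_with_failover(ranked_routes, send):
    """Try routes in preference order; send(route) returns (status, body)."""
    if not ranked_routes:
        return 400, {"error": "no_route"}          # pre-flight: nothing was dialed
    for route in ranked_routes:
        status, body = send(route)
        if 200 <= status < 300:
            return status, body                    # success
        if status in CLIENT_ERRORS:
            return status, body                    # the request's problem: stop the chain
        # Assumption: anything else is a route failure, so move on to the next route.
    return 502, {"error": "all_routes_failed"}     # runtime: every route was tried


outcomes = {"provider-a": (503, {}), "provider-b": (200, {"ok": True})}
tried = []


def fake_send(route):
    tried.append(route)
    return outcomes[route]


status, body = call_with_failover(["provider-a", "provider-b"], fake_send)
```

In this run provider-a fails with 503, provider-b answers, and the caller only sees the 200. A 422 from the first route would have been returned immediately, without touching the second.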
Beyond per-request routing you can save a fallback chain for a model:
an ordered list of steps that replaces that model's default routing for your whole org. A
step is { model, provider? } (a bare string is shorthand for
{ model }), and cross-model chains are first-class: "my own GPU →
Krutrim → OpenRouter" is a valid chain. The same JSON shape is used everywhere: REST, MCP,
and the dashboard.
- model is a catalog id or a provider/model-id BYOK id.
- provider is a catalog provider id (see GET /v1/providers) or a byoe:<slug> custom endpoint.

For example:

    PUT /me/routing/llama-3.1-8b-instruct
    { "steps": [
      { "model": "llama-3.1-8b-instruct", "provider": "vllm" },
      { "model": "llama-3.1-8b-instruct", "provider": "krutrim" },
      { "model": "mistral/mistral-large-latest" }
    ] }
    → { "ok": true, "model": "llama-3.1-8b-instruct", "steps": [ ... ] }   // effective within a minute

| Endpoint | What it does |
|---|---|
| GET /me/routing | List your org's saved chains. |
| GET /me/routing/:model | Get the chain for one model. |
| PUT /me/routing/:model | Save or replace the chain (owner/admin). |
| DELETE /me/routing/:model | Remove it; routing returns to default (owner/admin). |
A per-request fallbacks array (same step shape, on the chat/embeddings body)
overrides the saved chain for that single call. Chains can be shared and reused as
collections, and steps can point at your own
registered endpoints. Agents manage chains over
MCP with get_fallback_chains,
set_fallback_chain and clear_fallback_chain.
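
Two rules from this section are easy to get wrong when building chains programmatically: the bare-string shorthand and the precedence of a per-request fallbacks array over the saved chain. A minimal sketch follows, with hypothetical helper names; treating any fallbacks key present in the body as an override is an assumption.

```python
def normalize_step(step):
    """Expand the bare-string shorthand into the { model, provider? } shape."""
    if isinstance(step, str):
        return {"model": step}
    if "model" not in step:
        raise ValueError("a step needs a model")
    return dict(step)


def effective_chain(saved_chain, request_body):
    """Per-request fallbacks win over the org's saved chain for that one call."""
    steps = request_body["fallbacks"] if "fallbacks" in request_body else saved_chain
    return [normalize_step(s) for s in steps]


saved = [
    {"model": "llama-3.1-8b-instruct", "provider": "vllm"},
    "mistral/mistral-large-latest",
]
request = {"model": "llama-3.1-8b-instruct", "fallbacks": ["llama-3.1-8b-instruct"]}
chain = effective_chain(saved, request)
```

Normalizing once at the edge means the rest of a client can assume every step is a dict with a model key.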
For streamed requests the gateway injects stream_options.include_usage
on providers that support it, and parses the usage block from the final SSE chunk, so
streamed and non-streamed requests are metered identically and your
credit debits always reflect real token counts.
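
For readers who want to check metering on their side, the same usage block can be read from a streamed response. The sketch below assumes the OpenAI-style SSE framing (data: lines, a [DONE] terminator, and a final chunk with empty choices that carries usage); the sample chunks are made up for illustration.

```python
import json


def usage_from_sse(lines):
    """Return the usage dict from an OpenAI-style SSE stream, or None if absent."""
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue                      # skip blank lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):            # null on content chunks, set on the last one
            usage = chunk["usage"]
    return usage


stream = [
    'data: {"choices": [{"delta": {"content": "nam"}}], "usage": null}',
    'data: {"choices": [{"delta": {"content": "aste"}}], "usage": null}',
    'data: {"choices": [], "usage": {"prompt_tokens": 3, "completion_tokens": 2, "total_tokens": 5}}',
    "data: [DONE]",
]
usage = usage_from_sse(stream)
```

If a provider does not support the option, no usage chunk arrives and the function returns None, so callers should treat the result as optional.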