Audio — speech

BharatRouter serves audio the same way it serves chat: one API key, the OpenAI wire format, INR pricing. Both audio models run on self-hosted infrastructure resident in India: Whisper for transcription and Kokoro, which offers Hindi-capable voices, for synthesis. The x-br-provider response header echoes the route that served each request.

Speech to text

POST /v1/audio/transcriptions: OpenAI-compatible multipart upload. Model whisper-large-v3. Audio files up to 25 MB are accepted (mp3, wav, m4a, webm and other common formats). Billing is per minute of audio, metered at per-second granularity with a 1-second minimum.
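The per-second metering can be estimated client-side before uploading. The sketch below assumes partial seconds are rounded up (the page only states per-second granularity and a 1-second minimum), and the rate constant is a hypothetical placeholder; the real ₹/minute price is on the catalog.

```python
import math

# Hypothetical rate for illustration only; check the catalog for the real ₹/minute price.
ASR_RATE_INR_PER_MIN = 0.50


def billable_seconds(duration_s: float) -> int:
    """Whole seconds that would be billed for a clip of `duration_s` seconds.

    Assumption: partial seconds round up (ceil); any clip bills at least 1 s.
    """
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    return max(1, math.ceil(duration_s))


def estimate_asr_cost(duration_s: float, rate_per_min: float = ASR_RATE_INR_PER_MIN) -> float:
    """Estimated ₹ charge: billable seconds converted to minutes, times the per-minute rate."""
    return billable_seconds(duration_s) / 60 * rate_per_min
```

Under these assumptions a 0.2 s clip bills as 1 s and a 61.3 s clip as 62 s.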

curl https://api.bharatrouter.com/v1/audio/transcriptions \
  -H "Authorization: Bearer br-..." \
  -F file=@call.mp3 \
  -F model=whisper-large-v3 \
  -F language=hi \
  -F response_format=json

Fields: file and model are required; language, prompt, temperature and response_format are optional. response_format controls the reply shape:

response_format	Reply
`json` (default)	`{ "text": "..." }`
`text`	Plain text body (`text/plain`).
`verbose_json`	Full object with `text`, `language`, `duration` and segments.
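Because the reply shape depends on response_format, a caller working with raw HTTP bodies may want to normalize them. The helper below is a hypothetical sketch that follows the table above; stripping surrounding whitespace from the `text` body is an assumption, not documented behavior.

```python
import json


def parse_transcription(body: str, response_format: str = "json") -> dict:
    """Normalize a /v1/audio/transcriptions reply body into a dict with at least 'text'."""
    if response_format == "text":
        # Plain text/plain body; assumption: leading/trailing whitespace is not meaningful.
        return {"text": body.strip()}
    if response_format in ("json", "verbose_json"):
        # json -> {"text": ...}; verbose_json adds language, duration and segments.
        data = json.loads(body)
        if "text" not in data:
            raise ValueError("reply has no 'text' field")
        return data
    raise ValueError(f"unsupported response_format: {response_format!r}")
```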

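from openai import OpenAI

# Point the standard OpenAI client at BharatRouter.
client = OpenAI(base_url="https://api.bharatrouter.com/v1", api_key="br-...")

with open("call.mp3", "rb") as f:
    # Same fields as the curl example; response_format defaults to json.
    r = client.audio.transcriptions.create(model="whisper-large-v3", file=f, language="hi")
print(r.text)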

Text to speech

POST /v1/audio/speech: JSON body. Model kokoro-82m. The default voice, hf_alpha, is Hindi-capable; other voices include hf_beta, hm_omega and hm_psi. Billed per 1 million input characters. The reply is raw audio bytes, and response_format selects the container: mp3 (default), wav, opus or flac.
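Per-character billing makes TTS cost a simple proportion of input length. This sketch assumes characters are counted as Unicode code points (Python `len()`), which means Devanagari vowel signs and viramas each count as one character; the page does not say how characters are counted, and the rate constant is a hypothetical placeholder.

```python
# Hypothetical rate for illustration only; check the catalog for the real ₹ per 1M characters.
TTS_RATE_INR_PER_M_CHARS = 100.0


def estimate_tts_cost(text: str, rate_per_million: float = TTS_RATE_INR_PER_M_CHARS) -> float:
    """Estimated ₹ charge for synthesizing `text`.

    Assumption: billed characters == Unicode code points, so "नमस्ते" counts as 6.
    """
    return len(text) / 1_000_000 * rate_per_million
```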

curl https://api.bharatrouter.com/v1/audio/speech \
  -H "Authorization: Bearer br-..." -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "नमस्ते, आपका स्वागत है।",
    "voice": "hf_alpha",
    "response_format": "mp3"
  }' --output hello.mp3
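The same speech call can be made from Python without the OpenAI SDK. This is a standard-library sketch that mirrors the curl request above; the function names are hypothetical, and the request is only sent when `synthesize` is called. It also returns the x-br-provider header, if the response carries one.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.bharatrouter.com/v1/audio/speech"
FORMATS = {"mp3", "wav", "opus", "flac"}  # containers listed for response_format


def build_speech_request(text: str, api_key: str, voice: str = "hf_alpha",
                         fmt: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request matching the curl example."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported response_format: {fmt!r}")
    payload = {"model": "kokoro-82m", "input": text, "voice": voice, "response_format": fmt}
    return urllib.request.Request(
        API_URL,
        # Keep Devanagari as UTF-8 rather than \u escapes.
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, api_key: str, out_path: str, **kwargs) -> Optional[str]:
    """Send the request, write the raw audio bytes to out_path, return x-br-provider if present."""
    req = build_speech_request(text, api_key, **kwargs)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
        return resp.headers.get("x-br-provider")
```

For example, `synthesize("नमस्ते, आपका स्वागत है।", "br-...", "hello.mp3")` would write the same file as the curl command.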

Notes

Audio models are single-route and self-hosted, so requests are proxied directly. There is no multi-provider failover for audio, but the same metering and credit debits apply (trial keys stay free).
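Since audio has no provider failover, any retry policy has to live in the caller. The helper below is a hypothetical sketch of exponential backoff with jitter. By default it retries on `OSError`, which also covers urllib's `URLError` and `HTTPError`. In practice you would probably narrow `retry_on` so that client errors such as a 400 are not retried. Whether failed attempts are metered is not stated here.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on `retry_on` exceptions with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays: base_delay, 2*base_delay, 4*base_delay, ... plus up to 0.1 s jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise ValueError("attempts must be >= 1")
```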
Per-model pricing is listed in the catalog: ASR is priced in ₹ per minute, TTS in ₹ per 1M characters.
Both endpoints take a standard API key — see API keys & limits.