
Audio — speech

BharatRouter serves audio the same way it serves chat: one API key, the OpenAI wire format, INR pricing. Both audio models run on India-resident self-hosted infrastructure — transcription with Whisper, synthesis with Kokoro (Hindi-capable voices). The route that served the request is echoed in the x-br-provider response header.

Speech to text

POST /v1/audio/transcriptions — OpenAI-compatible multipart upload. Model whisper-large-v3. Audio file up to 25 MB (mp3/wav/m4a/webm and the usual formats). Billed per minute of audio at per-second granularity (1 second minimum).
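
The billing rule above (per-minute rate, per-second granularity, 1 second minimum) can be sketched as a small estimator. This is only an illustration: the rate used here is a made-up placeholder, not a published price, and rounding partial seconds up is an assumption, since the rule states only the granularity and the minimum.

```python
import math


def billable_seconds(duration_s: float) -> int:
    """Seconds charged for one clip: whole seconds, never fewer than 1."""
    # Assumption: a partial second rounds up to the next whole second.
    return max(1, math.ceil(duration_s))


def transcription_cost_inr(duration_s: float, rate_inr_per_min: float) -> float:
    """Prorate a per-minute rate over the billable seconds of one clip."""
    return billable_seconds(duration_s) * rate_inr_per_min / 60


# Hypothetical rate, for illustration only; check the pricing page for real values.
EXAMPLE_RATE_INR_PER_MIN = 0.60

for duration in (0.4, 45.0, 90.5):
    cost = transcription_cost_inr(duration, EXAMPLE_RATE_INR_PER_MIN)
    print(f"{duration:>5} s -> {billable_seconds(duration)} billable s, {cost:.4f} INR")
```

Under these assumptions a 0.4 second clip is charged as 1 second, and a 90.5 second clip as 91 seconds rather than 2 full minutes.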

curl https://api.bharatrouter.com/v1/audio/transcriptions \
  -H "Authorization: Bearer br-..." \
  -F file=@call.mp3 \
  -F model=whisper-large-v3 \
  -F language=hi \
  -F response_format=json

Fields: file and model are required; language, prompt, temperature and response_format are optional. response_format controls the reply shape:

response_format    Reply
json (default)     { "text": "..." }
text               Plain text body (text/plain).
verbose_json       Full object with text, language, duration and segments.

from openai import OpenAI
client = OpenAI(base_url="https://api.bharatrouter.com/v1", api_key="br-...")

with open("call.mp3", "rb") as f:
    r = client.audio.transcriptions.create(model="whisper-large-v3", file=f, language="hi")
print(r.text)
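
When calling the endpoint without the SDK, the three reply shapes listed above need slightly different handling. The following is a minimal sketch of one way to normalize them; extract_transcript is a hypothetical helper name, and the sample body is invented for illustration. The start, end and text keys inside each segment are an assumption based on the usual Whisper output and are not confirmed here.

```python
import json

SUPPORTED_FORMATS = ("json", "text", "verbose_json")


def extract_transcript(body: str, response_format: str = "json") -> dict:
    """Normalize a transcription reply body into a dict with text and segments."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if response_format == "text":
        # Plain text reply: the whole body is the transcript.
        return {"text": body.strip(), "segments": []}
    data = json.loads(body)
    if response_format == "json":
        return {"text": data["text"], "segments": []}
    # verbose_json: keep the extra fields named in the table above.
    return {
        "text": data["text"],
        "language": data.get("language"),
        "duration": data.get("duration"),
        "segments": data.get("segments", []),
    }


# Invented sample body; the segment keys are assumed, not documented here.
sample = json.dumps({
    "text": "namaste",
    "language": "hi",
    "duration": 1.2,
    "segments": [{"start": 0.0, "end": 1.2, "text": "namaste"}],
})
result = extract_transcript(sample, "verbose_json")
print(result["text"], result["duration"], len(result["segments"]))
```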

Text to speech

POST /v1/audio/speech — JSON body. Model kokoro-82m. The default voice hf_alpha is Hindi-capable (other voices include hf_beta, hm_omega, hm_psi). Billed per 1 million input characters. The reply is raw audio bytes; response_format selects the container (mp3 default, plus wav, opus, flac).

curl https://api.bharatrouter.com/v1/audio/speech \
  -H "Authorization: Bearer br-..." -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "नमस्ते, आपका स्वागत है।",
    "voice": "hf_alpha",
    "response_format": "mp3"
  }' --output hello.mp3
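
The same call can be sketched in Python using only the standard library. The request below mirrors the curl example field for field; build_speech_request and synthesize are hypothetical helper names introduced here, and the 60 second timeout is an arbitrary choice. After saving the audio bytes, synthesize returns the x-br-provider response header so the serving route can be logged.

```python
import json
import urllib.request

API_URL = "https://api.bharatrouter.com/v1/audio/speech"


def build_speech_request(text, api_key, voice="hf_alpha", response_format="mp3"):
    """Build the POST request for /v1/audio/speech without sending it."""
    payload = {
        "model": "kokoro-82m",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, api_key, out_path, **options):
    """Send the request, write the raw audio bytes, return x-br-provider."""
    request = build_speech_request(text, api_key, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        with open(out_path, "wb") as audio_file:
            audio_file.write(response.read())
        # Header lookup is case-insensitive; None if the header is absent.
        return response.headers.get("x-br-provider")


# Usage (performs a real, billed request, so it is left commented out):
# provider = synthesize("नमस्ते, आपका स्वागत है।", "br-...", "hello.mp3")
# print(provider)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without spending any credits.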

Notes