Gnani Vachana — Indian-Language Speech (STT + TTS)

Vachana is Gnani's hosted speech platform built for Indian languages. It covers speech-to-text and text-to-speech across 10 Indian languages plus Hinglish code-switching, with inference served from Gnani's servers. No model weights are shipped — you authenticate with an API key and call the pipeline.

Vachana is trained on over 14 million hours of Indic speech data, the largest training corpus for Indian languages in production today. The models are built for conditions that generic ASR systems handle poorly: telephony-grade audio (8 kHz, PSTN), noisy field environments, regional accents across tier-2 and rural India, and natural code-switching between Hindi and English in the same utterance.

Both STT and TTS are available over REST (file-based) and real-time WebSocket streaming, with automatic language detection across all 10 supported languages.

Performance

Vachana STT delivers 10–20% lower Word Error Rate (WER) than leading alternatives on Indic-language benchmarks. The gap widens on noisy audio such as call-center recordings, field recordings, and telephony captures, where background noise and channel distortion are typical.

Metric	Value
STT latency	P95 < 200 ms (streaming)
TTS naturalness	MOS 4.23
Languages	10 Indian languages + Hinglish code-mixed and Latin-script variants
Audio input	Optimized for both broadband and 8 kHz telephony
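
For readers who want to reproduce a WER comparison on their own recordings, here is a minimal sketch of the metric. It assumes whitespace tokenization and no text normalization, and it reads "10–20% lower" as a relative reduction; the source does not say whether the figure is relative or absolute. The function names are illustrative and are not part of the Vachana package.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (all rows before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (assumed relative reading)."""
    return (baseline_wer - new_wer) / baseline_wer
```

In practice, transcripts are usually normalized (casing, punctuation, numerals, script) before scoring, and that choice can move WER noticeably for code-mixed text.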

Single repo · Hosted inference · No weights shipped

This is the Hugging Face integration for Gnani's Vachana speech platform. All inference happens on Gnani's servers — no model weights, tokenizers, or processor files are included. You just need an API key.

Get Your API Key

Sign up at app.vachana.ai
Or email speechstack@gnani.ai

Installation

pip install gnani-vachana transformers

STT — Speech-to-Text

REST (file-based)

import os
# STT REST needs all three credentials; replace the placeholders with your own values
os.environ["GNANI_API_KEY"] = "your-api-key"
os.environ["GNANI_ORGANIZATION_ID"] = "your-org-id"
os.environ["GNANI_USER_ID"] = "your-user-id"

from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="gnani-ai/vachana",
    trust_remote_code=True,
)
result = pipe("audio.wav", language_code="hi-IN")
print(result["text"])
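
To run the same REST call over several files, a small loop that collects transcripts and records failures separately may be useful. The sketch below assumes only the call shape shown above (`pipe(path, language_code=...)` returning a dict with a `"text"` key). The helper name `transcribe_files` is hypothetical, and catching every exception is a simplification.

```python
from typing import Callable, Iterable


def transcribe_files(
    pipe: Callable[..., dict],
    paths: Iterable[str],
    language_code: str,
) -> tuple[dict[str, str], dict[str, str]]:
    """Transcribe each path; return (transcripts, failures) keyed by path."""
    transcripts: dict[str, str] = {}
    failures: dict[str, str] = {}
    for path in paths:
        try:
            # Same call shape as the single-file example: one request per file.
            transcripts[path] = pipe(path, language_code=language_code)["text"]
        except Exception as exc:  # broad on purpose: keep the batch going
            failures[path] = str(exc)
    return transcripts, failures
```

Called as `transcribe_files(pipe, ["a.wav", "b.wav"], "hi-IN")`, it returns the successful transcripts and a separate dict of error messages, so one bad file does not abort the batch.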

Realtime (WebSocket streaming)

import os
os.environ["GNANI_API_KEY"] = "your-api-key"

from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="gnani-ai/vachana",
    trust_remote_code=True,
)
result = pipe("audio.wav", language_code="hi-IN", use_streaming=True)
print(result["text"])

Note: Realtime STT only requires GNANI_API_KEY.

TTS — Text-to-Speech

REST (one-shot)

import os
os.environ["GNANI_API_KEY"] = "your-api-key"

from transformers import pipeline

pipe = pipeline(
    "text-to-speech",
    model="gnani-ai/vachana",
    trust_remote_code=True,
)
result = pipe("नमस्ते, आप कैसे हैं?", voice="Simran")
# result["audio"]          → bytes (WAV)
# result["sampling_rate"]  → int

with open("output.wav", "wb") as f:
    f.write(result["audio"])

Realtime (WebSocket streaming)

# Reuses the text-to-speech `pipe` created in the REST (one-shot) example
result = pipe(
    "Hello, how are you?",
    voice="Karan",
    use_streaming=True,
    sample_rate=22050,
    container="wav",
)
with open("output.wav", "wb") as f:
    f.write(result["audio"])
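
Both TTS examples return WAV bytes in `result["audio"]`. A quick way to sanity-check what came back, without extra dependencies, is to parse the header with Python's standard `wave` module. This sketch assumes the bytes are a complete, uncompressed PCM WAV file; the helper name `describe_wav` is illustrative.

```python
import io
import wave


def describe_wav(audio_bytes: bytes) -> dict:
    """Read basic properties from in-memory WAV bytes (PCM assumed)."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sampling_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

For example, `describe_wav(result["audio"])["sampling_rate"]` can be compared against the rate you requested before the file is handed to a telephony or playback system.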

Supported Languages

Vachana supports 10 Indian languages.

STT languages: Supported STT Languages
TTS languages: Supported TTS Languages

TTS Voices

Voice ID	Gender	Description
`Karan`	Male	Bold, Trustworthy
`Simran`	Female	Confident, Bright
`Nara`	Female	Gentle, Expressive
`Riya`	Female	Cheerful, Energetic
`Viraj`	Male	Commanding, Dynamic
`Raju`	Male	Grounded, Conversational
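
Because each request goes to a hosted endpoint, it can help to validate the voice ID locally before calling the pipeline. The sketch below copies the table above into a dict. It assumes voice IDs are matched exactly as written (case-sensitive), which the source does not confirm, and the helper names are hypothetical.

```python
# Voice IDs and genders copied from the TTS Voices table.
VOICES = {
    "Karan": "Male",
    "Simran": "Female",
    "Nara": "Female",
    "Riya": "Female",
    "Viraj": "Male",
    "Raju": "Male",
}


def voices_by_gender(gender: str) -> list[str]:
    """Return voice IDs whose listed gender matches, ignoring case."""
    return [name for name, g in VOICES.items() if g.lower() == gender.lower()]


def validate_voice(voice: str) -> str:
    """Return the voice ID unchanged, or raise if it is not in the table."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(VOICES)}")
    return voice
```

A call such as `pipe(text, voice=validate_voice("Simran"))` then fails fast on a typo instead of spending a network round trip.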

Environment Variables

Variable	Required For	Description
`GNANI_API_KEY`	All endpoints	Your Vachana API key
`GNANI_ORGANIZATION_ID`	STT REST only	Your organization ID
`GNANI_USER_ID`	STT REST only	Your user ID
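
The requirements in this table can be checked before a pipeline is built, so that a missing credential surfaces as a clear local error. This is a minimal sketch: the mode labels (`stt_rest`, `stt_realtime`, and so on) and the function name are hypothetical, and only the variable names come from the table.

```python
import os
from typing import Mapping, Optional

# Required variables per mode, following the Environment Variables table.
REQUIRED_ENV = {
    "stt_rest": ("GNANI_API_KEY", "GNANI_ORGANIZATION_ID", "GNANI_USER_ID"),
    "stt_realtime": ("GNANI_API_KEY",),
    "tts_rest": ("GNANI_API_KEY",),
    "tts_realtime": ("GNANI_API_KEY",),
}


def missing_env(mode: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variables that are unset or empty for the given mode."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[mode] if not env.get(name)]
```

Calling `missing_env("stt_rest")` at startup returns an empty list when everything is set, or the names that still need values.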

Intended Use

Vachana is built for production speech applications in Indian language contexts. Primary use cases:

Contact center and IVR automation: optimized for telephony-grade audio (8 kHz, PSTN/VoIP), the dominant deployment environment for Indian enterprise voice
Conversational AI and voice agents: real-time streaming STT with Hinglish code-switching support for consumer-facing bots where speakers mix Hindi and English naturally mid-sentence
Field and mobile applications: robust to ambient noise, low-quality microphones, and regional accent variation across tier-2 and rural India
Multilingual transcription pipelines: batch or streaming transcription for content, compliance, or analytics workflows across 10 Indian languages
TTS for voice interfaces: natural-sounding synthesis for IVR prompts, notification audio, and agent response generation in Indian languages

Vachana performs well on audio that typically degrades generic ASR: noisy environments, narrow-band telephony, accented regional speech, and code-mixed utterances. These are supported use cases, not edge cases.

Out-of-Scope Use

Languages outside the supported 10 Indian languages and Hinglish variants
High-accuracy transcription of non-Indian English accents (use en-IN for Indian English specifically)
Offline or on-device inference: all inference runs on Gnani's hosted infrastructure and requires an active API key and network connectivity
Applications requiring model fine-tuning, weight access, or custom vocabulary injection at the architecture level: Vachana is a hosted API, not an open model
Medical, legal, or safety-critical transcription without human review — as with any ASR system, outputs should be validated before use in high-stakes decisions
