---
library_name: transformers
license: mit
datasets:
- ai4bharat/IndicVoices
language:
- hi
- gu
- mr
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
---

# Open-Sarika

This is a speech recognition and translation model for Indian languages (Hindi, Gujarati, and Marathi). The model can transcribe speech in these languages and translate between them. This is an open-source implementation inspired by Sarvam AI's Sarika model.

## Model Details

### Model Description

- **Model type:** Speech Recognition and Translation (based on Whisper architecture)
- **Language(s):** Hindi (hi), Gujarati (gu), Marathi (mr)
- **License:** MIT
- **Base Model:** openai/whisper-large-v3

## Uses

### Direct Use

The model can be used for:

1. Transcribing speech in Hindi, Gujarati, and Marathi
2. Translating speech between these languages

Here's a simple example to get started:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

model_id = "theharshithh/open-sarika-v1"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id).to(device)
model.config.forced_decoder_ids = None

# Load and process audio
audio_path = "your_audio.wav"
audio, rate = librosa.load(audio_path, sr=16000)

# Generate transcription
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs)
transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

### Training Data

The model was trained on a variety of datasets, including:

- Project Vaani dataset: A large-scale Indian language collection project by the Indian Institute of Science (IISc) in collaboration with ARTPARK, funded by Google
- High-quality speech recordings in Hindi, Gujarati, and Marathi from AI4Bharat
- Real-world speech data from various sources

### Hardware Requirements

- Minimum RAM: 8GB
- GPU: Recommended for faster inference
- Storage: Model size is approximately 1.5GB

## Model Card Contact

For issues and feedback, please create an issue on the model's repository: https://huggingface.co/theharshithh/open-sarika-v1

## GitHub

GitHub Repo: https://github.com/theharshithh/open-sarika
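
The Direct Use example lets the model pick the language on its own. If you already know which language is spoken, Whisper's `generate` method in `transformers` accepts `language` and `task` keyword arguments. The sketch below is a minimal, hypothetical helper (`build_generate_kwargs` is not part of this repository) that validates those two values against the languages listed in this model card. How the `translate` task behaves for this particular checkpoint is an assumption to verify on your own audio.

```python
# Languages listed in this model card, keyed by Whisper language code.
SUPPORTED_LANGUAGES = {"hi": "Hindi", "gu": "Gujarati", "mr": "Marathi"}
# Task names accepted by Whisper's generate() in transformers.
SUPPORTED_TASKS = ("transcribe", "translate")


def build_generate_kwargs(language: str, task: str = "transcribe") -> dict:
    """Hypothetical helper: check the language/task pair and return
    keyword arguments that can be passed to model.generate()."""
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {language!r}; expected one of "
            f"{sorted(SUPPORTED_LANGUAGES)}"
        )
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task {task!r}; expected one of {SUPPORTED_TASKS}"
        )
    return {"language": code, "task": task}


# Usage with `model` and `inputs` from the Direct Use example
# (assumed to be loaded already):
# output_ids = model.generate(**inputs, **build_generate_kwargs("gu"))
```

Keeping the validation separate from the model call makes it easy to reject an unsupported language before any audio is processed.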