---
language:
- lt
tags:
- whisper
- faster-whisper
- ctranslate2
- automatic-speech-recognition
- lithuanian
- medical
license: apache-2.0
base_model: openai/whisper-large-v3
library_name: ctranslate2
pipeline_tag: automatic-speech-recognition
---

# LT_AI_Medical: Lithuanian Medical Whisper

A fine-tuned [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) model for Lithuanian medical speech-to-text transcription, optimized for [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 format).

## Model Details

- **Base model:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Fine-tuning method:** LoRA (Low-Rank Adaptation) adapters, merged into the base model
- **Format:** CTranslate2 (float16 quantization)
- **Language:** Lithuanian (`lt`)
- **Domain:** Medical dictation (radiology, family medicine)
- **Dataset:** [VSSA-SDSA/LT_Medical_S_corpus](https://huggingface.co/datasets/VSSA-SDSA/LT_Medical_S_corpus)
- **Code repository:** [github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical](https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical)

## Usage

### With faster-whisper

```python
from faster_whisper import WhisperModel

# Load the CTranslate2 weights from the Hugging Face Hub onto the GPU.
model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus an info object.
segments, _ = model.transcribe(
    "audio.wav",
    language="lt",
    beam_size=5,
    vad_filter=True,  # drop non-speech regions before decoding
)

# Iterating over the generator is what actually runs the transcription.
text = " ".join(segment.text.strip() for segment in segments)
print(text)
```

### With the provided transcribe.py script

```bash
git clone https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical
cd LT_AI_Medical
pip install -r requirements.txt
python transcribe.py sample.wav
```

## Intended Use

This model is designed for transcribing Lithuanian medical speech, particularly:

- Radiology reports
- Family medicine consultations

## Limitations

- Trained primarily on medical vocabulary, so it may perform worse on general Lithuanian speech.
- Performance may degrade on accents or dialects outside the training distribution.
- Audio longer than 30 seconds may produce hallucinations unless VAD filtering is enabled and suitably configured.

## License

This model is released under the Apache 2.0 license, inherited from the base Whisper large-v3 model.
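## Long Recordings

The long-audio limitation noted under Limitations can be made more concrete with a small wrapper around `transcribe()`. The sketch below is an illustration, not part of the published repository: the helper names `join_segments`, `transcribe_long`, and `load_model` are hypothetical, and the 500 ms silence threshold is only a starting value taken from common faster-whisper usage, not a setting tuned for this model.

```python
from typing import Any, Iterable


def join_segments(segments: Iterable[Any]) -> str:
    """Join segment texts into one string, skipping empty segments."""
    parts = (segment.text.strip() for segment in segments)
    return " ".join(part for part in parts if part)


def transcribe_long(model: Any, audio_path: str, min_silence_ms: int = 500) -> str:
    """Transcribe a long recording with VAD filtering enabled.

    `min_silence_ms` is an assumed starting point, not a tuned value.
    """
    segments, _info = model.transcribe(
        audio_path,
        language="lt",
        beam_size=5,
        vad_filter=True,
        # Silences longer than this are treated as gaps between speech chunks.
        vad_parameters=dict(min_silence_duration_ms=min_silence_ms),
    )
    return join_segments(segments)


def load_model(device: str = "cuda", compute_type: str = "float16") -> Any:
    """Load the model; the import is lazy so the helpers above need no GPU."""
    from faster_whisper import WhisperModel

    return WhisperModel("VSSA-SDSA/LT_AI_Medical", device=device, compute_type=compute_type)
```

Typical use would be `print(transcribe_long(load_model(), "dictation.wav"))`, where `dictation.wav` is a placeholder file name. Passing the model in as an argument keeps the helper easy to check against a stub object before running it on real audio.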