---
language:
- lt
tags:
- whisper
- faster-whisper
- ctranslate2
- automatic-speech-recognition
- lithuanian
- medical
license: apache-2.0
base_model: openai/whisper-large-v3
library_name: ctranslate2
pipeline_tag: automatic-speech-recognition
---

[English](#english) | [Lietuvių](#lietuvių)

# English

## LT_AI_Medical: Lithuanian Medical Whisper

A [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) model fine-tuned for Lithuanian medical speech-to-text transcription and converted to CTranslate2 format for use with [faster-whisper](https://github.com/SYSTRAN/faster-whisper).

## Model Details

- **Base model:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Fine-tuning method:** LoRA (Low-Rank Adaptation) adapters, merged into the base model
- **Format:** CTranslate2 (float16 quantization)
- **Language:** Lithuanian (`lt`)
- **Domain:** Medical dictation (radiology, family medicine)
- **Dataset:** [VSSA-SDSA/LT_Medical_S_corpus](https://huggingface.co/datasets/VSSA-SDSA/LT_Medical_S_corpus)
- **Code repository:** [github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical](https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical)

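
The details above say that LoRA adapters were merged into the base model and the result was exported to CTranslate2 in float16. The sketch below shows one way such a pipeline could look, assuming the `transformers`, `peft`, and `ctranslate2` packages. It is not necessarily the procedure the authors used, and the directory names and both function names are hypothetical.

```python
def merge_lora(adapter_dir: str, merged_dir: str,
               base_model: str = "openai/whisper-large-v3") -> None:
    """Fold LoRA adapter weights into the base Whisper model and save the result."""
    # Heavy dependencies are imported lazily so this module loads without them.
    from peft import PeftModel
    from transformers import (
        WhisperFeatureExtractor,
        WhisperForConditionalGeneration,
        WhisperTokenizerFast,
    )

    base = WhisperForConditionalGeneration.from_pretrained(base_model)
    # Attach the adapters, then merge them into the base weights.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(merged_dir)
    # Save tokenizer.json and preprocessor_config.json next to the weights,
    # because the converter command below copies them into the output folder.
    WhisperTokenizerFast.from_pretrained(base_model).save_pretrained(merged_dir)
    WhisperFeatureExtractor.from_pretrained(base_model).save_pretrained(merged_dir)


def convert_command(merged_dir: str, ct2_dir: str) -> list[str]:
    """Build the ct2-transformers-converter call for a float16 export."""
    return [
        "ct2-transformers-converter",
        "--model", merged_dir,
        "--output_dir", ct2_dir,
        "--copy_files", "tokenizer.json", "preprocessor_config.json",
        "--quantization", "float16",
    ]
```

After `merge_lora("lora-adapters", "merged-model")`, the export could be run with `subprocess.run(convert_command("merged-model", "ct2-model"), check=True)`; the resulting folder can then be passed to `WhisperModel` as a local path.
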
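## Usage

### With faster-whisper

```python
from faster_whisper import WhisperModel

# Download the CTranslate2 weights from the Hugging Face Hub and load them on the GPU.
model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus an info object.
segments, _ = model.transcribe(
    "audio.wav",
    language="lt",    # skip automatic language detection
    beam_size=5,
    vad_filter=True,  # drop non-speech audio before decoding
)

# Iterating over the generator is what actually runs the transcription.
text = " ".join(segment.text.strip() for segment in segments)
print(text)
```
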
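
The snippet above joins all segments into a single string and assumes a CUDA GPU. As a hedged variation, the sketch below keeps per-segment timestamps and falls back to CPU with `int8` compute. The helper names are hypothetical; the only faster-whisper features it relies on are the `start`, `end`, and `text` attributes of each segment and the `device` and `compute_type` arguments of `WhisperModel`.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments) -> str:
    """Return one line per segment in the form '[start -> end] text'."""
    return "\n".join(
        f"[{format_timestamp(seg.start)} -> {format_timestamp(seg.end)}] {seg.text.strip()}"
        for seg in segments
    )


def transcribe_with_timestamps(audio_path: str, use_gpu: bool = True) -> str:
    """Transcribe a file and return timestamped lines (hypothetical helper)."""
    from faster_whisper import WhisperModel  # imported lazily: heavy dependency

    # Assumption: int8 on CPU is an acceptable speed/accuracy trade-off.
    device, compute_type = ("cuda", "float16") if use_gpu else ("cpu", "int8")
    model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device=device, compute_type=compute_type)
    segments, _ = model.transcribe(audio_path, language="lt", beam_size=5, vad_filter=True)
    return format_segments(segments)
```

Calling `print(transcribe_with_timestamps("audio.wav", use_gpu=False))` would print one timestamped line per segment on a machine without a GPU.
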
### With the provided `transcribe.py` script

```bash
git clone https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical
cd LT_AI_Medical
pip install -r requirements.txt
python transcribe.py sample.wav
```

## Intended Use

This model is designed for transcribing Lithuanian medical speech, particularly:

- Radiology reports
- Family medicine consultations

## Limitations

- Trained primarily on medical vocabulary, so accuracy may be lower on general Lithuanian speech
- Performance may degrade on accents or dialects outside the training distribution
- Whisper decodes audio in 30-second windows; on longer recordings, silence or non-speech stretches may produce hallucinated text unless VAD filtering is enabled (`vad_filter=True` in faster-whisper)

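
For long dictations, the hallucination risk noted above can usually be reduced through decoding options. The sketch below is one possible configuration, not a tuned recommendation: `vad_parameters` and `condition_on_previous_text` are existing faster-whisper `transcribe()` arguments, while the 500 ms silence threshold and the helper names are assumptions made for illustration.

```python
def long_audio_options(min_silence_ms: int = 500) -> dict:
    """Keyword arguments for WhisperModel.transcribe() aimed at long recordings."""
    return {
        "language": "lt",
        "beam_size": 5,
        # Remove non-speech audio before it reaches the decoder.
        "vad_filter": True,
        # Assumed value: silences at least this long split the speech chunks.
        "vad_parameters": {"min_silence_duration_ms": min_silence_ms},
        # Do not feed the previous window's text into the next one, so a single
        # hallucinated phrase is less likely to repeat across windows.
        "condition_on_previous_text": False,
    }


def transcribe_long(audio_path: str) -> str:
    """Transcribe a long recording with the options above (hypothetical helper)."""
    from faster_whisper import WhisperModel  # imported lazily: heavy dependency

    model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")
    segments, _ = model.transcribe(audio_path, **long_audio_options())
    return " ".join(seg.text.strip() for seg in segments)
```

Disabling `condition_on_previous_text` can cost some consistency between windows, so it is worth comparing both settings on a few representative recordings.
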
## License

This model is released under the Apache 2.0 license, the same license as the base Whisper large-v3 model.

---

# Lietuvių

## LT_AI_Medical: lietuviškas medicininis Whisper modelis

Papildomai apmokytas [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) modelis lietuvių kalbos medicininės šnekos atpažinimui, konvertuotas į CTranslate2 formatą ir skirtas naudoti su [faster-whisper](https://github.com/SYSTRAN/faster-whisper) biblioteka.

## Modelio informacija

- **Bazinis modelis:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Apmokymo metodas:** LoRA (Low-Rank Adaptation) adapteriai, sujungti su baziniu modeliu
- **Formatas:** CTranslate2 (float16 kvantizacija)
- **Kalba:** Lietuvių (`lt`)
- **Sritis:** Medicininis diktavimas (radiologija, šeimos medicina)
- **Duomenų rinkinys:** [VSSA-SDSA/LT_Medical_S_corpus](https://huggingface.co/datasets/VSSA-SDSA/LT_Medical_S_corpus)
- **Kodo saugykla:** [github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical](https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical)

## Naudojimas

### Su faster-whisper

```python
from faster_whisper import WhisperModel

# Modelis atsisiunčiamas iš Hugging Face Hub ir įkeliamas į GPU.
model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")

# transcribe() grąžina segmentų generatorių ir informacijos objektą.
segments, _ = model.transcribe(
    "audio.wav",
    language="lt",
    beam_size=5,
    vad_filter=True,
)

# Transkribavimas vykdomas tik iteruojant per segmentus.
text = " ".join(segment.text.strip() for segment in segments)
print(text)
```

### Su pridėtu `transcribe.py` skriptu

```bash
git clone https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical
cd LT_AI_Medical
pip install -r requirements.txt
python transcribe.py sample.wav
```

## Paskirtis

Šis modelis skirtas lietuvių kalbos medicininės šnekos transkribavimui, ypač:

- Radiologijos ataskaitoms
- Šeimos medicinos konsultacijoms

## Apribojimai

- Apmokytas daugiausia su medicininiu žodynu, todėl bendrinę lietuvių šneką gali atpažinti prasčiau
- Tikslumas gali sumažėti, kai kalbama su akcentu ar tarme, kurių nebuvo apmokymo duomenyse
- Whisper apdoroja garsą 30 sekundžių langais; ilgesniuose įrašuose tyla ar ne šnekos atkarpos gali sukelti haliucinacijas, jei neįjungtas VAD filtravimas (`vad_filter=True` faster-whisper bibliotekoje)

## Licencija

Šis modelis platinamas pagal Apache 2.0 licenciją, tą pačią kaip ir bazinio Whisper large-v3 modelio.