---
language:
- lt
tags:
- whisper
- faster-whisper
- ctranslate2
- automatic-speech-recognition
- lithuanian
- medical
license: apache-2.0
base_model: openai/whisper-large-v3
library_name: ctranslate2
pipeline_tag: automatic-speech-recognition
---
[English](#english) | [Lietuvių](#lietuvių)
# English
## LT_AI_Medical — Lithuanian Medical Whisper
A fine-tuned [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) model for Lithuanian medical
speech-to-text transcription, optimized for [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2
format).
## Model Details
- **Base model:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Fine-tuning method:** LoRA (Low-Rank Adaptation) adapters, merged into the base model
- **Format:** CTranslate2 (float16 quantization)
- **Language:** Lithuanian (`lt`)
- **Domain:** Medical dictation (radiology, family medicine)
- **Dataset:** [VSSA-SDSA/LT_Medical_S_corpus](https://huggingface.co/datasets/VSSA-SDSA/LT_Medical_S_corpus)
- **Code repository:** [github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical](https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical)
## Usage
### With faster-whisper
```python
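from faster_whisper import WhisperModel

# Load the CTranslate2 model from the Hugging Face Hub onto the GPU.
model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus transcription info.
segments, _ = model.transcribe(
    "audio.wav",
    language="lt",    # skip automatic language detection
    beam_size=5,
    vad_filter=True,  # remove non-speech audio before decoding
)

# Iterating over the generator is what actually runs the transcription.
text = " ".join(segment.text.strip() for segment in segments)
print(text)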
```
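When per-segment timestamps are useful (for example, to align a transcript with the recording), note that each segment yielded by `transcribe` exposes `start`, `end`, and `text`. The following is a minimal sketch rather than part of the repository: `format_segments` and `transcribe_with_timestamps` are hypothetical helper names, and falling back to `int8` on CPU is an assumption about a workable setting without a GPU, not a configuration documented for this model.

```python
from typing import Iterable


def format_segments(segments: Iterable) -> str:
    """Render segments as '[start -> end] text' lines.

    Accepts any objects exposing .start and .end (seconds) and .text,
    which is the shape of faster-whisper segments.
    """
    lines = []
    for seg in segments:
        lines.append(f"[{seg.start:7.2f}s -> {seg.end:7.2f}s] {seg.text.strip()}")
    return "\n".join(lines)


def transcribe_with_timestamps(path: str, device: str = "cuda") -> str:
    # Imported lazily so format_segments works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumption: float16 on GPU (as in the example above), int8 on CPU.
    compute_type = "float16" if device == "cuda" else "int8"
    model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device=device, compute_type=compute_type)
    segments, _info = model.transcribe(path, language="lt", beam_size=5, vad_filter=True)
    return format_segments(segments)
```

Calling `transcribe_with_timestamps("audio.wav", device="cpu")` would then return one line per segment; keeping the formatter separate from the model call makes it easy to check on its own.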
### With the provided transcribe.py script
```bash
git clone https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical
cd LT_AI_Medical
pip install -r requirements.txt
python transcribe.py sample.wav
```
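To process a folder of recordings, the script can be invoked once per file. This is a minimal sketch that assumes only what the command above shows, namely that `transcribe.py` takes a single audio path as a positional argument. `find_wavs` and `transcribe_folder` are hypothetical helpers, and because the script's output format is not described here, the sketch simply captures whatever it prints to stdout.

```python
import subprocess
import sys
from pathlib import Path


def find_wavs(folder: str) -> list[Path]:
    """Return the .wav files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )


def transcribe_folder(folder: str, script: str = "transcribe.py") -> dict[str, str]:
    """Run the script once per file and collect its standard output."""
    results = {}
    for wav in find_wavs(folder):
        # Assumption: one positional audio path, as in the command above.
        proc = subprocess.run(
            [sys.executable, script, str(wav)],
            capture_output=True,
            text=True,
            check=True,  # raise if the script exits with a non-zero status
        )
        results[wav.name] = proc.stdout.strip()
    return results
```

Each call starts a new Python process, so the model is reloaded for every file; for large batches, loading `WhisperModel` once and looping in-process is likely to be faster.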
## Intended Use
This model is designed for transcribing Lithuanian medical speech, particularly:
- Radiology reports
- Family medicine consultations
## Limitations
- Trained primarily on medical vocabulary, so accuracy may be lower on general Lithuanian speech
- Performance may degrade on accents or dialects outside the training distribution
- Recordings longer than Whisper's 30-second window may produce hallucinated text, particularly over silence or noise, unless VAD filtering is enabled (`vad_filter=True`)
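The VAD point can be made concrete. Alongside `vad_filter=True`, faster-whisper accepts a `vad_parameters` argument for tuning the voice activity detector. The sketch below builds the keyword arguments for `transcribe`; the helper names `vad_transcribe_kwargs` and `transcribe_long` are hypothetical, and the 500 ms silence threshold is an illustrative assumption, not a setting documented for this model.

```python
def vad_transcribe_kwargs(min_silence_ms: int = 500) -> dict:
    """Keyword arguments for WhisperModel.transcribe with VAD enabled."""
    return {
        "language": "lt",
        "beam_size": 5,
        "vad_filter": True,  # remove non-speech audio before decoding
        # How long silence must last (in ms) before a speech chunk is closed.
        "vad_parameters": {"min_silence_duration_ms": min_silence_ms},
    }


def transcribe_long(path: str) -> str:
    # Imported lazily so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")
    segments, _info = model.transcribe(path, **vad_transcribe_kwargs())
    return " ".join(seg.text.strip() for seg in segments)
```

A suitable threshold depends on the dictation style, so it is worth checking on a few representative recordings before fixing a value.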
## License
This model is released under the Apache 2.0 license, the same license as the base Whisper large-v3 model.

---
# Lietuvių
## LT_AI_Medical — Lietuviškas medicininis Whisper modelis
Apmokytas [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) modelis lietuvių kalbos
medicininio kalbinio teksto atpažinimui, optimizuotas [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
bibliotekai (CTranslate2 formatas).
## Modelio informacija
- **Bazinis modelis:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Apmokymo metodas:** LoRA (Low-Rank Adaptation) adapteriai, sujungti su baziniu modeliu
- **Formatas:** CTranslate2 (float16 kvantizacija)
- **Kalba:** Lietuvių (`lt`)
- **Sritis:** Medicininis diktavimas (radiologija, šeimos medicina)
- **Duomenų rinkinys:** [VSSA-SDSA/LT_Medical_S_corpus](https://huggingface.co/datasets/VSSA-SDSA/LT_Medical_S_corpus)
- **Kodo saugykla:** [github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical](https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical)
## Naudojimas
### Su faster-whisper
```python
from faster_whisper import WhisperModel
model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")
segments, _ = model.transcribe(
    "audio.wav",
    language="lt",
    beam_size=5,
    vad_filter=True,
)
text = " ".join(segment.text.strip() for segment in segments)
print(text)
```
### Su pridėtu transcribe.py skriptu
```bash
git clone https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical
cd LT_AI_Medical
pip install -r requirements.txt
python transcribe.py sample.wav
```
## Paskirtis
Šis modelis skirtas lietuvių kalbos medicininio kalbinio teksto transkribavimui, ypač:
- Radiologijos ataskaitoms
- Šeimos medicinos konsultacijoms
## Apribojimai
- Apmokytas daugiausia su medicininiu žodynu — gali prasčiau atpažinti bendros lietuvių kalbos tekstą
- Našumas gali pablogėti su akcentais ar tarmėmis, nepatenkančiomis į apmokymo duomenis
- Ilgesnis nei 30 sekundžių garsas gali sukelti haliucinacijas be tinkamo VAD filtravimo
## Licencija
Šis modelis išleistas pagal Apache 2.0 licenciją, paveldėtą iš bazinio Whisper large-v3 modelio licencijos.