---
language:
- lt
tags:
- whisper
- faster-whisper
- ctranslate2
- automatic-speech-recognition
- lithuanian
- medical
license: apache-2.0
base_model: openai/whisper-large-v3
library_name: ctranslate2
pipeline_tag: automatic-speech-recognition
---
[English](#english) | [Lietuvių](#lietuvių)
# English
## LT_AI_Medical — Lithuanian Medical Whisper
A fine-tuned [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) model for Lithuanian medical
speech-to-text transcription, optimized for [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2
format).
## Model Details
- **Base model:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Fine-tuning method:** LoRA (Low-Rank Adaptation) adapters, merged into the base model
- **Format:** CTranslate2 (float16 quantization)
- **Language:** Lithuanian (`lt`)
- **Domain:** Medical dictation (radiology, family medicine)
- **Dataset:** [VSSA-SDSA/LT_Medical_S_corpus](https://huggingface.co/datasets/VSSA-SDSA/LT_Medical_S_corpus)
- **Code repository:** [github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical](https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical)
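In the standard LoRA formulation, each adapted weight matrix $W_0 \in \mathbb{R}^{d \times k}$ stays frozen during training. Only two low-rank factors are learned: $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, with $r \ll \min(d, k)$. Merging folds the scaled update back into the weight, so the merged model keeps the base architecture and inference cost and needs no separate adapter at runtime. This card does not state the rank $r$, the scaling hyperparameter $\alpha$, or which Whisper layers were adapted for this model.

$$
W = W_0 + \frac{\alpha}{r}\, B A
$$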
## Usage
### With faster-whisper
```python
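from faster_whisper import WhisperModel

# Downloads the CTranslate2 model from the Hugging Face Hub on first use
model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")

segments, _ = model.transcribe(
    "audio.wav",
    language="lt",    # skip language detection; the model targets Lithuanian
    beam_size=5,
    vad_filter=True,  # drop non-speech (Silero VAD) before decoding
)

# `segments` is a generator: decoding runs while it is being iterated
text = " ".join(segment.text.strip() for segment in segments)
print(text)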
```
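The example above assumes a CUDA GPU. CTranslate2 can convert the stored float16 weights to another compute type when the model is loaded. The sketch below falls back to CPU and assumes `int8` is an acceptable compute type there; this card does not report how int8 affects this model's accuracy. `pick_device_and_compute_type` and `load_model` are hypothetical helper names, not part of faster-whisper or this repository. `ctranslate2.get_cuda_device_count()` is CTranslate2's own function.

```python
# Hypothetical helpers (not part of faster-whisper or this repository).


def pick_device_and_compute_type(cuda_device_count: int) -> tuple[str, str]:
    """Return a (device, compute_type) pair for CTranslate2.

    On GPU, keep the stored float16 weights. On CPU, request int8; CTranslate2
    converts the float16 weights to int8 while loading (assumed acceptable
    for accuracy, which is not measured here).
    """
    if cuda_device_count > 0:
        return "cuda", "float16"
    return "cpu", "int8"


def load_model(model_id: str = "VSSA-SDSA/LT_AI_Medical"):
    """Load the model on a GPU if one is visible, otherwise on the CPU."""
    # Imported inside the function so the selection logic above can be used
    # (and tested) without faster-whisper installed.
    import ctranslate2
    from faster_whisper import WhisperModel

    device, compute_type = pick_device_and_compute_type(
        ctranslate2.get_cuda_device_count()
    )
    return WhisperModel(model_id, device=device, compute_type=compute_type)
```

With these helpers, `model = load_model()` would replace the `WhisperModel(...)` line in the example above. The `transcribe` call stays the same.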
### With the provided `transcribe.py` script
```bash
git clone https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical
cd LT_AI_Medical
pip install -r requirements.txt
python transcribe.py sample.wav
```
## Intended Use
This model is designed for transcribing Lithuanian medical speech, particularly:
- Radiology reports
- Family medicine consultations
## Limitations
- Fine-tuned primarily on medical vocabulary, so accuracy may be lower on general Lithuanian speech
- Performance may degrade on accents or dialects outside the training distribution
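- Whisper processes audio in 30-second windows; in recordings longer than that, silence and other non-speech passages may produce hallucinated text unless VAD filtering (`vad_filter=True`) is enabled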
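You can check whether a recording crosses the 30-second mark before transcribing it. The sketch below uses only Python's standard `wave` module and assumes uncompressed PCM WAV input; compressed formats such as MP3 would need a decoder instead. `wav_duration_seconds` and `needs_vad` are hypothetical helper names. Keeping `vad_filter=True` for every file, as in the usage example, is the simpler option. A check like this only matters if VAD should stay off for short clips.

```python
import wave

# Whisper's encoder consumes audio in fixed 30-second windows.
WHISPER_WINDOW_SECONDS = 30.0


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV file; `source` is a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def needs_vad(source, threshold: float = WHISPER_WINDOW_SECONDS) -> bool:
    """Hypothetical helper: True if the recording is longer than one Whisper window."""
    return wav_duration_seconds(source) > threshold
```

For example: `model.transcribe(path, language="lt", beam_size=5, vad_filter=needs_vad(path))`.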
## License
This model is released under the Apache 2.0 license, inheriting from the base Whisper large-v3 license.

---
# Lietuvių
## LT_AI_Medical — Lietuviškas medicininis Whisper modelis
Apmokytas [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) modelis lietuvių kalbos
medicininio kalbinio teksto atpažinimui, optimizuotas [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
bibliotekai (CTranslate2 formatas).
## Modelio informacija
- **Bazinis modelis:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Apmokymo metodas:** LoRA (Low-Rank Adaptation) adapteriai, sujungti su baziniu modeliu
- **Formatas:** CTranslate2 (float16 kvantizacija)
- **Kalba:** Lietuvių (`lt`)
- **Sritis:** Medicininis diktavimas (radiologija, šeimos medicina)
- **Duomenų rinkinys:** [VSSA-SDSA/LT_Medical_S_corpus](https://huggingface.co/datasets/VSSA-SDSA/LT_Medical_S_corpus)
- **Kodo saugykla:** [github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical](https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical)
## Naudojimas
### Su faster-whisper
```python
from faster_whisper import WhisperModel
model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")
segments, _ = model.transcribe(
    "audio.wav",
    language="lt",
    beam_size=5,
    vad_filter=True,
)
text = " ".join(segment.text.strip() for segment in segments)
print(text)
```
### Su pridėtu `transcribe.py` skriptu
```bash
git clone https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical
cd LT_AI_Medical
pip install -r requirements.txt
python transcribe.py sample.wav
```
## Paskirtis
Šis modelis skirtas lietuvių kalbos medicininio kalbinio teksto transkribavimui, ypač:
- Radiologijos ataskaitoms
- Šeimos medicinos konsultacijoms
## Apribojimai
- Apmokytas daugiausia su medicininiu žodynu — gali prasčiau atpažinti bendros lietuvių kalbos tekstą
- Našumas gali pablogėti su akcentais ar tarmėmis, nepatenkančiomis į apmokymo duomenis
- Ilgesnis nei 30 sekundžių garsas gali sukelti haliucinacijas be tinkamo VAD filtravimo
## Licencija
Šis modelis išleistas pagal Apache 2.0 licenciją, paveldėtą iš bazinio Whisper large-v3 modelio licencijos.