---
language:
  - lt
tags:
  - whisper
  - faster-whisper
  - ctranslate2
  - automatic-speech-recognition
  - lithuanian
  - medical
license: apache-2.0
base_model: openai/whisper-large-v3
library_name: ctranslate2
pipeline_tag: automatic-speech-recognition
---

[English](#english) | [Lietuvių](#lietuvių)

# English

## LT_AI_Medical — Lithuanian Medical Whisper

A fine-tuned [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) model for transcribing Lithuanian
medical speech, converted to CTranslate2 format for use with
[faster-whisper](https://github.com/SYSTRAN/faster-whisper).

## Model Details

- **Base model:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Fine-tuning method:** LoRA (Low-Rank Adaptation) adapters, merged into the base model
- **Format:** CTranslate2 (float16 quantization)
- **Language:** Lithuanian (`lt`)
- **Domain:** Medical dictation (radiology, family medicine)
- **Dataset:** [VSSA-SDSA/LT_Medical_S_corpus](https://huggingface.co/datasets/VSSA-SDSA/LT_Medical_S_corpus)
- **Code repository:** [github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical](https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical)
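
The float16 format listed above targets GPU inference. On a machine without CUDA, a common alternative is to load the same weights on the CPU with `int8` computation. The sketch below shows one way to choose between the two; the helper names `pick_runtime`, `detect_cuda_devices`, and `load_model` are hypothetical, and the `int8` choice is an assumption that has not been evaluated for this model and may affect accuracy.

```python
def pick_runtime(cuda_devices: int) -> tuple[str, str]:
    """Map a CUDA device count to a (device, compute_type) pair."""
    if cuda_devices > 0:
        return "cuda", "float16"
    # Assumption: int8 is a reasonable CPU default; not benchmarked for this model.
    return "cpu", "int8"


def detect_cuda_devices() -> int:
    """Return the number of CUDA devices CTranslate2 can see (0 if unavailable)."""
    try:
        import ctranslate2
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()


def load_model(repo_id: str = "VSSA-SDSA/LT_AI_Medical"):
    """Load the model on the GPU when one is available, otherwise on the CPU."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    device, compute_type = pick_runtime(detect_cuda_devices())
    return WhisperModel(repo_id, device=device, compute_type=compute_type)
```

Calling `load_model()` then replaces the hard-coded `device="cuda"` constructor call on machines where a GPU may or may not be present.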

## Usage

### With faster-whisper

```python
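from faster_whisper import WhisperModel

# Download the CTranslate2 weights from the Hugging Face Hub and load them on the GPU.
model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus a transcription info object.
segments, _ = model.transcribe(
    "audio.wav",
    language="lt",    # skip automatic language detection
    beam_size=5,
    vad_filter=True,  # remove non-speech audio before decoding
)

# Iterating over the generator is what actually runs the transcription.
text = " ".join(segment.text.strip() for segment in segments)
print(text)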
```
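
Each segment returned by faster-whisper also carries `start` and `end` times in seconds, which can be useful when a transcript has to be checked against the recording. The following is a minimal sketch of timestamped output; the helpers `format_timestamp` and `format_segments` are hypothetical names, and the demo uses stand-in objects rather than real model output.

```python
from types import SimpleNamespace


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments) -> list[str]:
    """Turn objects with .start, .end and .text into timestamped lines."""
    return [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]


if __name__ == "__main__":
    # Stand-in segments for illustration only; real ones come from model.transcribe().
    demo = [
        SimpleNamespace(start=0.0, end=2.5, text=" Pirmas sakinys."),
        SimpleNamespace(start=2.5, end=65.5, text=" Antras sakinys."),
    ]
    print("\n".join(format_segments(demo)))
```

With a loaded model, `format_segments(segments)` can be applied directly to the generator returned by `model.transcribe(...)`.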

### With the provided transcribe.py script

```bash
git clone https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical
cd LT_AI_Medical
pip install -r requirements.txt
python transcribe.py sample.wav
```

## Intended Use

This model is designed for transcribing Lithuanian medical speech, particularly:

- Radiology reports
- Family medicine consultations

## Limitations

- Trained primarily on medical vocabulary, so accuracy on general Lithuanian speech may be lower
- Performance may degrade on accents or dialects outside the training distribution
- Whisper decodes audio in 30-second windows; on longer recordings, silence or background noise may produce hallucinated text unless VAD filtering is enabled (`vad_filter=True`)
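
For long recordings, the VAD behaviour mentioned above can be adjusted through the `vad_parameters` argument of `transcribe()`. The sketch below collects one possible set of options; the helper names `long_audio_options` and `transcribe_long` are hypothetical, and the values (a 500 ms silence threshold, `condition_on_previous_text=False`) are illustrative assumptions that have not been tuned for this model.

```python
def long_audio_options(min_silence_ms: int = 500) -> dict:
    """Build keyword arguments for transcribing long dictations."""
    return {
        "language": "lt",
        "beam_size": 5,
        # Remove non-speech audio before decoding.
        "vad_filter": True,
        # Assumption: split on pauses of at least min_silence_ms milliseconds.
        "vad_parameters": {"min_silence_duration_ms": min_silence_ms},
        # Assumption: not feeding earlier text into later windows may limit
        # repetition loops, possibly at some cost in consistency.
        "condition_on_previous_text": False,
    }


def transcribe_long(model, audio_path: str) -> str:
    """Transcribe one file with the options above and return plain text."""
    segments, _ = model.transcribe(audio_path, **long_audio_options())
    return " ".join(segment.text.strip() for segment in segments)
```

Here `model` is a `WhisperModel` instance loaded as shown in the Usage section, for example `transcribe_long(model, "audio.wav")`.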

## License

This model is released under the Apache 2.0 license, the same license as the base Whisper large-v3 model.

---

# Lietuvių

## LT_AI_Medical — Lietuviškas medicininis Whisper modelis

Apmokytas [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) modelis lietuvių kalbos
medicininio kalbinio teksto atpažinimui, optimizuotas [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
bibliotekai (CTranslate2 formatas).

## Modelio informacija

- **Bazinis modelis:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Apmokymo metodas:** LoRA (Low-Rank Adaptation) adapteriai, sujungti su baziniu modeliu
- **Formatas:** CTranslate2 (float16 kvantizacija)
- **Kalba:** Lietuvių (`lt`)
- **Sritis:** Medicininis diktavimas (radiologija, šeimos medicina)
- **Duomenų rinkinys:** [VSSA-SDSA/LT_Medical_S_corpus](https://huggingface.co/datasets/VSSA-SDSA/LT_Medical_S_corpus)
- **Kodo saugykla:** [github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical](https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical)

## Naudojimas

### Su faster-whisper

```python
from faster_whisper import WhisperModel

model = WhisperModel("VSSA-SDSA/LT_AI_Medical", device="cuda", compute_type="float16")

segments, _ = model.transcribe(
    "audio.wav",
    language="lt",
    beam_size=5,
    vad_filter=True,
)

text = " ".join(segment.text.strip() for segment in segments)
print(text)
```

### Su pridėtu transcribe.py skriptu

```bash
git clone https://github.com/VSSA-AtvirasKodas-LT/LT_AI_Medical
cd LT_AI_Medical
pip install -r requirements.txt
python transcribe.py sample.wav
```

## Paskirtis

Šis modelis skirtas lietuvių kalbos medicininio kalbinio teksto transkribavimui, ypač:

- Radiologijos ataskaitoms
- Šeimos medicinos konsultacijoms

## Apribojimai

- Apmokytas daugiausia su medicininiu žodynu — gali prasčiau atpažinti bendros lietuvių kalbos tekstą
- Našumas gali pablogėti su akcentais ar tarmėmis, nepatenkančiomis į apmokymo duomenis
- Ilgesnis nei 30 sekundžių garsas gali sukelti haliucinacijas be tinkamo VAD filtravimo

## Licencija

Šis modelis išleistas pagal Apache 2.0 licenciją, paveldėtą iš bazinio Whisper large-v3 modelio licencijos.