Initial 4bit quantized release (mlx-whisper compatible)
- README.md +112 -0
- config.json +17 -0
- weights.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,112 @@
---
license: apache-2.0
language:
- ko
library_name: mlx
tags:
- whisper
- mlx
- quantized
- 4bit
- korean
- speech-recognition
- automatic-speech-recognition
base_model: seastar105/whisper-medium-ko-zeroth
---

# Whisper Medium Korean (Zeroth fine-tune) - MLX 4bit

A **4-bit quantized** version, built for the Apple MLX framework, of a [Whisper Medium](https://huggingface.co/openai/whisper-medium) fine-tune for Korean speech recognition.

Original: [`seastar105/whisper-medium-ko-zeroth`](https://huggingface.co/seastar105/whisper-medium-ko-zeroth) (Whisper Medium fine-tuned on the Zeroth Korean dataset)

## Summary

- **Base**: Whisper Medium (769M parameters)
- **Fine-tune**: Zeroth Korean ASR corpus
- **Quantization**: 4-bit (group size 64), produced with `mlx-examples/whisper/convert.py`
- **Disk size**: **831 MB** (about 70% smaller than the original fp16 model at 2.8 GB)
- **Inference RAM**: ~1.26 GB
- **Framework**: Apple MLX (Apple Silicon only)

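The link between the quantization settings and the size of the weights can be approximated with a little arithmetic. The sketch below assumes that each group of 64 weights stores one 16-bit scale and one 16-bit bias next to its 4-bit codes, and that every parameter is quantized. Both are simplifying assumptions, not details stated in this model card, and `estimate_quantized_bytes` is a hypothetical helper.

```python
def bits_per_weight(bits: int = 4, group_size: int = 64,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage cost per weight, including per-group metadata.

    Assumption: one scale and one bias are stored per group.
    """
    return bits + (scale_bits + bias_bits) / group_size


def estimate_quantized_bytes(n_params: int, **kwargs) -> float:
    """Estimated size in bytes if all parameters were group-quantized."""
    return n_params * bits_per_weight(**kwargs) / 8


# Parameter count quoted in the summary above.
n_params = 769_000_000

# 4 + 32/64 = 4.5 bits per weight under the stated assumptions.
estimated = estimate_quantized_bytes(n_params)
print(f"{bits_per_weight()} bits/weight, about {estimated / 1e6:.0f} MB")
```

Under these assumptions the estimate is roughly 433 million bytes (about 413 MiB), in the same range as the ~415 MB listed for `weights.safetensors`. Any layers left unquantized would shift the real figure.
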
## Korean performance (Zeroth Korean test split)

| Metric | Value |
|------|------|
| **CER** | **1.25%** |
| **WER** | **3.21%** |
| **RTF** | 0.055 (measured on an M3 with 16 GB) |

Accuracy stays nearly identical to the original fp16 model, while disk size and memory use are greatly reduced.

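For readers unfamiliar with the three metrics, here is a minimal sketch of how they are usually defined. It is not the evaluation script behind the numbers above: the model card does not describe its text normalization, and stripping spaces before computing CER is an assumption made here. All function names are illustrative.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are ignored (an assumption)."""
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio."""
    return processing_seconds / audio_seconds
```

An RTF of 0.055 means that one hour of audio (3600 s) takes about 198 s to transcribe on the hardware quoted in the table.
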
## Usage

### 1) Calling `mlx-whisper` directly

```bash
pip install mlx-whisper
```

```python
import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.wav",
    path_or_hf_repo="youngouk/seastar-medium-ko-4bit-mlx",
    language="ko",
    word_timestamps=True,
)
print(result["text"])
```

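With `word_timestamps=True`, the result can also be walked word by word. The sketch below assumes the returned dictionary follows the OpenAI Whisper layout, that is, a `segments` list whose entries carry a `words` list with `word`, `start`, and `end` keys. That layout is an assumption here, so check it against the installed `mlx-whisper` version. The helpers `iter_words` and `format_words` are hypothetical, and `sample_result` is a hand-written stand-in, not real model output.

```python
def iter_words(result: dict):
    """Yield (start, end, text) for every word in a transcription result."""
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            yield word["start"], word["end"], word["word"].strip()


def format_words(result: dict) -> list:
    """Render each word as '[start -> end] text' with times in seconds."""
    return [f"[{start:6.2f} -> {end:6.2f}] {text}"
            for start, end, text in iter_words(result)]


# Hand-written stand-in with the assumed structure (not real output).
sample_result = {
    "text": "안녕하세요 반갑습니다",
    "segments": [
        {
            "start": 0.0,
            "end": 1.3,
            "text": "안녕하세요 반갑습니다",
            "words": [
                {"word": "안녕하세요", "start": 0.0, "end": 0.42},
                {"word": " 반갑습니다", "start": 0.5, "end": 1.3},
            ],
        }
    ],
}

for line in format_words(sample_result):
    print(line)
```
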
### 2) Using it in the `meeting-transcriber` app

[meeting-transcriber](https://github.com/youngouk/meeting-transcriber) is a local meeting transcription app for macOS that offers this model as a default option.

In the web UI, select `Settings → Speech recognition model (STT) → seastar medium-ko-zeroth (4bit)`; the model is then downloaded and activated automatically.

## Files

```
config.json          # MLX Whisper model configuration (includes quantization parameters)
weights.safetensors  # 4-bit quantized weights (~415 MB)
```

The `mlx-whisper` runtime loads these two files directly through the `path_or_hf_repo=` argument. The tokenizer uses the multilingual vocabulary built into `mlx-whisper`, so no separate tokenizer file is needed.

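When working from a local copy rather than the Hub, a quick check that both files are present can save a confusing load error. This is a minimal sketch; `missing_model_files` is a hypothetical helper, and the directory name in the usage comment is only an example.

```python
from pathlib import Path

# The two files this repository ships, as listed above.
REQUIRED_FILES = ("config.json", "weights.safetensors")


def missing_model_files(model_dir) -> list:
    """Return the names of required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Example usage (the path is illustrative):
# missing = missing_model_files("./seastar-medium-ko-4bit")
# if missing:
#     raise FileNotFoundError(f"model directory is missing: {missing}")
```

If the list comes back empty, the same directory path can be passed as `path_or_hf_repo` in place of the repository name.
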
## Quantization parameters

```json
{
  "quantization": {
    "bits": 4,
    "group_size": 64
  }
}
```

Command to reproduce:

```bash
python mlx-examples/whisper/convert.py \
    --torch-name-or-path seastar105/whisper-medium-ko-zeroth \
    --mlx-path ./seastar-medium-ko-4bit \
    -q --q-bits 4 --q-group-size 64
```

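To make the `bits` and `group_size` settings concrete, here is a pure-Python sketch of group-wise affine quantization, the general technique these parameters describe. It is an illustration only: the rounding and bit-packing inside MLX may differ, and the function names are hypothetical.

```python
def quantize_group(values, bits=4):
    """Map one group of floats to integer codes plus a scale and bias."""
    levels = (1 << bits) - 1            # 15 for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Clamp to guard against floating-point overshoot at the top of the range.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo             # lo acts as the bias (offset)


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from codes, scale, and bias."""
    return [c * scale + bias for c in codes]


def quantize(weights, bits=4, group_size=64):
    """Quantize a flat list of weights in consecutive groups."""
    if len(weights) % group_size:
        raise ValueError("number of weights must be a multiple of group_size")
    return [quantize_group(weights[i:i + group_size], bits)
            for i in range(0, len(weights), group_size)]
```

Each group keeps its own scale and bias, so an outlier only coarsens the 64 weights that share its group rather than the whole tensor. Within a group the reconstruction error is at most half a quantization step.
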
## License

Apache License 2.0, inherited unchanged from the [original model](https://huggingface.co/seastar105/whisper-medium-ko-zeroth).

## Limitations

- **Apple Silicon only**: The MLX framework does not run on x86 CPUs or CUDA. Users on Intel Mac, Linux, or Windows should use the original [seastar105/whisper-medium-ko-zeroth](https://huggingface.co/seastar105/whisper-medium-ko-zeroth).
- **Korean-specific**: Because the model was fine-tuned on the Zeroth Korean dataset, its performance on languages other than Korean may be lower than that of base Whisper Medium.
- **4-bit quantization**: In very rare cases, accuracy on rare vocabulary may be slightly lower than with the original fp16 model (the measured CER/WER difference is negligible).

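An application that ships this model may want to check the platform before trying to load it. The sketch below uses only the standard library. `is_apple_silicon` is a hypothetical helper, and the optional arguments exist so the logic can be exercised on any machine.

```python
import platform


def is_apple_silicon(system=None, machine=None) -> bool:
    """Return True when running natively on macOS with an ARM64 CPU."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


if not is_apple_silicon():
    print("MLX build not usable here; fall back to the original model.")
```

A Python interpreter running under Rosetta reports `x86_64`, so this check returns `False` there even on Apple Silicon hardware.
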
## Sources and credits

- Original Whisper: [OpenAI](https://github.com/openai/whisper)
- Korean fine-tune: [seastar105/whisper-medium-ko-zeroth](https://huggingface.co/seastar105/whisper-medium-ko-zeroth)
- Quantization tool: [mlx-examples/whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper)
- Redistribution: [youngouk](https://huggingface.co/youngouk) for [meeting-transcriber](https://github.com/youngouk/meeting-transcriber)
config.json
ADDED
@@ -0,0 +1,17 @@
{
  "n_mels": 80,
  "n_audio_ctx": 1500,
  "n_audio_state": 1024,
  "n_audio_head": 16,
  "n_audio_layer": 24,
  "n_vocab": 51865,
  "n_text_ctx": 448,
  "n_text_state": 1024,
  "n_text_head": 16,
  "n_text_layer": 24,
  "quantization": {
    "group_size": 64,
    "bits": 4
  },
  "model_type": "whisper"
}
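As a quick sanity check on this configuration, the sketch below parses it and derives a few values: the per-head dimension of the encoder and decoder, and whether the hidden size is a multiple of the quantization group size. The divisibility check reflects the assumption that group-wise quantization needs whole groups along the quantized axis. `summarize` is a hypothetical helper.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_TEXT = """
{
  "n_mels": 80,
  "n_audio_ctx": 1500,
  "n_audio_state": 1024,
  "n_audio_head": 16,
  "n_audio_layer": 24,
  "n_vocab": 51865,
  "n_text_ctx": 448,
  "n_text_state": 1024,
  "n_text_head": 16,
  "n_text_layer": 24,
  "quantization": {"group_size": 64, "bits": 4},
  "model_type": "whisper"
}
"""


def summarize(config: dict) -> dict:
    """Derive a few readable facts from a Whisper-style config dict."""
    quant = config.get("quantization", {})
    group_size = quant.get("group_size")
    return {
        "audio_head_dim": config["n_audio_state"] // config["n_audio_head"],
        "text_head_dim": config["n_text_state"] // config["n_text_head"],
        "bits": quant.get("bits"),
        "group_size": group_size,
        # Assumption: the hidden size should split into whole groups.
        "state_divisible_by_group": (
            group_size is not None
            and config["n_audio_state"] % group_size == 0
            and config["n_text_state"] % group_size == 0
        ),
    }


summary = summarize(json.loads(CONFIG_TEXT))
print(summary)
```
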
weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:013f0e71c0b2e10c7cc24d6522a480c9c18d007f104d5ed6ec82978150f097c0
size 435558705
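The three pointer lines above are enough to verify a downloaded copy of the weights. The sketch below parses a Git LFS pointer and compares its size and SHA-256 digest against a local file. The helper names are hypothetical, and the file path in the usage comment is only an example.

```python
import hashlib

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:013f0e71c0b2e10c7cc24d6522a480c9c18d007f104d5ed6ec82978150f097c0
size 435558705
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check that the file at path has the size and digest in the pointer."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
# Example usage (the path is illustrative):
# print(matches_pointer("weights.safetensors", pointer))
```
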