Commit d995029 by youngouk · verified · 1 parent: 8af0742

Initial 4bit quantized release (mlx-whisper compatible)

Files changed (3)
  1. README.md +112 -0
  2. config.json +17 -0
  3. weights.safetensors +3 -0
README.md ADDED

---
license: apache-2.0
language:
- ko
library_name: mlx
tags:
- whisper
- mlx
- quantized
- 4bit
- korean
- speech-recognition
- automatic-speech-recognition
base_model: seastar105/whisper-medium-ko-zeroth
---

# Whisper Medium Korean (Zeroth fine-tune): MLX 4bit

This is a [Whisper Medium](https://huggingface.co/openai/whisper-medium) fine-tune for Korean speech recognition, **quantized to 4 bits** for Apple's MLX framework.

Original model: [`seastar105/whisper-medium-ko-zeroth`](https://huggingface.co/seastar105/whisper-medium-ko-zeroth) (Whisper Medium fine-tuned on the Zeroth Korean dataset)

## Summary

- **Base**: Whisper Medium (769M parameters)
- **Fine-tune**: Zeroth Korean ASR corpus
- **Quantization**: 4-bit (group size 64), produced with `mlx-examples/whisper/convert.py`
- **Disk size**: **~415 MiB** (`weights.safetensors` is 435,558,705 bytes), roughly 85% smaller than the ~2.8 GB original checkpoint
- **Inference RAM**: ~1.26 GB
- **Framework**: Apple MLX (Apple Silicon only)
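
As a rough cross-check of the disk size, the 4-bit footprint can be estimated from the 769M parameter count. This is a back-of-the-envelope sketch that assumes every parameter is quantized and that each group of 64 weights also stores one fp16 scale and one fp16 bias (MLX's affine scheme); in practice some small tensors (layer norms, convolutions) stay unquantized and the safetensors header adds a little, so the real file size differs slightly.

```python
def quantized_size_bytes(n_params: int, bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Approximate storage for affine group quantization.

    Each weight takes `bits` bits; each group of `group_size` weights also
    stores one scale and one bias of `scale_bits` bits each (assumed fp16).
    """
    bits_per_weight = bits + 2 * scale_bits / group_size
    return n_params * bits_per_weight / 8


def dense_size_bytes(n_params: int, bytes_per_param: int) -> int:
    """Storage for unquantized weights (2 bytes per parameter for fp16, 4 for fp32)."""
    return n_params * bytes_per_param


N_PARAMS = 769_000_000  # Whisper Medium parameter count from the summary

q4 = quantized_size_bytes(N_PARAMS, bits=4, group_size=64)
print(f"4-bit, group 64: {q4 / 1e6:.1f} MB")                          # 432.6 MB
print(f"fp16:            {dense_size_bytes(N_PARAMS, 2) / 1e9:.2f} GB")  # 1.54 GB
print(f"fp32:            {dense_size_bytes(N_PARAMS, 4) / 1e9:.2f} GB")  # 3.08 GB
```

The estimate of about 432.6 MB lands close to the 435,558,705-byte LFS object; the remaining gap is plausibly the unquantized tensors and file metadata.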

## Korean performance (Zeroth Korean test split)

| Metric | Value |
|------|------|
| **CER** | **1.25%** |
| **WER** | **3.21%** |
| **RTF** | 0.055 (measured on an M3 with 16 GB) |

Accuracy stays nearly identical to the original fp16 model while size and memory use drop substantially.
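
The metrics in the table are edit-distance rates: CER counts character-level substitutions, deletions and insertions against the reference transcript, WER does the same over words, and RTF is processing time divided by audio duration (so 0.055 means roughly 3.3 s to transcribe one minute of audio). The card does not say how text was normalized before scoring; the sketch below assumes whitespace is ignored for CER and words are split on whitespace for WER, which is one common convention for Korean but not necessarily the one used for these numbers.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    # Assumption: all whitespace is removed, so spacing differences are not errors.
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: values below 1 mean faster than real time."""
    return processing_seconds / audio_seconds


print(cer("오늘 날씨가 좋네요", "오늘 날씨가 좋내요"))  # 0.125: 1 of 8 characters differs
print(wer("오늘 날씨가 좋네요", "오늘 날씨가 좋내요"))  # 1 of 3 words differs
print(rtf(3.3, 60.0))
```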

## Usage

### 1) Calling `mlx-whisper` directly

```bash
pip install mlx-whisper
```

```python
import mlx_whisper

# Weights are fetched from the Hugging Face Hub on first use and cached locally
result = mlx_whisper.transcribe(
    "audio.wav",
    path_or_hf_repo="youngouk/seastar-medium-ko-4bit-mlx",
    language="ko",           # skip language detection and decode as Korean
    word_timestamps=True,
)
print(result["text"])
```
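
`transcribe` returns more than the plain text. Assuming mlx-whisper follows the openai-whisper result layout (a `"segments"` list whose entries carry `"start"` and `"end"` in seconds plus `"text"`), the segments can be written out as SRT subtitles with a small standard-library helper such as the hypothetical `segments_to_srt` below; it is not part of mlx-whisper.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build numbered SRT cues from segment dicts with 'start', 'end' and 'text' keys."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues
```

For example: `open("audio.srt", "w", encoding="utf-8").write(segments_to_srt(result["segments"]))`.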

### 2) Using it in the `meeting-transcriber` app

[meeting-transcriber](https://github.com/youngouk/meeting-transcriber) is a local meeting-transcription app for macOS that offers this model as a default option.

In the web UI, choose `설정 → 음성 인식 모델 (STT) → seastar medium-ko-zeroth (4bit)` (Settings → Speech recognition model (STT) → seastar medium-ko-zeroth (4bit)); the model is then downloaded and activated automatically.

## Files

```
config.json          # MLX Whisper model config (includes the quantization parameters)
weights.safetensors  # 4-bit quantized weights (~415 MiB)
```

The `mlx-whisper` runtime loads these two files directly through the `path_or_hf_repo=` argument. The tokenizer comes from the multilingual vocabulary bundled with `mlx-whisper`, so no separate tokenizer file is needed.
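
Because `path_or_hf_repo=` also accepts a local directory, a downloaded or converted copy can be checked before use. The sketch below assumes a local folder holding the two files listed above and reads the `quantization` block from `config.json`; `inspect_snapshot` is a hypothetical helper, not an mlx-whisper API.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("config.json", "weights.safetensors")


def inspect_snapshot(model_dir: str) -> dict:
    """Check that a local model folder has the expected files and return its quantization settings."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {root}: {', '.join(missing)}")
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    # mlx-whisper only quantizes the model when this block is present.
    quant = config.get("quantization")
    if quant is None:
        raise ValueError("config.json has no 'quantization' block")
    return {"bits": quant["bits"], "group_size": quant["group_size"]}
```

For the output of the reproduction command, `inspect_snapshot("./seastar-medium-ko-4bit")` should report 4 bits and a group size of 64.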

## Quantization parameters

```json
{
  "quantization": {
    "bits": 4,
    "group_size": 64
  }
}
```
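
`bits` and `group_size` describe affine group-wise quantization: each weight row is split into runs of 64 values, and each run stores 4-bit integer codes plus one scale and one bias, so a weight is reconstructed as `scale * code + bias`. The pure-Python sketch below illustrates the idea with a simple min/max choice of scale and bias; MLX's own `mx.quantize` may pick these values differently, so treat it as an illustration rather than a reimplementation.

```python
import math


def quantize_group(values, bits=4):
    """Affine-quantize one group: returns integer codes, a scale and a bias."""
    max_code = (1 << bits) - 1        # 15 for 4 bits, so codes span 0..15
    lo, hi = min(values), max(values)
    scale = (hi - lo) / max_code if hi > lo else 1.0  # constant groups: any scale works
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    return [scale * q + bias for q in codes]


def quantize(weights, bits=4, group_size=64):
    """Split a flat weight row into groups of `group_size` and quantize each one."""
    if len(weights) % group_size:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


row = [math.sin(0.1 * i) for i in range(128)]   # two groups of 64
groups = quantize(row)
restored = [x for g in groups for x in dequantize_group(*g)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), max_err)  # rounding error is at most half a step (scale / 2)
```

With one fp16 scale and one fp16 bias per 64 weights, the overhead is 32 bits per group, or 0.5 bit per weight on top of the 4-bit codes.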

Command to reproduce:

```bash
python mlx-examples/whisper/convert.py \
  --torch-name-or-path seastar105/whisper-medium-ko-zeroth \
  --mlx-path ./seastar-medium-ko-4bit \
  -q --q-bits 4 --q-group-size 64
```

## License

Apache License 2.0, inherited unchanged from the [original model](https://huggingface.co/seastar105/whisper-medium-ko-zeroth).

## Limitations

- **Apple Silicon only**: the MLX framework does not run on x86 CPUs or CUDA. Intel Mac, Linux, and Windows users should use the original [seastar105/whisper-medium-ko-zeroth](https://huggingface.co/seastar105/whisper-medium-ko-zeroth) instead.
- **Korean-specific**: because the model was fine-tuned on the Zeroth Korean dataset, performance on languages other than Korean may be lower than base Whisper Medium.
- **4-bit quantization**: in rare cases, uncommon vocabulary may be transcribed slightly less accurately than with the original fp16 model (the measured CER/WER difference is negligible).
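
Since the MLX weights only run on Apple Silicon, a script that must work across machines can choose the repository at runtime. A minimal sketch, assuming the two Hugging Face repo IDs named on this card and a hypothetical `pick_model_repo` helper; note that a Python interpreter running under Rosetta 2 reports `x86_64`, so it is routed to the original model as well.

```python
import platform

MLX_REPO = "youngouk/seastar-medium-ko-4bit-mlx"       # this 4-bit MLX conversion
ORIGINAL_REPO = "seastar105/whisper-medium-ko-zeroth"  # original fine-tuned checkpoint


def pick_model_repo(system: str = "", machine: str = "") -> str:
    """Return the MLX repo on Apple Silicon macOS, otherwise the original model.

    Empty arguments fall back to the values reported by the running interpreter.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return MLX_REPO
    return ORIGINAL_REPO


print(pick_model_repo())
```

The original checkpoint needs a PyTorch-based runtime rather than `mlx-whisper`.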

## Sources · Citation

- Original Whisper: [OpenAI](https://github.com/openai/whisper)
- Korean fine-tune: [seastar105/whisper-medium-ko-zeroth](https://huggingface.co/seastar105/whisper-medium-ko-zeroth)
- Quantization tool: [mlx-examples/whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper)
- Redistributed by: [youngouk](https://huggingface.co/youngouk) for [meeting-transcriber](https://github.com/youngouk/meeting-transcriber)

config.json ADDED

{
  "n_mels": 80,
  "n_audio_ctx": 1500,
  "n_audio_state": 1024,
  "n_audio_head": 16,
  "n_audio_layer": 24,
  "n_vocab": 51865,
  "n_text_ctx": 448,
  "n_text_state": 1024,
  "n_text_head": 16,
  "n_text_layer": 24,
  "quantization": {
    "group_size": 64,
    "bits": 4
  },
  "model_type": "whisper"
}
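
Several values in `config.json` follow from Whisper's fixed audio front end. Assuming the standard settings from openai-whisper's `audio.py` (16 kHz input, a hop of 160 samples between mel frames, 30-second windows) and the encoder's stride-2 second convolution, 30 s of audio gives 3000 mel frames and 1500 encoder positions, which matches `n_audio_ctx`. The sketch also derives the per-head width and checks that the model width splits evenly into quantization groups of `group_size`.

```python
import json

# Fields copied from this repository's config.json (only the ones used below).
CONFIG = json.loads("""
{
  "n_mels": 80, "n_audio_ctx": 1500, "n_audio_state": 1024, "n_audio_head": 16,
  "n_text_state": 1024, "n_text_head": 16,
  "quantization": {"group_size": 64, "bits": 4}
}
""")

SAMPLE_RATE = 16_000   # Hz, Whisper's input sample rate
HOP_LENGTH = 160       # samples between consecutive mel frames (10 ms)
CHUNK_SECONDS = 30     # Whisper processes audio in 30-second windows
CONV_STRIDE = 2        # the encoder's second convolution halves the frame count


def encoder_positions(seconds: int = CHUNK_SECONDS) -> int:
    mel_frames = seconds * SAMPLE_RATE // HOP_LENGTH
    return mel_frames // CONV_STRIDE


audio_head_dim = CONFIG["n_audio_state"] // CONFIG["n_audio_head"]
text_head_dim = CONFIG["n_text_state"] // CONFIG["n_text_head"]
group = CONFIG["quantization"]["group_size"]

print(encoder_positions(), CONFIG["n_audio_ctx"])  # 1500 1500
print(audio_head_dim, text_head_dim)               # 64 64
print(CONFIG["n_audio_state"] % group == 0)        # True: 1024 splits into 16 groups of 64
```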
weights.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:013f0e71c0b2e10c7cc24d6522a480c9c18d007f104d5ed6ec82978150f097c0
size 435558705
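
The three lines above are a Git LFS pointer: the actual `weights.safetensors` is stored separately, and the pointer records its SHA-256 digest (`oid`) and byte length (`size`). A minimal standard-library sketch for checking a downloaded file against those two values; `parse_lfs_pointer` and `verify_against_pointer` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:013f0e71c0b2e10c7cc24d6522a480c9c18d007f104d5ed6ec82978150f097c0
size 435558705
"""


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256 hex digest, size in bytes) from a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def verify_against_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Compare a file's size, then its SHA-256 (hashed in 1 MiB chunks), with the pointer."""
    expected_digest, expected_size = parse_lfs_pointer(pointer_text)
    file = Path(path)
    if file.stat().st_size != expected_size:
        return False  # cheap size check before hashing ~415 MiB
    sha = hashlib.sha256()
    with file.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            sha.update(chunk)
    return sha.hexdigest() == expected_digest
```

For an intact download, `verify_against_pointer("weights.safetensors", POINTER)` should return `True`.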