bigdefence committed · Commit c3f9ffc · verified · 1 Parent(s): 42ed16f

Create README.md

---
license: apache-2.0
language:
- ko
base_model:
- meta-llama/Llama-3.2-1B-Instruct
tags:
- speech-to-text
- korean
- llama
- audio
- voice
- bigdefence
- HyperCLOVAX
- naver
pipeline_tag: audio-text-to-text
---

## 🎧 Bigvox

**Bigvox** is a high-performance, low-latency speech-language multimodal model specialized for Korean speech recognition. It is built on [LLaMA-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct). 🚀

### 📂 Model Access
- **GitHub**: [bigdefence/bigvox](https://github.com/bigdefence/bigvox-hyperclovax) 🌐
- **HuggingFace**: [bigdefence/bigvox](https://huggingface.co/bigdefence/bigvox) 🤗
- **Model size**: 2B parameters 📊

## 🌟 Key Features

- **🇰🇷 Korean-specialized**: Optimized for Korean speech patterns and linguistic characteristics
- **⚡ Lightweight**: Efficient inference with 2B parameters
- **🎯 High accuracy**: Strong performance across a variety of Korean speech environments
- **🔧 Practical**: Suitable for real-time speech recognition applications

## 📋 Model Information

| Item | Details |
|------|----------|
| **Base model** | naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B |
| **Language** | Korean |
| **Model size** | ~1B parameters |
| **Task type** | Speech-to-text (speech multimodal) |
| **License** | Apache 2.0 |

### 🔧 Repository Download and Environment Setup

To get started with **Bigvox**, clone the repository and set up the environment as follows. 🛠️

1. **Clone the repository**:
```bash
git clone https://github.com/bigdefence/bigvox-hyperclovax
# git names the checkout directory after the repository
cd bigvox-hyperclovax
```

2. **Install dependencies**:
```bash
pip install --upgrade pip
# PyTorch 2.1.2 built against CUDA 12.1, installed through conda
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers huggingface_hub
```

3. **Optional: install training packages**:
If you plan to train the model, install the additional packages:
```bash
pip install accelerate datasets
pip install flash-attn --no-build-isolation
```

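After running the install steps, it can help to confirm which packages actually landed in the active environment. The following is a minimal sketch using only the standard library; the `installed_versions` helper and the `REQUIRED`/`OPTIONAL` lists are illustrative names that are not part of the Bigvox repository, and the package lists are simply taken from the commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages named in the install steps; OPTIONAL is only needed for training.
REQUIRED = ["torch", "torchaudio", "transformers", "huggingface_hub"]
OPTIONAL = ["accelerate", "datasets", "flash-attn"]

if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED + OPTIONAL).items():
        print(f"{name:<16} {version or 'not installed'}")
```

This only reports what is installed; it does not check that the PyTorch build matches your CUDA driver.
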
### 📥 Download Methods

**Using the Hugging Face CLI**:
```bash
pip install -U huggingface_hub
huggingface-cli download bigdefence/bigvox-hyperclovax --local-dir ./checkpoints/bigvox
```

**Using Snapshot Download**:
```bash
pip install -U huggingface_hub
```
```python
from huggingface_hub import snapshot_download

# Interrupted downloads resume automatically in current huggingface_hub
# releases, so the deprecated resume_download argument is omitted.
snapshot_download(
    repo_id="bigdefence/bigvox-hyperclovax",
    local_dir="./checkpoints/bigvox",
)
```

**Using Git**:
```bash
git lfs install
git clone https://huggingface.co/bigdefence/bigvox-hyperclovax
```

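Inference needs both the Bigvox checkpoint and the Whisper-large-v3 speech encoder, so one small script can fetch whichever is still missing. This is a sketch under assumptions: `DOWNLOADS`, `missing_downloads`, and `fetch_all` are hypothetical names, the target directories are the paths this README mentions, and the README does not say whether the inference script expects the encoder files directly inside `./models/speech_encoder/` or in a subfolder. A non-empty directory is also only a rough signal that a download completed.

```python
from pathlib import Path

# Repo IDs come from this README; local directories follow the paths it mentions.
DOWNLOADS = {
    "bigdefence/bigvox-hyperclovax": Path("checkpoints/bigvox"),
    "openai/whisper-large-v3": Path("models/speech_encoder"),
}


def missing_downloads(downloads, root="."):
    """Return (repo_id, target_dir) pairs whose target is absent or empty."""
    pending = []
    for repo_id, rel_dir in downloads.items():
        target = Path(root) / rel_dir
        if not target.is_dir() or not any(target.iterdir()):
            pending.append((repo_id, target))
    return pending


def fetch_all(downloads, root="."):
    """Download every pending repo with huggingface_hub.snapshot_download."""
    # Imported lazily so the planning helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    for repo_id, target in missing_downloads(downloads, root):
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `fetch_all(DOWNLOADS)` from the repository root would then download only the repos that `missing_downloads` reports.
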
### 🛠️ Dependency Models
- **Speech Encoder**: [Whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) 🎤

### 🔄 Local Inference

To run inference with **Bigvox**, follow the steps below to set up the model and run it locally. 📡

1. **Prepare the models**:
- Download **Bigvox** from [HuggingFace](https://huggingface.co/bigdefence/bigvox-hyperclovax) 📦
- Download the **Whisper-large-v3** speech encoder from [HuggingFace](https://huggingface.co/openai/whisper-large-v3) and place it in the `./models/speech_encoder/` directory 🎤

2. **Run inference**:
- **Speech-to-text (S2T)** inference:
```bash
python3 omni_speech/infer/bigvox.py --query_audio test_audio.wav
```

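Before passing a recording to the inference script, it can be useful to inspect the file and build the command programmatically. The sketch below uses only the standard library; `describe_wav` and `build_command` are hypothetical helpers, not part of the repository. Whisper's feature extractor operates on 16 kHz audio, but whether `bigvox.py` resamples its input is not stated in this README, so treat the 16 kHz check as an assumption.

```python
import sys
import wave

TARGET_RATE = 16_000  # Whisper's feature extractor works on 16 kHz audio


def describe_wav(path):
    """Read basic properties of a PCM WAV file using only the standard library."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate if rate else 0.0,
        }


def build_command(audio_path, script="omni_speech/infer/bigvox.py"):
    """Build the README's inference command as an argument list."""
    # sys.executable stands in for `python3` so the active environment is used.
    return [sys.executable, script, "--query_audio", str(audio_path)]
```

A caller could compare `describe_wav("test_audio.wav")["sample_rate"]` with `TARGET_RATE`, then run `subprocess.run(build_command("test_audio.wav"), check=True)` from the repository root.
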
## 🔧 Training Details

### Dataset
- **VoiceAssistant**: Korean conversational speech data

### Training Configuration
- **Base Model**: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
- **Hardware**: 1x NVIDIA RTX 6000A GPU
- **Training Time**: 3 hours

## ⚠️ Limitations

- Performance may degrade in environments with heavy background noise
- Recognition accuracy may drop for very fast or mumbled speech
- Recognition accuracy for technical terms and proper nouns may vary by domain

## 📜 License

This model is distributed under the Apache 2.0 license. Commercial use is permitted; see the [LICENSE](LICENSE) file for details.


## 📞 Contact

- **Development**: BigDefence

## 📈 Update Log

### v1.0.0 (2024.12)
- 🎉 **Initial model release**: Bigvox made public
- 🇰🇷 **Korean-specialized**: Korean speech-to-text speech multimodal model based on HyperCLOVAX-SEED-Text-Instruct-0.5B
---

## 🤝 Contributing

If you would like to contribute to the **Bigvox** project:
---

Build the future of Korean AI speech recognition together with **BigDefence**! 🚀🇰🇷

*"Every voice matters, every word counts"*