Create README.md
README.md
---
license: apache-2.0
language:
- ko
base_model:
- meta-llama/Llama-3.2-1B-Instruct
tags:
- speech-to-text
- korean
- llama
- audio
- voice
- bigdefence
- HyperCLOVAX
- naver
pipeline_tag: audio-text-to-text
---

## Bigvox

**Bigvox** is a high-performance, low-latency speech-language multimodal model specialized for Korean speech recognition. It is built on [LLaMA-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct).

### Model Access
- **GitHub**: [bigdefence/bigvox-hyperclovax](https://github.com/bigdefence/bigvox-hyperclovax)
- **HuggingFace**: [bigdefence/bigvox](https://huggingface.co/bigdefence/bigvox)
- **Model size**: 2B parameters

## Key Features

- **Korean-specialized**: Optimized for Korean speech patterns and linguistic characteristics
- **Lightweight**: Efficient inference with 2B parameters
- **High accuracy**: Strong performance across a variety of Korean speech environments
- **Practical**: Suitable for real-time speech recognition applications

## Model Information

| Item | Details |
|------|----------|
| **Base model** | naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B |
| **Language** | Korean |
| **Model size** | ~1B parameters |
| **Task type** | Speech-to-Text (speech multimodal) |
| **License** | Apache 2.0 |
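
As a rough guide to what these parameter counts mean for memory, the sketch below multiplies the number of parameters by the bytes per parameter. It covers weights only (no activations or caches), uses the two counts quoted in this README, and does not know whether those counts include the speech encoder; the function name is illustrative.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# The parameter counts are the two figures quoted in this README; the
# dtype sizes are standard. Activations and caches are not included.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "float16") -> float:
    """Return the approximate size of the weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for label, n in [("~1B", 1e9), ("2B", 2e9)]:
        for dtype in ("float32", "float16"):
            print(f"{label} params, {dtype}: {weight_memory_gib(n, dtype):.2f} GiB")
```

Actual GPU usage during inference will be higher than these figures.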

### Repository Download and Environment Setup

To get started with **Bigvox**, clone the repository and set up the environment as follows.

1. **Clone the repository**:
   ```bash
   git clone https://github.com/bigdefence/bigvox-hyperclovax
   # git names the directory after the repository
   cd bigvox-hyperclovax
   ```

2. **Install dependencies**:
   ```bash
   pip install --upgrade pip
   conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
   pip install transformers huggingface_hub
   ```

3. **Optional: install training packages**:
   If you plan to train the model, install the additional packages:
   ```bash
   pip install accelerate datasets
   pip install flash-attn --no-build-isolation
   ```
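
After installation, a quick check along the following lines can confirm that the packages are importable. This is a minimal sketch, not part of the repository: the helper name and the package lists are assumptions derived from the install commands above.

```python
import importlib.util

# Import names for the packages installed in the steps above.
# "flash_attn" is the import name of the flash-attn distribution.
REQUIRED = ["torch", "torchaudio", "transformers", "huggingface_hub"]
OPTIONAL_TRAINING = ["accelerate", "datasets", "flash_attn"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("missing required:", missing_packages(REQUIRED))
    print("missing training extras:", missing_packages(OPTIONAL_TRAINING))
    if not missing_packages(["torch"]):
        import torch
        print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
```

An empty list for the required packages and `CUDA available: True` suggest the environment is ready for GPU inference.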

### Download Methods

**Using the Hugging Face CLI**:
```bash
pip install -U huggingface_hub
huggingface-cli download bigdefence/bigvox-hyperclovax --local-dir ./checkpoints/bigvox
```

**Using Snapshot Download**:
```bash
pip install -U huggingface_hub
```
```python
from huggingface_hub import snapshot_download

# Recent huggingface_hub releases resume interrupted downloads automatically,
# so the deprecated resume_download argument is not needed.
snapshot_download(
    repo_id="bigdefence/bigvox-hyperclovax",
    local_dir="./checkpoints/bigvox",
)
```

**Using Git**:
```bash
git lfs install
git clone https://huggingface.co/bigdefence/bigvox-hyperclovax
```
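
The snapshot approach can be extended to fetch both the Bigvox checkpoint and the Whisper speech encoder in one pass. The sketch below is illustrative only: `DOWNLOADS` and `fetch_all` are hypothetical names, and it assumes that a plain Hugging Face snapshot of `openai/whisper-large-v3` in `./models/speech_encoder/` is the layout the inference script expects, which this README does not confirm.

```python
from pathlib import Path

# repo_id -> local directory. The Bigvox target mirrors the commands above;
# the Whisper target is the directory named in the local-inference section.
DOWNLOADS = {
    "bigdefence/bigvox-hyperclovax": Path("./checkpoints/bigvox"),
    "openai/whisper-large-v3": Path("./models/speech_encoder"),
}


def fetch_all(downloads=DOWNLOADS, dry_run=False):
    """Download every repo in `downloads`; return the list of target dirs."""
    targets = []
    for repo_id, local_dir in downloads.items():
        if not dry_run:
            # Imported lazily so a dry run works without huggingface_hub.
            from huggingface_hub import snapshot_download
            snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
        targets.append(local_dir)
    return targets


if __name__ == "__main__":
    for path in fetch_all(dry_run=True):
        print("would download into", path)
```

Calling `fetch_all()` without `dry_run=True` performs the actual downloads.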

### Dependency Models
- **Speech Encoder**: [Whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)

### Local Inference

To run inference with **Bigvox**, follow the steps below to set up the models and run them locally.

1. **Prepare the models**:
   - Download **Bigvox** from [HuggingFace](https://huggingface.co/bigdefence/bigvox-hyperclovax)
   - Download the **Whisper-large-v3** speech encoder from [HuggingFace](https://huggingface.co/openai/whisper-large-v3) and place it in the `./models/speech_encoder/` directory

2. **Run inference**:
   - **Speech-to-text (S2T)** inference:
   ```bash
   python3 omni_speech/infer/bigvox.py --query_audio test_audio.wav
   ```
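
To transcribe several recordings, the command above can be wrapped in a small driver. This is a hedged sketch: only the script path and the `--query_audio` flag come from this README, while `build_commands`, `run_all`, and the directory argument are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path

# Script path and flag are taken from the command above; everything else
# (directory names, helper names) is illustrative.
INFER_SCRIPT = "omni_speech/infer/bigvox.py"


def build_commands(audio_dir, script=INFER_SCRIPT, python=sys.executable):
    """Return one inference command per .wav file, sorted by file name."""
    wavs = sorted(Path(audio_dir).glob("*.wav"))
    return [[python, script, "--query_audio", str(w)] for w in wavs]


def run_all(audio_dir):
    """Run the commands sequentially, stopping at the first failure."""
    for cmd in build_commands(audio_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_commands("."):
        print(" ".join(cmd))
```

Each command starts a fresh Python process, so the model is loaded once per file; that is simple but slow for large batches.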

## Training Details

### Dataset
- **VoiceAssistant**: Korean conversational speech data

### Training Configuration
- **Base Model**: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
- **Hardware**: 1x NVIDIA RTX 6000A GPU
- **Training Time**: 3 hours

## Limitations

- Performance may degrade in environments with heavy background noise
- Recognition accuracy may drop for very fast speech or mumbling
- Recognition accuracy for technical terms and proper nouns may vary by domain

## License

This model is distributed under the Apache 2.0 license. Commercial use is permitted; see the [LICENSE](LICENSE) file for details.


## Contact

- **Developed by**: BigDefence

## Update Log

### v1.0.0 (2024.12)
- **Initial model release**: Bigvox published
- **Korean-specialized**: Korean speech-to-text speech multimodal model based on HyperCLOVAX-SEED-Text-Instruct-0.5B

---

## Contributing

If you would like to contribute to the **Bigvox** project:

---

Build the future of Korean AI speech recognition together with **BigDefence**!

*"Every voice matters, every word counts"*