---
license: apache-2.0
language:
- ko
base_model:
- naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
tags:
- speech-to-text
- korean
- llama
- audio
- voice
- bigdefence
- HyperCLOVAX
- naver
pipeline_tag: audio-text-to-text
---

## Bigvox

- **Bigvox** is a high-performance, low-latency speech-language multimodal model specialized for Korean speech recognition. It is built on [naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B).
- It adopts an **end-to-end** speech multimodal architecture that handles everything from speech input to text output in a single pipeline, so multimodal processing works without any additional intermediate model.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/653494138bde2fae198fe89e/ZPTZAPk5tXTK0oZwiXbZA.png)

### Model Access

- **GitHub**: [bigdefence/bigvox-hyperclovax](https://github.com/bigdefence/bigvox-hyperclovax)
- **HuggingFace**: [bigdefence/Bigvox-HyperCLOVAX-Audio](https://huggingface.co/bigdefence/Bigvox-HyperCLOVAX-Audio)
- **Model size**: 1B parameters

## Key Features

- **Korean-specialized**: Optimized for Korean speech patterns and linguistic characteristics
- **Lightweight**: Efficient inference with 1B parameters
- **High accuracy**: Strong performance across diverse Korean speech environments
- **Practical**: Suitable for real-time speech recognition applications

## Model Information

| Item | Details |
|------|----------|
| **Base model** | naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B |
| **Language** | Korean |
| **Model size** | ~1B parameters |
| **Task type** | Speech-to-Text (speech multimodal) |
| **License** | Apache 2.0 |

### Repository Download and Environment Setup

To get started with **Bigvox**, clone the repository and set up the environment as follows.

1. **Clone the repository**:
   ```bash
   git clone https://github.com/bigdefence/bigvox-hyperclovax
   cd bigvox-hyperclovax
   ```

2. **Install dependencies**:
   ```bash
   bash setting.sh
   ```

### Download Methods

**Using the Hugging Face CLI**:
```bash
pip install -U huggingface_hub
huggingface-cli download bigdefence/Bigvox-HyperCLOVAX-Audio --local-dir ./checkpoints
```

**Using Snapshot Download**:
```bash
pip install -U huggingface_hub
```
```python
from huggingface_hub import snapshot_download

# Download the full model repository into ./checkpoints.
# Interrupted downloads resume automatically in recent huggingface_hub
# releases, so the deprecated `resume_download` argument is omitted.
snapshot_download(
    repo_id="bigdefence/Bigvox-HyperCLOVAX-Audio",
    local_dir="./checkpoints",
)
```

**Using Git**:
```bash
git lfs install
git clone https://huggingface.co/bigdefence/Bigvox-HyperCLOVAX-Audio
```

### Dependency Models

- **Speech Encoder**: [Whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)

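Both the Bigvox checkpoint and the Whisper-large-v3 speech encoder are hosted on Hugging Face, so they can be fetched in one pass. The following is a minimal sketch, not the project's own tooling: `DOWNLOAD_PLAN` and `fetch_all` are hypothetical names. The target directories are the ones this README mentions (`./checkpoints` and `./models/speech_encoder/`). Whether the inference scripts accept the Hugging Face file layout for the speech encoder is an assumption worth checking against the repository.

```python
from pathlib import Path

# Repository IDs come from this README. The directories follow the paths it
# mentions; the name DOWNLOAD_PLAN is a hypothetical helper, not project code.
DOWNLOAD_PLAN = {
    "bigdefence/Bigvox-HyperCLOVAX-Audio": Path("./checkpoints"),
    "openai/whisper-large-v3": Path("./models/speech_encoder"),
}


def fetch_all(plan=DOWNLOAD_PLAN):
    """Download every repository in `plan` and return {repo_id: local_path}."""
    # Imported lazily so the plan can be inspected without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_paths = {}
    for repo_id, target_dir in plan.items():
        # Create the target directory first so partially prepared setups still work.
        target_dir.mkdir(parents=True, exist_ok=True)
        local_paths[repo_id] = snapshot_download(
            repo_id=repo_id,
            local_dir=str(target_dir),
        )
    return local_paths
```

Calling `fetch_all()` from the repository root performs both downloads and returns the local paths. Nothing is downloaded when the module is merely imported.
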
### Local Inference

To run inference with **Bigvox**, follow the steps below to set up the models and run them locally.

1. **Prepare the models**:
   - Download **Bigvox** from [HuggingFace](https://huggingface.co/bigdefence/Bigvox-HyperCLOVAX-Audio)
   - Download the **Whisper-large-v3** speech encoder from [HuggingFace](https://huggingface.co/openai/whisper-large-v3) and place it in the `./models/speech_encoder/` directory

2. **Run inference**:
   - **Speech-to-text (S2T)** inference:
     - **Non-Streaming**
       ```bash
       python3 omni_speech/infer/bigvox.py --query_audio test_audio.wav
       ```
     - **Streaming**
       ```bash
       python3 omni_speech/infer/bigvox_streaming.py --query_audio test_audio.wav
       ```

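The commands above process one file at a time. To transcribe a folder of recordings, the same commands can be driven from Python. This is a rough sketch under stated assumptions: `build_command` and `transcribe_folder` are hypothetical helpers, the script paths and the `--query_audio` flag are the ones shown above, and the sketch assumes the scripts write their result to standard output, which this README does not confirm.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as shown in this README; run from the repository root.
SCRIPTS = {
    "non_streaming": "omni_speech/infer/bigvox.py",
    "streaming": "omni_speech/infer/bigvox_streaming.py",
}


def build_command(audio_path, mode="non_streaming"):
    """Return the argument list for a single inference run."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode: {mode!r}")
    # sys.executable reuses the current interpreter instead of a bare `python3`.
    return [sys.executable, SCRIPTS[mode], "--query_audio", str(audio_path)]


def transcribe_folder(folder, mode="non_streaming"):
    """Run inference on every .wav file in `folder`; return {filename: stdout}.

    Assumes the script prints its transcription to stdout (not confirmed here).
    """
    outputs = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        result = subprocess.run(
            build_command(wav, mode),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the script fails
        )
        outputs[wav.name] = result.stdout
    return outputs
```

Each file starts a fresh process, so the model is reloaded every time. That keeps the sketch independent of the scripts' internals, but it is slow for large batches.
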
## Training Details

### Training Setup

- **Base Model**: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
- **Hardware**: 1x NVIDIA RTX 6000A GPU
- **Training Time**: 3 hours

## Limitations

- Performance may degrade in environments with heavy background noise.
- Recognition accuracy may drop for very fast speech or mumbling.
- Recognition accuracy for technical terms and proper nouns may vary by domain.

## License

This model is distributed under the Apache 2.0 license. Commercial use is permitted; see the [LICENSE](LICENSE) file for details.

## Contact

- **Developed by**: BigDefence

## Update Log

### v1.0.0 (2024.12)

- **Initial model release**: Bigvox made public
- **Korean-specialized**: A Korean speech-to-text speech multimodal model based on HyperCLOVAX-SEED-Text-Instruct-0.5B

---

## Contributing

If you would like to contribute to the **Bigvox** project:

---

Build the future of Korean AI speech recognition together with **BigDefence**!

*"Every voice matters, every word counts"*