---
license: apache-2.0
language:
- ko
base_model:
- naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
tags:
- speech-to-text
- korean
- llama
- audio
- voice
- bigdefence
- HyperCLOVAX
- naver
pipeline_tag: audio-text-to-text
---

## 🎧 Bigvox

- **Bigvox** is a high-performance, low-latency speech-language multimodal model specialized for Korean speech recognition. It is built on [naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B). 🚀
- It adopts an **End-to-End** speech multimodal architecture that handles everything from speech input to text output in a single pipeline, with no additional intermediate model.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/653494138bde2fae198fe89e/NwonFS__hErgVy0p2Weu4.png)

### 📂 Model Access
- **GitHub**: [bigdefence/bigvox-hyperclovax](https://github.com/bigdefence/bigvox-hyperclovax) 🌐
- **HuggingFace**: [bigdefence/Bigvox-HyperCLOVAX-Audio](https://huggingface.co/bigdefence/Bigvox-HyperCLOVAX-Audio) 🤗
- **Model size**: 1B parameters 📊

## 🌟 Key Features

- **🇰🇷 Korean-focused**: Optimized for Korean speech patterns and linguistic characteristics
- **⚡ Lightweight**: Efficient inference with 1B parameters
- **🎯 High accuracy**: Strong performance across diverse Korean speech conditions
- **🔧 Practical**: Suitable for real-time speech recognition applications

## 📋 Model Information

| Item | Details |
|------|----------|
| **Base model** | naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B |
| **Language** | Korean |
| **Model size** | ~1B parameters |
| **Task type** | Speech-to-Text speech multimodal |
| **License** | Apache 2.0 |

### 🔧 Repository Download and Environment Setup

To get started with **Bigvox**, clone the repository and set up the environment as follows. 🛠️

1. **Clone the repository**:
   ```bash
   git clone https://github.com/bigdefence/bigvox-hyperclovax
   cd bigvox-hyperclovax
   ```

2. **Install dependencies**:
   ```bash
   bash setting.sh
   ```

### 📥 Download Methods

**Using the Hugging Face CLI**:
```bash
pip install -U huggingface_hub
huggingface-cli download bigdefence/Bigvox-HyperCLOVAX-Audio --local-dir ./checkpoints
```

**Using `snapshot_download`**:
```bash
pip install -U huggingface_hub
```
```python
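from huggingface_hub import snapshot_download

# Download the full model repository into ./checkpoints.
# Interrupted downloads resume automatically in current huggingface_hub
# releases, so the deprecated resume_download flag is omitted.
snapshot_download(
    repo_id="bigdefence/Bigvox-HyperCLOVAX-Audio",
    local_dir="./checkpoints",
)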
```

**Using Git**:
```bash
git lfs install
git clone https://huggingface.co/bigdefence/Bigvox-HyperCLOVAX-Audio
```

### 🛠️ Dependency Models
- **Speech Encoder**: [Whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) 🎤
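
The speech encoder can be fetched with the same `huggingface_hub` package used for the Bigvox checkpoint. The sketch below is a minimal example under assumptions: `fetch_speech_encoder` is a hypothetical helper, not part of the repository, and it assumes the encoder files belong directly inside `./models/speech_encoder/`, the directory named in the local inference steps. Check the repository if a different layout is expected.

```python
from pathlib import Path


def fetch_speech_encoder(local_dir="./models/speech_encoder", download=None):
    """Download openai/whisper-large-v3 into local_dir (hypothetical helper).

    `download` exists only so the function can be exercised without network
    access; by default it is huggingface_hub.snapshot_download.
    """
    if download is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download as download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Assumption: the encoder files go directly into local_dir.
    return download(repo_id="openai/whisper-large-v3", local_dir=str(target))
```

Calling `fetch_speech_encoder()` with no arguments downloads the full Whisper-large-v3 repository, which is several gigabytes.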

### 🔄 Local Inference

To run inference with **Bigvox**, follow the steps below to set up the model and run it locally. 📡

1. **Prepare the models**:
   - Download **Bigvox** from [HuggingFace](https://huggingface.co/bigdefence/Bigvox-HyperCLOVAX-Audio) 📦
   - Download the **Whisper-large-v3** speech encoder from [HuggingFace](https://huggingface.co/openai/whisper-large-v3) and place it in the `./models/speech_encoder/` directory 🎤

2. **Run inference**:
   - **Speech-to-text (S2T)** inference:
     - **Non-Streaming**
     ```bash
     python3 omni_speech/infer/bigvox.py --query_audio test_audio.wav
     ```
     - **Streaming**
     ```bash
     python3 omni_speech/infer/bigvox_streaming.py --query_audio test_audio.wav
     ```
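
To run either script over several recordings, a thin wrapper around the commands above can help. The sketch below is a minimal example, not part of the repository: `build_command` and `transcribe_dir` are hypothetical helper names. It assumes it is run from the repository root, that the scripts need only the `--query_audio` flag shown above, and that they print their result to standard output.

```python
import subprocess
import sys
from pathlib import Path

# Script paths taken from the commands above, relative to the repository root.
SCRIPTS = {
    False: "omni_speech/infer/bigvox.py",
    True: "omni_speech/infer/bigvox_streaming.py",
}


def build_command(audio_path, streaming=False):
    """Return the argv list for one inference call (hypothetical helper)."""
    # sys.executable reuses the current interpreter instead of a bare python3.
    return [sys.executable, SCRIPTS[streaming], "--query_audio", str(audio_path)]


def transcribe_dir(audio_dir, streaming=False):
    """Run the inference script once per .wav file and collect its stdout.

    Assumption: the script writes its transcription to standard output.
    """
    outputs = {}
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        result = subprocess.run(
            build_command(wav, streaming),
            capture_output=True,
            text=True,
            check=True,  # raise if the script exits with a non-zero status
        )
        outputs[wav.name] = result.stdout.strip()
    return outputs
```

For example, `transcribe_dir("./audio", streaming=True)` would return a dictionary that maps each file name to whatever the streaming script printed for it. Each call starts a new process and reloads the model, so this is convenient for small batches rather than efficient for large ones.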

## 🔧 Training Details

### Training Setup
- **Base Model**: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
- **Hardware**: 1x NVIDIA RTX 6000A GPU
- **Training Time**: 3 hours

## ⚠️ Limitations

- Performance may degrade in environments with heavy background noise
- Recognition accuracy may drop for very fast or mumbled speech
- Recognition accuracy for technical terms and proper nouns may vary by domain

## 📜 License

This model is distributed under the Apache 2.0 license. Commercial use is permitted; see the [LICENSE](LICENSE) file for details.


## 📞 Contact

- **Developed by**: BigDefence

## 📈 Update Log

### v1.0.0 (2024.12)
- 🎉 **Initial model release**: Bigvox made public
- 🇰🇷 **Korean-focused**: Korean speech-to-text speech multimodal model based on HyperCLOVAX-SEED-Text-Instruct-0.5B
---

## 🤝 Contributing

If you would like to contribute to the **Bigvox** project:

---

Join **BigDefence** in shaping the future of Korean AI speech recognition! 🚀🇰🇷

*"Every voice matters, every word counts"*