Davinci Voice

High-quality Korean Text-to-Speech with Voice Cloning

Davinci Voice is a high-quality speech synthesis library optimized for Korean. It can clone a voice from just 3 seconds of reference audio and supports real-time streaming.

Features

  • 🎯 Native Korean support: pronunciation and prosody optimized for Korean
  • 🎙️ 3-second voice cloning: fast voice replication from a short reference clip
  • ⚡ 97 ms latency: fast response suited to real-time conversation
  • 🌍 Multilingual support: 10 languages, including Korean, English, Chinese, and Japanese
  • 📜 Apache 2.0 license: commercial use permitted

Installation

```bash
pip install davinci-voice
```

Quick Start

```python
import soundfile as sf
import torch
from davinci_voice import DavinciVoiceModel

# Load the model
model = DavinciVoiceModel.from_pretrained(
    "andrewkim80/davinci-voice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Voice cloning
audio_list, sample_rate = model.generate_voice_clone(
    text="안녕하세요, 다빈치 보이스입니다.",  # "Hello, this is Davinci Voice."
    ref_audio="path/to/reference.wav",
    x_vector_only_mode=True,
)

# Save the first generated clip
sf.write("output.wav", audio_list[0], sample_rate)
```
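Since voice cloning is described as working from about 3 seconds of reference audio, it can be convenient to cut a longer recording down before passing it as `ref_audio`. The helper below is a minimal sketch using only Python's standard `wave` module; `trim_wav` is a hypothetical name, not part of `davinci_voice`, and it assumes an uncompressed PCM WAV input. Whether a longer reference helps or hurts cloning quality is not stated here, so treat the 3-second default as illustrative.

```python
import wave


def trim_wav(src_path: str, dst_path: str, seconds: float = 3.0) -> float:
    """Copy the first `seconds` of a PCM WAV file to `dst_path`.

    Hypothetical helper (not part of davinci_voice). Returns the duration
    in seconds that was actually kept, which is shorter than `seconds`
    if the source file is shorter.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")

    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        sample_width = src.getsampwidth()
        rate = src.getframerate()
        # Never ask for more frames than the file contains.
        frames_to_keep = min(src.getnframes(), int(round(seconds * rate)))
        frames = src.readframes(frames_to_keep)

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(rate)
        dst.writeframes(frames)

    return frames_to_keep / rate
```

For example, `trim_wav("long_recording.wav", "reference.wav")` would write a clip of at most 3 seconds that could then be passed as `ref_audio="reference.wav"`; both file names are placeholders.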

Performance

| Metric | Value |
|---|---|
| TTFA (Time To First Audio) | ~97 ms |
| RTF (Real-Time Factor) | < 1.0 |
| MOS (Mean Opinion Score) | 4.6 |
| WER (Word Error Rate) | 1.8% |
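The real-time factor is conventionally the time spent synthesizing divided by the duration of the audio produced, so a value below 1.0 means audio is generated faster than it plays back. The sketch below shows that arithmetic; `real_time_factor` and `timed_rtf` are hypothetical helper names, and how the figures in the table were actually measured (hardware, text length, warm-up) is not specified here.

```python
import time
from typing import Callable, Sequence, Tuple


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def timed_rtf(synthesize: Callable[[], Tuple[Sequence[float], int]]) -> float:
    """Time one call to `synthesize` and return its real-time factor.

    `synthesize` is any zero-argument callable returning (samples, sample_rate);
    it stands in for a call to the TTS model and is an assumption of this sketch.
    """
    start = time.perf_counter()
    samples, sample_rate = synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)
```

With illustrative numbers, 1.2 seconds of compute for 72,000 samples at 24,000 Hz (3 seconds of audio) gives `real_time_factor(1.2, 72000, 24000)`, which is 0.4. To time the model itself, one could wrap the generation call in a function that returns `(audio_list[0], sample_rate)` and pass it to `timed_rtf`.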

License

Apache License 2.0

Acknowledgments

This project is based on Qwen3-TTS.
