openbmb/VoxCPM-0.5B

Text-to-Speech
VoxCPM
PyTorch
English
Chinese
speech
speech generation
voice cloning


    How to use openbmb/VoxCPM-0.5B with VoxCPM:

    import soundfile as sf
    from voxcpm import VoxCPM
    
    # Download (or load from cache) the pretrained checkpoint from the Hub
    model = VoxCPM.from_pretrained("openbmb/VoxCPM-0.5B")
    
    wav = model.generate(
        text="VoxCPM is an innovative end-to-end TTS model from ModelBest, designed to generate highly expressive speech.",
        prompt_wav_path=None,      # optional: path to a prompt speech clip for voice cloning
        prompt_text=None,          # optional: reference text for the prompt speech
        cfg_value=2.0,             # LM guidance on LocDiT; higher follows the prompt more closely but may degrade the result
        inference_timesteps=10,    # LocDiT inference timesteps; higher for better results, lower for faster generation
        normalize=True,            # enable the external text normalization (TN) tool
        denoise=True,              # enable the external denoising tool
        retry_badcase=True,        # retry when a bad case is detected (e.g. generation that does not stop)
        retry_badcase_max_times=3, # maximum number of retries
        retry_badcase_ratio_threshold=6.0, # length-ratio limit for bad-case detection (simple but effective); may need adjusting for slow-paced speech
    )
    
    # Write the generated waveform at a 16 kHz sample rate
    sf.write("output.wav", wav, 16000)
    print("saved: output.wav")
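For voice cloning, the snippet above exposes `prompt_wav_path` and `prompt_text` as optional arguments. A minimal sketch of how one might validate those inputs before calling `model.generate` follows. The helper name `build_generate_kwargs` is hypothetical and not part of the `voxcpm` package, and the rule that the prompt audio and its reference text must be supplied together is an assumption made for this illustration.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def build_generate_kwargs(
    text: str,
    prompt_wav_path: Optional[str] = None,
    prompt_text: Optional[str] = None,
    cfg_value: float = 2.0,
    inference_timesteps: int = 10,
) -> Dict[str, Any]:
    """Hypothetical helper: check inputs and return keyword arguments
    using the parameter names shown in the snippet above."""
    if not text.strip():
        raise ValueError("text must not be empty")

    # Assumption: cloning needs the prompt audio and its transcript together.
    if (prompt_wav_path is None) != (prompt_text is None):
        raise ValueError("provide both prompt_wav_path and prompt_text, or neither")

    # Fail early if the prompt file is missing, before any model work starts.
    if prompt_wav_path is not None and not Path(prompt_wav_path).is_file():
        raise FileNotFoundError(prompt_wav_path)

    return {
        "text": text,
        "prompt_wav_path": prompt_wav_path,
        "prompt_text": prompt_text,
        "cfg_value": cfg_value,
        "inference_timesteps": inference_timesteps,
    }


# Usage (requires the model loaded as in the snippet above):
#   wav = model.generate(**build_generate_kwargs("Hello from VoxCPM."))
```

Checking the arguments up front keeps a mistyped prompt path from surfacing only after the checkpoint has been loaded.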

Multilingual support

👀 1
1
#16 opened 8 months ago by
evewashere

ReadMe-.md

3
#15 opened 8 months ago by
BOGDANIMAL

Update README.md

#14 opened 8 months ago by
BOGDANIMAL

Update README.md

#13 opened 8 months ago by
BOGDANIMAL

Any way to support timing and text highlight for TTS?

#12 opened 8 months ago by
randomantlab25

Mac mps support?

6
#11 opened 8 months ago by
pylotlight

Adding `safetensors` variant of this model

👍 2
#10 opened 8 months ago by
SFconvertbot

Finetune the model

1
#9 opened 8 months ago by
Suraj0295

modelscope.cn

2
#8 opened 8 months ago by
anujchopra

Incredible work, and just a few questions!

2
#6 opened 8 months ago by
MRU4913

ComfyUI integration

👍👀 3
#5 opened 8 months ago by
Wildminder

Adding `safetensors` variant of this model

👍 1
#3 opened 8 months ago by
SFconvertbot

Local Installation Video and Testing - Step by Step

👍 5
#2 opened 8 months ago by
fahdmirzac

Failure due to unnecessary dependency

9
#1 opened 8 months ago by
notafraud