metadata
license: apache-2.0
language:
- en
pipeline_tag: audio-to-audio
QuarkAudio-HCodec-1.5: A Unified Discrete Audio Tokenizer with Adaptive Frame Rate for High-Fidelity, Multitask Audio Generation
🎯 Quick Start: Run Inference in 3 Minutes
1. Installation
- Install the dependencies listed in requirements.txt from PyPI
- Download the pretrained weights from Hugging Face 🤗: QuarkAudio/HCodec-1.5-adaptive and save them to ./checkpoints/
- Confirm that the ckpt_path in file conf/config_adaptive_v3.yaml is valid
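The ckpt_path check in the last step can be automated before running inference. The following is a minimal sketch, not part of the repository: the helper names find_ckpt_paths and missing_checkpoints are hypothetical, and because the layout of conf/config_adaptive_v3.yaml is not documented here, the sketch assumes ckpt_path may appear at any nesting depth and searches the whole config for it.

```python
from pathlib import Path
from typing import Any, Iterator


def find_ckpt_paths(node: Any) -> Iterator[str]:
    """Yield every string stored under a 'ckpt_path' key, at any depth.

    Assumption: the exact position of ckpt_path inside the YAML file is
    unknown, so nested dicts and lists are searched recursively.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "ckpt_path" and isinstance(value, str):
                yield value
            else:
                yield from find_ckpt_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_ckpt_paths(item)


def missing_checkpoints(config: dict) -> list[str]:
    """Return the ckpt_path values that do not exist on disk."""
    return [p for p in find_ckpt_paths(config) if not Path(p).exists()]
```

To use it, load the config into a dict first (for example with yaml.safe_load from PyYAML, if it is installed) and pass the result to missing_checkpoints; an empty list means every ckpt_path found points to an existing file or directory.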
2. Clone Repository
git clone https://github.com/alibaba/unified-audio.git
cd unified-audio/QuarkAudio-HCodec
3. Create a Conda environment and install dependencies
conda create -n unise python=3.10
conda activate unise
pip install -r requirements.txt
4. Tokenizer
#!/bin/bash
python audio_tokenizer.py
5. Optional configuration
- Customize the adaptive frame rate options used at test time
# hyperparameter configuration in conf/config_adaptive_v3.yaml
training: false # keep false when testing
use_similarity_alignment: true
use_dynamic_similarity_threshold: false
infer_using_dynamic_threshold: true # takes effect only when manual_threshold is null
similarity_threshold: 0.7
similarity_threshold_lower: 0.7
similarity_threshold_upper: 1.0 # lower/upper define the valid interval of the dynamic threshold when 'infer_using_dynamic_threshold' is true
max_tokens_per_group: 8
manual_threshold: 0.6 # set to a fixed value to evaluate a specific threshold
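To make the interaction between these options more concrete, here is a minimal sketch of how they could fit together. It is an illustration under stated assumptions, not the HCodec implementation: the functions resolve_threshold, cosine and group_frames and the dynamic_value argument are hypothetical. The precedence (manual_threshold first, then the dynamic threshold) follows the config comments above; clamping to the lower/upper interval and the greedy merging of adjacent frames by cosine similarity are assumptions.

```python
import math
from typing import Optional, Sequence


def resolve_threshold(
    manual_threshold: Optional[float],
    infer_using_dynamic_threshold: bool,
    similarity_threshold: float,
    lower: float,
    upper: float,
    dynamic_value: Optional[float] = None,  # hypothetical model-predicted value
) -> float:
    """Pick the similarity threshold used at inference (assumed precedence)."""
    # Per the config comment: a non-null manual_threshold wins.
    if manual_threshold is not None:
        return manual_threshold
    # Assumption: the dynamic value is clamped to [lower, upper].
    if infer_using_dynamic_threshold and dynamic_value is not None:
        return min(max(dynamic_value, lower), upper)
    # Fallback: the static similarity_threshold.
    return similarity_threshold


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def group_frames(
    frames: Sequence[Sequence[float]],
    threshold: float,
    max_tokens_per_group: int,
) -> list[int]:
    """Greedily merge consecutive frames; return the size of each group.

    A frame joins the current group when it is similar enough to the
    previous frame and the group has not reached max_tokens_per_group.
    """
    if not frames:
        return []
    sizes = [1]
    for prev, cur in zip(frames, frames[1:]):
        if cosine(prev, cur) >= threshold and sizes[-1] < max_tokens_per_group:
            sizes[-1] += 1
        else:
            sizes.append(1)
    return sizes
```

In this sketch a lower threshold merges more frames into each group, which means fewer tokens per second, while a threshold near 1.0 keeps almost every frame as its own token. With the values above, manual_threshold: 0.6 would be used and the dynamic threshold would be ignored.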
Acknowledgement
We would like to thank the following projects for their great work:
- The adaptive mechanism implementation is based on the work from FlexiCodec and VARSTok.
- The Transformer implementation is based on the work from Mimi Codec.
