|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: audio-to-audio |
|
|
--- |
|
|
|
|
|
# QuarkAudio-HCodec-1.5: A Unified Discrete Audio Tokenizer with Adaptive Frame Rate for High-Fidelity, Multitask Audio Generation
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://arxiv.org/pdf/2512.20151"> |
|
|
<img src="https://img.shields.io/badge/Paper-ArXiv-red.svg" alt="Paper"> |
|
|
</a> |
|
|
<a href="https://github.com/alibaba/unified-audio/tree/main/QuarkAudio-UniSE"> |
|
|
<img src="https://img.shields.io/badge/GitHub-Code-green.svg" alt="GitHub"> |
|
|
</a> |
|
|
<a href="https://github.com/alibaba/unified-audio/tree/main/QuarkAudio-HCodec/HCodec-1.5/"> |
|
|
<img src="https://img.shields.io/badge/Model-Hugging%20Face-yellow.svg" alt="Hugging Face"> |
|
|
</a> |
|
|
<a href="https://www.modelscope.cn/models/QuarkAudio/QuarkAudio-HCodec/"> |
|
|
<img src="https://img.shields.io/badge/Model-%20%E9%AD%94%E6%90%AD-orange.svg" alt="ModelScope"> |
|
|
</a> |
|
|
</p> |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://arxiv.org/pdf/2512.20151"><img src="HCodec-1.5.png" width="70%" /></a> |
|
|
</p> |
|
|
|
|
|
|
|
|
## 🎯 Quick Start: Run Inference in 3 Minutes
|
|
### 1. Installation

1. Install dependencies from `requirements.txt` via pip.

2. Download the pretrained weights from Hugging Face 🤗: [QuarkAudio/HCodec-1.5-adaptive](https://huggingface.co/QuarkAudio/HCodec-1.5-adaptive) and save them to `./checkpoints/`.

3. Confirm that the `ckpt_path` in `conf/config_adaptive_v3.yaml` is valid.
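If you prefer the command line, the weights can also be fetched with the Hugging Face CLI. This assumes `huggingface_hub` with its CLI extra is installed (`pip install -U "huggingface_hub[cli]"`); the target directory matches the `./checkpoints/` path expected by the config:

```shell
# Download the pretrained HCodec-1.5 weights into ./checkpoints/
huggingface-cli download QuarkAudio/HCodec-1.5-adaptive --local-dir ./checkpoints/
```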
|
|
|
|
|
|
|
|
### 2. Clone Repository |
|
|
|
|
|
```bash |
|
|
git clone https://github.com/alibaba/unified-audio.git
cd unified-audio/QuarkAudio-HCodec
|
|
``` |
|
|
|
|
|
### 3. Create a Conda environment and install dependencies |
|
|
|
|
|
```bash |
|
|
conda create -n unise python=3.10
conda activate unise
pip install -r requirements.txt
|
|
``` |
|
|
|
|
|
### 4. Tokenizer
|
|
|
|
|
```bash |
|
|
python audio_tokenizer.py
|
|
``` |
|
|
|
|
|
### 5. Optional configuration

+ Customize the adaptive frame rate options for testing
|
|
|
|
|
```yaml |
|
|
# hyperparameter configuration in conf/config_adaptive_v3.yaml

training: false  # keep false when testing
use_similarity_alignment: true
use_dynamic_similarity_threshold: false
infer_using_dynamic_threshold: true  # takes effect only when manual_threshold is null
similarity_threshold: 0.7
similarity_threshold_lower: 0.7
similarity_threshold_upper: 1.0  # valid range of the dynamic threshold when infer_using_dynamic_threshold is enabled
max_tokens_per_group: 8
manual_threshold: 0.6  # set to a fixed value to evaluate a specific threshold
|
|
``` |
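To build intuition for `similarity_threshold` and `max_tokens_per_group`, here is a minimal, self-contained sketch of similarity-based frame grouping. This is not the repository's actual API: the function `group_frames` and the greedy strategy are illustrative assumptions showing how a cosine-similarity threshold and a group-size cap can decide how many consecutive frame embeddings merge into one token group.

```python
import numpy as np

def group_frames(frames, threshold=0.7, max_group=8):
    """Greedily merge consecutive frames into groups.

    A frame joins the current group while its cosine similarity to the
    group's first frame stays >= `threshold`; each group holds at most
    `max_group` frames (cf. max_tokens_per_group).
    """
    groups = []
    start = 0
    for i in range(1, len(frames) + 1):
        if i == len(frames):
            groups.append(frames[start:i])
            break
        a, b = frames[start], frames[i]
        # Cosine similarity between the group anchor and the candidate frame.
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        if sim < threshold or i - start >= max_group:
            groups.append(frames[start:i])
            start = i
    return groups

# Two nearly identical frames collapse into one group; the dissimilar
# third frame starts a new one.
frames = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
print(len(group_frames(frames, threshold=0.7, max_group=8)))  # 2
```

Raising the threshold (or lowering `max_group`) yields more, smaller groups and therefore a higher effective frame rate; lowering it merges more frames per token.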
|
|
|
|
|
## 🙏 Acknowledgement

We would like to thank the following projects for their great work:
|
|
|
|
|
- The adaptive mechanism implementation is based on the work from [FlexiCodec](https://github.com/amphionspace/FlexiCodec) and [VARSTok](https://github.com/FunAudioLLM/FunResearch/tree/main/VARSTok). |
|
|
- The Transformer implementation is based on the work from [Mimi Codec](https://github.com/kyutai-labs/moshi).