QuarkAudio/HCodec-1.5-adaptive
Audio-to-Audio
Audio Tokenizer Speech enhancement Audio Generation
This project collects a series of works on audio processing and generation, covering speech, music, and general audio events, with the aim of supporting reproducible research in the audio field. QuarkAudio aims to explore a unified framework that handles different audio processing and generation tasks, including:
🚀 Key Highlights:
📄 Paper: arXiv:2510.20441 | 🎤 Listen: Demo Page | 🤗 Model: Hugging Face Spaces
| Task | Full Name | Status | Description |
|---|---|---|---|
| SR | Speech Restoration | ⛳ supported | Recover clean speech from corrupted inputs (e.g., noise, reverb, packet loss) |
| TSE | Target Speaker Extraction | ⛳ supported | Extract target speaker using reference enrollment audio |
| SS | Speech Separation | ⛳ supported | Separate mixed speakers or sound sources |
| VC | Voice Conversion | ⛳ supported | Convert the speaker identity of input speech while preserving linguistic content |
| LASS | Language-Queried Audio Source Separation | ⛳ supported | Separate sound sources based on natural language queries (e.g., "remove the man's voice") |
| CODEC | Audio Tokenization | ⛳ supported | Encode speech into compact discrete tokens and reconstruct high-fidelity audio via decoding |
| AE | Audio Editing | ⛳ supported | Edit spoken content by inserting, deleting, or substituting words/phrases in the audio domain |
| TTA | Text to Audio | ⏳ Developing | Generate speech or environmental sounds directly from text prompts (upcoming in next release) |
| AEC | Acoustic Echo Cancellation | ⏳ Developing | Remove echo artifacts in teleconferencing scenarios (upcoming in next release) |
In addition to the frameworks for specific audio tasks, QuarkAudio also provides work on neural audio codecs (NAC), the fundamental module for connecting the audio modality with language models.
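To make the idea of audio tokenization concrete, here is a minimal sketch of what "encode into discrete tokens, then decode" means. It is a toy illustration and not the HCodec architecture or its API: the framing "encoder", the hand-written `CODEBOOK`, and the function names `encode` and `decode` are all assumptions made for this example. A real neural codec learns its encoder, codebook, and decoder from data.

```python
# Toy illustration only: NOT the HCodec model or its API.
# Each short frame of samples is replaced by the index of its nearest
# codebook vector (the "token"); decoding looks the vectors back up.

def frame_signal(samples, frame_size):
    """Split a list of samples into non-overlapping frames (tail is dropped)."""
    n_frames = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(n_frames)]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(samples, codebook):
    """Map each frame to the index of its nearest codebook vector."""
    frame_size = len(codebook[0])
    tokens = []
    for frame in frame_signal(samples, frame_size):
        distances = [squared_distance(frame, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens

def decode(tokens, codebook):
    """Rebuild an approximate signal by concatenating the looked-up vectors."""
    out = []
    for token in tokens:
        out.extend(codebook[token])
    return out

# Hypothetical 4-entry codebook of 2-sample frames (hand-written, not learned).
CODEBOOK = [
    [0.0, 0.0],
    [1.0, 1.0],
    [-1.0, -1.0],
    [1.0, -1.0],
]

signal = [0.1, -0.1, 0.9, 1.2, -0.8, -1.1, 0.7, -0.9]
tokens = encode(signal, CODEBOOK)
reconstruction = decode(tokens, CODEBOOK)
print(tokens)          # one integer token per 2-sample frame
print(reconstruction)  # lossy approximation of the input
```

The token sequence is what a language model would consume or predict; the reconstruction is lossy because each frame is snapped to its closest codebook entry.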