---
license: apache-2.0
library_name: pytorch
tags:
- text-to-speech
- speech-synthesis
- discrete-speech-synthesis
- neural-codec-language-model
- spoof-detection
- hierarchical-decoding
- pytorch
---
# MSpoofTTS Discriminator Checkpoints
This repository provides the discriminator checkpoints used in **MSpoofTTS: Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection**.

Paper: [Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection](https://arxiv.org/abs/2603.05373)

Demo: https://danny-nus.github.io/MSpoofTTS.github.io/

This repository hosts **checkpoints only**. The discriminator architecture definitions are not included, so load these checkpoints with the discriminator classes from the official MSpoofTTS codebase.
## Checkpoints
| File | Model Type | Segment Length | Scale |
|---|---|---:|---:|
| `checkpoints/segment_len50.ckpt` | SegmentTokenDiscriminator | 50 | - |
| `checkpoints/segment_len25.ckpt` | SegmentTokenDiscriminator | 25 | - |
| `checkpoints/segment_len10.ckpt` | SegmentTokenDiscriminator | 10 | - |
| `checkpoints/strided_seg50_scale10.ckpt` | StridedSegmentTokenDiscriminator | 50 | 10 |
| `checkpoints/strided_seg50_scale25.ckpt` | StridedSegmentTokenDiscriminator | 50 | 25 |
## Model Configuration
All discriminators use the following base configuration:
```python
vocab_size = 65536
d_model = 256
nhead = 8
num_layers = 4
dim_feedforward = 1024
dropout = 0.1
```
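As a rough sanity check on model size, the sketch below counts parameters for this configuration. It assumes, without confirmation from this repository, that each discriminator is a token embedding followed by `num_layers` encoder layers laid out like `torch.nn.TransformerEncoderLayer` (fused Q/K/V projection, output projection, two feed-forward linears, two LayerNorms). Positional embeddings and any scoring head in the actual MSpoofTTS architecture are not counted.

```python
# Rough parameter count under an ASSUMED architecture: token embedding plus
# standard Transformer encoder layers (as in torch.nn.TransformerEncoderLayer).
# Positional embeddings and any classification/scoring head are NOT included.


def encoder_layer_params(d_model: int, dim_feedforward: int) -> int:
    attn_in = 3 * d_model * d_model + 3 * d_model  # fused Q/K/V projection + bias
    attn_out = d_model * d_model + d_model         # attention output projection + bias
    ffn = (d_model * dim_feedforward + dim_feedforward) + (dim_feedforward * d_model + d_model)
    norms = 2 * (2 * d_model)                      # two LayerNorms (weight + bias)
    return attn_in + attn_out + ffn + norms


def estimate_params(vocab_size: int, d_model: int, num_layers: int, dim_feedforward: int) -> dict:
    embedding = vocab_size * d_model
    encoder = num_layers * encoder_layer_params(d_model, dim_feedforward)
    return {"embedding": embedding, "encoder": encoder, "total": embedding + encoder}


print(estimate_params(vocab_size=65536, d_model=256, num_layers=4, dim_feedforward=1024))
# {'embedding': 16777216, 'encoder': 3159040, 'total': 19936256}
```

Neither `nhead` nor `dropout` changes the count. Under these assumptions, the 65,536 × 256 embedding table holds roughly 84% of the approximately 19.9M parameters.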
The segment-level discriminators use `segment_len` values of 10, 25, and 50.
The strided discriminators use `segment_len=50` with scales 10 and 25.
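The checkpoint table and base configuration can be combined into one keyword-argument dictionary per checkpoint. This is a sketch: the keys reuse the names from the configuration block, but the keyword `scale` for the strided discriminators, and whether the MSpoofTTS constructors accept these exact keywords, are assumptions to verify against the official codebase.

```python
from typing import Dict, Optional, Tuple

BASE_CONFIG = {
    "vocab_size": 65536,
    "d_model": 256,
    "nhead": 8,
    "num_layers": 4,
    "dim_feedforward": 1024,
    "dropout": 0.1,
}

# (model type, segment_len, scale) per checkpoint, taken from the checkpoint table.
# scale is None for the non-strided SegmentTokenDiscriminator checkpoints.
CHECKPOINT_SPECS: Dict[str, Tuple[str, int, Optional[int]]] = {
    "checkpoints/segment_len50.ckpt": ("SegmentTokenDiscriminator", 50, None),
    "checkpoints/segment_len25.ckpt": ("SegmentTokenDiscriminator", 25, None),
    "checkpoints/segment_len10.ckpt": ("SegmentTokenDiscriminator", 10, None),
    "checkpoints/strided_seg50_scale10.ckpt": ("StridedSegmentTokenDiscriminator", 50, 10),
    "checkpoints/strided_seg50_scale25.ckpt": ("StridedSegmentTokenDiscriminator", 50, 25),
}


def build_config(filename: str) -> Tuple[str, dict]:
    """Return (class name, constructor kwargs) for a checkpoint file."""
    model_type, segment_len, scale = CHECKPOINT_SPECS[filename]
    kwargs = dict(BASE_CONFIG, segment_len=segment_len)
    if scale is not None:
        kwargs["scale"] = scale  # HYPOTHETICAL keyword name for the strided variants
    return model_type, kwargs


for name in CHECKPOINT_SPECS:
    print(name, build_config(name))
```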
## Usage
Install the Hugging Face Hub package:
```bash
pip install -U huggingface_hub
```
Download a checkpoint:
```python
from huggingface_hub import hf_hub_download
repo_id = "Chanson-0803/MSpoofTTS"
ckpt_path = hf_hub_download(
    repo_id=repo_id,
    filename="checkpoints/segment_len50.ckpt",
    repo_type="model",
)
print(ckpt_path)
```
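To fetch every checkpoint in the table at once, the same `hf_hub_download` call can be looped over the filenames. In the sketch below, `download_all` and its `download_fn` parameter are illustrative names (not part of `huggingface_hub`); the hook only exists so the loop can be exercised without network access.

```python
from typing import Callable, Dict, Iterable, Optional

CHECKPOINT_FILES = [
    "checkpoints/segment_len50.ckpt",
    "checkpoints/segment_len25.ckpt",
    "checkpoints/segment_len10.ckpt",
    "checkpoints/strided_seg50_scale10.ckpt",
    "checkpoints/strided_seg50_scale25.ckpt",
]


def download_all(
    repo_id: str,
    filenames: Iterable[str],
    download_fn: Optional[Callable[..., str]] = None,
) -> Dict[str, str]:
    """Download each file and return a {filename: local_path} mapping."""
    if download_fn is None:
        # Imported lazily so the function can also be used with a stub downloader.
        from huggingface_hub import hf_hub_download
        download_fn = hf_hub_download
    return {
        name: download_fn(repo_id=repo_id, filename=name, repo_type="model")
        for name in filenames
    }
```

With network access, `download_all("Chanson-0803/MSpoofTTS", CHECKPOINT_FILES)` returns the local cache path of each checkpoint. `hf_hub_download` caches files, so repeated calls do not re-download them.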
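Then instantiate the corresponding discriminator class from the MSpoofTTS codebase and load the weights. The constructor call assumes the class accepts the configuration values as keyword arguments; verify this against the codebase:
```python
import torch

# Replace `your_mspoof_code` with the actual module path in the official MSpoofTTS codebase.
from your_mspoof_code import SegmentTokenDiscriminator

# Base configuration from above; keyword names are assumed to match the constructor.
model = SegmentTokenDiscriminator(
    vocab_size=65536, d_model=256, nhead=8, num_layers=4,
    dim_feedforward=1024, dropout=0.1, segment_len=50,
)

state = torch.load(ckpt_path, map_location="cpu")  # ckpt_path from the download step
model.load_state_dict(state["model_state_dict"])
model.eval()  # inference mode: disables dropout
```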
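If you are unsure how a checkpoint is laid out, a small helper can make loading more forgiving. This is a sketch: beyond the `model_state_dict` key used above, it assumes a checkpoint *might* instead be a bare state dict or carry a `module.` prefix from `torch.nn.DataParallel` training. Neither variant is confirmed for these files, and `extract_state_dict` is a hypothetical helper name.

```python
from collections.abc import Mapping


def extract_state_dict(checkpoint: Mapping, key: str = "model_state_dict") -> dict:
    """Return model weights from a loaded checkpoint object.

    Uses the weights stored under `key` when present; otherwise treats the whole
    object as a state dict.
    """
    state = checkpoint[key] if key in checkpoint else checkpoint
    # Strip a leading "module." (added by torch.nn.DataParallel) only if every key has it.
    prefix = "module."
    if state and all(name.startswith(prefix) for name in state):
        state = {name[len(prefix):]: value for name, value in state.items()}
    return dict(state)
```

Usage: `model.load_state_dict(extract_state_dict(torch.load(ckpt_path, map_location="cpu")))`.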
For hierarchical decoding, use the following checkpoint files:
```python
checkpoint_files = {
    "segment_len50": "checkpoints/segment_len50.ckpt",
    "segment_len25": "checkpoints/segment_len25.ckpt",
    "segment_len10": "checkpoints/segment_len10.ckpt",
    "strided_seg50_scale10": "checkpoints/strided_seg50_scale10.ckpt",
    "strided_seg50_scale25": "checkpoints/strided_seg50_scale25.ckpt",
}
```
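The segment lengths indicate that each `SegmentTokenDiscriminator` operates on token windows of a different size. How the official codebase actually forms these windows (overlap, padding, and the role of the scale in the strided variants) is not documented here, so the sketch below simply assumes non-overlapping windows and drops any incomplete tail. `split_segments` and `multi_resolution_segments` are hypothetical helpers, not MSpoofTTS functions.

```python
from typing import Dict, List, Sequence


def split_segments(tokens: Sequence[int], segment_len: int) -> List[List[int]]:
    """Split a token sequence into non-overlapping windows of `segment_len`.

    ASSUMPTION: an incomplete trailing window is dropped; the official code may pad it.
    """
    n_full = len(tokens) // segment_len
    return [list(tokens[i * segment_len:(i + 1) * segment_len]) for i in range(n_full)]


def multi_resolution_segments(
    tokens: Sequence[int], segment_lens: Sequence[int] = (50, 25, 10)
) -> Dict[int, List[List[int]]]:
    """Window the same token sequence at each segment length."""
    return {length: split_segments(tokens, length) for length in segment_lens}


segments = multi_resolution_segments(list(range(120)))
print({length: len(windows) for length, windows in segments.items()})
```

For a 120-token sequence this yields 2, 4, and 12 windows at segment lengths 50, 25, and 10, respectively.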
## Intended Use
These checkpoints are intended for research on discrete speech synthesis, neural codec language models, inference-time decoding guidance, spoof detection for generated speech tokens, and hierarchical multi-resolution decoding.
## Limitations
These checkpoints are designed for the speech-token vocabulary and discriminator architectures used in MSpoofTTS. They may not be directly compatible with other codec tokenizers, vocabulary layouts, or speech language models without adaptation.
## Citation
```bibtex
@article{zhao2026hierarchical,
  title={Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection},
  author={Zhao, Junchuan and Vu, Minh Duc and Wang, Ye},
  journal={arXiv preprint arXiv:2603.05373},
  year={2026}
}
```