---
license: mit
language:
- en
library_name: pytorch
tags:
- rigging
- skinning
- skeleton
- autoregressive
- fsq
- vae
- 3d
- animation
- VAST
- Tripo
---
# SkinTokens
Pretrained checkpoints for **SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging**.
[Project Page](https://zjp-shadow.github.io/works/SkinTokens/) · [arXiv](https://arxiv.org/abs/2602.04805) · [Code](https://github.com/VAST-AI-Research/SkinTokens) · [Tripo](https://www.tripo3d.ai)
This repository stores the model checkpoints used by the [SkinTokens codebase](https://github.com/VAST-AI-Research/SkinTokens), including:
- the **FSQ-CVAE** that learns the *SkinTokens* discrete representation of skinning weights, and
- the **TokenRig** autoregressive Transformer (Qwen3-0.6B architecture, GRPO-refined) that jointly generates skeletons and SkinTokens from a 3D mesh.
SkinTokens is the successor to [UniRig](https://github.com/VAST-AI-Research/UniRig) (SIGGRAPH '25). While UniRig treats skeleton and skinning as decoupled stages, SkinTokens unifies both into a single autoregressive sequence via learned discrete skin tokens, yielding a **98%–133%** improvement in skinning accuracy and a **17%–22%** improvement in bone prediction over state-of-the-art baselines.
## What Is Included
The repository is organized exactly like the `experiments/` folder expected by the main SkinTokens codebase:
```text
experiments/
├── articulation_xl_quantization_256_token_4/
│   └── grpo_1400.ckpt   # TokenRig autoregressive rigging model (GRPO-refined)
└── skin_vae_2_10_32768/
    └── last.ckpt        # FSQ-CVAE for SkinTokens (skin-weight tokenizer)
```
Total size: about **1.6 GB**.
> The training data (`ArticulationXL` splits and processed meshes) used to train these checkpoints will be released separately in a future update.
## Checkpoint Overview
### SkinTokens – FSQ-CVAE (skin-weight tokenizer)
**File:** `experiments/skin_vae_2_10_32768/last.ckpt`
Compresses sparse skinning weights into discrete *SkinTokens* using a Finite Scalar Quantized Conditional VAE with codebook levels `[8, 8, 8, 5, 5, 5]` (64,000 entries). Used both to tokenize ground-truth weights during training and to decode TokenRig's output tokens back into per-vertex skinning at inference.
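For intuition, here is a minimal, hypothetical sketch of the FSQ rounding step with these codebook levels. This is generic FSQ as described in the [FSQ paper](https://arxiv.org/abs/2309.15505), not the repository's exact implementation:

```python
import torch

# Generic FSQ sketch (illustrative, not the repo's exact code): each latent
# dimension is bounded and snapped to one of L_i levels; the product of the
# levels 8*8*8*5*5*5 gives the implied codebook size of 64,000.
LEVELS = torch.tensor([8, 8, 8, 5, 5, 5])

def fsq_quantize(z: torch.Tensor) -> torch.Tensor:
    """Quantize a (..., 6) latent to the nearest point on the FSQ grid."""
    half = (LEVELS - 1) / 2.0
    bounded = torch.tanh(z) * half                        # per-dim range [-half, half]
    quantized = torch.round(bounded)                      # snap to integer levels
    quantized = bounded + (quantized - bounded).detach()  # straight-through gradients
    return quantized / half                               # normalize back to [-1, 1]

codes = fsq_quantize(torch.randn(2, 4, 6))  # (batch, tokens, fsq dims) are illustrative
```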
### TokenRig – autoregressive rigging model
**File:** `experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt`
Qwen3-0.6B-based Transformer trained on a composite of **ArticulationXL 2.0 (70%)**, **VRoid Hub (20%)**, and **ModelsResource (10%)**, with 256-level quantization and 4 skin tokens per bone, then refined with GRPO for 1,400 steps. **This is the recommended checkpoint**: it generates the skeleton and the SkinTokens in a single unified sequence.
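As a rough illustration of what 256-level quantization means for the skeleton stream, here is a hedged sketch of uniform quantization of normalized joint coordinates; the actual tokenization scheme is defined in the SkinTokens codebase and may differ:

```python
import torch

# Illustrative only: uniform 256-level quantization of joint coordinates
# normalized to [-1, 1]; the repo's exact serialization may differ.
def quantize_coord(x: torch.Tensor, num_bins: int = 256) -> torch.Tensor:
    """Map coordinates in [-1, 1] to integer bins in [0, num_bins - 1]."""
    x = x.clamp(-1.0, 1.0)
    return torch.round((x + 1.0) / 2.0 * (num_bins - 1)).long()

joint = torch.tensor([0.12, -0.73, 0.40])  # a normalized joint position
print(quantize_coord(joint))               # tensor([143,  34, 178])
```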
> Both checkpoints are required for end-to-end inference: TokenRig generates the rig as a token sequence, and the FSQ-CVAE decoder turns SkinTokens back into dense per-vertex skinning weights.
## How To Use
The easiest way is to use the helper script in the main SkinTokens codebase, which downloads both checkpoints and the required Qwen3-0.6B config into the expected layout:
```bash
git clone https://github.com/VAST-AI-Research/SkinTokens.git
cd SkinTokens
python download.py --model
```
### Option 1 – Download with `hf` CLI
```bash
hf download VAST-AI/SkinTokens \
  --repo-type model \
  --local-dir .
```
### Option 2 – Download with `huggingface_hub` (Python)
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="VAST-AI/SkinTokens",
    repo_type="model",
    local_dir=".",
)
```
### Option 3 – Download individual files
```python
from huggingface_hub import hf_hub_download

tokenrig_ckpt = hf_hub_download(
    repo_id="VAST-AI/SkinTokens",
    filename="experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt",
)
skin_vae_ckpt = hf_hub_download(
    repo_id="VAST-AI/SkinTokens",
    filename="experiments/skin_vae_2_10_32768/last.ckpt",
)
```
### Option 4 – Web UI
Browse the [Files and versions](https://huggingface.co/VAST-AI/SkinTokens/tree/main) tab and download the folders manually, keeping the `experiments/...` layout intact.
After download, you should have:
```text
experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt
experiments/skin_vae_2_10_32768/last.ckpt
```
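A quick, optional sanity check (plain Python, no extra dependencies) that both files landed in the expected layout:

```python
from pathlib import Path

# Verify the two required checkpoints exist under experiments/.
expected = [
    "experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt",
    "experiments/skin_vae_2_10_32768/last.ckpt",
]
for rel in expected:
    status = "OK" if Path(rel).is_file() else "MISSING"
    print(f"{status:>7}  {rel}")
```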
## Run TokenRig With These Weights
Once the `experiments/` folder is in place (and the environment is installed per the [GitHub README](https://github.com/VAST-AI-Research/SkinTokens#installation)), you can run:
```bash
python demo.py --input examples/giraffe.glb --output results/giraffe.glb --use_transfer
```
Or launch the Gradio demo:
```bash
python demo.py
```
Then open `http://127.0.0.1:1024` in your browser.
## Notes
- **Keep the directory names unchanged.** The SkinTokens code expects the exact `experiments/.../*.ckpt` layout shown above.
- **TokenRig requires both checkpoints.** `grpo_1400.ckpt` generates discrete tokens; the SkinTokens FSQ-CVAE (`last.ckpt`) is needed to decode them into per-vertex skinning weights. A quick way to peek inside both files is sketched after this list.
- **Qwen3-0.6B architecture.** TokenRig adopts the Qwen3-0.6B architecture (GQA + RoPE) for its autoregressive backbone; the [Qwen3 config](https://huggingface.co/Qwen/Qwen3-0.6B) is fetched automatically by `download.py`.
- **Hardware.** An NVIDIA GPU with at least **14 GB** of memory is required for inference.
- **Training data.** The checkpoints were trained on a composite of ArticulationXL 2.0 (70%), VRoid Hub (20%), and ModelsResource (10%); the processed data splits will be released as a separate dataset repository later.
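If you want to verify the downloads beyond file presence, here is a hedged sketch for inspecting the checkpoints. It assumes Lightning-style `.ckpt` files that pickle a `state_dict` plus metadata; adjust if the format differs:

```python
import torch

# Inspect top-level contents of each checkpoint without instantiating a model.
# weights_only=False is needed if the .ckpt pickles trainer metadata
# (assumption: Lightning-style checkpoints).
for path in (
    "experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt",
    "experiments/skin_vae_2_10_32768/last.ckpt",
):
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    state = ckpt.get("state_dict", ckpt)  # fall back to a bare state_dict
    print(path, "->", len(state), "entries")
```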
## Related Links
- Your 3D AI workspace – **Tripo**: <https://www.tripo3d.ai>
- Project page: <https://zjp-shadow.github.io/works/SkinTokens/>
- Paper (arXiv): <https://arxiv.org/abs/2602.04805>
- Main code repository: <https://github.com/VAST-AI-Research/SkinTokens>
- Predecessor: [UniRig (SIGGRAPH '25)](https://github.com/VAST-AI-Research/UniRig)
- More from VAST-AI Research: <https://huggingface.co/VAST-AI>
## Acknowledgements
- [UniRig](https://github.com/VAST-AI-Research/UniRig) – the predecessor to this work.
- [Qwen3](https://github.com/QwenLM/Qwen3) – the LLM architecture used by the TokenRig autoregressive backbone.
- [3DShape2VecSet](https://github.com/1zb/3DShape2VecSet), [Michelangelo](https://github.com/NeuralCarver/Michelangelo) – the shape encoder backbone used by the FSQ-CVAE.
- [FSQ](https://arxiv.org/abs/2309.15505) – Finite Scalar Quantization, the discretization scheme behind SkinTokens.
- [GRPO](https://arxiv.org/abs/2402.03300) – the policy-optimization method used for RL refinement.
## Citation
If you find this work helpful, please consider citing our paper:
```bibtex
@article{zhang2026skintokens,
  title   = {SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging},
  author  = {Zhang, Jia-Peng and Pu, Cheng-Feng and Guo, Meng-Hao and Cao, Yan-Pei and Hu, Shi-Min},
  journal = {arXiv preprint arXiv:2602.04805},
  year    = {2026}
}
```