---
license: mit
library_name: pytorch
tags:
- motion-generation
- text-to-motion
- humanml3d
- controllable-generation
- kv-control
pipeline_tag: text-to-motion
---

# KV-Control (T-Concat v4 backbone)

Sparse-keyframe, multi-joint controllable text-to-motion generation. The
repository at [github.com/Tevior/KV-Control](https://github.com/Tevior/KV-Control)
contains the full training and inference code.

## What is here

| Path | Content | Size |
|---|---|---|
| `base_t_concat_v4/model/net_best_fid.tar` | Pre-trained T-Concat v4 masked-transformer base (the paper's main backbone) | 168 MB |
| `kv_control/model/net_best_kps.tar` | KV-Control adapter trained on the base above | 520 MB |
| `vqvae/net_best_fid.pth` | Part-aware VQ-VAE tokenizer (128 codes × 6 parts) | 236 MB |
| `vqvae/skeleton_partition.json` | Skeleton partition for the part-aware VQ | 1 KB |
| `stats/{mean,std}.npy` | Normalization stats matching the released VQ | 4 KB |
| `clip/ViT-B-32.pt` | OpenAI CLIP ViT-B/32 visual + text encoder | 336 MB |
| `t2m/Comp_v6_KLD005/{opt.txt,meta/}` | Frozen evaluation encoder config and stats | 3 KB |
| `t2m/text_mot_match/model/finest.tar` | Pre-trained text-motion eval encoder (Guo et al., 2022) | 235 MB |
| `t2m/length_estimator/model/finest.tar` | Pre-trained motion-length predictor | 1.7 MB |
| `aux/body_models/` | SMPL neutral mesh, face, and J_regressor files (SMPL license) | 234 MB |
| `aux/glove/` | Vocab files for the length estimator | 10 MB |

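The table describes `stats/{mean,std}.npy` only as normalization stats. The sketch below shows how such stats are commonly applied, assuming per-feature z-score normalization; the repository's own data loader remains the reference. The `normalize` and `denormalize` helpers, the `checkpoints/stats/` location, and the array sizes are illustrative assumptions rather than part of the released code.

```python
import numpy as np


def normalize(motion, mean, std, eps=1e-8):
    """Z-score each feature; mean and std broadcast over the frame axis."""
    return (motion - mean) / (std + eps)


def denormalize(motion_norm, mean, std, eps=1e-8):
    """Invert normalize() to recover features in their original scale."""
    return motion_norm * (std + eps) + mean


# Synthetic stand-ins so the sketch runs without the checkpoints.
# With the real files one would load them instead, for example
# mean = np.load("checkpoints/stats/mean.npy")  # path is an assumption
rng = np.random.default_rng(0)
num_frames, feat_dim = 16, 8  # arbitrary illustrative sizes
mean = rng.normal(size=feat_dim)
std = rng.uniform(0.5, 2.0, size=feat_dim)
motion = rng.normal(size=(num_frames, feat_dim)) * std + mean

motion_norm = normalize(motion, mean, std)
restored = denormalize(motion_norm, mean, std)
```

The round trip through `normalize` and `denormalize` should reproduce the input up to floating-point error, which is a cheap way to confirm that the stats and features share the same feature dimension.
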
## How to use

```bash
git clone https://github.com/Tevior/KV-Control.git
cd KV-Control
bash scripts/download_checkpoints.sh   # populates checkpoints/ and aux/ (glove/, body_models/)
```

Refer to the GitHub README for installation and quick-start commands.

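After the download script finishes, it can help to confirm that the listed files actually arrived. The following is a minimal sketch, not part of the repository: `missing_files` and `EXPECTED_FILES` are hypothetical names, and the assumption that the weight files land under `checkpoints/` with the same relative paths as in the file table should be checked against the GitHub README.

```python
from pathlib import Path

# Relative paths copied from the file table; their location under
# checkpoints/ is an assumption, not something the model card states.
EXPECTED_FILES = [
    "base_t_concat_v4/model/net_best_fid.tar",
    "kv_control/model/net_best_kps.tar",
    "vqvae/net_best_fid.pth",
    "vqvae/skeleton_partition.json",
    "stats/mean.npy",
    "stats/std.npy",
    "clip/ViT-B-32.pt",
    "t2m/Comp_v6_KLD005/opt.txt",
    "t2m/text_mot_match/model/finest.tar",
    "t2m/length_estimator/model/finest.tar",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("checkpoints")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected checkpoint files are present.")
```

Run it from the repository root; an empty result suggests the download completed, while any listed path points to a file worth re-fetching.
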
## Licenses

* Our weights (`base_t_concat_v4`, `kv_control`, `vqvae`, `stats`): **MIT**.
* CLIP ViT-B/32: released by OpenAI under MIT.
* SMPL body model under `aux/body_models/`: original SMPL license (research-only).
* Text-motion eval encoder and length estimator under `t2m/`: redistributed
  from the HumanML3D release (Guo et al., 2022) for reproducibility.

## Citation

```bibtex
@article{kvcontrol2026,
  title  = {KV-Control: Sparse-Keyframe Multi-Joint Text-to-Motion Generation},
  author = {... (under review) ...},
  year   = {2026},
}
```