CAR-T2M: Small Unconstrained GPT (4L/512D/8H)
Text-to-motion model trained on the 7k-clip subset of BONES-seed. Unconstrained baseline: text caption -> motion, no waypoint conditioning.
Files
- stats/{Mean,Std,ActiveDims,ConstFill}.npy: dataset stats used to normalize 310-dim velocity-space features at inference time.
- vqvae/net_best_fid.pth: frozen VQ-VAE tokenizer (codebook 512x512, 4x temporal downsample). Required for both encoding and decoding.
- gpt/net_last.pth: final iter-100k checkpoint of the unconstrained GPT. (net_best_fid.pth is iter ~5k due to a CE-gating quirk and is not the trained model; net_last.pth is the one to use.)
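As a rough illustration of how the Mean and Std files might be applied, the sketch below z-scores a feature array and inverts the operation. It assumes plain per-dimension standardization, which this card does not confirm, and it uses synthetic arrays in place of the real stats. The function names are hypothetical, and ActiveDims and ConstFill are left out because their exact semantics are defined in the CAR-T2M code.

```python
import numpy as np

FEATURE_DIM = 310  # feature dim stated in this card


def normalize(features, mean, std):
    """Z-score features of shape (frames, dims) with per-dimension stats.

    ASSUMPTION: plain (x - mean) / std; check the CAR-T2M repo for the
    exact formula used at training time.
    """
    return (features - mean) / std


def denormalize(normed, mean, std):
    """Invert normalize(): map model-space features back to raw units."""
    return normed * std + mean


# Synthetic stand-ins. With the real files you would instead do, e.g.:
#   mean = np.load(f"{local}/stats/Mean.npy")
#   std = np.load(f"{local}/stats/Std.npy")
rng = np.random.default_rng(0)
mean = rng.normal(size=FEATURE_DIM)
std = rng.uniform(0.5, 2.0, size=FEATURE_DIM)
raw = rng.normal(size=(204, FEATURE_DIM))  # one full-length clip

normed = normalize(raw, mean, std)
restored = denormalize(normed, mean, std)
```

The round trip through normalize and denormalize should reproduce the input up to floating-point error, which is a cheap sanity check after loading the real stats.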
Architecture
| Setting | Value |
| --- | --- |
| GPT layers / dim / heads | 4 / 512 / 8 |
| FFN multiplier | 4 |
| Block size (tokens) | 51 (~204 frames @ 30 fps after 4x downsample) |
| VQ-VAE codebook | 512 codes x 512 dim |
| Feature dim | 310 (pruned velocity-space SOMA-30) |
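The block-size figure in the table can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every token position corresponds to exactly 4 motion frames; the helper names are made up for illustration.

```python
BLOCK_SIZE_TOKENS = 51   # from the table above
TEMPORAL_DOWNSAMPLE = 4  # VQ-VAE temporal downsample factor
FPS = 30                 # frame rate stated in the table


def tokens_to_frames(n_tokens, downsample=TEMPORAL_DOWNSAMPLE):
    """Number of motion frames covered by n_tokens VQ tokens."""
    return n_tokens * downsample


def frames_to_seconds(n_frames, fps=FPS):
    """Clip duration in seconds for n_frames at the given frame rate."""
    return n_frames / fps


max_frames = tokens_to_frames(BLOCK_SIZE_TOKENS)  # 51 * 4 = 204
max_seconds = frames_to_seconds(max_frames)       # 204 / 30, about 6.8 s
print(f"{max_frames} frames, about {max_seconds:.1f} s per generated clip")
```

Under that assumption, a full block corresponds to a clip of a little under 7 seconds.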
Loading
from huggingface_hub import snapshot_download
local = snapshot_download("mpilligua/car-t2m-small-unconstrained")
# Then load like any local checkpoint dir; see CAR-T2M demo/inference.py.
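Before handing the snapshot directory to the inference script, it can help to confirm that the files listed under Files are actually present. The sketch below uses only the standard library; `missing_files` is a hypothetical helper, not part of CAR-T2M or huggingface_hub, and the expected paths are taken from the file list in this card.

```python
from pathlib import Path

# Relative paths as listed in the Files section of this card.
EXPECTED_FILES = [
    "stats/Mean.npy",
    "stats/Std.npy",
    "stats/ActiveDims.npy",
    "stats/ConstFill.npy",
    "vqvae/net_best_fid.pth",
    "gpt/net_last.pth",
]


def missing_files(local_dir):
    """Return the expected relative paths that are absent from local_dir."""
    root = Path(local_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Usage, with `local` being the path returned by snapshot_download:
#   problems = missing_files(local)
#   if problems:
#       raise FileNotFoundError(f"Snapshot is missing: {problems}")
```

An empty list means the snapshot contains everything this card describes. Loading the checkpoints themselves is left to demo/inference.py in the CAR-T2M repository.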
Source code
Trainer + inference: https://github.com/mpilligua/CAR-T2M (branch refactor).