CAR-T2M — Small Unconstrained GPT (4L/512D/8H)

Text-to-motion model trained on the 7k-clip subset of BONES-seed. This is the unconstrained baseline: it maps a text caption to motion, with no waypoint conditioning.

Files

stats/{Mean,Std,ActiveDims,ConstFill}.npy — dataset stats used to normalize 310-dim velocity-space features at inference time.
vqvae/net_best_fid.pth — frozen VQ-VAE tokenizer (codebook 512x512, 4x temporal downsample). Required for both encoding and decoding.
gpt/net_last.pth — final iter-100k checkpoint of the unconstrained GPT. Use this file: net_best_fid.pth was saved at iter ~5k because of a CE-gating quirk and is not the fully trained model.
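
As a rough illustration of how the Mean and Std arrays might be applied, the sketch below z-scores a batch of 310-dim features and then inverts the operation. The z-score formula, the eps guard, and the array shapes are assumptions rather than details taken from the repository. The roles of ActiveDims and ConstFill are not documented in this card, so they are left out. The arrays are random stand-ins, not the shipped files.

```python
import numpy as np

FEATURE_DIM = 310  # feature dimension stated in this card


def normalize(features, mean, std, eps=1e-8):
    """Z-score the features; eps guards against a zero std (assumed convention)."""
    return (features - mean) / (std + eps)


def denormalize(normalized, mean, std, eps=1e-8):
    """Invert normalize() to recover features in their original scale."""
    return normalized * (std + eps) + mean


# Stand-ins for stats/Mean.npy and stats/Std.npy. With the real files you
# would call np.load on each path instead (shapes assumed to be (310,)).
rng = np.random.default_rng(0)
mean = rng.normal(size=FEATURE_DIM)
std = rng.uniform(0.5, 2.0, size=FEATURE_DIM)

motion = rng.normal(size=(204, FEATURE_DIM))  # (frames, features)
normalized = normalize(motion, mean, std)
roundtrip = denormalize(normalized, mean, std)
max_error = float(np.abs(roundtrip - motion).max())
```

The round trip should reproduce the input up to floating-point error, which is a quick sanity check to run before feeding real features to the tokenizer.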

Architecture


GPT layers / dim / heads	4 / 512 / 8
FFN multiplier	4
Block size (tokens)	51 (~204 frames at 30 fps, about 6.8 s, given the 4x temporal downsample)
VQ-VAE codebook	512 codes x 512 dim
Feature dim	310 (pruned velocity-space SOMA-30)
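
The block-size figure can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers in the table; the helper names are hypothetical. The card does not say whether the 51 positions include conditioning or end-of-sequence tokens, so treat the result as an upper bound on clip length.

```python
BLOCK_SIZE_TOKENS = 51    # from the architecture table
TEMPORAL_DOWNSAMPLE = 4   # VQ-VAE temporal downsample factor
FPS = 30                  # frame rate stated in the table


def tokens_to_frames(num_tokens, downsample=TEMPORAL_DOWNSAMPLE):
    """Each motion token covers `downsample` frames of the original clip."""
    return num_tokens * downsample


def frames_to_seconds(num_frames, fps=FPS):
    """Convert a frame count to a duration in seconds."""
    return num_frames / fps


max_frames = tokens_to_frames(BLOCK_SIZE_TOKENS)
max_seconds = frames_to_seconds(max_frames)
```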

Loading

from huggingface_hub import snapshot_download
local = snapshot_download("mpilligua/car-t2m-small-unconstrained")
# Then load like any local checkpoint dir; see CAR-T2M demo/inference.py.
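
Before handing the directory to the inference script, it can help to confirm that the snapshot contains the files this card lists. The sketch below uses only the standard library; the helper name missing_files is hypothetical, and the relative paths are copied from the Files section above.

```python
from pathlib import Path

# Relative paths listed in the Files section of this card.
EXPECTED_FILES = [
    "stats/Mean.npy",
    "stats/Std.npy",
    "stats/ActiveDims.npy",
    "stats/ConstFill.npy",
    "vqvae/net_best_fid.pth",
    "gpt/net_last.pth",
]


def missing_files(snapshot_dir):
    """Return the expected relative paths that are absent from snapshot_dir."""
    root = Path(snapshot_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling missing_files(local) on the path returned by snapshot_download should give an empty list if the download completed.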

Source code

Trainer and inference code: https://github.com/mpilligua/CAR-T2M (branch: refactor).
