CAR-T2M: Small Unconstrained GPT (4L/512D/8H)

A text-to-motion model trained on the 7k-clip subset of BONES-seed. This is the unconstrained baseline: it maps a text caption directly to motion, with no waypoint conditioning.

Files

  • stats/{Mean,Std,ActiveDims,ConstFill}.npy: dataset statistics used to normalize the 310-dim velocity-space features at inference time.
  • vqvae/net_best_fid.pth: the frozen VQ-VAE tokenizer (512-code codebook, 512-dim codes, 4x temporal downsampling). Required for both encoding and decoding.
  • gpt/net_last.pth: the final checkpoint (iteration 100k) of the unconstrained GPT; this is the one to use. The GPT's net_best_fid.pth was saved at roughly iteration 5k because of a CE-gating quirk and is not the trained model.
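The stats files exist so that features can be normalized before encoding and un-normalized after decoding. The sketch below assumes Mean.npy and Std.npy are per-dimension float arrays aligned with the 310-dim feature vector and applies a standard z-score transform. The function names and the epsilon guard are hypothetical, and how ActiveDims and ConstFill are applied is not described on this card, so they are left out; demo/inference.py in the CAR-T2M repo is the reference for the exact procedure.

```python
from pathlib import Path

import numpy as np


def load_stats(stats_dir):
    """Load per-dimension mean/std arrays (assumed shape: (310,))."""
    stats_dir = Path(stats_dir)
    mean = np.load(stats_dir / "Mean.npy")
    std = np.load(stats_dir / "Std.npy")
    return mean, std


def normalize(features, mean, std, eps=1e-8):
    """Z-score features of shape (frames, dims); eps guards near-zero std (assumption)."""
    return (features - mean) / (std + eps)


def denormalize(features, mean, std, eps=1e-8):
    """Invert normalize(), e.g. on features decoded by the VQ-VAE."""
    return features * (std + eps) + mean
```

Usage would be `mean, std = load_stats(f"{local}/stats")`, with `local` being the snapshot directory from the Loading section.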

Architecture

  Parameter                 Value
  GPT layers / dim / heads  4 / 512 / 8
  FFN multiplier            4
  Block size (tokens)       51 (~204 frames at 30 fps, given 4x temporal downsampling)
  VQ-VAE codebook           512 codes x 512 dim
  Feature dim               310 (pruned velocity-space SOMA-30)
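The block size and the downsampling factor bound the longest clip the GPT can produce in one pass: 51 tokens x 4 frames per token = 204 frames, about 6.8 s at 30 fps. The dataclass below is a hypothetical container for the table values that makes this arithmetic explicit. It assumes every block position is a motion token covering exactly 4 frames and that trailing frames not filling a whole 4-frame window are dropped; if some positions are reserved for conditioning or an end token, the usable length is somewhat shorter, which may be why the card writes "~204".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class T2MConfig:
    """Values from the architecture table (hypothetical container)."""
    n_layers: int = 4
    d_model: int = 512
    n_heads: int = 8
    ffn_mult: int = 4
    block_size: int = 51      # tokens per sequence
    downsample: int = 4       # frames per VQ-VAE token
    fps: int = 30
    codebook_size: int = 512
    feature_dim: int = 310

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads

    @property
    def max_frames(self) -> int:
        # Upper bound: assumes all block positions are motion tokens.
        return self.block_size * self.downsample

    @property
    def max_seconds(self) -> float:
        return self.max_frames / self.fps

    def frames_to_tokens(self, n_frames: int) -> int:
        # Assumption: an incomplete trailing 4-frame window is dropped.
        return n_frames // self.downsample

    def tokens_to_frames(self, n_tokens: int) -> int:
        return n_tokens * self.downsample
```

With the defaults, `T2MConfig().max_frames` is 204 and `head_dim` is 64 (512 / 8).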

Loading

from huggingface_hub import snapshot_download
local = snapshot_download("mpilligua/car-t2m-small-unconstrained")
# `local` is the path of the downloaded snapshot; load the checkpoints from it as in demo/inference.py of the CAR-T2M repo.
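Before handing the snapshot to the inference code, it can help to confirm that the files listed under Files are actually present. The helper below is a hypothetical sketch; the relative paths are taken from the Files list, and it does not inspect the checkpoint contents, whose layout is defined by the CAR-T2M code.

```python
from pathlib import Path

# Relative paths taken from the Files list of this card.
EXPECTED_FILES = [
    "stats/Mean.npy",
    "stats/Std.npy",
    "stats/ActiveDims.npy",
    "stats/ConstFill.npy",
    "vqvae/net_best_fid.pth",
    "gpt/net_last.pth",  # the trained GPT, not the early net_best_fid.pth
]


def missing_files(snapshot_dir):
    """Return the expected files that are absent from snapshot_dir."""
    root = Path(snapshot_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

After the download, `missing_files(local)` should return an empty list. To skip the early GPT checkpoint entirely, `snapshot_download` also accepts glob filters, e.g. `allow_patterns=["stats/*", "vqvae/*", "gpt/net_last.pth"]`.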

Source code

Training and inference code: https://github.com/mpilligua/CAR-T2M (branch: refactor).
