license: other
license_name: openmdw-1.1
license_link: LICENSE
base_model: nvidia/Cosmos3-Nano
tags:
- robotics
- cosmos
- world-model
- action
- dk1
- lerobot
cosmos3-dk1-cartesian
Fine-tune of NVIDIA Cosmos3-Nano on the DK-1 bimanual robot datamix — multi-mode world + action SFT with a cartesian end-effector action space (single-step SE(3) pose deltas, à la the Cosmos DROID layout). Checkpoint at iter 10000.
⚠️ This is a DELTA, not a standalone model. It contains only the ~1.56 B trained parameters and must be applied on top of the public nvidia/Cosmos3-Nano base (the other ~13.6 B params are frozen and not included here).
What's in here
| file | what |
|---|---|
| `cosmos3-dk1-cartesian-delta.safetensors` | the 1.555 B trained params (bf16, 3.1 GB) |
| `merge_delta.py` | fold this delta into a base Cosmos3-Nano → stock-architecture checkpoint |
| `lora_config.json` | gen-MLP LoRA config (r16 / α32 / targets) |
| `dk1_action_normalization_cartesian.json` | quantile q01/q99 stats for the 20-D cartesian action |
Trained parameters (365 tensors, 1,555,175,424 params):
- Gen attention `q/k/v/o_proj_moe_gen`: full fine-tuned (1.51 B). Existing base modules, trained in place.
- Action I/O `action2llm` / `llm2action` / `action_modality_embed`: full fine-tuned (17 M). Existing base modules.
- Gen MLP `mlp_moe_gen.{gate,up,down}_proj`: LoRA r16 (28 M adapters, `lora_*` keys). The only structurally new keys.
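The grouping above can be checked with a small tally over tensor names and shapes. The sketch below is illustrative only: the tensor names and shapes in `EXAMPLE_SHAPES` are hypothetical stand-ins, and the substring rules in `classify` are an assumption based on the module names listed above, not the actual key layout of the delta file. The last two lines check the reported file size: 1,555,175,424 bf16 parameters at 2 bytes each is about 3.11 GB.

```python
from math import prod

# Hypothetical tensor names and shapes; the real keys in the delta file may differ.
EXAMPLE_SHAPES = {
    "layers.0.self_attn.q_proj_moe_gen.weight": (4096, 4096),
    "layers.0.self_attn.k_proj_moe_gen.weight": (1024, 4096),
    "layers.0.mlp_moe_gen.gate_proj.lora_A.weight": (16, 4096),
    "layers.0.mlp_moe_gen.gate_proj.lora_B.weight": (12288, 16),
    "action2llm.weight": (4096, 20),
}


def classify(name: str) -> str:
    """Assign a tensor name to one of the three trained groups (assumed rules)."""
    if "lora_" in name:
        return "gen_mlp_lora"
    if "_proj_moe_gen" in name:
        return "gen_attention"
    if any(tag in name for tag in ("action2llm", "llm2action", "action_modality_embed")):
        return "action_io"
    return "other"


def summarize(shapes: dict) -> dict:
    """Sum the parameter count per group from a name -> shape mapping."""
    totals = {}
    for name, shape in shapes.items():
        group = classify(name)
        totals[group] = totals.get(group, 0) + prod(shape)
    return totals


totals = summarize(EXAMPLE_SHAPES)

# Size check for the reported parameter count: bf16 stores 2 bytes per parameter.
reported_params = 1_555_175_424
reported_bytes = reported_params * 2  # 3,110,350,848 bytes, about 3.11 GB
```

To run the same tally on the real file, the shapes could be read with the `safetensors` library (for example via `safe_open(...)`, iterating `keys()` and reading each shape with `get_slice(key).get_shape()`), which avoids loading the tensors themselves.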
Compatibility (loads in the stock Cosmos framework)
The architecture is identical to Cosmos3-Nano — our additions are full-FT of existing modules or extra LoRA keys, never shape changes:
- The full-FT parts (98% of the fine-tuning) load directly into a vanilla base (same keys/shapes; overwrite values).
- The LoRA adapters are the only extra keys. Either inject LoRA (`lora_config.json`) and load them, or fold them in with `merge_delta.py`; after merging, the model is 100% stock architecture, no framework patches required.
- The `dk1_cartesian` embodiment is `domain_id = 26`, an existing row of the base `num_embodiment_domains = 32` embedding, so no new params. To use it, just pass `domain_id = 26` with the 20-D action layout.
- The training modes (`policy` / `causal_policy` / `forward_dynamics` / `inverse_dynamics`) are input-masking recipes, not model features: the net is bidirectional gen-attention, so e.g. `causal_policy` just means "give past frames + an RTC action prefix." Any user can run them; they need the masking logic, not special weights.
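The two loading paths described above (overwrite the full-FT tensors, fold the LoRA adapters) can be sketched as follows. This is a minimal illustration, not the contents of `merge_delta.py`: it assumes the standard LoRA merge `W' = W + (α / r) · B A`, and the key suffixes `.lora_A` / `.lora_B` as well as the tensor names in the example are hypothetical. With the r16 / α32 config, the scale factor under this assumption would be 2.

```python
import numpy as np


def fold_lora(base_w, lora_a, lora_b, r=16, alpha=32):
    """Standard LoRA merge: W + (alpha / r) * B @ A (assumed scaling rule)."""
    return base_w + (alpha / r) * (lora_b @ lora_a)


def apply_delta(base, delta, r=16, alpha=32):
    """Overwrite full fine-tuned tensors and fold LoRA pairs into the base."""
    merged = dict(base)
    for key, value in delta.items():
        if ".lora_A" in key:
            # Hypothetical naming: "<module>.lora_A.weight" adapts "<module>.weight".
            target = key.replace(".lora_A", "")
            lora_b = delta[key.replace(".lora_A", ".lora_B")]
            merged[target] = fold_lora(base[target], value, lora_b, r, alpha)
        elif ".lora_B" in key:
            continue  # consumed together with its lora_A partner
        else:
            merged[key] = value  # same key and shape as the base: overwrite
    return merged


# Tiny example with made-up shapes (rank 2, alpha 4, so the scale is also 2).
rng = np.random.default_rng(0)
base = {
    "gen.q_proj_moe_gen.weight": rng.normal(size=(4, 4)),
    "gen.mlp_moe_gen.gate_proj.weight": rng.normal(size=(6, 4)),
}
delta = {
    "gen.q_proj_moe_gen.weight": rng.normal(size=(4, 4)),
    "gen.mlp_moe_gen.gate_proj.lora_A.weight": rng.normal(size=(2, 4)),
    "gen.mlp_moe_gen.gate_proj.lora_B.weight": rng.normal(size=(6, 2)),
}
merged = apply_delta(base, delta, r=2, alpha=4)
```

After the fold, `merged` has exactly the keys of `base`, which is the sense in which a merged checkpoint needs no extra LoRA keys.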
How to use
```bash
# Stock, patch-free model (recommended): fold the delta into the base.
python merge_delta.py \
  --base /path/to/Cosmos3-Nano-dcp/model \
  --delta cosmos3-dk1-cartesian-delta.safetensors \
  --out /path/to/Cosmos3-dk1-cartesian-merged/model
```
For DK-1 cartesian closed-loop control, the action convention is:
- 20-D bimanual = per arm `[pos_delta(3), rot6d_delta(6), gripper(1)]`, left then right.
- Single-step (`backward_framewise`) SE(3) deltas ΔT_t = T_{t-1}⁻¹ T_t of the end-effector (flange frame, OpenCV convention: z=approach/front, x=right), 6D rotation (Zhou et al. 2019); gripper is the absolute state (not a delta).
- Quantile-normalized with `dk1_action_normalization_cartesian.json` (grippers forced (0,1)→(−1,1)).
- EE poses via FK over the DK-1 dual-arm URDF; closed-loop needs IK back to joints.
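The convention above can be illustrated with a short NumPy sketch that builds one 20-D action from consecutive end-effector poses and then normalizes it. Several details here are assumptions rather than facts from the repository: the 6D rotation is taken as the first two columns of the rotation matrix, concatenated; normalization is a linear map of [q01, q99] onto [-1, 1] followed by clipping; and the quantile values are made up, since the layout of `dk1_action_normalization_cartesian.json` is not shown here.

```python
import numpy as np


def rot6d(rotation):
    """6D rotation: first two columns of R, concatenated (assumed ordering)."""
    return np.concatenate([rotation[:, 0], rotation[:, 1]])


def backward_framewise_delta(t_prev, t_curr):
    """Single-step delta T_{t-1}^{-1} T_t for 4x4 homogeneous poses."""
    return np.linalg.inv(t_prev) @ t_curr


def encode_arm(t_prev, t_curr, gripper):
    """10-D per-arm action: [pos_delta(3), rot6d_delta(6), gripper(1)]."""
    delta = backward_framewise_delta(t_prev, t_curr)
    return np.concatenate([delta[:3, 3], rot6d(delta[:3, :3]), [gripper]])


def normalize(action, q01, q99):
    """Map [q01, q99] linearly onto [-1, 1], then clip (clipping is an assumption)."""
    scaled = 2.0 * (action - q01) / (q99 - q01) - 1.0
    return np.clip(scaled, -1.0, 1.0)


# Left arm moves 1 cm along its own z (approach) axis; right arm holds still.
pose = np.eye(4)
moved = np.eye(4)
moved[2, 3] = 0.01
left = encode_arm(pose, moved, gripper=1.0)   # gripper is absolute, not a delta
right = encode_arm(pose, pose, gripper=0.0)
action = np.concatenate([left, right])        # 20-D, left then right

# Made-up quantile stats per arm; grippers use (0, 1) so they map onto (-1, 1).
q01_arm = np.array([-0.05] * 3 + [-1.0] * 6 + [0.0])
q99_arm = np.array([0.05] * 3 + [1.0] * 6 + [1.0])
normalized = normalize(action, np.tile(q01_arm, 2), np.tile(q99_arm, 2))
```

Because the delta is formed as T_{t-1}⁻¹ T_t, the translation and rotation are expressed in the previous end-effector frame, so a pure approach motion shows up only in the z component of `pos_delta`.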
Training recipe (summary)
- Base: Cosmos3-Nano (MoT: frozen Qwen3-VL-8B reasoner + diffusion gen expert).
- Gen attention full-FT + gen MLP LoRA r16 (path-qualified to the gen expert only) + action I/O full-FT.
- Multi-mode SFT (policy / causal_policy / forward_dynamics / inverse_dynamics), RTC, JSON caption metadata + CFG dropout.
- 21-source DK-1 datamix, 480p, chunk_length 32, `action_loss_weight = 2`, ~0.18 epoch at 25k steps.
- Full project + training code: https://github.com/andreaskoepf/cosmos3-dk1
License
OpenMDW-1.1 (inherits from Cosmos3-Nano). The DK-1 URDF used for FK is Apache-2.0.