Release: SemanticVLA-LIBERO checkpoint + config + dataset_statistics + model card

Files changed (4)

README.md +35 -37
config.yaml +131 -0
dataset_statistics.json +133 -0
final_model/pytorch_model.pt +3 -0

README.md CHANGED

@@ -13,9 +13,7 @@ tags:
 # SemanticVLA · LIBERO
-> 🚧 **Placeholder.** The URL is stable; checkpoints will be uploaded incrementally per the [release roadmap](https://github.com/Fei-Ni/SemanticVLA_Offcial/blob/main/docs/ROADMAP.md).
-[SemanticVLA](https://github.com/Fei-Ni/SemanticVLA_Offcial) finetuned on the [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO) benchmark.
 ## Headline result
@@ -23,58 +21,58 @@ tags:
 |---|---:|
 | `libero_spatial` | 0.988 |
 | `libero_object`  | 0.996 |
-| `libero_goal`    | 0.986 |
-| `libero_10`      | 0.966 |
-| **4-suite mean** | **0.9840** |
-Best configuration: `TL_saembs_lw010` (trace + latent semantic output, `sa_embs` injection, LM loss weight 0.10, step 30000).
 ## Architecture
-- **Backbone**: Qwen2.5-VL-3B (with trace + latent-action semantic heads)
-- **Action head**: GR00T-style flow-matching expert (continuous action chunks)
-- **LAM tokenizer**: [`SemanticVLA-LAM` → `libero/v5`](https://huggingface.co/spikefly/SemanticVLA-LAM)
-- **Action horizon**: 16
-## Planned layout
 ```
 SemanticVLA-LIBERO/
-├── tl-saembs-lw010-best/
-│   ├── pytorch_model.pt
-│   ├── config.yaml
-│   └── model_card.md
-└── README.md
 ```
-Additional ablation variants (`L_none_lw010`, `TL_none_lw010`, `TL_saembs_lw005`, etc.) may be uploaded as additional subdirectories upon release.
 ## Sibling SemanticVLA checkpoint repos
 | Repo | Purpose |
 |---|---|
-| 🤗 [`SemanticVLA-LAM`](https://huggingface.co/spikefly/SemanticVLA-LAM) | LAM tokenizers used by this VLA |
-| 🤗 [`SemanticVLA-Bridge`](https://huggingface.co/spikefly/SemanticVLA-Bridge) | Bridge-finetuned VLA for SimplerEnv WidowX |
 ## Related resources
 - **Code**: https://github.com/Fei-Ni/SemanticVLA_Offcial
-- **Datasets**: https://hf.co/collections/spikefly/semanticvla-datasets
-- **Collection · Model Zoo**: https://hf.co/collections/spikefly/semanticvla-model-zoo
-## How to load (placeholder API)
-```python
-from huggingface_hub import hf_hub_download
-import torch
-ckpt = hf_hub_download(
-    repo_id="spikefly/SemanticVLA-LIBERO",
-    filename="tl-saembs-lw010-best/pytorch_model.pt",
-)
-state = torch.load(ckpt, map_location="cpu")
-# loader will be released with the code repo
-```
 ## Citation

 # SemanticVLA · LIBERO
+[SemanticVLA](https://github.com/Fei-Ni/SemanticVLA_Offcial) finetuned on the [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO) benchmark. The unified OXE LAM is used as the latent-action tokenizer, and the trace + latent-action auxiliary heads are supervised in the VLM's language stream.
 ## Headline result
 |---|---:|
 | `libero_spatial` | 0.988 |
 | `libero_object`  | 0.996 |
+| `libero_goal`    | 0.974 |
+| `libero_10`      | 0.970 |
+| **4-suite mean** | **0.982** |
 ## Architecture
+| Component | Choice |
+|---|---|
+| VLM backbone | Qwen3-VL-4B-Instruct |
+| Action head | DiT-B (flow matching) |
+| LAM tokenizer | [`SemanticVLA-LAM`](https://huggingface.co/spikefly/SemanticVLA-LAM) (unified OXE LAM) |
+| Semantic supervision | Trace + latent action tokens predicted in the VLM's language stream; action decoder unmodified |
+| Latent vocabulary size | 32 |
+| Latent tokens per sample | 4 |
+| Action horizon | 8 |
+## Files
 ```
 SemanticVLA-LIBERO/
+├── README.md
+├── config.yaml              # loadable model config
+├── dataset_statistics.json  # action normalization stats
+└── final_model/
+    └── pytorch_model.pt     # policy state_dict
 ```
+## How to load
+```python
+from semanticvla.model.framework.base_framework import baseframework
+policy = baseframework.from_pretrained("final_model/pytorch_model.pt")
+policy.eval()
+```
+`baseframework.from_pretrained()` walks two directory levels up from the checkpoint file to locate `config.yaml` and `dataset_statistics.json`. The released layout follows this convention.
+To run a full LIBERO evaluation, see [`examples/LIBERO/`](https://github.com/Fei-Ni/SemanticVLA_Offcial/tree/main/examples/LIBERO) in the code repo.
 ## Sibling SemanticVLA checkpoint repos
 | Repo | Purpose |
 |---|---|
+| 🤗 [`SemanticVLA-LAM`](https://huggingface.co/spikefly/SemanticVLA-LAM) | Unified OXE LAM consumed by this policy |
+| 🤗 [`SemanticVLA-SimplerEnv`](https://huggingface.co/spikefly/SemanticVLA-SimplerEnv) | SimplerEnv WidowX policy |
 ## Related resources
 - **Code**: https://github.com/Fei-Ni/SemanticVLA_Offcial
+- **Datasets collection**: https://hf.co/collections/spikefly/semanticvla-datasets
+- **Model Zoo collection**: https://hf.co/collections/spikefly/semanticvla-model-zoo
 ## Citation

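The new README says `baseframework.from_pretrained()` walks two directory levels up from the checkpoint file to find `config.yaml` and `dataset_statistics.json`. A minimal sketch of that lookup follows, assuming it amounts to taking the directory two levels above the file. `resolve_sidecar_files` is a hypothetical helper written for illustration, not a function from the SemanticVLA code.

```python
from pathlib import Path
import tempfile


def resolve_sidecar_files(checkpoint_path):
    """Hypothetical helper: find config.yaml and dataset_statistics.json
    two directory levels above the checkpoint file."""
    ckpt = Path(checkpoint_path).resolve()
    # parents[0] is final_model/, parents[1] is the repository root.
    root = ckpt.parents[1]
    config = root / "config.yaml"
    stats = root / "dataset_statistics.json"
    missing = [p.name for p in (config, stats) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing next to checkpoint: {missing}")
    return config, stats


# Rebuild the released layout in a temporary directory and resolve it.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "SemanticVLA-LIBERO"
    (repo / "final_model").mkdir(parents=True)
    for name in ("config.yaml", "dataset_statistics.json",
                 "final_model/pytorch_model.pt"):
        (repo / name).touch()
    config, stats = resolve_sidecar_files(repo / "final_model" / "pytorch_model.pt")
    print(config.name, stats.name)
```

If the checkpoint is moved on its own (for example into a flat download directory), a lookup of this kind would fail, which is presumably why the release keeps `final_model/` as a subdirectory.
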
config.yaml ADDED

	@@ -0,0 +1,131 @@

+# Loadable config for SemanticVLA-LIBERO.
+#
+# Load via:
+#   from semanticvla.model.framework.base_framework import baseframework
+#   policy = baseframework.from_pretrained("final_model/pytorch_model.pt")
+#
+# The loader walks two directory levels up from the checkpoint file to locate
+# this `config.yaml` and the sibling `dataset_statistics.json`.
+seed: 42
+framework:
+  name: SemanticVLA
+  qwenvl:
+    base_vlm: Qwen/Qwen3-VL-4B-Instruct
+    attn_implementation: flash_attention_2
+    vl_hidden_dim: 2048
+  dino:
+    dino_backbone: dinov2_vits14
+  action_model:
+    action_model_type: DiT-B
+    action_hidden_dim: 1024
+    hidden_size: 1024
+    add_pos_embed: true
+    max_seq_len: 1024
+    action_dim: 7
+    state_dim: 7
+    future_action_window_size: 7
+    action_horizon: 8
+    past_action_window_size: 0
+    repeated_diffusion_steps: 8
+    noise_beta_alpha: 1.5
+    noise_beta_beta: 1.0
+    noise_s: 0.999
+    num_timestep_buckets: 1000
+    num_inference_timesteps: 4
+    num_target_vision_tokens: 32
+    diffusion_model_cfg:
+      cross_attention_dim: 2048
+      dropout: 0.2
+      final_dropout: true
+      interleave_self_attention: true
+      norm_type: ada_norm
+      num_layers: 16
+      output_dim: 1024
+      positional_embeddings: null
+      progress_dim: 0
+      trace_dim: 0
+    trace:
+      injection_mode: none
+      hidden_dim: 256
+      num_layers: 3
+      num_heads: 8
+      window_size: 12
+      num_tokens: 4
+      dropout: 0.1
+      num_anchor_points: 4
+      lm_aux_loss: false
+      aux_loss_weight: 0.1
+      coord_range: 1000
+      prompt_style: plain
+    semantic_output:
+      enabled: true
+      mode: trace_latent
+      order: trace_latent
+      lm_loss_weight: 0.1
+      latent_vocab_size: 32
+      latent_num_tokens: 4
+      latent_token_prefix: LAM
+      prompt_style: plain
+      trace_anchor_points: 4
+      parse_trace_for_decoder: false
+      trainable_token_rows: false
+  reduce_in_full_precision: true
+datasets:
+  vla_data:
+    dataset_py: lerobot_datasets
+    data_root_dir: /path/to/libero_lerobot
+    data_mix: libero_all
+    action_type: delta_qpos
+    CoT_prompt: Your task is {instruction}. To identify the key objects for your task. Locate their bounding boxes in [x1,y1,x2,y2] format.
+    CoT_answer: bbox
+    default_image_resolution: [3, 224, 224]
+    per_device_batch_size: 16
+    load_all_data_for_training: true
+    obs: [image_0]
+    trace:
+      enabled: true
+      root: /path/to/trace_annotations/libero
+      window_size: 12
+      normalize: true
+      num_anchor_points: 4
+    latent_action_labels:
+      enabled: true
+      root: /path/to/lam_labels
+      variant: semanticvla_lam
+      strict: true
+      missing_policy: error
+      out_key: latent_action_idx
+trainer:
+  epochs: 100
+  max_train_steps: 30000
+  num_warmup_steps: 5000
+  save_interval: 5000
+  eval_interval: 2000
+  learning_rate:
+    base: 4.0e-05
+    qwen_vl_interface: 1.0e-05
+    action_model: 1.0e-04
+  lr_scheduler_type: cosine_with_min_lr
+  scheduler_specific_kwargs:
+    min_lr: 1.0e-06
+  freeze_modules: ''
+  loss_scale:
+    vla: 1.0
+    vlm: 0.1
+  max_grad_norm: 1.0
+  warmup_ratio: 0.1
+  weight_decay: 0.0
+  logging_frequency: 100
+  gradient_clipping: 1.0
+  gradient_accumulation_steps: 1
+  optimizer:
+    name: AdamW
+    betas: [0.9, 0.95]
+    eps: 1.0e-08
+    weight_decay: 1.0e-08
+  enable_gradient_checkpointing: true
+  enable_mixed_precision_training: true
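
The `trainer` block combines `lr_scheduler_type: cosine_with_min_lr`, `num_warmup_steps: 5000`, `max_train_steps: 30000`, and `min_lr: 1.0e-06`. The sketch below shows the shape such a schedule would take for the base learning rate, assuming linear warmup followed by a single cosine decay down to the floor. It is an illustration in plain Python, not the trainer's actual scheduler, and it assumes `num_warmup_steps` takes precedence over `warmup_ratio: 0.1`, which the config does not state.

```python
import math

BASE_LR = 4.0e-05     # trainer.learning_rate.base
MIN_LR = 1.0e-06      # trainer.scheduler_specific_kwargs.min_lr
WARMUP_STEPS = 5000   # trainer.num_warmup_steps (assumed to override warmup_ratio)
TOTAL_STEPS = 30000   # trainer.max_train_steps


def lr_at(step):
    """Illustrative learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from BASE_LR down to MIN_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (BASE_LR - MIN_LR) * cosine


for step in (0, 2500, 5000, 17500, 30000):
    print(step, lr_at(step))
```

The `qwen_vl_interface` and `action_model` groups have their own peak rates (1.0e-05 and 1.0e-04); how the floor is applied per parameter group is not specified in the config.
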

dataset_statistics.json ADDED

	@@ -0,0 +1,133 @@

+{
+  "franka": {
+    "action": {
+      "mean": [
+        0.07237596483901143,
+        0.08987006871029735,
+        -0.10144743137061596,
+        -0.00045383188989944756,
+        0.006273590726777911,
+        -0.003878799732774496,
+        0.524486355483532
+      ],
+      "std": [
+        0.3498823308902479,
+        0.37794140366375184,
+        0.460084266976933,
+        0.0403885784928603,
+        0.06616144248501059,
+        0.07763074391911857,
+        0.4994683356809767
+      ],
+      "max": [
+        0.9375,
+        0.9375,
+        0.9375,
+        0.3557142913341522,
+        0.375,
+        0.375,
+        1.0
+      ],
+      "min": [
+        -0.9375,
+        -0.9375,
+        -0.9375,
+        -0.2582142949104309,
+        -0.375,
+        -0.3675000071525574,
+        0.0
+      ],
+      "q01": [
+        -0.8785714507102966,
+        -0.8758928775787354,
+        -0.9375,
+        -0.1510714292526245,
+        -0.20678570866584778,
+        -0.2742857038974762,
+        0.0
+      ],
+      "q99": [
+        0.9375,
+        0.9107142686843872,
+        0.9375,
+        0.20357142388820648,
+        0.26357144117355347,
+        0.375,
+        1.0
+      ],
+      "mask": [
+        true,
+        true,
+        true,
+        true,
+        true,
+        true,
+        false
+      ]
+    },
+    "state": {
+      "mean": [
+        -0.04889854742214084,
+        0.03689368185587227,
+        0.7890402488410473,
+        2.9771945476531982,
+        -0.1417286954820156,
+        -0.11769362539052963,
+        0.026436020154505968,
+        -0.02665513101965189
+      ],
+      "std": [
+        0.10639013941746686,
+        0.15115733130675715,
+        0.38406895599530033,
+        0.3530238395244304,
+        0.8227341427331599,
+        0.32357567121520087,
+        0.014583991652936385,
+        0.014467005007200339
+      ],
+      "max": [
+        0.21031762659549713,
+        0.39128610491752625,
+        1.3660105466842651,
+        3.6714255809783936,
+        3.560650587081909,
+        1.386339545249939,
+        0.04233968257904053,
+        0.0013633022317662835
+      ],
+      "min": [
+        -0.4828203022480011,
+        -0.3255046010017395,
+        0.008128180168569088,
+        0.35277295112609863,
+        -3.641430377960205,
+        -1.842738389968872,
+        -0.0013586411951109767,
+        -0.042040832340717316
+      ],
+      "q01": [
+        -0.42401049643754957,
+        -0.2838300323486328,
+        0.009925739830359817,
+        1.3085840785503386,
+        -2.886677579879761,
+        -1.1599004411697387,
+        0.001503719249740243,
+        -0.040336399003863335
+      ],
+      "q99": [
+        0.1530261474847791,
+        0.3629165390133857,
+        1.2910678112506866,
+        3.303542451858519,
+        2.7496529006957933,
+        0.6893712210655194,
+        0.040610933862626555,
+        -0.0015016929572448147
+      ]
+    },
+    "num_transitions": 272104,
+    "num_trajectories": 1693
+  }
+}
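
The README describes `dataset_statistics.json` as action normalization stats. One common convention in VLA codebases maps a normalized action in [-1, 1] back to the original scale using the `q01`/`q99` bounds and leaves dimensions with `mask: false` (here the last one, presumably the gripper) untouched. The sketch below applies that convention to the `franka.action` values above. Whether the SemanticVLA loader uses quantiles, min/max, or mean/std is not stated in this commit, so treat the formula as an assumption; `unnormalize_action` is a hypothetical name.

```python
# Values copied from dataset_statistics.json -> franka.action
Q01 = [-0.8785714507102966, -0.8758928775787354, -0.9375,
       -0.1510714292526245, -0.20678570866584778, -0.2742857038974762, 0.0]
Q99 = [0.9375, 0.9107142686843872, 0.9375,
       0.20357142388820648, 0.26357144117355347, 0.375, 1.0]
MASK = [True, True, True, True, True, True, False]


def unnormalize_action(normalized):
    """Map a 7-D action from [-1, 1] back to dataset scale (assumed convention)."""
    out = []
    for x, lo, hi, use in zip(normalized, Q01, Q99, MASK):
        if use:
            x = min(1.0, max(-1.0, x))  # clip to the normalized range
            out.append(0.5 * (x + 1.0) * (hi - lo) + lo)
        else:
            out.append(x)  # masked-out dimension passes through unchanged
    return out


print(unnormalize_action([0.0] * 6 + [1.0]))
```

Note that `state` carries eight values per statistic while `action` carries seven, so any code that consumes both should index them separately.
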

final_model/pytorch_model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee9ab7537a8b25a628ed506ea8bf347b5f24131ce82f2f001f57c2f234d65446
+size 9974427154
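
The three lines above are a Git LFS pointer: the `oid` is the SHA-256 of the real file's contents and `size` is its length in bytes (about 9.97 GB here). A small sketch for checking a downloaded `pytorch_model.pt` against the pointer, using only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ee9ab7537a8b25a628ed506ea8bf347b5f24131ce82f2f001f57c2f234d65446
size 9974427154
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its hash algorithm, digest, and byte size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Stream the file in chunks and compare its size and digest to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("final_model/pytorch_model.pt", pointer)` on a local copy returns `True` only if the download is complete and uncorrupted.
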