Raw Robot Video to VLA-Ready Training Data: Annotating LeRobot Datasets with Nomadic and HuggingFace Buckets

Community Article Published March 21, 2026

TL;DR: We show how to go from raw robotics video stored in Hugging Face Buckets to richly annotated, VLA-training-ready data using Nomadic.

The Data Quality Problem in Robotics

πŸ’‘ "Generalization is not just a model propertyβ€”it's a data phenomenon." β€” LeRobot Community Datasets Blog

Robotic VLAs are getting better fast, but they're only as good as their training data. The LeRobot Community Datasets Blog identifies several recurring quality issues that exist across community-contributed datasets:

  • Incomplete or inconsistent task annotations: descriptions that are empty, too short ("Hold", "Up"), or meaningless ("task desc")
  • Missing temporal detail: no sub-task segmentation, so complex multi-step episodes are described by a single sentence
  • Object misclassification: generic labels that don't distinguish between visually similar objects

Fixing these issues by hand doesn't scale to thousands of episodes, and running videos through a general-purpose VLM doesn't produce the granularity that robotics training needs.

The solution lies in better tools.

Nomadic: The Data Engine Built for Physical AI

Nomadic can curate training sets from your video archives with:

  • Detailed timestamps of every action, segmented at the sub-task level
  • Accurate object identification that distinguishes visually similar objects (a chocolate chip cookie from a shortbread biscuit, a Phillips head screwdriver from a flathead)
  • Spatial tracking including 3D object positions, inferred from RGB video
  • Scene segmentation that maps to the granularity VLA training actually requires

You can see this in action below, where Nomadic segments robotic manipulation actions from a LeRobot Piper video.


HuggingFace Buckets: The Storage Layer

HuggingFace Storage Buckets provide mutable, S3-like object storage on the Hub. Built on Xet, Buckets hold the artifacts that don't belong in Git: checkpoints, intermediate data, logs, and, in our case, raw robotics video.

Robotics recordings produce gigabytes of video that need to be accessible to your team and your training pipeline. With Buckets, that video lives on the Hub alongside your datasets and models, rather than in a separate bucket disconnected from the rest of your workflow.

Buckets support CDN pre-warming across AWS and GCP regions, so your data is already close to compute when training starts.

This is where our pipeline begins: raw robotics video sitting in an HF Bucket.

HuggingFace Buckets x Nomadic Integration

You can connect HuggingFace Buckets to Nomadic through the UI or the SDK.

Via the Nomadic UI, go to Integrations and add your HF Bucket. Once connected, videos from that bucket appear directly in the Nomadic platform for analysis.

HuggingFace Bucket integration in Nomadic UI

Via the SDK (📓 run the pipeline end-to-end in Colab):

Nomadic natively accepts hf://buckets/ URIs for public buckets.

# Install dependencies
!pip install nomadicml 

import os
from nomadicml import NomadicML
from nomadicml.video import AnalysisType, CustomCategory

client = NomadicML(api_key=os.getenv("NOMADICML_API_KEY"))

# Upload directly from a public HF Bucket
response = client.upload("hf://buckets/your-org/robotics-data/episode_001.mp4")

For private buckets, register the integration once, then upload the same way:

# One-time setup for private buckets
client.cloud_integrations.add_hf_bucket(
    name="hf_integration",
    bucket="your-org/your-bucket",
    token="HF_TOKEN",
)

# Then upload as normal
response = client.upload("hf://buckets/your-org/your-bucket/episode_001.mp4")
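Both upload calls above take an `hf://buckets/` URI, which decomposes into a namespace, bucket name, and object key. The small helper below makes that explicit; it is our own illustration, not part of the Nomadic or Hugging Face SDKs.

```python
from urllib.parse import urlparse

def parse_bucket_uri(uri: str) -> dict:
    """Split an hf://buckets/<org>/<bucket>/<key> URI into its parts.

    Illustrative helper only, not part of either SDK.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "hf" or parsed.netloc != "buckets":
        raise ValueError(f"not an hf://buckets/ URI: {uri}")
    org, bucket, *key_parts = parsed.path.lstrip("/").split("/")
    return {"org": org, "bucket": bucket, "key": "/".join(key_parts)}

parts = parse_bucket_uri("hf://buckets/your-org/robotics-data/episode_001.mp4")
# parts["key"] is "episode_001.mp4"
```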

Analyze with robotics context

analysis = client.analyze(
    response["video_id"],
    analysis_type=AnalysisType.ASK,
    custom_event="Find all robot manipulation actions. Describe specific objects and movements.",
    custom_category=CustomCategory.ROBOTICS,
)

Inspect the Results

Results are available as JSON or CSV through the SDK, or you can view and review them in the Nomadic UI.

Nomadic can generate 3D object trajectories from raw video
[
  {
    "start_time": 1.0,
    "end_time": 3.2,
    "label": "Right arm reaches toward chocolate chip cookie on plate"
  },
  {
    "start_time": 3.2,
    "end_time": 5.8,
    "label": "Right arm grasps chocolate chip cookie and lifts from plate"
  },
  {
    "start_time": 5.8,
    "end_time": 8.1,
    "label": "Right arm places cookie into blue bowl"
  },
  {
    "start_time": 8.1,
    "end_time": 9.4,
    "label": "Right arm retracts to home position"
  }
]
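These segments are time-indexed, while LeRobot datasets are frame-indexed. Assuming a fixed camera rate (30 fps here, purely illustrative; check your recording's actual rate), mapping a segment to a frame range is straightforward:

```python
FPS = 30  # assumed camera frame rate, not taken from the dataset

def segment_to_frames(segment: dict, fps: int = FPS) -> range:
    """Map a segment's [start_time, end_time) span to frame indices."""
    start = round(segment["start_time"] * fps)
    end = round(segment["end_time"] * fps)
    return range(start, end)

frames = segment_to_frames({"start_time": 1.0, "end_time": 3.2,
                            "label": "Right arm reaches toward cookie"})
# frames covers indices 30 through 95 at 30 fps
```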

Mapping Nomadic Output to LeRobot v3 Dataset Format

Nomadic's timestamped output can be converted into LeRobot v3-compatible dataset artifacts that you can load directly into a Hugging Face training loop. The task descriptions become the language conditioning that VLAs like SmolVLA and π₀ use during training, and the timestamps map to frame-level annotations in the dataset's Parquet files.
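As a rough sketch of that conversion, each segment expands into one row per frame carrying its task string. The column names and the 30 fps rate below are our assumptions, not the official v3 schema; check them against the LeRobot format documentation before writing Parquet.

```python
# Expand timestamped segments into per-frame task annotations,
# loosely mirroring LeRobot v3's frame-level rows.
FPS = 30  # assumed camera frame rate

segments = [
    {"start_time": 1.0, "end_time": 3.2,
     "label": "Right arm reaches toward chocolate chip cookie on plate"},
    {"start_time": 3.2, "end_time": 5.8,
     "label": "Right arm grasps chocolate chip cookie and lifts from plate"},
]

rows = []
for seg in segments:
    for frame in range(round(seg["start_time"] * FPS),
                       round(seg["end_time"] * FPS)):
        rows.append({
            "frame_index": frame,
            "timestamp": frame / FPS,
            "task": seg["label"],  # language conditioning for the VLA
        })

# rows can now be written to Parquet, e.g. via pandas or pyarrow
```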

Why This Matters for VLA Training

The LeRobot community is scaling data collection rapidly: thousands of datasets from hundreds of contributors. But scaling collection without scaling curation creates a quality bottleneck.

Better annotations also unlock multi-dataset training. Right now, two datasets collected in different labs might describe the same task completely differently, or not at all. Automated annotation can standardize these descriptions across datasets, so VLAs can train across all of them at once, learning from the combined diversity of environments, objects, and robot embodiments.

Nomadic addresses this bottleneck with visual agents that go beyond standalone VLMs, orchestrating specialized tools for 3D spatial awareness, fine-grained motion understanding, and temporal reasoning over video.

We're looking forward to a future where the community collects data, HuggingFace stores and serves it, Nomadic annotates and curates it, and LeRobot users train on it.

Get Started

  • πŸš€ Nomadic Platform: app.nomadicml.com/live-demo β€” Try the Nomadic platform to annotate and curate your robotics data
  • πŸ““ Full notebook: Colab β€” Run the complete Buckets β†’ Nomadic β†’ annotated data pipeline
  • πŸ€— HuggingFace Buckets: huggingface.co/storage β€” Set up your own robotics data bucket
  • πŸ”§ Nomadic SDK: docs.nomadicml.com β€” API docs and quickstart
