Hugging Face–Centric Minimal Data Stack

Single-backbone workflow for robotics datasets (manipulation, perception, reasoning, HRI) with minimal tools and frictionless integration.

{/* Stage definitions */}

Stage Definitions & Examples

Data Collection: Raw recordings from robots or simulations. Example: RGB-D video, audio, and joint states captured during human-robot interaction.
Annotation: Assign labels or semantics to collected data. Example: gesture type, emotion, manipulated object, speech act.
Curation: Filter, validate, and organize annotated data into usable splits (train/val/test). Example: remove bad frames, balance human/robot perspectives.
Publishing (Hub): Versioned dataset hosting on {link('https://huggingface.co/','Hugging Face Hub')}, with metadata and documentation. Example: pushing curated subsets for manipulation learning.
Visualization (Spaces): Interactive dashboards or viewers built in {link('https://gradio.app/','Gradio')} or {link('https://streamlit.io/','Streamlit')} for exploration or validation. Example: playback of synchronized gaze, pose, and audio segments.
Reuse & Training: Loading datasets directly via {link('https://huggingface.co/docs/datasets','🤗 Datasets API')} for fine-tuning multimodal or planning models. Example: training z_social encoders or expressive decoders.
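The Curation stage above can be illustrated without any external tooling. The following is a minimal sketch, assuming each annotated clip is a dict with hypothetical `clip_id` and `blur_score` fields; the 0.5 threshold and the 80/10/10 ratios are placeholders, not values prescribed by this stack.

```python
import hashlib


def assign_split(clip_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Map a clip ID to train/val/test deterministically via a stable hash."""
    digest = hashlib.sha256(clip_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


def curate(records, max_blur: float = 0.5):
    """Drop records flagged as too blurry, then attach a split to the rest."""
    kept = [r for r in records if r["blur_score"] <= max_blur]
    return [{**r, "split": assign_split(r["clip_id"])} for r in kept]


if __name__ == "__main__":
    demo = [
        {"clip_id": "session01_clip003", "blur_score": 0.1},
        {"clip_id": "session01_clip004", "blur_score": 0.9},  # filtered out
    ]
    for row in curate(demo):
        print(row)
```

Hashing the ID instead of shuffling keeps a given clip in the same split across releases, which helps when curated subsets are republished.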

{/* Main flow diagram */}

Robot logs (RGB-D, audio, pose)
Sim runs & demos
Interaction clips
Planning/intent traces

{link('https://labelstud.io/','Label Studio')} (self-host or cloud)
{link('https://cvat.org/','CVAT')} / {link('https://roboflow.com/','Roboflow')} (export)
Exports: COCO, JSON, CSV
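A COCO export usually needs flattening before it is convenient to load as a dataset. The sketch below assumes the standard COCO keys (`images`, `annotations`, `categories`) and writes one JSON object per image; the output field names `file_name`, `objects`, `label`, and `bbox` are illustrative choices, not a required schema.

```python
import json
from collections import defaultdict


def coco_to_records(coco: dict) -> list:
    """Flatten a COCO-style dict into one record per image."""
    names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    by_image = defaultdict(list)
    for ann in coco.get("annotations", []):
        by_image[ann["image_id"]].append(
            {"label": names.get(ann["category_id"], "unknown"), "bbox": ann["bbox"]}
        )
    return [
        {"file_name": img["file_name"], "objects": by_image.get(img["id"], [])}
        for img in coco.get("images", [])
    ]


def write_jsonl(records, path):
    """Write records as JSON Lines, one object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
```

A JSON Lines file written this way can then be read with `datasets.load_dataset("json", data_files=...)`.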

{link('https://voxel51.com/','FiftyOne')}: filter, QA, splits
{link('https://cleanlab.ai/','Cleanlab')} / Pandas checks
Embedding search for edge cases

{link('https://huggingface.co/','Datasets & models')} in repos
Git + LFS versioning
Private org, permissions
Tags, README, cards

{link('https://huggingface.co/spaces','Gradio/Streamlit viewers')}
Clip browser, 3D previews
Eval dashboards & demos

{/* Tool comparison */}

Comparison: Annotation & Curation Tools

Tool	Strengths	Limitations	Integration with HF
{link('https://labelstud.io/','Label Studio')}	Open source, multi-modal (image, audio, text, video). Very flexible schema; plugin ecosystem.	Requires setup for teams; interface slower with 100k+ samples.	Native {link('https://huggingface.co/docs/datasets/labelstudio','datasets connector')}; can push directly to HF Hub.
{link('https://cvat.org/','CVAT')}	Great for video and dense bounding-box/pose annotations; powerful auto-annotation tools.	Primarily vision-focused; heavier deployment (Docker).	Exports in COCO/VOC formats easily loadable with `datasets.load_dataset`.
{link('https://roboflow.com/','Roboflow')}	Cloud-based; fast web UI and built-in preprocessing and augmentation.	Closed-source, limited free tier; less flexible schemas.	Exports compatible with HF datasets; no native connector but simple upload via API.
{link('https://voxel51.com/','FiftyOne')}	Advanced filtering, visualization, embedding-based analysis.	Not for annotation itself; local-first.	Direct push/export to HF Hub for curated dataset versions.

{/* Output / training */}

Load via {link('https://huggingface.co/docs/datasets','datasets streaming')}
Fine-tune VL/VLA/ASR models
Push checkpoints to HF
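Streaming lets a training or inspection script read a Hub dataset without downloading it in full. This is a minimal sketch, assuming the `datasets` package is installed; the repo ID in the usage note is the example name from the repository layout in this document, not a verified public dataset.

```python
from itertools import islice


def head(iterable, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(iterable, n))


def preview_hub_dataset(repo_id: str, split: str = "train", n: int = 3):
    """Fetch the first n examples of a Hub dataset in streaming mode."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(repo_id, split=split, streaming=True)
    return head(stream, n)
```

For example, `preview_hub_dataset("eurecat/haru-social-vla")` would return three examples if that repository exists and the caller has access to it.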

{link('https://aws.amazon.com/s3/','AWS S3')} / {link('https://cloud.google.com/storage','GCS')} / {link('https://min.io/','MinIO')} for TB+ raw
Keep curated subsets on HF
Link via metadata/URIs
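Linking curated samples back to raw storage can be as simple as a manifest of URIs plus checksums. The sketch below is one possible shape, not a fixed format: the field names `sample_id`, `raw_uri`, `sha256`, and `num_bytes`, and the set of allowed URI schemes, are assumptions made for illustration.

```python
import hashlib
import json
from urllib.parse import urlparse

# Schemes assumed for S3/MinIO, GCS, and plain HTTPS links.
ALLOWED_SCHEMES = {"s3", "gs", "https"}


def manifest_entry(sample_id: str, raw_uri: str, payload: bytes) -> dict:
    """Describe one raw file: where it lives and how to verify it."""
    scheme = urlparse(raw_uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URI scheme: {scheme!r}")
    return {
        "sample_id": sample_id,
        "raw_uri": raw_uri,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "num_bytes": len(payload),
    }


def to_jsonl(entries) -> str:
    """Serialize manifest entries as JSON Lines text."""
    return "".join(json.dumps(e, sort_keys=True) + "\n" for e in entries)
```

Only the manifest goes into the Hub repo; the TB-scale raw files stay in object storage and are fetched by URI when needed.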

Repo permissions & reviews
Semantic tags & licenses
Changelogs & model cards

{/* Notes */}

Keep the workflow lean: Hugging Face Hub as the single backbone.
One annotation tool ({link('https://labelstud.io/','Label Studio')}, {link('https://cvat.org/','CVAT')}, or {link('https://roboflow.com/','Roboflow')}).
Optional curation with {link('https://voxel51.com/','FiftyOne')} before each release.
Push each validated dataset as a new HF Hub version.
Provide {link('https://huggingface.co/spaces','Spaces')} for exploration, demo, and review.
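Publishing each validated dataset as a new Hub version could look roughly like the following. It is a sketch, assuming `huggingface_hub` is installed and a token is already configured; the `vMAJOR.MINOR.PATCH` tag scheme and the function name are conventions chosen here, not something the Hub requires.

```python
import re

# Assumed release-tag convention, e.g. "v1.2.0".
VERSION_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


def publish_dataset_version(folder: str, repo_id: str, tag: str, private: bool = True):
    """Upload a curated folder to a dataset repo and tag the resulting commit."""
    if not VERSION_TAG.match(tag):
        raise ValueError(f"tag must look like v1.2.3, got {tag!r}")

    # Imported lazily so tag validation works without the Hub client installed.
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id, repo_type="dataset", private=private, exist_ok=True)
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="dataset",
        commit_message=f"Release {tag}",
    )
    api.create_tag(repo_id, tag=tag, repo_type="dataset")
```

Consumers can then pin a release by passing the tag as `revision` when loading the dataset.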

{`datasets/
  eurecat/haru-social-vla/
    README.md  # dataset card with tags + license
    data/      # small/curated samples or manifests
    annotations/
    splits/    # train/val/test lists
    scripts/   # loading + eval utils
models/
  eurecat/haru-expressive-decoder/
    README.md  # model card (training data, metrics)
    config/
    checkpoints/`}
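The dataset half of the layout above can be scaffolded with a few lines of standard-library Python. This is only a sketch: the one-ID-per-line `splits/<name>.txt` format is an assumption, and the README is created as an empty placeholder for the dataset card.

```python
import tempfile
from pathlib import Path

DATASET_DIRS = ("data", "annotations", "splits", "scripts")


def scaffold_dataset_repo(root, splits: dict) -> Path:
    """Create the dataset folders and write one ID list per split."""
    root = Path(root)
    for name in DATASET_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    (root / "README.md").touch()  # placeholder for the dataset card
    for split, ids in splits.items():
        path = root / "splits" / f"{split}.txt"
        path.write_text("\n".join(ids) + "\n", encoding="utf-8")
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = scaffold_dataset_repo(
            Path(tmp) / "haru-social-vla",
            {"train": ["clip-0001", "clip-0002"], "val": ["clip-0003"], "test": ["clip-0004"]},
        )
        print(sorted(p.name for p in repo.iterdir()))
```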

{/* ============================= */} {/* MODEL TRAINING & REUSE STACK */} {/* ============================= */}

Hugging Face–Centric Model Lifecycle Stack

Unified workflow for model training, evaluation, storage, deployment, and reuse, using the fewest possible tools while supporting robotics and multimodal tasks.

{/* Stage definitions */}

Stage Definitions & Examples

Training: Model optimization using GPUs (local or {link('https://www.runpod.io/','RunPod')} cloud). Example: fine-tuning a multimodal encoder on robot-social datasets.
Evaluation: Measure metrics and visualize results. Example: compute the concordance correlation coefficient (CCC) for valence/arousal, or the success rate for manipulation plans.
Storage & Versioning: Upload model checkpoints and configs to {link('https://huggingface.co/','Hugging Face Hub')} for long-term reproducibility.
Deployment: Serve models for inference in {link('https://huggingface.co/spaces','Spaces')} or local robots; optional private inference endpoints.
Local Inference (On‑Prem/Edge): Package models with {link('https://www.docker.com/','Docker')} + {link('https://fastapi.tiangolo.com/','FastAPI')} as a REST service (gRPC requires a separate server); optimize with {link('https://onnxruntime.ai/','ONNX Runtime')}, {link('https://developer.nvidia.com/tensorrt','TensorRT')} (NVIDIA), or {link('https://www.intel.com/openvino','OpenVINO')} (Intel). Integrate as a {link('https://www.ros.org/','ROS 2')} node on the robot.
Reuse / Continual Learning: Load models via transformers API; continue training or integrate into reasoning/interaction systems.
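The CCC mentioned in the Evaluation stage is short for concordance correlation coefficient, which penalizes both low correlation and a shift in mean or scale between predictions and labels. A plain-Python sketch follows, using population (divide by n) variances; the function name is a local choice, not a library API.

```python
def concordance_cc(pred, gold) -> float:
    """Concordance correlation coefficient between two equal-length sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean_pred - mean_gold)^2)
    """
    n = len(pred)
    if n != len(gold) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2 * cov / denom
```

Perfect agreement gives 1.0, while predictions that track the labels but sit at a constant offset score lower, which Pearson correlation alone would not show.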

{/* Model lifecycle flow (added Local Deployment step) */}

Train locally or on {link('https://www.runpod.io/','RunPod')} cloud GPUs
Use {link('https://huggingface.co/docs/transformers','Transformers')} + {link('https://huggingface.co/docs/accelerate','Accelerate')} for training
Track metrics with {link('https://wandb.ai/site','Weights & Biases')} or built-in logs

Use {link('https://huggingface.co/docs/evaluate','Evaluate')} library for metrics
Visualize predictions with FiftyOne or Spaces
Generate benchmark reports

Push models via huggingface_hub API
Keep config, tokenizer, and weights
Versioned releases, changelogs, model cards

Serve via HF {link('https://huggingface.co/inference-api','Inference API')} or Spaces
Integrate into robot planner / dialogue manager
Public or private endpoints

{link('https://www.docker.com/','Docker')} image + {link('https://fastapi.tiangolo.com/','FastAPI')} service
Accelerate with {link('https://onnxruntime.ai/','ONNX Runtime')}, {link('https://developer.nvidia.com/tensorrt','TensorRT')}, {link('https://www.intel.com/openvino','OpenVINO')}
Expose as {link('https://www.ros.org/','ROS 2')} node or local REST/gRPC
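A local REST service around an exported model might be structured as below. This is a rough sketch, assuming `fastapi`, `pydantic`, `numpy`, and `onnxruntime` are installed and that the model takes a single float32 input of shape (1, n); the `/predict` route and the `features` field are hypothetical names.

```python
def build_app(model_path: str):
    """Create a FastAPI app that serves one ONNX model on POST /predict."""
    # Third-party imports stay inside the factory so this module can be
    # imported on machines that lack the serving dependencies.
    import numpy as np
    import onnxruntime as ort
    from fastapi import FastAPI
    from pydantic import BaseModel

    class PredictRequest(BaseModel):
        features: list[float]  # hypothetical flat feature vector

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    app = FastAPI()

    @app.post("/predict")
    def predict(req: PredictRequest):
        batch = np.asarray([req.features], dtype=np.float32)  # shape (1, n)
        outputs = session.run(None, {input_name: batch})
        return {"output": outputs[0].tolist()}

    return app
```

Inside a Docker image, the app returned by `build_app("model.onnx")` would typically be run with an ASGI server such as uvicorn; swapping the provider list is where a GPU or other accelerator backend would be selected.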

Load via {link('https://huggingface.co/docs/transformers/quicktour','Transformers `from_pretrained`')}
Adapt models for new domains or robot skills
Fine-tune periodically with new curated data

{/* Summary */}

Training: RunPod + HF Accelerate
Evaluation: HF Evaluate + simple scripts
Storage: Hugging Face Hub
Deployment (Cloud): HF Spaces / Inference API
Deployment (Local, Optional): FastAPI + Docker (+ ONNX/TensorRT/OpenVINO)
Reuse: Transformers API

Keep one model repo per skill (e.g., gaze decoder, z_social encoder)
Tag model cards with dataset and evaluation metrics
Use Spaces for lightweight demos or robot simulations
Automate CI/CD: push training logs + model eval to Hub
Export optimized model formats (ONNX/TensorRT/OpenVINO) for edge deployment
Provide ROS 2 wrappers for robot-side integration
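Tagging model cards with their training datasets and metrics comes down to the YAML block at the top of the model repo's README. The helper below is a small sketch that emits such a block using the `license`, `datasets`, `metrics`, and `tags` keys; the values in the usage note are placeholders.

```python
def model_card_front_matter(license_id, datasets, metrics, tags) -> str:
    """Build the YAML metadata block that opens a model card README."""

    def block(key, items):
        # One "key:" line followed by a "- item" line per entry.
        return [f"{key}:"] + [f"- {item}" for item in items]

    lines = ["---", f"license: {license_id}"]
    lines += block("datasets", datasets)
    lines += block("metrics", metrics)
    lines += block("tags", tags)
    lines.append("---")
    return "\n".join(lines) + "\n"
```

For instance, `model_card_front_matter("apache-2.0", ["eurecat/haru-social-vla"], ["accuracy"], ["robotics"])` yields a block that can be prepended to the card text before pushing the repo.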
