World Archive Mono — India Workplace Egocentric Manipulation
Ground-truth egocentric manipulation from the Indian real economy — robot-ready labels, not just video.
A public evaluation sample from World Archive. We run managed, consent-first egocentric capture at real Indian workplaces — factories, kitchens, repair bays, workshops — and ship a full annotation stack built for training and evaluating manipulation policies, VLA models, and world models.
| Clips | 9 (~48 min total) |
| Action segments | 218 (human-reviewed verb–noun phases) |
| Median segment | ~8s |
| Annotation layers | 8+ (segments, captions, hands, objects, contact, metadata, QA, consent) |
| CI QA pass | 9/9 clips |
| LeRobot mirror | WorldArchive/mono-india-workplace-lerobot — 9 episodes, 46,436 frames @ 15fps |
| Full pack | S3 sample index (~19 GB, no login) |
| Live explorer | HF Space |
| Collection | Physical AI India |
Dataset Description
Nine egocentric video clips of real manual work in Indian workplaces: factory packaging, industrial sewing, heat-shrink batching, garment ironing, commercial catering, cane weaving, car detailing, auto-body primer/painting, and denting/filing. Each clip ships with temporal action segments, per-frame hand keypoints, object bounding boxes, hand–object contact samples, metadata, QA flags, and commercial AI-training consent documentation.
Source: Managed partner-site capture (not contributor apps). Head-mounted smartphone rigs operated by workers under documented consent.
Geography: India — factory floors, restaurants, roadside shops, showrooms, and repair bays across the real economy.
Intended use: Training and evaluating vision-language-action models, imitation learning, hand-object interaction research, egocentric video understanding, and physical-AI benchmarks in industrial and service settings.
Out of scope: Surveillance, worker performance scoring, biometric identification, or any use that re-identifies participants.
Verticals
shuttle-tube packaging · industrial sewing · heat-gun batching · garment ironing & packing · commercial catering · cane weaving · car detailing · primer & painting · denting & filing
Related assets
- LeRobot mirror: mono-india-workplace-lerobot
- Interactive explorer: data-explorer Space
- Factory program: factory.worldarchive.co
- Technical memo: docs/buyer-technical-memo.md in this repo
- Paper (forthcoming): placeholder; cite this dataset card until the preprint is live
Technical essays
Long-form notes on annotation density, capture ops, and trainable signal:
- The density advantage: labels per minute that actually train policies (also in blog/01-annotation-density.md in this repo)
- Beyond the Monocular Plateau: How DataOps Wins the Next Quarter (also in blog/04-future-of-physical-ai-dataops.md in this repo)
- Essay index: worldarchive.co/blog
Dataset Structure
Repository layout
mono-india-workplace-sample/
├── README.md
├── DATACARD.md
├── DELIVERY_OVERVIEW.md
├── data/
│ ├── clips.parquet # 9 rows — one per clip
│ ├── segments.parquet # 218 rows — verb–noun phases
│ └── pack_summary.json
├── clips_preview/ # 6s MP4 previews (plain / skeleton / boxes)
│ └── sample_XX_*/{plain,skeleton,boxes}.mp4
├── schema/ # Field dictionaries
│ ├── annotation_schema.md
│ ├── action_taxonomy.md
│ ├── object_boxes_schema.md
│ └── ...
└── docs/
    └── buyer-technical-memo.md
Full MP4 + JSONL annotations (~19 GB) live on S3.
clips.parquet columns
| Column | Type | Description |
|---|---|---|
| `clip_id` | string | Stem, e.g. sample_01_shuttle_tube_packaging |
| `title` | string | Human-readable task name |
| `environment` | string | factory, restaurant, repair shop, etc. |
| `device` | string | Capture smartphone model |
| `session_id` | string | Session identifier |
| `video_file` | string | MP4 filename |
| `duration_sec` | float | Clip length |
| `fps` | float | Native frame rate |
| `resolution` | string | e.g. 1920x1080 |
| `mount_type` | string | Headband mount |
| `segment_count` | int | Action segments in clip |
| `hands_visible_pct` | float | Fraction of frames with visible hands |
| `two_hands_pct` | float | Fraction with two hands visible |
| `manipulation_density_pct` | float | Derived manipulation score |
| `qa_pass` | bool | CI QA pass flag |
| `consent_signed` | bool | Commercial AI consent on file |
| `s3_video_url` | string | Full-resolution MP4 on S3 |
| `s3_overlay_url` | string | Hand skeleton overlay MP4 |
| `s3_boxes_preview_url` | string | Object-box preview MP4 |
| `s3_metadata_url` | string | Per-clip metadata JSON |
| `hf_preview_plain_url` | string | 6s plain preview on HF |
| `hf_preview_skeleton_url` | string | 6s skeleton preview on HF |
| `hf_preview_boxes_url` | string | 6s boxes preview on HF |
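As a rough illustration of how these columns might be combined, the sketch below keeps only clips that pass QA and have consent on file, then ranks them by manipulation density. It works on plain dicts, which is also what indexing or iterating a Hugging Face `datasets` split yields. The function name `select_clips`, the threshold, and every value in `demo_rows` are hypothetical; in particular, this card does not confirm whether the `*_pct` columns are stored as 0-1 fractions or 0-100 percentages, so the threshold must be given in whatever unit the data actually uses.

```python
from typing import Any, Iterable, Mapping


def select_clips(
    rows: Iterable[Mapping[str, Any]], min_hands_visible: float
) -> list:
    """Keep QA-passing, consented clips at or above a hands-visible threshold.

    Results are sorted by manipulation_density_pct, highest first.
    """
    kept = [
        r
        for r in rows
        if r["qa_pass"]
        and r["consent_signed"]
        and r["hands_visible_pct"] >= min_hands_visible
    ]
    return sorted(kept, key=lambda r: r["manipulation_density_pct"], reverse=True)


# Made-up rows for illustration only; these are NOT values from the dataset.
demo_rows = [
    {"clip_id": "demo_a", "qa_pass": True, "consent_signed": True,
     "hands_visible_pct": 92.0, "manipulation_density_pct": 71.0},
    {"clip_id": "demo_b", "qa_pass": True, "consent_signed": True,
     "hands_visible_pct": 55.0, "manipulation_density_pct": 80.0},
    {"clip_id": "demo_c", "qa_pass": False, "consent_signed": True,
     "hands_visible_pct": 95.0, "manipulation_density_pct": 90.0},
    {"clip_id": "demo_d", "qa_pass": True, "consent_signed": True,
     "hands_visible_pct": 88.0, "manipulation_density_pct": 78.0},
]

selected = select_clips(demo_rows, min_hands_visible=80.0)
print([r["clip_id"] for r in selected])
```

The same function can be called on the `clips` split loaded with `load_dataset`, since each row is returned as a dict keyed by the column names above.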
segments.parquet columns
| Column | Type | Description |
|---|---|---|
| `clip_id` | string | Clip stem |
| `video` | string | MP4 filename |
| `start_sec` | float | Segment start (clip-relative) |
| `end_sec` | float | Segment end |
| `duration_sec` | float | Segment length |
| `action` | string | Verb (human-reviewed) |
| `object` | string | Noun / manipulated object |
| `task` | string | Combined task label |
| `notes` | string | Operator notes |
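A quick sanity pass over the segment rows could look like the sketch below: it counts verb-noun pairs, takes the median segment length, and flags rows whose `duration_sec` disagrees with `end_sec - start_sec`. The helper name `summarize_segments`, the tolerance, and the demo rows are assumptions made for illustration; the rows are invented and are not taken from the dataset.

```python
from collections import Counter
from statistics import median


def summarize_segments(rows, tol=1e-3):
    """Return (verb-noun counts, median duration, rows with inconsistent duration)."""
    rows = list(rows)  # iterate more than once
    pair_counts = Counter((r["action"], r["object"]) for r in rows)
    durations = [r["end_sec"] - r["start_sec"] for r in rows]
    inconsistent = [
        r for r, d in zip(rows, durations) if abs(d - r["duration_sec"]) > tol
    ]
    return pair_counts, median(durations), inconsistent


# Invented rows using the documented column names.
demo_segments = [
    {"clip_id": "demo_a", "start_sec": 0.0, "end_sec": 6.0, "duration_sec": 6.0,
     "action": "fold", "object": "shirt"},
    {"clip_id": "demo_a", "start_sec": 6.0, "end_sec": 14.0, "duration_sec": 8.0,
     "action": "press", "object": "shirt"},
    {"clip_id": "demo_a", "start_sec": 14.0, "end_sec": 24.0, "duration_sec": 10.0,
     "action": "fold", "object": "shirt"},
    # duration_sec deliberately wrong here (actual span is 7.5 s).
    {"clip_id": "demo_b", "start_sec": 2.0, "end_sec": 9.5, "duration_sec": 7.0,
     "action": "pick", "object": "tube"},
]

pair_counts, median_duration, inconsistent = summarize_segments(demo_segments)
print(pair_counts.most_common(1), median_duration, len(inconsistent))
```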
Full-pack JSONL fields (S3)
| File pattern | Key fields |
|---|---|
| `annotations/action_segments.jsonl` | video, start_sec, end_sec, action, object, task, notes |
| `annotations/*_hand_keypoints.jsonl` | frame_idx, timestamp_sec, hands[] with 21 landmarks (x,y,z) |
| `annotations/*_object_boxes.jsonl` | frame_idx, boxes[] with bbox, label, track_id, source |
| `annotations/*_hand_boxes.jsonl` | Per-hand axis-aligned boxes |
| `annotations/*_hand_object_contact.jsonl` | Derived contact events |
| `annotations/*_captions.jsonl` | Natural-language clip summary |
| `metadata/*.json` | Device, consent, QA flags, manipulator stats |
Label provenance is explicit: segments & captions are human; keypoints & boxes are model-generated with source fields.
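To make the hand-keypoint records concrete, here is a minimal sketch that flattens one JSONL record into a fixed-length vector of 2 hands x 21 landmarks x 3 coordinates = 126 values. Only `frame_idx`, `timestamp_sec`, and `hands[]` with 21 (x,y,z) landmarks come from the table above. The per-hand keys `handedness` and `landmarks`, the left-then-right ordering, and zero-filling a missing hand are assumptions for this example; check schema/annotation_schema.md for the real field names, and do not assume this matches the layout of the LeRobot mirror's state vector.

```python
import json

NUM_LANDMARKS = 21
COORDS = 3  # x, y, z


def flatten_hands(record):
    """Flatten a keypoint record into [left hand | right hand], zero-filled."""
    slots = {"Left": 0, "Right": 1}  # assumed handedness labels
    vec = [0.0] * (2 * NUM_LANDMARKS * COORDS)
    for hand in record.get("hands", []):
        slot = slots.get(hand.get("handedness"))
        landmarks = hand.get("landmarks", [])
        if slot is None or len(landmarks) != NUM_LANDMARKS:
            continue  # skip hands that do not fit the assumed schema
        base = slot * NUM_LANDMARKS * COORDS
        for i, lm in enumerate(landmarks):
            start = base + i * COORDS
            vec[start:start + COORDS] = [lm["x"], lm["y"], lm["z"]]
    return vec


# Synthetic record in the assumed shape; values are made up.
demo_line = json.dumps({
    "frame_idx": 0,
    "timestamp_sec": 0.0,
    "hands": [{
        "handedness": "Right",
        "landmarks": [{"x": 0.5, "y": 0.25, "z": 0.0} for _ in range(NUM_LANDMARKS)],
    }],
})

state = flatten_hands(json.loads(demo_line))
print(len(state))
```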
Browse previews in the Dataset Viewer
- Open the clips config in the Dataset Viewer.
- Click hf_preview_plain_url, hf_preview_skeleton_url, or hf_preview_boxes_url on any row to play a 6s inline preview.
- For layer switching across all 9 clips, use the data-explorer Space.
Preview files live under clips_preview/{clip_id}/{plain,skeleton,boxes}.mp4.
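The path pattern above can be turned into a small helper, sketched here. The repo-relative path follows the documented layout; the download URL uses the general Hugging Face `resolve/<revision>/<path>` form for dataset files, and it is an assumption that the `hf_preview_*_url` columns point at the same location. The function names are hypothetical.

```python
REPO_ID = "WorldArchive/mono-india-workplace-sample"
LAYERS = ("plain", "skeleton", "boxes")


def preview_path(clip_id: str, layer: str) -> str:
    """Repo-relative path of a 6s preview for one clip and overlay layer."""
    if layer not in LAYERS:
        raise ValueError(f"layer must be one of {LAYERS}, got {layer!r}")
    return f"clips_preview/{clip_id}/{layer}.mp4"


def preview_url(clip_id: str, layer: str, revision: str = "main") -> str:
    """Direct-download URL, assuming the standard Hub 'resolve' URL form."""
    path = preview_path(clip_id, layer)
    return f"https://huggingface.co/datasets/{REPO_ID}/resolve/{revision}/{path}"


example_path = preview_path("sample_01_shuttle_tube_packaging", "skeleton")
example_url = preview_url("sample_01_shuttle_tube_packaging", "skeleton")
print(example_path)
print(example_url)
```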
Supported Tasks
- Egocentric action recognition (verb–noun segments)
- Temporal action segmentation and phase detection
- Hand pose estimation (21-joint 2D landmarks)
- Hand–object interaction and contact modeling
- Object detection and tracking in manipulation scenes
- Vision-language-action (VLA) pretraining on human video
- Imitation learning from egocentric demonstrations
- Robot policy evaluation on out-of-distribution industrial tasks
- Cross-embodiment transfer (human ego → robot arms)
- World-model training with action-conditioned video
- Manipulation density and hand-visibility benchmarking
- Geographic / cultural distribution analysis (India real economy)
- Consent-aware dataset auditing for commercial AI training
- LeRobot-format policy learning (via mirror dataset)
- Physical-AI benchmark design for factory and service labor
- Tool-use and dexterous manipulation in unstructured workshops
Usage
Metadata index (Hugging Face datasets)
from datasets import load_dataset
clips = load_dataset(
"WorldArchive/mono-india-workplace-sample",
"clips",
split="train",
)
segments = load_dataset(
"WorldArchive/mono-india-workplace-sample",
"segments",
split="train",
)
print(clips[0]["title"], clips[0]["hf_preview_plain_url"])
print(segments[0]["action"], segments[0]["object"])
Robot-ready frames (LeRobot)
from lerobot.datasets.lerobot_dataset import LeRobotDataset
ds = LeRobotDataset("WorldArchive/mono-india-workplace-lerobot")
print(ds.num_episodes, ds.num_frames, ds.fps)
sample = ds[0] # observation.images.ego, observation.state (126-d), task
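When pairing the verb-noun segments with LeRobot frames, segment times in seconds need to be mapped to frame indices. The sketch below does that arithmetic under explicit assumptions: each episode is sampled uniformly at 15 fps, frame `i` sits at `i / fps` seconds, and episode time zero coincides with clip time zero. None of this is confirmed by the card, so verify it against the mirror's own timestamps before relying on it. `segment_frame_range` is a hypothetical helper.

```python
import math


def segment_frame_range(start_sec: float, end_sec: float, fps: float = 15.0):
    """Half-open range [first, stop) of episode-relative frames with
    start_sec <= i / fps < end_sec."""
    eps = 1e-9  # guards against float error when start_sec * fps is integral
    first = math.ceil(start_sec * fps - eps)
    stop = math.ceil(end_sec * fps - eps)
    return first, stop


first, stop = segment_frame_range(2.0, 10.0)
print(first, stop, stop - first)  # an 8 s segment spans 120 frames at 15 fps
```

A half-open range is used so that back-to-back segments (one ending where the next begins) never claim the same frame.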
Full videos + dense JSONL
aws s3 sync s3://ggn-egocentric-data-sample/sample_data_june ./Master_Sample_v1 --no-sign-request
Comparison with public egocentric corpora
| | Ego4D | Build AI Egocentric-100K | World Archive Mono |
|---|---|---|---|
| Scale | ~3,670 hrs daily-life ego | ~100k hrs factory (China) | 9 clips, ~48 min (evaluation sample) |
| Setting | Western-heavy daily life (cooking, social, errands) | Chinese factory floors | Indian real economy (factory, catering, repair, craft) |
| Annotations | Partial (narrations, AV, some hands/objects) | Minimal public labels; raw video + intrinsics | 218 human verb–noun segments; hands, boxes, contact, QA |
| Geography | US/EU/Singapore-heavy | China | India |
| License / access | Research license (FAIR) | Gated; commercial terms | CC BY-NC 4.0 eval sample; commercial training license available |
| Robot format | Custom JSON exports | Raw video | Native LeRobot mirror |
| Capture model | Crowd + research partners | Managed factory deployment | Managed partner sites, consent-first |
| Consent for commercial AI | Research-oriented | Enterprise (gated) | Documented commercial AI-training consent |
Why this matters for VLA / robot learning
Generalization in manipulation is bottlenecked by distribution diversity. Most public ego data skews Western, kitchen/household, or lab teleop. World Archive contributes real industrial and service-economy manipulation with the spatial and temporal labels policies consume (hand pose, contact, verb–noun, object grounding).
Capture & QA pipeline
- Capture — managed partner sites, headband ego rig
- Consent — commercial AI-training consent before delivery
- Anonymize — PII/face review; audio stripped
- Annotate — segments, captions, hands, objects, contact
- Manual QA — human verification before promote
- Deliver — MP4 + JSONL + schema docs
Product tiers
| Tier | Description |
|---|---|
| Mono Clear (this repo) | Headband smartphone ego + full annotation stack |
| Pro Multi-Sensor (pilot) | Ego + wrist cam + IMU + depth + exo, time-synced — contact us |
Limitations
- Sample size — 9 clips for evaluation, not pretraining at scale.
- Geography — India workplaces only; not globally representative.
- Monocular — no wrist camera, depth, or IMU in this sample (see Pro tier).
- Object boxes — sampled ~1 Hz, not dense per-frame.
- Hand keypoints — estimated 2D (MediaPipe), not metric 3D ground truth.
- License — CC BY-NC 4.0 for evaluation; production commercial training requires a separate license.
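Because object boxes are sampled at roughly 1 Hz, consumers that need a box on every frame have to fill the gaps themselves. One simple option is linear interpolation between two consecutive samples of the same `track_id`, sketched below. This is a swapped-in baseline, not part of the dataset's pipeline, and it assumes `bbox` is stored as `[x_min, y_min, x_max, y_max]`; confirm the real layout in schema/object_boxes_schema.md. `lerp_box` is a hypothetical helper.

```python
def lerp_box(box_a, box_b, frame_a, frame_b, frame):
    """Linearly interpolate a box between two keyframes of the same track.

    box_a is the box at frame_a, box_b the box at frame_b; frame must lie
    between them (inclusive).
    """
    if frame_a >= frame_b:
        raise ValueError("frame_a must be strictly less than frame_b")
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between the two keyframes")
    t = (frame - frame_a) / (frame_b - frame_a)
    return [a + t * (b - a) for a, b in zip(box_a, box_b)]


# Two made-up samples of one track, 30 frames apart.
mid_box = lerp_box([0, 0, 10, 10], [30, 0, 40, 10], frame_a=0, frame_b=30, frame=15)
print(mid_box)
```

Linear interpolation ignores occlusion and fast motion between samples, so interpolated boxes are best treated as weak labels rather than as additional ground truth.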
License
This evaluation sample is released under CC BY-NC 4.0. Commercial production training and enterprise delivery are available under separate terms — contact shubham@worldarchive.co.
Citation
@dataset{worldarchive_mono_india_2026,
title = {World Archive Mono: India Workplace Egocentric Manipulation Sample},
author = {World Archive / GGN},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/datasets/WorldArchive/mono-india-workplace-sample}},
note = {9 clips, 218 action segments, LeRobot mirror available}
}
Contact
- Book a call: Calendly
- Email: shubham@worldarchive.co · dheeraj@worldarchive.co
- Web: worldarchive.co · factory.worldarchive.co