World Archive Mono — India Workplace Egocentric Manipulation

Ground-truth egocentric manipulation from the Indian real economy — robot-ready labels, not just video.

A public evaluation sample from World Archive. We run managed, consent-first egocentric capture at real Indian workplaces — factories, kitchens, repair bays, workshops — and ship a full annotation stack built for training and evaluating manipulation policies, VLA models, and world models.


Clips	9 (~48 min total)
Action segments	218 (human-reviewed verb–noun phases)
Median segment	~8s
Annotation layers	8+ (segments, captions, hands, objects, contact, metadata, QA, consent)
CI QA pass	9/9 clips
LeRobot mirror	`WorldArchive/mono-india-workplace-lerobot` — 9 episodes, 46,436 frames @ 15fps
Full pack	S3 sample index (~19 GB, no login)
Live explorer	HF Space
Collection	Physical AI India

Dataset Description

Nine egocentric video clips of real manual work in Indian workplaces: factory packaging, industrial sewing, heat-shrink batching, garment ironing, commercial catering, cane weaving, car detailing, auto-body primer/painting, and denting/filing. Each clip ships with temporal action segments, per-frame hand keypoints, object bounding boxes, hand–object contact samples, metadata, QA flags, and commercial AI-training consent documentation.

Source: Managed partner-site capture (not contributor apps). Head-mounted smartphone rigs operated by workers under documented consent.

Geography: India — factory floors, restaurants, roadside shops, showrooms, and repair bays across the real economy.

Intended use: Training and evaluating vision-language-action models, imitation learning, hand-object interaction research, egocentric video understanding, and physical-AI benchmarks in industrial and service settings.

Out of scope: Surveillance, worker performance scoring, biometric identification, or any use that re-identifies participants.

Verticals

shuttle-tube packaging · industrial sewing · heat-gun batching · garment ironing & packing · commercial catering · cane weaving · car detailing · primer & painting · denting & filing

Related assets

LeRobot mirror: mono-india-workplace-lerobot
Interactive explorer: data-explorer Space
Factory program: factory.worldarchive.co
Technical memo: docs/buyer-technical-memo.md in this repo
Paper (forthcoming): no preprint yet; cite this dataset card until it is live

Technical essays

Long-form notes on annotation density, capture ops, and trainable signal:

The density advantage: labels per minute that actually train policies — also in blog/01-annotation-density.md in this repo
Beyond the Monocular Plateau: How DataOps Wins the Next Quarter — also in blog/04-future-of-physical-ai-dataops.md in this repo
Essay index: worldarchive.co/blog

Dataset Structure

Repository layout

mono-india-workplace-sample/
├── README.md
├── DATACARD.md
├── DELIVERY_OVERVIEW.md
├── data/
│   ├── clips.parquet          # 9 rows — one per clip
│   ├── segments.parquet       # 218 rows — verb–noun phases
│   └── pack_summary.json
├── clips_preview/             # 6s MP4 previews (plain / skeleton / boxes)
│   └── sample_XX_*/{plain,skeleton,boxes}.mp4
├── schema/                    # Field dictionaries
│   ├── annotation_schema.md
│   ├── action_taxonomy.md
│   ├── object_boxes_schema.md
│   └── ...
└── docs/
    └── buyer-technical-memo.md

Full MP4 + JSONL annotations (~19 GB) live on S3.

`clips.parquet` columns

Column	Type	Description
`clip_id`	string	Stem, e.g. `sample_01_shuttle_tube_packaging`
`title`	string	Human-readable task name
`environment`	string	`factory`, `restaurant`, `repair shop`, etc.
`device`	string	Capture smartphone model
`session_id`	string	Session identifier
`video_file`	string	MP4 filename
`duration_sec`	float	Clip length
`fps`	float	Native frame rate
`resolution`	string	e.g. `1920x1080`
`mount_type`	string	Camera mount, e.g. headband
`segment_count`	int	Action segments in clip
`hands_visible_pct`	float	Fraction of frames with visible hands
`two_hands_pct`	float	Fraction with two hands visible
`manipulation_density_pct`	float	Derived manipulation score
`qa_pass`	bool	CI QA pass flag
`consent_signed`	bool	Commercial AI consent on file
`s3_video_url`	string	Full-resolution MP4 on S3
`s3_overlay_url`	string	Hand skeleton overlay MP4
`s3_boxes_preview_url`	string	Object-box preview MP4
`s3_metadata_url`	string	Per-clip metadata JSON
`hf_preview_plain_url`	string	6s plain preview on HF
`hf_preview_skeleton_url`	string	6s skeleton preview on HF
`hf_preview_boxes_url`	string	6s boxes preview on HF
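
As a rough illustration of how the `qa_pass`, `consent_signed`, and density columns might be combined, the sketch below filters and ranks clip rows. It accepts any iterable of dict-like rows (iterating a `datasets` split yields dicts). The names `usable_clips`, `min_hands_visible`, and `demo_clips` are hypothetical, and the demo values are synthetic. Note that `hands_visible_pct` is named as a percentage but described as a fraction, so the threshold is assumed to be in whatever unit the column actually uses.

```python
from typing import Any, Iterable, Mapping


def usable_clips(
    rows: Iterable[Mapping[str, Any]],
    min_hands_visible: float = 0.0,
) -> list[dict[str, Any]]:
    """Keep QA-passed, consented clips and rank them by manipulation density."""
    kept = [
        dict(row)
        for row in rows
        if row["qa_pass"]
        and row["consent_signed"]
        and row["hands_visible_pct"] >= min_hands_visible
    ]
    # Highest manipulation density first.
    return sorted(kept, key=lambda r: r["manipulation_density_pct"], reverse=True)


# Synthetic rows for illustration only; these are not values from the dataset.
demo_clips = [
    {"clip_id": "clip_a", "qa_pass": True, "consent_signed": True,
     "hands_visible_pct": 0.95, "manipulation_density_pct": 0.60},
    {"clip_id": "clip_b", "qa_pass": True, "consent_signed": True,
     "hands_visible_pct": 0.80, "manipulation_density_pct": 0.90},
    {"clip_id": "clip_c", "qa_pass": False, "consent_signed": True,
     "hands_visible_pct": 0.99, "manipulation_density_pct": 0.99},
]

ranked = usable_clips(demo_clips, min_hands_visible=0.5)
print([row["clip_id"] for row in ranked])
```

With the real index, the same function could be called on the loaded `clips` split directly.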

`segments.parquet` columns

Column	Type	Description
`clip_id`	string	Clip stem
`video`	string	MP4 filename
`start_sec`	float	Segment start (clip-relative)
`end_sec`	float	Segment end
`duration_sec`	float	Segment length
`action`	string	Verb (human-reviewed)
`object`	string	Noun / manipulated object
`task`	string	Combined task label
`notes`	string	Operator notes
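
The segment columns lend themselves to a quick sanity pass. The sketch below counts verb–noun pairs, takes the median segment length, and flags rows where `duration_sec` differs from `end_sec - start_sec`. That equality is an assumption about how `duration_sec` is derived, and `summarize_segments`, `tol`, and the demo rows are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Any, Iterable, Mapping


def summarize_segments(
    rows: Iterable[Mapping[str, Any]],
    tol: float = 1e-3,
) -> dict[str, Any]:
    """Count verb-noun pairs, take the median length, and flag inconsistent rows."""
    pair_counts: Counter = Counter()
    spans: list[float] = []
    inconsistent: list[int] = []
    for idx, row in enumerate(rows):
        span = row["end_sec"] - row["start_sec"]
        # Assumption: duration_sec should equal end_sec - start_sec.
        if abs(span - row["duration_sec"]) > tol:
            inconsistent.append(idx)
        spans.append(span)
        pair_counts[(row["action"], row["object"])] += 1
    return {
        "median_sec": median(spans) if spans else None,
        "pair_counts": pair_counts,
        "inconsistent_rows": inconsistent,
    }


# Synthetic rows for illustration only; the third row is deliberately inconsistent.
demo_segments = [
    {"start_sec": 0.0, "end_sec": 8.0, "duration_sec": 8.0, "action": "fold", "object": "shirt"},
    {"start_sec": 8.0, "end_sec": 12.0, "duration_sec": 4.0, "action": "press", "object": "shirt"},
    {"start_sec": 12.0, "end_sec": 22.0, "duration_sec": 9.5, "action": "fold", "object": "shirt"},
]

summary = summarize_segments(demo_segments)
print(summary["median_sec"], summary["inconsistent_rows"])
```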

Full-pack JSONL fields (S3)

File pattern	Key fields
`annotations/action_segments.jsonl`	`video`, `start_sec`, `end_sec`, `action`, `object`, `task`, `notes`
`annotations/*_hand_keypoints.jsonl`	`frame_idx`, `timestamp_sec`, `hands[]` with 21 landmarks (`x`,`y`,`z`)
`annotations/*_object_boxes.jsonl`	`frame_idx`, `boxes[]` with `bbox`, `label`, `track_id`, `source`
`annotations/*_hand_boxes.jsonl`	Per-hand axis-aligned boxes
`annotations/*_hand_object_contact.jsonl`	Derived contact events
`annotations/*_captions.jsonl`	Natural-language clip summary
`metadata/*.json`	Device, consent, QA flags, manipulation stats

Label provenance is explicit: segments and captions are human labels; keypoints and boxes are model-generated and carry `source` fields.
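
To make the hand-keypoint layout concrete, here is a minimal parsing sketch for one line of a `*_hand_keypoints.jsonl` file. The top-level fields (`frame_idx`, `timestamp_sec`, `hands`) and the per-landmark `x`, `y`, `z` come from the table above; the nested key name `landmarks` is an assumption, as are `parse_hand_record` and the synthetic record. Check `schema/annotation_schema.md` for the actual nesting.

```python
import json

NUM_LANDMARKS = 21  # landmarks per hand, as stated in the field table


def parse_hand_record(line: str, landmarks_key: str = "landmarks"):
    """Parse one JSONL line into (frame_idx, timestamp_sec, hands).

    `hands` holds one entry per detected hand; each entry is a list of
    21 (x, y, z) tuples. `landmarks_key` is an assumed key name.
    """
    record = json.loads(line)
    hands = []
    for hand in record.get("hands", []):
        points = [(p["x"], p["y"], p["z"]) for p in hand[landmarks_key]]
        if len(points) != NUM_LANDMARKS:
            raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(points)}")
        hands.append(points)
    return record["frame_idx"], record["timestamp_sec"], hands


# Synthetic record for illustration; the coordinates are made up.
demo_line = json.dumps({
    "frame_idx": 3,
    "timestamp_sec": 0.2,
    "hands": [{"landmarks": [{"x": 0.1 * i, "y": 0.5, "z": 0.0} for i in range(21)]}],
})

frame_idx, timestamp_sec, parsed_hands = parse_hand_record(demo_line)
print(frame_idx, len(parsed_hands), len(parsed_hands[0]))
```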

Browse previews in the Dataset Viewer

Open the `clips` config in the Dataset Viewer.
Click `hf_preview_plain_url`, `hf_preview_skeleton_url`, or `hf_preview_boxes_url` on any row to play a 6s inline preview.
For layer switching across all 9 clips, use the data-explorer Space.

Preview files live under `clips_preview/{clip_id}/{plain,skeleton,boxes}.mp4`.
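
The preview path pattern can be expanded programmatically. The helper below is a small sketch; `preview_path` and `PREVIEW_LAYERS` are hypothetical names, and the clip stem is the example from the `clips.parquet` column table.

```python
PREVIEW_LAYERS = ("plain", "skeleton", "boxes")


def preview_path(clip_id: str, layer: str) -> str:
    """Return the repo-relative path of a 6s preview for one clip and layer."""
    if layer not in PREVIEW_LAYERS:
        raise ValueError(f"layer must be one of {PREVIEW_LAYERS}, got {layer!r}")
    return f"clips_preview/{clip_id}/{layer}.mp4"


print(preview_path("sample_01_shuttle_tube_packaging", "skeleton"))
```

The returned string could then be passed as `filename` to `huggingface_hub.hf_hub_download` with `repo_type="dataset"`, assuming the preview is stored at that path in the repo.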

Supported Tasks

Egocentric action recognition (verb–noun segments)
Temporal action segmentation and phase detection
Hand pose estimation (21-joint 2D landmarks)
Hand–object interaction and contact modeling
Object detection and tracking in manipulation scenes
Vision-language-action (VLA) pretraining on human video
Imitation learning from egocentric demonstrations
Robot policy evaluation on out-of-distribution industrial tasks
Cross-embodiment transfer (human ego → robot arms)
World-model training with action-conditioned video
Manipulation density and hand-visibility benchmarking
Geographic / cultural distribution analysis (India real economy)
Consent-aware dataset auditing for commercial AI training
LeRobot-format policy learning (via mirror dataset)
Physical-AI benchmark design for factory and service labor
Tool-use and dexterous manipulation in unstructured workshops

Usage

Metadata index (Hugging Face `datasets`)

from datasets import load_dataset

clips = load_dataset(
    "WorldArchive/mono-india-workplace-sample",
    "clips",
    split="train",
)
segments = load_dataset(
    "WorldArchive/mono-india-workplace-sample",
    "segments",
    split="train",
)
print(clips[0]["title"], clips[0]["hf_preview_plain_url"])
print(segments[0]["action"], segments[0]["object"])
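
Because both tables share `clip_id`, the declared `segment_count` of each clip can be cross-checked against the rows in the segments table. The following is a sketch under that assumption; `segment_count_mismatches` and the demo rows are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def segment_count_mismatches(
    clip_rows: Iterable[Mapping[str, Any]],
    segment_rows: Iterable[Mapping[str, Any]],
) -> dict[str, tuple[int, int]]:
    """Return {clip_id: (declared, observed)} for clips whose counts disagree."""
    observed = Counter(seg["clip_id"] for seg in segment_rows)
    mismatches = {}
    for clip in clip_rows:
        declared = clip["segment_count"]
        found = observed.get(clip["clip_id"], 0)
        if declared != found:
            mismatches[clip["clip_id"]] = (declared, found)
    return mismatches


# Synthetic rows for illustration only.
demo_clip_rows = [
    {"clip_id": "clip_a", "segment_count": 2},
    {"clip_id": "clip_b", "segment_count": 3},
]
demo_segment_rows = [
    {"clip_id": "clip_a"}, {"clip_id": "clip_a"}, {"clip_id": "clip_b"},
]

print(segment_count_mismatches(demo_clip_rows, demo_segment_rows))
```

With the splits loaded as above, `segment_count_mismatches(clips, segments)` would run the same check on the real index.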

Robot-ready frames (LeRobot)

from lerobot.datasets.lerobot_dataset import LeRobotDataset

ds = LeRobotDataset("WorldArchive/mono-india-workplace-lerobot")
print(ds.num_episodes, ds.num_frames, ds.fps)
sample = ds[0]  # dict with observation.images.ego, observation.state (126-d), and task
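
The card does not say how the 126-d `observation.state` is laid out. One plausible reading, and it is only an assumption, is 2 hands × 21 landmarks × 3 coordinates in hand-major order. The sketch below unflattens a vector under that assumed layout; `unflatten_state` is a hypothetical helper, so verify the ordering against the mirror dataset's own documentation before relying on it.

```python
HANDS, LANDMARKS, COORDS = 2, 21, 3  # 2 * 21 * 3 = 126 (assumed layout)


def unflatten_state(state):
    """Split a flat 126-value vector into [hand][landmark][coord] nested lists."""
    values = list(state)
    expected = HANDS * LANDMARKS * COORDS
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    per_hand = LANDMARKS * COORDS
    return [
        [
            values[h * per_hand + j * COORDS : h * per_hand + (j + 1) * COORDS]
            for j in range(LANDMARKS)
        ]
        for h in range(HANDS)
    ]


demo_state = [float(i) for i in range(126)]  # synthetic values, not real data
state_hands = unflatten_state(demo_state)
print(len(state_hands), len(state_hands[0]), state_hands[1][0])
```

For a tensor-valued state, converting with `.tolist()` first keeps the result as plain Python floats.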

Full videos + dense JSONL

aws s3 sync s3://ggn-egocentric-data-sample/sample_data_june ./Master_Sample_v1 --no-sign-request

Comparison with public egocentric corpora

	Ego4D	Build AI Egocentric-100K	World Archive Mono
Scale	~3,670 hrs daily-life ego	~100k hrs factory (China)	9 clips, ~48 min (evaluation sample)
Setting	Western-heavy daily life (cooking, social, errands)	Chinese factory floors	Indian real economy (factory, catering, repair, craft)
Annotations	Partial (narrations, AV, some hands/objects)	Minimal public labels; raw video + intrinsics	218 human verb–noun segments; hands, boxes, contact, QA
Geography	US/EU/Singapore-heavy	China	India
License / access	Research license (FAIR)	Gated; commercial terms	CC BY-NC 4.0 eval sample; commercial training license available
Robot format	Custom JSON exports	Raw video	Native LeRobot mirror
Capture model	Crowd + research partners	Managed factory deployment	Managed partner sites, consent-first
Consent for commercial AI	Research-oriented	Enterprise (gated)	Documented commercial AI-training consent

Why this matters for VLA / robot learning

Generalization in manipulation is often limited by how diverse the training distribution is. Most public egocentric data skews toward Western settings, kitchen and household tasks, or lab teleoperation. World Archive adds real industrial and service-economy manipulation, with the spatial and temporal labels that policies consume (hand pose, contact, verb–noun segments, object grounding).

Capture & QA pipeline

Capture — managed partner sites, headband ego rig
Consent — commercial AI-training consent before delivery
Anonymize — PII/face review; audio stripped
Annotate — segments, captions, hands, objects, contact
Manual QA — human verification before promotion
Deliver — MP4 + JSONL + schema docs

Product tiers

Tier	Description
Mono Clear (this repo)	Headband smartphone ego + full annotation stack
Pro Multi-Sensor (pilot)	Ego + wrist cam + IMU + depth + exo, time-synced — contact us

Limitations

Sample size — 9 clips for evaluation, not pretraining at scale.
Geography — India workplaces only; not globally representative.
Monocular — no wrist camera, depth, or IMU in this sample (see Pro tier).
Object boxes — sampled ~1 Hz, not dense per-frame.
Hand keypoints — estimated 2D (MediaPipe), not metric 3D ground truth.
License — CC BY-NC 4.0 for evaluation; production commercial training requires a separate license.
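
Since object boxes are sampled at roughly 1 Hz, per-frame consumers may want to fill the gaps. One simple option, not something this pack is documented to do, is linear interpolation between consecutive keyframes that share a `track_id`. The sketch below assumes each `bbox` is four numbers in a consistent axis-aligned encoding; `interpolate_bbox` and the demo values are hypothetical.

```python
from typing import Sequence


def interpolate_bbox(
    box_a: Sequence[float],
    box_b: Sequence[float],
    frame_a: int,
    frame_b: int,
    frame: int,
) -> list[float]:
    """Linearly interpolate a 4-number box between two keyframes of one track."""
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between the two keyframes")
    if frame_a == frame_b:
        return [float(v) for v in box_a]
    t = (frame - frame_a) / (frame_b - frame_a)
    return [a + t * (b - a) for a, b in zip(box_a, box_b)]


# Synthetic keyframes one second apart at an assumed 30 fps.
mid_box = interpolate_bbox([0, 0, 10, 10], [30, 30, 40, 40], frame_a=0, frame_b=30, frame=15)
print(mid_box)
```

Linear interpolation is only reasonable for smooth motion over short gaps; fast hand-driven object motion between samples will not be captured.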

License

This evaluation sample is released under CC BY-NC 4.0. Commercial production training and enterprise delivery are available under separate terms — contact shubham@worldarchive.co.

Citation

@dataset{worldarchive_mono_india_2026,
  title        = {World Archive Mono: India Workplace Egocentric Manipulation Sample},
  author       = {World Archive / GGN},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/WorldArchive/mono-india-workplace-sample}},
  note         = {9 clips, 218 action segments, LeRobot mirror available}
}

Contact

Book a call: Calendly
Email: shubham@worldarchive.co · dheeraj@worldarchive.co
Web: worldarchive.co · factory.worldarchive.co
