	# InteriorFusion Dataset Strategy

	## Core Training Dataset: InteriorFusion-Train

	We curate a composite dataset from multiple sources, processed into a unified format.

	### Dataset Composition

	| Source | Split | Rooms/Scenes | Images | Purpose | Weight |
	|--------|-------|--------------|--------|---------|--------|
	| 3D-FRONT (HF MIDI-3D) | train | 14,000 | ~500K | Primary training | 40% |
	| Structured3D | train | 18,000 | ~360K | Layout structure | 25% |
	| InteriorNet | train | 50,000 | ~1M | Scale pre-training | 20% |
	| ScanNet++ | train | 1,200 | ~50K | Real-world adaptation | 10% |
	| HM3D | train | 800 | ~30K | Real-world adaptation | 5% |

	Total: ~85K rooms, ~2M training images
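
One way to read the Weight column is as the probability of drawing a sample from each source when building a batch. The document does not say how the weights are applied, so the sketch below is only an illustration under that assumption; `SOURCE_WEIGHTS`, its keys, and `sample_sources` are hypothetical names.

```python
import random
from collections import Counter

# Sampling weights copied from the composition table (they sum to 1.0).
# The dictionary keys are illustrative identifiers, not names from the codebase.
SOURCE_WEIGHTS = {
    "3dfront": 0.40,
    "structured3d": 0.25,
    "interiornet": 0.20,
    "scannetpp": 0.10,
    "hm3d": 0.05,
}


def sample_sources(n, seed=0):
    """Draw n source names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(SOURCE_WEIGHTS)
    weights = [SOURCE_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    counts = Counter(sample_sources(10_000))
    for name, count in counts.most_common():
        print(f"{name}: {count / 10_000:.1%}")
```

Under this reading the weights differ from the raw image shares: ScanNet++ holds about 2.6% of the listed images but receives 10% of the draws, so the small real-world sets are oversampled.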

	### Unified Data Format

	```python
	from __future__ import annotations  # annotations stay unevaluated at class creation

	from dataclasses import dataclass
	from typing import List

	import torch
	import trimesh

	# RoomLayout, SceneGraph, ObjectData, GaussianCloud, PBRMaterial and CameraPose
	# are project types assumed to be defined elsewhere in the codebase.


	@dataclass
	class InteriorSample:
	    # Input
	    image: torch.Tensor   # [3, H, W], single interior photo
	    depth: torch.Tensor   # [1, H, W], metric depth in meters
	    normal: torch.Tensor  # [3, H, W], surface normals

	    # Scene understanding
	    room_layout: RoomLayout  # Walls, floor, ceiling planes
	    room_type: str           # "living_room", "bedroom", "kitchen"
	    style: str               # "modern", "scandinavian", "luxury"
	    scene_graph: SceneGraph  # Object nodes + spatial relations

	    # Per-object data
	    objects: List[ObjectData]  # Individual furniture items

	    # 3D ground truth
	    room_mesh: trimesh.Trimesh            # Full room mesh (walls + floor + ceiling)
	    object_meshes: List[trimesh.Trimesh]  # Per-object meshes
	    gaussian_cloud: GaussianCloud         # 3D Gaussian representation

	    # Materials
	    materials: List[PBRMaterial]  # Per-object PBR materials
	    wall_material: PBRMaterial
	    floor_material: PBRMaterial

	    # Camera
	    camera_pose: CameraPose  # Intrinsics + extrinsics
	    fov: float

	    # Metadata
	    source: str   # "3dfront", "structured3d", "scannet"
	    caption: str  # Natural language description
	```
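
A consistency check on the three image-aligned tensors can catch packaging mistakes early. The sketch below is a guess at one check the Validate step might include, not the project's validator. `check_sample_shapes` is a hypothetical helper, and it takes shapes as plain tuples so that it runs without torch installed.

```python
def check_sample_shapes(image_shape, depth_shape, normal_shape):
    """Return a list of problems; an empty list means the shapes agree."""
    problems = []
    if len(image_shape) != 3 or image_shape[0] != 3:
        problems.append(f"image must be [3, H, W], got {list(image_shape)}")
    if len(depth_shape) != 3 or depth_shape[0] != 1:
        problems.append(f"depth must be [1, H, W], got {list(depth_shape)}")
    if len(normal_shape) != 3 or normal_shape[0] != 3:
        problems.append(f"normal must be [3, H, W], got {list(normal_shape)}")
    # All three maps must cover the same pixel grid.
    spatial = {tuple(s[-2:]) for s in (image_shape, depth_shape, normal_shape)}
    if len(spatial) != 1:
        problems.append("image, depth and normal must share H and W")
    return problems


if __name__ == "__main__":
    print(check_sample_shapes((3, 512, 512), (1, 512, 512), (3, 512, 512)))
```

With real samples the call would look like `check_sample_shapes(sample.image.shape, sample.depth.shape, sample.normal.shape)`.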

	### Preprocessing Pipeline

	```
	Raw Dataset → Filter → Render Views → Compute Depth →
	Segment Objects → Extract Layout →
	Generate Multi-View → Create SLAT →
	Validate → Package → Upload to HF
	```
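
The stage chain above could be organized as a list of functions applied in order, where any stage may drop a record. This is a structural sketch only: `run_pipeline`, `filter_stage`, and `mark_validated` are hypothetical names, and the two stages are stand-ins rather than the real rendering or SLAT steps.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional, Sequence

Record = Dict[str, Any]
Stage = Callable[[Record], Optional[Record]]


def run_pipeline(records: Iterable[Record], stages: Sequence[Stage]) -> Iterator[Record]:
    """Pass each record through the stages in order; a stage returning None drops it."""
    for record in records:
        for stage in stages:
            record = stage(record)
            if record is None:
                break  # dropped by this stage, skip the remaining ones
        else:
            yield record  # survived every stage


def filter_stage(record: Record) -> Optional[Record]:
    # Stand-in for the Filter step: drop scenes with fewer than 2 objects.
    return record if len(record["objects"]) >= 2 else None


def mark_validated(record: Record) -> Optional[Record]:
    # Stand-in for the Validate step; real checks would go here.
    return {**record, "validated": True}


if __name__ == "__main__":
    raw = [
        {"id": "a", "objects": ["sofa", "lamp"]},
        {"id": "b", "objects": ["sofa"]},
    ]
    for item in run_pipeline(raw, [filter_stage, mark_validated]):
        print(item["id"], item["validated"])
```

Because `run_pipeline` is a generator, records stream through one at a time instead of being materialized between stages.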

	### Filtering Criteria

	1. Quality filter: Minimum resolution 512×512
	2. Content filter: Must contain at least 2 furniture objects
	3. Occlusion filter: Main objects must be more than 30% visible
	4. Room type filter: Exclude bathrooms, garages, and outdoor scenes
	5. Lighting filter: Exclude extremely dark or overexposed scenes
	6. Duplicate filter: Perceptual hash deduplication
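
The six criteria can be sketched as one predicate plus a deduplication pass. Several details here are assumptions rather than project settings: `SceneMeta` is a hypothetical metadata record, every listed object is treated as a "main" object, the luminance thresholds and the Hamming-distance cutoff are placeholder values, and average hash stands in for whichever perceptual hash the project uses.

```python
from dataclasses import dataclass
from typing import List

EXCLUDED_ROOM_TYPES = {"bathroom", "garage", "outdoor"}


@dataclass
class SceneMeta:
    """Hypothetical per-image metadata used only for this sketch."""
    width: int
    height: int
    room_type: str
    object_visibility: List[float]  # visible fraction per furniture object, 0..1
    mean_luminance: float           # mean pixel brightness, 0..1


def passes_filters(meta, dark=0.05, bright=0.95):
    """Apply filters 1-5; dark and bright are assumed thresholds."""
    if min(meta.width, meta.height) < 512:              # 1. quality
        return False
    if len(meta.object_visibility) < 2:                 # 2. content
        return False
    if any(v <= 0.30 for v in meta.object_visibility):  # 3. occlusion
        return False
    if meta.room_type in EXCLUDED_ROOM_TYPES:           # 4. room type
        return False
    if not (dark < meta.mean_luminance < bright):       # 5. lighting
        return False
    return True


def average_hash(gray8x8):
    """64-bit average hash of an 8x8 grayscale thumbnail (a list of 8 rows)."""
    pixels = [p for row in gray8x8 for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(hashes, max_distance=4):
    """6. duplicates: return indices to keep, dropping near-copies of kept items."""
    kept = []
    for i, h in enumerate(hashes):
        if all(hamming(h, hashes[j]) > max_distance for j in kept):
            kept.append(i)
    return kept
```

The pairwise comparison in `deduplicate` is quadratic in the number of kept images, so a dataset of this size would need bucketing or an index over the hashes.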

	### Augmentation Pipeline

	1. Color jitter: brightness ±0.2, contrast ±0.2, saturation ±0.2, hue ±0.1
	2. Random crop: 0.8–1.0 scale, maintain aspect ratio
	3. Horizontal flip: 50% probability
	4. Perspective warp: Simulate different camera angles (±15° pitch, ±20° yaw)
	5. Synthetic occlusion: Add random rectangles simulating foreground objects
	6. Depth noise: Add Gaussian noise to depth map (σ = 0.05 m) for robustness
	7. Lighting variation: Re-render with different HDRI environments
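
The parameter ranges in steps 1 to 4 and 6 can be sampled once per training example, as in the sketch below. It makes several assumptions: "±0.2" is read as a multiplicative factor in [0.8, 1.2] (the convention torchvision's `ColorJitter` uses), crop scale is read as a fraction of each side, depth is clamped at zero after adding noise, and normals are in camera space with x along the image width. `AugmentParams` and the helper functions are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class AugmentParams:
    brightness: float  # multiplicative factor in [0.8, 1.2]
    contrast: float    # multiplicative factor in [0.8, 1.2]
    saturation: float  # multiplicative factor in [0.8, 1.2]
    hue: float         # additive shift in [-0.1, 0.1]
    crop_scale: float  # fraction of each image side kept, 0.8 to 1.0
    flip: bool         # horizontal flip, 50% probability
    pitch_deg: float   # perspective warp pitch in [-15, 15]
    yaw_deg: float     # perspective warp yaw in [-20, 20]


def sample_params(rng):
    """Draw one set of augmentation parameters from the documented ranges."""
    return AugmentParams(
        brightness=rng.uniform(0.8, 1.2),
        contrast=rng.uniform(0.8, 1.2),
        saturation=rng.uniform(0.8, 1.2),
        hue=rng.uniform(-0.1, 0.1),
        crop_scale=rng.uniform(0.8, 1.0),
        flip=rng.random() < 0.5,
        pitch_deg=rng.uniform(-15.0, 15.0),
        yaw_deg=rng.uniform(-20.0, 20.0),
    )


def hflip(rows):
    """Mirror a single-channel map given as a list of rows."""
    return [row[::-1] for row in rows]


def hflip_normals(nx, ny, nz):
    """Mirror a normal map: reverse the columns and negate the x component."""
    return ([[-v for v in row[::-1]] for row in nx], hflip(ny), hflip(nz))


def add_depth_noise(depth_rows, rng, sigma_m=0.05):
    """Add zero-mean Gaussian noise in meters; clamping at 0 is an added safeguard."""
    return [[max(0.0, d + rng.gauss(0.0, sigma_m)) for d in row] for row in depth_rows]


if __name__ == "__main__":
    print(sample_params(random.Random(0)))
```

Geometric augmentations have to be applied to the image, depth, and normal maps together. A horizontal flip would also invalidate left/right relations in the scene graph, caption, and camera pose, which this sketch does not handle.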

	### Captioning Strategy

	Automatic captions are generated Cap3D-style and cover five components:
	- Room type: "a modern living room with a gray sofa and wooden coffee table"
	- Style: "scandinavian minimalist interior with natural light"
	- Objects: "contains: sofa, coffee table, floor lamp, bookshelf"
	- Materials: "wooden floor, white walls, leather sofa"
	- Spatial: "sofa against back wall, coffee table centered, lamp in corner"

	Manual review: interior designers check a random 10% sample of captions for quality.
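
To show how the caption components fit together, here is a template-based stand-in. It is not Cap3D-style generation, which relies on a captioning model; `build_caption` and `review_sample` are hypothetical helpers, and the `;` separator is an arbitrary choice.

```python
import random


def build_caption(room_type, style, objects, materials, relations):
    """Join style, room type, objects, materials and spatial phrases into one string."""
    parts = [f"a {style} {room_type.replace('_', ' ')}"]
    if objects:
        parts.append("contains: " + ", ".join(objects))
    if materials:
        parts.append(", ".join(materials))
    if relations:
        parts.append(", ".join(relations))
    return "; ".join(parts)


def review_sample(sample_ids, fraction=0.10, seed=0):
    """Pick a reproducible random subset of IDs for manual review."""
    ids = list(sample_ids)
    rng = random.Random(seed)
    return sorted(rng.sample(ids, round(len(ids) * fraction)))


if __name__ == "__main__":
    print(build_caption(
        "living_room",
        "modern",
        ["sofa", "coffee table"],
        ["wooden floor", "white walls"],
        ["sofa against back wall"],
    ))
    print(len(review_sample(range(1000))))
```

Fixing the seed keeps the review subset stable between runs, so reviewers and later audits see the same captions.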

	### Synthetic Data Generation

	Using ProcTHOR + AI2-THOR simulator:
	1. Generate 100K additional procedural rooms
	2. Randomize: furniture placement, materials, lighting, camera position
	3. Render 20 views per room
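	4. Add to training mix with 15% weight (the five source weights under Dataset Composition sum to 100%, so they presumably scale down to share the remaining 85%)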
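
The numbers in these steps can be worked through directly. The sketch assumes the existing source weights are rescaled proportionally to make room for the synthetic share, which the document does not state; `add_source` and the `procthor` key are hypothetical names.

```python
# Weights of the five curated sources, as listed under Dataset Composition.
BASE_WEIGHTS = {
    "3dfront": 0.40,
    "structured3d": 0.25,
    "interiornet": 0.20,
    "scannetpp": 0.10,
    "hm3d": 0.05,
}

SYNTHETIC_ROOMS = 100_000
VIEWS_PER_ROOM = 20
synthetic_images = SYNTHETIC_ROOMS * VIEWS_PER_ROOM  # 2,000,000 rendered views


def add_source(weights, name, share):
    """Give `name` a fixed share and scale the existing weights to fill the rest."""
    if not 0.0 < share < 1.0:
        raise ValueError("share must be strictly between 0 and 1")
    total = sum(weights.values())
    mixed = {key: value / total * (1.0 - share) for key, value in weights.items()}
    mixed[name] = share
    return mixed


if __name__ == "__main__":
    for name, weight in add_source(BASE_WEIGHTS, "procthor", 0.15).items():
        print(f"{name}: {weight:.1%}")
    print(f"synthetic images: {synthetic_images:,}")
```

Under this assumption 3D-FRONT drops from 40% to 34% of the mix. The 2M synthetic views are about as many images as the curated sources combined, so the 15% weight undersamples them relative to their size.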

	### Data Splits

	| Split | Rooms | Images | Purpose |
	|-------|-------|--------|---------|
	| Train | 75,000 | 1,800,000 | Model training |
	| Val | 5,000 | 120,000 | Hyperparameter tuning |
	| Test | 5,000 | 120,000 | Final evaluation |
	| Benchmark | 500 | 12,000 | Leaderboard / comparison |
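
Every split in the table works out to 24 images per room, which suggests splitting by room rather than by image so that views of one room never leak across splits. A minimal sketch of a deterministic room-level assignment follows. It assumes the Benchmark rooms are disjoint from the other splits, which the table does not say, and it reproduces the table's proportions only approximately, not the exact counts. `assign_split` is a hypothetical helper.

```python
import hashlib

# Room counts from the split table; treated here as relative proportions.
SPLIT_ROOMS = [
    ("train", 75_000),
    ("val", 5_000),
    ("test", 5_000),
    ("benchmark", 500),
]


def assign_split(room_id):
    """Map a room ID to a split by hashing, so all views of a room share a split."""
    total = sum(count for _, count in SPLIT_ROOMS)
    digest = hashlib.sha256(room_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    upper = 0
    for name, count in SPLIT_ROOMS:
        upper += count
        if bucket < upper:
            return name
    raise AssertionError("bucket is always below the total")


if __name__ == "__main__":
    print(assign_split("room_000123"))
```

Hashing the room ID means the assignment does not depend on processing order and stays stable when new rooms are added later.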