# InteriorFusion Dataset Strategy

## Core Training Dataset: InteriorFusion-Train

We curate a composite dataset from multiple sources and process it into a unified format.

### Dataset Composition

| Source | Split | Rooms/Scenes | Images | Purpose | Weight |
|--------|-------|-------------|--------|---------|--------|
| 3D-FRONT (HF MIDI-3D) | train | 14,000 | ~500K | Primary training | 40% |
| Structured3D | train | 18,000 | ~360K | Layout structure | 25% |
| InteriorNet | train | 50,000 | ~1M | Scale pre-training | 20% |
| ScanNet++ | train | 1,200 | ~50K | Real-world adaptation | 10% |
| HM3D | train | 800 | ~30K | Real-world adaptation | 5% |

**Total: ~85K rooms, ~2M training images**

### Unified Data Format

```python
from __future__ import annotations  # annotations are not evaluated at class creation

from dataclasses import dataclass
from typing import List

import torch
import trimesh

# RoomLayout, SceneGraph, ObjectData, GaussianCloud, PBRMaterial, and CameraPose
# are project-specific types assumed to be defined elsewhere.


@dataclass
class InteriorSample:
    # Input
    image: torch.Tensor       # [3, H, W]: single interior photo
    depth: torch.Tensor       # [1, H, W]: metric depth in meters
    normal: torch.Tensor      # [3, H, W]: surface normals

    # Scene understanding
    room_layout: RoomLayout   # Walls, floor, ceiling planes
    room_type: str            # "living_room", "bedroom", "kitchen"
    style: str                # "modern", "scandinavian", "luxury"
    scene_graph: SceneGraph   # Object nodes + spatial relations

    # Per-object data
    objects: List[ObjectData]  # Individual furniture items

    # 3D ground truth
    room_mesh: trimesh.Trimesh            # Full room mesh (walls + floor + ceiling)
    object_meshes: List[trimesh.Trimesh]  # Per-object meshes
    gaussian_cloud: GaussianCloud         # 3D Gaussian representation

    # Materials
    materials: List[PBRMaterial]  # Per-object PBR materials
    wall_material: PBRMaterial
    floor_material: PBRMaterial

    # Camera
    camera_pose: CameraPose   # Intrinsics + extrinsics
    fov: float                # Field of view

    # Metadata
    source: str               # "3dfront", "structured3d", "scannet"
    caption: str              # Natural language description
```

### Preprocessing Pipeline

```
Raw Dataset → Filter → Render Views → Compute Depth → Segment Objects
    → Extract Layout → Generate Multi-View → Create SLAT
    → Validate → Package → Upload to HF
```

### Filtering Criteria

1. **Quality filter**: Minimum resolution 512×512
2. **Content filter**: Must contain at least 2 furniture objects
3. **Occlusion filter**: Main objects must be >30% visible
4. **Room type filter**: Exclude bathrooms, garages, and outdoor scenes
5. **Lighting filter**: Exclude extremely dark or overexposed scenes
6. **Duplicate filter**: Perceptual hash deduplication

### Augmentation Pipeline

1. **Color jitter**: brightness ±0.2, contrast ±0.2, saturation ±0.2, hue ±0.1
2. **Random crop**: 0.8–1.0 scale, maintain aspect ratio
3. **Horizontal flip**: 50% probability
4. **Perspective warp**: Simulate different camera angles (±15° pitch, ±20° yaw)
5. **Synthetic occlusion**: Add random rectangles simulating foreground objects
6. **Depth noise**: Add Gaussian noise to the depth map (σ=0.05m) for robustness
7. **Lighting variation**: Re-render with different HDRI environments

### Captioning Strategy

**Automatic captions** from Cap3D-style generation:

- Room type: "a modern living room with a gray sofa and wooden coffee table"
- Style: "scandinavian minimalist interior with natural light"
- Objects: "contains: sofa, coffee table, floor lamp, bookshelf"
- Materials: "wooden floor, white walls, leather sofa"
- Spatial: "sofa against back wall, coffee table centered, lamp in corner"

**Manual review**: A 10% random sample is reviewed by interior designers for quality.

### Synthetic Data Generation

Using ProcTHOR with the AI2-THOR simulator:

1. Generate 100K additional procedural rooms
2. Randomize furniture placement, materials, lighting, and camera position
3. Render 20 views per room
4. Add to the training mix with 15% weight

### Data Splits

| Split | Rooms | Images | Purpose |
|-------|-------|--------|---------|
| Train | 75,000 | 1,800,000 | Model training |
| Val | 5,000 | 120,000 | Hyperparameter tuning |
| Test | 5,000 | 120,000 | Final evaluation |
| Benchmark | 500 | 12,000 | Leaderboard / comparison |
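
The split table gives room counts but not how rooms are assigned. The sketch below is one possible approach, not the project's confirmed method: it shuffles room IDs with a fixed seed and slices them into the Train/Val/Test counts listed in the table. The function name `assign_splits`, the `room_ids` input, and the `room_000000`-style IDs are hypothetical. The Benchmark split is left out because the document does not say whether it is a separate set or a subset of another split.

```python
import random
from typing import Dict, List, Sequence

# Room counts taken from the Data Splits table.
SPLIT_SIZES = {"train": 75_000, "val": 5_000, "test": 5_000}


def assign_splits(room_ids: Sequence[str], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle room IDs with a fixed seed and slice them into splits.

    Splitting by room (not by image) keeps every view of a room in one split.
    """
    total = sum(SPLIT_SIZES.values())
    if len(set(room_ids)) != len(room_ids):
        raise ValueError("room_ids must be unique")
    if len(room_ids) != total:
        raise ValueError(f"expected {total} rooms, got {len(room_ids)}")

    # Sort first so the result does not depend on the input order.
    shuffled = sorted(room_ids)
    random.Random(seed).shuffle(shuffled)

    splits: Dict[str, List[str]] = {}
    start = 0
    for name, size in SPLIT_SIZES.items():
        splits[name] = shuffled[start:start + size]
        start += size
    return splits


if __name__ == "__main__":
    # Hypothetical room IDs, for illustration only.
    ids = [f"room_{i:06d}" for i in range(85_000)]
    splits = assign_splits(ids)
    print({name: len(rooms) for name, rooms in splits.items()})
```

Assigning whole rooms rather than individual images avoids leaking views of the same room across Train, Val, and Test. The fixed seed and the initial sort make the assignment reproducible.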