# InteriorFusion Dataset Strategy

## Core Training Dataset: InteriorFusion-Train

We curate a composite dataset from multiple sources, processed into a unified format.

### Dataset Composition

| Source | Split | Rooms/Scenes | Images | Purpose | Weight |
|--------|-------|--------------|--------|---------|--------|
| 3D-FRONT (HF MIDI-3D) | train | 14,000 | ~500K | Primary training | 40% |
| Structured3D | train | 18,000 | ~360K | Layout structure | 25% |
| InteriorNet | train | 50,000 | ~1M | Scale pre-training | 20% |
| ScanNet++ | train | 1,200 | ~50K | Real-world adaptation | 10% |
| HM3D | train | 800 | ~30K | Real-world adaptation | 5% |

**Total: ~85K rooms, ~2M training images**
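
The Weight column can be read as per-source sampling probabilities. The sketch below assumes weights are applied each time a training sample is drawn; the document does not say whether weighting happens per sample, per batch, or through loss scaling. The dictionary keys and `sample_source` are illustrative names, not part of an existing codebase.

```python
import random

# Sampling weights copied from the composition table (fractions of the training mix).
SOURCE_WEIGHTS = {
    "3dfront": 0.40,
    "structured3d": 0.25,
    "interiornet": 0.20,
    "scannetpp": 0.10,
    "hm3d": 0.05,
}


def sample_source(rng: random.Random) -> str:
    """Draw one source name with probability proportional to its weight."""
    names = list(SOURCE_WEIGHTS)
    weights = [SOURCE_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
draws = [sample_source(rng) for _ in range(10_000)]
for name in SOURCE_WEIGHTS:
    # Empirical frequencies should land close to the table weights.
    print(f"{name}: {draws.count(name) / len(draws):.3f}")
```

Under this reading the real-world sources are oversampled relative to their size: ScanNet++ contributes roughly 50K of about 2M images but receives 10% of the draws.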

### Unified Data Format

```python
from __future__ import annotations  # keep annotations as strings; project types are not imported here

from dataclasses import dataclass
from typing import List

import torch
import trimesh

# RoomLayout, SceneGraph, ObjectData, GaussianCloud, PBRMaterial and CameraPose
# are project types assumed to be defined elsewhere in the codebase.


@dataclass
class InteriorSample:
    # Input
    image: torch.Tensor        # [3, H, W]: single interior photo
    depth: torch.Tensor        # [1, H, W]: metric depth in meters
    normal: torch.Tensor       # [3, H, W]: surface normals

    # Scene understanding
    room_layout: RoomLayout    # Walls, floor, ceiling planes
    room_type: str             # "living_room", "bedroom", "kitchen"
    style: str                 # "modern", "scandinavian", "luxury"
    scene_graph: SceneGraph    # Object nodes + spatial relations

    # Per-object data
    objects: List[ObjectData]  # Individual furniture items

    # 3D ground truth
    room_mesh: trimesh.Trimesh            # Full room mesh (walls + floor + ceiling)
    object_meshes: List[trimesh.Trimesh]  # Per-object meshes
    gaussian_cloud: GaussianCloud         # 3D Gaussian representation

    # Materials
    materials: List[PBRMaterial]  # Per-object PBR materials
    wall_material: PBRMaterial
    floor_material: PBRMaterial

    # Camera
    camera_pose: CameraPose    # Intrinsics + extrinsics
    fov: float

    # Metadata
    source: str                # "3dfront", "structured3d", "scannet"
    caption: str               # Natural language description
```
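
The shape comments on `image`, `depth`, and `normal` imply a consistency check that could run when samples are packaged. A minimal sketch, assuming shapes are passed as plain sequences (for example `tuple(tensor.shape)`); `check_sample_shapes` is a hypothetical helper, not an existing function.

```python
from typing import Sequence


def check_sample_shapes(
    image: Sequence[int], depth: Sequence[int], normal: Sequence[int]
) -> None:
    """Raise ValueError unless the shapes follow the [C, H, W] layout documented above."""
    expected_channels = {"image": 3, "depth": 1, "normal": 3}
    shapes = {"image": tuple(image), "depth": tuple(depth), "normal": tuple(normal)}
    for name, shape in shapes.items():
        if len(shape) != 3 or shape[0] != expected_channels[name]:
            raise ValueError(
                f"{name}: expected [{expected_channels[name]}, H, W], got {shape}"
            )
    # All three maps must share the same spatial resolution.
    if len({shape[1:] for shape in shapes.values()}) != 1:
        raise ValueError(f"spatial sizes differ: {shapes}")


check_sample_shapes((3, 512, 512), (1, 512, 512), (3, 512, 512))  # passes silently
```

With real tensors the call would look like `check_sample_shapes(sample.image.shape, sample.depth.shape, sample.normal.shape)`.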

### Preprocessing Pipeline

```
Raw Dataset → Filter → Render Views → Compute Depth →
Segment Objects → Extract Layout →
Generate Multi-View → Create SLAT →
Validate → Package → Upload to HF
```
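
One way to organize such a pipeline is as a list of stage functions applied in order, where any stage may reject a record. The sketch below is only an illustration of that structure: `run_pipeline` and the two placeholder stages are hypothetical, and the real stages (rendering, depth, segmentation, SLAT creation) are not specified in this document.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, Any]
Stage = Callable[[Record], Optional[Record]]


def run_pipeline(records: Iterable[Record], stages: List[Stage]) -> Iterator[Record]:
    """Apply stages in order; a stage returning None drops the record."""
    for record in records:
        current: Optional[Record] = record
        for stage in stages:
            current = stage(current)
            if current is None:  # rejected, skip the remaining stages
                break
        if current is not None:
            yield current


# Placeholder stages standing in for "Filter" and "Render Views".
def filter_stage(record: Record) -> Optional[Record]:
    return record if record.get("num_objects", 0) >= 2 else None


def render_views_stage(record: Record) -> Record:
    views = [f"{record['room_id']}_view{i}" for i in range(2)]
    return {**record, "views": views}


rooms = [{"room_id": "a", "num_objects": 3}, {"room_id": "b", "num_objects": 1}]
processed = list(run_pipeline(rooms, [filter_stage, render_views_stage]))
print(processed)
```

Keeping each stage as a pure function of one record makes it straightforward to test stages in isolation and to parallelize over rooms.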

### Filtering Criteria

1. **Quality filter**: Minimum resolution 512×512
2. **Content filter**: Must contain at least 2 furniture objects
3. **Occlusion filter**: Main objects must be >30% visible
4. **Room type filter**: Exclude bathrooms, garages, outdoor
5. **Lighting filter**: Exclude extremely dark or overexposed scenes
6. **Duplicate filter**: Perceptual hash deduplication
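
The criteria above can be sketched as a predicate over per-image metadata plus a simple perceptual hash. Several details are assumptions: the `Candidate` fields, the luminance thresholds for "extremely dark or overexposed" (0.1 and 0.9), treating every listed object as a "main" object, and average hash as the perceptual hash variant. None of these are fixed by this document.

```python
from dataclasses import dataclass
from typing import List, Sequence

EXCLUDED_ROOM_TYPES = {"bathroom", "garage", "outdoor"}


@dataclass
class Candidate:
    width: int
    height: int
    room_type: str
    object_visibility: List[float]  # visible fraction per furniture object, 0..1
    mean_luminance: float           # 0 (black) .. 1 (white)


def passes_filters(c: Candidate, dark: float = 0.1, bright: float = 0.9) -> bool:
    """Apply filters 1-5; thresholds `dark` and `bright` are assumed values."""
    if min(c.width, c.height) < 512:                 # quality
        return False
    if len(c.object_visibility) < 2:                 # content
        return False
    if any(v <= 0.30 for v in c.object_visibility):  # occlusion
        return False
    if c.room_type in EXCLUDED_ROOM_TYPES:           # room type
        return False
    return dark <= c.mean_luminance <= bright        # lighting


def average_hash(gray: Sequence[Sequence[float]]) -> int:
    """Average hash of a small grayscale thumbnail: one bit per pixel above the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


thumb_a = [[0.1, 0.9], [0.8, 0.2]]
thumb_b = [[0.15, 0.95], [0.85, 0.25]]  # same image, slightly brighter
thumb_c = [[0.9, 0.1], [0.2, 0.8]]      # different content
print(hamming(average_hash(thumb_a), average_hash(thumb_b)))
print(hamming(average_hash(thumb_a), average_hash(thumb_c)))
```

In practice the thumbnail would typically be 8×8 or larger, and two images would count as duplicates when the Hamming distance falls below a chosen threshold.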

### Augmentation Pipeline

1. **Color jitter**: brightness ±0.2, contrast ±0.2, saturation ±0.2, hue ±0.1
2. **Random crop**: 0.8–1.0 scale, maintain aspect ratio
3. **Horizontal flip**: 50% probability
4. **Perspective warp**: Simulate different camera angles (±15° pitch, ±20° yaw)
5. **Synthetic occlusion**: Add random rectangles simulating foreground objects
6. **Depth noise**: Add Gaussian noise to depth map (σ=0.05m) for robustness
7. **Lighting variation**: Re-render with different HDRI environments
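
The numeric ranges above can be sketched as a per-sample parameter draw. This assumes uniform sampling within each stated range and clamps noisy depth at zero; both choices, along with the names `AugParams`, `sample_aug_params`, and `add_depth_noise`, are assumptions rather than the documented implementation.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class AugParams:
    brightness: float  # additive jitter factor in [-0.2, 0.2]
    contrast: float
    saturation: float
    hue: float         # in [-0.1, 0.1]
    crop_scale: float  # fraction of the image kept, in [0.8, 1.0]
    flip: bool         # horizontal flip
    pitch_deg: float   # perspective warp, in [-15, 15]
    yaw_deg: float     # perspective warp, in [-20, 20]


def sample_aug_params(rng: random.Random) -> AugParams:
    """Draw one set of augmentation parameters (uniform sampling is assumed)."""
    return AugParams(
        brightness=rng.uniform(-0.2, 0.2),
        contrast=rng.uniform(-0.2, 0.2),
        saturation=rng.uniform(-0.2, 0.2),
        hue=rng.uniform(-0.1, 0.1),
        crop_scale=rng.uniform(0.8, 1.0),
        flip=rng.random() < 0.5,
        pitch_deg=rng.uniform(-15.0, 15.0),
        yaw_deg=rng.uniform(-20.0, 20.0),
    )


def add_depth_noise(
    depth_m: List[float], rng: random.Random, sigma_m: float = 0.05
) -> List[float]:
    """Add zero-mean Gaussian noise in meters; clamp at 0 so depth stays non-negative."""
    return [max(0.0, d + rng.gauss(0.0, sigma_m)) for d in depth_m]


rng = random.Random(0)
params = sample_aug_params(rng)
noisy = add_depth_noise([1.0, 2.5, 0.0], rng)
print(params)
print(noisy)
```

Geometric steps (crop, flip, warp) would presumably need to be applied identically to `image`, `depth`, and `normal`, and a horizontal flip also negates the x component of camera-space normals; the list above does not describe how this is handled.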

### Captioning Strategy

**Automatic captions** from Cap3D-style generation:
- Room type: "a modern living room with a gray sofa and wooden coffee table"
- Style: "scandinavian minimalist interior with natural light"
- Objects: "contains: sofa, coffee table, floor lamp, bookshelf"
- Materials: "wooden floor, white walls, leather sofa"
- Spatial: "sofa against back wall, coffee table centered, lamp in corner"

**Manual review**: 10% random sample reviewed by interior designers for quality.
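
The five caption components and the 10% review sample can be sketched as follows. Note that this uses a plain string template built from metadata, not the Cap3D-style generator named above; the template, the `"; "` separator, and both function names are assumptions made for illustration.

```python
import random
from typing import List


def build_caption(
    room_type: str,
    style: str,
    objects: List[str],
    materials: List[str],
    spatial: List[str],
) -> str:
    """Join the caption components into one string (template is an assumption)."""
    parts = [f"a {style} {room_type.replace('_', ' ')}"]
    if objects:
        parts.append("contains: " + ", ".join(objects))
    if materials:
        parts.append(", ".join(materials))
    if spatial:
        parts.append(", ".join(spatial))
    return "; ".join(parts)


def pick_review_sample(
    sample_ids: List[int], fraction: float = 0.10, seed: int = 0
) -> List[int]:
    """Select a reproducible random subset of IDs for manual review."""
    rng = random.Random(seed)
    k = round(len(sample_ids) * fraction)
    return sorted(rng.sample(sample_ids, k))


caption = build_caption(
    room_type="living_room",
    style="modern",
    objects=["sofa", "coffee table"],
    materials=["wooden floor", "white walls"],
    spatial=["sofa against back wall"],
)
print(caption)
review_ids = pick_review_sample(list(range(200)))
print(len(review_ids))
```

Fixing the seed keeps the review set stable across runs, so reviewers see the same samples if the selection is regenerated.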

### Synthetic Data Generation

Using ProcTHOR + AI2-THOR simulator:
1. Generate 100K additional procedural rooms
2. Randomize: furniture placement, materials, lighting, camera position
3. Render 20 views per room
4. Add to training mix with 15% weight
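
The source weights in the composition table already sum to 100%, so adding synthetic data at 15% requires rescaling them. The document does not say how; the sketch below assumes the existing weights are scaled proportionally by 0.85. The key `"procthor"` and the function name are illustrative.

```python
from typing import Dict

# Weights from the composition table (sum to 1.0 before synthetic data is added).
BASE_WEIGHTS = {
    "3dfront": 0.40,
    "structured3d": 0.25,
    "interiornet": 0.20,
    "scannetpp": 0.10,
    "hm3d": 0.05,
}
SYNTHETIC_WEIGHT = 0.15


def mix_with_synthetic(
    base: Dict[str, float], synthetic_weight: float
) -> Dict[str, float]:
    """Scale base weights by (1 - synthetic_weight) and add the synthetic source."""
    scale = 1.0 - synthetic_weight
    mixed = {name: weight * scale for name, weight in base.items()}
    mixed["procthor"] = synthetic_weight
    return mixed


mixed_weights = mix_with_synthetic(BASE_WEIGHTS, SYNTHETIC_WEIGHT)
for name, weight in mixed_weights.items():
    print(f"{name}: {weight:.4f}")

# Rendered image count implied by steps 1 and 3.
synthetic_images = 100_000 * 20
print(synthetic_images)  # 2000000
```

Under this assumption 3D-FRONT drops from 40% to 34% of draws, and the synthetic set adds about as many images as the curated sources combined.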

### Data Splits

| Split | Rooms | Images | Purpose |
|-------|-------|--------|---------|
| Train | 75,000 | 1,800,000 | Model training |
| Val | 5,000 | 120,000 | Hyperparameter tuning |
| Test | 5,000 | 120,000 | Final evaluation |
| Benchmark | 500 | 12,000 | Leaderboard / comparison |
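
Because each room contributes many images, splitting by room rather than by image keeps views of the same room out of different splits. A minimal sketch, assuming string room IDs, a hash-based assignment, and Benchmark treated as a fourth disjoint split; the document does not state how splits are assigned, and hashing reproduces the table counts only approximately.

```python
import hashlib

# Room counts from the splits table, used as relative proportions.
SPLIT_ROOMS = {"train": 75_000, "val": 5_000, "test": 5_000, "benchmark": 500}
SPLIT_IMAGES = {"train": 1_800_000, "val": 120_000, "test": 120_000, "benchmark": 12_000}


def assign_split(room_id: str) -> str:
    """Map a room ID to a split deterministically, in proportion to SPLIT_ROOMS."""
    total = sum(SPLIT_ROOMS.values())
    digest = hashlib.sha256(room_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    upper = 0
    for name, count in SPLIT_ROOMS.items():
        upper += count
        if bucket < upper:
            return name
    raise AssertionError("unreachable: bucket is always below total")


# Every image of a room inherits the room's split.
print(assign_split("3dfront/room_000123"))

# The table implies the same number of images per room in every split.
images_per_room = {name: SPLIT_IMAGES[name] // SPLIT_ROOMS[name] for name in SPLIT_ROOMS}
print(images_per_room)  # {'train': 24, 'val': 24, 'test': 24, 'benchmark': 24}
```

If exact counts matter, a seeded shuffle of the room list followed by slicing would be a simple alternative to hashing.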