stevee00 committed on
Commit c88ec9c · verified · 1 Parent(s): 708fe64

Upload docs/DATASET_STRATEGY.md

Files changed (1)
  1. docs/DATASET_STRATEGY.md +111 -0
docs/DATASET_STRATEGY.md ADDED
@@ -0,0 +1,111 @@
# InteriorFusion Dataset Strategy

## Core Training Dataset: InteriorFusion-Train

We curate a composite dataset from multiple sources, processed into a unified format.

### Dataset Composition

| Source | Split | Rooms/Scenes | Images | Purpose | Weight |
|--------|-------|--------------|--------|---------|--------|
| 3D-FRONT (HF MIDI-3D) | train | 14,000 | ~500K | Primary training | 40% |
| Structured3D | train | 18,000 | ~360K | Layout structure | 25% |
| InteriorNet | train | 50,000 | ~1M | Scale pre-training | 20% |
| ScanNet++ | train | 1,200 | ~50K | Real-world adaptation | 10% |
| HM3D | train | 800 | ~30K | Real-world adaptation | 5% |

**Total: ~85K rooms, ~2M training images**

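If the Weight column is read as the probability of drawing the next training sample from each source (an assumption; the table does not say how the weights are applied), the mixing step could be sketched as below. The dictionary keys and the `sample_source` helper are illustrative names, not part of an existing codebase.

```python
import random
from collections import Counter

# Weights copied from the Dataset Composition table (percent).
# The keys are illustrative identifiers, not official dataset names.
SOURCE_WEIGHTS = {
    "3dfront": 40,
    "structured3d": 25,
    "interiornet": 20,
    "scannetpp": 10,
    "hm3d": 5,
}


def sample_source(rng: random.Random) -> str:
    """Pick the source of the next training sample in proportion to its weight."""
    names = list(SOURCE_WEIGHTS)
    weights = list(SOURCE_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    n_draws = 10_000
    counts = Counter(sample_source(rng) for _ in range(n_draws))
    for name, count in counts.most_common():
        print(f"{name}: {count / n_draws:.1%}")
```

Under this reading, an individual HM3D image (5% weight over ~30K images) is drawn roughly eight times as often as an individual InteriorNet image (20% weight over ~1M images), which is how the smaller real-world sets keep a noticeable share of each epoch.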
### Unified Data Format

```python
from __future__ import annotations  # keeps annotations lazy, so project types need no import here

from dataclasses import dataclass
from typing import List

import torch
import trimesh

# RoomLayout, SceneGraph, ObjectData, GaussianCloud, PBRMaterial and CameraPose
# are project-specific types that are not defined in this document.


@dataclass
class InteriorSample:
    # Input
    image: torch.Tensor                   # [3, H, W], single interior photo
    depth: torch.Tensor                   # [1, H, W], metric depth in meters
    normal: torch.Tensor                  # [3, H, W], surface normals

    # Scene understanding
    room_layout: RoomLayout               # Walls, floor, ceiling planes
    room_type: str                        # "living_room", "bedroom", "kitchen"
    style: str                            # "modern", "scandinavian", "luxury"
    scene_graph: SceneGraph               # Object nodes + spatial relations

    # Per-object data
    objects: List[ObjectData]             # Individual furniture items

    # 3D ground truth
    room_mesh: trimesh.Trimesh            # Full room mesh (walls + floor + ceiling)
    object_meshes: List[trimesh.Trimesh]  # Per-object meshes
    gaussian_cloud: GaussianCloud         # 3D Gaussian representation

    # Materials
    materials: List[PBRMaterial]          # Per-object PBR materials
    wall_material: PBRMaterial
    floor_material: PBRMaterial

    # Camera
    camera_pose: CameraPose               # Intrinsics + extrinsics
    fov: float

    # Metadata
    source: str                           # "3dfront", "structured3d", "scannet"
    caption: str                          # Natural language description
```

### Preprocessing Pipeline

```
Raw Dataset → Filter → Render Views → Compute Depth →
Segment Objects → Extract Layout →
Generate Multi-View → Create SLAT →
Validate → Package → Upload to HF
```

### Filtering Criteria

1. **Quality filter**: Minimum resolution 512×512
2. **Content filter**: Must contain at least 2 furniture objects
3. **Occlusion filter**: Main objects must be >30% visible
4. **Room type filter**: Exclude bathrooms, garages, and outdoor scenes
5. **Lighting filter**: Exclude extremely dark or overexposed scenes
6. **Duplicate filter**: Perceptual hash deduplication

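The six criteria above can be sketched as a predicate over per-image metadata. The `ImageMeta` record and its field names are hypothetical, the luminance bounds are placeholder assumptions (the list gives no thresholds for "extremely dark or overexposed"), and deduplication here matches precomputed hashes exactly, which is a simplification of perceptual-hash matching by Hamming distance.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

EXCLUDED_ROOM_TYPES = {"bathroom", "garage", "outdoor"}


@dataclass
class ImageMeta:
    """Hypothetical per-image metadata record; field names are illustrative."""
    width: int
    height: int
    room_type: str
    furniture_count: int
    min_main_object_visibility: float  # visible fraction in [0, 1]
    mean_luminance: float              # normalized to [0, 1]
    phash: str                         # precomputed perceptual hash


def passes_filters(m: ImageMeta, dark: float = 0.05, bright: float = 0.95) -> bool:
    """Apply criteria 1-5; `dark` and `bright` are assumed placeholder bounds."""
    return (
        min(m.width, m.height) >= 512              # 1. quality
        and m.furniture_count >= 2                 # 2. content
        and m.min_main_object_visibility > 0.30    # 3. occlusion
        and m.room_type not in EXCLUDED_ROOM_TYPES  # 4. room type
        and dark < m.mean_luminance < bright       # 5. lighting
    )


def filter_and_dedup(items: Iterable[ImageMeta]) -> List[ImageMeta]:
    """Keep images that pass criteria 1-5 and drop repeats of a seen hash (6)."""
    seen: Set[str] = set()
    kept: List[ImageMeta] = []
    for m in items:
        if passes_filters(m) and m.phash not in seen:
            seen.add(m.phash)
            kept.append(m)
    return kept
```

Running the cheap metadata checks before the hash lookup means rejected images never enter the `seen` set, so a low-quality copy cannot block a later good copy of the same scene.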
### Augmentation Pipeline

1. **Color jitter**: brightness ±0.2, contrast ±0.2, saturation ±0.2, hue ±0.1
2. **Random crop**: 0.8–1.0 scale, maintain aspect ratio
3. **Horizontal flip**: 50% probability
4. **Perspective warp**: Simulate different camera angles (±15° pitch, ±20° yaw)
5. **Synthetic occlusion**: Add random rectangles simulating foreground objects
6. **Depth noise**: Add Gaussian noise to depth map (σ=0.05m) for robustness
7. **Lighting variation**: Re-render with different HDRI environments

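A minimal sketch of drawing one set of augmentation parameters per sample, using the ranges listed above. It assumes the ±0.2 jitter values are multiplicative factors around 1.0 (the convention torchvision's `ColorJitter` uses) and that the crop scale may be drawn uniformly; the list does not say whether the scale refers to area or side length. `AugmentParams`, `sample_params` and `add_depth_noise` are illustrative names, and clamping noisy depth at zero is an added assumption.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class AugmentParams:
    """One random draw of the augmentation settings (illustrative container)."""
    brightness: float     # multiplicative factor in [0.8, 1.2]
    contrast: float       # multiplicative factor in [0.8, 1.2]
    saturation: float     # multiplicative factor in [0.8, 1.2]
    hue_shift: float      # additive shift in [-0.1, 0.1]
    crop_scale: float     # in [0.8, 1.0], aspect ratio preserved
    flip: bool            # horizontal flip, 50% probability
    pitch_deg: float      # perspective warp, in [-15, 15]
    yaw_deg: float        # perspective warp, in [-20, 20]
    depth_sigma_m: float  # std of Gaussian depth noise, in meters


def sample_params(rng: random.Random) -> AugmentParams:
    """Draw one parameter set from the ranges in the augmentation list."""
    return AugmentParams(
        brightness=rng.uniform(0.8, 1.2),
        contrast=rng.uniform(0.8, 1.2),
        saturation=rng.uniform(0.8, 1.2),
        hue_shift=rng.uniform(-0.1, 0.1),
        crop_scale=rng.uniform(0.8, 1.0),
        flip=rng.random() < 0.5,
        pitch_deg=rng.uniform(-15.0, 15.0),
        yaw_deg=rng.uniform(-20.0, 20.0),
        depth_sigma_m=0.05,
    )


def add_depth_noise(depths: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to metric depths, clamped at 0 (assumption)."""
    return [max(0.0, d + rng.gauss(0.0, sigma)) for d in depths]
```

Drawing the parameters once per sample makes it straightforward to apply the same crop, flip and warp to the image, depth map and normal map so they stay aligned. A horizontal flip would typically also negate the x component of camera-space normals, depending on the normal convention in use.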
### Captioning Strategy

**Automatic captions** come from Cap3D-style generation and cover five aspects:
- Room type: "a modern living room with a gray sofa and wooden coffee table"
- Style: "scandinavian minimalist interior with natural light"
- Objects: "contains: sofa, coffee table, floor lamp, bookshelf"
- Materials: "wooden floor, white walls, leather sofa"
- Spatial: "sofa against back wall, coffee table centered, lamp in corner"

**Manual review**: A 10% random sample is reviewed by interior designers for quality.

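To illustrate how the five caption aspects could be combined into a single string, here is a plain template over structured annotations. It is a stand-in for illustration only, not the Cap3D pipeline itself, and `build_caption` with its argument names is hypothetical.

```python
from typing import Sequence


def _article(word: str) -> str:
    """Return 'an' before a vowel-initial word, otherwise 'a'."""
    return "an" if word[:1].lower() in ("a", "e", "i", "o", "u") else "a"


def build_caption(
    room_type: str,
    style: str,
    objects: Sequence[str],
    materials: Sequence[str],
    relations: Sequence[str],
) -> str:
    """Join room type, style, objects, materials and spatial relations."""
    room = room_type.replace("_", " ")  # "living_room" -> "living room"
    parts = [f"{_article(style)} {style} {room}"]
    if objects:
        parts.append("contains: " + ", ".join(objects))
    if materials:
        parts.append("materials: " + ", ".join(materials))
    if relations:
        parts.append("spatial: " + ", ".join(relations))
    return "; ".join(parts)


if __name__ == "__main__":
    print(build_caption(
        "living_room",
        "modern",
        ["sofa", "coffee table"],
        ["wooden floor"],
        ["sofa against back wall"],
    ))
    # prints: a modern living room; contains: sofa, coffee table;
    #         materials: wooden floor; spatial: sofa against back wall
    #         (on one line)
```

Keeping the aspects as separate fields until the last step also makes the manual review easier, since a reviewer can flag the wrong aspect rather than the whole caption.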
### Synthetic Data Generation

Using ProcTHOR with the AI2-THOR simulator:
1. Generate 100K additional procedural rooms
2. Randomize: furniture placement, materials, lighting, camera position
3. Render 20 views per room
4. Add to the training mix with 15% weight. The source weights in the Dataset Composition table already sum to 100%, so if all weights are sampling fractions, the source weights scale down to share the remaining 85%.

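The arithmetic behind these steps can be checked with a short sketch. It assumes every weight is a sampling fraction and that the existing source weights shrink proportionally to make room for the synthetic share; the document does not state the rescaling rule, and the key names are illustrative.

```python
from typing import Dict

# Percent weights from the Dataset Composition table (keys are illustrative).
SOURCE_WEIGHTS = {
    "3dfront": 40,
    "structured3d": 25,
    "interiornet": 20,
    "scannetpp": 10,
    "hm3d": 5,
}
SYNTHETIC_WEIGHT = 15   # percent, from step 4
SYNTHETIC_ROOMS = 100_000
VIEWS_PER_ROOM = 20


def rescale_with_synthetic(source_weights: Dict[str, float], synthetic_pct: float) -> Dict[str, float]:
    """Shrink source weights proportionally so the full mix sums to 100%."""
    remaining = 100 - synthetic_pct
    total = sum(source_weights.values())
    mix = {name: w * remaining / total for name, w in source_weights.items()}
    mix["procthor_synthetic"] = float(synthetic_pct)
    return mix


if __name__ == "__main__":
    for name, pct in rescale_with_synthetic(SOURCE_WEIGHTS, SYNTHETIC_WEIGHT).items():
        print(f"{name}: {pct:.2f}%")
    print("synthetic images:", SYNTHETIC_ROOMS * VIEWS_PER_ROOM)
```

With these numbers, 3D-FRONT drops from 40% to 34%, and 100K rooms at 20 views each yield 2,000,000 synthetic images, about the same as the ~2M curated training images.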
### Data Splits

| Split | Rooms | Images | Purpose |
|-------|-------|--------|---------|
| Train | 75,000 | 1,800,000 | Model training |
| Val | 5,000 | 120,000 | Hyperparameter tuning |
| Test | 5,000 | 120,000 | Final evaluation |
| Benchmark | 500 | 12,000 | Leaderboard / comparison |
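
The table does not say how rooms are assigned to splits. One common approach, sketched here as an assumption, is to hash the room ID so that all views of a room land in the same split and the assignment is reproducible. `assign_split` and the room ID format are hypothetical, and the Benchmark split is left out because the table does not state whether it overlaps with Test.

```python
import hashlib

# Room counts from the Data Splits table, used as proportions.
SPLIT_ROOMS = {"train": 75_000, "val": 5_000, "test": 5_000}


def assign_split(room_id: str) -> str:
    """Map a room ID to a split deterministically, in proportion to SPLIT_ROOMS."""
    total = sum(SPLIT_ROOMS.values())
    digest = hashlib.sha256(room_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    upper = 0
    for split, rooms in SPLIT_ROOMS.items():
        upper += rooms
        if bucket < upper:
            return split
    raise AssertionError("unreachable: bucket is always below total")


if __name__ == "__main__":
    for room_id in ("room_000001", "room_000002", "room_000003"):
        print(room_id, "->", assign_split(room_id))
```

Hashing reproduces the 75,000 / 5,000 / 5,000 proportions only approximately, so exact counts would need a final trim. Every row of the table works out to 24 images per room (for example 1,800,000 / 75,000), so splitting at room level keeps those views together.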