businesslion committed f0a0140 · verified · 1 Parent(s): 9e2fe95

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,47 @@
+ # Sonata Scene Completion: Diffusion-Based 3D Scene Completion from LiDAR and RGB
+
+ Checkpoints for cross-modal diffusion-based 3D scene completion on SemanticKITTI.
+
+ ## Models
+
+ ### v2 GT (ICP-refined ground truth, 50x denser)
+
+ | Model | Path | CD | Input | Description |
+ |-------|------|-----|-------|-------------|
+ | **Teacher v2GT** | `teacher_v2gt/best_model.pth` | **0.039 +/- 0.009** | LiDAR | Direct diffusion, frozen Sonata/PTv3 encoder (108M) + DenoisingNetwork (8.9M) |
+ | **Student v2GT** | `student_v2gt/best_model.pth` | **0.242 +/- 0.263** | RGB (DA2 pseudo-depth) | Task-loss-only distillation from teacher |
+
+ ### v1 GT (original accumulated ground truth)
+
+ | Model | Path | CD | Input | Description |
+ |-------|------|-----|-------|-------------|
+ | Teacher v1GT | `teacher_v1gt/best_model.pth` | 0.608 +/- 0.141 | LiDAR | Same architecture, trained on v1 GT |
+ | Student v1GT | `student_v1gt/best_model.pth` | 0.721 +/- 0.167 | RGB (DA2 pseudo-depth) | Task-loss-only distillation from v1 teacher |
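
The CD column in the tables above presumably reports Chamfer Distance between the completed and ground-truth point clouds. The README does not expand the abbreviation or say which variant is used (squared or unsquared, summed or averaged), so the following is only a minimal sketch of one common symmetric form. The function name `chamfer_distance` and the toy points are illustrative, not taken from the repository.

```python
import math


def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between two point sets.

    Assumed variant: the mean nearest-neighbour Euclidean distance is
    computed in each direction and the two means are summed. The
    repository may use a different convention.
    """
    def one_way(src, dst):
        # For every source point, distance to its closest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(pred, target) + one_way(target, pred)


# Toy point clouds, for illustration only.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(pred, target))
```

This brute-force version is quadratic in the number of points; real point clouds of SemanticKITTI size would need a spatial index or a GPU nearest-neighbour routine.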
+
+ ## Architecture
+
+ - **Encoder**: Frozen Sonata/PTv3 (108M params, pretrained)
+ - **Denoiser**: DenoisingNetwork (8.9M trainable params)
+ - **Diffusion**: 1000 timesteps, cosine schedule, epsilon-prediction
+ - **Inference**: Single-step x0 prediction at t=200
+ - **Distillation**: Task-loss-only (alignment losses are harmful due to cross-modal gradient conflicts)
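
The diffusion and inference settings listed above can be illustrated with a small sketch of how an epsilon-prediction model yields a single-step x0 estimate. It assumes the widely used cosine schedule with offset `s = 0.008`; the README does not state the exact schedule parameters. The names `alpha_bar`, `add_noise`, and `predict_x0` are hypothetical, and a perfect noise estimate stands in for the DenoisingNetwork.

```python
import math
import random

T = 1000   # number of diffusion timesteps (from the README)
S = 0.008  # cosine-schedule offset: an assumption, not stated in the README


def alpha_bar(t, num_steps=T, s=S):
    """Cumulative signal fraction at step t under the assumed cosine schedule."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def add_noise(x0, eps, t):
    """Forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, eps)]


def predict_x0(x_t, eps_hat, t):
    """Invert the forward process in one step using the predicted noise."""
    a = alpha_bar(t)
    return [(x - math.sqrt(1 - a) * e) / math.sqrt(a) for x, e in zip(x_t, eps_hat)]


random.seed(0)
x0 = [0.5, -1.2, 3.0]                   # toy coordinates, illustration only
eps = [random.gauss(0, 1) for _ in x0]
x_t = add_noise(x0, eps, t=200)
x0_hat = predict_x0(x_t, eps, t=200)    # true noise stands in for the denoiser output
```

With a real denoiser, `eps_hat` would be the network output conditioned on the encoder features, and the quality of `x0_hat` would depend on how accurate that noise estimate is at t=200.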
+
+ ## Key Finding
+
+ Standard multi-loss distillation (task + output matching + feature alignment + structural) fails for cross-modal generative distillation. Feature alignment gradients conflict with task loss gradients (cosine similarity = -0.023, 58% of batches negative). Task-loss-only achieves the best student performance.
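
The gradient-conflict statistics quoted above (mean cosine similarity and share of negative batches) could be computed along the following lines. This is a sketch under assumptions: the gradients of the two losses are taken to be flattened into plain vectors per batch, the helper names `cosine_similarity` and `conflict_stats` are hypothetical, and the vectors are toy values rather than measurements from the paper.

```python
import math


def cosine_similarity(g_a, g_b):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm = math.sqrt(sum(a * a for a in g_a)) * math.sqrt(sum(b * b for b in g_b))
    return dot / norm if norm > 0 else 0.0


def conflict_stats(gradient_pairs):
    """Return (mean cosine similarity, fraction of batches with negative similarity)."""
    sims = [cosine_similarity(g_task, g_align) for g_task, g_align in gradient_pairs]
    mean_sim = sum(sims) / len(sims)
    negative_fraction = sum(s < 0 for s in sims) / len(sims)
    return mean_sim, negative_fraction


# Toy (task gradient, alignment gradient) pairs, one per batch.
toy_pairs = [
    ([1.0, 0.0], [0.0, 1.0]),    # orthogonal
    ([1.0, 0.0], [-1.0, 0.0]),   # directly opposed
    ([1.0, 1.0], [1.0, 1.0]),    # aligned
    ([1.0, 0.0], [-1.0, 1.0]),   # partially opposed
]
mean_sim, negative_fraction = conflict_stats(toy_pairs)
print(mean_sim, negative_fraction)
```

A negative similarity means that a step which reduces one loss tends to increase the other, which is the kind of conflict the finding describes.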
+
+ ## Dataset
+
+ SemanticKITTI outdoor driving scenes. v2 GT uses anchor-based ICP refinement (50x denser, 5-10x tighter bounding boxes).
+
+ ## Training Details
+
+ | Model | Epochs | LR | Batch Size | GPU |
+ |-------|--------|-----|-----------|-----|
+ | Teacher v2GT | 30 | 1e-4 | 2 | RTX 4090 24GB |
+ | Student v2GT | 15 | 1e-4 | 2 | RTX 4090 24GB |
+ | VAE v3 | 100 | 3e-4 | 4 | RTX 4090 24GB |
+
+ ## Citation
+
+ Paper submitted to IEEE SMC 2026.
student_v1gt/best_model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10c3a89feace41810f91b984cf25be65794585ab0739c3b78cdda959e355074f
+ size 541603043
student_v2gt/best_model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a9f4eccb57fe4115f34d68978a924de7afae0add52c4d8fb4b24dc9540d1a48
+ size 541600675
teacher_v1gt/best_model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:984bb76c64b9742821ab10bdf6bb5e31ce6989dcd12e514133d37219754ad7ec
+ size 541602211
teacher_v2gt/best_model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb599c5222cb80cd964a555677a48ee52359e7706a288f3f6865ebc1221f6c48
+ size 541602211
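
The four `best_model.pth` entries in this commit are Git LFS pointer files: each records the SHA-256 digest and byte size of the real checkpoint rather than the weights themselves. A downloaded checkpoint can be checked against its pointer roughly as follows. This is a minimal sketch; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the `teacher_v2gt/best_model.pth` entry above.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:eb599c5222cb80cd964a555677a48ee52359e7706a288f3f6865ebc1221f6c48\n"
    "size 541602211\n"
)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("teacher_v2gt/best_model.pth", pointer)` on a local copy would then report whether the download is intact; the local path is an assumption about where the file was saved.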