Upload folder using huggingface_hub


Files changed (5)

README.md +47 -0
student_v1gt/best_model.pth +3 -0
student_v2gt/best_model.pth +3 -0
teacher_v1gt/best_model.pth +3 -0
teacher_v2gt/best_model.pth +3 -0

README.md ADDED

	@@ -0,0 +1,47 @@

+# Sonata Scene Completion: Diffusion-Based 3D Scene Completion from LiDAR and RGB
+Checkpoints for cross-modal diffusion-based 3D scene completion on SemanticKITTI.
+## Models
+### v2 GT (ICP-refined ground truth, 50x denser than v1 GT)
+| Model | Path | CD (lower is better) | Input | Description |
+|-------|------|-----|-------|-------------|
+| **Teacher v2GT** | `teacher_v2gt/best_model.pth` | **0.039 +/- 0.009** | LiDAR | Direct diffusion, frozen Sonata/PTv3 encoder (108M params) + DenoisingNetwork (8.9M params) |
+| **Student v2GT** | `student_v2gt/best_model.pth` | **0.242 +/- 0.263** | RGB (DA2 pseudo-depth) | Task-loss-only distillation from the v2GT teacher |
+### v1 GT (original accumulated ground truth)
+| Model | Path | CD (lower is better) | Input | Description |
+|-------|------|-----|-------|-------------|
+| Teacher v1GT | `teacher_v1gt/best_model.pth` | 0.608 +/- 0.141 | LiDAR | Same architecture, trained on v1 GT |
+| Student v1GT | `student_v1gt/best_model.pth` | 0.721 +/- 0.167 | RGB (DA2 pseudo-depth) | Task-loss-only distillation from v1 teacher |
+## Architecture
+- **Encoder**: Frozen Sonata/PTv3 (108M params, pretrained)
+- **Denoiser**: DenoisingNetwork (8.9M trainable params)
+- **Diffusion**: 1000 timesteps, cosine schedule, epsilon-prediction
+- **Inference**: Single-step x0 prediction at t=200
+- **Distillation**: Task-loss-only (alignment losses are harmful due to cross-modal gradient conflicts)
+## Key Finding
+Standard multi-loss distillation (task + output matching + feature alignment + structural losses) fails for cross-modal generative distillation: the feature-alignment gradients conflict with the task-loss gradients (cosine similarity -0.023, negative in 58% of batches). Training the student with the task loss alone gives the best student performance.
+## Dataset
+SemanticKITTI outdoor driving scenes. v2 GT uses anchor-based ICP refinement, which yields ground truth that is 50x denser and has 5-10x tighter bounding boxes than v1 GT.
+## Training Details
+| Model | Epochs | LR | Batch Size | GPU |
+|-------|--------|-----|-----------|-----|
+| Teacher v2GT | 30 | 1e-4 | 2 | RTX 4090 24GB |
+| Student v2GT | 15 | 1e-4 | 2 | RTX 4090 24GB |
+| VAE v3 | 100 | 3e-4 | 4 | RTX 4090 24GB |
+## Citation
+Paper submitted to IEEE SMC 2026.
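The README's Architecture section specifies a 1000-step cosine schedule with epsilon-prediction and single-step x0 prediction at t=200. The NumPy sketch below shows how an epsilon output converts to an x0 estimate in one step. It assumes the common improved-DDPM cosine schedule (offset `s = 0.008`, betas clipped at `0.999`), which the README does not state. An oracle noise vector stands in for DenoisingNetwork, and the `(N, 3)` point array is only a placeholder. All function names are hypothetical.

```python
import numpy as np

def cosine_alpha_bar(num_steps=1000, s=0.008, max_beta=0.999):
    """Cumulative signal fraction alpha_bar[t], t = 0..num_steps-1 (assumed cosine schedule)."""
    t = np.arange(num_steps + 1, dtype=np.float64) / num_steps
    f = np.cos((t + s) / (1.0 + s) * np.pi / 2) ** 2
    # Per-step betas from consecutive ratios, clipped to avoid a singular final step
    betas = np.clip(1.0 - f[1:] / f[:-1], 0.0, max_beta)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, eps, alpha_bar):
    """Forward noising: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

def predict_x0(x_t, t, eps_hat, alpha_bar):
    """Single-step x0 estimate from a predicted noise eps_hat (epsilon-prediction)."""
    ab = alpha_bar[t]
    return (x_t - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)

alpha_bar = cosine_alpha_bar()
rng = np.random.default_rng(0)
x0 = rng.normal(size=(2048, 3))                # placeholder point cloud, N x 3
eps = rng.normal(size=x0.shape)
x_t = q_sample(x0, 200, eps, alpha_bar)        # noised to t = 200
x0_hat = predict_x0(x_t, 200, eps, alpha_bar)  # oracle eps instead of DenoisingNetwork
```

With a perfect noise prediction, the inversion recovers `x0` exactly. In practice, the quality of `x0_hat` depends entirely on how well the denoiser estimates `eps`. Under this schedule, `alpha_bar[200]` is roughly 0.9, so t=200 keeps most of the signal.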

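The README's Key Finding reports a cosine similarity of -0.023 between the feature-alignment and task-loss gradients, negative in 58% of batches. The sketch below shows one way such statistics could be computed. It assumes the per-batch gradients have already been flattened into vectors, for example in PyTorch by backpropagating each loss separately and concatenating the flattened gradients of the trainable parameters. It also assumes the reported -0.023 is a mean over batches, which the README does not say. The function names are hypothetical, and the authors may have measured this differently.

```python
import numpy as np

def grad_cosine(g_task, g_align, eps=1e-12):
    """Cosine similarity between two flattened gradient vectors."""
    g_task = np.ravel(g_task)
    g_align = np.ravel(g_align)
    denom = np.linalg.norm(g_task) * np.linalg.norm(g_align) + eps
    return float(g_task @ g_align / denom)

def conflict_stats(task_grads, align_grads):
    """Return (mean cosine similarity, fraction of batches with negative cosine)."""
    sims = np.array([grad_cosine(gt, ga) for gt, ga in zip(task_grads, align_grads)])
    return float(sims.mean()), float((sims < 0).mean())

# Toy example: one opposing, one aligned, one orthogonal batch
task = [np.array([1.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
align = [np.array([-1.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 0.0])]
mean_cos, frac_negative = conflict_stats(task, align)
```

A negative cosine means a step along the alignment gradient increases the task loss to first order. A mean near zero with a majority of negative batches indicates that the alignment signal mostly pulls sideways or against the task objective.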
student_v1gt/best_model.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:10c3a89feace41810f91b984cf25be65794585ab0739c3b78cdda959e355074f
+size 541603043

student_v2gt/best_model.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a9f4eccb57fe4115f34d68978a924de7afae0add52c4d8fb4b24dc9540d1a48
+size 541600675

teacher_v1gt/best_model.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:984bb76c64b9742821ab10bdf6bb5e31ce6989dcd12e514133d37219754ad7ec
+size 541602211

teacher_v2gt/best_model.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eb599c5222cb80cd964a555677a48ee52359e7706a288f3f6865ebc1221f6c48
+size 541602211
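Each `best_model.pth` entry above is a Git LFS pointer, not the checkpoint itself. In the pointer, `oid` is the SHA-256 of the real file and `size` is its length in bytes. The stdlib-only sketch below checks a downloaded checkpoint against its pointer before loading it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of `huggingface_hub`.

```python
import hashlib
import os

TEACHER_V2GT_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:eb599c5222cb80cd964a555677a48ee52359e7706a288f3f6865ebc1221f6c48
size 541602211"""

def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer into version, oid and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}

def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's byte size and SHA-256 digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so a ~540 MB checkpoint is never read into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]
```

For example, `matches_pointer("teacher_v2gt/best_model.pth", parse_lfs_pointer(TEACHER_V2GT_POINTER))` should return `True` for an intact download of that checkpoint.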