Fix param counts: DA3-NESTED-GIANT-LARGE is 1.69B, total 2.5B
README.md CHANGED
@@ -6,8 +6,6 @@ tags:
 - segmentation
 - language-grounding
 - pose-free
-datasets:
-- scannetpp
 pipeline_tag: image-segmentation
 ---
 
@@ -25,25 +23,22 @@ TrianguLang is a feed-forward, pose-free method for language-guided 3D localizat
 
 ## Checkpoints
 
-| Checkpoint | Description |
-|---|---|
-| `mo_v11/best.pt` | Multi-object (text + spatial), 230 scenes, 8 views, 100 epochs |
-| `fullscale_no_qp/best.pt` | Single-object (text-only), 230 scenes, 100 epochs |
-
-## Usage
+| Checkpoint | Description |
+|---|---|
+| `mo_v11/best.pt` | Multi-object (text + spatial), 230 scenes, 8 views, 100 epochs |
+| `fullscale_no_qp/best.pt` | Single-object (text-only), 230 scenes, 100 epochs |
 
-
-from triangulang.training.train import TrianguLangModel
+## Architecture
 
-
-
-model.load_state_dict(checkpoint["model_state_dict"])
-```
+- **Frozen:** SAM3 (841M) + DA3-NESTED-GIANT-LARGE (1.69B) = ~2.5B params
+- **Trainable:** GASA Decoder (~13.5M params)
 
-##
+## Results (ScanNet++)
 
-
--
+| Setting | mIoU | mAcc |
+|---|---|---|
+| Text-only (single-object) | **62.4%** | **77.4%** |
+| Text-only + CRF | **65.2%** | - |
 
 ## Citation
 
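The parameter totals in the added Architecture lines can be sanity-checked with a few lines of arithmetic. The sketch below only re-adds the figures quoted in the README (841M, 1.69B, ~13.5M). The variable names are illustrative and not part of the TrianguLang codebase, and the decoder count is approximate per the README.

```python
# Parameter counts as quoted in the README's Architecture section, in millions.
# Variable names are hypothetical; only the numbers come from the README.
SAM3_M = 841.0           # frozen
DA3_M = 1690.0           # frozen, DA3-NESTED-GIANT-LARGE (1.69B)
GASA_DECODER_M = 13.5    # trainable, approximate

# Frozen backbone total, which the README rounds to ~2.5B.
frozen_m = SAM3_M + DA3_M

# Share of all parameters that are actually trained.
total_m = frozen_m + GASA_DECODER_M
trainable_share = GASA_DECODER_M / total_m

print(f"frozen: {frozen_m / 1000:.2f}B")
print(f"trainable share: {trainable_share:.2%}")
```

Under these figures the frozen backbones sum to about 2.53B, consistent with the "~2.5B" stated in the commit title, and the trainable decoder is roughly half a percent of the total.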