[THOR] Full HF export — pth + safetensors + ONNX + TRT FP16/FP32 + paper + report
- .gitattributes +1 -0
- README.md +62 -11
- TRAINING_REPORT.md +83 -0
- checkpoints/best.pth +3 -0
- onnx/thor_sta_v1.onnx +1 -1
- paper.pdf +3 -0
- pytorch/thor_sta_v1.pth +1 -1
- pytorch/thor_sta_v1.safetensors +1 -1
- tensorrt/thor_sta_v1_fp16.trt +2 -2
- tensorrt/thor_sta_v1_fp32.trt +2 -2
.gitattributes
CHANGED
|
@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
tensorrt/thor_sta_v1_fp16.trt filter=lfs diff=lfs merge=lfs -text
|
| 37 |
tensorrt/thor_sta_v1_fp32.trt filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
tensorrt/thor_sta_v1_fp16.trt filter=lfs diff=lfs merge=lfs -text
|
| 37 |
tensorrt/thor_sta_v1_fp32.trt filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
paper.pdf filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
|
@@ -1,17 +1,25 @@
|
|
| 1 |
---
|
| 2 |
language: en
|
| 3 |
-
license:
|
| 4 |
tags:
|
| 5 |
- visual-slam
|
| 6 |
- robotics
|
| 7 |
- pose-estimation
|
| 8 |
- pointmap
|
| 9 |
- computer-vision
|
| 10 |
library_name: pytorch
|
|
|
|
| 11 |
---
|
| 12 |
|
| 13 |
# THOR — ViSTA-SLAM STA Model
|
| 14 |
|
| 15 |
**Project THOR** is ANIMA Wave-6's Tier-1 Foundation SLAM module, implementing the
|
| 16 |
**Symmetric Two-view Association (STA)** frontend from the ViSTA-SLAM paper.
|
| 17 |
|
|
@@ -21,17 +29,19 @@ library_name: pytorch
|
|
| 21 |
- **Authors**: Ganlin Zhang, Shenhan Qian, Xi Wang, Daniel Cremers
|
| 22 |
- **arXiv**: [2509.01584](https://arxiv.org/abs/2509.01584)
|
| 23 |
- **Published**: 1 September 2025
|
|
|
|
| 24 |
|
| 25 |
## Model Summary
|
| 26 |
|
| 27 |
| Property | Value |
|
| 28 |
|---|---|
|
| 29 |
| Input | Two RGB frames — `(B, 3, 224, 224)` each |
|
| 30 |
-
| Output | Quaternion `(B,4)`, Translation `(B,3)`, Pointmap `(B,
|
| 31 |
-
| Parameters | ~
|
| 32 |
| Intrinsics | None required — intrinsic-free design |
|
| 33 |
-
|
|
| 34 |
-
| Best val loss | 0.
|
|
|
|
| 35 |
|
| 36 |
## Architecture
|
| 37 |
|
|
@@ -43,6 +53,16 @@ through shared weights, producing:
|
|
| 43 |
|
| 44 |
A Sim(3) pose graph backend handles global consistency and scale-drift correction.
|
| 45 |
|
| 46 |
## Usage
|
| 47 |
|
| 48 |
```python
|
|
@@ -66,7 +86,7 @@ with torch.no_grad():
|
|
| 66 |
|
| 67 |
print(output.quaternion.shape) # (1, 4)
|
| 68 |
print(output.translation.shape) # (1, 3)
|
| 69 |
-
print(output.pointmap.shape) # (1,
|
| 70 |
```
|
| 71 |
|
| 72 |
### ONNX inference
|
|
@@ -88,6 +108,26 @@ quaternion, translation, pointmap = sess.run(
|
|
| 88 |
)
|
| 89 |
```
|
| 90 |
|
| 91 |
## Downstream Contracts (ANIMA Wave-6)
|
| 92 |
|
| 93 |
| Module | Dependency | Topic |
|
|
@@ -100,15 +140,26 @@ quaternion, translation, pointmap = sess.run(
|
|
| 100 |
|
| 101 |
```
|
| 102 |
README.md # This file
|
| 103 |
pytorch/thor_sta_v1.pth # PyTorch state dict
|
| 104 |
-
pytorch/thor_sta_v1.safetensors # SafeTensors
|
| 105 |
onnx/thor_sta_v1.onnx # ONNX opset 17
|
| 106 |
-
tensorrt/thor_sta_v1_fp16.trt # TensorRT FP16
|
| 107 |
-
tensorrt/thor_sta_v1_fp32.trt # TensorRT FP32
|
|
|
|
| 108 |
configs/training.toml # Training configuration
|
| 109 |
-
logs/training_history.json # Epoch-by-epoch metrics
|
| 110 |
```
|
| 111 |
|
| 112 |
## Citation
|
| 113 |
|
| 114 |
```bibtex
|
|
@@ -122,4 +173,4 @@ logs/training_history.json # Epoch-by-epoch metrics
|
|
| 122 |
|
| 123 |
## License
|
| 124 |
|
| 125 |
-
|
|
|
|
| 1 |
---
|
| 2 |
language: en
|
| 3 |
+
license: apache-2.0
|
| 4 |
tags:
|
| 5 |
- visual-slam
|
| 6 |
- robotics
|
| 7 |
+
- anima
|
| 8 |
+
- thor
|
| 9 |
+
- robot-flow-labs
|
| 10 |
- pose-estimation
|
| 11 |
- pointmap
|
| 12 |
- computer-vision
|
| 13 |
+
- slam
|
| 14 |
+
- monocular-slam
|
| 15 |
library_name: pytorch
|
| 16 |
+
pipeline_tag: robotics
|
| 17 |
---
|
| 18 |
|
| 19 |
# THOR — ViSTA-SLAM STA Model
|
| 20 |
|
| 21 |
+
Part of the [ANIMA Perception Suite](https://robotflowlabs.com) by Robot Flow Labs.
|
| 22 |
+
|
| 23 |
**Project THOR** is ANIMA Wave-6's Tier-1 Foundation SLAM module, implementing the
|
| 24 |
**Symmetric Two-view Association (STA)** frontend from the ViSTA-SLAM paper.
|
| 25 |
|
|
|
|
| 29 |
- **Authors**: Ganlin Zhang, Shenhan Qian, Xi Wang, Daniel Cremers
|
| 30 |
- **arXiv**: [2509.01584](https://arxiv.org/abs/2509.01584)
|
| 31 |
- **Published**: 1 September 2025
|
| 32 |
+
- **PDF**: [paper.pdf](paper.pdf) (included in this repo)
|
| 33 |
|
| 34 |
## Model Summary
|
| 35 |
|
| 36 |
| Property | Value |
|
| 37 |
|---|---|
|
| 38 |
| Input | Two RGB frames — `(B, 3, 224, 224)` each |
|
| 39 |
+
| Output | Quaternion `(B,4)`, Translation `(B,3)`, Pointmap `(B,224,224,3)` |
|
| 40 |
+
| Parameters | ~12.4M (ResNet-18 backbone) |
|
| 41 |
| Intrinsics | None required — intrinsic-free design |
|
| 42 |
+
| Best epoch | 198 |
|
| 43 |
+
| Best val loss | 0.782216 |
|
| 44 |
+
| Training | 200 epochs, AdamW, lr=1.5e-5, bf16, NVIDIA L4 |
|
| 45 |
|
| 46 |
## Architecture
|
| 47 |
|
|
|
|
| 53 |
|
| 54 |
A Sim(3) pose graph backend handles global consistency and scale-drift correction.
|
| 55 |
|
| 56 |
+
## Exported Formats
|
| 57 |
+
|
| 58 |
+
| Format | File | Size | Use Case |
|
| 59 |
+
|--------|------|------|----------|
|
| 60 |
+
| PyTorch (.pth) | `pytorch/thor_sta_v1.pth` | 49.6 MB | Training, fine-tuning |
|
| 61 |
+
| SafeTensors | `pytorch/thor_sta_v1.safetensors` | 49.5 MB | Fast loading, safe |
|
| 62 |
+
| ONNX (opset 17) | `onnx/thor_sta_v1.onnx` | 6.7 MB | Cross-platform inference |
|
| 63 |
+
| TensorRT FP16 | `tensorrt/thor_sta_v1_fp16.trt` | 6.3 MB | Edge deployment (Jetson/L4) |
|
| 64 |
+
| TensorRT FP32 | `tensorrt/thor_sta_v1_fp32.trt` | 11.4 MB | Full precision inference |
|
| 65 |
+
|
| 66 |
## Usage
|
| 67 |
|
| 68 |
```python
|
|
|
|
| 86 |
|
| 87 |
print(output.quaternion.shape) # (1, 4)
|
| 88 |
print(output.translation.shape) # (1, 3)
|
| 89 |
+
print(output.pointmap.shape) # (1, 224, 224, 3)
|
| 90 |
```
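The quaternion and translation outputs are usually combined into a single homogeneous transform before being passed downstream. The following is a minimal sketch of that step, not the repository's own code. It assumes the quaternion is ordered `(w, x, y, z)`, which the README does not state, and the helper name `pose_to_matrix` is hypothetical.

```python
import math


def pose_to_matrix(quat, trans):
    """Build a 4x4 homogeneous transform from one quaternion and one translation.

    ASSUMPTION: quat is ordered (w, x, y, z); the README does not specify the order.
    """
    w, x, y, z = quat
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-norm quaternion")
    # Normalise so the rotation block is orthonormal even for raw network output.
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    tx, ty, tz = trans
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y), tx],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x), ty],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y), tz],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

For a batch, the same function would be applied per row of the `(B, 4)` and `(B, 3)` arrays. If the model turns out to emit `(x, y, z, w)` instead, only the unpacking line changes.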
|
| 91 |
|
| 92 |
### ONNX inference
|
|
|
|
| 108 |
)
|
| 109 |
```
|
| 110 |
|
| 111 |
+
### TensorRT inference
|
| 112 |
+
|
| 113 |
+
```python
|
| 114 |
+
import tensorrt as trt
|
| 115 |
+
import pycuda.driver as cuda
|
| 116 |
+
import pycuda.autoinit
|
| 117 |
+
import numpy as np
|
| 118 |
+
|
| 119 |
+
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
|
| 120 |
+
runtime = trt.Runtime(TRT_LOGGER)
|
| 121 |
+
|
| 122 |
+
with open("tensorrt/thor_sta_v1_fp16.trt", "rb") as f:
|
| 123 |
+
engine = runtime.deserialize_cuda_engine(f.read())
|
| 124 |
+
|
| 125 |
+
context = engine.create_execution_context()
|
| 126 |
+
context.set_input_shape("img_a", (1, 3, 224, 224))
|
| 127 |
+
context.set_input_shape("img_b", (1, 3, 224, 224))
|
| 128 |
+
# ... allocate buffers and run inference
|
| 129 |
+
```
|
| 130 |
+
|
| 131 |
## Downstream Contracts (ANIMA Wave-6)
|
| 132 |
|
| 133 |
| Module | Dependency | Topic |
|
|
|
|
| 140 |
|
| 141 |
```
|
| 142 |
README.md # This file
|
| 143 |
+
paper.pdf # ViSTA-SLAM paper (arXiv:2509.01584)
|
| 144 |
+
TRAINING_REPORT.md # Full training report with metrics
|
| 145 |
+
anima_module.yaml # ANIMA module manifest
|
| 146 |
pytorch/thor_sta_v1.pth # PyTorch state dict
|
| 147 |
+
pytorch/thor_sta_v1.safetensors # SafeTensors
|
| 148 |
onnx/thor_sta_v1.onnx # ONNX opset 17
|
| 149 |
+
tensorrt/thor_sta_v1_fp16.trt # TensorRT FP16
|
| 150 |
+
tensorrt/thor_sta_v1_fp32.trt # TensorRT FP32
|
| 151 |
+
checkpoints/best.pth # Best checkpoint (resume training)
|
| 152 |
configs/training.toml # Training configuration
|
| 153 |
+
logs/training_history.json # Epoch-by-epoch metrics (200 epochs)
|
| 154 |
```
|
| 155 |
|
| 156 |
+
## Training
|
| 157 |
+
|
| 158 |
+
- **Hardware**: NVIDIA L4 (23GB VRAM)
|
| 159 |
+
- **Framework**: PyTorch 2.10 + CUDA 12.8
|
| 160 |
+
- **Config**: See `configs/training.toml`
|
| 161 |
+
- **Report**: See `TRAINING_REPORT.md`
|
| 162 |
+
|
| 163 |
## Citation
|
| 164 |
|
| 165 |
```bibtex
|
|
|
|
| 173 |
|
| 174 |
## License
|
| 175 |
|
| 176 |
+
Apache 2.0 — Robot Flow Labs / AIFLOW LABS LIMITED
|
TRAINING_REPORT.md
ADDED
|
@@ -0,0 +1,83 @@
|
|
| 1 |
+
# TRAINING_REPORT.md — THOR ViSTA-SLAM STA Model
|
| 2 |
+
|
| 3 |
+
## Training Configuration
|
| 4 |
+
|
| 5 |
+
| Parameter | Value |
|
| 6 |
+
|-----------|-------|
|
| 7 |
+
| Model | STA (Symmetric Two-view Association) |
|
| 8 |
+
| Architecture | ResNet-18 encoder + PoseHead + PointmapHead |
|
| 9 |
+
| Parameters | ~12.4M |
|
| 10 |
+
| Optimizer | AdamW |
|
| 11 |
+
| Learning Rate | 1.5e-5 (cosine annealing + 5% warmup) |
|
| 12 |
+
| Weight Decay | 0.05 |
|
| 13 |
+
| Batch Size | 16 |
|
| 14 |
+
| Epochs | 200 |
|
| 15 |
+
| Mixed Precision | bf16 |
|
| 16 |
+
| Gradient Clipping | max_norm=1.0 |
|
| 17 |
+
| Seed | 42 |
|
| 18 |
+
| GPU | NVIDIA L4 (23GB VRAM) |
|
| 19 |
+
| Total Training Time | 3.6 hours |
|
| 20 |
+
|
| 21 |
+
## Loss Components
|
| 22 |
+
|
| 23 |
+
| Component | Description |
|
| 24 |
+
|-----------|-------------|
|
| 25 |
+
| ConfLoss (pointmap) | Point regression with L2.1 norm, alpha=0.4 |
|
| 26 |
+
| RelPoseLoss (pose) | Relative SE(3) pose estimation loss |
|
| 27 |
+
| ReprojLoss (reproj) | Reprojection error loss |
|
| 28 |
+
|
| 29 |
+
## Results
|
| 30 |
+
|
| 31 |
+
### Best Checkpoint
|
| 32 |
+
| Metric | Value |
|
| 33 |
+
|--------|-------|
|
| 34 |
+
| Best Epoch | 198 |
|
| 35 |
+
| Best Val Loss | 0.782216 |
|
| 36 |
+
| Val Pointmap Loss | 0.426235 |
|
| 37 |
+
| Val Pose Loss | 0.010962 |
|
| 38 |
+
| Val Reproj Loss | 0.345019 |
|
| 39 |
+
|
| 40 |
+
### Training Progression
|
| 41 |
+
| Stage | Train Loss | Val Loss | LR |
|
| 42 |
+
|-------|-----------|----------|-----|
|
| 43 |
+
| Epoch 1 | 2.8586 | 2.6405 | 1.50e-06 |
|
| 44 |
+
| Epoch 50 | 1.1821 | 1.1658 | 1.36e-05 |
|
| 45 |
+
| Epoch 100 | 0.9068 | 0.9020 | 8.69e-06 |
|
| 46 |
+
| Epoch 150 | 0.8142 | 0.8078 | 3.34e-06 |
|
| 47 |
+
| Epoch 200 | 0.7888 | 0.7848 | 1.00e-06 |
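The LR column in the table above is consistent with a linear warmup over the first 5% of epochs followed by cosine decay. The sketch below illustrates such a schedule under stated assumptions; it is not taken from `configs/training.toml`. The floor `MIN_LR = 1e-6` and the per-epoch (rather than per-step) update are inferred from the table, and the function name `lr_at_epoch` is hypothetical.

```python
import math

BASE_LR = 1.5e-5     # peak learning rate, from the report
TOTAL_EPOCHS = 200   # from the report
WARMUP_EPOCHS = 10   # 5% of 200 epochs, from the report
MIN_LR = 1.0e-6      # ASSUMPTION: floor inferred from the epoch-200 row


def lr_at_epoch(epoch):
    """Learning rate for a 1-based epoch: linear warmup, then cosine decay."""
    step = epoch - 1  # schedulers are typically 0-based internally
    if step < WARMUP_EPOCHS:
        # Ramp linearly from BASE_LR / WARMUP_EPOCHS up to BASE_LR.
        return BASE_LR * (step + 1) / WARMUP_EPOCHS
    progress = (step - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))
```

Evaluating `lr_at_epoch` at epochs 1, 50, 100, 150, and 200 gives values close to the five rows of the table, which suggests (but does not prove) that the actual run used a schedule of this shape.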
|
| 48 |
+
|
| 49 |
+
### Loss Breakdown (Epoch 200)
|
| 50 |
+
| Component | Train | Val |
|
| 51 |
+
|-----------|-------|-----|
|
| 52 |
+
| Pointmap | 0.4273 | 0.4271 |
|
| 53 |
+
| Pose | 0.0107 | 0.0109 |
|
| 54 |
+
| Reproj | 0.3507 | 0.3468 |
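The reported totals can be cross-checked against the per-component values. The snippet below is a small sanity check that assumes the total loss is the unweighted sum of the three components; the report does not state the weighting explicitly. The helper names are hypothetical.

```python
# Values copied from the report tables.
best_checkpoint_val = {"pointmap": 0.426235, "pose": 0.010962, "reproj": 0.345019}
epoch_200_val = {"pointmap": 0.4271, "pose": 0.0109, "reproj": 0.3468}


def total_loss(components):
    """ASSUMPTION: total loss = unweighted sum of the component losses."""
    return sum(components.values())


def share(components, name):
    """Fraction of the total loss contributed by one component."""
    return components[name] / total_loss(components)
```

Under this assumption the best-checkpoint components sum to the reported best val loss, the epoch-200 components sum to the reported epoch-200 val loss, and the pointmap term accounts for a little over half of the total.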
|
| 55 |
+
|
| 56 |
+
## Exported Formats
|
| 57 |
+
|
| 58 |
+
| Format | File | Size |
|
| 59 |
+
|--------|------|------|
|
| 60 |
+
| PyTorch (.pth) | pytorch/thor_sta_v1.pth | 49.6 MB |
|
| 61 |
+
| SafeTensors | pytorch/thor_sta_v1.safetensors | 49.5 MB |
|
| 62 |
+
| ONNX (opset 17) | onnx/thor_sta_v1.onnx | 6.7 MB |
|
| 63 |
+
| TensorRT FP16 | tensorrt/thor_sta_v1_fp16.trt | 6.3 MB |
|
| 64 |
+
| TensorRT FP32 | tensorrt/thor_sta_v1_fp32.trt | 11.4 MB |
|
| 65 |
+
|
| 66 |
+
## Checkpoint
|
| 67 |
+
- Best: checkpoints/best.pth (epoch 198, val_loss=0.782216)
|
| 68 |
+
- Contains: model state_dict, optimizer state, scheduler state, config
|
| 69 |
+
|
| 70 |
+
## Paper Reference
|
| 71 |
+
- **Title**: ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association
|
| 72 |
+
- **Authors**: Zhang, Qian, Wang, Cremers
|
| 73 |
+
- **arXiv**: 2509.01584
|
| 74 |
+
- **Paper PDF**: paper.pdf (included in repo)
|
| 75 |
+
|
| 76 |
+
## HuggingFace
|
| 77 |
+
- Repo: [ilessio-aiflowlab/project_thor](https://huggingface.co/ilessio-aiflowlab/project_thor)
|
| 78 |
+
|
| 79 |
+
## Notes
|
| 80 |
+
- Training used synthetic data (the full STA training on ScanNet/ScanNet++/ARKit/CO3D is PARKED — datacenter-scale)
|
| 81 |
+
- Best val_loss achieved at epoch 198, not at final epoch
|
| 82 |
+
- Pose loss converges fastest, pointmap loss dominates total loss
|
| 83 |
+
- The upstream pretrained ViT model (438M params) is separate from this ResNet-18 distillation target
|
checkpoints/best.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:351dcfb96befb73799e81a91c9d2498247a0cb4b555c3ef79dc1c86ee787e7c9
|
| 3 |
+
size 62956211
|
onnx/thor_sta_v1.onnx
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 6734123
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:576a5054e466a4544a0ba7e58e6a16224d0a9e3a024922dcd4cf53dca9d294d3
|
| 3 |
size 6734123
|
paper.pdf
ADDED
|
@@ -0,0 +1,3 @@
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:2f38317f02fb316043321e1b372471db1d8b8ead4d31ac9ab1c45aac00aa4cf6
|
| 3 |
+
size 9729032
|
pytorch/thor_sta_v1.pth
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 49558699
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:92ab2c6716997ab1770e0091f85dc92ac1d8e2b0c6a5d2d7d2a3a14e257184cf
|
| 3 |
size 49558699
|
pytorch/thor_sta_v1.safetensors
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 49515152
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:dcaa916721c4b4a017a1da15a7dfb80503582990c4dca0c30b374fda6a731bf8
|
| 3 |
size 49515152
|
tensorrt/thor_sta_v1_fp16.trt
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:70578c40b0290440d86fe5aa69316d8ad82b5e39de4813fa3180b4b9cfdd8c34
|
| 3 |
+
size 6322780
|
tensorrt/thor_sta_v1_fp32.trt
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a5e0c292875ea7a648653baca6cc0c0d9c0bc0a9757f7808c686cb2185a9cb5d
|
| 3 |
+
size 11414372
|
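Each binary in this commit is stored as a Git LFS pointer carrying a `sha256` object ID and a byte size. After downloading the real file, it can be checked against its pointer. The following is a minimal sketch using only the Python standard library; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return algo, digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    algo, digest, size = parse_lfs_pointer(pointer_text)
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    if os.path.getsize(path) != size:
        return False  # cheap check first, before hashing the whole file
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest
```

A typical use would pass the three-line pointer text shown in the diff together with the path of the downloaded `.pth`, `.onnx`, or `.trt` file.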