# HERMES Navigation Model v1
Indoor semantic navigation model combining vision and 3D point cloud understanding.
## Architecture
- Vision encoder: CNN backbone (5-layer, 256-dim output)
- Point cloud encoder: MLP with max-pooling (2048 points → 256-dim)
- Fusion: 512-dim MLP with LayerNorm + Dropout
- Heads: Direction (3D unit vector) + Traversability (scalar 0-1)
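
For orientation, here is a minimal PyTorch sketch of a model matching the description above. The layer widths, kernel sizes, strides, and activation choices are illustrative assumptions; the released `HermesNavigationModel` may differ in these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HermesNavSketch(nn.Module):
    """Illustrative re-implementation of the architecture described above."""
    def __init__(self, feat_dim=256, fusion_dim=512, dropout=0.1):
        super().__init__()
        # Vision encoder: 5 conv layers -> 256-dim feature (channel plan assumed)
        chans = [3, 32, 64, 128, 256, 256]
        self.vision = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                nn.BatchNorm2d(chans[i + 1]),
                nn.ReLU(),
            )
            for i in range(5)
        ])
        self.vision_pool = nn.AdaptiveAvgPool2d(1)
        # Point cloud encoder: per-point MLP + max-pool over the 2048 points
        self.points = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )
        # Fusion: 512-dim MLP with LayerNorm + Dropout
        self.fusion = nn.Sequential(
            nn.Linear(2 * feat_dim, fusion_dim),
            nn.LayerNorm(fusion_dim), nn.ReLU(), nn.Dropout(dropout),
        )
        self.direction_head = nn.Linear(fusion_dim, 3)
        self.traversability_head = nn.Linear(fusion_dim, 1)

    def forward(self, image, points):
        v = self.vision_pool(self.vision(image)).flatten(1)   # [B, 256]
        p = self.points(points).max(dim=1).values             # [B, 256]
        h = self.fusion(torch.cat([v, p], dim=-1))            # [B, 512]
        # Direction is normalized to a 3D unit vector; traversability to [0, 1]
        direction = F.normalize(self.direction_head(h), dim=-1)
        traversability = torch.sigmoid(self.traversability_head(h))
        return {"direction": direction, "traversability": traversability}
```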
## Training
- Dataset: SUN RGB-D (5,509 indoor scenes)
- Split: 90/5/5 (train/val/test)
- Optimizer: AdamW (lr=2e-4, cosine schedule)
- Mixed precision: bf16 on CUDA
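
A sketch of a training loop consistent with these settings. The loss terms (cosine distance for direction, binary cross-entropy for traversability), the epoch count, and the `train_loader` batch layout are assumptions, not taken from the released training code.

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, train_loader, num_epochs=50):
    model.cuda()
    optimizer = AdamW(model.parameters(), lr=2e-4)
    # Cosine schedule over the full run, per the settings above
    scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs * len(train_loader))
    for _ in range(num_epochs):
        # Assumed batch layout: (image, points, goal direction, traversability label)
        for image, points, gt_dir, gt_trav in train_loader:
            image, points = image.cuda(), points.cuda()
            gt_dir, gt_trav = gt_dir.cuda(), gt_trav.cuda()
            # bf16 autocast on CUDA, matching the mixed-precision setting
            with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
                out = model(image, points)
            # Losses in fp32 outside autocast (BCE is unsafe under autocast);
            # the loss terms themselves are assumptions
            dir_loss = 1 - F.cosine_similarity(
                out["direction"].float(), gt_dir, dim=-1).mean()
            trav_loss = F.binary_cross_entropy(
                out["traversability"].float(), gt_trav)
            loss = dir_loss + trav_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()
```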
## Formats

| Format | File | Use Case |
|---|---|---|
| PyTorch | pytorch/hermes_nav_v1.pth | Training/fine-tuning |
| SafeTensors | pytorch/hermes_nav_v1.safetensors | Fast safe loading |
| ONNX | onnx/hermes_nav_v1.onnx | Cross-platform inference |
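
The SafeTensors and ONNX files can be loaded without the `hermes` package. A brief sketch, assuming the ONNX graph exposes inputs named `image` and `points` (check `session.get_inputs()` for the actual names):

```python
import numpy as np
import onnxruntime as ort
from safetensors.torch import load_file

# SafeTensors: load the state dict without pickle execution
state_dict = load_file("pytorch/hermes_nav_v1.safetensors")

# ONNX: cross-platform inference via onnxruntime
session = ort.InferenceSession("onnx/hermes_nav_v1.onnx")
outputs = session.run(None, {
    "image": np.random.randn(1, 3, 256, 256).astype(np.float32),
    "points": np.random.randn(1, 2048, 3).astype(np.float32),
})  # list of output arrays; order/names depend on how the graph was exported
```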
## Usage

```python
import torch
from hermes.training.model import HermesNavigationModel

model = HermesNavigationModel()
model.load_state_dict(torch.load("pytorch/hermes_nav_v1.pth", map_location="cpu"))
model.eval()

image = torch.randn(1, 3, 256, 256)   # RGB image batch [B, 3, H, W]
points = torch.randn(1, 2048, 3)      # point cloud batch [B, N, 3]

with torch.no_grad():
    output = model(image, points)

# output["direction"]: [1, 3] goal direction (unit vector)
# output["traversability"]: [1, 1] traversability score in [0, 1]
```
## Citation
ANIMA Suite – Robot Flow Labs