LOKI – HDINO-T Open-Vocabulary Detector

Part of the ANIMA Perception Suite by Robot Flow Labs.

Paper

HDINO: A Concise and Efficient Open-Vocabulary Detector (Mar 2026)
Hao Zhang, Yiqun Wang, Qinran Lin, Runze Fan, Yong Li

Architecture

  • Backbone: Swin-T (96-dim, depths 2-2-6-2)
  • Transformer: 6 encoder + 6 decoder layers, deformable attention
  • Text encoder: CLIP ViT-B/32 (frozen)
  • Detection head: contrastive embedding + bbox regression
  • 111.5M params total, 48.4M trainable
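The contrastive detection head scores each decoder query against the class text embeddings rather than using a fixed classifier. A minimal sketch of that scoring in NumPy, assuming L2-normalized embeddings, CLIP ViT-B/32's 512-dim text space, and a hypothetical temperature of 0.07 (not taken from the paper):

```python
import numpy as np

def contrastive_scores(query_emb, text_emb, temperature=0.07):
    """Score detection queries against class text embeddings.

    query_emb: (num_queries, dim) visual embeddings from the decoder.
    text_emb:  (num_classes, dim) text embeddings from the frozen CLIP encoder.
    Returns (num_queries, num_classes) logits (cosine similarity / temperature).
    """
    q = query_emb / np.linalg.norm(query_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    return q @ t.T / temperature

rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 512))  # 3 decoder queries
texts = rng.normal(size=(5, 512))    # 5 class prompts (CLIP ViT-B/32 is 512-dim)
logits = contrastive_scores(queries, texts)
print(logits.shape)  # (3, 5)
```

Swapping the text embedding matrix at inference time is what makes the detector open-vocabulary: new classes need only new prompts, no retraining.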

Training

  • Data: COCO 2017 (118K images, 80 classes)
  • LR: 0.0001 (cosine decay)
  • Batch size: 6
  • Precision: bf16
  • Best val_loss: 2.36 (epoch 2)
  • Hardware: NVIDIA L4 23GB
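The cosine-decay schedule above can be sketched with the standard closed form; the warmup phase and step granularity (per-step vs per-epoch) are implementation details not stated here:

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    # Cosine decay from base_lr down to min_lr over total_steps.
    progress = step / max(total_steps, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # 0.0001 (the configured LR) at the start
print(cosine_lr(1000, 1000))  # decays to ~0 at the final step
```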

Exported Formats

Format               File                           Size
PyTorch              pytorch/loki_v1.pth            792 MB
SafeTensors          pytorch/loki_v1.safetensors    426 MB
Text embeddings      pytorch/text_embeddings.pt     83 KB
Checkpoint (resume)  checkpoints/best.pth           792 MB
Training config      configs/training.toml          –
Training history     logs/training_history.json     –
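The training history file can be inspected with the standard library, e.g. to recover the best epoch. The JSON schema below is hypothetical; the actual layout of logs/training_history.json may differ:

```python
import json

# Hypothetical schema for logs/training_history.json; adjust the keys
# to whatever the shipped file actually contains.
history_json = '{"epochs": [{"epoch": 1, "val_loss": 2.51}, {"epoch": 2, "val_loss": 2.36}]}'
history = json.loads(history_json)

# Pick the epoch with the lowest validation loss.
best = min(history["epochs"], key=lambda e: e["val_loss"])
print(best["epoch"], best["val_loss"])  # 2 2.36
```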

Usage

# Requires the HDINO repo: https://github.com/HaoZ416/HDINO
from safetensors.torch import load_file

weights = load_file("pytorch/loki_v1.safetensors")
# `model` is an HDINO-T instance built from the repo above; strict=False
# tolerates any keys (e.g. the frozen CLIP text encoder) that the export omits.
model.load_state_dict(weights, strict=False)
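Before pulling in the full 426 MB weight file, the safetensors header can be inspected with the standard library alone: the format starts with an 8-byte little-endian header length, followed by a JSON table of tensor names, dtypes, shapes, and byte offsets. A self-contained sketch that writes and reads a tiny example file:

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON tensor table from a .safetensors file without
    loading any tensor data (useful for loki_v1.safetensors)."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))  # u64 LE header length
        return json.loads(f.read(n))

# Build a tiny valid file in-place so the sketch is self-checking:
# one 2x2 float32 tensor = 16 bytes of data.
header = {"head.weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
payload = json.dumps(header).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(payload)) + payload + b"\x00" * 16)

meta = read_safetensors_header("demo.safetensors")
print(sorted(meta))  # ['head.weight']
```

Run against loki_v1.safetensors, this lists every exported tensor name and shape, which is handy for checking what `strict=False` will skip.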

ANIMA Module

  • Codename: LOKI
  • Tier: 2 (Perception)
  • Wave: 6
  • Downstream: TYR (tracking), FENRIR (manipulation)

License

Apache 2.0 – Robot Flow Labs / AIFLOW LABS LIMITED
