LOKI – HDINO-T Open-Vocabulary Detector

Part of the ANIMA Perception Suite by Robot Flow Labs.

Paper

HDINO: A Concise and Efficient Open-Vocabulary Detector (Mar 2026)
Hao Zhang, Yiqun Wang, Qinran Lin, Runze Fan, Yong Li

Architecture

  • Backbone: Swin-T (96-dim, depths 2-2-6-2)
  • Transformer: 6 encoder + 6 decoder layers, deformable attention
  • Text encoder: CLIP ViT-B/32 (frozen)
  • Detection head: contrastive embedding + bbox regression
  • 111.5M params total, 48.4M trainable
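The contrastive detection head scores each decoder query against the class text embeddings rather than using a fixed classifier. A minimal sketch of that scoring in NumPy, assuming L2-normalized embeddings, CLIP ViT-B/32's 512-dim text space, and a hypothetical temperature of 0.07 (not taken from the paper):

```python
import numpy as np

def contrastive_scores(query_emb, text_emb, temperature=0.07):
    """Score detection queries against class text embeddings.

    query_emb: (num_queries, dim) visual embeddings from the decoder.
    text_emb:  (num_classes, dim) text embeddings from the frozen CLIP encoder.
    Returns (num_queries, num_classes) logits (cosine similarity / temperature).
    """
    q = query_emb / np.linalg.norm(query_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    return q @ t.T / temperature

rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 512))  # 3 decoder queries
texts = rng.normal(size=(5, 512))    # 5 class prompts (CLIP ViT-B/32 is 512-dim)
logits = contrastive_scores(queries, texts)
print(logits.shape)  # (3, 5)
```

Swapping the text embedding matrix at inference time is what makes the detector open-vocabulary: new classes need only new prompts, no retraining.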

Training

  • Data: COCO 2017 (118K images, 80 classes)
  • LR: 0.0001 (cosine decay)
  • Batch size: 6
  • Precision: bf16
  • Best val_loss: 2.36 (epoch 2)
  • Hardware: NVIDIA L4 23GB
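The cosine-decay schedule above can be sketched with the standard closed form; the warmup phase and step granularity (per-step vs per-epoch) are implementation details not stated here:

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    # Cosine decay from base_lr down to min_lr over total_steps.
    progress = step / max(total_steps, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # 0.0001 (the configured LR) at the start
print(cosine_lr(1000, 1000))  # decays to ~0 at the final step
```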

Exported Formats

Format               File                           Size
PyTorch              pytorch/loki_v1.pth            792 MB
SafeTensors          pytorch/loki_v1.safetensors    426 MB
Text embeddings      pytorch/text_embeddings.pt     83 KB
Checkpoint (resume)  checkpoints/best.pth           792 MB
Training config      configs/training.toml          –
Training history     logs/training_history.json     –
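The training history file can be inspected with the standard library, e.g. to recover the best epoch. The JSON schema below is hypothetical; the actual layout of logs/training_history.json may differ:

```python
import json

# Hypothetical schema for logs/training_history.json; adjust the keys
# to whatever the shipped file actually contains.
history_json = '{"epochs": [{"epoch": 1, "val_loss": 2.51}, {"epoch": 2, "val_loss": 2.36}]}'
history = json.loads(history_json)

# Pick the epoch with the lowest validation loss.
best = min(history["epochs"], key=lambda e: e["val_loss"])
print(best["epoch"], best["val_loss"])  # 2 2.36
```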

Usage

# Requires the HDINO repo: https://github.com/HaoZ416/HDINO
from safetensors.torch import load_file

weights = load_file("pytorch/loki_v1.safetensors")
# `model` is an HDINO-T instance built from the repo above; strict=False
# tolerates any keys (e.g. the frozen CLIP text encoder) that the export omits.
model.load_state_dict(weights, strict=False)
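Before pulling in the full 426 MB weight file, the safetensors header can be inspected with the standard library alone: the format starts with an 8-byte little-endian header length, followed by a JSON table of tensor names, dtypes, shapes, and byte offsets. A self-contained sketch that writes and reads a tiny example file:

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON tensor table from a .safetensors file without
    loading any tensor data (useful for loki_v1.safetensors)."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))  # u64 LE header length
        return json.loads(f.read(n))

# Build a tiny valid file in-place so the sketch is self-checking:
# one 2x2 float32 tensor = 16 bytes of data.
header = {"head.weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
payload = json.dumps(header).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(payload)) + payload + b"\x00" * 16)

meta = read_safetensors_header("demo.safetensors")
print(sorted(meta))  # ['head.weight']
```

Run against loki_v1.safetensors, this lists every exported tensor name and shape, which is handy for checking what `strict=False` will skip.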

ANIMA Module

  • Codename: LOKI
  • Tier: 2 (Perception)
  • Wave: 6
  • Downstream: TYR (tracking), FENRIR (manipulation)

License

Apache 2.0 – Robot Flow Labs / AIFLOW LABS LIMITED
