arc-vision

Vision models for arc-vision — the autonomous drone vision pipeline.

Files

| File | Size | Description |
|---|---|---|
| yolov11s.hef | 16 MB | YOLO11s object detection (640×640, 80 COCO classes, Hailo-10H) |
| clip_vit_b_16_text_encoder.safetensors | 254 MB | CLIP ViT-B/16 text encoder weights for target designation |
| clip_tokenizer.json | 3.6 MB | CLIP tokenizer |
| clip_vit_b_16_image_encoder.hef | 76 MB | CLIP ViT-B/16 image encoder compiled for Hailo-10H NPU |

Architecture

  • Detection: YOLO11s on Hailo-10H NPU (~46 FPS)
  • Target designation: CLIP text encoder (CPU, candle) + image encoder (NPU, ~53 FPS) for zero-shot matching
  • Re-ID: CLIP image embeddings for cross-frame track association
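As a rough illustration of the Re-ID step, the sketch below greedily associates new detections with existing tracks by cosine similarity of their image embeddings. It is not taken from the arc-vision code: the function names, the greedy strategy, and the `min_similarity` value of 0.8 are all assumptions chosen for the example, and real CLIP embeddings would be 512-dimensional rather than the toy 2-D vectors used here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(track_embeddings, detection_embeddings, min_similarity=0.8):
    """Map detection index -> track id, greedily by descending similarity.

    track_embeddings: dict of track id -> last known embedding (hypothetical layout).
    detection_embeddings: list of embeddings for the current frame's detections.
    min_similarity: assumed cutoff; pairs below it are never associated.
    """
    candidates = []
    for track_id, track_emb in track_embeddings.items():
        for det_index, det_emb in enumerate(detection_embeddings):
            score = cosine(track_emb, det_emb)
            if score >= min_similarity:
                candidates.append((score, track_id, det_index))

    # Best-scoring pairs first; each track and each detection is used at most once.
    candidates.sort(reverse=True)
    assigned = {}
    used_tracks = set()
    for score, track_id, det_index in candidates:
        if track_id in used_tracks or det_index in assigned:
            continue
        assigned[det_index] = track_id
        used_tracks.add(track_id)
    return assigned
```

Detections left out of the returned mapping would typically start new tracks. A greedy pass is the simplest option; an optimal assignment (for example the Hungarian algorithm) is a common alternative when many similar objects are in view.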

Describe a target in natural language ("silver car", "person in red jacket") and the pipeline matches against live camera frames.
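The matching step can be sketched as a cosine-similarity comparison between one text embedding and the image embedding of each detected crop. This is a minimal illustration, not the pipeline's actual code: `best_match`, the `threshold` value of 0.25, and the toy 3-D vectors are assumptions, and in the real pipeline the embeddings would come from the CLIP text encoder and the Hailo-compiled image encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(text_embedding, crop_embeddings, threshold=0.25):
    """Return the index of the crop most similar to the text, or None.

    threshold is an assumed cutoff: crops scoring at or below it are ignored,
    so a frame with no plausible target yields None instead of a weak match.
    """
    best_index = None
    best_score = threshold
    for index, crop_embedding in enumerate(crop_embeddings):
        score = cosine_similarity(text_embedding, crop_embedding)
        if score > best_score:
            best_index = index
            best_score = score
    return best_index


# Toy example: the second crop points in nearly the same direction as the text.
text = [1.0, 0.0, 0.0]
crops = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
match = best_match(text, crops)
```

Because the text embedding only changes when the operator changes the target description, it can be computed once and reused across frames, while image embeddings are recomputed per detection.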

Sources
