arc-vision

Vision models for arc-vision — the autonomous drone vision pipeline.

Files

File	Size	Description
`yolov11s.hef`	16 MB	YOLO11s object detection (640×640, 80 COCO classes, Hailo-10H)
`clip_vit_b_16_text_encoder.safetensors`	254 MB	CLIP ViT-B/16 text encoder weights for target designation
`clip_tokenizer.json`	3.6 MB	CLIP tokenizer
`clip_vit_b_16_image_encoder.hef`	76 MB	CLIP ViT-B/16 image encoder compiled for Hailo-10H NPU

Detection: YOLO11s on Hailo-10H NPU (~46 FPS)
Target designation: CLIP text encoder (CPU, candle) + image encoder (NPU, ~53 FPS) for zero-shot matching
Re-ID: CLIP image embeddings for cross-frame track association

Describe a target in natural language ("silver car", "person in red jacket") and the pipeline matches against live camera frames.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support