YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
arc-vision
Vision models for arc-vision — the autonomous drone vision pipeline.
Files
| File | Size | Description |
|---|---|---|
yolov11s.hef |
16 MB | YOLO11s object detection (640×640, 80 COCO classes, Hailo-10H) |
clip_vit_b_16_text_encoder.safetensors |
254 MB | CLIP ViT-B/16 text encoder weights for target designation |
clip_tokenizer.json |
3.6 MB | CLIP tokenizer |
clip_vit_b_16_image_encoder.hef |
76 MB | CLIP ViT-B/16 image encoder compiled for Hailo-10H NPU |
Architecture
- Detection: YOLO11s on Hailo-10H NPU (~46 FPS)
- Target designation: CLIP text encoder (CPU, candle) + image encoder (NPU, ~53 FPS) for zero-shot matching
- Re-ID: CLIP image embeddings for cross-frame track association
Describe a target in natural language ("silver car", "person in red jacket") and the pipeline matches against live camera frames.
Sources
- YOLO11s + CLIP ViT-B/16 image encoder: compiled from Hailo Model Zoo v5.2.0
- CLIP text encoder + tokenizer: exported from
openai/clip-vit-base-patch16
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support