# arc-vision

Vision models for [arc-vision](https://github.com/llamadrone/llamadrone), the autonomous drone vision pipeline.

## Files

| File | Size | Description |
|------|------|-------------|
| `yolov11s.hef` | 16 MB | YOLO11s object detection (640×640, 80 COCO classes, Hailo-10H) |
| `clip_vit_b_16_text_encoder.safetensors` | 254 MB | CLIP ViT-B/16 text encoder weights for target designation |
| `clip_tokenizer.json` | 3.6 MB | CLIP tokenizer |
| `clip_vit_b_16_image_encoder.hef` | 76 MB | CLIP ViT-B/16 image encoder compiled for the Hailo-10H NPU |
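
The detector's 640×640 input implies the camera frame must be resized before inference. A minimal sketch of the conventional YOLO-style letterbox preprocessing in numpy (an assumption for illustration; this repo does not ship preprocessing code, and the Hailo runtime may handle it differently):

```python
import numpy as np

def letterbox(frame: np.ndarray, size: int = 640) -> np.ndarray:
    """Resize an HxWx3 uint8 frame to size x size, padding to keep aspect ratio."""
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via index maps (stand-in for cv2.resize).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = frame[rows][:, cols]
    # Pad with gray (114, the usual YOLO fill) to a square canvas, image centred.
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # e.g. one camera frame
print(letterbox(frame).shape)  # (640, 640, 3)
```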

## Architecture

- **Detection:** YOLO11s on the Hailo-10H NPU (~46 FPS)
- **Target designation:** CLIP text encoder (CPU, candle) + image encoder (NPU, ~53 FPS) for zero-shot matching
- **Re-ID:** CLIP image embeddings for cross-frame track association
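
Cross-frame re-ID can be sketched as greedy matching of existing track embeddings against the current frame's detection embeddings by cosine similarity. This is a hypothetical illustration, not the pipeline's actual matcher:

```python
import numpy as np

def associate(track_embs: np.ndarray, det_embs: np.ndarray, min_sim: float = 0.6):
    """Greedily pair tracks with detections by cosine similarity.

    Returns a list of (track_idx, det_idx) pairs; each track and each
    detection is used at most once, and pairs below min_sim are dropped.
    """
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    sim = t @ d.T                      # cosine similarity matrix
    matches = []
    while sim.size and sim.max() >= min_sim:
        ti, di = np.unravel_index(np.argmax(sim), sim.shape)
        matches.append((int(ti), int(di)))
        sim[ti, :] = -1.0              # mark this track as consumed
        sim[:, di] = -1.0              # mark this detection as consumed
    return matches
```

Unmatched tracks would then coast or expire, and unmatched detections spawn new tracks; the similarity threshold trades identity switches against fragmentation.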

Describe a target in natural language ("silver car", "person in red jacket") and the pipeline matches it against live camera frames.
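
The zero-shot match reduces to cosine similarity between the prompt's text embedding and each detection crop's image embedding. A minimal numpy sketch, where the embedding values are mock stand-ins for the CLIP encoder outputs:

```python
import numpy as np

def best_match(text_emb: np.ndarray, crop_embs: np.ndarray):
    """Return (index, score) of the detection crop most similar to the prompt."""
    t = text_emb / np.linalg.norm(text_emb)
    c = crop_embs / np.linalg.norm(crop_embs, axis=1, keepdims=True)
    scores = c @ t                     # cosine similarity per crop
    i = int(np.argmax(scores))
    return i, float(scores[i])

# Mock embeddings: in the real pipeline these come from the text encoder
# ("silver car") and the image encoder run over each detected object's crop.
text_emb = np.array([0.9, 0.1, 0.0])
crop_embs = np.array([[0.1, 0.9, 0.0],   # e.g. a person
                      [0.8, 0.2, 0.1]])  # e.g. a silver car
idx, score = best_match(text_emb, crop_embs)
print(idx)  # 1
```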

## Sources

- YOLO11s + CLIP ViT-B/16 image encoder: compiled from the [Hailo Model Zoo](https://github.com/hailo-ai/hailo_model_zoo) v5.2.0
- CLIP text encoder + tokenizer: exported from [`openai/clip-vit-base-patch16`](https://huggingface.co/openai/clip-vit-base-patch16)