bobbyrownd committed (verified)
Commit 5ac7ab2 · 1 parent: f653820

Upload README.md with huggingface_hub

Files changed (1): README.md ADDED (+25 −0)
# arc-vision

Vision models for [arc-vision](https://github.com/llamadrone/llamadrone) — the autonomous drone vision pipeline.

## Files

| File | Size | Description |
|------|------|-------------|
| `yolov11s.hef` | 16 MB | YOLO11s object detection (640×640, 80 COCO classes, Hailo-10H) |
| `clip_vit_b_16_text_encoder.safetensors` | 254 MB | CLIP ViT-B/16 text encoder weights for target designation |
| `clip_tokenizer.json` | 3.6 MB | CLIP tokenizer |
| `clip_vit_b_16_image_encoder.hef` | 76 MB | CLIP ViT-B/16 image encoder compiled for Hailo-10H NPU |

## Architecture

- **Detection:** YOLO11s on Hailo-10H NPU (~46 FPS)
- **Target designation:** CLIP text encoder (CPU, candle) + image encoder (NPU, ~53 FPS) for zero-shot matching
- **Re-ID:** CLIP image embeddings for cross-frame track association

Describe a target in natural language ("silver car", "person in red jacket") and the pipeline matches it against live camera frames.
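The matching step reduces to cosine similarity between the prompt's text embedding and the CLIP image embedding of each detection crop. Here is a minimal NumPy sketch of that idea, with random placeholder 512-d vectors standing in for the real encoder outputs (in the actual pipeline the embeddings come from the candle text encoder and the Hailo image encoder; `match_target` and the 0.25 threshold are illustrative, not part of this repo):

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def match_target(text_emb: np.ndarray, crop_embs: np.ndarray,
                 threshold: float = 0.25):
    """Index and score of the crop best matching the text prompt,
    or None if nothing clears the similarity threshold."""
    sims = cosine_matrix(text_emb[None, :], crop_embs)[0]
    best = int(np.argmax(sims))
    return (best, float(sims[best])) if sims[best] >= threshold else None

# Placeholder embeddings (512 = CLIP ViT-B/16 output dimension).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=512)
crop_embs = rng.normal(size=(5, 512))
crop_embs[3] = text_emb + 0.1 * rng.normal(size=512)  # make crop 3 the clear match

idx, score = match_target(text_emb, crop_embs)
print(idx, round(score, 3))
```

The same similarity primitive serves re-ID: comparing a track's stored image embedding against crops in the next frame associates detections across frames.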

## Sources

- YOLO11s + CLIP ViT-B/16 image encoder: compiled from [Hailo Model Zoo](https://github.com/hailo-ai/hailo_model_zoo) v5.2.0
- CLIP text encoder + tokenizer: exported from [`openai/clip-vit-base-patch16`](https://huggingface.co/openai/clip-vit-base-patch16)