qqc1989 committed on
Commit 40616d4 · verified
1 Parent(s): 687916d

Upload 21 files
.gitattributes CHANGED
@@ -34,3 +34,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  *.axmodel filter=lfs diff=lfs merge=lfs -text
+ football.jpg filter=lfs diff=lfs merge=lfs -text
+ install/bin/axcl_aarch64/test_detect_by_text filter=lfs diff=lfs merge=lfs -text
+ install/bin/axcl_x86/test_detect_by_text filter=lfs diff=lfs merge=lfs -text
+ install/bin/host_650/test_detect_by_text filter=lfs diff=lfs merge=lfs -text
+ install/lib/axcl_aarch64/libyoloworld.so filter=lfs diff=lfs merge=lfs -text
+ install/lib/axcl_x86/libyoloworld.so filter=lfs diff=lfs merge=lfs -text
+ install/lib/host_650/libyoloworld.so filter=lfs diff=lfs merge=lfs -text
+ pyyoloworld/gardio_example.jpg filter=lfs diff=lfs merge=lfs -text
+ result.png filter=lfs diff=lfs merge=lfs -text
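The patterns above decide which paths Git routes through LFS. A rough way to sanity-check them (an illustrative sketch using Python's `fnmatch`; `tracked_by_lfs` and `lfs_patterns` are hypothetical names, and `fnmatch` only approximates gitattributes matching rules):

```python
from fnmatch import fnmatch

# A subset of the patterns from the .gitattributes additions above
lfs_patterns = ["*.axmodel", "football.jpg", "install/lib/axcl_aarch64/libyoloworld.so"]

def tracked_by_lfs(path: str) -> bool:
    # gitattributes matches patterns without '/' against the basename
    name = path.rsplit("/", 1)[-1]
    return any(
        fnmatch(path, p) if "/" in p else fnmatch(name, p)
        for p in lfs_patterns
    )

print(tracked_by_lfs("models/yolo_u16_ax650.axmodel"))  # True
print(tracked_by_lfs("README.md"))                      # False
```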
README.md CHANGED
@@ -1,3 +1,155 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ - zh
+ tags:
+ - YOLO World
+ ---
+
+ # YOLOWorld
+
+ This SDK enables efficient open-vocabulary object detection with YOLO-Worldv2 Large, optimized for Axera's NPU-based SoC platforms, including the AX650, AX630C, and AX8850 series, as well as Axera's dedicated AI accelerator.
+
+ ## Reference links
+
+ If you are interested in model conversion, you can export an axmodel yourself via:
+
+ - [The yoloworld.axera open-source GitHub repo](https://github.com/AXERA-TECH/yoloworld.axera)
+ - [How to convert the YOLO-World models](https://github.com/AXERA-TECH/ONNX-YOLO-World-Open-Vocabulary-Object-Detection)
+ - [Pulsar2 docs: how to convert ONNX to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/pulsar2/introduction.html)
+
+ ## Support Platform
+
+ - AX650
+   - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
+   - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
+ - AX630C
+   - [爱芯派2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
+   - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
+   - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
+
+ ## Performance
+
+ | Model | Input Shape | Latency (ms) | CMM Usage (MB) |
+ |-------|-------------|--------------|----------------|
+ | yolo_u16_ax650.axmodel | 1 x 640 x 640 x 3 | 9.522 | 21 |
+ | clip_b1_u16_ax650.axmodel | 1 x 77 | 2.997 | 137 |
+ | yolo_u16_ax630c.axmodel | 1 x 640 x 640 x 3 | 43.450 | 31 |
+ | clip_b1_u16_ax630c.axmodel | 1 x 77 | 10.703 | 134 |
+
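A back-of-the-envelope conversion from the latencies in the table above to single-stream throughput (NPU inference time only; pre/post-processing and host-device transfer are ignored, so these are upper bounds):

```python
# Latencies (ms) taken from the performance table above
latencies_ms = {
    "yolo_u16_ax650.axmodel": 9.522,
    "yolo_u16_ax630c.axmodel": 43.450,
}

# Single-stream FPS upper bound = 1000 / latency_ms
for name, ms in latencies_ms.items():
    print(f"{name}: ~{1000.0 / ms:.1f} FPS")  # ~105.0 and ~23.0 respectively
```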
+ ## How to use
+
+ Download all files from this repository to the device:
+
+ ```
+ (py312) axera@raspberrypi:~/samples/yoloworldv2 $ tree
+ .
+ ├── config.json
+ ├── football.jpg
+ ├── install
+ │   ├── bin
+ │   │   ├── axcl_aarch64
+ │   │   │   └── test_detect_by_text
+ │   │   ├── axcl_x86
+ │   │   │   └── test_detect_by_text
+ │   │   └── host_650
+ │   │       └── test_detect_by_text
+ │   └── lib
+ │       ├── axcl_aarch64
+ │       │   └── libyoloworld.so
+ │       ├── axcl_x86
+ │       │   └── libyoloworld.so
+ │       └── host_650
+ │           └── libyoloworld.so
+ ├── models
+ │   ├── clip_b1_u16_ax630c.axmodel
+ │   ├── clip_b1_u16_ax650.axmodel
+ │   ├── yolo_u16_ax630c.axmodel
+ │   └── yolo_u16_ax650.axmodel
+ ├── pyyoloworld
+ │   ├── example.py
+ │   ├── gardio_example.jpg
+ │   ├── gradio_example.py
+ │   ├── libyoloworld.so
+ │   ├── pyaxdev.py
+ │   ├── __pycache__
+ │   │   ├── pyaxdev.cpython-312.pyc
+ │   │   └── pyyoloworld.cpython-312.pyc
+ │   ├── pyyoloworld.py
+ │   └── requirements.txt
+ ├── README.md
+ └── vocab.txt
+
+ 13 directories, 23 files
+ ```
+
+ ### Python environment requirements
+
+ ```
+ pip install -r pyyoloworld/requirements.txt
+ ```
+
+ #### Inference on an AX650 host, such as M4N-Dock(爱芯派Pro)
+
+ TODO
+
+ #### Inference with an M.2 Accelerator card
+
+ [What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.
+
+ ```
+ (py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg $ export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libstdc++.so.6
+ (py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg $ cp install/lib/axcl_aarch64/libyoloworld.so pyyoloworld/
+ (py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg $ cd pyyoloworld/
+ (py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg/pyyoloworld $ python gradio_example.py --yoloworld ../models/yolo_u16_ax650.axmodel --tenc ../models/clip_b1_u16_ax650.axmodel --vocab ../vocab.txt
+ Trying to load: /home/axera/samples/yoloworldv2-new.hg/pyyoloworld/aarch64/libyoloworld.so
+ ✅ Successfully loaded: /home/axera/samples/yoloworldv2-new.hg/pyyoloworld/libyoloworld.so
+ [I][ run][ 31]: AXCLWorker start with devid 0
+
+ input size: 2
+     name: images [unknown] [unknown]
+         1 x 640 x 640 x 3 size: 1228800
+
+     name: txt_feats [unknown] [unknown]
+         1 x 4 x 512 size: 8192
+
+ output size: 3
+     name: stride8
+         1 x 80 x 80 x 68 size: 1740800
+
+     name: stride16
+         1 x 40 x 40 x 68 size: 435200
+
+     name: stride32
+         1 x 20 x 20 x 68 size: 108800
+
+ [I][ yw_create][ 408]: num_classes: 4, num_features: 512, input w: 640, h: 640
+ is_output_nhwc: 1
+
+ input size: 1
+     name: text_token [unknown] [unknown]
+         1 x 77 size: 308
+
+ output size: 1
+     name: 2202
+         1 x 1 x 512 size: 2048
+
+ [I][ load_text_encoder][ 44]: text feature len 512
+ [I][ load_tokenizer][ 60]: text token len 77
+ * Running on local URL: http://0.0.0.0:7860
+ * To create a public link, set `share=True` in `launch()`.
+ ```
+
+ If your Raspberry Pi 5's IP address is, say, 192.168.1.100, open `http://192.168.1.100:7860` in your browser to use the web app.
+
+ Input: `man`, `shoes`, `ball`, `person` and the test image
+
+ ![](./football.jpg)
+
+ Result:
+
+ ![](result.png)
config.json ADDED
File without changes
football.jpg ADDED

Git LFS Details

  • SHA256: e7c4b752ef447bfec409888cea8709be15c01d0f6bf91bd16b7762deb90950dc
  • Pointer size: 131 Bytes
  • Size of remote file: 325 kB
install/bin/axcl_aarch64/test_detect_by_text ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0be75d1a0ac72f7b9081c6f7ac5c20dfc0e9d11d1996fcc6c8de3e26a6b281db
+ size 157416
install/bin/axcl_x86/test_detect_by_text ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c4a8aa10ee6141d1931f83258eb7f84d819eb81b4b7d608fd436a2577d0f5fd
+ size 112048
install/bin/host_650/test_detect_by_text ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71f4a149a398db124b3e75010b1b672a935d093f2eb5772d459a4f248eebd665
+ size 5925168
install/lib/axcl_aarch64/libyoloworld.so ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:983c341755649bffd7fd35675bf114a29938e09204aeec1d3ca5cd32f20ddaab
+ size 1179736
install/lib/axcl_x86/libyoloworld.so ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d07d6349940db32177b87bf6fddd476b838ef26ab3184f7869e710ec7a39f7c
+ size 1155448
install/lib/host_650/libyoloworld.so ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66e0b7b2f6ebf92538751dd00c192e1675752689a98fb6adf9cf4ecbae9daf41
+ size 4373192
models/clip_b1_u16_ax630c.axmodel ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:980f80dd17847b7db685e66bc0ddfaa00e5bfff56b05cb3467e6da8058d6b9c7
+ size 140712067
models/clip_b1_u16_ax650.axmodel ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:22afd07e0cbc8ca35be930aa171b37feaa5653d0e01402e1c35bdec8dee5da32
+ size 143852095
models/yolo_u16_ax630c.axmodel ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8519e96fb61801e1bdd186547a8a32d1e9e10e94d5365f4d6b72bee63d0927cb
+ size 14722509
models/yolo_u16_ax650.axmodel ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9304f60c84fb06db8a0cc9742d7280e7e41cc4f4d7f516ab335863d9da873c3c
+ size 14161499
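Each binary and model above is checked in as a Git LFS pointer file with the three-line `version`/`oid`/`size` format shown. A small parser sketch (`parse_lfs_pointer` is an illustrative helper, not part of this SDK), using the yolo_u16_ax650 pointer from just above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9304f60c84fb06db8a0cc9742d7280e7e41cc4f4d7f516ab335863d9da873c3c
size 14161499"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 14161499
```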
pyyoloworld/example.py ADDED
@@ -0,0 +1,62 @@
+ import argparse
+ import cv2
+ from pyaxdev import enum_devices, sys_init, sys_deinit, AxDeviceType
+ from pyyoloworld import YOLOWORLD
+
+ if __name__ == '__main__':
+     parser = argparse.ArgumentParser()
+     parser.add_argument('--yoloworld', type=str, default='cnclip/cnclip_vit_l14_336px_vision_u16u8.axmodel')
+     parser.add_argument('--tenc', type=str, default='cnclip/cnclip_vit_l14_336px_text_u16.axmodel')
+     parser.add_argument('--vocab', type=str, default='cnclip/cn_vocab.txt')
+     parser.add_argument('--image', type=str)
+     args = parser.parse_args()
+
+     # Enumerate devices
+     devices_info = enum_devices()
+     print("Available devices:", devices_info)
+     if devices_info['host']['available']:
+         print("host device available")
+         sys_init(AxDeviceType.host_device, -1)
+     elif devices_info['devices']['count'] > 0:
+         print("axcl device available, use device-0")
+         sys_init(AxDeviceType.axcl_device, 0)
+     else:
+         raise Exception("No available device")
+
+     try:
+         # Create a YOLOWORLD instance
+         yw = YOLOWORLD({
+             'text_encoder_path': args.tenc,
+             'tokenizer_path': args.vocab,
+             'yoloworld_path': args.yoloworld,
+         })
+
+         yw.set_classes(["person", "dog", "car", "horse"])
+
+         img = cv2.imread(args.image)
+         img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+
+         results = yw.detect(img)
+         print(results)
+         img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
+         for result in results:
+             x = result['x']
+             y = result['y']
+             w = result['w']
+             h = result['h']
+             conf = result['score']
+             class_id = result['label']
+             cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
+             cv2.putText(img, f"{class_id}: {conf:.2f}", (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
+         cv2.imwrite('result.jpg', img)
+
+     finally:
+         # Deinitialize the system
+         if devices_info['host']['available']:
+             sys_deinit(AxDeviceType.host_device, -1)
+         elif devices_info['devices']['count'] > 0:
+             sys_deinit(AxDeviceType.axcl_device, 0)
pyyoloworld/gardio_example.jpg ADDED

Git LFS Details

  • SHA256: 71aa61d84008c3443a927e516f57826aa2595bdeabda7cf2de353c5a84553c0b
  • Pointer size: 131 Bytes
  • Size of remote file: 413 kB
pyyoloworld/gradio_example.py ADDED
@@ -0,0 +1,103 @@
+ import colorsys
+ import gradio as gr
+ import cv2
+ from pyaxdev import enum_devices, sys_init, sys_deinit, AxDeviceType
+ from pyyoloworld import YOLOWORLD
+ import numpy as np
+ from PIL import Image
+ import argparse
+ import random
+
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--yoloworld', type=str, default='cnclip/cnclip_vit_l14_336px_vision_u16u8.axmodel')
+ parser.add_argument('--tenc', type=str, default='cnclip/cnclip_vit_l14_336px_text_u16.axmodel')
+ parser.add_argument('--vocab', type=str, default='cnclip/cn_vocab.txt')
+ args = parser.parse_args()
+
+ # ========== Model and device initialization ==========
+ devices_info = enum_devices()
+ if devices_info['host']['available']:
+     sys_init(AxDeviceType.host_device, -1)
+     device_type = AxDeviceType.host_device
+     device_id = -1
+ elif devices_info['devices']['count'] > 0:
+     sys_init(AxDeviceType.axcl_device, 0)
+     device_type = AxDeviceType.axcl_device
+     device_id = 0
+ else:
+     raise Exception("No available device")
+
+ yw = YOLOWORLD({
+     'text_encoder_path': args.tenc,
+     'tokenizer_path': args.vocab,
+     'yoloworld_path': args.yoloworld,
+ })
+
+ def generate_vivid_colors(n):
+     colors = []
+     for i in range(n):
+         # Evenly spaced hues, with saturation and value kept high
+         h = i / n
+         s = 0.9 + random.random() * 0.1  # saturation 0.9~1.0
+         v = 0.9 + random.random() * 0.1  # value 0.9~1.0
+         r, g, b = colorsys.hsv_to_rgb(h, s, v)
+         colors.append((int(r * 255), int(g * 255), int(b * 255)))
+     return colors
+
+ colors = generate_vivid_colors(4)
+
+ # ========== Inference function ==========
+ def detect_image(image, class1, class2, class3, class4, threshold):
+     if image is None:
+         return None
+     class_list = [class1, class2, class3, class4]
+     if not any(class_list):
+         return image  # no classes set, skip detection
+
+     yw.set_classes(class_list)
+     yw.set_threshold(threshold)
+
+     # Convert to RGB
+     img = np.array(image.convert('RGB'))  # PIL -> np.ndarray
+     results = yw.detect(img)
+
+     # Visualization
+     for result in results:
+         x, y, w, h = result['x'], result['y'], result['w'], result['h']
+         conf = result['score']
+         label = result['label']
+         cv2.rectangle(img, (x, y), (x + w, y + h), colors[label], 3)
+         cv2.putText(img, f"{class_list[label]}: {conf:.2f}", (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 2, colors[label], 3)
+
+     return Image.fromarray(img)  # return a PIL image
+
+ # ========== Number of class textboxes ==========
+ NUM_CLASSES = 4  # adjustable number of input boxes
+
+ # ========== Build the Gradio UI ==========
+ with gr.Blocks() as demo:
+     gr.Markdown("# YOLOWORLD Image Detection Demo")
+
+     with gr.Row():
+         with gr.Column():
+             class1 = gr.Textbox(label="Class 0", value="person")
+             class2 = gr.Textbox(label="Class 1", value="dog")
+             class3 = gr.Textbox(label="Class 2", value="car")
+             class4 = gr.Textbox(label="Class 3", value="horse")
+
+             threshold_slider = gr.Slider(minimum=0.0, maximum=1.0, value=0.1, step=0.01, label="Threshold")
+             image_input = gr.Image(type="pil", label="Input image", height=415)
+         with gr.Column():
+             detect_button = gr.Button("Detect")
+             image_output = gr.Image(type="pil", label="Detection result", height=800)
+
+     # Bind the click event
+     detect_button.click(
+         fn=detect_image,
+         inputs=[image_input, class1, class2, class3, class4, threshold_slider],
+         outputs=image_output
+     )
+
+ # ========== Launch ==========
+ demo.launch(server_name="0.0.0.0")
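`generate_vivid_colors` above spreads hues evenly around the HSV wheel and jitters saturation/value randomly. A deterministic variant (`vivid_colors` is an illustrative rename with the random jitter replaced by full saturation and value) behaves like this:

```python
import colorsys

def vivid_colors(n: int):
    # Evenly spaced hues, full saturation and value, as RGB byte triples
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 1.0, 1.0)
        colors.append((int(r * 255), int(g * 255), int(b * 255)))
    return colors

print(vivid_colors(4))  # first color is hue 0, i.e. pure red (255, 0, 0)
```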
pyyoloworld/pyaxdev.py ADDED
@@ -0,0 +1,149 @@
+ import ctypes
+ import os
+ import platform
+
+ lib_name = 'libyoloworld.so'
+
+ def check_error(code: int):
+     if code != 0:
+         raise Exception(f"API error: {code}")
+
+ base_dir = os.path.dirname(__file__)
+ arch = platform.machine()
+
+ if arch == 'x86_64':
+     arch_dir = 'x86_64'
+ elif arch in ('aarch64', 'arm64'):
+     arch_dir = 'aarch64'
+ else:
+     raise RuntimeError(f"Unsupported architecture: {arch}")
+
+ lib_paths = [
+     os.path.join(base_dir, arch_dir, lib_name),
+     os.path.join(base_dir, lib_name)
+ ]
+
+ last_error = None
+ diagnostic_shown = set()
+
+ for lib_path in lib_paths:
+     try:
+         print(f"Trying to load: {lib_path}")
+         _lib = ctypes.CDLL(lib_path)
+         print(f"✅ Successfully loaded: {lib_path}")
+         break
+     except OSError as e:
+         last_error = e
+         err_str = str(e)
+         print(f"\n❌ Failed to load: {lib_path}")
+         print(f"   {err_str}")
+
+         # Show each diagnostic tip only once
+         if "GLIBCXX" in err_str and "not found" in err_str:
+             if "missing_glibcxx" not in diagnostic_shown:
+                 diagnostic_shown.add("missing_glibcxx")
+                 print("🔍 Detected missing GLIBCXX version in libstdc++.so.6")
+                 print("💡 This usually happens when your environment (like Conda) uses an older libstdc++")
+                 print("👉 Try running with the system libstdc++ preloaded:")
+                 print(f"   export LD_PRELOAD=/usr/lib/{arch_dir}-linux-gnu/libstdc++.so.6\n")
+         elif "No such file" in err_str:
+             if "file_not_found" not in diagnostic_shown:
+                 diagnostic_shown.add("file_not_found")
+                 print("🔍 File not found. Please verify that libyoloworld.so exists and the path is correct.\n")
+         elif "wrong ELF class" in err_str:
+             if "elf_mismatch" not in diagnostic_shown:
+                 diagnostic_shown.add("elf_mismatch")
+                 print("🔍 ELF class mismatch: likely an architecture conflict (e.g., loading an x86_64 .so on aarch64).")
+                 print(f"👉 Run `file {lib_path}` to verify the binary architecture.\n")
+         else:
+             if "generic_error" not in diagnostic_shown:
+                 diagnostic_shown.add("generic_error")
+                 print("📎 Tip: Use `ldd` to inspect missing dependencies:")
+                 print(f"   ldd {lib_path}\n")
+ else:
+     raise RuntimeError(f"\n❗ Failed to load libyoloworld.so.\nLast error:\n{last_error}")
+
+ # Enum type
+ class AxDeviceType(ctypes.c_int):
+     unknown_device = 0
+     host_device = 1
+     axcl_device = 2
+
+ # Structs mirroring the C API
+ class AxMemInfo(ctypes.Structure):
+     _fields_ = [
+         ('remain', ctypes.c_int),
+         ('total', ctypes.c_int)
+     ]
+
+ class AxHostInfo(ctypes.Structure):
+     _fields_ = [
+         ('available', ctypes.c_char),
+         ('version', ctypes.c_char * 32),
+         ('mem_info', AxMemInfo)
+     ]
+
+ class AxDeviceInfo(ctypes.Structure):
+     _fields_ = [
+         ('temp', ctypes.c_int),
+         ('cpu_usage', ctypes.c_int),
+         ('npu_usage', ctypes.c_int),
+         ('mem_info', AxMemInfo)
+     ]
+
+ class AxDevices(ctypes.Structure):
+     _fields_ = [
+         ('host', AxHostInfo),
+         ('host_version', ctypes.c_char * 32),
+         ('dev_version', ctypes.c_char * 32),
+         ('count', ctypes.c_ubyte),
+         ('devices_info', AxDeviceInfo * 16)
+     ]
+
+ _lib.ax_dev_enum_devices.argtypes = [ctypes.POINTER(AxDevices)]
+ _lib.ax_dev_enum_devices.restype = ctypes.c_int
+
+ _lib.ax_dev_sys_init.argtypes = [AxDeviceType, ctypes.c_char]
+ _lib.ax_dev_sys_init.restype = ctypes.c_int
+
+ _lib.ax_dev_sys_deinit.argtypes = [AxDeviceType, ctypes.c_char]
+ _lib.ax_dev_sys_deinit.restype = ctypes.c_int
+
+ def enum_devices():
+     devices = AxDevices()
+     check_error(_lib.ax_dev_enum_devices(ctypes.byref(devices)))
+
+     return {
+         'host': {
+             'available': bool(devices.host.available[0]),
+             'version': devices.host.version.decode('utf-8'),
+             'mem_info': {
+                 'remain': devices.host.mem_info.remain,
+                 'total': devices.host.mem_info.total
+             }
+         },
+         'devices': {
+             'host_version': devices.host_version.decode('utf-8'),
+             'dev_version': devices.dev_version.decode('utf-8'),
+             'count': devices.count,
+             'devices_info': [{
+                 'temp': dev.temp,
+                 'cpu_usage': dev.cpu_usage,
+                 'npu_usage': dev.npu_usage,
+                 'mem_info': {
+                     'remain': dev.mem_info.remain,
+                     'total': dev.mem_info.total
+                 }
+             } for dev in devices.devices_info[:devices.count]]
+         }
+     }
+
+ def sys_init(dev_type: AxDeviceType = AxDeviceType.axcl_device, devid: int = 0):
+     check_error(_lib.ax_dev_sys_init(dev_type, devid))
+
+ def sys_deinit(dev_type: AxDeviceType = AxDeviceType.axcl_device, devid: int = 0):
+     check_error(_lib.ax_dev_sys_deinit(dev_type, devid))
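The `AxDevices` layout above mirrors a C struct, and `enum_devices()` slices the fixed-size `devices_info` array down to the reported `count`. The same field access works on a locally constructed instance; a standalone sketch with simplified stand-in structures (`MemInfo`, `DeviceInfo`, `Devices` are illustrative names, and no device library is required):

```python
import ctypes

class MemInfo(ctypes.Structure):
    # Same shape as AxMemInfo above: two ints
    _fields_ = [('remain', ctypes.c_int), ('total', ctypes.c_int)]

class DeviceInfo(ctypes.Structure):
    _fields_ = [('temp', ctypes.c_int), ('mem_info', MemInfo)]

class Devices(ctypes.Structure):
    _fields_ = [('count', ctypes.c_ubyte), ('devices_info', DeviceInfo * 16)]

devs = Devices()
devs.count = 1
devs.devices_info[0].temp = 45
devs.devices_info[0].mem_info = MemInfo(remain=3000, total=4096)

# Slice the fixed-size array down to the reported count, as enum_devices() does
info = [{'temp': d.temp, 'remain': d.mem_info.remain} for d in devs.devices_info[:devs.count]]
print(info)  # [{'temp': 45, 'remain': 3000}]
```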
pyyoloworld/pyyoloworld.py ADDED
@@ -0,0 +1,135 @@
+ import ctypes
+ from typing import List
+ import numpy as np
+ from pyaxdev import _lib, AxDeviceType, check_error
+
+ YOLOWORLD_CLASSES_NUM = 4
+ YOLOWORLD_CLASSES_MAX_LEN = 64
+
+ class YWInit(ctypes.Structure):
+     _fields_ = [
+         ('dev_type', AxDeviceType),
+         ('devid', ctypes.c_char),
+         ('text_encoder_path', ctypes.c_char * 128),
+         ('yoloworld_path', ctypes.c_char * 128),
+         ('tokenizer_path', ctypes.c_char * 128),
+         ('threshold', ctypes.c_float)
+     ]
+
+ class YWClasses(ctypes.Structure):
+     _fields_ = [
+         ("classes", ctypes.c_char * YOLOWORLD_CLASSES_MAX_LEN * YOLOWORLD_CLASSES_NUM),
+     ]
+
+ class YWImage(ctypes.Structure):
+     _fields_ = [
+         ('data', ctypes.POINTER(ctypes.c_ubyte)),
+         ('width', ctypes.c_int),
+         ('height', ctypes.c_int),
+         ('channels', ctypes.c_int),
+         ('stride', ctypes.c_int)
+     ]
+
+ class YWObject(ctypes.Structure):
+     _fields_ = [
+         ('label', ctypes.c_int),
+         ('score', ctypes.c_float),
+         ('x', ctypes.c_int),
+         ('y', ctypes.c_int),
+         ('w', ctypes.c_int),
+         ('h', ctypes.c_int),
+     ]
+
+ class YWObjects(ctypes.Structure):
+     _fields_ = [
+         ('objects', YWObject * 32),
+         ('num', ctypes.c_int),
+     ]
+
+ _lib.yw_create.argtypes = [ctypes.POINTER(YWInit), ctypes.POINTER(ctypes.c_void_p)]
+ _lib.yw_create.restype = ctypes.c_int
+
+ _lib.yw_destroy.argtypes = [ctypes.c_void_p]
+ _lib.yw_destroy.restype = ctypes.c_int
+
+ _lib.yw_set_classes.argtypes = [ctypes.c_void_p, ctypes.POINTER(YWClasses)]
+ _lib.yw_set_classes.restype = ctypes.c_int
+
+ _lib.yw_set_threshold.argtypes = [ctypes.c_void_p, ctypes.c_float]
+ _lib.yw_set_threshold.restype = ctypes.c_int
+
+ _lib.yw_detect.argtypes = [ctypes.c_void_p, ctypes.POINTER(YWImage), ctypes.POINTER(YWObjects)]
+ _lib.yw_detect.restype = ctypes.c_int
+
+ class YOLOWORLD:
+     def __init__(self, init_info: dict):
+         self.handle = None
+         self.init_info = YWInit()
+
+         # Initialization parameters
+         self.init_info.dev_type = init_info.get('dev_type', AxDeviceType.axcl_device)
+         self.init_info.devid = init_info.get('devid', 0)
+         self.init_info.threshold = init_info.get('threshold', 0.1)
+
+         # Model and tokenizer paths
+         for path_name in ['text_encoder_path', 'yoloworld_path', 'tokenizer_path']:
+             if path_name in init_info:
+                 setattr(self.init_info, path_name, init_info[path_name].encode('utf-8'))
+
+         # Create the native YOLOWORLD instance
+         handle = ctypes.c_void_p()
+         check_error(_lib.yw_create(ctypes.byref(self.init_info), ctypes.byref(handle)))
+         self.handle = handle
+
+     def __del__(self):
+         if self.handle:
+             _lib.yw_destroy(self.handle)
+
+     def set_classes(self, class_list: List[str]):
+         yw_classes = YWClasses()
+         for i, name in enumerate(class_list):
+             if i >= YOLOWORLD_CLASSES_NUM:
+                 break
+             name_bytes = name.encode("utf-8")
+             if len(name_bytes) >= YOLOWORLD_CLASSES_MAX_LEN:
+                 raise ValueError(f"Class name '{name}' too long (max {YOLOWORLD_CLASSES_MAX_LEN - 1})")
+             # Zero the whole row (optional; fields default to 0)
+             for j in range(YOLOWORLD_CLASSES_MAX_LEN):
+                 yw_classes.classes[i][j] = 0
+             # Copy the string bytes
+             for j in range(len(name_bytes)):
+                 yw_classes.classes[i][j] = name_bytes[j]
+
+         check_error(_lib.yw_set_classes(self.handle, ctypes.byref(yw_classes)))
+
+     def set_threshold(self, threshold: float):
+         check_error(_lib.yw_set_threshold(self.handle, threshold))
+
+     def detect(self, image_data: np.ndarray) -> List[dict]:
+         image = YWImage()
+         image.data = ctypes.cast(image_data.ctypes.data, ctypes.POINTER(ctypes.c_ubyte))
+         image.width = image_data.shape[1]
+         image.height = image_data.shape[0]
+         image.channels = image_data.shape[2]
+         image.stride = image_data.shape[1] * image_data.shape[2]
+
+         objects = YWObjects()
+         check_error(_lib.yw_detect(self.handle, ctypes.byref(image), ctypes.byref(objects)))
+
+         ret = []
+         for i in range(objects.num):
+             ret.append({
+                 'label': objects.objects[i].label,
+                 'score': objects.objects[i].score,
+                 'x': objects.objects[i].x,
+                 'y': objects.objects[i].y,
+                 'w': objects.objects[i].w,
+                 'h': objects.objects[i].h,
+             })
+         return ret
+
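`set_classes` above packs each class name byte-by-byte into a fixed `c_char * 64 * 4` matrix before handing it to the native library. The packing itself can be exercised without the `.so`; a standalone sketch (`Classes` and `pack` are illustrative stand-ins for `YWClasses` and the copy loop):

```python
import ctypes

NUM, MAXLEN = 4, 64

class Classes(ctypes.Structure):
    # Same shape as YWClasses above: 4 rows of 64 chars
    _fields_ = [("classes", ctypes.c_char * MAXLEN * NUM)]

def pack(names):
    c = Classes()  # rows start zero-filled, so strings are NUL-terminated
    for i, name in enumerate(names[:NUM]):
        data = name.encode("utf-8")
        if len(data) >= MAXLEN:
            raise ValueError(f"class name too long: {name}")
        for j in range(len(data)):
            c.classes[i][j] = data[j:j + 1]  # assign one byte at a time
    return c

c = pack(["person", "dog"])
print(c.classes[0].value)  # b'person'
```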
pyyoloworld/requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio
+ opencv-python
+ tqdm
+ Pillow
result.png ADDED

Git LFS Details

  • SHA256: 536a1e0c395db4050a9943fccea990f68383a2ac81b906a5d63ecf62a808fb98
  • Pointer size: 131 Bytes
  • Size of remote file: 583 kB
vocab.txt ADDED
The diff for this file is too large to render. See raw diff