# Hierarchical System ROS2

## 0- Preface

### What we have

- The paths and names of the main logic (Python) folders and files are as follows:

```
src/
   └── g0_vlm_node/
        └── g0_vlm_node
            ├── utils/                  # Stores functions related to Gemini API processing
            └── vlm_main.py             # Core logic for VLM service provision
```
- Note: In the above package, `vlm_main.py` provides the `vlm_main` executable entry point used by `ros2 run`.

### Development Log

- VLM
    1. Format incoming strings: the JSON string sent by EHI is converted into a structured string.
    2. Support a cache switch for handling repeated instructions from EHI.
    3. Support parameterized startup: `--use-qwen` and `--no-use-qwen` control which model is used, with Gemini as the default.
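The `--use-qwen`/`--no-use-qwen` flag pair described in item 3 can be produced with `argparse.BooleanOptionalAction` (Python 3.9+), which generates both forms from a single declaration. A minimal sketch, assuming a hypothetical `parse_model_args` helper (the real `vlm_main.py` may wire this differently):

```python
import argparse

def parse_model_args(argv=None):
    """Hypothetical sketch of the model-selection flags; not the actual vlm_main code."""
    parser = argparse.ArgumentParser(description="VLM node model selection")
    # BooleanOptionalAction auto-generates both --use-qwen and --no-use-qwen
    parser.add_argument(
        "--use-qwen",
        action=argparse.BooleanOptionalAction,
        default=False,  # Gemini is the default backend
        help="Use the Qwen model instead of Gemini",
    )
    return parser.parse_args(argv)

args = parse_model_args(["--use-qwen"])
# args.use_qwen is True; with no flags (or --no-use-qwen) it stays False, i.e. Gemini
```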



## 1- Install

1. Install Python dependencies

Refer to https://github.com/whitbrunn/G0

2. Compile the workspace

Clone the `src/` folder into your local workspace at `TO/YOUR/WORKSPACE/`, then run:

```
cd TO/YOUR/WORKSPACE/
colcon build --symlink-install --cmake-args -DPython3_ROOT_DIR=$CONDA_PREFIX
```

Note:

Use `ros2 pkg list | grep PACK_NAME` to check that the following ROS package exists:
- `g0_vlm_node`

## 2- Usage

1. Set your VLM API key

```
export VLM_API_KEY=<YOUR_GEMINI_API_KEY> 
export VLM_API_KEY_QWEN=<YOUR_QWEN_API_KEY> 
```
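Inside the node, these environment variables have to be read back and matched to the selected backend. A hypothetical sketch of that lookup (the helper name and error handling are assumptions, not the actual `vlm_main.py` code):

```python
import os

def resolve_api_key(use_qwen: bool) -> str:
    """Pick the API key for the chosen backend from the variables exported above.
    Hypothetical helper; the real node may read these differently."""
    var = "VLM_API_KEY_QWEN" if use_qwen else "VLM_API_KEY"
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the node")
    return key
```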

2. Start the VLM Node

2.1 First, configure the proxy for your environment (required for Gemini; if using the Qwen version, skip to 2.3)

```
export https_proxy=http://127.0.0.1:<PORT>
export http_proxy=http://127.0.0.1:<PORT>
export all_proxy=http://127.0.0.1:<PORT>
```

2.2 Verify that the external network is reachable

```
curl -I www.google.com
```

Expected output (partial):

```
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Cache-Control: private
Connection: keep-alive
```

2.3 Once the steps above succeed, start the VLM node

```
ros2 run g0_vlm_node vlm_main
```

If using the Qwen model for inference instead, clear the proxy first:

```
unset http_proxy
unset https_proxy
unset all_proxy
ros2 run g0_vlm_node vlm_main -- --use-qwen
```


## 3- What you expect

- When the VLM receives a Send request, it logs output such as:

```
2025-11-05 07:40:33.230 | INFO     | g0_vlm_node.vlm_main:vlm_processor1:153 - One hp successfully processed: 将咖啡罐用右手放到托盘上 -> [Low]: Pick up the coffee can with the right hand and place it on the tray.!
```

- When the VLM receives a confirm request, it logs output such as:

```
2025-11-05 07:40:47.641 | INFO     | g0_vlm_node.vlm_main:vlm_processor2:169 - One hp_ successfully sent to VLA: [Low]: Pick up the coffee can with the right hand and place it on the tray.!
```
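The `[Low]: ...` strings in these logs come from the JSON-to-structured-string formatting described in the development log. A hypothetical sketch of that conversion (the real EHI schema is not documented here, so the `level` and `instruction` keys are assumptions):

```python
import json

def format_instruction(raw: str) -> str:
    """Convert an EHI-style JSON string into the '[Level]: text' form seen in the logs.
    The key names are assumed, not the actual EHI message schema."""
    data = json.loads(raw)
    return f"[{data['level']}]: {data['instruction']}"

msg = '{"level": "Low", "instruction": "Pick up the coffee can with the right hand and place it on the tray."}'
print(format_instruction(msg))
# -> [Low]: Pick up the coffee can with the right hand and place it on the tray.
```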