ligeng-zhu-nv committed on
Commit f5bc2e9 · verified · 1 Parent(s): 85875a0

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +36 -21
README.md CHANGED
@@ -66,32 +66,47 @@ pip install -U huggingface_hub
  huggingface-cli download nvidia/EGM-8B --local-dir ./models/EGM-8B
  ```

- ### Evaluation
+ ### Inference with SGLang

- ```bash
- pip install sglang==0.5.5
+ Launch the server:

- export BASE_DIR=$(pwd)
- export MODEL_PATH="${BASE_DIR}/models/EGM-8B"
- export DATA_JSON="${BASE_DIR}/data/EGM_Datasets/metadata/eval/refcoco+_testA.jsonl"
- export OUTPUT_DIR="${BASE_DIR}/result/"
- export BASE_IMG_DIR="${BASE_DIR}"
+ ```bash
+ pip install "sglang[all]>=0.5.5"

- cd verl
- bash scripts/sglang_infer.sh
+ python -m sglang.launch_server \
+     --model-path nvidia/EGM-8B \
+     --chat-template=qwen3-vl \
+     --port 30000
  ```

- vLLM is also supported:
-
- ```bash
- export BASE_DIR=$(pwd)
- export MODEL_PATH="${BASE_DIR}/models/EGM-8B"
- export DATA_JSON="${BASE_DIR}/data/EGM_Datasets/metadata/eval/refcoco+_testA.jsonl"
- export OUTPUT_DIR="${BASE_DIR}/result/"
- export BASE_IMG_DIR="${BASE_DIR}"
-
- cd verl
- bash scripts/vllm_infer.sh
+ Send a visual grounding request:
+
+ ```python
+ import openai
+ import base64
+
+ client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
+
+ # Load a local image as base64
+ with open("example.jpg", "rb") as f:
+     image_base64 = base64.b64encode(f.read()).decode("utf-8")
+
+ response = client.chat.completions.create(
+     model="nvidia/EGM-8B",
+     messages=[
+         {
+             "role": "user",
+             "content": [
+                 {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"}},
+                 {"type": "text", "text": "Please provide the bounding box coordinate of the region this sentence describes: the person on the left."},
+             ],
+         }
+     ],
+     temperature=0.6,
+     top_p=0.95,
+     max_tokens=8192,
+ )
+ print(response.choices[0].message.content)
  ```

  ## Model Architecture
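
Note on the new request example: it hardcodes `image/jpeg` in the data URL and prints the raw reply. Below is a minimal sketch of two helpers that could wrap that call. Both `to_data_url` and `extract_boxes` are hypothetical names, not part of the EGM repository or of SGLang. The bracketed four-number box format is also an assumption, since the README does not state how EGM-8B formats coordinates, so the regular expression may need adjusting to the real output.

```python
import base64
import mimetypes
import re
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode a local image as a base64 data URL.

    The MIME type is guessed from the file extension, so PNG or WebP
    inputs are not mislabeled as JPEG.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    payload = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{payload}"


# ASSUMPTION: a box appears in the reply as four comma-separated numbers
# inside square brackets, e.g. "[12, 34, 56, 78]". This format is not
# confirmed by the README.
_NUM = r"-?\d+(?:\.\d+)?"
_BOX = re.compile(
    r"\[\s*" + r"\s*,\s*".join(f"({_NUM})" for _ in range(4)) + r"\s*\]"
)


def extract_boxes(text: str) -> list[tuple[float, ...]]:
    """Return every four-number bracketed group found in `text` as floats."""
    return [tuple(float(v) for v in m.groups()) for m in _BOX.finditer(text)]
```

With these, the `image_url` entry could be built as `{"url": to_data_url("example.jpg")}`, and `extract_boxes(response.choices[0].message.content)` would return an empty list when the reply contains no box in the assumed format.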