LIU1712 committed on
Commit 4de7dd6 · verified · 1 Parent(s): 52b71bc

Upload 8 files

README.md CHANGED
@@ -1,3 +1,234 @@
1
  ---
2
  license: apache-2.0
3
  ---
1
  ---
2
  license: apache-2.0
3
+ library_name: pytorch
4
+ tags:
5
+ - pytorch
6
+ - computer-vision
7
+ - multimodal
8
+ - chart-understanding
9
+ - data-extraction
10
+ - summarization
11
+ - cvpr-2026
12
  ---
13
+
14
+ <a id="top"></a>
15
+ <div align="center">
16
+ <h1>🚀 ChartLens @ CVPR 2026 DataMFM Chart Understanding Challenge</h1>
17
+
18
+ <p>
19
+ <b>Hao Liu</b><sup>1</sup>&nbsp;
20
+ <b>Ruping Cao</b><sup>1</sup>&nbsp;
21
+ <b>Kun Wang</b><sup>1</sup>&nbsp;
22
+ <b>Zhiran Li</b><sup>1</sup>&nbsp;
23
+ <b>Fan Liu</b><sup>2</sup>&nbsp;
24
+ <b>Yupeng Hu</b><sup>1</sup>&nbsp;
25
+ <b>Liqiang Nie</b><sup>3</sup>
26
+ </p>
27
+
28
+ <p>
29
+ <sup>1</sup>Shandong University<br>
30
+ <sup>2</sup>Southeast University<br>
31
+ <sup>3</sup>Harbin Institute of Technology (Shenzhen)
32
+ </p>
33
+ </div>
34
+
35
+ This repository hosts the official implementation resources, model weights, and prediction files for **ChartLens**, the first-place solution to **DataMFM Challenge Track 2: Chart Understanding** at CVPR 2026.
36
+
37
+ 🔗 **Paper:** [arXiv](https://arxiv.org/pdf/2606.10640)
38
+ 🔗 **GitHub Repository:** [iLearnLab/CVPRW26-ChartLens](https://github.com/iLearnLab/CVPRW26-ChartLens)
39
+ 🔗 **Challenge Page:** [DataMFM Challenge](https://datamfm.github.io/challenge.html)
40
+
41
+ ---
42
+
43
+ ## 📌 Model Information
44
+
45
+ ### 1. Model Name
46
+ **ChartLens: A Dual-Branch Framework for Chart Data Correction and Factual Summary Refinement**
47
+
48
+ ### 2. Task Type & Applicable Tasks
49
+ - **Task Type:** Chart Understanding / Multimodal Document Understanding
50
+ - **Applicable Tasks:** Chart-to-CSV extraction and chart-to-summary generation from chart images.
51
+
52
+ ### 3. Project Introduction
53
+ Chart understanding requires models to recover structured chart data and generate faithful natural-language summaries from chart images. **ChartLens** addresses these complementary goals with a dual-branch, verification-guided correction framework.
54
+
55
+ > 💡 **Method Highlight:** ChartLens combines Granite-Vision-4.1-4B LoRA adaptation with two correction branches: **Structure-Aware CSV Verification and Correction (SAVC)** for reliable table recovery, and **Text-Retention-Guided Summary Refinement (TRSR)** for OCR-assisted factual summary repair. SAVC checks structure, completeness, and numerical accuracy, while TRSR preserves visible chart text such as titles, legends, annotations, sources, and numerical evidence.
56
+
57
+ ### 4. Training Data Source
58
+ - Released ChartNet-based training data for LoRA adaptation.
59
+ - DataMFM Challenge chart understanding splits, including `real` and `synthetic` chart images.
60
+
61
+ ### 5. Challenge Results
62
+
63
+ | Method | CSV Numeric F1 | CSV Structural Score | Summary ROUGE-L | Summary Numeric Fact F1 | Overall |
64
+ |--------|---------------:|---------------------:|----------------:|------------------------:|--------:|
65
+ | **ChartLens (Ours)** | **80.62** | **75.66** | **45.57** | **74.55** | **69.10** |
66
+
67
+ ChartLens ranked **1st** in DataMFM Challenge Track 2.
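As a quick sanity check on the table above, the reported Overall score appears consistent with the unweighted mean of the four metric columns. This is an observation from the published numbers, not a statement of the official challenge formula, and the dictionary keys below are illustrative names rather than identifiers from the challenge toolkit.

```python
# Reported ChartLens scores, copied from the results table.
scores = {
    "csv_numeric_f1": 80.62,
    "csv_structural_score": 75.66,
    "summary_rouge_l": 45.57,
    "summary_numeric_fact_f1": 74.55,
}

# Assumption: Overall is the plain average of the four metrics.
overall = round(sum(scores.values()) / len(scores), 2)
print(f"{overall:.2f}")
```

If the organizers use a weighted formula, the agreement here would be coincidental, so treat this only as a consistency check.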
68
+
69
+ ---
70
+
71
+ ## 🚀 Usage & Basic Inference
72
+
73
+ ### Step 1: Prepare the Environment
74
+
75
+ Clone the GitHub repository and set up the Conda environment:
76
+
77
+ ```bash
78
+ git clone https://github.com/iLearnLab/CVPRW26-ChartLens.git
79
+ cd CVPRW26-ChartLens
80
+ ```
81
+
82
+ ```bash
83
+ conda create -n chartlens python=3.10 -y
84
+ conda activate chartlens
85
+ pip install -r requirements.txt
86
+ ```
87
+
88
+ ### Step 2: Data & Weights Preparation
89
+
90
+ 1. **Challenge Data:** Use the datasets and splits released by the [DataMFM Challenge](https://datamfm.github.io/challenge.html). The chart understanding track contains `real` and `synthetic` splits.
91
+ 2. **ChartLens Checkpoints:** Download the model weights from this Hugging Face repository.
92
+ 3. **Granite Vision Backbone:** Prepare a local copy of the Granite-Vision-4.1-4B backbone and pass its path via `--model_path` when running inference.
93
+
94
+ To prepare ChartNet SFT data for LoRA training:
95
+
96
+ ```bash
97
+ python code/load_chartnet_500.py \
98
+ --out_dir Fine-tuning/Dataset/raw \
99
+ --num_samples 500
100
+
101
+ python code/build_chartnet_sft.py \
102
+ --gt_path Fine-tuning/Dataset/raw/gt.jsonl \
103
+ --image_dir Fine-tuning/Dataset/raw/images \
104
+ --out_dir Fine-tuning/Dataset/sft \
105
+ --csv_repeat 2 \
106
+ --summary_repeat 1
107
+ ```
108
+
109
+ ### Step 3: Run Granite Vision + LoRA Inference
110
+
111
+ ```bash
112
+ python code/infer_granite_with_lora.py \
113
+ --image_root /path/to/data \
114
+ --out_root /path/to/output \
115
+ --model_path /path/to/granite-vision-4.1-4b \
116
+ --lora_path /path/to/chartlens_lora \
117
+ --gpu_id 0 \
118
+ --splits real synthetic
119
+ ```
120
+
121
+ Use `code/infer_chartnet_granite.py` for base Granite Vision inference without a LoRA adapter.
122
+
123
+ ### Step 4: SAVC CSV Correction
124
+
125
+ ```bash
126
+ export OPENAI_API_KEY="..."
127
+
128
+ python code/calibrate_baseline_with_ai.py \
129
+ --split all \
130
+ --baseline_root /path/to/baseline_predictions \
131
+ --image_root /path/to/data \
132
+ --output_root /path/to/savc_output \
133
+ --base_url "https://your-openai-compatible-endpoint" \
134
+ --model gemini-3.5-flash \
135
+ --threshold 85
136
+ ```
137
+
138
+ `--baseline_root` should contain split directories such as `real/` and `synthetic/`, each with `chart2csv_predictions.jsonl` and `chart2summary_predictions.jsonl`.
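Before launching an API-backed correction run, it can help to confirm that the baseline directory has the layout described above. The following is a minimal sketch, not part of the repository; the helper name `missing_prediction_files` is hypothetical, and only the split and file names are taken from the README.

```python
from pathlib import Path

SPLITS = ("real", "synthetic")
PREDICTION_FILES = (
    "chart2csv_predictions.jsonl",
    "chart2summary_predictions.jsonl",
)


def missing_prediction_files(baseline_root, splits=SPLITS):
    """Return the expected prediction files that are absent under baseline_root."""
    root = Path(baseline_root)
    return [
        root / split / name
        for split in splits
        for name in PREDICTION_FILES
        if not (root / split / name).is_file()
    ]
```

An empty list from `missing_prediction_files("/path/to/baseline_predictions")` suggests the directory matches the expected layout.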
139
+
140
+ ### Step 5: TRSR Summary Refinement
141
+
142
+ ```bash
143
+ python code/ocr.py \
144
+ --real_images /path/to/data/real/images \
145
+ --synthetic_images /path/to/data/synthetic/images \
146
+ --real_summary /path/to/baseline/real/chart2summary_predictions.jsonl \
147
+ --synthetic_summary /path/to/baseline/synthetic/chart2summary_predictions.jsonl \
148
+ --output_dir /path/to/ocr_text_copy_coverage \
149
+ --threshold 0.8
150
+
151
+ export AIGCBEST_API_KEY="..."
152
+
153
+ python code/repair_summary.py \
154
+ --split all \
155
+ --workers 20 \
156
+ --ocr_eval_root /path/to/ocr_text_copy_coverage \
157
+ --output_root /path/to/trsr_output
158
+ ```
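To make the idea behind `--threshold 0.8` concrete, here is a minimal sketch of one possible text-retention check: the share of distinct OCR tokens that reappear in the summary. This is an illustrative assumption, not the logic of `code/ocr.py`; the function names, the tokenization, and the reading of the threshold as a minimum coverage are all hypothetical.

```python
import re


def _tokens(text):
    """Lowercased word and number tokens (decimals such as 12.5 stay whole)."""
    return set(re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower()))


def text_retention(ocr_text, summary):
    """Fraction of distinct OCR tokens that also occur in the summary."""
    ocr_tokens = _tokens(ocr_text)
    if not ocr_tokens:
        return 1.0  # nothing visible to retain
    return len(ocr_tokens & _tokens(summary)) / len(ocr_tokens)


def needs_repair(ocr_text, summary, threshold=0.8):
    """Hypothetical rule: flag summaries whose coverage falls below the threshold."""
    return text_retention(ocr_text, summary) < threshold
```

Under this reading, a summary that drops the chart title or source line would score low and be routed to the repair step, while one that already copies the visible text would be left untouched.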
159
+
160
+ ### Step 6: Training (Optional)
161
+
162
+ Train the LoRA adapter on the prepared ChartNet SFT data:
163
+
164
+ ```bash
165
+ python code/train_lora_chartnet.py \
166
+ --model_path /path/to/granite-vision-4.1-4b \
167
+ --train_jsonl Fine-tuning/Dataset/sft/train.jsonl \
168
+ --val_jsonl Fine-tuning/Dataset/sft/val.jsonl \
169
+ --output_dir Fine-tuning/FT/model/granite_chartnet_lora_bs2 \
170
+ --gpu_id 0 \
171
+ --epochs 2 \
172
+ --batch_size 1 \
173
+ --grad_accum 8
174
+ ```
175
+
176
+ ---
177
+
178
+ ## 📦 Submission Format
179
+
180
+ For DataMFM Track 2, organize the final predictions as:
181
+
182
+ ```text
183
+ submission.zip
184
+ ├── real/
185
+ │   ├── chart2csv_predictions.jsonl
186
+ │   └── chart2summary_predictions.jsonl
187
+ └── synthetic/
188
+     ├── chart2csv_predictions.jsonl
189
+     └── chart2summary_predictions.jsonl
190
+ ```
191
+
192
+ Each CSV prediction line:
193
+
194
+ ```json
195
+ {"imagename": "example.png", "predicted_csv": "Header A,Header B\nA,1\nB,2"}
196
+ ```
197
+
198
+ Each summary prediction line:
199
+
200
+ ```json
201
+ {"imagename": "example.png", "predicted_summary": "One paragraph summary grounded in the chart."}
202
+ ```
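The two record shapes above can be produced with the standard `json` module. The sketch below is an assumption about how one might write the files, not code from the repository; `write_predictions` is a hypothetical helper, and only the keys `imagename`, `predicted_csv`, and `predicted_summary` come from the README.

```python
import json


def write_predictions(path, rows, field):
    """Write one JSON object per line, with keys `imagename` and `field`."""
    with open(path, "w", encoding="utf-8") as f:
        for imagename, value in rows:
            record = {"imagename": imagename, field: value}
            # json.dumps escapes the newlines inside a CSV string,
            # so each record stays on a single line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

For example, `write_predictions("chart2csv_predictions.jsonl", [("example.png", "Header A,Header B\nA,1\nB,2")], "predicted_csv")` yields a one-line record equivalent to the CSV example above.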
203
+
204
+ ---
205
+
206
+ ## ⚠️ Limitations & Notes
207
+
208
+ **Disclaimer:** This framework and its model weights are intended for **academic research purposes only**.
209
+
210
+ - Chart-to-CSV extraction may still struggle with dense layouts, asymmetric legends, or adjacent semantic-column misalignment.
211
+ - Summary refinement depends on OCR quality; OCR errors can affect text-retention scoring and repair decisions.
212
+ - GPU execution is expected for Granite Vision inference and LoRA training.
213
+ - API-backed correction scripts require valid credentials and an OpenAI-compatible endpoint.
214
+
215
+ ---
216
+
217
+ ## 🤝 Acknowledgements & Contact
218
+
219
+ - **Contact:** If you have any questions or encounter issues, feel free to contact Hao Liu at liuh90210@gmail.com or Ruping Cao at caoruping657@gmail.com.
220
+
221
+ ---
222
+
223
+ ## 📝⭐️ Citation
224
+
225
+ If you find this project useful for your research, please consider citing:
226
+
227
+ ```bibtex
228
+ @article{liu2026chartlens,
229
+ title={ChartLens: A Dual-Branch Framework for Chart Data Correction and Factual Summary Refinement},
230
+ author={Liu, Hao and Cao, Ruping and Wang, Kun and Li, Zhiran and Liu, Fan and Hu, Yupeng and Nie, Liqiang},
231
+ journal={arXiv preprint arXiv:2606.10640},
232
+ year={2026}
233
+ }
234
+ ```
adapter_config.json ADDED
@@ -0,0 +1,56 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "/data2/caoruping/DataMFM/models/granite-vision-4.1-4b",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 32,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.05,
22
+ "lora_ga_config": null,
23
+ "megatron_config": null,
24
+ "megatron_core": "megatron.core",
25
+ "modules_to_save": null,
26
+ "peft_type": "LORA",
27
+ "peft_version": "0.19.1",
28
+ "qalora_group_size": 16,
29
+ "r": 16,
30
+ "rank_pattern": {},
31
+ "revision": null,
32
+ "target_modules": [
33
+ "down_proj",
34
+ "query",
35
+ "out_linear",
36
+ "up_proj",
37
+ "gate_proj",
38
+ "key",
39
+ "fc2",
40
+ "dense",
41
+ "out_proj",
42
+ "q_proj",
43
+ "o_proj",
44
+ "v_proj",
45
+ "value",
46
+ "k_proj",
47
+ "fc1"
48
+ ],
49
+ "target_parameters": null,
50
+ "task_type": "CAUSAL_LM",
51
+ "trainable_token_indices": null,
52
+ "use_bdlora": null,
53
+ "use_dora": false,
54
+ "use_qalora": false,
55
+ "use_rslora": false
56
+ }
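For readers unfamiliar with the `r` and `lora_alpha` fields, the sketch below shows how they are commonly combined into the factor that scales the low-rank update. It follows the usual LoRA convention (`lora_alpha / r`, or `lora_alpha / sqrt(r)` for rank-stabilized LoRA) and is not extracted from this repository; `lora_scaling` is a hypothetical helper.

```python
import math


def lora_scaling(config):
    """Scale applied to the low-rank update for a LoRA-style config dict."""
    r, alpha = config["r"], config["lora_alpha"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(r)  # rank-stabilized LoRA
    return alpha / r  # standard LoRA


# Values copied from adapter_config.json above.
adapter_config = {"r": 16, "lora_alpha": 32, "use_rslora": False}
print(lora_scaling(adapter_config))  # → 2.0
```

Note also that `base_model_name_or_path` records an absolute path from the training machine, so loading the adapter elsewhere will likely require supplying the backbone location explicitly, for example through the `--model_path` argument shown in the README.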
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fb26025f6a6e8fbcb91583aae8819aa2f7ca8a002f61bd784fd020262694507a
3
+ size 175974064
chat_template.jinja ADDED
@@ -0,0 +1,180 @@
1
+ {#- ===== Task tag prompt constants ===== -#}
2
+ {%- set chart2code_prompt = "Generate code that recreates the chart as best as possible." -%}
3
+ {%- set chart2csv_prompt = "Please examine this chart image. Consider you are a data visualization expert, and extract the data into a CSV table.\n\nYour CSV should:\n- Include a header row with clear column names\n- Represent all data series/categories shown in the chart\n- Use numeric values that match the chart as closely as possible\n\nOutput only the CSV data, nothing else." -%}
4
+ {%- set chart2summary_prompt = "Can you describe this chart image?" -%}
5
+ {%- set tables_json_prompt = "Identify and extract the table schema\n Extract the schema of all the tables in the image sorted according to the reading order.\nThe output must be a valid JSON object containing a list of dictionaries with the following structure:\n\n {\n \"dimensions\": {\n \"rows\": <number of data rows (excluding header rows)>,\n \"columns\": <number of columns>,\n \"header_rows\": <number of header rows>,\n \"total_rows\": <total number of rows including headers>\n },\n \"cells\": [\n {\n \"row\": <row index starting at 1>,\n \"col\": <column index starting at 1>,\n \"colspan\": <number of columns spanned>,\n \"rowspan\": <number of rows spanned>,\n \"type\": \"<'header' or 'data'>\",\n \"header_level\": <header nesting level if type=header, else omit or null>,\n \"content\": \"<string content of the cell>\"\n },\n ...\n ]\n }" -%}
6
+ {%- set tables_html_prompt = "Identify and extract the table schema\n Extract the schema of all the tables in the image sorted according to the reading order.\nThe output must be a list of valid HTML tables" -%}
7
+ {%- set tables_otsl_prompt = "Identify and extract the table schema\n Extract the schema of all the tables in the image sorted according to the reading order.\nThe output must be a list of valid OTSL objects, each consists of the following fields: \n <fcel> - a cell with content in it\n <ecel> - an empty cell\n <lcel> - a cell that is merged with the cell to its left\n <ucel> - a cell that is merged with the cell above it\n <xcel> - a cell that is merged with both the cell above it and the cell to its left\n <nl> - a new line\n <ched> - a column header\n <otsl> - the beginning of the OTSL table\n </otsl> - the end of the OTSL table\n\n An example for an output:\n [\n <otsl><ched>first table header1<ched>first table header2<nl><fcel>data1<fcel>data2<nl><fcel>data with horizontal span<lcel><nl><fcel>data with vertical span<ecel><nl><ucel><fcel>data3<nl></otsl>,\n <otsl><ched>second table header1<ched>second table header2<nl><fcel>data1<fcel>data2<nl><fcel>data with horizontal span<lcel><nl><fcel>data with vertical span<ecel><nl><ucel><fcel>data3<nl></otsl>\n ]" -%}
8
+
9
+
10
+ {#- ===== Tag expansion dispatcher ===== -#}
11
+ {%- macro expand_tags(text) -%}
12
+ {%- set has_image = "<image>" in text -%}
13
+ {#- Determine image position: prefix if <image> appears before the tag, suffix if after -#}
14
+ {%- if has_image -%}
15
+ {%- set img_idx = text.index("<image>") -%}
16
+ {%- if "<chart2code>" in text -%}{%- set tag_idx = text.index("<chart2code>") -%}
17
+ {%- elif "<chart2csv>" in text -%}{%- set tag_idx = text.index("<chart2csv>") -%}
18
+ {%- elif "<chart2summary>" in text -%}{%- set tag_idx = text.index("<chart2summary>") -%}
19
+ {%- elif "<tables_json>" in text -%}{%- set tag_idx = text.index("<tables_json>") -%}
20
+ {%- elif "<tables_html>" in text -%}{%- set tag_idx = text.index("<tables_html>") -%}
21
+ {%- elif "<tables_otsl>" in text -%}{%- set tag_idx = text.index("<tables_otsl>") -%}
22
+ {%- else -%}{%- set tag_idx = 999999 -%}
23
+ {%- endif -%}
24
+ {%- set img_prefix = "<image>\n" if img_idx < tag_idx else "" -%}
25
+ {%- set img_suffix = "<image>\n" if img_idx >= tag_idx else "" -%}
26
+ {%- else -%}
27
+ {%- set img_prefix = "" -%}
28
+ {%- set img_suffix = "" -%}
29
+ {%- endif -%}
30
+ {%- if "<chart2code>" in text -%}
31
+ {{- img_prefix + chart2code_prompt + img_suffix -}}
32
+ {%- elif "<chart2csv>" in text -%}
33
+ {{- img_prefix + chart2csv_prompt + img_suffix -}}
34
+ {%- elif "<chart2summary>" in text -%}
35
+ {{- img_prefix + chart2summary_prompt + img_suffix -}}
36
+ {%- elif "<tables_json>" in text -%}
37
+ {{- img_prefix + tables_json_prompt + img_suffix -}}
38
+ {%- elif "<tables_html>" in text -%}
39
+ {{- img_prefix + tables_html_prompt + img_suffix -}}
40
+ {%- elif "<tables_otsl>" in text -%}
41
+ {{- img_prefix + tables_otsl_prompt + img_suffix -}}
42
+ {%- else -%}
43
+ {{- text -}}
44
+ {%- endif -%}
45
+ {%- endmacro -%}
46
+
47
+ {#- ===== Original chat template ===== -#}
48
+ {% macro render_content(x) %}
49
+ {%- if x is string %}
50
+ {{ x }}
51
+ {%- else %}
52
+ {%- for chunk in x %}
53
+ {%- if chunk['type'] == 'text' -%}
54
+ {{ chunk['text']}}
55
+ {%- elif chunk['type'] == 'image' -%}
56
+ {{- "<image>
57
+ " }}
58
+ {%- endif -%}
59
+ {%- endfor -%}
60
+ {%- endif -%}
61
+ {% endmacro %}
62
+
63
+ {%- set tools_system_message_prefix = 'You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>' %}
64
+ {%- set tools_system_message_suffix = '\n</tools>\n\nFor each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.' %}
65
+ {%- set documents_system_message_prefix = 'You are a helpful assistant with access to the following documents. You may use one or more documents to assist with the user query.\n\nYou are given a list of documents within <documents></documents> XML tags:\n<documents>' %}
66
+ {%- set documents_system_message_suffix = '\n</documents>\n\nWrite the response to the user\'s input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.' %}
67
+ {%- set g4_default_system_message = 'You are a helpful assistant. Please ensure responses are professional, accurate, and safe.' %}
68
+ {%- if available_tools is defined and available_tools %}
69
+ {%- set tools = available_tools %}
70
+ {%- endif %}
71
+ {%- set ns = namespace(tools_system_message=tools_system_message_prefix,
72
+ documents_system_message=documents_system_message_prefix,
73
+ default_system_message=g4_default_system_message,
74
+ system_message=''
75
+ ) %}
76
+ {%- if tools %}
77
+ {%- for tool in tools %}
78
+ {%- set ns.tools_system_message = ns.tools_system_message + '\n' + (tool | tojson) %}
79
+ {%- endfor %}
80
+ {%- set ns.tools_system_message = ns.tools_system_message + tools_system_message_suffix %}
81
+ {%- else %}
82
+ {%- set ns.tools_system_message = '' %}
83
+ {%- endif %}
84
+ {%- if documents %}
85
+ {%- for document in documents %}
86
+ {%- set ns.documents_system_message = ns.documents_system_message + '\n' + (document | tojson) %}
87
+ {%- endfor %}
88
+ {%- set ns.documents_system_message = ns.documents_system_message + documents_system_message_suffix %}
89
+ {%- else %}
90
+ {%- set ns.documents_system_message = '' %}
91
+ {%- endif %}
92
+ {%- if messages[0].role == 'system' %}
93
+ {%- if messages[0].content is string %}
94
+ {%- set ns.system_message = messages[0].content %}
95
+ {%- elif messages[0].content is iterable %}
96
+ {%- for entry in messages[0].content %}
97
+ {%- if entry.type== 'text' %}
98
+ {%- if ns.system_message != '' %}
99
+ {%- set ns.system_message = ns.system_message + '\n' %}
100
+ {%- endif %}
101
+ {%- set ns.system_message = ns.system_message + entry.text %}
102
+ {%- endif %}
103
+ {%- endfor %}
104
+ {%- endif %}
105
+ {%- if tools and documents %}
106
+ {%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message + '\n\n' + ns.documents_system_message %}
107
+ {%- elif tools %}
108
+ {%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message %}
109
+ {%- elif documents %}
110
+ {%- set ns.system_message = ns.system_message + '\n\n' + ns.documents_system_message %}
111
+ {%- endif %}
112
+ {%- else %}
113
+ {%- if tools and documents %}
114
+ {%- set ns.system_message = ns.tools_system_message + '\n\n' + ns.documents_system_message %}
115
+ {%- elif tools %}
116
+ {%- set ns.system_message = ns.tools_system_message %}
117
+ {%- elif documents %}
118
+ {%- set ns.system_message = ns.documents_system_message %}
119
+ {%- endif %}
120
+ {%- endif %}
121
+ {%- if ns.system_message %}
122
+ {{- '<|start_of_role|>system<|end_of_role|>' + ns.system_message + '<|end_of_text|>\n' }}
123
+ {%- else %}
124
+ {{- '<|start_of_role|>system<|end_of_role|>' + ns.default_system_message + '<|end_of_text|>\n' }}
125
+ {%- endif %}
126
+ {%- for message in messages %}
127
+ {%- set content = namespace(val='') %}
128
+ {%- if render_content(message['content']) is string %}
129
+ {%- set content.val = render_content(message['content']) %}
130
+ {%- else %}
131
+ {%- if render_content(message['content']) is iterable %}
132
+ {%- for entry in render_content(message['content']) %}
133
+ {%- if entry.type== 'text' %}
134
+ {%- if content.val != '' %}
135
+ {%- set content.val = content.val + '\n' %}
136
+ {%- endif %}
137
+ {%- set content.val = content.val + entry.text %}
138
+ {%- endif %}
139
+ {%- endfor %}
140
+ {%- endif %}
141
+ {%- endif %}
142
+ {%- if (message.role == 'user') or (message.role == 'system' and not loop.first) %}
143
+ {{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + expand_tags(content.val) + '<|end_of_text|>\n' }}
144
+ {%- elif message.role == 'assistant' %}
145
+ {{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val }}
146
+ {%- if message.tool_calls %}
147
+ {%- for tool_call in message.tool_calls %}
148
+ {%- if (loop.first and content.val) or (not loop.first) %}
149
+ {{- '\n' }}
150
+ {%- endif %}
151
+ {%- if tool_call.function %}
152
+ {%- set tool_call = tool_call.function %}
153
+ {%- endif %}
154
+ {{- '<tool_call>\n{"name": "' }}
155
+ {{- tool_call.name }}
156
+ {{- '", "arguments": ' }}
157
+ {%- if tool_call.arguments is string %}
158
+ {{- tool_call.arguments }}
159
+ {%- else %}
160
+ {{- tool_call.arguments | tojson }}
161
+ {%- endif %}
162
+ {{- '}\n</tool_call>' }}
163
+ {%- endfor %}
164
+ {%- endif %}
165
+ {{- '<|end_of_text|>\n' }}
166
+ {%- elif message.role == 'tool' %}
167
+ {%- if loop.first or (messages[loop.index0 - 1].role != 'tool') %}
168
+ {{- '<|start_of_role|>user<|end_of_role|>' }}
169
+ {%- endif %}
170
+ {{- '\n<tool_response>\n' }}
171
+ {{- content.val }}
172
+ {{- '\n</tool_response>' }}
173
+ {%- if loop.last or (messages[loop.index0 + 1].role != 'tool') %}
174
+ {{- '<|end_of_text|>\n' }}
175
+ {%- endif %}
176
+ {%- endif %}
177
+ {%- endfor %}
178
+ {%- if add_generation_prompt %}
179
+ {{- '<|start_of_role|>assistant<|end_of_role|>' }}
180
+ {%- endif %}
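The `expand_tags` macro in this template can be hard to follow in Jinja, so here is a plain-Python sketch of the same dispatch logic. It is a reading aid rather than a drop-in replacement: the prompt table is abbreviated to the two shortest prompts, and the names `TAG_PROMPTS` and `IMAGE` are introduced only for illustration.

```python
IMAGE = "<image>"

# Abbreviated: the template also defines <chart2csv>, <tables_json>,
# <tables_html>, and <tables_otsl>, checked in that priority order.
TAG_PROMPTS = {
    "<chart2code>": "Generate code that recreates the chart as best as possible.",
    "<chart2summary>": "Can you describe this chart image?",
}


def expand_tags(text, prompts=TAG_PROMPTS):
    """Replace a message containing a task tag with that task's full prompt."""
    tag = next((t for t in prompts if t in text), None)
    if tag is None:
        return text  # no task tag: the message passes through unchanged
    prefix = suffix = ""
    if IMAGE in text:
        # Keep the image token on the same side of the prompt as in the input.
        if text.index(IMAGE) < text.index(tag):
            prefix = IMAGE + "\n"
        else:
            suffix = IMAGE + "\n"
    return prefix + prompts[tag] + suffix
```

One consequence worth noting is that when a tag is present, any other text in the message is discarded: the output consists only of the task prompt and, if present, the image token.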
processing.py ADDED
@@ -0,0 +1,54 @@
1
+ from fractions import Fraction
2
+
3
+ from transformers import LlavaNextProcessor
4
+ from transformers.image_processing_utils import select_best_resolution
5
+
6
+
7
+
8
+ class Granite4VisionProcessor(LlavaNextProcessor):
9
+     model_type = "granite4_vision"
10
+
11
+     def __init__(
12
+         self,
13
+         image_processor=None,
14
+         tokenizer=None,
15
+         patch_size=None,
16
+         vision_feature_select_strategy=None,
17
+         chat_template=None,
18
+         image_token="<image>",  # set the default and let users change if they have peculiar special tokens in rare cases
19
+         num_additional_image_tokens=0,
20
+         downsample_rate=None,
21
+         **kwargs,
22
+     ):
23
+         super().__init__(image_processor=image_processor,
24
+                          tokenizer=tokenizer,
25
+                          patch_size=patch_size,
26
+                          vision_feature_select_strategy=vision_feature_select_strategy,
27
+                          chat_template=chat_template,
28
+                          image_token=image_token,
29
+                          num_additional_image_tokens=num_additional_image_tokens,
30
+                          )
31
+         self.downsample_rate = downsample_rate
32
+
33
+     def _get_number_of_features(self, orig_height: int, orig_width: int, height: int, width: int) -> int:
34
+         image_grid_pinpoints = self.image_processor.image_grid_pinpoints
35
+
36
+         height_best_resolution, width_best_resolution = select_best_resolution(
37
+             [orig_height, orig_width], image_grid_pinpoints
38
+         )
39
+         scale_height, scale_width = height_best_resolution // height, width_best_resolution // width
40
+
41
+         patches_height = height // self.patch_size
42
+         patches_width = width // self.patch_size
43
+         if self.downsample_rate is not None:
44
+             ds_rate = Fraction(self.downsample_rate)
45
+             patches_height = int(patches_height * ds_rate)
46
+             patches_width = int(patches_width * ds_rate)
47
+
48
+         unpadded_features, newline_features = self._get_unpadded_features(
49
+             orig_height, orig_width, patches_height, patches_width, scale_height, scale_width
50
+         )
51
+         # The base patch covers the entire image (plus num_additional_image_tokens, e.g. 1 for a CLS token)
52
+         base_features = patches_height * patches_width + self.num_additional_image_tokens
53
+         num_image_tokens = unpadded_features + newline_features + base_features
54
+         return num_image_tokens
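The effect of `downsample_rate` on the base-patch token count can be worked through with the values from `processor_config.json` (`patch_size` 16, tile size 384, `downsample_rate` "4/8"). The sketch below reproduces only the base-feature arithmetic and assumes `height` and `width` equal the 384×384 tile size; `base_patch_grid` is a hypothetical helper, not part of the processor.

```python
from fractions import Fraction


def base_patch_grid(size, patch_size, downsample_rate=None):
    """Patches along one side of the base tile after optional downsampling."""
    patches = size // patch_size
    if downsample_rate is not None:
        # Fraction parses strings such as "4/8" exactly (here, 1/2).
        patches = int(patches * Fraction(downsample_rate))
    return patches


side = base_patch_grid(384, 16, "4/8")  # 24 patches per side, halved
base_features = side * side + 0         # num_additional_image_tokens is 0
print(side, base_features)              # → 12 144
```

The full token count additionally includes the unpadded high-resolution features and newline features, which depend on the original image size and the selected grid pinpoint.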
processor_config.json ADDED
@@ -0,0 +1,153 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoProcessor": "processing.Granite4VisionProcessor"
4
+ },
5
+ "downsample_rate": "4/8",
6
+ "image_processor": {
7
+ "auto_map": {
8
+ "AutoProcessor": "processing.Granite4VisionProcessor"
9
+ },
10
+ "crop_size": {
11
+ "height": 384,
12
+ "width": 384
13
+ },
14
+ "do_center_crop": true,
15
+ "do_convert_rgb": true,
16
+ "do_normalize": true,
17
+ "do_pad": true,
18
+ "do_rescale": true,
19
+ "do_resize": true,
20
+ "image_grid_pinpoints": [
21
+ [
22
+ 384,
23
+ 384
24
+ ],
25
+ [
26
+ 384,
27
+ 768
28
+ ],
29
+ [
30
+ 384,
31
+ 1152
32
+ ],
33
+ [
34
+ 384,
35
+ 1536
36
+ ],
37
+ [
38
+ 384,
39
+ 1920
40
+ ],
41
+ [
42
+ 384,
43
+ 2304
44
+ ],
45
+ [
46
+ 384,
47
+ 2688
48
+ ],
49
+ [
50
+ 384,
51
+ 3072
52
+ ],
53
+ [
54
+ 384,
55
+ 3456
56
+ ],
57
+ [
58
+ 384,
59
+ 3840
60
+ ],
61
+ [
62
+ 768,
63
+ 384
64
+ ],
65
+ [
66
+ 768,
67
+ 768
68
+ ],
69
+ [
70
+ 768,
71
+ 1152
72
+ ],
73
+ [
74
+ 768,
75
+ 1536
76
+ ],
77
+ [
78
+ 768,
79
+ 1920
80
+ ],
81
+ [
82
+ 1152,
83
+ 384
84
+ ],
85
+ [
86
+ 1152,
87
+ 768
88
+ ],
89
+ [
90
+ 1152,
91
+ 1152
92
+ ],
93
+ [
94
+ 1536,
95
+ 384
96
+ ],
97
+ [
98
+ 1536,
99
+ 768
100
+ ],
101
+ [
102
+ 1920,
103
+ 384
104
+ ],
105
+ [
106
+ 1920,
107
+ 768
108
+ ],
109
+ [
110
+ 2304,
111
+ 384
112
+ ],
113
+ [
114
+ 2688,
115
+ 384
116
+ ],
117
+ [
118
+ 3072,
119
+ 384
120
+ ],
121
+ [
122
+ 3456,
123
+ 384
124
+ ],
125
+ [
126
+ 3840,
127
+ 384
128
+ ]
129
+ ],
130
+ "image_mean": [
131
+ 0.5,
132
+ 0.5,
133
+ 0.5
134
+ ],
135
+ "image_processor_type": "LlavaNextImageProcessor",
136
+ "image_std": [
137
+ 0.5,
138
+ 0.5,
139
+ 0.5
140
+ ],
141
+ "resample": 3,
142
+ "rescale_factor": 0.00392156862745098,
143
+ "size": {
144
+ "height": 384,
145
+ "width": 384
146
+ }
147
+ },
148
+ "image_token": "<image>",
149
+ "num_additional_image_tokens": 0,
150
+ "patch_size": 16,
151
+ "processor_class": "Granite4VisionProcessor",
152
+ "vision_feature_select_strategy": "full"
153
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,19 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "auto_map": {
4
+ "AutoProcessor": "processing.Granite4VisionProcessor"
5
+ },
6
+ "backend": "tokenizers",
7
+ "bos_token": "<|end_of_text|>",
8
+ "clean_up_tokenization_spaces": false,
9
+ "eos_token": "<|end_of_text|>",
10
+ "errors": "replace",
11
+ "is_local": true,
12
+ "local_files_only": false,
13
+ "model_max_length": 1000000000000000019884624838656,
14
+ "pad_token": "<|pad|>",
15
+ "padding_side": "left",
16
+ "processor_class": "Granite4VisionProcessor",
17
+ "tokenizer_class": "GPT2Tokenizer",
18
+ "unk_token": "<|unk|>"
19
+ }
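The `model_max_length` value above is not a real context limit. It equals the float `1e30` converted to an integer, which `transformers` commonly writes as a placeholder when no maximum length is configured. The check below only demonstrates the arithmetic; the constant name is chosen here for illustration.

```python
# 1e30 is not exactly representable as a float, so converting it to int
# exposes the nearest double-precision value.
VERY_LARGE_INTEGER = int(1e30)
print(VERY_LARGE_INTEGER == 1000000000000000019884624838656)
```

In practice this means truncation will not be driven by `model_max_length`, so callers who need a length cap would have to pass one explicitly.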