Instructions to use iLearn-Lab/CVPRW26-ChartLens with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use iLearn-Lab/CVPRW26-ChartLens with PEFT:
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base_model = AutoModelForCausalLM.from_pretrained("/data2/caoruping/DataMFM/models/granite-vision-4.1-4b")
    model = PeftModel.from_pretrained(base_model, "iLearn-Lab/CVPRW26-ChartLens")
- Notebooks
- Google Colab
- Kaggle
Upload 8 files
- README.md +231 -0
- adapter_config.json +56 -0
- adapter_model.safetensors +3 -0
- chat_template.jinja +180 -0
- processing.py +54 -0
- processor_config.json +153 -0
- tokenizer.json +0 -0
- tokenizer_config.json +19 -0
README.md
CHANGED
@@ -1,3 +1,234 @@
---
license: apache-2.0
library_name: pytorch
tags:
- pytorch
- computer-vision
- multimodal
- chart-understanding
- data-extraction
- summarization
- cvpr-2026
---

<a id="top"></a>
<div align="center">
<h1>🚀 ChartLens @ CVPR 2026 DataMFM Chart Understanding Challenge</h1>

<p>
<b>Hao Liu</b><sup>1</sup>
<b>Ruping Cao</b><sup>1</sup>
<b>Kun Wang</b><sup>1</sup>
<b>Zhiran Li</b><sup>1</sup>
<b>Fan Liu</b><sup>2</sup>
<b>Yupeng Hu</b><sup>1</sup>
<b>Liqiang Nie</b><sup>3</sup>
</p>

<p>
<sup>1</sup>Shandong University<br>
<sup>2</sup>Southeast University<br>
<sup>3</sup>Harbin Institute of Technology (Shenzhen)
</p>
</div>

These are the official implementation resources, model weights, and prediction files for **ChartLens**, our champion solution for **DataMFM Challenge Track 2: Chart Understanding** at CVPR 2026.

🔗 **Paper:** [arXiv](https://arxiv.org/pdf/2606.10640)
🔗 **GitHub Repository:** [iLearnLab/CVPRW26-ChartLens](https://github.com/iLearnLab/CVPRW26-ChartLens)
🔗 **Challenge Page:** [DataMFM Challenge](https://datamfm.github.io/challenge.html)

---

## 📌 Model Information

### 1. Model Name
**ChartLens: A Dual-Branch Framework for Chart Data Correction and Factual Summary Refinement**

### 2. Task Type & Applicable Tasks
- **Task Type:** Chart Understanding / Multimodal Document Understanding
- **Applicable Tasks:** Chart-to-CSV extraction and chart-to-summary generation from chart images.

### 3. Project Introduction
Chart understanding requires models to recover structured chart data and generate faithful natural-language summaries from chart images. **ChartLens** addresses these complementary goals with a dual-branch, verification-guided correction framework.

> 💡 **Method Highlight:** ChartLens combines Granite-Vision-4.1-4B LoRA adaptation with two correction branches: **Structure-Aware CSV Verification and Correction (SAVC)** for reliable table recovery, and **Text-Retention-Guided Summary Refinement (TRSR)** for OCR-assisted factual summary repair. SAVC checks structure, completeness, and numerical accuracy, while TRSR preserves visible chart text such as titles, legends, annotations, sources, and numerical evidence.

### 4. Training Data Source
- Released ChartNet-based training data for LoRA adaptation.
- DataMFM Challenge chart understanding splits, including `real` and `synthetic` chart images.

### 5. Challenge Results

| Method | CSV Numeric F1 | CSV Structural Score | Summary ROUGE-L | Summary Numeric Fact F1 | Overall |
|--------|---------------:|---------------------:|----------------:|------------------------:|--------:|
| **ChartLens (Ours)** | **80.62** | **75.66** | **45.57** | **74.55** | **69.10** |

ChartLens ranked **1st place** on DataMFM Challenge Track 2.
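
The Overall value in the results table is numerically consistent with an unweighted mean of the four component metrics. That the challenge defines Overall this way is an assumption inferred from the reported numbers, not something stated here; the sketch below only checks the arithmetic, and the dictionary keys are illustrative names.

```python
# Assumption: Overall is the plain average of the four reported metrics.
scores = {
    "csv_numeric_f1": 80.62,
    "csv_structural_score": 75.66,
    "summary_rouge_l": 45.57,
    "summary_numeric_fact_f1": 74.55,
}


def overall_score(metrics: dict[str, float]) -> float:
    """Return the unweighted mean of the metric values, rounded to 2 decimals."""
    return round(sum(metrics.values()) / len(metrics), 2)


print(overall_score(scores))  # 69.1
```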

---

## 🚀 Usage & Basic Inference

### Step 1: Prepare the Environment

Clone the GitHub repository and set up the Conda environment:

```bash
git clone https://github.com/iLearnLab/CVPRW26-ChartLens.git
cd CVPRW26-ChartLens
```

```bash
conda create -n chartlens python=3.10 -y
conda activate chartlens
pip install -r requirements.txt
```

### Step 2: Data & Weights Preparation

1. **Challenge Data:** Use the datasets and splits released by the [DataMFM Challenge](https://datamfm.github.io/challenge.html). The chart understanding track contains `real` and `synthetic` splits.
2. **ChartLens Checkpoints:** Download the model weights from this Hugging Face repository.
3. **Granite Vision Backbone:** Prepare the Granite-Vision-4.1-4B backbone and pass its local path via the `--model_path` argument when running inference.

To prepare ChartNet SFT data for LoRA training:

```bash
python code/load_chartnet_500.py \
  --out_dir Fine-tuning/Dataset/raw \
  --num_samples 500

python code/build_chartnet_sft.py \
  --gt_path Fine-tuning/Dataset/raw/gt.jsonl \
  --image_dir Fine-tuning/Dataset/raw/images \
  --out_dir Fine-tuning/Dataset/sft \
  --csv_repeat 2 \
  --summary_repeat 1
```

### Step 3: Run Granite Vision + LoRA Inference

```bash
python code/infer_granite_with_lora.py \
  --image_root /path/to/data \
  --out_root /path/to/output \
  --model_path /path/to/granite-vision-4.1-4b \
  --lora_path /path/to/chartlens_lora \
  --gpu_id 0 \
  --splits real synthetic
```

Use `code/infer_chartnet_granite.py` for base Granite Vision inference without a LoRA adapter.

### Step 4: SAVC CSV Correction

```bash
export OPENAI_API_KEY="..."

python code/calibrate_baseline_with_ai.py \
  --split all \
  --baseline_root /path/to/baseline_predictions \
  --image_root /path/to/data \
  --output_root /path/to/savc_output \
  --base_url "https://your-openai-compatible-endpoint" \
  --model gemini-3.5-flash \
  --threshold 85
```

`--baseline_root` should contain split directories such as `real/` and `synthetic/`, each with `chart2csv_predictions.jsonl` and `chart2summary_predictions.jsonl`.
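
Before launching the correction script, it can help to confirm that the baseline directory has the layout described for `--baseline_root`. The following is a minimal sketch using only the standard library; the helper name `missing_prediction_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the layout described for --baseline_root.
EXPECTED_FILES = ("chart2csv_predictions.jsonl", "chart2summary_predictions.jsonl")


def missing_prediction_files(baseline_root, splits=("real", "synthetic")):
    """Return the expected prediction files that are absent under baseline_root."""
    root = Path(baseline_root)
    return [
        root / split / name
        for split in splits
        for name in EXPECTED_FILES
        if not (root / split / name).is_file()
    ]
```

An empty list means every split directory holds both prediction files; otherwise the returned paths show what still has to be generated.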

### Step 5: TRSR Summary Refinement

```bash
python code/ocr.py \
  --real_images /path/to/data/real/images \
  --synthetic_images /path/to/data/synthetic/images \
  --real_summary /path/to/baseline/real/chart2summary_predictions.jsonl \
  --synthetic_summary /path/to/baseline/synthetic/chart2summary_predictions.jsonl \
  --output_dir /path/to/ocr_text_copy_coverage \
  --threshold 0.8

export AIGCBEST_API_KEY="..."

python code/repair_summary.py \
  --split all \
  --workers 20 \
  --ocr_eval_root /path/to/ocr_text_copy_coverage \
  --output_root /path/to/trsr_output
```
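
To give a rough intuition for a text-retention check with a `0.8` threshold, the sketch below scores how much OCR-detected chart text reappears in a summary. This is an illustrative assumption, not the logic of `code/ocr.py`: the function names, the verbatim case-insensitive matching rule, and the idea that summaries scoring below the threshold are the ones sent for repair are all hypothetical.

```python
import re


def _normalize(text):
    """Lower-case and collapse whitespace so matching ignores layout differences."""
    return re.sub(r"\s+", " ", text).strip().lower()


def text_copy_coverage(ocr_texts, summary):
    """Fraction of non-empty OCR strings that appear verbatim in the summary."""
    items = [_normalize(t) for t in ocr_texts if _normalize(t)]
    if not items:
        return 1.0  # nothing visible to retain
    summary_norm = _normalize(summary)
    kept = sum(1 for t in items if t in summary_norm)
    return kept / len(items)


def needs_repair(ocr_texts, summary, threshold=0.8):
    """Hypothetical decision rule: flag summaries whose coverage is below threshold."""
    return text_copy_coverage(ocr_texts, summary) < threshold
```

A real implementation would likely need fuzzy matching, since OCR output is noisy and summaries often paraphrase labels rather than copy them.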

### Step 6: Training (Optional)

Train the LoRA adapter on the prepared ChartNet SFT data:

```bash
python code/train_lora_chartnet.py \
  --model_path /path/to/granite-vision-4.1-4b \
  --train_jsonl Fine-tuning/Dataset/sft/train.jsonl \
  --val_jsonl Fine-tuning/Dataset/sft/val.jsonl \
  --output_dir Fine-tuning/FT/model/granite_chartnet_lora_bs2 \
  --gpu_id 0 \
  --epochs 2 \
  --batch_size 1 \
  --grad_accum 8
```

---

## 📦 Submission Format

For DataMFM Track 2, organize the final predictions as:

```bash
submission.zip
├── real/
│   ├── chart2csv_predictions.jsonl
│   └── chart2summary_predictions.jsonl
└── synthetic/
    ├── chart2csv_predictions.jsonl
    └── chart2summary_predictions.jsonl
```

Each CSV prediction line:

```json
{"imagename": "example.png", "predicted_csv": "Header A,Header B\nA,1\nB,2"}
```

Each summary prediction line:

```json
{"imagename": "example.png", "predicted_summary": "One paragraph summary grounded in the chart."}
```
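
The submission layout and JSONL line formats above can be produced with the standard library alone. The sketch below is one possible way to do it; `write_jsonl` and `build_submission` are hypothetical helper names, and the archive is assumed to hold the two split directories at its top level, as in the tree shown.

```python
import json
import zipfile
from pathlib import Path

PREDICTION_FILES = ("chart2csv_predictions.jsonl", "chart2summary_predictions.jsonl")


def write_jsonl(path, records):
    """Write one JSON object per line; json.dumps escapes newlines inside CSV strings."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def build_submission(pred_root, zip_path, splits=("real", "synthetic")):
    """Zip <pred_root>/<split>/<file> into an archive with <split>/<file> entries."""
    pred_root = Path(pred_root)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for split in splits:
            for name in PREDICTION_FILES:
                zf.write(pred_root / split / name, arcname=f"{split}/{name}")
```

Because `json.dumps` encodes the line breaks of a multi-row CSV as `\n` escapes, each prediction stays on a single physical line.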

---

## ⚠️ Limitations & Notes

**Disclaimer:** This framework and its model weights are intended for **academic research purposes only**.

- Chart-to-CSV extraction may still struggle with dense layouts, asymmetric legends, or adjacent semantic-column misalignment.
- Summary refinement depends on OCR quality; OCR errors can affect text-retention scoring and repair decisions.
- GPU execution is expected for Granite Vision inference and LoRA training.
- API-backed correction scripts require valid credentials and an OpenAI-compatible endpoint.

---

## 🤝 Acknowledgements & Contact

- **Contact:** If you have any questions or encounter issues, feel free to contact Hao Liu at liuh90210@gmail.com or Ruping Cao at caoruping657@gmail.com.

---

## 📝⭐️ Citation

If you find this project useful for your research, please consider citing:

```bibtex
@article{liu2026chartlens,
  title={ChartLens: A Dual-Branch Framework for Chart Data Correction and Factual Summary Refinement},
  author={Liu, Hao and Cao, Ruping and Wang, Kun and Li, Zhiran and Liu, Fan and Hu, Yupeng and Nie, Liqiang},
  journal={arXiv preprint arXiv:2606.10640},
  year={2026}
}
```
adapter_config.json
ADDED
@@ -0,0 +1,56 @@
{
  "alora_invocation_tokens": null,
  "alpha_pattern": {},
  "arrow_config": null,
  "auto_mapping": null,
  "base_model_name_or_path": "/data2/caoruping/DataMFM/models/granite-vision-4.1-4b",
  "bias": "none",
  "corda_config": null,
  "ensure_weight_tying": false,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "lora_ga_config": null,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "peft_version": "0.19.1",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "query",
    "out_linear",
    "up_proj",
    "gate_proj",
    "key",
    "fc2",
    "dense",
    "out_proj",
    "q_proj",
    "o_proj",
    "v_proj",
    "value",
    "k_proj",
    "fc1"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_bdlora": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
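As a worked example of what the adapter settings imply: in standard LoRA the low-rank update is multiplied by `lora_alpha / r`, and with rank-stabilized LoRA by `lora_alpha / sqrt(r)`. With `r = 16`, `lora_alpha = 32`, and `use_rslora = false` as in adapter_config.json, the factor is 2.0. The sketch below computes it from a three-key excerpt of the config; the helper name `lora_scaling` is illustrative.

```python
import json

# Excerpt of adapter_config.json containing only the keys used here.
adapter_config = json.loads('{"r": 16, "lora_alpha": 32, "use_rslora": false}')


def lora_scaling(cfg):
    """Scaling applied to the LoRA update B @ A for the given config."""
    r, alpha = cfg["r"], cfg["lora_alpha"]
    if cfg.get("use_rslora", False):
        return alpha / r ** 0.5  # rank-stabilized variant
    return alpha / r  # standard LoRA


print(lora_scaling(adapter_config))  # 2.0
```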
adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb26025f6a6e8fbcb91583aae8819aa2f7ca8a002f61bd784fd020262694507a
size 175974064
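The three lines recorded for adapter_model.safetensors are a Git LFS pointer, not the weights themselves: they give the spec version, the SHA-256 of the real file, and its size in bytes. A small parser such as the hypothetical `parse_lfs_pointer` below, a sketch that assumes this three-field layout, can extract the expected size and digest, for example to verify a download.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fb26025f6a6e8fbcb91583aae8819aa2f7ca8a002f61bd784fd020262694507a
size 175974064
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer into a small dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["size"])  # 175974064 bytes, roughly 168 MiB
```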
chat_template.jinja
ADDED
@@ -0,0 +1,180 @@
{#- ===== Task tag prompt constants ===== -#}
{%- set chart2code_prompt = "Generate code that recreates the chart as best as possible." -%}
{%- set chart2csv_prompt = "Please examine this chart image. Consider you are a data visualization expert, and extract the data into a CSV table.\n\nYour CSV should:\n- Include a header row with clear column names\n- Represent all data series/categories shown in the chart\n- Use numeric values that match the chart as closely as possible\n\nOutput only the CSV data, nothing else." -%}
{%- set chart2summary_prompt = "Can you describe this chart image?" -%}
{%- set tables_json_prompt = "Identify and extract the table schema\n Extract the schema of all the tables in the image sorted according to the reading order.\nThe output must be a valid JSON object containing a list of dictionaries with the following structure:\n\n {\n \"dimensions\": {\n \"rows\": <number of data rows (excluding header rows)>,\n \"columns\": <number of columns>,\n \"header_rows\": <number of header rows>,\n \"total_rows\": <total number of rows including headers>\n },\n \"cells\": [\n {\n \"row\": <row index starting at 1>,\n \"col\": <column index starting at 1>,\n \"colspan\": <number of columns spanned>,\n \"rowspan\": <number of rows spanned>,\n \"type\": \"<'header' or 'data'>\",\n \"header_level\": <header nesting level if type=header, else omit or null>,\n \"content\": \"<string content of the cell>\"\n },\n ...\n ]\n }" -%}
{%- set tables_html_prompt = "Identify and extract the table schema\n Extract the schema of all the tables in the image sorted according to the reading order.\nThe output must be a list of valid HTML tables" -%}
{%- set tables_otsl_prompt = "Identify and extract the table schema\n Extract the schema of all the tables in the image sorted according to the reading order.\nThe output must be a list of valid OTSL objects, each consists of the following fields: \n <fcel> - a cell with content in it\n <ecel> - an empty cell\n <lcel> - a cell that is merged with the cell to its left\n <ucel> - a cell that is merged with the cell above it\n <xcel> - a cell that is merged with both the cell above it and the cell to its left\n <nl> - a new line\n <ched> - a column header\n <otsl> - the beginning of the OTSL table\n </otsl> - the end of the OTSL table\n\n An example for an output:\n [\n <otsl><ched>first table header1<ched>first table header2<nl><fcel>data1<fcel>data2<nl><fcel>data with horizontal span<lcel><nl><fcel>data with vertical span<ecel><nl><ucel><fcel>data3<nl></otsl>,\n <otsl><ched>second table header1<ched>second table header2<nl><fcel>data1<fcel>data2<nl><fcel>data with horizontal span<lcel><nl><fcel>data with vertical span<ecel><nl><ucel><fcel>data3<nl></otsl>\n ]" -%}


{#- ===== Tag expansion dispatcher ===== -#}
{%- macro expand_tags(text) -%}
{%- set has_image = "<image>" in text -%}
{#- Determine image position: prefix if <image> appears before the tag, suffix if after -#}
{%- if has_image -%}
{%- set img_idx = text.index("<image>") -%}
{%- if "<chart2code>" in text -%}{%- set tag_idx = text.index("<chart2code>") -%}
{%- elif "<chart2csv>" in text -%}{%- set tag_idx = text.index("<chart2csv>") -%}
{%- elif "<chart2summary>" in text -%}{%- set tag_idx = text.index("<chart2summary>") -%}
{%- elif "<tables_json>" in text -%}{%- set tag_idx = text.index("<tables_json>") -%}
{%- elif "<tables_html>" in text -%}{%- set tag_idx = text.index("<tables_html>") -%}
{%- elif "<tables_otsl>" in text -%}{%- set tag_idx = text.index("<tables_otsl>") -%}
{%- else -%}{%- set tag_idx = 999999 -%}
{%- endif -%}
{%- set img_prefix = "<image>\n" if img_idx < tag_idx else "" -%}
{%- set img_suffix = "<image>\n" if img_idx >= tag_idx else "" -%}
{%- else -%}
{%- set img_prefix = "" -%}
{%- set img_suffix = "" -%}
{%- endif -%}
{%- if "<chart2code>" in text -%}
{{- img_prefix + chart2code_prompt + img_suffix -}}
{%- elif "<chart2csv>" in text -%}
{{- img_prefix + chart2csv_prompt + img_suffix -}}
{%- elif "<chart2summary>" in text -%}
{{- img_prefix + chart2summary_prompt + img_suffix -}}
{%- elif "<tables_json>" in text -%}
{{- img_prefix + tables_json_prompt + img_suffix -}}
{%- elif "<tables_html>" in text -%}
{{- img_prefix + tables_html_prompt + img_suffix -}}
{%- elif "<tables_otsl>" in text -%}
{{- img_prefix + tables_otsl_prompt + img_suffix -}}
{%- else -%}
{{- text -}}
{%- endif -%}
{%- endmacro -%}

{#- ===== Original chat template ===== -#}
{% macro render_content(x) %}
{%- if x is string %}
{{ x }}
{%- else %}
{%- for chunk in x %}
{%- if chunk['type'] == 'text' -%}
{{ chunk['text']}}
{%- elif chunk['type'] == 'image' -%}
{{- "<image>
" }}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{% endmacro %}

{%- set tools_system_message_prefix = 'You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>' %}
{%- set tools_system_message_suffix = '\n</tools>\n\nFor each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.' %}
{%- set documents_system_message_prefix = 'You are a helpful assistant with access to the following documents. You may use one or more documents to assist with the user query.\n\nYou are given a list of documents within <documents></documents> XML tags:\n<documents>' %}
{%- set documents_system_message_suffix = '\n</documents>\n\nWrite the response to the user\'s input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.' %}
{%- set g4_default_system_message = 'You are a helpful assistant. Please ensure responses are professional, accurate, and safe.' %}
{%- if available_tools is defined and available_tools %}
{%- set tools = available_tools %}
{%- endif %}
{%- set ns = namespace(tools_system_message=tools_system_message_prefix,
documents_system_message=documents_system_message_prefix,
default_system_message=g4_default_system_message,
system_message=''
) %}
{%- if tools %}
{%- for tool in tools %}
{%- set ns.tools_system_message = ns.tools_system_message + '\n' + (tool | tojson) %}
{%- endfor %}
{%- set ns.tools_system_message = ns.tools_system_message + tools_system_message_suffix %}
{%- else %}
{%- set ns.tools_system_message = '' %}
{%- endif %}
{%- if documents %}
{%- for document in documents %}
{%- set ns.documents_system_message = ns.documents_system_message + '\n' + (document | tojson) %}
{%- endfor %}
{%- set ns.documents_system_message = ns.documents_system_message + documents_system_message_suffix %}
{%- else %}
{%- set ns.documents_system_message = '' %}
{%- endif %}
{%- if messages[0].role == 'system' %}
{%- if messages[0].content is string %}
{%- set ns.system_message = messages[0].content %}
{%- elif messages[0].content is iterable %}
{%- for entry in messages[0].content %}
{%- if entry.type== 'text' %}
{%- if ns.system_message != '' %}
{%- set ns.system_message = ns.system_message + '\n' %}
{%- endif %}
{%- set ns.system_message = ns.system_message + entry.text %}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- if tools and documents %}
{%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message + '\n\n' + ns.documents_system_message %}
{%- elif tools %}
{%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message %}
{%- elif documents %}
{%- set ns.system_message = ns.system_message + '\n\n' + ns.documents_system_message %}
{%- endif %}
{%- else %}
{%- if tools and documents %}
{%- set ns.system_message = ns.tools_system_message + '\n\n' + ns.documents_system_message %}
{%- elif tools %}
{%- set ns.system_message = ns.tools_system_message %}
{%- elif documents %}
{%- set ns.system_message = ns.documents_system_message %}
{%- endif %}
{%- endif %}
{%- if ns.system_message %}
{{- '<|start_of_role|>system<|end_of_role|>' + ns.system_message + '<|end_of_text|>\n' }}
{%- else %}
{{- '<|start_of_role|>system<|end_of_role|>' + ns.default_system_message + '<|end_of_text|>\n' }}
{%- endif %}
{%- for message in messages %}
{%- set content = namespace(val='') %}
{%- if render_content(message['content']) is string %}
{%- set content.val = render_content(message['content']) %}
{%- else %}
{%- if render_content(message['content']) is iterable %}
{%- for entry in render_content(message['content']) %}
{%- if entry.type== 'text' %}
{%- if content.val != '' %}
{%- set content.val = content.val + '\n' %}
{%- endif %}
{%- set content.val = content.val + entry.text %}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- endif %}
{%- if (message.role == 'user') or (message.role == 'system' and not loop.first) %}
{{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + expand_tags(content.val) + '<|end_of_text|>\n' }}
{%- elif message.role == 'assistant' %}
{{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val }}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content.val) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|end_of_text|>\n' }}
{%- elif message.role == 'tool' %}
{%- if loop.first or (messages[loop.index0 - 1].role != 'tool') %}
{{- '<|start_of_role|>user<|end_of_role|>' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content.val }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != 'tool') %}
{{- '<|end_of_text|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|start_of_role|>assistant<|end_of_role|>' }}
{%- endif %}
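The `expand_tags` macro in chat_template.jinja replaces a whole user message with a fixed task prompt when it contains a task tag such as `<chart2csv>`, keeping `<image>` before or after the prompt depending on where it appeared relative to the tag. The Python mirror below is a sketch for reasoning about that behaviour, not code shipped with the model; the bracketed prompt strings are shortened stand-ins for the full prompts defined in the template.

```python
IMAGE = "<image>"

# Checked in the same priority order as the template's if/elif chain.
PROMPTS = {
    "<chart2code>": "Generate code that recreates the chart as best as possible.",
    "<chart2csv>": "[chart2csv prompt]",
    "<chart2summary>": "Can you describe this chart image?",
    "<tables_json>": "[tables_json prompt]",
    "<tables_html>": "[tables_html prompt]",
    "<tables_otsl>": "[tables_otsl prompt]",
}


def expand_tags(text, prompts=PROMPTS):
    """Return the task prompt for the first matching tag, or text unchanged."""
    tag = next((t for t in prompts if t in text), None)
    if tag is None:
        return text  # no task tag: the message passes through untouched
    prefix = suffix = ""
    if IMAGE in text:
        if text.index(IMAGE) < text.index(tag):
            prefix = IMAGE + "\n"  # image came before the tag
        else:
            suffix = IMAGE + "\n"  # image came after the tag
    return prefix + prompts[tag] + suffix


print(repr(expand_tags("<image>\n<chart2summary>")))
```

One consequence worth noting is that any other text placed in the same message as a task tag is discarded, since the macro emits only the prompt and the image placeholder.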
processing.py
ADDED
@@ -0,0 +1,54 @@
from fractions import Fraction

from transformers import LlavaNextProcessor
from transformers.image_processing_utils import select_best_resolution



class Granite4VisionProcessor(LlavaNextProcessor):
    model_type = "granite4_vision"

    def __init__(
        self,
        image_processor=None,
        tokenizer=None,
        patch_size=None,
        vision_feature_select_strategy=None,
        chat_template=None,
        image_token="<image>",  # set the default and let users change if they have peculiar special tokens in rare cases
        num_additional_image_tokens=0,
        downsample_rate=None,
        **kwargs,
    ):
        super().__init__(image_processor=image_processor,
                         tokenizer=tokenizer,
                         patch_size=patch_size,
                         vision_feature_select_strategy=vision_feature_select_strategy,
                         chat_template=chat_template,
                         image_token=image_token,
                         num_additional_image_tokens=num_additional_image_tokens,
                         )
        self.downsample_rate = downsample_rate

    def _get_number_of_features(self, orig_height: int, orig_width: int, height: int, width: int) -> int:
        image_grid_pinpoints = self.image_processor.image_grid_pinpoints

        height_best_resolution, width_best_resolution = select_best_resolution(
            [orig_height, orig_width], image_grid_pinpoints
        )
        scale_height, scale_width = height_best_resolution // height, width_best_resolution // width

        patches_height = height // self.patch_size
        patches_width = width // self.patch_size
        if self.downsample_rate is not None:
            ds_rate = Fraction(self.downsample_rate)
            patches_height = int(patches_height * ds_rate)
            patches_width = int(patches_width * ds_rate)

        unpadded_features, newline_features = self._get_unpadded_features(
            orig_height, orig_width, patches_height, patches_width, scale_height, scale_width
        )
        # The base patch covers the entire image (+1 for the CLS)
        base_features = patches_height * patches_width + self.num_additional_image_tokens
        num_image_tokens = unpadded_features + newline_features + base_features
        return num_image_tokens
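To make the downsampling arithmetic in `_get_number_of_features` concrete, the sketch below reproduces only the base-feature part of the computation. It assumes the `height` and `width` passed in are the 384-pixel tile size from processor_config.json, together with `patch_size` 16 and `downsample_rate` "4/8"; the unpadded and newline features come from the parent class and are not modelled here. The function name `base_feature_count` is illustrative.

```python
from fractions import Fraction


def base_feature_count(height, width, patch_size, downsample_rate=None,
                       num_additional_image_tokens=0):
    """Number of image tokens contributed by the base (whole-image) tile."""
    patches_h = height // patch_size
    patches_w = width // patch_size
    if downsample_rate is not None:
        ds = Fraction(downsample_rate)  # "4/8" parses to Fraction(1, 2)
        patches_h = int(patches_h * ds)
        patches_w = int(patches_w * ds)
    return patches_h * patches_w + num_additional_image_tokens


print(base_feature_count(384, 384, 16, "4/8"))  # 144
print(base_feature_count(384, 384, 16))         # 576
```

Under these assumptions a 384 by 384 tile yields 24 patches per side, halved to 12 by the downsample rate, so the base tile contributes 144 tokens instead of 576.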
processor_config.json
ADDED
@@ -0,0 +1,153 @@
{
  "auto_map": {
    "AutoProcessor": "processing.Granite4VisionProcessor"
  },
  "downsample_rate": "4/8",
  "image_processor": {
    "auto_map": {
      "AutoProcessor": "processing.Granite4VisionProcessor"
    },
    "crop_size": {
      "height": 384,
      "width": 384
    },
    "do_center_crop": true,
    "do_convert_rgb": true,
    "do_normalize": true,
    "do_pad": true,
    "do_rescale": true,
    "do_resize": true,
    "image_grid_pinpoints": [
      [
        384,
        384
      ],
      [
        384,
        768
      ],
      [
        384,
        1152
      ],
      [
        384,
        1536
      ],
      [
        384,
        1920
      ],
      [
        384,
        2304
      ],
      [
        384,
        2688
      ],
      [
        384,
        3072
      ],
      [
        384,
        3456
      ],
      [
        384,
        3840
      ],
      [
        768,
        384
      ],
      [
        768,
        768
      ],
      [
        768,
        1152
      ],
      [
        768,
        1536
      ],
      [
        768,
        1920
      ],
      [
        1152,
        384
      ],
      [
        1152,
        768
      ],
      [
        1152,
        1152
      ],
      [
        1536,
        384
      ],
      [
        1536,
        768
      ],
      [
        1920,
        384
      ],
      [
        1920,
        768
      ],
      [
        2304,
        384
      ],
      [
        2688,
        384
      ],
      [
        3072,
        384
      ],
      [
        3456,
        384
      ],
      [
        3840,
        384
      ]
    ],
    "image_mean": [
      0.5,
      0.5,
      0.5
    ],
    "image_processor_type": "LlavaNextImageProcessor",
    "image_std": [
      0.5,
      0.5,
      0.5
    ],
    "resample": 3,
    "rescale_factor": 0.00392156862745098,
    "size": {
      "height": 384,
      "width": 384
    }
  },
  "image_token": "<image>",
  "num_additional_image_tokens": 0,
  "patch_size": 16,
  "processor_class": "Granite4VisionProcessor",
  "vision_feature_select_strategy": "full"
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,19 @@
{
  "add_prefix_space": false,
  "auto_map": {
    "AutoProcessor": "processing.Granite4VisionProcessor"
  },
  "backend": "tokenizers",
  "bos_token": "<|end_of_text|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|end_of_text|>",
  "errors": "replace",
  "is_local": true,
  "local_files_only": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|pad|>",
  "padding_side": "left",
  "processor_class": "Granite4VisionProcessor",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|unk|>"
}
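The unusual `model_max_length` in tokenizer_config.json is what the float `1e30` becomes when converted to an integer, so it appears to act as a "no explicit limit" sentinel rather than a real context length. That reading is an assumption; the short check below only confirms the numeric identity.

```python
# Value copied from tokenizer_config.json.
MODEL_MAX_LENGTH = 1000000000000000019884624838656

# 1e30 is not exactly representable as a double, hence the trailing digits.
print(int(1e30) == MODEL_MAX_LENGTH)
```

In practice the usable sequence length is bounded by the backbone model and available memory, not by this field.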