---
license: apache-2.0
language:
- fr
pipeline_tag: image-text-to-text
tags:
- multimodal
library_name: transformers
metrics:
- cer
- wer
- f1
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---
# Qwen2.5-VL-7B-Instruct Index Cards Nested
## Introduction
This version of Qwen2.5-VL-7B is specialized for document parsing of French index cards.
It was fine-tuned as part of the [DAI-CReTDHI](https://dai-cretdhi.univ-lr.fr/) project.
## Training
The model is [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned on French index cards using LoRA.
Training parameters:
- Image width: 800 pixels
- LoRA rank: 8
- LoRA alpha: 32
- Epochs: 10 (about 4,000 steps)
Wandb: https://wandb.ai/starride-teklia/DAI-CReTDHI/runs/hk78u308
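LoRA keeps the pretrained weights frozen and trains only a low-rank additive update, scaled by alpha / rank. A minimal NumPy sketch of this idea, using the rank and alpha above but toy dimensions (the real model layers are far larger; this is illustrative, not the training code):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 64, 64   # toy layer dimensions for illustration
r, alpha = 8, 32       # LoRA rank and alpha from the training setup above

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-initialized

# Effective weight after merging the adapter: W + (alpha / r) * B @ A
W_eff = W + (alpha / r) * B @ A

# With B initialized to zero, the adapter starts out as a no-op,
# so fine-tuning begins exactly at the pretrained model.
assert np.allclose(W_eff, W)
```

Only `A` and `B` (2 * r * d parameters per adapted matrix) are updated during fine-tuning, which is what makes LoRA far cheaper than full fine-tuning of the 7B model.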
## Evaluation
| Model                | CER (%)  | WER (%)   | F1 @ 0.0 (%) | F1 @ 0.3 (%) | N samples | N entities |
|:--------------------:|:--------:|:---------:|:------------:|:------------:|:---------:|:----------:|
| Qwen2.5-VL-7B Flat   | 10.23    | 18.07     | 83.60        | 91.96        | 55        | 808        |
| Qwen2.5-VL-7B Nested | **5.48** | **15.94** | **84.86**    | **92.27**    | 58        | 909        |
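CER and WER are Levenshtein edit-distance rates computed over characters and words, respectively, divided by the reference length. A self-contained sketch of the metric (plain Python, toy strings rather than the actual evaluation data):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character Error Rate: character edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word Error Rate: the same distance over whitespace-split tokens."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

# Toy example: one accent substitution over 11 characters
print(cer("journalière", "journaliere"))
# Toy example: one word out of three differs
print(wer("Marie Madelaine Choisnard", "Marie Madeleine Choisnard"))
```

In practice libraries such as `jiwer` compute these metrics; the F1 scores above additionally require entity-level matching, which depends on the annotation scheme.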
## Usage
The following snippet shows how to run inference with `transformers` and `qwen_vl_utils`:
* Prediction script
```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"Teklia/DAI-cards-nested",
torch_dtype=torch.bfloat16,
attn_implementation="flash_attention_2",
device_map="auto",
)
processor = AutoProcessor.from_pretrained("Teklia/DAI-cards-nested")
messages = [
{
"role": "user",
"content": [
{
"type": "image",
"image": "12e74aa3-4d7d-47b4-b46b-7013b9a1f251.jpg",
},
{"type": "text", "text": "Extrait les informations en XML."},
],
}
]
# Preparation for inference
text = processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
text=[text],
images=image_inputs,
videos=video_inputs,
padding=True,
return_tensors="pt",
)
inputs = inputs.to("cuda")
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=1024)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
```
* Output
```xml
<root>
<Décès>
<Défunt>
<Nom>Choisnard</Nom>
<Prénom>Marie Madelaine</Prénom>
<Sexe>F</Sexe>
<DateDeNaissance>23 juillet 1753</DateDeNaissance>
<LieuDeNaissance>Ambroise (Indre-et-Loire)</LieuDeNaissance>
<Profession>journalière</Profession>
<Statut>veuf(ve)</Statut>
</Défunt>
<Conjoint>
<Nom>Rocheriou</Nom>
<Prénom>Pierre</Prénom>
<Statut>décédé(e)</Statut>
</Conjoint>
<Père>
<Nom>Choisnard</Nom>
<Prénom>Michel</Prénom>
</Père>
<Mère>
<Nom>Dubeuf</Nom>
<Prénom>Louise</Prénom>
</Mère>
</Décès>
<Date>
<Année>1826</Année>
<Mois>septembre</Mois>
<Jour>5</Jour>
</Date>
</root>
```
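The nested XML output can be turned into Python data with the standard library. A minimal sketch using `xml.etree.ElementTree` on an abridged copy of the example output above (note that repeated sibling tags would overwrite each other in this simple dict conversion):

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example output above
xml_output = """<root>
  <Décès>
    <Défunt>
      <Nom>Choisnard</Nom>
      <Prénom>Marie Madelaine</Prénom>
    </Défunt>
  </Décès>
  <Date>
    <Année>1826</Année>
    <Mois>septembre</Mois>
    <Jour>5</Jour>
  </Date>
</root>"""

def to_dict(elem):
    """Recursively convert an Element to nested dicts; leaves become text."""
    if len(elem) == 0:
        return (elem.text or "").strip()
    return {child.tag: to_dict(child) for child in elem}

record = to_dict(ET.fromstring(xml_output))
print(record["Date"]["Année"])            # "1826"
print(record["Décès"]["Défunt"]["Nom"])   # "Choisnard"
```

Since the model is generative, its output is not guaranteed to be well-formed XML; production code should catch `ET.ParseError` and handle malformed predictions.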
## Citation
To cite the original Qwen2.5-VL model:
```
@misc{qwen2.5-VL,
title = {Qwen2.5-VL},
url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
author = {Qwen Team},
month = {January},
year = {2025}
}
```