---
license: apache-2.0
language:
- fr
pipeline_tag: image-text-to-text
tags:
- multimodal
library_name: transformers
metrics:
- cer
- wer
- f1
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# Qwen2.5-VL-7B-Instruct Index Cards Nested

## Introduction

This version of Qwen2.5-VL-7B is specialized for document parsing on French index cards. It was fine-tuned as part of the [DAI-CReTDHI](https://dai-cretdhi.univ-lr.fr/) project.

## Training

The model is [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned on French index cards using LoRA.

Parameters:
- Image width: 800 pixels
- LoRA rank: 8
- LoRA alpha: 32
- Epochs: 10 (about 4k steps)

Wandb: https://wandb.ai/starride-teklia/DAI-CReTDHI/runs/hk78u308

## Evaluation

| Set | CER (%) | WER (%) | F1 @ 0.0 (%) | F1 @ 0.3 (%) | N samples | N entities |
|:--------------------:|:-------:|:-------:|:------------:|:------------:|:---------:|:----------:|
| QWEN2.5-VL-7B Flat | 10.23 | 18.07 | 83.60 | 91.96 | 55 | 808 |
| QWEN2.5-VL-7B Nested | **5.48** | **15.94** | **84.86** | **92.27** | 58 | 909 |

### Usage

The following snippet shows how to run inference with the model using `transformers` and `qwen_vl_utils`:

* Prediction script

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Teklia/DAI-cards-nested",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Teklia/DAI-cards-nested")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "12e74aa3-4d7d-47b4-b46b-7013b9a1f251.jpg",
            },
            # Prompt in French: "Extract the information as XML."
            {"type": "text", "text": "Extrait les informations en XML."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=1024)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
```

* Output

```xml
Choisnard Marie Madelaine F 23 juillet 1753 Ambroise (Indre-et-Loire) journalière veuf(ve) Rocheriou Pierre décédé(e) Choisnard Michel Dubeuf Louise 1826 septembre 5
```

## Citation

To cite the original Qwen2.5-VL model:

```
@misc{qwen2.5-VL,
    title = {Qwen2.5-VL},
    url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
    author = {Qwen Team},
    month = {January},
    year = {2025}
}
```
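For reference, the CER and WER reported in the evaluation table are Levenshtein edit distances between the model output and the ground truth, normalized by reference length, at the character and word level respectively. The sketch below is a minimal pure-Python illustration of these metrics (the project's actual evaluation tooling is not specified here, and the helper names are hypothetical):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

print(cer("Choisnard", "Choisnart"))  # 1 substitution over 9 characters
print(wer("Choisnard Marie Madelaine", "Choisnard Maria Madelaine"))  # 1 word of 3
```

Note that both rates can exceed 1.0 when the hypothesis is much longer than the reference, which is why they are usually reported as percentages without an upper bound.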