--- base_model: Qwen/Qwen2.5-VL-7B-Instruct library_name: peft pipeline_tag: image-text-to-text license: apache-2.0 tags: - base_model:adapter:Qwen/Qwen2.5-VL-7B-Instruct - llama-factory - lora - transformers - financial - vlm --- # PyFi-QwenVL-7B-47K This repository contains a fine-tuned LoRA adapter for [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) optimized for hierarchical financial image understanding. It was introduced in the paper [PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents](https://huggingface.co/papers/2512.14735). ## Model Details - **Developed by:** Yuqun Zhang, Yuxuan Zhao, Sijia Chen (AgenticFin Lab) - **Model Type:** Vision-Language Model (VLM) Adapter - **Base Model:** [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) - **Language(s):** English - **License:** Apache 2.0 - **Resources:** - [Paper](https://arxiv.org/abs/2512.14735) - [Code Repository](https://github.com/AgenticFinLab/PyFi) - [Dataset (PyFi-600K)](https://huggingface.co/datasets/AgenticFinLab/PyFi-600K) ## Model Description **PyFi** (Pyramid-like Financial Image Understanding) is a framework designed to enhance Vision-Language Models in understanding complex financial images (e.g., stock charts, financial reports) through a pyramid-like reasoning structure. This model allows VLMs to reason through question chains in a progressive, simple-to-complex manner. This specific checkpoint is fine-tuned on approximately 47,000 reasoning chains from the **PyFi-600K** dataset. It is trained to handle six hierarchical capability levels: 1. **Perception**: Basic visual understanding. 2. **Data Extraction**: Foundational information retrieval. 3. **Calculation Analysis**: Numerical analysis tasks. 4. **Pattern Recognition**: Identifying trends and patterns. 5. **Logical Reasoning**: Complex logical analysis. 6. **Decision Support**: Strategic decision-making assistance. ## Training Details The model was fine-tuned using Parameter-Efficient Fine-Tuning (LoRA) with full-module adaptation. - **Training Data:** ~47K sample chains from [PyFi-600K](https://huggingface.co/datasets/AgenticFinLab/PyFi-600K). - **Optimizer:** AdamW - **Learning Rate:** 1.0e-4 - **Learning Rate Schedule:** Cosine scheduling with a warmup ratio of 0.1 - **Epochs:** 1 - **Effective Batch Size:** 8 - **Hardware:** 4x NVIDIA RTX 5090 GPUs ## Citation If you find PyFi useful in your research, please cite: ```bibtex @article{pyfi2025, title={PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents}, author={Zhang, Yuqun and Zhao, Yuxuan and Chen, Sijia}, journal={arXiv preprint arXiv:2512.14735}, year={2025} } ```