--- base_model: Qwen/Qwen2.5-VL-7B-Instruct library_name: peft pipeline_tag: image-text-to-text license: apache-2.0 tags: - base_model:adapter:Qwen/Qwen2.5-VL-7B-Instruct - llama-factory - lora - transformers - finance - financial-vlm --- # PyFi-QwenVL-7B-47K This repository contains a LoRA adapter for [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned using the **PyFi** framework for advanced financial image understanding. ## Model Details - **Developed by:** AgenticFin Lab - **Model type:** Vision-Language Model (VLM) LoRA Adapter - **Base model:** [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) - **Finetuning variant:** Trained on ~47K sample chains **without** Chain-of-Thought (CoT) - **License:** Apache 2.0 - **Paper:** [PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents](https://huggingface.co/papers/2512.14735) - **Repository:** [https://github.com/AgenticFinLab/PyFi](https://github.com/AgenticFinLab/PyFi) ## Description **PyFi** (Pyramid-like Financial Image Understanding) is a framework designed to enhance VLMs in understanding complex financial images through adversarial agents. The framework uses **PyFi-600K**, a dataset organized into a reasoning pyramid across 6 capability levels: 1. **Perception**: Basic visual understanding. 2. **Data Extraction**: Foundational information retrieval. 3. **Calculation Analysis**: Numerical analysis tasks. 4. **Pattern Recognition**: Identifying trends and patterns. 5. **Logical Reasoning**: Complex logical analysis. 6. **Decision Support**: Strategic decision-making assistance. This specific model variant is fine-tuned on the final question-answer pairs of the reasoning chains, rather than the intermediate steps. ## Usage Examples Below are examples of how to use the PyFi framework for financial analysis, as provided in the official repository. ### MCTS Tree Construction for Financial Analysis ```python from fttracer.mcts.gqa import ImageQASystem # Initialize the Image QA system system = ImageQASystem() # Analyze a financial report image with MCTS tree report_path = "examples/financial_report.png" # Run MCTS tree construction for comprehensive analysis system.main( image_path=report_path, context_base_path="examples/context/" ) ``` ### Endgame QA Generation for Financial Images ```python from fttracer.mcts.gqa import ImageQASystem # Initialize the Image QA system system = ImageQASystem() # Analyze a stock chart and generate endgame QA stock_chart_path = "examples/stock_chart.png" # Generate endgame QA focused on final analysis system.main_gfa( image_path=stock_chart_path, context_base_path="examples/context/" ) ``` ## Evaluation Results Fine-tuning Qwen2.5-VL models on the pyramid-structured question chains enables these models to answer complex financial questions more effectively. According to the paper, the PyFi models show significant accuracy improvements, especially in high-level reasoning and decision support tasks compared to their base models. ## Citation If you use PyFi in your research, please cite the following paper: ```bibtex @article{pyfi2025, title={PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents}, author={Zhang, Yuqun and Zhao, Yuxuan and Chen, Sijia}, journal={arXiv preprint arXiv:2512.14735}, year={2025} } ```