PyFi-QwenVL-7B-47K / README.md
nielsr's picture
nielsr HF Staff
Improve model card and link to paper
cdc36f2 verified
|
raw
history blame
3.43 kB
---
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
pipeline_tag: image-text-to-text
license: apache-2.0
tags:
- base_model:adapter:Qwen/Qwen2.5-VL-7B-Instruct
- llama-factory
- lora
- transformers
- finance
- financial-vlm
---
# PyFi-QwenVL-7B-47K
This repository contains a LoRA adapter for [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned using the **PyFi** framework for advanced financial image understanding.
## Model Details
- **Developed by:** AgenticFin Lab
- **Model type:** Vision-Language Model (VLM) LoRA Adapter
- **Base model:** [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
- **Finetuning variant:** Trained on ~47K sample chains **without** Chain-of-Thought (CoT)
- **License:** Apache 2.0
- **Paper:** [PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents](https://huggingface.co/papers/2512.14735)
- **Repository:** [https://github.com/AgenticFinLab/PyFi](https://github.com/AgenticFinLab/PyFi)
## Description
**PyFi** (Pyramid-like Financial Image Understanding) is a framework designed to enhance VLMs in understanding complex financial images through adversarial agents. The framework uses **PyFi-600K**, a dataset organized into a reasoning pyramid across 6 capability levels:
1. **Perception**: Basic visual understanding.
2. **Data Extraction**: Foundational information retrieval.
3. **Calculation Analysis**: Numerical analysis tasks.
4. **Pattern Recognition**: Identifying trends and patterns.
5. **Logical Reasoning**: Complex logical analysis.
6. **Decision Support**: Strategic decision-making assistance.
This specific model variant is fine-tuned on the final question-answer pairs of the reasoning chains, rather than the intermediate steps.
## Usage Examples
Below are examples of how to use the PyFi framework for financial analysis, as provided in the official repository.
### MCTS Tree Construction for Financial Analysis
```python
from fttracer.mcts.gqa import ImageQASystem
# Initialize the Image QA system
system = ImageQASystem()
# Analyze a financial report image with MCTS tree
report_path = "examples/financial_report.png"
# Run MCTS tree construction for comprehensive analysis
system.main(
image_path=report_path,
context_base_path="examples/context/"
)
```
### Endgame QA Generation for Financial Images
```python
from fttracer.mcts.gqa import ImageQASystem
# Initialize the Image QA system
system = ImageQASystem()
# Analyze a stock chart and generate endgame QA
stock_chart_path = "examples/stock_chart.png"
# Generate endgame QA focused on final analysis
system.main_gfa(
image_path=stock_chart_path,
context_base_path="examples/context/"
)
```
## Evaluation Results
Fine-tuning Qwen2.5-VL models on the pyramid-structured question chains enables these models to answer complex financial questions more effectively. According to the paper, the PyFi models show significant accuracy improvements, especially in high-level reasoning and decision support tasks compared to their base models.
## Citation
If you use PyFi in your research, please cite the following paper:
```bibtex
@article{pyfi2025,
title={PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents},
author={Zhang, Yuqun and Zhao, Yuxuan and Chen, Sijia},
journal={arXiv preprint arXiv:2512.14735},
year={2025}
}
```