---
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
pipeline_tag: image-text-to-text
license: apache-2.0
tags:
- base_model:adapter:Qwen/Qwen2.5-VL-7B-Instruct
- llama-factory
- lora
- transformers
- finance
- financial-vlm
---

# PyFi-QwenVL-7B-47K

This repository contains a LoRA adapter for [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned using the **PyFi** framework for advanced financial image understanding.

## Model Details

- **Developed by:** AgenticFin Lab
- **Model type:** Vision-Language Model (VLM) LoRA Adapter
- **Base model:** [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
- **Finetuning variant:** Trained on ~47K sample chains **without** Chain-of-Thought (CoT)
- **License:** Apache 2.0
- **Paper:** [PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents](https://huggingface.co/papers/2512.14735)
- **Repository:** [https://github.com/AgenticFinLab/PyFi](https://github.com/AgenticFinLab/PyFi)

## Description

**PyFi** (Pyramid-like Financial Image Understanding) is a framework designed to enhance VLMs in understanding complex financial images through adversarial agents. The framework uses **PyFi-600K**, a dataset organized into a reasoning pyramid across 6 capability levels:
1. **Perception**: Basic visual understanding.
2. **Data Extraction**: Foundational information retrieval.
3. **Calculation Analysis**: Numerical analysis tasks.
4. **Pattern Recognition**: Identifying trends and patterns.
5. **Logical Reasoning**: Complex logical analysis.
6. **Decision Support**: Strategic decision-making assistance.

This specific model variant is fine-tuned on the final question-answer pairs of the reasoning chains, rather than the intermediate steps.

## Usage Examples

Below are examples of how to use the PyFi framework for financial analysis, as provided in the official repository.

### MCTS Tree Construction for Financial Analysis

```python
from fttracer.mcts.gqa import ImageQASystem

# Initialize the Image QA system
system = ImageQASystem()

# Analyze a financial report image with MCTS tree
report_path = "examples/financial_report.png"

# Run MCTS tree construction for comprehensive analysis
system.main(
    image_path=report_path,
    context_base_path="examples/context/"
)
```

### Endgame QA Generation for Financial Images

```python
from fttracer.mcts.gqa import ImageQASystem

# Initialize the Image QA system
system = ImageQASystem()

# Analyze a stock chart and generate endgame QA
stock_chart_path = "examples/stock_chart.png"

# Generate endgame QA focused on final analysis
system.main_gfa(
    image_path=stock_chart_path,
    context_base_path="examples/context/"
)
```

## Evaluation Results

Fine-tuning Qwen2.5-VL models on the pyramid-structured question chains enables these models to answer complex financial questions more effectively. According to the paper, the PyFi models show significant accuracy improvements, especially in high-level reasoning and decision support tasks compared to their base models.

## Citation

If you use PyFi in your research, please cite the following paper:

```bibtex
@article{pyfi2025,
  title={PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents},
  author={Zhang, Yuqun and Zhao, Yuxuan and Chen, Sijia},
  journal={arXiv preprint arXiv:2512.14735},
  year={2025}
}
```