---
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
pipeline_tag: image-text-to-text
license: apache-2.0
tags:
- base_model:adapter:Qwen/Qwen2.5-VL-7B-Instruct
- llama-factory
- lora
- transformers
- finance
- financial-vlm
---

# PyFi-QwenVL-7B-47K

This repository contains a LoRA adapter for [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) fine-tuned using the **PyFi** framework for advanced financial image understanding.

## Model Details

- **Developed by:** AgenticFin Lab
- **Model type:** Vision-Language Model (VLM) LoRA Adapter
- **Base model:** [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
- **Finetuning variant:** Trained on ~47K sample chains **without** Chain-of-Thought (CoT)
- **License:** Apache 2.0
- **Paper:** [PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents](https://huggingface.co/papers/2512.14735)
- **Repository:** [https://github.com/AgenticFinLab/PyFi](https://github.com/AgenticFinLab/PyFi)
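
Because this repository ships only adapter weights, they have to be attached to the base model before inference. The following is a minimal sketch of one way to do that with `transformers` and `peft`, not an official loading script. `load_pyfi_adapter` and `adapter_path` are hypothetical names introduced here; `adapter_path` should point to wherever the adapter files live (a local directory or a Hub repository id). It assumes a `transformers` release that includes Qwen2.5-VL support, and `accelerate` for `device_map="auto"`.

```python
BASE_MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"  # base model named in this card


def load_pyfi_adapter(adapter_path: str, base_model_id: str = BASE_MODEL_ID):
    """Attach a LoRA adapter to the base model and return (model, processor).

    `adapter_path` is a placeholder: pass a local directory or Hub repo id
    that contains the adapter files.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    # Load the frozen base model first.
    base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        base_model_id, torch_dtype="auto", device_map="auto"
    )
    # Wrap the base model with the LoRA weights.
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()

    # The processor (tokenizer + image preprocessing) comes from the base model.
    processor = AutoProcessor.from_pretrained(base_model_id)
    return model, processor
```

Calling `load_pyfi_adapter("path/to/adapter")` downloads the 7B base model if it is not cached, so it needs a machine with enough memory for it.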

## Description

**PyFi** (Pyramid-like Financial Image Understanding) is a framework designed to enhance VLMs in understanding complex financial images through adversarial agents. The framework uses **PyFi-600K**, a dataset organized into a reasoning pyramid across six capability levels:

1. **Perception**: Basic visual understanding.
2. **Data Extraction**: Foundational information retrieval.
3. **Calculation Analysis**: Numerical analysis tasks.
4. **Pattern Recognition**: Identifying trends and patterns.
5. **Logical Reasoning**: Complex logical analysis.
6. **Decision Support**: Strategic decision-making assistance.
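
The ordering of the levels above can be captured as a small data structure. This is only an illustrative sketch: the level names come from the list, but `PyramidLevel`, `order_by_level`, and the numeric values (taken from the list order) are assumptions and do not reflect the actual PyFi-600K schema.

```python
from enum import IntEnum
from typing import List, Tuple


class PyramidLevel(IntEnum):
    """Capability levels from the list above; values follow the list order."""

    PERCEPTION = 1
    DATA_EXTRACTION = 2
    CALCULATION_ANALYSIS = 3
    PATTERN_RECOGNITION = 4
    LOGICAL_REASONING = 5
    DECISION_SUPPORT = 6


def order_by_level(
    questions: List[Tuple[PyramidLevel, str]],
) -> List[Tuple[PyramidLevel, str]]:
    """Sort (level, question) pairs from the base of the pyramid to the top."""
    return sorted(questions, key=lambda item: item[0])
```

Using an `IntEnum` keeps the levels comparable, so `PyramidLevel.PERCEPTION < PyramidLevel.DECISION_SUPPORT` holds and questions can be sorted by level.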

This variant is fine-tuned only on the final question-answer pair of each reasoning chain, not on the intermediate steps.
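
As a rough illustration of training on final question-answer pairs only, the sketch below reduces a chain to its last step. The chain layout (an ordered list of steps with `question` and `answer` fields), the `ChainStep` type, and `final_qa_pair` are all hypothetical; the real dataset format may differ.

```python
from typing import List, TypedDict


class ChainStep(TypedDict):
    """One step of a reasoning chain (hypothetical layout)."""

    question: str
    answer: str


def final_qa_pair(chain: List[ChainStep]) -> ChainStep:
    """Return the last step of a chain, dropping all intermediate steps."""
    if not chain:
        raise ValueError("reasoning chain is empty")
    return chain[-1]


# Toy chain for illustration only; not taken from PyFi-600K.
toy_chain: List[ChainStep] = [
    {"question": "intermediate question", "answer": "intermediate answer"},
    {"question": "final question", "answer": "final answer"},
]
training_example = final_qa_pair(toy_chain)
```

Here `training_example` holds only the final pair, which is the part a no-CoT variant would be trained on under this assumed layout.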

## Usage Examples

Below are examples of how to use the PyFi framework for financial analysis, as provided in the official repository.

### MCTS Tree Construction for Financial Analysis

```python
from fttracer.mcts.gqa import ImageQASystem

# Initialize the Image QA system
system = ImageQASystem()

# Analyze a financial report image with MCTS tree
report_path = "examples/financial_report.png"

# Run MCTS tree construction for comprehensive analysis
system.main(
    image_path=report_path,
    context_base_path="examples/context/"
)
```

### Endgame QA Generation for Financial Images

```python
from fttracer.mcts.gqa import ImageQASystem

# Initialize the Image QA system
system = ImageQASystem()

# Analyze a stock chart and generate endgame QA
stock_chart_path = "examples/stock_chart.png"

# Generate endgame QA focused on final analysis
system.main_gfa(
    image_path=stock_chart_path,
    context_base_path="examples/context/"
)
```

## Evaluation Results

According to the paper, fine-tuning Qwen2.5-VL models on the pyramid-structured question chains improves their accuracy on complex financial questions relative to the base models, with the largest gains on high-level reasoning and decision support tasks.

## Citation

If you use PyFi in your research, please cite the following paper:

```bibtex
@article{pyfi2025,
  title={PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents},
  author={Zhang, Yuqun and Zhao, Yuxuan and Chen, Sijia},
  journal={arXiv preprint arXiv:2512.14735},
  year={2025}
}
```