---
base_model: Qwen/Qwen2.5-VL-3B-Instruct
library_name: peft
pipeline_tag: image-text-to-text
license: apache-2.0
tags:
- base_model:adapter:Qwen/Qwen2.5-VL-3B-Instruct
- llama-factory
- lora
- transformers
- finance
- vision-language
---

# PyFi-QwenVL-3B-47K

This model is a parameter-efficient fine-tuned version (LoRA) of [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) specialized for financial image understanding. It was introduced as part of the **PyFi** framework.

- **Paper:** [PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents](https://arxiv.org/abs/2512.14735)
- **Repository:** [https://github.com/AgenticFinLab/PyFi](https://github.com/AgenticFinLab/PyFi)
- **Dataset:** [PyFi-600K](https://huggingface.co/datasets/AgenticFinLab/PyFi-600K)

## Model Description

PyFi (Pyramid-like Financial Image Understanding) is a framework designed to enable Vision Language Models (VLMs) to reason through financial images—such as stock charts, financial reports, and economic diagrams—in a progressive, simple-to-complex manner. 

This specific checkpoint is the 3B variant fine-tuned on approximately 47,000 reasoning chains. This version was trained **without Chain-of-Thought (CoT)**, focusing on the model's ability to provide the final answer in the financial reasoning pyramid.

The model is designed to handle tasks across six hierarchical capability levels:
1. **Perception**: Basic visual understanding.
2. **Data Extraction**: Information retrieval from charts and tables.
3. **Calculation Analysis**: Numerical analysis tasks.
4. **Pattern Recognition**: Identifying trends and patterns.
5. **Logical Reasoning**: Complex logical analysis.
6. **Decision Support**: Strategic decision-making assistance.

## Training Details

- **Finetuning approach:** LoRA (Parameter-Efficient Fine-Tuning) with full-module adaptation.
- **Training Data:** 47K sample chains from the PyFi-600K dataset.
- **Optimizer:** AdamW
- **Learning Rate:** $1.0 \times 10^{-4}$
- **Learning Rate Schedule:** Cosine scheduling with a warmup ratio of 0.1.
- **Training Epochs:** 1
- **Effective Batch Size:** 8
- **Hardware:** 4x NVIDIA RTX 5090 GPUs.

## Evaluation Results

In the PyFi benchmark, fine-tuning on pyramid-structured question chains showed significant improvements. The PyFi models (when using CoT) yielded average accuracy improvements of 19.52% for the 3B variant over baseline pre-trained models.

## Citation

If you use PyFi in your research, please cite:

```bibtex
@article{pyfi2025,
  title={PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents},
  author={Zhang, Yuqun and Zhao, Yuxuan and Chen, Sijia},
  journal={arXiv preprint arXiv:2512.14735},
  year={2025}
}
```