---
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
pipeline_tag: image-text-to-text
license: apache-2.0
tags:
- base_model:adapter:Qwen/Qwen2.5-VL-7B-Instruct
- llama-factory
- lora
- transformers
- financial
- vlm
---

# PyFi-QwenVL-7B-47K

This repository contains a fine-tuned LoRA adapter for [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) optimized for hierarchical financial image understanding. It was introduced in the paper [PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents](https://huggingface.co/papers/2512.14735).

## Model Details

- **Developed by:** Yuqun Zhang, Yuxuan Zhao, Sijia Chen (AgenticFin Lab)
- **Model Type:** Vision-Language Model (VLM) Adapter
- **Base Model:** [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
- **Language(s):** English
- **License:** Apache 2.0
- **Resources:**
  - [Paper](https://arxiv.org/abs/2512.14735)
  - [Code Repository](https://github.com/AgenticFinLab/PyFi)
  - [Dataset (PyFi-600K)](https://huggingface.co/datasets/AgenticFinLab/PyFi-600K)

## Model Description

**PyFi** (Pyramid-like Financial Image Understanding) is a framework designed to enhance Vision-Language Models in understanding complex financial images (e.g., stock charts, financial reports) through a pyramid-like reasoning structure. This model allows VLMs to reason through question chains in a progressive, simple-to-complex manner.

This specific checkpoint is fine-tuned on approximately 47,000 reasoning chains from the **PyFi-600K** dataset. It is trained to handle six hierarchical capability levels:
1. **Perception**: Basic visual understanding.
2. **Data Extraction**: Foundational information retrieval.
3. **Calculation Analysis**: Numerical analysis tasks.
4. **Pattern Recognition**: Identifying trends and patterns.
5. **Logical Reasoning**: Complex logical analysis.
6. **Decision Support**: Strategic decision-making assistance.

## Training Details

The model was fine-tuned using Parameter-Efficient Fine-Tuning (LoRA) with full-module adaptation.

- **Training Data:** ~47K sample chains from [PyFi-600K](https://huggingface.co/datasets/AgenticFinLab/PyFi-600K).
- **Optimizer:** AdamW
- **Learning Rate:** 1.0e-4
- **Learning Rate Schedule:** Cosine scheduling with a warmup ratio of 0.1
- **Epochs:** 1
- **Effective Batch Size:** 8
- **Hardware:** 4x NVIDIA RTX 5090 GPUs

## Citation

If you find PyFi useful in your research, please cite:

```bibtex
@article{pyfi2025,
  title={PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents},
  author={Zhang, Yuqun and Zhao, Yuxuan and Chen, Sijia},
  journal={arXiv preprint arXiv:2512.14735},
  year={2025}
}
```