nielsr's picture
nielsr HF Staff
Improve model card: add metadata, paper link, and description
8904da1 verified
|
raw
history blame
2.75 kB
metadata
base_model: Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
pipeline_tag: image-text-to-text
license: apache-2.0
tags:
  - base_model:adapter:Qwen/Qwen2.5-VL-7B-Instruct
  - llama-factory
  - lora
  - transformers
  - financial
  - vlm

PyFi-QwenVL-7B-47K

This repository contains a fine-tuned LoRA adapter for Qwen2.5-VL-7B-Instruct optimized for hierarchical financial image understanding. It was introduced in the paper PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents.

Model Details

Model Description

PyFi (Pyramid-like Financial Image Understanding) is a framework designed to enhance Vision-Language Models in understanding complex financial images (e.g., stock charts, financial reports) through a pyramid-like reasoning structure. This model allows VLMs to reason through question chains in a progressive, simple-to-complex manner.

This specific checkpoint is fine-tuned on approximately 47,000 reasoning chains from the PyFi-600K dataset. It is trained to handle six hierarchical capability levels:

  1. Perception: Basic visual understanding.
  2. Data Extraction: Foundational information retrieval.
  3. Calculation Analysis: Numerical analysis tasks.
  4. Pattern Recognition: Identifying trends and patterns.
  5. Logical Reasoning: Complex logical analysis.
  6. Decision Support: Strategic decision-making assistance.

Training Details

The model was fine-tuned using Parameter-Efficient Fine-Tuning (LoRA) with full-module adaptation.

  • Training Data: ~47K sample chains from PyFi-600K.
  • Optimizer: AdamW
  • Learning Rate: 1.0e-4
  • Learning Rate Schedule: Cosine scheduling with a warmup ratio of 0.1
  • Epochs: 1
  • Effective Batch Size: 8
  • Hardware: 4x NVIDIA RTX 5090 GPUs

Citation

If you find PyFi useful in your research, please cite:

@article{pyfi2025,
  title={PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents},
  author={Zhang, Yuqun and Zhao, Yuxuan and Chen, Sijia},
  journal={arXiv preprint arXiv:2512.14735},
  year={2025}
}