Raymond-Qiancx/ProgressLM-3B-SFT

+---
+pipeline_tag: image-text-to-text
+library_name: transformers
+base_model: Qwen/Qwen2.5-VL-3B-Instruct
+tags:
+- progress-reasoning
+- vlm
+- vision-language
+---
+# ProgressLM-3B-SFT
+ProgressLM is a Vision-Language Model (VLM) fine-tuned for **progress reasoning**: estimating how much of a task has been completed from partial observations. It was introduced in the paper [ProgressLM: Towards Progress Reasoning in Vision-Language Models](https://huggingface.co/papers/2601.15224).
+This checkpoint is the 3B-parameter model, trained with Supervised Fine-Tuning (SFT) on the **ProgressLM-45K** dataset.
+## Resources
+- **Project Page:** [https://progresslm.github.io/ProgressLM/](https://progresslm.github.io/ProgressLM/)
+- **GitHub Repository:** [https://github.com/ProgressLM/ProgressLM](https://github.com/ProgressLM/ProgressLM)
+- **Paper:** [https://huggingface.co/papers/2601.15224](https://huggingface.co/papers/2601.15224)
+- **Dataset:** [Raymond-Qiancx/ProgressLM-Dataset](https://huggingface.co/datasets/Raymond-Qiancx/ProgressLM-Dataset)
+## Overview
+Estimating task progress requires reasoning over long-horizon dynamics rather than recognizing static visual content. ProgressLM follows a human-inspired two-stage progress reasoning paradigm:
+1. **Episodic Retrieval:** Coarsely locating the observation along the demonstrated task.
+2. **Mental Simulation:** Imagining the transition from the retrieved anchor to the current observation for a fine-grained estimate.
+ProgressLM-3B achieves consistent improvements in task progress estimation even at this small model scale, despite being trained on a task set fully disjoint from the evaluation tasks.
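The coarse-to-fine structure of the two stages above can be illustrated with a toy sketch. This is not the model's implementation: ProgressLM carries out both stages as reasoning inside the VLM, whereas the example below assumes each demonstration step is summarized by a single hypothetical scalar feature and approximates "mental simulation" with linear interpolation. The function name `estimate_progress` and all of its parameters are illustrative assumptions.

```python
from typing import Sequence


def estimate_progress(
    demo_features: Sequence[float],
    demo_progress: Sequence[float],
    observation: float,
) -> float:
    """Toy coarse-to-fine progress estimate over scalar stand-in features."""
    if len(demo_features) == 0 or len(demo_features) != len(demo_progress):
        raise ValueError("demo_features and demo_progress must be non-empty and equal in length")

    # Stage 1 (coarse, stand-in for episodic retrieval):
    # pick the demonstration step closest to the observation as the anchor.
    anchor = min(
        range(len(demo_features)),
        key=lambda i: abs(demo_features[i] - observation),
    )
    f_a, p_a = demo_features[anchor], demo_progress[anchor]

    # Stage 2 (fine, stand-in for mental simulation):
    # if the observation lies between the anchor and an adjacent step,
    # interpolate the progress linearly between the two.
    for neighbor in (anchor - 1, anchor + 1):
        if 0 <= neighbor < len(demo_features):
            f_n, p_n = demo_features[neighbor], demo_progress[neighbor]
            if f_n != f_a and min(f_a, f_n) <= observation <= max(f_a, f_n):
                t = (observation - f_a) / (f_n - f_a)
                return p_a + t * (p_n - p_a)

    # Observation falls outside the demonstrated range: keep the anchor's progress.
    return p_a


if __name__ == "__main__":
    features = [0.0, 1.0, 2.0, 3.0]    # hypothetical per-step features
    progress = [0.0, 0.25, 0.5, 1.0]   # progress label of each demonstration step
    print(estimate_progress(features, progress, 2.25))
```

In this sketch the retrieval step only has to be roughly right; the refinement step then adjusts the estimate relative to the anchor, which mirrors the division of labor between the two stages described above.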
 ## Citation
 If you find this work useful, please cite our paper: