Improve model card: add metadata, links, and project info
#1
by nielsr - opened

README.md CHANGED
```diff
@@ -1,7 +1,28 @@
 ---
 license: mit
+pipeline_tag: image-text-to-text
+library_name: transformers
+base_model: Qwen/Qwen2.5-VL-3B-Instruct
+tags:
+- progress-reasoning
+- vlm
 ---
 
+# ProgressLM-3B
+
+[**Website**](https://progresslm.github.io/ProgressLM/) | [**Paper**](https://huggingface.co/papers/2601.15224) | [**Code**](https://github.com/ProgressLM/ProgressLM)
+
+ProgressLM-3B is a Vision-Language Model (VLM) specifically designed for **progress reasoning**. While traditional VLMs are proficient at describing static visual content, ProgressLM is trained to infer how far a task has progressed from partial observations.
+
+The model is built upon the [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) architecture and was fine-tuned on the **ProgressLM-45K** dataset. It employs a human-inspired two-stage reasoning paradigm: episodic retrieval (to locate the observation along a task trajectory) and mental simulation (to imagine the transition from an anchor to the current observation).
+
+## Model Details
+- **Developed by:** Jianshu Zhang, Chengxuan Qian, Haosen Sun, Haoran Lu, Dingcheng Wang, Letian Xue, Han Liu
+- **Model Type:** Vision-Language Model
+- **Base Model:** Qwen2.5-VL-3B-Instruct
+- **Language(s):** English
+- **License:** MIT
+
 ## Citation
 
 If you find this work useful, please cite our paper:
```
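Since the updated card declares `library_name: transformers` with a Qwen2.5-VL base, a usage snippet would round it out. Below is a minimal inference sketch assuming the model keeps Qwen2.5-VL's standard processor and chat-template interface; the repo id `ProgressLM/ProgressLM-3B`, the image path, and the prompt are illustrative placeholders, not taken from the card.

```python
# Minimal sketch: progress estimation with ProgressLM-3B via transformers.
# Assumptions (not confirmed by the card): the Hub repo id is
# "ProgressLM/ProgressLM-3B" and the checkpoint keeps Qwen2.5-VL's
# processor and chat-template API.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ProgressLM/ProgressLM-3B"  # hypothetical repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A single partial observation of an ongoing task.
image = Image.open("observation.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Estimate how far this task has progressed (0-100%)."},
        ],
    }
]

# Render the chat prompt, then batch text + image through the processor.
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
# Drop the prompt tokens before decoding the model's answer.
answer = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

With `device_map="auto"` the weights are placed across available devices and `torch_dtype="auto"` uses the checkpoint's native precision; both can be dropped for a plain CPU run.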