Add model card metadata and links
Hi! I'm Niels from the Hugging Face community team. I'm opening this PR to update your model card with relevant metadata and links to the paper and code.
Specifically:
- Added `pipeline_tag: image-text-to-text`.
- Added `library_name: transformers`.
- Added `base_model: Qwen/Qwen2.5-VL-3B-Instruct`.
- Linked the paper, project page, and GitHub repository.
This helps users discover and use your model more easily. Please feel free to review and merge!
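If you want to sanity-check the new front matter before merging, here is a minimal sketch. The `card_text` literal mirrors the YAML block added in the diff below; `parse_front_matter` is a small illustrative helper written for this example, not an official Hugging Face Hub utility.

```python
# Sanity check for the YAML front matter added in this PR.
# The metadata values are copied from the diff; the parser is a
# hand-rolled helper for illustration, not part of huggingface_hub.
import re

card_text = """---
pipeline_tag: image-text-to-text
library_name: transformers
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
- progress-reasoning
- vlm
- vision-language
---

# ProgressLM-3B-SFT
"""

def parse_front_matter(text):
    """Extract the block between the leading '---' fences as a dict."""
    match = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return {}
    meta, key = {}, None
    for line in match.group(1).splitlines():
        if line.startswith("- ") and key is not None:
            meta[key].append(line[2:].strip())  # list item under current key
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            # An empty value (e.g. "tags:") starts a list of items.
            meta[key] = value if value else []
    return meta

meta = parse_front_matter(card_text)
print(meta["pipeline_tag"])  # image-text-to-text
print(meta["tags"])          # ['progress-reasoning', 'vlm', 'vision-language']
```

With `library_name: transformers` and `pipeline_tag: image-text-to-text` in place, the Hub can show the correct "Use this model" snippet and surface the model under the right task filter.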
README.md (CHANGED)

```diff
@@ -1,3 +1,32 @@
+---
+pipeline_tag: image-text-to-text
+library_name: transformers
+base_model: Qwen/Qwen2.5-VL-3B-Instruct
+tags:
+- progress-reasoning
+- vlm
+- vision-language
+---
+
+# ProgressLM-3B-SFT
+
+ProgressLM is a Vision-Language Model (VLM) specifically fine-tuned for **progress reasoning**—estimating how much of a task has been completed from partial observations. It is introduced in the paper [ProgressLM: Towards Progress Reasoning in Vision-Language Models](https://huggingface.co/papers/2601.15224).
+
+This version is the 3B parameter model fine-tuned using Supervised Fine-Tuning (SFT) on the **ProgressLM-45K** dataset.
+
+## Resources
+- **Project Page:** [https://progresslm.github.io/ProgressLM/](https://progresslm.github.io/ProgressLM/)
+- **GitHub Repository:** [https://github.com/ProgressLM/ProgressLM](https://github.com/ProgressLM/ProgressLM)
+- **Paper:** [https://huggingface.co/papers/2601.15224](https://huggingface.co/papers/2601.15224)
+- **Dataset:** [Raymond-Qiancx/ProgressLM-Dataset](https://huggingface.co/datasets/Raymond-Qiancx/ProgressLM-Dataset)
+
+## Overview
+Estimating task progress requires reasoning over long-horizon dynamics rather than recognizing static visual content. ProgressLM follows a human-inspired two-stage progress reasoning paradigm:
+1. **Episodic Retrieval:** Coarsely locating the observation along the demonstrated task.
+2. **Mental Simulation:** Imagining the transition from the retrieved anchor to the current observation for a fine-grained estimate.
+
+ProgressLM-3B achieves consistent improvements in task progress estimation even at a small model scale, despite being trained on a task set fully disjoint from evaluation tasks.
+
 ## Citation
 
 If you find this work useful, please cite our paper:
```