Add pipeline tag and improve model card

Hi! I'm Niels from the Hugging Face community team.

I've noticed this model repository is missing a `pipeline_tag` in its metadata. Adding `pipeline_tag: image-feature-extraction` will help users discover this model when filtering by task on the Hugging Face Hub.
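
For reference, here is a minimal sketch of the metadata change itself: inserting a `pipeline_tag` key into the YAML front matter of a model card. The helper name `add_pipeline_tag` is hypothetical and only illustrates the edit; it uses plain string handling and is not part of any Hugging Face library.

```python
def add_pipeline_tag(readme_text: str, tag: str) -> str:
    """Return readme_text with `pipeline_tag: <tag>` set in its YAML front matter.

    Hypothetical helper for illustration only. It assumes the front matter,
    if present, is delimited by two lines that contain just `---`.
    """
    lines = readme_text.split("\n")

    # No front matter at all: create one holding only the tag.
    if not lines or lines[0].strip() != "---":
        return f"---\npipeline_tag: {tag}\n---\n{readme_text}"

    # Find the line that closes the front matter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter is opened with '---' but never closed")

    # Overwrite an existing tag so the function can be applied repeatedly.
    for i in range(1, end):
        if lines[i].startswith("pipeline_tag:"):
            lines[i] = f"pipeline_tag: {tag}"
            return "\n".join(lines)

    # Otherwise add the tag just before the closing delimiter.
    lines.insert(end, f"pipeline_tag: {tag}")
    return "\n".join(lines)


card = "---\nlicense: mit\n---\n**PL-Stitch**\n"
print(add_pipeline_tag(card, "image-feature-extraction"))
```

Applied to the current front matter (`license: mit` only), this produces the same three-line metadata block shown in the diff below.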

I've also:
- Updated the GitHub link to the official organization repository.
- Refined the sample usage snippet based on your GitHub README.
- Cleaned up the Markdown to be more concise.
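
The main change to the usage snippet is the tensor layout: a PIL image converted through NumPy is height x width x channels with `uint8` values, so the refined snippet adds `.permute(2, 0, 1).float()` before batching. The same step can be sketched in NumPy alone; the function name below is hypothetical, and whether the model also expects value normalization is not covered here.

```python
import numpy as np


def hwc_uint8_to_nchw_float(image_hwc: np.ndarray) -> np.ndarray:
    """Convert one H x W x 3 image array into a 1 x 3 x H x W float32 batch.

    Hypothetical helper mirroring the permute/float/unsqueeze chain in the
    refined snippet. No normalization is applied.
    """
    if image_hwc.ndim != 3 or image_hwc.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    chw = np.transpose(image_hwc, (2, 0, 1))      # channels first
    return chw.astype(np.float32)[np.newaxis, ...]  # add the batch axis


# A blank 224 x 224 RGB image stands in for a real video frame.
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
batch = hwc_uint8_to_nchw_float(dummy)
print(batch.shape, batch.dtype)
```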

Feel free to merge if this looks good!

Files changed (1)

README.md +27 -51

@@ -1,33 +1,50 @@
 ---
 license: mit
 ---
 **PL-Stitch**
 -------------
-[📚 Paper](https://www.arxiv.org/abs/2511.17805) - [🤖 GitHub](https://github.com/jaime-1998/PL-Stitch)
-This is the official repository for the **CVPR2026** paper [A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking](https://www.arxiv.org/abs/2511.17805).
 *PL-Stitch* is an image foundation model that captures visual changes over time, enabling procedural activity understanding. It takes an image as input and produces a feature vector as output, leveraging the novel Plackett-Luce temporal ranking objective to build a comprehensive understanding of both the static semantic information and the procedural context within each frame.
-Star ⭐ us if you like it!
-<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/O0azUcMHjCyKYzM4vox98.png" />
 If you use our model or code in your research, please cite our paper:
-```
 @misc{che2025stitchtimelearningprocedural,
       title={A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking},
       author={Chengan Che and Chao Wang and Xinyue Chen and Sophia Tsoka and Luis C. Garcia-Peraza-Herrera},
@@ -37,45 +54,4 @@ If you use our model or code in your research, please cite our paper:
       primaryClass={cs.CV},
       url={https://arxiv.org/abs/2511.17805},
 }
-```
-Abstract
---------
-Procedural activities, ranging from routine cooking to complex surgical operations, are highly structured as a set of actions conducted in a specific temporal order. Despite their success on static images and short clips, current self-supervised learning methods often overlook the procedural nature that underpins such activities. We expose the lack of procedural awareness in current SSL methods with a motivating experiment: models pretrained on forward and time-reversed sequences produce highly similar features, confirming that their representations are blind to the underlying procedural order. To address this shortcoming, we propose PL-Stitch, a self-supervised framework that harnesses the inherent temporal order of video frames as a powerful supervisory signal. Our approach integrates two novel probabilistic objectives based on the Plackett-Luce (PL) model. The primary PL objective trains the model to sort sampled frames chronologically, compelling it to learn the global workflow progression. The secondary objective, a spatio-temporal jigsaw loss, complements the learning by capturing fine-grained, cross-frame object correlations. Our approach consistently achieves superior performance across five surgical and cooking benchmarks. Specifically, PL-Stitch yields significant gains in surgical phase recognition (e.g., +11.4 pp k-NN accuracy on Cholec80) and cooking action segmentation (e.g., +5.7 pp linear probing accuracy on Breakfast), demonstrating its effectiveness for procedural video representation learning.
-<br>
-🚩 PL-Stitch model
-------------------
-You can download the checkpoint at [🤗 PL-Stitch](https://huggingface.co/visurg/PL-Stitch) and run the following code to extract features from your video frames.
-   ```python
-   import torch
-   from PIL import Image
-   from build_model import build_model
-   # Load the pre-trained pl_stitch model
-   pl_stitch = build_model(pretrained_weights = 'your path to the model')
-   pl_stitch.eval()
-   # Load the image and convert it to a PyTorch tensor
-   img_path = 'path/to/your/image.jpg'
-   img = Image.open(img_path)
-   img = img.resize((224, 224))
-   img_tensor = torch.tensor(np.array(img)).unsqueeze(0).to('cuda')
-   # Extract features from the image
-   outputs = pl_stitch(img_tensor)
-   ```

 ---
 license: mit
+pipeline_tag: image-feature-extraction
 ---
 **PL-Stitch**
 -------------
+[📚 Paper](https://arxiv.org/abs/2511.17805) - [🤖 GitHub](https://github.com/visurg-ai/PL-Stitch)
+This is the official repository for the **CVPR 2026** paper [A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking](https://arxiv.org/abs/2511.17805).
 *PL-Stitch* is an image foundation model that captures visual changes over time, enabling procedural activity understanding. It takes an image as input and produces a feature vector as output, leveraging the novel Plackett-Luce temporal ranking objective to build a comprehensive understanding of both the static semantic information and the procedural context within each frame.
+<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/O0azUcMHjCyKYzM4vox98.png" />
+## Sample Usage
+You can download the checkpoint and run the following code to extract features from your video frames. Note that this requires the `pl_stitch` package from the [GitHub repository](https://github.com/visurg-ai/PL-Stitch).
+```python
+import torch
+import numpy as np
+from PIL import Image
+from pl_stitch.build_model import build_model
+# Load the pre-trained pl_stitch model
+# Ensure you have the checkpoint file (e.g., pl_lemon.pth) locally
+pl_stitch = build_model(pretrained_weights = 'path/to/pl_lemon.pth')
+pl_stitch.eval()
+# Load the image and convert it to a PyTorch tensor
+img_path = 'path/to/your/image.jpg'
+img = Image.open(img_path).convert('RGB')
+img = img.resize((224, 224))
+img_tensor = torch.tensor(np.array(img)).permute(2, 0, 1).float().unsqueeze(0).to('cuda')
+# Extract features from the image
+with torch.no_grad():
+    outputs = pl_stitch(img_tensor)
+```
+## Citation
 If you use our model or code in your research, please cite our paper:
+```bibtex
 @misc{che2025stitchtimelearningprocedural,
       title={A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking},
       author={Chengan Che and Chao Wang and Xinyue Chen and Sophia Tsoka and Luis C. Garcia-Peraza-Herrera},
       primaryClass={cs.CV},
       url={https://arxiv.org/abs/2511.17805},
 }
+```