nielsr HF Staff committed on
Commit 1095d31 · verified · 1 Parent(s): 0c21bfa

Add pipeline tag and improve model card

Hi! I'm Niels from the Hugging Face community team.

I've noticed this model repository is missing a `pipeline_tag` in its metadata. Adding `pipeline_tag: image-feature-extraction` will help users discover this model when filtering by task on the Hugging Face Hub.
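
For context, the tag lives in the YAML front matter at the top of `README.md`. The sketch below is a minimal, standard-library-only illustration of reading that block and checking for `pipeline_tag`. The helper name `read_front_matter` is hypothetical, it handles only flat `key: value` pairs, and it is not how the Hub itself parses model cards.

```python
def read_front_matter(readme_text):
    """Parse flat `key: value` pairs from a leading `---` block (hypothetical helper)."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # opening delimiter was never closed


# Front matter before and after this change, as shown in the diff.
card_before = "---\nlicense: mit\n---\n\n**PL-Stitch**\n"
card_after = (
    "---\nlicense: mit\npipeline_tag: image-feature-extraction\n---\n\n**PL-Stitch**\n"
)

print(read_front_matter(card_before).get("pipeline_tag"))  # None
print(read_front_matter(card_after).get("pipeline_tag"))   # image-feature-extraction
```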

I've also:
- Updated the GitHub link to the official organization repository.
- Refined the sample usage snippet based on your GitHub README.
- Cleaned up the Markdown to be more concise.
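
The main functional fix in the snippet is the tensor layout: `np.array(img)` yields height × width × channel data, while PyTorch convolutional layers take batch × channel × height × width input, hence the added `.permute(2, 0, 1)` and `.unsqueeze(0)`. Here is a dependency-free sketch of that reordering on nested lists, purely for illustration. The function name `hwc_to_nchw` is made up for this example, and whether PL-Stitch expects any further normalization is not something this PR covers.

```python
def hwc_to_nchw(image):
    """Reorder a height x width x channel nested list into 1 x channel x height x width."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    chw = [
        [[float(image[y][x][c]) for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    return [chw]  # leading batch dimension of size 1


# A 2 x 2 RGB image: each pixel is [R, G, B].
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
batch = hwc_to_nchw(image)

# Shape check: batch, channels, height, width.
print(len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))  # 1 3 2 2
print(batch[0][0])  # red channel: [[255.0, 0.0], [0.0, 255.0]]
```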

Feel free to merge if this looks good!

Files changed (1)
  1. README.md +27 -51
README.md CHANGED
@@ -1,33 +1,50 @@
  ---
  license: mit
  ---

  **PL-Stitch**
  -------------


- [📚 Paper](https://www.arxiv.org/abs/2511.17805) - [🤖 GitHub](https://github.com/jaime-1998/PL-Stitch)
-
-
- This is the official repository for the **CVPR2026** paper [A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking](https://www.arxiv.org/abs/2511.17805).

  *PL-Stitch* is an image foundation model that captures visual changes over time, enabling procedural activity understanding. It takes an image as input and produces a feature vector as output, leveraging the novel Plackett-Luce temporal ranking objective to build a comprehensive understanding of both the static semantic information and the procedural context within each frame.



- Star us if you like it!
-
-

- <img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/O0azUcMHjCyKYzM4vox98.png" />





  If you use our model or code in your research, please cite our paper:

- ```
  @misc{che2025stitchtimelearningprocedural,
  title={A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking},
  author={Chengan Che and Chao Wang and Xinyue Chen and Sophia Tsoka and Luis C. Garcia-Peraza-Herrera},
@@ -37,45 +54,4 @@ If you use our model or code in your research, please cite our paper:
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.17805},
  }
- ```
-
- Abstract
- --------
- Procedural activities, ranging from routine cooking to complex surgical operations, are highly structured as a set of actions conducted in a specific temporal order. Despite their success on static images and short clips, current self-supervised learning methods often overlook the procedural nature that underpins such activities. We expose the lack of procedural awareness in current SSL methods with a motivating experiment: models pretrained on forward and time-reversed sequences produce highly similar features, confirming that their representations are blind to the underlying procedural order. To address this shortcoming, we propose PL-Stitch, a self-supervised framework that harnesses the inherent temporal order of video frames as a powerful supervisory signal. Our approach integrates two novel probabilistic objectives based on the Plackett-Luce (PL) model. The primary PL objective trains the model to sort sampled frames chronologically, compelling it to learn the global workflow progression. The secondary objective, a spatio-temporal jigsaw loss, complements the learning by capturing fine-grained, cross-frame object correlations. Our approach consistently achieves superior performance across five surgical and cooking benchmarks. Specifically, PL-Stitch yields significant gains in surgical phase recognition (e.g., +11.4 pp k-NN accuracy on Cholec80) and cooking action segmentation (e.g., +5.7 pp linear probing accuracy on Breakfast), demonstrating its effectiveness for procedural video representation learning.
-
- <br>
-
-
-
-
-
- 🚩 PL-Stitch model
- ------------------
-
- You can download the checkpoint at [🤗 PL-Stitch](https://huggingface.co/visurg/PL-Stitch) and run the following code to extract features from your video frames.
-
-
- ```python
- import torch
- from PIL import Image
- from build_model import build_model
-
- # Load the pre-trained pl_stitch model
- pl_stitch = build_model(pretrained_weights = 'your path to the model')
- pl_stitch.eval()
-
- # Load the image and convert it to a PyTorch tensor
- img_path = 'path/to/your/image.jpg'
- img = Image.open(img_path)
- img = img.resize((224, 224))
- img_tensor = torch.tensor(np.array(img)).unsqueeze(0).to('cuda')
-
- # Extract features from the image
- outputs = pl_stitch(img_tensor)
- ```
-
-
-
-
-
-
 
  ---
  license: mit
+ pipeline_tag: image-feature-extraction
  ---

  **PL-Stitch**
  -------------

+ [📚 Paper](https://arxiv.org/abs/2511.17805) - [🤖 GitHub](https://github.com/visurg-ai/PL-Stitch)

+ This is the official repository for the **CVPR 2026** paper [A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking](https://arxiv.org/abs/2511.17805).

  *PL-Stitch* is an image foundation model that captures visual changes over time, enabling procedural activity understanding. It takes an image as input and produces a feature vector as output, leveraging the novel Plackett-Luce temporal ranking objective to build a comprehensive understanding of both the static semantic information and the procedural context within each frame.

+ <img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/O0azUcMHjCyKYzM4vox98.png" />

+ ## Sample Usage

+ You can download the checkpoint and run the following code to extract features from your video frames. Note that this requires the `pl_stitch` package from the [GitHub repository](https://github.com/visurg-ai/PL-Stitch).

+ ```python
+ import torch
+ import numpy as np
+ from PIL import Image
+ from pl_stitch.build_model import build_model

+ # Load the pre-trained pl_stitch model
+ # Ensure you have the checkpoint file (e.g., pl_lemon.pth) locally
+ pl_stitch = build_model(pretrained_weights = 'path/to/pl_lemon.pth')
+ pl_stitch.eval()

+ # Load the image and convert it to a PyTorch tensor
+ img_path = 'path/to/your/image.jpg'
+ img = Image.open(img_path).convert('RGB')
+ img = img.resize((224, 224))
+ img_tensor = torch.tensor(np.array(img)).permute(2, 0, 1).float().unsqueeze(0).to('cuda')

+ # Extract features from the image
+ with torch.no_grad():
+     outputs = pl_stitch(img_tensor)
+ ```

+ ## Citation

  If you use our model or code in your research, please cite our paper:

+ ```bibtex
  @misc{che2025stitchtimelearningprocedural,
  title={A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking},
  author={Chengan Che and Chao Wang and Xinyue Chen and Sophia Tsoka and Luis C. Garcia-Peraza-Herrera},
 
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.17805},
  }
+ ```