Improve model card: Add pipeline tag, GitHub link, abstract, and move license

#1 · opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +17 -4
README.md CHANGED
@@ -1,9 +1,22 @@
+ ---
+ pipeline_tag: object-detection
+ license: mit
+ ---
 
 # Selective Contrastive Learning for Weakly Supervised Affordance Grounding (ICCV 2025)
 WonJun Moon*, Hyun Seok Seong*, Jae-Pil Heo (*: equal contribution)
 
- [[Arxiv](https://arxiv.org/abs/2508.07877)]
+ [[Arxiv](https://arxiv.org/abs/2508.07877)] [[Code](https://github.com/hynnsk/SelectiveCL)]
 
- ---
- license: mit
- ---
+ ## Abstract
+ Facilitating an entity's interaction with objects requires accurately identifying parts that afford specific actions. Weakly supervised affordance grounding (WSAG) seeks to imitate human learning from third-person demonstrations, where humans intuitively grasp functional parts without needing pixel-level annotations. To achieve this, grounding is typically learned using a shared classifier across images from different perspectives, along with distillation strategies that incorporate a part discovery process. However, since affordance-relevant parts are not always easily distinguishable, models primarily rely on classification, often focusing on common class-specific patterns that are unrelated to affordance. To address this limitation, we move beyond isolated part-level learning by introducing selective prototypical and pixel contrastive objectives that adaptively learn affordance-relevant cues at both the part and object levels, depending on the granularity of the available information. Initially, we find the action-associated objects in both egocentric (object-focused) and exocentric (third-person example) images by leveraging CLIP. Then, by cross-referencing the discovered objects of complementary views, we excavate the precise part-level affordance clues in each perspective. By consistently learning to distinguish affordance-relevant regions from affordance-irrelevant background context, our approach effectively shifts activation from irrelevant areas toward meaningful affordance cues. Experimental results demonstrate the effectiveness of our method.
+ 
+ ## Source Code
+ Code will be released soon.
+ 
+ ### Checkpoints
+ Dataset | Model file
+ -- | --
+ AGD20K-Seen | [checkpoint](https://drive.google.com/file/d/1cYC2PBEjhLntySyP51R46J7i8f1Cf1NT/view?usp=sharing)
+ AGD20K-Unseen | [checkpoint](https://drive.google.com/file/d/1YojVtXtl4gCiqDRDOpHn59vdIPSIIgdt/view?usp=sharing)
+ HICO-IIF | [checkpoint](https://drive.google.com/file/d/1fOIarlqETEpY7JrqUWjgzvHtwCzRfeGb/view?usp=sharing)