Commit 8e7dafa (verified) by metchee · parent: 94c7f0a

Update README.md

Files changed (1): README.md (+53 −8)
model-index:
  results: []
---
# Sticker Query Generator (中文)

The **Sticker Query Generator** is a vision-language model, fine-tuned from [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) on the [StickerQueries (ZH)](https://huggingface.co/datasets/metchee/sticker-queries) dataset, that generates culturally and emotionally resonant search queries for a given sticker image. These queries are typically used in chat apps to retrieve and recommend stickers during conversations.

For English, check out the [Sticker Query Generator (English)](https://huggingface.co/metchee/sticker-query-generator-en).
## 🧠 What It Does

Given a sticker image (e.g., a cartoon character shrugging, laughing, or making a gesture), the model outputs **search queries** that people might use to find or express the intent behind that sticker, such as:

- "whatever"
- "ugh not again"
- "mood"
- "shrug emoji"

It captures subtle **social**, **emotional**, and **contextual** cues that traditional vision-language models often fail to represent due to a lack of cultural grounding.
## 🔍 Use Cases

- Improving sticker search and retrieval in chat apps
- Enhancing semantic understanding in multimodal recommendation systems
- Cultural and emotional alignment in vision-language modeling
- Dataset pre-labeling or enrichment
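The first use case can be made concrete: index each sticker under the queries the model generates for it, then match an incoming chat message against that index. The sketch below uses hypothetical sticker filenames and model outputs, and simple token overlap as the ranking signal; a production system would more likely use embeddings.

```python
from collections import defaultdict

def build_index(sticker_queries):
    """Map each query token to the stickers whose generated queries contain it."""
    index = defaultdict(set)
    for sticker, queries in sticker_queries.items():
        for query in queries:
            for token in query.lower().split():
                index[token].add(sticker)
    return index

def search(index, text):
    """Rank stickers by how many tokens of the user's message they match."""
    scores = defaultdict(int)
    for token in text.lower().split():
        for sticker in index.get(token, ()):
            scores[sticker] += 1
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical model outputs for three stickers
sticker_queries = {
    "shrug.png": ["whatever", "shrug emoji", "mood"],
    "angry.png": ["ugh not again", "so annoyed"],
    "lol.png": ["haha", "laughing so hard"],
}
index = build_index(sticker_queries)
print(search(index, "ugh whatever"))  # both stickers with a matching token are returned
```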
## 🗂 Dataset

This model was trained on [StickerQueries](https://huggingface.co/datasets/metchee/sticker-queries), a multilingual dataset of over **60 hours** of human-annotated sticker-query pairs in **English** and **Chinese**. Each annotation was reviewed by at least **two people** to ensure quality and consistency.
## 🚀 Inference

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

# Load the processor and model
processor = AutoProcessor.from_pretrained("metchee/sticker-query-generator-zh")
model = AutoModelForVision2Seq.from_pretrained("metchee/sticker-query-generator-zh")

# LLaVA-style models expect a text prompt alongside the image;
# this template is illustrative — match the format used during fine-tuning.
prompt = "USER: <image>\nGenerate search queries for this sticker. ASSISTANT:"

# Run inference
image = Image.open("sticker.png")
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
query = processor.decode(output[0], skip_special_tokens=True)
print(query)
```
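The decoded text may echo the prompt and contain several queries in one string. A small post-processing step can split it into a clean list; this is a sketch that assumes queries are separated by newlines or by ASCII/CJK commas and semicolons, which may not match the model's actual output format.

```python
import re

def parse_queries(generated: str) -> list[str]:
    """Split decoded model output into individual, de-duplicated queries."""
    # Drop an echoed prompt, if present (LLaVA-style outputs repeat it before "ASSISTANT:")
    if "ASSISTANT:" in generated:
        generated = generated.split("ASSISTANT:", 1)[1]
    # Assumed separators: newline, comma/semicolon (ASCII and fullwidth), 、
    parts = re.split(r"[,\n,、;;]+", generated)
    queries = []
    for part in parts:
        q = part.strip().strip('"“”')
        if q and q not in queries:
            queries.append(q)
    return queries

print(parse_queries('USER: ...\nASSISTANT: whatever, "shrug emoji", mood'))
# ['whatever', 'shrug emoji', 'mood']
```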
 
## Training procedure

The following hyperparameters were used during training:

- num_epochs: 4.0

### Training results
### Framework versions

- PEFT 0.15.2
- Transformers 4.52.1
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
### Citations

```bibtex
@misc{huggingface-sticker-queries,
  author       = {Heng Er Metilda Chee and others},
  title        = {Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified Approach},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/metchee/sticker-queries}},
}
```