AtefAndrus committed · Commit 433e8bc · unverified · 0 parent(s)

Initial commit: Jmoji human evaluation app

- Gradio-based evaluation interface for emoji translation
- 20 samples with Gold/ModelA/ModelB comparison
- CommitScheduler integration for HF Spaces

Files changed (10)
  1. .gitignore +10 -0
  2. .mise.toml +2 -0
  3. .python-version +1 -0
  4. README.md +37 -0
  5. app.py +501 -0
  6. data/samples.jsonl +20 -0
  7. pyproject.toml +15 -0
  8. requirements.txt +2 -0
  9. responses/.gitkeep +0 -0
  10. uv.lock +0 -0
.gitignore ADDED
@@ -0,0 +1,10 @@
+ # Local development
+ .venv/
+ *.egg-info/
+
+ # Responses (managed by CommitScheduler on HF Spaces)
+ responses/*.jsonl
+
+ # Python
+ __pycache__/
+ *.pyc
.mise.toml ADDED
@@ -0,0 +1,2 @@
+ [tools]
+ python = "3.12"
.python-version ADDED
@@ -0,0 +1 @@
+ 3.12
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ title: Jmoji Human Evaluation
+ emoji: 📊
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: "4.44.0"
+ app_file: app.py
+ pinned: false
+ ---
+
+ # Jmoji Human Evaluation System
+
+ An application for human evaluation of Japanese text → emoji translation models.
+
+ ## Overview
+
+ This application evaluates the quality of emoji translations of Japanese text.
+
+ - **Evaluation targets**: teacher output (Gold), Model A (focal_top50), Model B (top50)
+ - **Evaluation criteria**:
+   - Semantic match (0-4)
+   - Naturalness (0-4)
+   - Potential for misunderstanding (Yes/No)
+   - Model comparison (A/B/equal)
+
+ ## Usage
+
+ 1. Review the displayed text and emoji outputs
+ 2. Select a rating for each criterion for each output
+ 3. Click the "Next" button to move to the next sample
+ 4. On the last sample, click the "Submit evaluation" button
+
+ ## Related links
+
+ - [Jmoji project](https://github.com/AtefAndrus/Jmoji)
+ - [Dataset](https://huggingface.co/datasets/AtefAndrus/jmoji-dataset)
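Reviewer note: the README lists the rating ranges, but nothing in this commit appears to check that an evaluator filled in every rating before moving on. The following is a minimal sketch of such a check, assuming the per-sample record layout that `collect_current_response` in app.py produces (`gold` / `model_a` / `model_b` dictionaries plus `preference`). The function name `missing_fields` and the constants are hypothetical and not part of the commit.

```python
from typing import Any

# Assumption: scores run from 0 to 4 inclusive, as stated in the README.
SCORE_RANGE = range(0, 5)
SYSTEMS = ("gold", "model_a", "model_b")


def missing_fields(response: dict[str, Any]) -> list[str]:
    """Return dotted names of ratings that are unset or out of range."""
    problems = []
    for system in SYSTEMS:
        ratings = response.get(system, {})
        for key in ("semantic", "naturalness"):
            if ratings.get(key) not in SCORE_RANGE:
                problems.append(f"{system}.{key}")
        # "misleading" is stored as True/False, or None when unanswered.
        if not isinstance(ratings.get("misleading"), bool):
            problems.append(f"{system}.misleading")
    if not response.get("preference"):
        problems.append("preference")
    return problems
```

A handler could call this before advancing and show the returned names in the status message; whether incomplete ratings should block navigation is a design decision left to the author.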
app.py ADDED
@@ -0,0 +1,501 @@
+ """Jmoji human evaluation app.
+
+ A Gradio application for human evaluation of Japanese text → emoji translation models.
+
+ Evaluation criteria:
+ - Semantic match (0-4): whether the emoji express the meaning of the text
+ - Naturalness (0-4): whether the usage looks like something seen on social media
+ - Potential for misunderstanding (Yes/No): whether it gives the opposite impression to the original sentence
+ - Model comparison (A/B/equal): which model output is better
+ """
+
+ import json
+ import uuid
+ from datetime import datetime
+ from pathlib import Path
+
+ import gradio as gr
+ from huggingface_hub import CommitScheduler
+
+ # ====== Constants ======
+ SAMPLES_PATH = Path("data/samples.jsonl")
+ RESPONSES_DIR = Path("responses")
+ RESPONSES_DIR.mkdir(exist_ok=True)
+
+ # ====== CommitScheduler setup ======
+ # Only active when deployed to a HuggingFace Space
+ try:
+     scheduler = CommitScheduler(
+         repo_id="AtefAndrus/jmoji-human-eval",
+         repo_type="space",
+         folder_path=RESPONSES_DIR,
+         path_in_repo="responses",
+         every=5,  # commit every 5 minutes
+     )
+     USE_SCHEDULER = True
+ except Exception:
+     # No scheduler when running locally
+     scheduler = None
+     USE_SCHEDULER = False
+
+
+ # ====== Sample loading ======
+ def load_samples() -> list[dict]:
+     """Load the evaluation samples."""
+     samples = []
+     if SAMPLES_PATH.exists():
+         with open(SAMPLES_PATH, encoding="utf-8") as f:
+             for line in f:
+                 if line.strip():
+                     samples.append(json.loads(line))
+     return samples
+
+
+ SAMPLES = load_samples()
+ TOTAL_SAMPLES = len(SAMPLES) if SAMPLES else 1
+
+
+ # ====== Helper functions ======
+ def get_evaluator_id() -> str:
+     """Generate an evaluator ID (unique per session)."""
+     return f"anon_{uuid.uuid4().hex[:8]}"
+
+
+ def save_response(evaluator_id: str, responses: dict) -> None:
+     """Save the evaluation results to a file."""
+     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+     filename = f"responses_{evaluator_id}_{timestamp}.jsonl"
+     filepath = RESPONSES_DIR / filename
+
+     if USE_SCHEDULER and scheduler:
+         with scheduler.lock:
+             with open(filepath, "w", encoding="utf-8") as f:
+                 for sample_id, response in responses.items():
+                     record = {
+                         "evaluator_id": evaluator_id,
+                         "sample_id": sample_id,
+                         **response,
+                         "timestamp": timestamp,
+                     }
+                     f.write(json.dumps(record, ensure_ascii=False) + "\n")
+     else:
+         with open(filepath, "w", encoding="utf-8") as f:
+             for sample_id, response in responses.items():
+                 record = {
+                     "evaluator_id": evaluator_id,
+                     "sample_id": sample_id,
+                     **response,
+                     "timestamp": timestamp,
+                 }
+                 f.write(json.dumps(record, ensure_ascii=False) + "\n")
+
+
+ # ====== UI construction ======
+ with gr.Blocks(
+     title="Jmoji Human Evaluation",
+     theme=gr.themes.Soft(),
+     css="""
+     .emoji-display { font-size: 1.5em; }
+     .section-header { margin-top: 1em; margin-bottom: 0.5em; }
+     """,
+ ) as demo:
+     # Session state
+     state = gr.State(
+         {
+             "evaluator_id": None,
+             "current_idx": 0,
+             "responses": {},
+         }
+     )
+
+     # Header
+     gr.Markdown("# Jmoji Human Evaluation System")
+     gr.Markdown(
+         "Please help us evaluate Japanese text → emoji translation models. "
+         "For each sample, rate the teacher output (Gold) and the two model outputs."
+     )
+
+     # Progress display
+     progress = gr.Markdown(f"**Progress: 1 / {TOTAL_SAMPLES}**")
+
+     # Input text display
+     gr.Markdown("## Input text", elem_classes=["section-header"])
+     input_text = gr.Textbox(
+         label="Text to evaluate",
+         interactive=False,
+         lines=2,
+     )
+
+     # Output display (3 columns)
+     gr.Markdown("## Output comparison", elem_classes=["section-header"])
+     with gr.Row():
+         with gr.Column():
+             gr.Markdown("### Teacher output (Gold)")
+             gold_output = gr.Textbox(
+                 label="Gold",
+                 interactive=False,
+                 elem_classes=["emoji-display"],
+             )
+         with gr.Column():
+             gr.Markdown("### Model A (focal_top50)")
+             model_a_output = gr.Textbox(
+                 label="Model A",
+                 interactive=False,
+                 elem_classes=["emoji-display"],
+             )
+         with gr.Column():
+             gr.Markdown("### Model B (top50)")
+             model_b_output = gr.Textbox(
+                 label="Model B",
+                 interactive=False,
+                 elem_classes=["emoji-display"],
+             )
+
+     # Evaluation section (3 columns)
+     gr.Markdown("## Evaluation", elem_classes=["section-header"])
+     gr.Markdown(
+         "Please rate each output on the following:"
+         "\n- **Semantic match**: whether the emoji express the meaning and nuance of the input text"
+         "\n- **Naturalness**: whether the usage looks like something you would see on real social media"
+         "\n- **Potential for misunderstanding**: whether it is likely to give the opposite impression to the intent of the original sentence"
+     )
+
+     with gr.Row():
+         # Gold rating
+         with gr.Column():
+             gr.Markdown("#### Gold rating")
+             gold_semantic = gr.Radio(
+                 choices=[0, 1, 2, 3, 4],
+                 label="Semantic match (0: unrelated → 4: very appropriate)",
+                 value=None,
+             )
+             gold_naturalness = gr.Radio(
+                 choices=[0, 1, 2, 3, 4],
+                 label="Naturalness (0: unnatural → 4: very natural)",
+                 value=None,
+             )
+             gold_misleading = gr.Radio(
+                 choices=["No", "Yes"],
+                 label="Potentially misleading",
+                 value=None,
+             )
+
+         # Model A rating
+         with gr.Column():
+             gr.Markdown("#### Model A rating")
+             model_a_semantic = gr.Radio(
+                 choices=[0, 1, 2, 3, 4],
+                 label="Semantic match (0: unrelated → 4: very appropriate)",
+                 value=None,
+             )
+             model_a_naturalness = gr.Radio(
+                 choices=[0, 1, 2, 3, 4],
+                 label="Naturalness (0: unnatural → 4: very natural)",
+                 value=None,
+             )
+             model_a_misleading = gr.Radio(
+                 choices=["No", "Yes"],
+                 label="Potentially misleading",
+                 value=None,
+             )
+
+         # Model B rating
+         with gr.Column():
+             gr.Markdown("#### Model B rating")
+             model_b_semantic = gr.Radio(
+                 choices=[0, 1, 2, 3, 4],
+                 label="Semantic match (0: unrelated → 4: very appropriate)",
+                 value=None,
+             )
+             model_b_naturalness = gr.Radio(
+                 choices=[0, 1, 2, 3, 4],
+                 label="Naturalness (0: unnatural → 4: very natural)",
+                 value=None,
+             )
+             model_b_misleading = gr.Radio(
+                 choices=["No", "Yes"],
+                 label="Potentially misleading",
+                 value=None,
+             )
+
+     # Model comparison
+     gr.Markdown("## Model comparison", elem_classes=["section-header"])
+     preference = gr.Radio(
+         choices=["A (focal_top50)", "B (top50)", "Equal"],
+         label="Which model output is better?",
+         value=None,
+     )
+     comment = gr.Textbox(
+         label="Comment (optional)",
+         placeholder="Please note anything you noticed",
+         lines=2,
+     )
+
+     # Navigation
+     with gr.Row():
+         prev_btn = gr.Button("◀ Previous", variant="secondary")
+         next_btn = gr.Button("Next ▶", variant="primary")
+
+     # Submit button
+     submit_btn = gr.Button("Submit evaluation", variant="primary", visible=False)
+
+     # Status display
+     status_msg = gr.Markdown("")
+
+     # ====== Event handlers ======
+     def init_session(state_dict: dict) -> tuple:
+         """Initialize the session."""
+         if state_dict["evaluator_id"] is None:
+             state_dict["evaluator_id"] = get_evaluator_id()
+
+         if not SAMPLES:
+             return (
+                 state_dict,
+                 "**Error: no samples found**",
+                 "No samples available",
+                 "-",
+                 "-",
+                 "-",
+                 gr.update(visible=False),
+                 "",
+             )
+
+         sample = SAMPLES[0]
+         return (
+             state_dict,
+             f"**Progress: 1 / {TOTAL_SAMPLES}**",
+             sample["text"],
+             sample["gold"],
+             sample.get("pred_focal_top50", "-"),
+             sample.get("pred_top50", "-"),
+             gr.update(visible=(TOTAL_SAMPLES == 1)),
+             "",
+         )
+
+     def collect_current_response(
+         gold_sem,
+         gold_nat,
+         gold_mis,
+         model_a_sem,
+         model_a_nat,
+         model_a_mis,
+         model_b_sem,
+         model_b_nat,
+         model_b_mis,
+         pref,
+         cmt,
+     ) -> dict:
+         """Collect the current ratings into a dictionary."""
+         return {
+             "gold": {
+                 "semantic": gold_sem,
+                 "naturalness": gold_nat,
+                 "misleading": gold_mis == "Yes" if gold_mis else None,
+             },
+             "model_a": {
+                 "semantic": model_a_sem,
+                 "naturalness": model_a_nat,
+                 "misleading": model_a_mis == "Yes" if model_a_mis else None,
+             },
+             "model_b": {
+                 "semantic": model_b_sem,
+                 "naturalness": model_b_nat,
+                 "misleading": model_b_mis == "Yes" if model_b_mis else None,
+             },
+             "preference": pref,
+             "comment": cmt,
+         }
+
+     def restore_response(response: dict) -> tuple:
+         """Restore previously saved ratings."""
+         gold = response.get("gold", {})
+         model_a = response.get("model_a", {})
+         model_b = response.get("model_b", {})
+
+         def to_misleading_str(val):
+             if val is True:
+                 return "Yes"
+             elif val is False:
+                 return "No"
+             return None
+
+         return (
+             gold.get("semantic"),
+             gold.get("naturalness"),
+             to_misleading_str(gold.get("misleading")),
+             model_a.get("semantic"),
+             model_a.get("naturalness"),
+             to_misleading_str(model_a.get("misleading")),
+             model_b.get("semantic"),
+             model_b.get("naturalness"),
+             to_misleading_str(model_b.get("misleading")),
+             response.get("preference"),
+             response.get("comment", ""),
+         )
+
+     def navigate(
+         direction: int,
+         state_dict: dict,
+         gold_sem,
+         gold_nat,
+         gold_mis,
+         model_a_sem,
+         model_a_nat,
+         model_a_mis,
+         model_b_sem,
+         model_b_nat,
+         model_b_mis,
+         pref,
+         cmt,
+     ) -> tuple:
+         """Move between pages (previous/next)."""
+         current_idx = state_dict["current_idx"]
+
+         # Save the current ratings
+         sample_id = SAMPLES[current_idx]["id"]
+         state_dict["responses"][sample_id] = collect_current_response(
+             gold_sem,
+             gold_nat,
+             gold_mis,
+             model_a_sem,
+             model_a_nat,
+             model_a_mis,
+             model_b_sem,
+             model_b_nat,
+             model_b_mis,
+             pref,
+             cmt,
+         )
+
+         # Update the index
+         new_idx = max(0, min(current_idx + direction, TOTAL_SAMPLES - 1))
+         state_dict["current_idx"] = new_idx
+
+         # New sample
+         sample = SAMPLES[new_idx]
+         new_sample_id = sample["id"]
+
+         # Restore existing ratings
+         existing = state_dict["responses"].get(new_sample_id, {})
+         restored = restore_response(existing)
+
+         is_last = new_idx == TOTAL_SAMPLES - 1
+
+         return (
+             state_dict,
+             f"**Progress: {new_idx + 1} / {TOTAL_SAMPLES}**",
+             sample["text"],
+             sample["gold"],
+             sample.get("pred_focal_top50", "-"),
+             sample.get("pred_top50", "-"),
+             *restored,
+             gr.update(visible=is_last),
+             "",
+         )
+
+     def submit_all(
+         state_dict: dict,
+         gold_sem,
+         gold_nat,
+         gold_mis,
+         model_a_sem,
+         model_a_nat,
+         model_a_mis,
+         model_b_sem,
+         model_b_nat,
+         model_b_mis,
+         pref,
+         cmt,
+     ) -> str:
+         """Submit all ratings."""
+         current_idx = state_dict["current_idx"]
+
+         # Save the last sample
+         sample_id = SAMPLES[current_idx]["id"]
+         state_dict["responses"][sample_id] = collect_current_response(
+             gold_sem,
+             gold_nat,
+             gold_mis,
+             model_a_sem,
+             model_a_nat,
+             model_a_mis,
+             model_b_sem,
+             model_b_nat,
+             model_b_mis,
+             pref,
+             cmt,
+         )
+
+         # Save to file
+         evaluator_id = state_dict["evaluator_id"]
+         save_response(evaluator_id, state_dict["responses"])
+
+         return (
+             f"## Evaluation complete\n\n"
+             f"Thank you for your cooperation.\n\n"
+             f"- Evaluator ID: `{evaluator_id}`\n"
+             f"- Number of samples rated: {len(state_dict['responses'])}"
+         )
+
+     # Event wiring
+     demo.load(
+         init_session,
+         inputs=[state],
+         outputs=[
+             state,
+             progress,
+             input_text,
+             gold_output,
+             model_a_output,
+             model_b_output,
+             submit_btn,
+             status_msg,
+         ],
+     )
+
+     eval_inputs = [
+         gold_semantic,
+         gold_naturalness,
+         gold_misleading,
+         model_a_semantic,
+         model_a_naturalness,
+         model_a_misleading,
+         model_b_semantic,
+         model_b_naturalness,
+         model_b_misleading,
+         preference,
+         comment,
+     ]
+
+     all_outputs = [
+         state,
+         progress,
+         input_text,
+         gold_output,
+         model_a_output,
+         model_b_output,
+         *eval_inputs,
+         submit_btn,
+         status_msg,
+     ]
+
+     prev_btn.click(
+         lambda *args: navigate(-1, *args),
+         inputs=[state, *eval_inputs],
+         outputs=all_outputs,
+     )
+
+     next_btn.click(
+         lambda *args: navigate(1, *args),
+         inputs=[state, *eval_inputs],
+         outputs=all_outputs,
+     )
+
+     submit_btn.click(
+         submit_all,
+         inputs=[state, *eval_inputs],
+         outputs=[status_msg],
+     )
+
+ if __name__ == "__main__":
+     demo.launch()
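Reviewer note: app.py writes one JSON line per rated sample into `responses/responses_<evaluator_id>_<timestamp>.jsonl`, but the commit does not include anything that reads those files back. The sketch below shows one way the saved records might be aggregated, assuming the record layout written by `save_response`. The helper names `load_records` and `summarize` are hypothetical and not part of the commit.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path

SYSTEMS = ("gold", "model_a", "model_b")


def load_records(responses_dir: str) -> list[dict]:
    """Read every responses_*.jsonl file in the directory into one list."""
    records = []
    for path in sorted(Path(responses_dir).glob("responses_*.jsonl")):
        with open(path, encoding="utf-8") as f:
            records.extend(json.loads(line) for line in f if line.strip())
    return records


def summarize(records: list[dict]) -> tuple[dict, Counter]:
    """Return mean scores per (system, criterion) and preference counts."""
    scores = defaultdict(list)
    preferences = Counter()
    for record in records:
        for system in SYSTEMS:
            for key in ("semantic", "naturalness"):
                value = record.get(system, {}).get(key)
                if value is not None:  # skip ratings left unanswered
                    scores[(system, key)].append(value)
        if record.get("preference"):
            preferences[record["preference"]] += 1
    means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    return means, preferences
```

Calling `summarize(load_records("responses"))` would give per-system means and a tally of the A/B/equal choices. Unanswered ratings are skipped rather than counted as zero, which is one possible convention among several.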
data/samples.jsonl ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {"id": 1, "text": "ๅคง็ดซใฃใฆใฎใฏๅฃฌ็”ณใฎๅŠŸ่‡ฃใฎไธญใงใ‚‚ใ‘ใฃใ“ใ†ไธŠใฎใ‚ฏใƒฉใ‚นใชใ‚“ใ ใ‘ใฉใ€ใ€Žๆ›ธ็ด€ใ€ใฎๅฃฌ็”ณใฎไนฑใฎใจใ“่ฆ‹ใฆใ‚‚ๆ˜Ÿๅท้บปๅ‘‚ใฎๅๅ‰ๅ‡บใฆใ“ใชใ„ใ‹ใ‚‰ใ€็ตๅฑ€ใฉใ‚“ใชๆดป่บใ—ใŸใฎใ‹ใฏใ‚ˆใใ‚ใ‹ใ‚“ใชใ„ใ‚“ใ ใ‚ˆใชใ€œใ€‚", "gold": "๐Ÿค” ๐Ÿ“š ๐Ÿ“– ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ”", "pred_focal_top50": "๐Ÿ“– ๐Ÿ˜Š ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ“š ๐Ÿ‘", "pred_top50": "๐Ÿ“– ๐Ÿ“š ๐Ÿ“š ๐Ÿ‘ ๐Ÿ“š", "jaccard_focal_top50": 0.42857142857142855, "jaccard_top50": 0.3333333333333333}
+ {"id": 2, "text": "ไธ‰ๆœจๆ”น้€ ๅ†…้–ฃใฎๆ™‚ใซใ€่‡ชๆฐ‘ๅ…šใฎๅ…šไธ‰ๅฝน๏ผˆๅนนไบ‹้•ทใ€ๆ”ฟ่ชฟไผš้•ทใ€็ทๅ‹™ไผš้•ท๏ผ‰ใฃใฆใ€ไธปๆตๆดพใ˜ใ‚ƒใชใ„ใ€Œไธ‰ๆœจใŠใ‚ใ—ใ€ใฎไธญๅฟƒใ ใฃใŸๆŒ™ๅ…šๅ”ใซใฏๅฑžใ—ใฆใชใ„ไบบใ‚’้–ฃๅƒšใซๆŠœๆ“ขใ—ใŸใ‚“ใ ใฃใฆใ€‚", "gold": "๐Ÿค” ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ›๏ธ ๐Ÿ’ฌ ๐Ÿค", "pred_focal_top50": "๐Ÿ‘ ๐Ÿ‘ ๐Ÿ“– ๐Ÿ‘ ๐Ÿ‘", "pred_top50": "๐ŸŽ‰ ๐Ÿ”ฅ ๐Ÿ“š ๐Ÿ˜Š ๐Ÿ˜Š", "jaccard_focal_top50": 0.0, "jaccard_top50": 0.0}
+ {"id": 3, "text": "ๆธฏๅŒบใฎๅคงไผšใงไธŠไฝๅ…ฅ่ณžใ—ใŸใ‚Šใ—ใฆใ€็ตๆง‹ใ‚ขใƒ”ใƒผใƒซใ—ใŸใ‚ใ€œ", "gold": "๐Ÿ† ๐ŸŽ‰ ๐Ÿ‘ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ’ช", "pred_focal_top50": "๐Ÿ“บ ๐Ÿ˜Š ๐Ÿ˜Š ๐Ÿ˜Š ๐Ÿ˜Š", "pred_top50": "๐ŸŽ‰ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ˜Š ๐Ÿ‡ฏ๐Ÿ‡ต", "jaccard_focal_top50": 0.0, "jaccard_top50": 0.3333333333333333}
+ {"id": 4, "text": "็‰‡ๅฑฑใ•ใ‚“ๆœฌไบบใ‚‚ใ€NR500ใ‹ใ‚‰10ๅนดใŸใฃใฆใ“ใ‚“ใช้ขจใซ่จ€ใฃใฆใ‚‹ใ‚“ใ ใ‚ˆใญใ€‚", "gold": "๐Ÿ˜Š ๐Ÿค” ๐Ÿ’ฌ", "pred_focal_top50": "๐Ÿ“š ๐Ÿ˜Š ๐Ÿ˜Š ๐Ÿ“š ๐Ÿ“š", "pred_top50": "๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ‘ ๐ŸŽ“ ๐Ÿ“š ๐ŸŽ‰", "jaccard_focal_top50": 0.25, "jaccard_top50": 0.0}
+ {"id": 5, "text": "ใƒˆใƒผใ‚ฏใฎ้ƒจๅˆ†ใ ใ‘ใฉใ€ใ€Œใƒใƒƒใƒ—ใ‚ธใƒฃใƒ  ็ฌฌ8ใ‚ทใƒผใ‚บใƒณใ€ใจๅŒใ˜ใงใ€NHKใฎใ‚นใ‚ฟใ‚ธใ‚ชใงๅˆฅใ€…ใซๅŽ้Œฒใ—ใฆใŸใ‚“ใ ใฃใฆใ€‚", "gold": "๐Ÿ“บ ๐ŸŽค ๐ŸŽต ๐Ÿ“š ๐Ÿ˜Š", "pred_focal_top50": "๐Ÿ“š ๐Ÿ‘ ๐Ÿ‡ฏ๐Ÿ‡ต ๐ŸŽ‰ ๐Ÿ‡ฏ๐Ÿ‡ต", "pred_top50": "๐Ÿ“š ๐Ÿ‡ฏ๐Ÿ‡ต ๐ŸŽ‰ ๐ŸŽ“ ๐Ÿ“š", "jaccard_focal_top50": 0.125, "jaccard_top50": 0.125}
+ {"id": 6, "text": "ใกใชใฟใซใ€ใใฎๅญฆๆ กใฏใƒŸใƒใ‚ฝใ‚ฟๅทž็ซ‹ๅคงๅญฆใŒ็ง‹็”ฐใซใ‚ญใƒฃใƒณใƒ‘ใ‚นไฝœใ‚‹ใจใใฎไธปใชๅฝนๅ‰ฒใ‚’ๆ‹…ใฃใฆใŸใ‚“ใ ใ€‚ไปŠใ˜ใ‚ƒใใฎ่ทกๅœฐใซๅ›ฝ้š›ๆ•™้คŠๅคงๅญฆใŒ็ซ‹ใฃใฆใฆใ€็ง‹็”ฐๅคงๅญฆใจใ‚‚้€ฃๆบใ—ใฆใ‚‹ใ‚ˆใ€‚", "gold": "๐ŸŽ“ ๐Ÿซ ๐ŸŒ ๐Ÿค ๐Ÿ“š", "pred_focal_top50": "๐Ÿ‘ ๐Ÿ‘ ๐ŸŽ‰ ๐Ÿ˜Š ๐Ÿ“š", "pred_top50": "๐Ÿ˜Š ๐ŸŽ‰ ๐Ÿ“š ๐Ÿ“– ๐ŸŽ‰", "jaccard_focal_top50": 0.125, "jaccard_top50": 0.125}
+ {"id": 7, "text": "1ใ€œ3ๅทปใŒๅ€ซๅญ็ทจใงใ€4ใ€œ6ๅทปใŒ้บป่กฃ็ทจใฃใฆๆ„Ÿใ˜๏ผ", "gold": "๐Ÿ“– ๐Ÿ“š โœจ ๐ŸŽ‰ ๐Ÿ‘", "pred_focal_top50": "๐Ÿ“š ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ˜Š ๐ŸŽ‰ ๐Ÿ˜Š", "pred_top50": "๐ŸŽ“ ๐Ÿ‡ฏ๐Ÿ‡ต ๐ŸŽ‰ ๐Ÿ”ฅ ๐Ÿ˜Š", "jaccard_focal_top50": 0.2857142857142857, "jaccard_top50": 0.1111111111111111}
+ {"id": 8, "text": "ใ“ใ‚Œใ€ๅ—่ณž็†็”ฑใ‹ใ‚‰ใฎๅผ•็”จใชใ‚“ใ ใฃใฆ๏ฝž", "gold": "๐ŸŽ‰ ๐Ÿ“š โœจ ๐Ÿค” ๐Ÿ’ก", "pred_focal_top50": "๐Ÿ‘ ๐Ÿ“š ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ˜Š", "pred_top50": "๐Ÿ“š ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ“š ๐ŸŽ‰ ๐ŸŽ‰", "jaccard_focal_top50": 0.125, "jaccard_top50": 0.3333333333333333}
+ {"id": 9, "text": "2013ๅนดใฎใ‚ชใƒ•ใ‚ฃใ‚ทใƒฃใƒซใ‚ฌใ‚คใƒ‰ใƒ–ใƒƒใ‚ฏใงใ€ๅฅฝใใชใ‚ฟใƒฌใƒณใƒˆใจใ—ใฆๅคงๆณ‰ๆด‹ใฃใฆๆ›ธใ„ใฆใ‚ใ‚‹ใ‚“ใ ใ‚ˆใญใ€‚", "gold": "๐Ÿ˜Š ๐Ÿ“š ๐ŸŒŸ ๐Ÿ“บ ๐Ÿ‡ฏ๐Ÿ‡ต", "pred_focal_top50": "๐ŸŽ“ ๐Ÿ“š ๐Ÿ˜Š ๐Ÿ“š ๐Ÿ˜Š", "pred_top50": "๐Ÿ“š ๐Ÿ‘ ๐Ÿ˜Š ๐Ÿ“– ๐ŸŽ‰", "jaccard_focal_top50": 0.3333333333333333, "jaccard_top50": 0.25}
+ {"id": 10, "text": "ๅ„็ง‘ๅ€™่ฃœ็”Ÿใฃใฆใฎใฏใ€ๆตทๅค–ใฎๅญฆๆ กใงใใ‚Œใชใ‚Šใฎใ‚ณใƒผใ‚นใ‚’ๅ’ๆฅญใ—ใฆใ€ๆŽก็”จ่ฉฆ้จ“ใ‚‚ใƒ‘ใ‚นใ—ใฆใ€ๆตท่ปๆญฆๅฎ˜ใฎไปป็”จๅง”ๅ“กใฎๅฏฉๆŸปใ‚‚้€šใฃใŸไบบใฎใ“ใจใ ใ‚ˆใ€œ", "gold": "๐ŸŽ“ ๐Ÿ“š ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ’ช ๐Ÿ˜Š", "pred_focal_top50": "๐Ÿ“š ๐ŸŽ‰ ๐Ÿ‘ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ˜Š", "pred_top50": "๐ŸŽ‰ ๐Ÿ“š ๐Ÿ˜Š ๐Ÿ“š ๐Ÿ“š", "jaccard_focal_top50": 0.42857142857142855, "jaccard_top50": 0.3333333333333333}
+ {"id": 11, "text": "่‡ชๆ€งๅˆ†ๅˆฅใฃใฆใฎใฏใ€ๅฐ‹ใจไผบใฎใ“ใจใงใ€ๅˆ†ๅˆฅใฎไธ€็จฎใฃใฆใ“ใจใญใ€‚", "gold": "๐Ÿค” ๐Ÿง  ๐Ÿ“– โœจ ๐Ÿ”", "pred_focal_top50": "๐Ÿ˜Š ๐Ÿ˜Š ๐Ÿ˜Š ๐Ÿ˜Š ๐Ÿ˜Š", "pred_top50": "๐Ÿ“– ๐Ÿ˜Š ๐ŸŽ‰ ๐Ÿ’ผ ๐Ÿ‡ฏ๐Ÿ‡ต", "jaccard_focal_top50": 0.0, "jaccard_top50": 0.1111111111111111}
+ {"id": 12, "text": "ๆญดๅฒใ‚ใ‚‹ๆ—งใƒปไฝ่ณ€็œŒ็ซ‹ๅคšไน…ๅทฅๆฅญ้ซ˜ๆ กใ€‚ \n1962ๅนด๏ผˆๆ˜ญๅ’Œ37ๅนด๏ผ‰7ๆœˆ23ๆ—ฅใ€ไฝ่ณ€็œŒๆ•™ๅง”ใงใ€็ฟŒ1963ๅนด๏ผˆๆ˜ญๅ’Œ38ๅนด๏ผ‰4ๆœˆใซใ€Œไฝ่ณ€็œŒ็ซ‹ๅคšไน…้ซ˜ๆ ก๏ผˆไปฎ็งฐ๏ผ‰ใ€ใ‚’่จญ็ซ‹ใ™ใ‚‹ใฃใฆ่ฉฑใŒๆฑบใพใฃใŸใ‚“ใ ใฃใฆ๏ผ", "gold": "๐Ÿ›๏ธ ๐Ÿ“š ๐Ÿ“– ๐Ÿ‡ฏ๐Ÿ‡ต ๐ŸŽ“", "pred_focal_top50": "๐Ÿ“š ๐Ÿ˜Š ๐ŸŽ‰ ๐Ÿ“š ๐ŸŽ‰", "pred_top50": "๐Ÿ˜Š ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ“š ๐Ÿ˜Š ๐Ÿ“š", "jaccard_focal_top50": 0.14285714285714285, "jaccard_top50": 0.3333333333333333}
+ {"id": 13, "text": "12ๆœˆ1ๆ—ฅ - ้›ปๆฐ—ไบ‹ๆฅญๆณ•ใฎไธปไปปๆŠ€่ก“่€…ใฎ่ณ‡ๆ ผใจใ‹ใซ้–ขใ™ใ‚‹็œไปคใงใ€็ฌฌ1ๆก็ฌฌ12้ …ใฎๅญฆๆ ก่ชๅฎšใ‚’ๅ—ใ‘ใŸใ‚ˆใ‚“๏ผˆ1975ๅนด5ๆœˆๆ”น่จ‚๏ผ‰ใ€‚", "gold": "๐ŸŽ“ ๐Ÿ“š ๐Ÿ›๏ธ ๐Ÿ“– ๐Ÿ’ผ", "pred_focal_top50": "๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ“š ๐ŸŽ‰ ๐Ÿ“š ๐ŸŽ‰", "pred_top50": "๐Ÿ“– ๐Ÿ˜Š ๐Ÿ‘ ๐Ÿ“š ๐Ÿ˜Š", "jaccard_focal_top50": 0.14285714285714285, "jaccard_top50": 0.2857142857142857}
+ {"id": 14, "text": "ใ€ŒใƒŸใ‚จใƒŠใ‚คใƒใ‚ซใƒฉ ใ€œINVISIBLE ONEใ€œใ€ใฏใ€ไฟบๅˆใฎใ‚ขใƒ‹ใƒกใ‚ฟใ‚คใ‚ขใƒƒใƒ—ๆ›ฒใชใ‚“ใ ใ‚ˆใช๏ผ", "gold": "๐ŸŽต ๐ŸŽ‰ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ”ฅ ๐Ÿ’ช", "pred_focal_top50": "๐Ÿ“– ๐ŸŽ‰ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ˜Š", "pred_top50": "๏ฟฝ๏ฟฝ ๐ŸŽค ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ˜Š ๐Ÿ“š", "jaccard_focal_top50": 0.2857142857142857, "jaccard_top50": 0.1111111111111111}
+ {"id": 15, "text": "ใ€Žๆฒˆ้ป™ใ€ใฃใฆใ€้ ่—คๅ‘จไฝœใŒ17ไธ–็ด€ใฎๆ—ฅๆœฌใฎๅฒๅฎŸใจใ‹ๆญดๅฒๆ–‡ๆ›ธใ‚’ใ‚‚ใจใซๆ›ธใ„ใŸๆญดๅฒๅฐ่ชฌใชใ‚“ใ ใฃใฆ๏ฝžใ€‚", "gold": "๐Ÿ“– ๐Ÿฏ ๐Ÿ“œ ๐Ÿ˜ฎ ๐Ÿค”", "pred_focal_top50": "๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ˜Š ๐ŸŽ“ ๐Ÿ˜Š ๐ŸŽ‰", "pred_top50": "๐Ÿ˜Š ๐ŸŽ‰ ๐Ÿ˜Š ๐Ÿ˜Š ๐Ÿ˜Š", "jaccard_focal_top50": 0.0, "jaccard_top50": 0.0}
+ {"id": 16, "text": "ๅˆใฎใ‚ชใƒผใƒซใ‚นใ‚ฟใƒผใ‚ฒใƒผใƒ ๅ‡บๅ ดใงใ€7ๆœˆ11ๆ—ฅใฎ็ฌฌ1ๆˆฆ๏ผˆใ‚ญใƒฃใƒณใƒ‰ใƒซใ‚นใƒ†ใ‚ฃใƒƒใ‚ฏ๏ผ‰ใฎ8ๅ›žใซไปฃๆ‰“ใง็™ปๅ ดใ€‚ใƒžใ‚คใ‚ฏใƒปใƒ•ใ‚ฉใƒผใƒ‹ใƒฌใ‚นใ‹ใ‚‰ใ„ใใชใ‚Šๅˆๆ‰“ๅธญๆœฌๅกๆ‰“๏ผ", "gold": "โšพ ๐ŸŽ‰ ๐Ÿ‘ ๐Ÿ”ฅ ๐Ÿ’ฅ", "pred_focal_top50": "๐Ÿ“š ๐Ÿ“š ๐Ÿ˜Š ๐Ÿ“– ๐Ÿ“š", "pred_top50": "๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ“– ๐ŸŽ‰ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ˜Š", "jaccard_focal_top50": 0.0, "jaccard_top50": 0.125}
+ {"id": 17, "text": "1987ๅนดใซTom LaStrangeใŒไฝœใฃใŸใ‚“ใ ใฃใฆใ€‚", "gold": "๐Ÿค” ๐Ÿ’ก ๐Ÿ“š", "pred_focal_top50": "๐Ÿ“š ๐Ÿ˜Š ๐ŸŽ‰ ๐ŸŽ‰ ๐Ÿ“š", "pred_top50": "๐Ÿ“š ๐Ÿ‘ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ˜Š ๐Ÿ‘", "jaccard_focal_top50": 0.2, "jaccard_top50": 0.16666666666666666}
+ {"id": 18, "text": "่ฟ‘ไปฃใซใชใฃใฆใ€ๅคงๆญฃๅคฉ็š‡ใฎ็—…ๆฐ—ใŒๆฒปใฃใŸใŠ็คผใงๅปŸใŒๅปบใฆใ‚‰ใ‚ŒใŸใ‚“ใ ใฃใฆใ€‚", "gold": "๐Ÿฏ ๐Ÿ™ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ“– โœจ", "pred_focal_top50": "๐Ÿ“š ๐ŸŽ‰ ๐ŸŽ‰ ๐Ÿ‘ ๐Ÿ“š", "pred_top50": "๐Ÿ‘ ๐Ÿ“š ๐Ÿ˜Š ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ‡ฏ๐Ÿ‡ต", "jaccard_focal_top50": 0.0, "jaccard_top50": 0.125}
+ {"id": 19, "text": "ใƒ’ใƒซใƒ‡ใซใ‚ˆใ‚‹ใจใ€ใ€Œๆ˜”ใฏใ‚ใฃใกใ‚ƒๅ„ชใ—ใ‹ใฃใŸใ€ใ‚“ใ ใฃใฆใ€‚", "gold": "๐Ÿ˜Š ๐Ÿค” ๐Ÿ’ฌ", "pred_focal_top50": "๐Ÿ“š ๐Ÿ˜Š ๐Ÿ“š ๐ŸŽ‰ ๐ŸŽ‰", "pred_top50": "๐Ÿ˜Š ๐Ÿ“š ๐Ÿ“š ๐Ÿ˜Š ๐Ÿ‡ฏ๐Ÿ‡ต", "jaccard_focal_top50": 0.2, "jaccard_top50": 0.2}
+ {"id": 20, "text": "ใƒกใƒญใ‚จใซๅ–ใฃใฆไปฃใ‚ใฃใŸใƒŒใƒ“ใ‚ข็Ž‹ๅ›ฝใฎ่ตทๆบใฃใฆใ€ๅฎŸใฏใ‚ใ‚“ใพใ‚Šใฏใฃใใ‚Šใ—ใฆใชใ„ใ‚“ใ ใ‚ˆใญใ€‚", "gold": "๐Ÿ›๏ธ ๐Ÿ“– ๐ŸŒ ๐Ÿค” ๐Ÿ”", "pred_focal_top50": "๐Ÿ˜Š ๐Ÿ˜Š ๐Ÿ˜Š ๐Ÿ“š ๐Ÿ“–", "pred_top50": "๐Ÿ˜Š ๐Ÿ”ฅ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ“š ๐Ÿ˜Š", "jaccard_focal_top50": 0.14285714285714285, "jaccard_top50": 0.0}
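Reviewer note: each sample carries `jaccard_focal_top50` and `jaccard_top50` fields, but the commit does not say how they were computed. The values checked by hand (samples 1, 2 and 4) are consistent with a Jaccard index over the sets of distinct space-separated emoji in `gold` and in the prediction. The sketch below assumes that definition; the function name `emoji_jaccard` is hypothetical.

```python
def emoji_jaccard(gold: str, pred: str) -> float:
    """Jaccard index between the sets of space-separated tokens."""
    gold_set = set(gold.split())
    pred_set = set(pred.split())
    union = gold_set | pred_set
    if not union:
        # Both strings empty: define the similarity as 0.0 to avoid 0/0.
        return 0.0
    return len(gold_set & pred_set) / len(union)


# Three gold tokens, two distinct predicted tokens, one shared:
# |intersection| = 1, |union| = 4, so the score is 0.25.
score = emoji_jaccard("😊 🤔 💬", "📚 😊 😊 📚 📚")
```

Because duplicates collapse in the set, a prediction that repeats one emoji five times scores the same as predicting it once, which may be worth keeping in mind when comparing these values with the human ratings.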
pyproject.toml ADDED
@@ -0,0 +1,15 @@
+ [project]
+ name = "jmoji-human-eval"
+ version = "0.1.0"
+ requires-python = ">=3.12"
+ dependencies = [
+     "gradio>=4.0.0",
+     "huggingface_hub>=0.20.0",
+ ]
+
+ [build-system]
+ requires = ["setuptools>=61.0"]
+ build-backend = "setuptools.build_meta"
+
+ [tool.setuptools]
+ packages = []  # No packages (script only)
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ gradio>=4.0.0
+ huggingface_hub>=0.20.0
responses/.gitkeep ADDED
File without changes
uv.lock ADDED
The diff for this file is too large to render. See raw diff