kevinkyi committed
Commit feb825a · verified · 1 parent: 6c24a87

Add Method Card

Files changed (1)
  1. README.md +53 -0
README.md ADDED
@@ -0,0 +1,53 @@
+ # Method Card: Football Sentiment Prompting (0/1/5-shot)
+
+ ## TL;DR
+ We compare zero-shot, adaptive one-shot, and adaptive 5-shot prompting for binary sentiment classification of football news.
+ All three use the same train/val/test splits used for fine-tuning; we report metrics and confusion matrices (CMs) and discuss quality, latency, and cost.
+
+ ## Data
+ - Dataset: `james-kramer/football_news` (Hugging Face)
+ - Task: binary sentiment (0 = negative, 1 = positive)
+ - Splits: stratified 80/10/10 (train/val/test)
+ - Cleaning: strip whitespace from text; drop empty/NA rows
+
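The cleaning and splitting steps above can be sketched as follows. This is a minimal illustration, not the repo's code: the helper names `clean` and `stratified_80_10_10` are hypothetical, rows are assumed to be `(text, label)` pairs, and the two-stage use of scikit-learn's `train_test_split` is one common way to get a stratified 80/10/10 split.

```python
from sklearn.model_selection import train_test_split

SEED = 42  # seed stated in this card


def clean(rows):
    """Strip text and drop rows whose text is missing or empty (hypothetical helper)."""
    cleaned = []
    for text, label in rows:
        if text is None:
            continue
        text = str(text).strip()
        if text:
            cleaned.append((text, int(label)))
    return cleaned


def stratified_80_10_10(rows, seed=SEED):
    """Split (text, label) rows into train/val/test, keeping the label ratio in each part."""
    labels = [label for _, label in rows]
    # Stage 1: hold out 20% of the data, stratified by label.
    train, held_out = train_test_split(
        rows, test_size=0.2, stratify=labels, random_state=seed
    )
    # Stage 2: cut the held-out 20% in half to get 10% val and 10% test.
    held_labels = [label for _, label in held_out]
    val, test = train_test_split(
        held_out, test_size=0.5, stratify=held_labels, random_state=seed
    )
    return train, val, test
```

Stratifying both stages keeps the class balance the same in all three parts, which matters when val and test are as small as they are here.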
+ ## Models / APIs
+ - LLM: (fill in, e.g., gpt-4o-mini / llama-3.1-instruct / etc.)
+ - Similarity: TF-IDF + cosine similarity (sklearn)
+
+ ## Prompting Strategy
+ - Zero-shot: instruction + output schema (return 0 or 1 only).
+ - Adaptive one-shot: retrieve the most similar train example and include it as the exemplar.
+ - Adaptive 5-shot: retrieve the top-5 most similar train examples and include them as exemplars.
+
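A minimal sketch of the adaptive K-shot selection described above, assuming TF-IDF vectors fit on the train texts only and ranked by cosine similarity. The function names `top_k_exemplars` and `build_prompt` and the prompt wording are illustrative assumptions; the actual templates live in the repo's `prompts/` directory and may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def top_k_exemplars(query, train_texts, train_labels, k):
    """Return the k train examples most similar to `query` (TF-IDF + cosine)."""
    if k == 0:
        return []  # zero-shot: no exemplars
    vectorizer = TfidfVectorizer()
    train_matrix = vectorizer.fit_transform(train_texts)  # vocabulary from train only
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, train_matrix)[0]
    best = scores.argsort()[::-1][:k]  # indices sorted by descending similarity
    return [(train_texts[i], train_labels[i]) for i in best]


def build_prompt(query, exemplars):
    """Assemble instruction, optional exemplars, and the query (hypothetical wording)."""
    parts = [
        "Classify the sentiment of the football news text.\n"
        "Answer with 0 (negative) or 1 (positive) only."
    ]
    for text, label in exemplars:
        parts.append(f"Text: {text}\nLabel: {label}")
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)
```

Calling `top_k_exemplars(..., k=0)`, `k=1`, and `k=5` covers the three settings with one code path. In practice the vectorizer would be fit once and reused across queries rather than refit per call.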
+ ## Evaluation Protocol
+ - Metrics: accuracy, precision, recall, F1; confusion matrix
+ - Latency: average wall-clock time per example
+ - Seed: 42
+ - Reproducibility: prompt, exemplar-selection, and evaluation code are in this repo
+
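The protocol above can be sketched in plain Python. This assumes the confusion matrix is laid out as `[[TN, FP], [FN, TP]]` (rows = true label, columns = prediction, the scikit-learn convention) with 1 as the positive class; the card does not state the layout, and the function names are hypothetical rather than taken from `evaluate_prompting.py`.

```python
import time


def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]; rows = true label, columns = prediction (assumed layout)."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def metrics_from_cm(cm):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def evaluate(predict, texts, labels):
    """Run `predict` on each text; return metrics plus average seconds per example."""
    start = time.perf_counter()
    preds = [predict(text) for text in texts]
    elapsed = time.perf_counter() - start
    cm = confusion_matrix_2x2(labels, preds)
    result = metrics_from_cm(cm)
    result["cm"] = cm
    result["sec_per_example"] = elapsed / len(texts)
    return result
```

As a check against the reported numbers, `metrics_from_cm([[4, 1], [4, 1]])` gives accuracy 0.5 and F1 = 2/7, matching the one-shot validation row in the Results section.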
+ ## Results (Val/Test)
+ - Val (n = 10):
+   - Zero-shot: acc 0.8, F1 0.750, cm [[5, 0], [2, 3]], ~0.416 s/ex
+   - One-shot: acc 0.5, F1 0.286, cm [[4, 1], [4, 1]], ~0.304 s/ex
+   - 5-shot: acc 0.8, F1 0.750, cm [[5, 0], [2, 3]], ~0.451 s/ex
+ - Test (n = 10):
+   - Zero-shot: acc 0.7, F1 0.727, cm [[3, 2], [1, 4]], ~0.282 s/ex
+   - One-shot: acc 0.7, F1 0.727, cm [[3, 2], [1, 4]], ~0.354 s/ex
+   - 5-shot: acc 0.7, F1 0.571, cm [[5, 0], [3, 2]], ~0.449 s/ex
+
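+ ## Tradeoffs
+ - Quality: on val, zero-shot = 5-shot (acc 0.8, F1 0.75) > one-shot (acc 0.5, F1 0.29). On test, all three reach acc 0.7, but 5-shot has the lowest F1 (0.57 vs. 0.73). With 10 examples per split, each prediction shifts accuracy by 0.1, so treat this ranking as tentative.
+ - Latency: generally increases with K (prompt length): 0.28 → 0.35 → 0.45 s/ex on test. On val, one-shot (0.30 s/ex) was faster than zero-shot (0.42 s/ex), so per-call variance is comparable to the effect of K.
+ - Cost: increases with K for token-billed APIs, since each exemplar adds prompt tokens.
+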
+ ## Limits & Risks
+ - Leakage: exemplars are retrieved from **train** only, so val/test texts never appear in prompts.
+ - Bias: sports phrasing may sway predicted sentiment.
+ - Small data: val and test hold 10 examples each, so results are unstable.
+
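To make the small-data caveat concrete, the sketch below computes a 95% Wilson score interval for an accuracy of 7 correct out of 10, the test-set figure reported in this card. The interval is an illustration added here, not part of the original evaluation, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(correct=7, n=10)
print(f"{low:.2f} to {high:.2f}")  # prints: 0.40 to 0.89
```

An interval this wide covers every accuracy reported here, which is why the differences between the three prompting settings should not be over-read.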
+ ## Reproducibility
+ - Code: `prompts/`, `selection.py`, `evaluate_prompting.py`
+ - Seed: 42
+ - Python ≥ 3.10
+
+ ## Usage Disclosure
+ This card and pipeline were organized with GenAI assistance; experiments and results were implemented and verified by the author.