Add Method Card
README.md
ADDED
@@ -0,0 +1,53 @@
# Method Card: Football Sentiment Prompting (0/1/5-shot)

## TL;DR
We compare zero-shot, adaptive one-shot, and adaptive 5-shot prompting for binary sentiment classification of football news.
The train/val/test splits are the same as those used for fine-tuning; we report metrics and confusion matrices (CMs) and discuss quality, latency, and cost.

## Data
- Dataset: `james-kramer/football_news` (Hugging Face)
- Task: binary sentiment (0 = negative, 1 = positive)
- Splits: stratified 80/10/10 (train/val/test)
- Cleaning: strip whitespace from text; drop empty/NA rows

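The cleaning and splitting steps listed above can be sketched as follows. This is a minimal, dependency-free illustration: the card does not say which splitter was used, so `clean`, `stratified_split`, and the toy rows are hypothetical names and data, not the repo's actual code.

```python
import random
from collections import defaultdict

def clean(rows):
    """Strip whitespace and drop rows whose text is missing or empty."""
    cleaned = []
    for text, label in rows:
        if text is None:
            continue
        text = str(text).strip()
        if text:
            cleaned.append((text, label))
    return cleaned

def stratified_split(rows, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split (text, label) rows into train/val/test, keeping label ratios."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_val = round(fractions[1] * len(group))
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to test
    return train, val, test

# Toy rows for illustration only; not taken from the real dataset.
rows = [(f" headline {i} ", i % 2) for i in range(100)] + [("   ", 1), (None, 0)]
train, val, test = stratified_split(clean(rows))
print(len(train), len(val), len(test))
```

Splitting each label group separately is what keeps the class ratio roughly equal across the three splits.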
## Models / APIs
- LLM: (fill in, e.g., gpt-4o-mini / llama-3.1-instruct / etc.)
- Similarity: TF-IDF + cosine (sklearn)

## Prompting Strategy
- Zero-shot: instruction + output schema (return 0 or 1 only).
- Adaptive one-shot: retrieve the most similar train example and include it as the exemplar.
- Adaptive 5-shot: retrieve the top-5 most similar train examples and include them as exemplars.

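The three strategies differ only in how many exemplars are retrieved (K = 0, 1, or 5). The sketch below illustrates that idea with a small pure-Python TF-IDF and cosine similarity, used here as a stand-in for the sklearn components named in this card so the example runs without dependencies. Whitespace tokenization, the function names, and the prompt wording are all assumptions for illustration and may not match `selection.py` or the files in `prompts/`.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def vectorize(tokens, idf):
    """Map tokens to TF-IDF weights, ignoring tokens unseen in train."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def fit_tfidf(docs):
    """Return (vectors, idf) fitted on a list of raw strings."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n = len(docs)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}
    return [vectorize(toks, idf) for toks in tokenized], idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_exemplars(query, train, k):
    """Return the k (text, label) train rows most similar to the query."""
    vectors, idf = fit_tfidf([text for text, _ in train])
    q = vectorize(tokenize(query), idf)
    ranked = sorted(range(len(train)),
                    key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [train[i] for i in ranked[:k]]

def build_prompt(query, exemplars):
    # Hypothetical prompt wording; k = 0 exemplars gives the zero-shot prompt.
    lines = ["Classify the sentiment of the football news text.",
             "Answer with 0 (negative) or 1 (positive) only.", ""]
    for text, label in exemplars:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)

# Toy train rows for illustration only.
train = [("Striker scores stunning late winner", 1),
         ("Captain ruled out with knee injury", 0),
         ("Club fined after crowd trouble", 0),
         ("Keeper saves penalty to seal the title", 1)]
query = "Late winner sends fans wild"
prompt = build_prompt(query, top_k_exemplars(query, train, k=1))
print(prompt)
```

Fitting the vocabulary and IDF weights on train rows only means the query never influences the exemplar pool.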
## Evaluation Protocol
- Metrics: accuracy, precision, recall, F1; confusion matrix
- Latency: average wall-clock time per example
- Seed: 42
- Reproducibility: prompt, exemplar-selection, and evaluation code are in this repo

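One plausible way to measure the per-example latency listed above is to time the whole prediction loop and divide by the number of examples. In this sketch, `mean_latency` and `fake_classify` are hypothetical names; `fake_classify` merely stands in for the real LLM call, which this card does not specify.

```python
import time

def mean_latency(classify, texts):
    """Return (predictions, mean wall-clock seconds per example)."""
    preds = []
    start = time.perf_counter()
    for text in texts:
        preds.append(classify(text))
    elapsed = time.perf_counter() - start
    return preds, elapsed / len(texts)

def fake_classify(text):
    # Hypothetical stand-in for the LLM request; returns 0 or 1.
    return 1 if "win" in text.lower() else 0

preds, sec_per_ex = mean_latency(fake_classify, ["Big win tonight", "Heavy defeat"])
print(preds, f"{sec_per_ex:.6f} s/ex")
```

With a real API, the measured time also includes network and queueing delay, so averages over few examples can vary from run to run.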
## Results (Val/Test)
- Val (n = 10):
  - Zero-shot: acc 0.80, F1 0.750, CM [[5, 0], [2, 3]], ~0.416 s/ex
  - One-shot: acc 0.50, F1 0.286, CM [[4, 1], [4, 1]], ~0.304 s/ex
  - 5-shot: acc 0.80, F1 0.750, CM [[5, 0], [2, 3]], ~0.451 s/ex
- Test (n = 10):
  - Zero-shot: acc 0.70, F1 0.727, CM [[3, 2], [1, 4]], ~0.282 s/ex
  - One-shot: acc 0.70, F1 0.727, CM [[3, 2], [1, 4]], ~0.354 s/ex
  - 5-shot: acc 0.70, F1 0.571, CM [[5, 0], [3, 2]], ~0.449 s/ex

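The accuracy and F1 figures above can be re-derived from the confusion matrices. The sketch below assumes the scikit-learn layout (rows are true labels, columns are predicted labels, class order 0 then 1); the card does not state the layout. Accuracy and F1 come out the same if the matrix is transposed, but precision and recall would swap.

```python
def binary_metrics(cm):
    """Compute metrics for the positive class from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}

# Confusion matrices copied from the results list above.
reported = {
    "val/zero-shot": [[5, 0], [2, 3]],
    "val/one-shot": [[4, 1], [4, 1]],
    "val/5-shot": [[5, 0], [2, 3]],
    "test/zero-shot": [[3, 2], [1, 4]],
    "test/one-shot": [[3, 2], [1, 4]],
    "test/5-shot": [[5, 0], [3, 2]],
}
for name, cm in reported.items():
    m = binary_metrics(cm)
    print(f"{name}: acc {m['accuracy']:.3f}, F1 {m['f1']:.3f}")
```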
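## Tradeoffs
- Quality: on val, zero-shot and 5-shot tie (acc 0.8, F1 0.75) and both beat one-shot (acc 0.5, F1 0.29). On test, all three reach acc 0.7, but 5-shot has the lowest F1 (0.57 vs. 0.73). No strategy is consistently better on this dataset.
- Latency: grows with K (prompt length) on test (~0.28 → 0.35 → 0.45 s/ex); on val, one-shot (~0.30 s/ex) was faster than zero-shot (~0.42 s/ex), so per-example timings are noisy.
- Cost: increases with K for token-billed APIs, since each exemplar adds prompt tokens.
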
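To make the cost point concrete, the sketch below models prompt size as a fixed instruction plus K exemplars plus the query. Every number in it (token counts and per-token prices) is a made-up placeholder, not a measurement from this project or a real price list.

```python
def prompt_tokens(k, instruction_tokens=40, exemplar_tokens=60, query_tokens=60):
    """Rough prompt size for k exemplars; all token counts are placeholders."""
    return instruction_tokens + k * exemplar_tokens + query_tokens

def cost_per_example(k, usd_per_1k_input=0.001, usd_per_1k_output=0.002,
                     output_tokens=1):
    """Estimated USD per classified example; prices are hypothetical."""
    input_cost = prompt_tokens(k) / 1000 * usd_per_1k_input
    output_cost = output_tokens / 1000 * usd_per_1k_output
    return input_cost + output_cost

for k in (0, 1, 5):
    print(k, prompt_tokens(k), f"{cost_per_example(k):.6f}")
```

Because the output is a single label, input tokens dominate under this model, so cost grows roughly linearly in K.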
## Limits & Risks
- Leakage: exemplars are retrieved from the **train** split only, so val/test examples never appear as exemplars.
- Bias: sports phrasing may sway sentiment predictions; the small dataset (10 examples each in val and test) makes the metrics unstable.

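The instability caused by small splits can be quantified with a confidence interval on accuracy. The sketch below uses the Wilson score interval, which is one common choice and not something this card prescribes, applied to the reported test accuracy of 7 correct out of 10.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

low, high = wilson_interval(7, 10)
print(f"95% interval for acc 0.7 at n = 10: {low:.2f} to {high:.2f}")
```

The interval spans roughly 0.40 to 0.89, so the differences between strategies reported here fall well within sampling noise.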
## Reproducibility
- Code: `prompts/`, `selection.py`, `evaluate_prompting.py`
- Seed: 42
- Python ≥ 3.10

## Usage Disclosure
This card and pipeline were organized with GenAI assistance; experiments and results were implemented and verified by the author.