| # Method Card: Football Sentiment Prompting (0/1/5-shot) | |
| ## TL;DR | |
| We compare zero-shot, adaptive one-shot, and adaptive 5-shot prompting for binary sentiment on football news. | |
| We use the same train/val/test splits as the fine-tuning experiments, report classification metrics and confusion matrices (CMs), and discuss the quality, latency, and cost tradeoffs. | |
| ## Data | |
| - Dataset: `james-kramer/football_news` (Hugging Face) | |
| - Task: Binary sentiment (0=negative, 1=positive) | |
| - Splits: Stratified 80/10/10 | |
| - Cleaning: strip text; drop empty/NA | |
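The cleaning and splitting steps above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the helper names `clean` and `stratified_split` are hypothetical, the inputs are assumed to be parallel lists of texts and 0/1 labels, and the 80/10/10 ratio is obtained by splitting twice with scikit-learn's `train_test_split`.

```python
from sklearn.model_selection import train_test_split

SEED = 42  # the seed listed in this card


def clean(texts, labels):
    """Strip whitespace and drop rows whose text is missing (non-string) or empty."""
    kept_texts, kept_labels = [], []
    for text, label in zip(texts, labels):
        if not isinstance(text, str) or label is None:
            continue  # None / NaN text or missing label
        text = text.strip()
        if text:
            kept_texts.append(text)
            kept_labels.append(int(label))
    return kept_texts, kept_labels


def stratified_split(texts, labels, seed=SEED):
    """Return (train, val, test), each a (texts, labels) pair, in an 80/10/10 ratio."""
    # First cut: 80% train, 20% held out, preserving the class ratio.
    x_train, x_rest, y_train, y_rest = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    # Second cut: split the held-out 20% evenly into val and test.
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed
    )
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```

Stratifying both cuts keeps the negative/positive ratio roughly equal across the three splits, which matters when val and test are small.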
| ## Models / APIs | |
| - LLM: (fill in, e.g., gpt-4o-mini / llama-3.1-instruct / etc.) | |
| - Similarity: TF-IDF + cosine (sklearn) | |
| ## Prompting Strategy | |
| - Zero-shot: instruction + schema (return 0 or 1 only). | |
| - Adaptive one-shot: retrieve the most similar training example and include it as the exemplar. | |
| - Adaptive 5-shot: retrieve the five most similar training examples and include them as exemplars. | |
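A minimal sketch of adaptive exemplar selection with TF-IDF and cosine similarity, plus a prompt builder covering the zero-, one-, and 5-shot cases. The class `ExemplarRetriever`, the function `build_prompt`, and the instruction wording are illustrative assumptions; the actual prompts live in `prompts/` and may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assumed instruction text; the wording used in the repo may differ.
INSTRUCTION = (
    "Classify the sentiment of the football news text. "
    "Answer with a single digit: 0 for negative, 1 for positive."
)


class ExemplarRetriever:
    """Finds the training examples most similar to a query text."""

    def __init__(self, train_texts, train_labels):
        self.texts = list(train_texts)
        self.labels = list(train_labels)
        # Fit on the training split only, so no val/test text leaks in.
        self.vectorizer = TfidfVectorizer()
        self.matrix = self.vectorizer.fit_transform(self.texts)

    def top_k(self, query, k):
        """Return up to k (text, label) pairs, most similar first."""
        if k <= 0:
            return []
        query_vec = self.vectorizer.transform([query])
        sims = cosine_similarity(query_vec, self.matrix)[0]
        order = np.argsort(-sims, kind="stable")[:k]
        return [(self.texts[i], self.labels[i]) for i in order]


def build_prompt(text, exemplars=()):
    """Instruction, then any exemplars, then the text to classify."""
    parts = [INSTRUCTION]
    for ex_text, ex_label in exemplars:
        parts.append(f"Text: {ex_text}\nLabel: {ex_label}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)
```

With this layout, `build_prompt(text)` gives the zero-shot prompt, and `build_prompt(text, retriever.top_k(text, 1))` or `top_k(text, 5)` gives the adaptive variants.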
| ## Evaluation Protocol | |
| - Metrics: accuracy, precision, recall, F1; confusion matrix | |
| - Latency: average wall-clock time per example | |
| - Seed: 42 | |
| - Reproducibility: prompts/selection/eval code in this repo | |
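The protocol above can be sketched as a small evaluation loop. This is an assumption-laden illustration rather than the contents of `evaluate_prompting.py`: `classify` stands in for whatever function sends a prompt to the LLM and returns its raw reply, `parse_label` is a hypothetical strict parser for the "0 or 1 only" schema, and the confusion matrix is assumed to be indexed as `cm[true][predicted]`.

```python
import time


def parse_label(reply):
    """Return 0 or 1 if the reply is exactly that digit (ignoring whitespace), else None."""
    reply = reply.strip()
    return int(reply) if reply in ("0", "1") else None


def evaluate(classify, texts, labels):
    """Call classify(text) -> str per example; collect a confusion matrix and mean latency."""
    cm = [[0, 0], [0, 0]]  # assumed layout: cm[true][predicted]
    unparsed = 0
    total_seconds = 0.0
    for text, true_label in zip(texts, labels):
        start = time.perf_counter()
        reply = classify(text)
        total_seconds += time.perf_counter() - start
        pred = parse_label(reply)
        if pred is None:
            unparsed += 1  # reply did not follow the schema
            continue
        cm[true_label][pred] += 1
    n = len(texts)
    return {
        "cm": cm,
        "unparsed": unparsed,
        "avg_latency_s": total_seconds / n if n else 0.0,
    }
```

Counting unparsed replies separately, instead of silently mapping them to a class, keeps schema violations visible in the report.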
| ## Results (Val/Test) | |
| - Val: | |
|   - Zero-shot: acc 0.8, f1 0.75, cm [[5, 0], [2, 3]], ~0.416s/ex | |
|   - One-shot: acc 0.5, f1 0.2857142857, cm [[4, 1], [4, 1]], ~0.304s/ex | |
|   - 5-shot: acc 0.8, f1 0.75, cm [[5, 0], [2, 3]], ~0.451s/ex | |
| - Test: | |
|   - Zero-shot: acc 0.7, f1 0.7272727273, cm [[3, 2], [1, 4]], ~0.282s/ex | |
|   - One-shot: acc 0.7, f1 0.7272727273, cm [[3, 2], [1, 4]], ~0.354s/ex | |
|   - 5-shot: acc 0.7, f1 0.5714285714, cm [[5, 0], [3, 2]], ~0.449s/ex | |
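As a sanity check, the accuracy and F1 values listed above can be recomputed directly from the confusion matrices. The sketch below assumes scikit-learn's layout, `[[TN, FP], [FN, TP]]` with rows as true labels and class 1 as the positive class; the card does not state the layout, and accuracy and F1 come out the same if the matrix is transposed, so these two numbers alone cannot confirm it. The function name `metrics_from_cm` is hypothetical.

```python
def metrics_from_cm(cm):
    """Accuracy, precision, recall, and F1 for class 1 from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Confusion matrices copied from the results listed in this card.
reported = {
    "val/zero-shot": [[5, 0], [2, 3]],
    "val/one-shot": [[4, 1], [4, 1]],
    "val/5-shot": [[5, 0], [2, 3]],
    "test/zero-shot": [[3, 2], [1, 4]],
    "test/one-shot": [[3, 2], [1, 4]],
    "test/5-shot": [[5, 0], [3, 2]],
}

for name, cm in reported.items():
    m = metrics_from_cm(cm)
    print(f"{name}: acc={m['accuracy']:.2f} f1={m['f1']:.4f}")
```

Each matrix sums to 10 examples, so a single changed prediction moves accuracy by 0.1.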
| ## Tradeoffs | |
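| - Quality: on validation, zero-shot ≈ 5-shot ≥ one-shot. On test, all three settings reach 0.7 accuracy, but 5-shot has the lowest F1 (0.57 vs. 0.73). | |
| - Latency: generally increases with K (longer prompts). On test it rises from ~0.28 s to ~0.45 s per example, although one-shot was the fastest setting on validation. | |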
| - Cost: increases with K for token-billed APIs. | |
| ## Limits & Risks | |
| - No leakage: retrieve exemplars from **train** only. | |
| - Bias: sports phrasing may sway sentiment. Small data leads to instability: val and test contain only 10 examples each, so one prediction shifts accuracy by 0.1. | |
| ## Reproducibility | |
| - Code: `prompts/`, `selection.py`, `evaluate_prompting.py` | |
| - Seed: 42 | |
| - Python ≥ 3.10 | |
| ## Usage Disclosure | |
| This card and pipeline were organized with GenAI assistance; experiments and results were implemented and verified by the author. | |