# Method Card: Football Sentiment Prompting (0/1/5-shot)
## TL;DR
We compare zero-shot, adaptive one-shot, and adaptive 5-shot prompting for binary sentiment on football news.
We use the same train/val/test splits as the fine-tuning experiments, report metrics and confusion matrices (CMs), and discuss the quality, latency, and cost tradeoffs.
## Data
- Dataset: `james-kramer/football_news` (Hugging Face)
- Task: Binary sentiment (0=negative, 1=positive)
- Splits: Stratified 80/10/10
- Cleaning: strip text; drop empty/NA
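The cleaning and split described above could be sketched as follows. This is a minimal illustration, not the repo's actual code: it assumes the texts and labels are already loaded into plain Python lists, that the 80/10/10 split is done in two stratified passes with scikit-learn, and that the function names `clean` and `stratified_80_10_10` are hypothetical.

```python
from sklearn.model_selection import train_test_split

SEED = 42  # seed stated in this card


def clean(texts, labels):
    """Strip whitespace and drop rows whose text is missing or empty."""
    kept = [(t.strip(), y) for t, y in zip(texts, labels)
            if isinstance(t, str) and t.strip()]
    return [t for t, _ in kept], [y for _, y in kept]


def stratified_80_10_10(texts, labels, seed=SEED):
    """Split into train/val/test while preserving the label ratio."""
    # First pass: hold out 20% of the data, stratified by label.
    x_train, x_tmp, y_train, y_tmp = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed)
    # Second pass: halve the holdout into val and test (10% each overall).
    x_val, x_test, y_val, y_test = train_test_split(
        x_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```

Splitting in two passes keeps the class balance in all three parts, which matters when val and test are small.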
## Models / APIs
- LLM: (fill in, e.g., gpt-4o-mini / llama-3.1-instruct / etc.)
- Similarity: TF-IDF + cosine (sklearn)
## Prompting Strategy
- Zero-shot: instruction + schema (return 0 or 1 only).
- Adaptive one-shot: retrieve the most similar train example (TF-IDF cosine) and include it as the exemplar.
- Adaptive 5-shot: retrieve the top-5 most similar train examples and include them as exemplars.
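A rough sketch of the adaptive exemplar selection, assuming TF-IDF vectors fitted on the train split only and ranked by cosine similarity with scikit-learn. The helper names `make_retriever` and `build_prompt`, and the prompt wording itself, are hypothetical and may differ from what `selection.py` and `prompts/` contain.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def make_retriever(train_texts, train_labels):
    """Fit TF-IDF on the train split once and return a top-k lookup function."""
    vectorizer = TfidfVectorizer()
    train_matrix = vectorizer.fit_transform(train_texts)  # train only: no leakage

    def retrieve(query, k):
        if k <= 0:
            return []  # zero-shot: no exemplars
        sims = cosine_similarity(vectorizer.transform([query]), train_matrix)[0]
        best = sims.argsort()[::-1][:k]  # indices sorted by descending similarity
        return [(train_texts[i], train_labels[i]) for i in best]

    return retrieve


def build_prompt(query, exemplars):
    """Assemble instruction, optional exemplars, and the query (hypothetical wording)."""
    lines = ["Classify the sentiment of the football news text.",
             "Return 0 (negative) or 1 (positive) only.", ""]
    for text, label in exemplars:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)
```

With this shape, the three strategies differ only in `k`: `build_prompt(q, retrieve(q, 0))`, `retrieve(q, 1)`, and `retrieve(q, 5)`.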
## Evaluation Protocol
- Metrics: accuracy, precision, recall, F1; confusion matrix
- Latency: avg wall-clock per example
- Seed: 42
- Reproducibility: prompts/selection/eval code in this repo
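As a worked check of how the listed metrics relate to a confusion matrix, the sketch below derives them from a 2x2 matrix. It assumes the scikit-learn layout `[[TN, FP], [FN, TP]]` (rows are true labels, columns are predictions) and treats class 1 as the positive class; `binary_metrics` is a hypothetical helper, not necessarily what `evaluate_prompting.py` does.

```python
def binary_metrics(cm):
    """Compute accuracy, precision, recall, and F1 from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against empty denominators when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 written directly in counts; equivalent to 2PR / (P + R).
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Zero-shot validation matrix reported in this card.
    print(binary_metrics([[5, 0], [2, 3]]))
```

Applied to the zero-shot validation matrix `[[5, 0], [2, 3]]`, this gives accuracy 8/10 = 0.8 and F1 = 6/8 = 0.75, matching the reported values.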
## Results (Val/Test)
- Val:
  - Zero-shot: acc 0.8, f1 0.75, cm [[5, 0], [2, 3]], ~0.416 s/ex
  - One-shot: acc 0.5, f1 0.2857142857, cm [[4, 1], [4, 1]], ~0.304 s/ex
  - 5-shot: acc 0.8, f1 0.75, cm [[5, 0], [2, 3]], ~0.451 s/ex
- Test:
  - Zero-shot: acc 0.7, f1 0.7272727273, cm [[3, 2], [1, 4]], ~0.282 s/ex
  - One-shot: acc 0.7, f1 0.7272727273, cm [[3, 2], [1, 4]], ~0.354 s/ex
  - 5-shot: acc 0.7, f1 0.5714285714, cm [[5, 0], [3, 2]], ~0.449 s/ex
## Tradeoffs
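- Quality: on val, zero-shot = 5-shot (acc 0.8, f1 0.75) > one-shot (acc 0.5, f1 0.29). On test, all three reach acc 0.7, but 5-shot has lower f1 (0.57 vs 0.73). With 10 examples per split, no ranking is reliable.
- Latency: generally increases with K (prompt length). On test it rises from ~0.282 to ~0.354 to ~0.449 s/ex; on val, one-shot (~0.304 s/ex) was faster than zero-shot (~0.416 s/ex), so per-call noise is comparable to the effect.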
- Cost: increases with K for token-billed APIs.
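The latency and cost points above could be measured along these lines. This is a sketch under assumptions: `classify` stands in for whatever function wraps the LLM call (hypothetical), and the whitespace token count is only a crude proxy, since real tokenizers and billing rules differ.

```python
import time


def timed_predictions(classify, texts):
    """Run `classify` on each text; return predictions and mean seconds per example."""
    preds = []
    start = time.perf_counter()
    for text in texts:
        preds.append(classify(text))
    elapsed = time.perf_counter() - start
    # max(..., 1) avoids division by zero on an empty input list.
    return preds, elapsed / max(len(texts), 1)


def approx_prompt_tokens(prompt):
    """Crude whitespace token count; real tokenizers and billing differ."""
    return len(prompt.split())
```

Comparing `approx_prompt_tokens` for prompts built with K = 0, 1, and 5 gives a quick view of how input length, and hence token-billed cost, grows with the number of exemplars.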
## Limits & Risks
- No leakage: retrieve exemplars from **train** only.
- Bias: sports phrasing may sway sentiment. Small data: val and test hold 10 examples each (per the confusion matrices), so the metrics are unstable.
## Reproducibility
- Code: `prompts/`, `selection.py`, `evaluate_prompting.py`
- Seed: 42
- Python ≥ 3.10
## Usage Disclosure
This card and pipeline were organized with GenAI assistance; experiments and results were implemented and verified by the author.