---
library_name: transformers
pipeline_tag: text-classification
license: mit
tags:
- prompting
- zero-shot
- few-shot
- football
- sentiment
- adaptive-retrieval
model_name: Football Sentiment Prompting (0/1/5-shot)
language:
- en
datasets:
- james-kramer/football_news
inference: false
---

# Method Card: Football Sentiment Prompting (0/1/5-shot)

## TL;DR

We compare zero-shot, adaptive one-shot, and adaptive 5-shot prompting for binary sentiment classification on football news. We use the same train/val/test splits as the fine-tuning experiments, report metrics and confusion matrices, and discuss the quality/latency/cost tradeoffs.

## Data

- Dataset: `james-kramer/football_news` (Hugging Face)
- Task: binary sentiment (0 = negative, 1 = positive)
- Splits: stratified 80/10/10
- Cleaning: strip whitespace; drop empty/NA rows
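
The split above can be sketched with scikit-learn; this is an assumed implementation (the function name and argument layout are illustrative, not the repo's actual code):

```python
# Stratified 80/10/10 split with a fixed seed, as described in the card.
from sklearn.model_selection import train_test_split

def split_80_10_10(texts, labels, seed=42):
    """Return stratified (train, val, test) splits of (texts, labels)."""
    # Carve off 20% for val+test, preserving the label ratio (stratify).
    x_train, x_rest, y_train, y_rest = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    # Split that 20% in half: 10% val, 10% test.
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed
    )
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```

Stratifying both splits keeps the positive/negative ratio stable even at this small dataset size.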

## Models / APIs

- **LLM used:** gpt-4o-mini (OpenAI API, September 2025 snapshot)
- **Similarity backend:** scikit-learn TF-IDF + cosine similarity
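
A minimal sketch of the similarity backend, assuming a `TfidfVectorizer` fit on the train split only (the class and method names here are illustrative):

```python
# TF-IDF + cosine-similarity retrieval over the train split.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class ExemplarRetriever:
    """Return the k train examples most similar to a query sentence."""

    def __init__(self, train_texts, train_labels):
        self.texts = list(train_texts)
        self.labels = list(train_labels)
        self.vectorizer = TfidfVectorizer()
        self.matrix = self.vectorizer.fit_transform(self.texts)

    def top_k(self, query, k=5):
        sims = cosine_similarity(self.vectorizer.transform([query]), self.matrix)[0]
        best = sims.argsort()[::-1][:k]  # indices sorted by descending similarity
        return [(self.texts[i], self.labels[i]) for i in best]
```

Fitting the vectorizer on train only is what keeps the retrieval step leakage-free (see Limits & Risks).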

## Prompting Strategy

- Zero-shot: instruction + output schema (return 0 or 1 only).
- Adaptive one-shot: retrieve the most similar train example and include it as the exemplar.
- Adaptive 5-shot: retrieve the top-5 most similar train exemplars.
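
The three strategies differ only in how many retrieved exemplars are spliced into the prompt. A sketch of that assembly step (an assumed helper, not the repo's exact code):

```python
# Build a zero-/one-/k-shot prompt from retrieved (text, label) exemplars.
LABEL_WORD = {0: "negative", 1: "positive"}

def build_prompt(text, exemplars=()):
    lines = [
        "You are a concise sentiment classifier for football news.",
        "Decide if the sentence is positive or negative. Only answer with one word.",
        "",
    ]
    for ex_text, ex_label in exemplars:  # empty tuple -> zero-shot
        lines += [f"Sentence: {ex_text}", f"Label: {LABEL_WORD[ex_label]}", ""]
    lines += [f"Sentence: {text}", "Answer:"]
    return "\n".join(lines)
```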

## Prompt Templates

**Zero-shot**

    You are a concise sentiment classifier.
    Decide if the following football-related sentence is positive or negative.
    Only answer with a single word: "positive" or "negative".

    Sentence: {text}
    Answer:

**Adaptive one-shot**

    You are a concise sentiment classifier for football news.
    Decide if each sentence is positive or negative. Only answer with one word.

    Example:
    Sentence: {ex_text}
    Label: {ex_label}

    Now classify the target sentence.
    Sentence: {text}
    Answer:

**Adaptive K-shot (e.g., K=5)**

    You are a concise sentiment classifier for football news.
    Decide if the sentence is positive or negative. Only answer with one word.

    Examples:
    {examples}

    Sentence: {text}
    Answer:
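
Since the templates ask for a single word, the raw completion still has to be mapped back to the 0/1 label space. A hedged sketch of that parsing step (the fallback behavior is an assumption):

```python
# Map the model's one-word answer ("positive"/"negative") to a 0/1 label.
def parse_answer(raw, default=0):
    word = raw.strip().strip('".').lower()  # tolerate stray quotes/periods
    if word.startswith("pos"):
        return 1
    if word.startswith("neg"):
        return 0
    return default  # unexpected output falls back to the default label
```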

## Evaluation Protocol

- Metrics: accuracy, precision, recall, F1; confusion matrix
- Latency: average wall-clock time per example
- Seed: 42
- Reproducibility: prompt, selection, and evaluation code in this repo
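
The metrics can be computed with scikit-learn; a sketch, assuming the confusion matrices in the Results section follow sklearn's convention (rows = true label, columns = predicted label):

```python
# Accuracy / precision / recall / F1 plus the confusion matrix for 0/1 labels.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def score(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "cm": confusion_matrix(y_true, y_pred).tolist(),
    }
```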

## Results (Val/Test)

- Val:
  - Zero-shot: acc 0.80, F1 0.750, cm [[5, 0], [2, 3]], ~0.416 s/ex
  - One-shot: acc 0.50, F1 0.286, cm [[4, 1], [4, 1]], ~0.304 s/ex
  - 5-shot: acc 0.80, F1 0.750, cm [[5, 0], [2, 3]], ~0.451 s/ex
- Test:
  - Zero-shot: acc 0.70, F1 0.727, cm [[3, 2], [1, 4]], ~0.282 s/ex
  - One-shot: acc 0.70, F1 0.727, cm [[3, 2], [1, 4]], ~0.354 s/ex
  - 5-shot: acc 0.70, F1 0.571, cm [[5, 0], [3, 2]], ~0.449 s/ex

## Tradeoffs

- Quality: zero-shot ≈ 5-shot ≥ one-shot on this dataset.
- Latency: increases with K (see Results; ~0.28 s/ex for zero-shot vs. ~0.45 s/ex for 5-shot).
- Cost: scales roughly linearly with prompt length (token count). On this dataset (~20 evaluation examples), 5-shot prompts used roughly 3× the tokens of zero-shot.

## Limits & Risks

- No leakage: exemplars are retrieved from the **train** split only.
- Bias: sports-specific phrasing may sway sentiment; the small dataset makes the metrics unstable.

## Reproducibility

- Code: `prompts/`, `selection.py`, `evaluate_prompting.py`
- Seed: 42
- Python ≥ 3.10

## Usage Disclosure

This card and pipeline were organized with GenAI assistance; the experiments and results were implemented and verified by the author.