---
library_name: transformers
pipeline_tag: text-classification
license: mit
tags:
  - prompting
  - zero-shot
  - few-shot
  - football
  - sentiment
  - adaptive-retrieval
model_name: Football Sentiment Prompting (0/1/5-shot)
language:
  - en
datasets:
  - james-kramer/football_news
inference: false
---

# Method Card — Football Sentiment Prompting (0/1/5-shot)

## TL;DR
We compare zero-shot, adaptive one-shot, and adaptive 5-shot prompting for binary sentiment classification on football news,
using the same train/val/test splits as the fine-tuning baseline. We report metrics and confusion matrices and discuss quality/latency/cost tradeoffs.

## Data
- Dataset: `james-kramer/football_news` (Hugging Face)
- Task: Binary sentiment (0=negative, 1=positive)
- Splits: Stratified 80/10/10
- Cleaning: strip text; drop empty/NA
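
The cleaning and split steps above can be sketched as follows (column names `text`/`label` and the exact call order are assumptions, not confirmed by the repo):

```python
# Sketch of the stratified 80/10/10 split with the cleaning rules above.
import pandas as pd
from sklearn.model_selection import train_test_split

def split_80_10_10(df: pd.DataFrame, seed: int = 42):
    """Clean, then produce stratified 80/10/10 train/val/test splits."""
    df = df.dropna(subset=["text", "label"]).copy()
    df["text"] = df["text"].str.strip()          # strip text
    df = df[df["text"] != ""]                    # drop empties
    train, rest = train_test_split(
        df, test_size=0.2, stratify=df["label"], random_state=seed
    )
    val, test = train_test_split(
        rest, test_size=0.5, stratify=rest["label"], random_state=seed
    )
    return train, val, test
```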

## Models / APIs
- **LLM used:** gpt-4o-mini (OpenAI API, September 2025 snapshot)
- **Similarity backend:** scikit-learn TF-IDF + cosine similarity
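
A minimal sketch of the similarity backend, assuming the straightforward use of scikit-learn named above (class and method names are ours, not from the repo):

```python
# TF-IDF over the TRAIN split only; cosine similarity picks the top-k
# exemplars for a query sentence.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class ExemplarRetriever:
    def __init__(self, train_texts):
        self.vectorizer = TfidfVectorizer()
        # Indexing only train texts keeps retrieval leakage-safe.
        self.train_matrix = self.vectorizer.fit_transform(train_texts)

    def top_k(self, query: str, k: int = 5):
        """Return indices of the k most similar train texts."""
        q = self.vectorizer.transform([query])
        sims = cosine_similarity(q, self.train_matrix).ravel()
        return sims.argsort()[::-1][:k].tolist()
```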

## Prompting Strategy
- Zero-shot: instruction + schema (return 0 or 1 only).
- Adaptive one-shot: retrieve most similar train example and include it as exemplar.
- Adaptive 5-shot: retrieve top-5 similar exemplars.
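
The three strategies differ only in how many retrieved (text, label) pairs are spliced into the prompt. A sketch (the exact wording in our runs follows the templates in the next section; this helper is illustrative):

```python
# One builder covers zero-shot (no exemplars), one-shot (one pair),
# and k-shot (k pairs).
def build_prompt(text: str, exemplars=None) -> str:
    """exemplars: list of (example_text, example_label) pairs retrieved
    from the TRAIN split; empty or None for zero-shot."""
    lines = [
        "You are a concise sentiment classifier for football news.",
        "Decide if the sentence is positive or negative. Only answer with one word.",
        "",
    ]
    for ex_text, ex_label in exemplars or []:
        lines += [f'Sentence: "{ex_text}"', f"Label: {ex_label}", ""]
    lines += [f'Sentence: "{text}"', "Answer:"]
    return "\n".join(lines)
```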

## Prompt Templates
**Zero-shot**
```text
You are a concise sentiment classifier.
Decide if the following football-related sentence is positive or negative.
Only answer with a single word: "positive" or "negative".

Sentence: "{text}"
Answer:
```

**Adaptive One-shot**
```text
You are a concise sentiment classifier for football news.
Decide if each sentence is positive or negative. Only answer with one word.

Example:
Sentence: "{ex_text}"
Label: {ex_label}

Now classify the target sentence.
Sentence: "{text}"
Answer:
```

**Adaptive K-shot (e.g., K=5)**
```text
You are a concise sentiment classifier for football news.
Decide if the sentence is positive or negative. Only answer with one word.

{K retrieved exemplars, each formatted as in the one-shot template}

Sentence: "{text}"
Answer:
```


## Evaluation Protocol
- Metrics: accuracy, precision, recall, F1; confusion matrix
- Latency: avg wall-clock per example
- Seed: 42
- Reproducibility: prompts/selection/eval code in this repo
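
A sketch of the evaluation step, assuming the model's raw one-word answer is mapped back to {0, 1} before scoring (the parsing rule and function names are our assumptions):

```python
# Map LLM text output to binary labels, then score with sklearn.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

def parse_answer(raw: str) -> int:
    """1 = positive, 0 = negative; tolerant of case/whitespace."""
    return 1 if "positive" in raw.strip().lower() else 0

def evaluate(y_true, raw_outputs):
    y_pred = [parse_answer(r) for r in raw_outputs]
    acc = accuracy_score(y_true, y_pred)
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    return {"acc": acc, "precision": p, "recall": r,
            "f1": f1, "cm": cm.tolist()}
```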

## Results (Val/Test)

| Split | Strategy  | Acc | F1    | Confusion matrix | Latency (s/ex) |
|-------|-----------|-----|-------|------------------|----------------|
| Val   | Zero-shot | 0.8 | 0.750 | [[5, 0], [2, 3]] | ~0.416 |
| Val   | One-shot  | 0.5 | 0.286 | [[4, 1], [4, 1]] | ~0.304 |
| Val   | 5-shot    | 0.8 | 0.750 | [[5, 0], [2, 3]] | ~0.451 |
| Test  | Zero-shot | 0.7 | 0.727 | [[3, 2], [1, 4]] | ~0.282 |
| Test  | One-shot  | 0.7 | 0.727 | [[3, 2], [1, 4]] | ~0.354 |
| Test  | 5-shot    | 0.7 | 0.571 | [[5, 0], [3, 2]] | ~0.449 |

## Tradeoffs
- Quality: zero-shot ≈ 5-shot ≥ one-shot on this dataset.
- Latency: increases with K (see Results section; ~0.28s/ex for zero-shot → ~0.45s/ex for 5-shot).  
- Cost: scales roughly linearly with prompt length (token count). For this dataset (~20 examples), 5-shot prompts were ~3× the token usage of zero-shot.
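
A back-of-envelope sketch of the cost point above. The 4-characters-per-token heuristic and the per-million-token price are illustrative assumptions, not measured values:

```python
# Cost scales roughly linearly with prompt length (token count),
# so a 3x-longer 5-shot prompt costs ~3x a zero-shot prompt.
def estimate_cost(prompt: str, price_per_1m_tokens: float = 0.15) -> float:
    """Rough input-token cost: ~4 characters per token (heuristic)."""
    approx_tokens = max(1, len(prompt) // 4)
    return approx_tokens / 1_000_000 * price_per_1m_tokens
```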

## Limits & Risks
- Leakage control: exemplars are retrieved from the **train** split only.
- Bias: sports phrasing may sway sentiment; small data → instability.

## Reproducibility
- Code: `prompts/`, `selection.py`, `evaluate_prompting.py`
- Seed: 42
- Python ≥ 3.10

## Usage Disclosure
This card and pipeline were organized with GenAI assistance; experiments and results were implemented and verified by the author.