kevinkyi/Homework2_Multishot_Prompting

Text Classification

adaptive-retrieval


kevinkyi committed on Sep 22, 2025

Commit feb825a (verified) · 1 Parent(s): 6c24a87

Add Method Card

Files changed (1)

README.md +53 -0


	@@ -0,0 +1,53 @@

+# Method Card — Football Sentiment Prompting (0/1/5-shot)
+## TL;DR
+We compare zero-shot, adaptive one-shot, and adaptive 5-shot prompting for binary sentiment classification of football news.
+We use the same train/val/test splits as the fine-tuning experiments, report classification metrics and confusion matrices (CMs), and discuss quality, latency, and cost.
+## Data
+- Dataset: `james-kramer/football_news` (Hugging Face)
+- Task: Binary sentiment (0=negative, 1=positive)
+- Splits: Stratified 80/10/10
+- Cleaning: strip text; drop empty/NA
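The cleaning and splitting steps listed above can be sketched as follows. This is a minimal, dependency-free illustration and not the repo's actual code: the function names `clean` and `stratified_split`, the `(text, label)` row layout, and the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict


def clean(rows):
    """Strip text and drop rows whose text is missing or empty."""
    cleaned = []
    for text, label in rows:
        if text is None:
            continue
        text = str(text).strip()
        if text:
            cleaned.append((text, label))
    return cleaned


def stratified_split(rows, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split (text, label) rows into train/val/test, preserving label ratios."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_val = round(fractions[1] * len(group))
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to test
    return train, val, test


# Toy data (hypothetical): 100 usable rows plus two rows that cleaning drops.
rows = [(f"  text {i}  ", i % 2) for i in range(100)] + [("   ", 0), (None, 1)]
rows = clean(rows)
train, val, test = stratified_split(rows)
print(len(train), len(val), len(test))
```

Splitting each label group separately is what keeps the 0/1 ratio roughly equal across the three splits; a fixed seed makes the shuffle repeatable.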
+## Models / APIs
+- LLM: (fill in, e.g., gpt-4o-mini / llama-3.1-instruct / etc.)
+- Similarity: TF-IDF + cosine (sklearn)
+## Prompting Strategy
+- Zero-shot: instruction + schema (return 0 or 1 only).
+- Adaptive one-shot: retrieve most similar train example and include it as exemplar.
+- Adaptive 5-shot: retrieve top-5 similar exemplars.
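The three strategies above differ only in how many retrieved exemplars go into the prompt (K = 0, 1, or 5). The sketch below illustrates that idea with a simplified, standard-library TF-IDF and cosine similarity standing in for the sklearn version the card names. The tokenizer, the prompt wording, the helper names, and the toy training rows are assumptions for illustration, not the repo's `selection.py` or its prompts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency over the training texts."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    """L2-normalised sparse TF-IDF vector; unseen terms are ignored."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Vectors are already unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def top_k(query, train, idf, k):
    """Return the k training rows most similar to the query (train only)."""
    q = tfidf(query, idf)
    scored = [(cosine(q, tfidf(text, idf)), i) for i, (text, _) in enumerate(train)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [train[i] for _, i in scored[:k]]


def build_prompt(query, exemplars):
    """Hypothetical prompt layout: instruction, optional exemplars, then the query."""
    lines = [
        "Classify the sentiment of the football news text.",
        "Return 0 (negative) or 1 (positive) only.",
        "",
    ]
    for text, label in exemplars:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)


# Toy training rows (hypothetical).
train = [
    ("Striker scores a stunning hat-trick in the derby win", 1),
    ("Club fined after crowd trouble mars cup defeat", 0),
    ("Goalkeeper injured and out for the season", 0),
    ("Manager praises squad after brilliant comeback win", 1),
]
idf = fit_idf([text for text, _ in train])
query = "Winger scores late in a brilliant derby win"
exemplars = top_k(query, train, idf, k=1)  # k=0 gives zero-shot, k=5 gives 5-shot
prompt = build_prompt(query, exemplars)
print(prompt)
```

Fitting the IDF weights on training texts only, and retrieving only from `train`, keeps val/test text out of the prompts.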
+## Evaluation Protocol
+- Metrics: accuracy, precision, recall, F1; confusion matrix
+- Latency: avg wall-clock per example
+- Seed: 42
+- Reproducibility: prompts/selection/eval code in this repo
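Average wall-clock latency per example, as listed in the protocol above, could be measured along these lines. This is a sketch under assumptions: `classify` stands for whatever function wraps the LLM call, and `fake_classify` is a hypothetical stand-in so the example runs without an API.

```python
import time


def average_latency(classify, texts):
    """Run classify over texts; return predictions and mean seconds per example."""
    preds = []
    start = time.perf_counter()
    for text in texts:
        preds.append(classify(text))
    elapsed = time.perf_counter() - start
    return preds, elapsed / len(texts)


def fake_classify(text):
    # Hypothetical stand-in for an LLM call; sleeps to simulate network latency.
    time.sleep(0.01)
    return 1 if "win" in text.lower() else 0


preds, sec_per_ex = average_latency(fake_classify, ["Big win today", "Heavy defeat"])
print(preds, f"~{sec_per_ex:.3f}s/ex")
```

Timing the whole loop and dividing by the number of examples includes any per-call overhead (prompt construction, retries), which is usually what matters for an end-to-end comparison.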
+## Results (Val/Test)
+- Val:
+  - Zero-shot: acc 0.8, f1 0.75, cm [[5, 0], [2, 3]], ~0.416s/ex
+  - One-shot: acc 0.5, f1 0.2857142857, cm [[4, 1], [4, 1]], ~0.304s/ex
+  - 5-shot:   acc 0.8, f1 0.75, cm [[5, 0], [2, 3]], ~0.451s/ex
+- Test:
+  - Zero-shot: acc 0.7, f1 0.7272727273, cm [[3, 2], [1, 4]], ~0.282s/ex
+  - One-shot: acc 0.7, f1 0.7272727273, cm [[3, 2], [1, 4]], ~0.354s/ex
+  - 5-shot:   acc 0.7, f1 0.5714285714, cm [[5, 0], [3, 2]], ~0.449s/ex
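The accuracy and F1 values above can be re-derived from the confusion matrices. The sketch below assumes the layout `[[TN, FP], [FN, TP]]` (rows are true labels, columns are predictions, label order 0 then 1) with 1 as the positive class; the card does not state the layout, so treat that as an assumption. Accuracy and F1 come out the same if rows and columns are swapped; precision and recall would trade places.

```python
def metrics_from_cm(cm):
    """Accuracy, precision, recall, and F1 from a 2x2 confusion matrix.

    Assumes cm = [[TN, FP], [FN, TP]] with the positive class being label 1.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Confusion matrices copied from the results listed in this card.
reported = {
    "val/zero-shot": [[5, 0], [2, 3]],
    "val/one-shot": [[4, 1], [4, 1]],
    "val/5-shot": [[5, 0], [2, 3]],
    "test/zero-shot": [[3, 2], [1, 4]],
    "test/one-shot": [[3, 2], [1, 4]],
    "test/5-shot": [[5, 0], [3, 2]],
}
for name, cm in reported.items():
    m = metrics_from_cm(cm)
    print(f"{name}: acc {m['accuracy']:.2f}, f1 {m['f1']:.4f}")
```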
+## Tradeoffs
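+- Quality: on val, zero-shot = 5-shot (acc 0.8, F1 0.75) > one-shot (acc 0.5, F1 0.29); on test, all three reach acc 0.7, but 5-shot has lower F1 (0.57 vs 0.73). With 10 examples per eval split, these differences are not reliable.
+- Latency: 5-shot is slowest on both splits (~0.45s/ex), consistent with longer prompts; the zero-shot vs one-shot ordering flips between val (0.416s vs 0.304s) and test (0.282s vs 0.354s), so per-call noise is comparable to the effect of K at this scale.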
+- Cost: increases with K for token-billed APIs.
+## Limits & Risks
+- Leakage: exemplars are retrieved from **train** only, so val/test texts never appear in prompts.
+- Bias: sports phrasing may sway predicted sentiment; the small eval splits (10 examples each) make the metrics unstable.
+## Reproducibility
+- Code: `prompts/`, `selection.py`, `evaluate_prompting.py`
+- Seed: 42
+- Python ≥ 3.10
+## Usage Disclosure
+This card and pipeline were organized with GenAI assistance; experiments and results were implemented and verified by the author.