Add Method Card
README.md CHANGED
@@ -30,14 +30,16 @@ Same train/val/test as fine-tuning; we report metrics/CMs and discuss quality/la
 - Cleaning: strip text; drop empty/NA
 
 ## Models / APIs
-- LLM
-- Similarity
+- **LLM used:** gpt-4o-mini (OpenAI API, September 2025 snapshot)
+- **Similarity backend:** sklearn TF-IDF + cosine similarity
 
 ## Prompting Strategy
 - Zero-shot: instruction + schema (return 0 or 1 only).
 - Adaptive one-shot: retrieve most similar train example and include it as exemplar.
 - Adaptive 5-shot: retrieve top-5 similar exemplars.
 
+
+
 ## Evaluation Protocol
 - Metrics: accuracy, precision, recall, F1; confusion matrix
 - Latency: avg wall-clock per example
@@ -56,8 +58,8 @@ Same train/val/test as fine-tuning; we report metrics/CMs and discuss quality/la
 
 ## Tradeoffs
 - Quality: zero-shot ≈ 5-shot ≥ one-shot on this dataset.
-- Latency: increases with K (
-- Cost:
+- Latency: increases with K (see Results section; ~0.28s/ex for zero-shot → ~0.45s/ex for 5-shot).
+- Cost: scales roughly linearly with prompt length (token count). For this dataset (~20 examples), 5-shot prompts were ~3× the token usage of zero-shot.
 
 ## Limits & Risks
 - No leakage: retrieve exemplars from **train** only.
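For readers of this change, the "Prompting Strategy" and "No leakage" items in the README could be sketched roughly as follows. This is a minimal illustration, not the repository's code. It uses a dependency-free TF-IDF and cosine similarity as a stand-in for the sklearn backend named in the README, so its weighting will not match sklearn exactly. The function names, the prompt wording, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(train_texts):
    """Compute smoothed IDF weights from the train split only (no leakage)."""
    n_docs = len(train_texts)
    doc_freq = Counter()
    for text in train_texts:
        doc_freq.update(set(tokenize(text)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; terms never seen in train are ignored."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_exemplars(query, train_texts, train_labels, k):
    """Return up to k train (text, label) pairs most similar to the query."""
    idf = fit_idf(train_texts)
    q_vec = tfidf_vector(query, idf)
    scored = [(cosine(q_vec, tfidf_vector(t, idf)), i) for i, t in enumerate(train_texts)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [(train_texts[i], train_labels[i]) for _, i in scored[:k]]


def build_prompt(query, exemplars):
    """Assemble the prompt; an empty exemplar list gives the zero-shot case."""
    # Hypothetical instruction wording; the README only says "return 0 or 1 only".
    parts = ["Classify the text. Return 0 or 1 only."]
    for text, label in exemplars:
        parts.append(f"Text: {text}\nLabel: {label}")
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


# Hypothetical train split, for illustration only.
train_texts = [
    "the battery died after one day",
    "great screen and fast shipping",
    "battery life is excellent",
]
train_labels = [0, 1, 1]

one_shot = retrieve_exemplars("battery died quickly", train_texts, train_labels, k=1)
print(build_prompt("battery died quickly", one_shot))
```

Each exemplar adds its full text to the prompt, so prompt length grows with K. That is consistent with the latency and cost items in the Tradeoffs section.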