kevinkyi committed · Commit cf32989 · verified · 1 Parent(s): 2c7ec8c

Add Method Card

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -30,14 +30,16 @@ Same train/val/test as fine-tuning; we report metrics/CMs and discuss quality/la
  - Cleaning: strip text; drop empty/NA
 
  ## Models / APIs
- - LLM: (fill in, e.g., gpt-4o-mini / llama-3.1-instruct / etc.)
- - Similarity: TF-IDF + cosine (sklearn)
+ - **LLM used:** gpt-4o-mini (OpenAI API, September 2025 snapshot)
+ - **Similarity backend:** sklearn TF-IDF + cosine similarity
 
  ## Prompting Strategy
  - Zero-shot: instruction + schema (return 0 or 1 only).
  - Adaptive one-shot: retrieve most similar train example and include it as exemplar.
  - Adaptive 5-shot: retrieve top-5 similar exemplars.
 
+
+
  ## Evaluation Protocol
  - Metrics: accuracy, precision, recall, F1; confusion matrix
  - Latency: avg wall-clock per example
@@ -56,8 +58,8 @@ Same train/val/test as fine-tuning; we report metrics/CMs and discuss quality/la
 
  ## Tradeoffs
  - Quality: zero-shot ≈ 5-shot ≥ one-shot on this dataset.
- - Latency: increases with K (prompt length).
- - Cost: increases with K for token-billed APIs.
+ - Latency: increases with K (see Results section; ~0.28s/ex for zero-shot → ~0.45s/ex for 5-shot).
+ - Cost: scales roughly linearly with prompt length (token count). For this dataset (~20 examples), 5-shot prompts were ~3× the token usage of zero-shot.
 
  ## Limits & Risks
  - No leakage: retrieve exemplars from **train** only.
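
The adaptive K-shot retrieval named in the README's Prompting Strategy and Limits & Risks sections could look roughly like the sketch below. It is not the repository's code: the function names (`build_retriever`, `top_k_exemplars`, `build_prompt`), the prompt template, and the toy data are all hypothetical and chosen only for illustration. The one thing taken from the README is the approach itself: sklearn TF-IDF plus cosine similarity, fitted on the train split only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_retriever(train_texts):
    """Fit TF-IDF on the training texts only, so no val/test text leaks in."""
    vectorizer = TfidfVectorizer()
    train_matrix = vectorizer.fit_transform(train_texts)
    return vectorizer, train_matrix


def top_k_exemplars(query, vectorizer, train_matrix, k):
    """Return indices of the k training texts most similar to the query."""
    if k <= 0:
        return []  # zero-shot: no exemplars
    query_vec = vectorizer.transform([query])
    sims = cosine_similarity(query_vec, train_matrix)[0]
    # Stable sort on negated scores: highest similarity first, ties by index.
    order = np.argsort(-sims, kind="stable")
    return order[:k].tolist()


def build_prompt(query, train_texts, train_labels, indices):
    """Hypothetical template: instruction, retrieved exemplars, then the query."""
    parts = ["Classify the text. Return 0 or 1 only."]
    for i in indices:
        parts.append(f"Text: {train_texts[i]}\nLabel: {train_labels[i]}")
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


# Toy data, invented for illustration only.
train_texts = [
    "the battery died after one day",
    "great screen and fast shipping",
    "battery life is terrible",
    "shipping was fast and the box was intact",
]
train_labels = [0, 1, 0, 1]

vectorizer, train_matrix = build_retriever(train_texts)
query = "battery drains too quickly"
idx = top_k_exemplars(query, vectorizer, train_matrix, k=1)
prompt = build_prompt(query, train_texts, train_labels, idx)
print(prompt)
```

Setting `k=0`, `k=1`, or `k=5` would correspond to the zero-shot, adaptive one-shot, and adaptive 5-shot settings. Because each retrieved exemplar is appended to the prompt, prompt length grows with `k`, which is consistent with the latency and cost trend listed under Tradeoffs.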