Uddiii committed
Commit d8c3b18 · 1 Parent(s): 9c68ba6

kaggle: route Patient + Nurse to 8B-instant pool

.env.example CHANGED
@@ -23,9 +23,12 @@ GROQ_MEDICAL_JUDGE_API_KEY=
  GROQ_API_KEY=
 
  # --- Per-role models (override the in-code defaults if you want) ---
+ # Traffic-shaping: high-volume roleplay agents (Doctor/Nurse/Patient) on
+ # the 8B pool (500K TPD); only the two judges hit the smaller 70B pool
+ # (100K TPD) because their grading quality directly shapes the reward.
  ERMAP_DOCTOR_MODEL=llama-3.1-8b-instant
- ERMAP_NURSE_MODEL=llama-3.3-70b-versatile
- ERMAP_PATIENT_MODEL=llama-3.3-70b-versatile
+ ERMAP_NURSE_MODEL=llama-3.1-8b-instant
+ ERMAP_PATIENT_MODEL=llama-3.1-8b-instant
  ERMAP_EMPATHY_JUDGE_MODEL=llama-3.3-70b-versatile
  ERMAP_MEDICAL_JUDGE_MODEL=llama-3.3-70b-versatile
 
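A minimal sketch of how the per-role variables in this hunk could be resolved at runtime, assuming each `ERMAP_*_MODEL` variable overrides an in-code default, as the `.env.example` comment says. `DEFAULT_MODELS` and `resolve_model` are hypothetical names for illustration and are not taken from the repository.

```python
import os

# Defaults mirror the values this commit writes into .env.example.
# The dictionary and helper below are illustrative, not the project's code.
DEFAULT_MODELS = {
    "DOCTOR": "llama-3.1-8b-instant",
    "NURSE": "llama-3.1-8b-instant",
    "PATIENT": "llama-3.1-8b-instant",
    "EMPATHY_JUDGE": "llama-3.3-70b-versatile",
    "MEDICAL_JUDGE": "llama-3.3-70b-versatile",
}


def resolve_model(role, env=None):
    """Return the model for a role, preferring ERMAP_<ROLE>_MODEL if set."""
    env = os.environ if env is None else env
    override = env.get(f"ERMAP_{role}_MODEL", "").strip()
    # An unset or blank variable falls back to the in-code default.
    return override or DEFAULT_MODELS[role]
```

Passing a plain dict as `env` makes the lookup easy to check without touching the real process environment.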
ER_MAP/dashboard.py CHANGED
@@ -46,14 +46,15 @@ _DEMO_KEYS = {
  "GROQ_EMPATHY_JUDGE_API_KEY": "",
  "GROQ_MEDICAL_JUDGE_API_KEY": "",
 
- # --- Per-role models ---
- # Doctor runs the small/fast tier (Llama-3.1-8B-Instant) - Groq does
- # not host a Llama-7B; 8B is the closest Llama small-tier on Groq and
- # gives ~14 400 req/day per key vs ~6 000/day for 70B.
+ # --- Per-role models (traffic-shaping for free-tier budget) ---
+ # High-volume agents (Doctor / Nurse / Patient - fire on every env
+ # step) run the 8B-instant pool: 14 400 RPD / 500K TPD per account.
+ # The two judges fire mostly on terminal events but their grading
+ # quality directly shapes the reward, so they stay on 70B-versatile
+ # (1 000 RPD / 100K TPD pool - separate budget).
  "ERMAP_DOCTOR_MODEL": "llama-3.1-8b-instant",
- # Everyone else runs Llama-3.3-70B for nuanced personas / grading.
- "ERMAP_NURSE_MODEL": "llama-3.3-70b-versatile",
- "ERMAP_PATIENT_MODEL": "llama-3.3-70b-versatile",
+ "ERMAP_NURSE_MODEL": "llama-3.1-8b-instant",
+ "ERMAP_PATIENT_MODEL": "llama-3.1-8b-instant",
  "ERMAP_EMPATHY_JUDGE_MODEL": "llama-3.3-70b-versatile",
  "ERMAP_MEDICAL_JUDGE_MODEL": "llama-3.3-70b-versatile",
 
ER_MAP/evaluate_baseline.py CHANGED
@@ -64,8 +64,8 @@ _DEMO_KEYS = {
  "GROQ_EMPATHY_JUDGE_API_KEY": "",
  "GROQ_MEDICAL_JUDGE_API_KEY": "",
  "ERMAP_DOCTOR_MODEL": "llama-3.1-8b-instant",
- "ERMAP_NURSE_MODEL": "llama-3.3-70b-versatile",
- "ERMAP_PATIENT_MODEL": "llama-3.3-70b-versatile",
+ "ERMAP_NURSE_MODEL": "llama-3.1-8b-instant",
+ "ERMAP_PATIENT_MODEL": "llama-3.1-8b-instant",
  }
 
 
kaggle/KAGGLE.md CHANGED
@@ -106,13 +106,27 @@ In practice: a single 12-hour session is usually enough to clear Phase 1 and pro
 
  ## Per-role Groq keys vs. one shared key
 
- The dashboard ships with 4 distinct Groq clients (Nurse, Patient, Empathy Judge, Medical Judge) and a fallback chain that walks across all four if any fails auth. Inside training:
+ The dashboard ships with 4 distinct Groq clients (Nurse, Patient, Empathy Judge, Medical Judge) and a fallback chain that walks across all four if any fails auth. Per-key budgets are *shared* on Groq's free tier (limits are per-account, not per-key), but the model split below buys you real headroom because **each model has its own daily pool**.
 
- - Each env step does **1 Nurse + 1 Patient + occasionally 1 Empathy Judge + 1 Medical Judge call** (judges fire mostly on terminal actions, so call ratio is roughly 4 : 4 : 1 : 1).
- - 1 free Groq key = 14 400 req/day on 8B-instant or 6 000 req/day on 70B-versatile.
- - 120-episode training × 8 avg steps × 2 conversational LLM calls = ~2 000 calls. **Even one key is enough for a single training run**, but if you split across 4 keys you have 4× the daily headroom for re-runs.
+ ### Default model assignment (traffic-shaping)
 
- If you only have **one** Groq key, set just `GROQ_API_KEY` as a Kaggle Secret. Everything still works: the AgentRouter falls back to the same client for all roles.
+ | Role | Model | Free-tier pool | Why |
+ |---|---|---|---|
+ | Nurse | `llama-3.1-8b-instant` | 14 400 RPD / 500K TPD | high-volume (every env step) |
+ | Patient | `llama-3.1-8b-instant` | shared 8B pool | high-volume (every env step) |
+ | Empathy Judge | `llama-3.3-70b-versatile` | 1 000 RPD / 100K TPD | grading quality directly shapes reward |
+ | Medical Judge | `llama-3.3-70b-versatile` | shared 70B pool | grading quality directly shapes reward |
+
+ Quick budget check for **one full 120-episode training run**:
+
+ | Pool | Estimated calls/run | Daily ceiling | Headroom |
+ |---|---|---|---|
+ | 8B-instant (Nurse + Patient) | ~2 880 | 14 400 RPD | ~5x |
+ | 70B-versatile (judges) | ~720 | 1 000 RPD | ~1.4x |
+
+ You can do **one training run per day per account** comfortably. If you need to retry inside the same day, drop one of the two judges to 8B-instant temporarily; the reward signal degrades a little, but training keeps moving.
+
+ If you only have **one** Groq key total, set just `GROQ_API_KEY` as a Kaggle Secret. Everything still works: the AgentRouter falls back to the same client for all roles, and the per-model budgets still split traffic across pools.
 
  ---
 
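The headroom column in the budget table of this hunk can be reproduced with a few lines of arithmetic. The sketch below uses only the call estimates and request-per-day ceilings quoted in that table; the function names are hypothetical, and token-per-day limits are deliberately left out because the hunk gives no per-call token estimate.

```python
# Estimated calls per run and daily request ceilings, copied from the
# budget table in the KAGGLE.md hunk. Token (TPD) limits are not modeled.
POOLS = {
    "8B-instant (Nurse + Patient)": (2880, 14400),
    "70B-versatile (judges)": (720, 1000),
}


def headroom(calls_per_run, daily_ceiling):
    """Ratio of the daily request ceiling to one run's estimated calls."""
    return daily_ceiling / calls_per_run


def full_runs_per_day(calls_per_run, daily_ceiling):
    """Number of complete runs that fit under the daily request ceiling."""
    return daily_ceiling // calls_per_run


def binding_pool(pools):
    """Name of the pool that allows the fewest complete runs per day."""
    return min(pools, key=lambda name: full_runs_per_day(*pools[name]))


if __name__ == "__main__":
    for name, (calls, ceiling) in POOLS.items():
        print(f"{name}: {headroom(calls, ceiling):.1f}x headroom, "
              f"{full_runs_per_day(calls, ceiling)} full run(s) per day")
    print("Binding pool:", binding_pool(POOLS))
```

Under these numbers the 70B pool is the binding constraint, which matches the hunk's advice of one training run per day per account.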
kaggle/train_ermap_grpo_kaggle.ipynb CHANGED
@@ -226,8 +226,11 @@
  "# Doctor-on-Kaggle is the LOCAL trained model, NOT a Groq call. The\n",
  "# Doctor's Groq key is therefore unused here, but Nurse / Patient /\n",
  "# Empathy Judge / Medical Judge all hit Groq once per env step.\n",
- "os.environ[\"ERMAP_NURSE_MODEL\"] = \"llama-3.3-70b-versatile\"\n",
- "os.environ[\"ERMAP_PATIENT_MODEL\"] = \"llama-3.3-70b-versatile\"\n",
+ "# Traffic-shaping: high-volume roleplay agents (Nurse + Patient) on the\n",
+ "# 8B-instant pool (500K TPD, 14,400 RPD); the two judges stay on 70B-\n",
+ "# versatile because their grading quality directly shapes the reward.\n",
+ "os.environ[\"ERMAP_NURSE_MODEL\"] = \"llama-3.1-8b-instant\"\n",
+ "os.environ[\"ERMAP_PATIENT_MODEL\"] = \"llama-3.1-8b-instant\"\n",
  "os.environ[\"ERMAP_EMPATHY_JUDGE_MODEL\"] = \"llama-3.3-70b-versatile\"\n",
  "os.environ[\"ERMAP_MEDICAL_JUDGE_MODEL\"] = \"llama-3.3-70b-versatile\"\n",
  "\n",
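For the same-day retry that the KAGGLE.md hunk describes (dropping one judge to 8B-instant), a notebook cell along these lines might work. It is a sketch that assumes the judge model is read from `ERMAP_EMPATHY_JUDGE_MODEL` or `ERMAP_MEDICAL_JUDGE_MODEL` after the cell runs; `downgrade_judge` is a hypothetical helper, not something the notebook defines.

```python
import os

EIGHT_B = "llama-3.1-8b-instant"
JUDGES = ("EMPATHY_JUDGE", "MEDICAL_JUDGE")


def downgrade_judge(judge="EMPATHY_JUDGE", env=None):
    """Point one judge at the 8B pool and return its previous model, if any."""
    if judge not in JUDGES:
        raise ValueError(f"unknown judge: {judge!r}")
    env = os.environ if env is None else env
    key = f"ERMAP_{judge}_MODEL"
    previous = env.get(key)
    # Assumption: the environment reads this variable only after it is set here.
    env[key] = EIGHT_B
    return previous
```

Keeping the returned value makes it easy to restore the 70B judge the next day, once the daily pool has reset.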