kaggle: route Patient + Nurse to 8B-instant pool
- .env.example +5 -2
- ER_MAP/dashboard.py +8 -7
- ER_MAP/evaluate_baseline.py +2 -2
- kaggle/KAGGLE.md +19 -5
- kaggle/train_ermap_grpo_kaggle.ipynb +5 -2
.env.example
CHANGED
@@ -23,9 +23,12 @@ GROQ_MEDICAL_JUDGE_API_KEY=
 GROQ_API_KEY=

 # --- Per-role models (override the in-code defaults if you want) ---
+# Traffic-shaping: high-volume roleplay agents (Doctor/Nurse/Patient) on
+# the 8B pool (500K TPD); only the two judges hit the smaller 70B pool
+# (100K TPD) because their grading quality directly shapes the reward.
 ERMAP_DOCTOR_MODEL=llama-3.1-8b-instant
-ERMAP_NURSE_MODEL=llama-3.
-ERMAP_PATIENT_MODEL=llama-3.
+ERMAP_NURSE_MODEL=llama-3.1-8b-instant
+ERMAP_PATIENT_MODEL=llama-3.1-8b-instant
 ERMAP_EMPATHY_JUDGE_MODEL=llama-3.3-70b-versatile
 ERMAP_MEDICAL_JUDGE_MODEL=llama-3.3-70b-versatile

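The `.env.example` comment says these variables override in-code defaults. A minimal sketch of how such a lookup could work is shown below; `DEFAULT_MODELS` and `resolve_model` are hypothetical names introduced for illustration and are not taken from the ER_MAP code. Only the `ERMAP_<ROLE>_MODEL` variable names and the model IDs come from this commit.

```python
import os

# Defaults mirror the values this commit writes into .env.example.
DEFAULT_MODELS = {
    "DOCTOR": "llama-3.1-8b-instant",
    "NURSE": "llama-3.1-8b-instant",
    "PATIENT": "llama-3.1-8b-instant",
    "EMPATHY_JUDGE": "llama-3.3-70b-versatile",
    "MEDICAL_JUDGE": "llama-3.3-70b-versatile",
}


def resolve_model(role, env=None):
    """Hypothetical helper: return the model for a role.

    An ERMAP_<ROLE>_MODEL variable wins when it is set and non-blank;
    otherwise the in-code default is used.
    """
    env = os.environ if env is None else env
    override = env.get(f"ERMAP_{role}_MODEL", "").strip()
    return override or DEFAULT_MODELS[role]


if __name__ == "__main__":
    for role in DEFAULT_MODELS:
        print(role, "->", resolve_model(role))
```

Treating a blank value as "unset" means a copied `.env.example` with an empty entry still falls back to the default rather than sending an empty model name.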
ER_MAP/dashboard.py
CHANGED
@@ -46,14 +46,15 @@ _DEMO_KEYS = {
 "GROQ_EMPATHY_JUDGE_API_KEY": "",
 "GROQ_MEDICAL_JUDGE_API_KEY": "",

-# --- Per-role models ---
-#
-#
-#
+# --- Per-role models (traffic-shaping for free-tier budget) ---
+# High-volume agents (Doctor / Nurse / Patient - fire on every env
+# step) run the 8B-instant pool: 14 400 RPD / 500K TPD per account.
+# The two judges fire mostly on terminal events but their grading
+# quality directly shapes the reward, so they stay on 70B-versatile
+# (1 000 RPD / 100K TPD pool - separate budget).
 "ERMAP_DOCTOR_MODEL": "llama-3.1-8b-instant",
-
-"
-"ERMAP_PATIENT_MODEL": "llama-3.3-70b-versatile",
+"ERMAP_NURSE_MODEL": "llama-3.1-8b-instant",
+"ERMAP_PATIENT_MODEL": "llama-3.1-8b-instant",
 "ERMAP_EMPATHY_JUDGE_MODEL": "llama-3.3-70b-versatile",
 "ERMAP_MEDICAL_JUDGE_MODEL": "llama-3.3-70b-versatile",

ER_MAP/evaluate_baseline.py
CHANGED
@@ -64,8 +64,8 @@ _DEMO_KEYS = {
 "GROQ_EMPATHY_JUDGE_API_KEY": "",
 "GROQ_MEDICAL_JUDGE_API_KEY": "",
 "ERMAP_DOCTOR_MODEL": "llama-3.1-8b-instant",
-"ERMAP_NURSE_MODEL": "llama-3.
-"ERMAP_PATIENT_MODEL": "llama-3.
+"ERMAP_NURSE_MODEL": "llama-3.1-8b-instant",
+"ERMAP_PATIENT_MODEL": "llama-3.1-8b-instant",
 }


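The per-role key entries in `_DEMO_KEYS` are blank, and the KAGGLE.md change in this commit says a single `GROQ_API_KEY` is enough because the router falls back to one client for all roles. The sketch below illustrates that kind of fallback under stated assumptions: `pick_api_key` is a hypothetical helper, not the AgentRouter's actual code, and only the environment variable names are taken from the diff.

```python
import os


def pick_api_key(role_var, env=None, shared_var="GROQ_API_KEY"):
    """Hypothetical helper: prefer the role-specific key, else the shared one.

    Blank values (as in _DEMO_KEYS) count as missing.
    """
    env = os.environ if env is None else env
    for name in (role_var, shared_var):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(f"neither {role_var} nor {shared_var} is set")


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not real keys.
    demo_env = {"GROQ_API_KEY": "shared-key", "GROQ_EMPATHY_JUDGE_API_KEY": ""}
    print(pick_api_key("GROQ_EMPATHY_JUDGE_API_KEY", demo_env))
```

With only the shared key set, every role resolves to the same credential, while the model split still sends traffic to two separate daily pools.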
kaggle/KAGGLE.md
CHANGED
@@ -106,13 +106,27 @@ In practice: a single 12-hour session is usually enough to clear Phase 1 and pro

 ## Per-role Groq keys vs. one shared key

-The dashboard ships with 4 distinct Groq clients (Nurse, Patient, Empathy Judge, Medical Judge) and a fallback chain that walks across all four if any fails auth.
+The dashboard ships with 4 distinct Groq clients (Nurse, Patient, Empathy Judge, Medical Judge) and a fallback chain that walks across all four if any fails auth. Per-key budgets are *shared* on Groq's free tier (limits are per-account, not per-key) - but the model split below buys you real headroom because **each model has its own daily pool**.

-
-- 1 free Groq key = 14 400 req/day on 8B-instant or 6 000 req/day on 70B-versatile.
-- 120-episode training × 8 avg steps × 2 conversational LLM calls = ~2 000 calls. **Even one key is enough for a single training run**, but if you split across 4 keys you have 4× the daily headroom for re-runs.
+### Default model assignment (traffic-shaping)

-
+| Role | Model | Free-tier pool | Why |
+|---|---|---|---|
+| Nurse | `llama-3.1-8b-instant` | 14 400 RPD / 500K TPD | high-volume (every env step) |
+| Patient | `llama-3.1-8b-instant` | shared 8B pool | high-volume (every env step) |
+| Empathy Judge | `llama-3.3-70b-versatile` | 1 000 RPD / 100K TPD | grading quality directly shapes reward |
+| Medical Judge | `llama-3.3-70b-versatile` | shared 70B pool | grading quality directly shapes reward |
+
+Quick budget check for **one full 120-episode training run**:
+
+| Pool | Estimated calls/run | Daily ceiling | Headroom |
+|---|---|---|---|
+| 8B-instant (Nurse + Patient) | ~2 880 | 14 400 RPD | ~5x |
+| 70B-versatile (judges) | ~720 | 1 000 RPD | ~1.4x |
+
+You can do **one training run per day per account** comfortably. If you need to retry inside the same day, drop one of the two judges to 8B-instant temporarily - the reward signal degrades a little, but training keeps moving.
+
+If you only have **one** Groq key total, set just `GROQ_API_KEY` as a Kaggle Secret. Everything still works - the AgentRouter falls back to the same client for all roles, and the per-model budgets still split traffic across pools.

 ---

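The headroom column in the budget table is the daily request ceiling divided by the estimated calls per run. The short sketch below reproduces that arithmetic from the table's own figures; the `headroom` function and the `POOLS` dictionary are illustrative names, and token budgets (TPD) are not checked because the diff gives no per-call token estimate.

```python
def headroom(calls_per_run, daily_ceiling):
    """Return how many times one run's calls fit under the daily request ceiling."""
    return daily_ceiling / calls_per_run


# (estimated calls per run, requests-per-day ceiling), copied from the KAGGLE.md table.
POOLS = {
    "8B-instant (Nurse + Patient)": (2880, 14400),
    "70B-versatile (judges)": (720, 1000),
}

if __name__ == "__main__":
    for name, (calls, ceiling) in POOLS.items():
        full_runs = ceiling // calls  # whole runs that fit in one day
        print(f"{name}: {headroom(calls, ceiling):.1f}x headroom, {full_runs} full run(s)/day")
```

The 70B pool is the binding constraint: only one whole run fits under its 1 000 RPD ceiling, which matches the "one training run per day per account" guidance.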
kaggle/train_ermap_grpo_kaggle.ipynb
CHANGED
@@ -226,8 +226,11 @@
 "# Doctor-on-Kaggle is the LOCAL trained model, NOT a Groq call. The\n",
 "# Doctor's Groq key is therefore unused here, but Nurse / Patient /\n",
 "# Empathy Judge / Medical Judge all hit Groq once per env step.\n",
-"
-"
+"# Traffic-shaping: high-volume roleplay agents (Nurse + Patient) on the\n",
+"# 8B-instant pool (500K TPD, 14,400 RPD); the two judges stay on 70B-\n",
+"# versatile because their grading quality directly shapes the reward.\n",
+"os.environ[\"ERMAP_NURSE_MODEL\"] = \"llama-3.1-8b-instant\"\n",
+"os.environ[\"ERMAP_PATIENT_MODEL\"] = \"llama-3.1-8b-instant\"\n",
 "os.environ[\"ERMAP_EMPATHY_JUDGE_MODEL\"] = \"llama-3.3-70b-versatile\"\n",
 "os.environ[\"ERMAP_MEDICAL_JUDGE_MODEL\"] = \"llama-3.3-70b-versatile\"\n",
 "\n",