OpenMed × Autoresearch: Cross-Dataset Transfer Discovery
Autonomous discovery of optimal training curricula for biomedical NER, using Karpathy's autoresearch loop on OpenMed datasets.
Results (~120 experiments on RTX 4090)
Disease NER (NCBI Disease target — 93 experiments)
| Configuration | val_f1 (mean ± std) | Improvement |
|---|---|---|
| Baseline (ncbi_disease only) | 0.8033 | — |
| + bc5cdr_chem pretrain (50/50) | 0.8470 | +4.4% |
| + 3-stage curriculum (chem→jnlpba→disease) | 0.8535 ± 0.007 | +5.0% |
Chemical NER (BC5CDR-Chem target — 26 experiments)
| Configuration | val_f1 (mean ± std) | Improvement |
|---|---|---|
| Baseline (bc5cdr_chem only) | 0.8090 ± 0.005 | — |
| + 3-stage curriculum (jnlpba→disease→chem) | 0.8195 ± 0.002 | +1.3% |
Transfer Affinity Matrix (ΔF1 from 50/50 pretrain→finetune)
| Source ↓ · Target → | ncbi_disease | bc5cdr_chem |
|---|---|---|
| No pretrain (baseline) | 0.8033 | 0.8090 |
| bc5cdr_chem | **+0.044** | — |
| ncbi_disease | — | -0.001 |
| jnlpba | +0.013 | -0.004 |
| bc2gm | -0.007 | -0.001 |
| linnaeus | -0.033 | +0.001 |
Reading the matrix: each cell shows the F1 change when pretraining on the source (row) before fine-tuning on the target (column). Bold = significant positive transfer.
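The matrix cells are plain differences against the target-only baselines. A minimal sketch of that arithmetic, using the numbers reported in this README (absolute pretrain scores other than 0.8470 are back-derived from the deltas above; all variable and function names are illustrative, not part of this repo):

```python
# Transfer affinity: F1 delta of (pretrain on source, finetune on target)
# relative to training on the target alone. Values come from the tables
# in this README.
BASELINES = {"ncbi_disease": 0.8033, "bc5cdr_chem": 0.8090}

RUNS = {  # (pretrain_source, target) -> val_f1 after 50/50 pretrain+finetune
    ("bc5cdr_chem", "ncbi_disease"): 0.8470,
    ("jnlpba", "ncbi_disease"): 0.8163,
    ("bc2gm", "ncbi_disease"): 0.7963,
    ("linnaeus", "ncbi_disease"): 0.7703,
    ("ncbi_disease", "bc5cdr_chem"): 0.8080,
    ("jnlpba", "bc5cdr_chem"): 0.8050,
    ("bc2gm", "bc5cdr_chem"): 0.8080,
    ("linnaeus", "bc5cdr_chem"): 0.8100,
}

def affinity(runs, baselines):
    """Map (source, target) -> F1 gain over the target-only baseline."""
    return {key: round(f1 - baselines[key[1]], 3) for key, f1 in runs.items()}

matrix = affinity(RUNS, BASELINES)
```

A negative cell (e.g. `("linnaeus", "ncbi_disease")`) directly reads off as negative transfer.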
Key Discovery: Asymmetric Transfer
| Direction | Improvement | Pretrain Budget |
|---|---|---|
| Chemicals → Disease NER | +5.0% | 40% pretrain |
| Disease → Chemical NER | +1.3% | 15% pretrain |
Transfer is 3.8x stronger from chemicals to diseases than the reverse. This asymmetry likely arises because (1) BC5CDR contains both chemical AND disease annotations, and (2) chemical entities are more lexically distinctive while disease entities benefit more from contextual pretraining.
Other key findings:
- Sequential 3-stage curriculum beats single-source and mixing approaches (both targets)
- Cosine LR scheduler is more reliable than `constant_with_warmup` (higher mean, lower variance)
- Batch size 64 is critical under the time-budgeted training regime (~3× higher GPU utilization)
- JNLPBA (proteins) helps both targets — proteins interact with both chemicals and diseases
- Negative transfer from species (Linnaeus) and gene-only (BC2GM) datasets
- Default BERT hyperparameters are near-optimal — 50+ tuning experiments found no improvement
See FINDINGS.md, FINDINGS_CHEM.md, and results.tsv for full analysis.
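The sequential curriculum from the first finding amounts to splitting a fixed training budget across stages. A minimal sketch, assuming a step-based budget; the exact split between the two pretrain stages is illustrative (the results above only fix the 40% total pretrain share), and the helper name is not from this repo:

```python
def curriculum_steps(total_steps, stages):
    """Split a fixed step budget across sequential curriculum stages.

    stages: list of (dataset_name, fraction) pairs, fractions summing to 1.0,
    ending with the target dataset.
    """
    plan = [(name, int(round(total_steps * frac))) for name, frac in stages]
    # Give any rounding remainder to the final (target) stage.
    name, steps = plan[-1]
    plan[-1] = (name, steps + total_steps - sum(s for _, s in plan))
    return plan

# Winning disease curriculum: chem -> jnlpba -> disease, 40% pretrain overall
# (the 25/15 split between the pretrain stages is an assumption).
plan = curriculum_steps(1000, [
    ("bc5cdr_chem", 0.25),
    ("jnlpba", 0.15),
    ("ncbi_disease", 0.60),
])
```

Handing the rounding remainder to the last stage keeps the total budget exact while biasing any slack toward the target dataset.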
What this does
An AI agent (Claude Code) runs ~100 experiments overnight on your GPU, systematically exploring which biomedical NER datasets help each other through transfer learning. The target: maximize F1 on NCBI disease NER by finding the best cross-dataset pretraining curriculum.
Available datasets
| Short name | Source | Entity types |
|---|---|---|
| `bc5cdr_chem` | BC5CDR | Chemicals, drugs |
| `ncbi_disease` | NCBI Disease | Disease mentions |
| `bc2gm` | BC2GM | Gene/protein mentions |
| `jnlpba` | BioNLP 2004 | DNA, RNA, cell lines, cell types, proteins |
| `linnaeus` | Linnaeus | Species |
Setup
```bash
# 1. Clone and enter
git clone <this-repo>
cd openmed-autoresearch

# 2. Install dependencies
pip install -r requirements.txt

# 3. Prepare data (downloads & tokenizes all datasets)
python prepare.py

# 4. Verify
ls ~/.cache/openmed-autoresearch/
# Should see: bc5cdr_chem/ ncbi_disease/ bc2gm/ jnlpba/ linnaeus/ meta.json

# 5. Test baseline
python train.py
# Should print val_f1 and peak_vram_mb after ~5 minutes

# 6. Run autoresearch with Claude Code
claude --dangerously-skip-permissions
# Then tell it: "Read program.md and start the autoresearch loop"
```
Changing the target dataset
To evaluate on a different entity type (e.g., chemicals instead of diseases), edit `TARGET_EVAL_DATASET` in `train.py`. The curriculum exploration then discovers what helps that entity type.
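For example, switching the target to chemical NER would look like this, assuming the constant is defined at the top of `train.py` (the exact location in the file is an assumption):

```python
# in train.py: switch the evaluation target from diseases to chemicals.
# Valid values are the short dataset names from the table above.
TARGET_EVAL_DATASET = "bc5cdr_chem"  # was "ncbi_disease"
```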
Analyzing results
After a run, use `analyze.py` to generate the transfer affinity heatmap:

```bash
python analyze.py results.tsv
```
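`analyze.py` itself is not shown in this README; as a rough, stdlib-only illustration of the kind of aggregation it performs, the sketch below derives ΔF1 values from a tab-separated log. The column names and the `none` sentinel for baseline runs are assumptions about the `results.tsv` schema, and the sample rows reuse numbers from the tables above:

```python
import csv
import io

# Sample log in an assumed results.tsv-like schema (tab-separated).
TSV = (
    "pretrain_source\ttarget\tval_f1\n"
    "none\tncbi_disease\t0.8033\n"
    "bc5cdr_chem\tncbi_disease\t0.8470\n"
    "jnlpba\tncbi_disease\t0.8163\n"
    "none\tbc5cdr_chem\t0.8090\n"
    "jnlpba\tbc5cdr_chem\t0.8050\n"
)

rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))

# Baseline F1 per target comes from the pretrain_source == "none" runs.
base = {r["target"]: float(r["val_f1"]) for r in rows if r["pretrain_source"] == "none"}

# Delta of every pretrained run against its target's baseline.
delta = {
    (r["pretrain_source"], r["target"]): round(float(r["val_f1"]) - base[r["target"]], 3)
    for r in rows
    if r["pretrain_source"] != "none"
}

for (src, tgt), d in sorted(delta.items()):
    print(f"{src:>12} -> {tgt:<12} delta_F1 = {d:+.3f}")
```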
Output
- `results.tsv`: full experiment log with F1 scores
- Git history on the `autoresearch/<tag>` branch: only the improvements
- Transfer affinity insights from the experiment data
Credits
- OpenMed by Maziyar Panahi — datasets and models
- autoresearch by Andrej Karpathy — the autonomous experiment loop pattern
- Base model: ModernBERT-base