# OpenMed × Autoresearch: Cross-Dataset Transfer Discovery

Autonomous discovery of optimal training curricula for biomedical NER, using Karpathy's autoresearch loop on OpenMed datasets.

## Results (~120 experiments on an RTX 4090)

### Disease NER (NCBI Disease target — 93 experiments)

| Configuration | val_f1 (mean ± std) | Improvement |
|---|---|---|
| Baseline (ncbi_disease only) | 0.8033 | |
| + bc5cdr_chem pretrain (50/50) | 0.8470 | +4.4% |
| + 3-stage curriculum (chem→jnlpba→disease) | 0.8535 ± 0.007 | +5.0% |

### Chemical NER (BC5CDR-Chem target — 26 experiments)

| Configuration | val_f1 (mean ± std) | Improvement |
|---|---|---|
| Baseline (bc5cdr_chem only) | 0.8090 ± 0.005 | |
| + 3-stage curriculum (jnlpba→disease→chem) | 0.8195 ± 0.002 | +1.3% |

### Transfer Affinity Matrix (ΔF1 from 50/50 pretrain→finetune)

| Source ↓ · Target → | ncbi_disease | bc5cdr_chem |
|---|---|---|
| No pretrain (baseline) | 0.8033 | 0.8090 |
| bc5cdr_chem | **+0.044** | |
| ncbi_disease | | -0.001 |
| jnlpba | +0.013 | -0.004 |
| bc2gm | -0.007 | -0.001 |
| linnaeus | -0.033 | +0.001 |

Reading the matrix: each cell shows the F1 change when pretraining on the source (row) before fine-tuning on the target (column). Bold = significant positive transfer.

## Key Discovery: Asymmetric Transfer

| Direction | Improvement | Pretrain budget |
|---|---|---|
| Chemicals → Disease NER | +5.0% | 40% pretrain |
| Disease → Chemical NER | +1.3% | 15% pretrain |

Transfer is 3.8x stronger from chemicals to diseases than the reverse. This asymmetry likely arises because (1) BC5CDR contains both chemical AND disease annotations, and (2) chemical entities are more lexically distinctive while disease entities benefit more from contextual pretraining.

Other key findings:

  1. Sequential 3-stage curriculum beats single-source and mixing approaches (both targets)
  2. The `cosine` scheduler is more reliable than `constant_with_warmup` (higher mean, lower variance)
  3. Batch size 64 is critical in time-budgeted training (3x GPU utilization)
  4. JNLPBA (proteins) helps both targets — proteins interact with both chemicals and diseases
  5. Negative transfer from species (Linnaeus) and gene-only (BC2GM) datasets
  6. Default BERT hyperparameters are near-optimal — 50+ tuning experiments found no improvement
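For concreteness, a staged curriculum under a fixed step budget can be sketched as a simple schedule. The helper below and its exact stage fractions are illustrative only, not the tuned values from `train.py`:

```python
# Sketch: split a fixed step budget across curriculum stages, in order.
# The stage fractions below are illustrative, not the tuned values.

def build_curriculum(total_steps, stages):
    """Split a step budget across (dataset, fraction) stages, in order."""
    assert abs(sum(frac for _, frac in stages) - 1.0) < 1e-9
    return [(dataset, round(total_steps * frac)) for dataset, frac in stages]

# chem -> jnlpba -> disease, with a 40% pretrain budget overall
schedule = build_curriculum(
    total_steps=1000,
    stages=[("bc5cdr_chem", 0.25), ("jnlpba", 0.15), ("ncbi_disease", 0.60)],
)
# -> [('bc5cdr_chem', 250), ('jnlpba', 150), ('ncbi_disease', 600)]
```

The point of the sequential schedule is that the target dataset always comes last, so the final task head is fit to the target label set.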

See `FINDINGS.md`, `FINDINGS_CHEM.md`, and `results.tsv` for the full analysis.

## What this does

An AI agent (Claude Code) runs ~100 experiments overnight on your GPU, systematically exploring which biomedical NER datasets help each other through transfer learning. The target: maximize F1 on NCBI disease NER by finding the best cross-dataset pretraining curriculum.

## Available datasets

| Short name | Source | Entity types |
|---|---|---|
| `bc5cdr_chem` | BC5CDR | Chemicals, drugs |
| `ncbi_disease` | NCBI Disease | Disease mentions |
| `bc2gm` | BC2GM | Gene/protein mentions |
| `jnlpba` | BioNLP 2004 | DNA, RNA, cell lines, cell types, proteins |
| `linnaeus` | Linnaeus | Species |

## Setup

```bash
# 1. Clone and enter
git clone <this-repo>
cd openmed-autoresearch

# 2. Install dependencies
pip install -r requirements.txt

# 3. Prepare data (downloads & tokenizes all datasets)
python prepare.py

# 4. Verify
ls ~/.cache/openmed-autoresearch/
# Should see: bc5cdr_chem/ ncbi_disease/ bc2gm/ jnlpba/ linnaeus/ meta.json

# 5. Test baseline
python train.py
# Should print val_f1 and peak_vram_mb after ~5 minutes

# 6. Run autoresearch with Claude Code
claude --dangerously-skip-permissions
# Then tell it: "Read program.md and start the autoresearch loop"
```

## Changing the target dataset

To evaluate on a different entity type (e.g., chemicals instead of diseases), edit `TARGET_EVAL_DATASET` in `train.py`. The curriculum exploration then discovers what helps that entity type.
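As a sketch, the switch is a one-line edit. The value shown assumes the short names from the datasets table; only the `TARGET_EVAL_DATASET` constant itself is confirmed by the text above, the surrounding code in `train.py` is not shown here:

```python
# In train.py (illustrative; surrounding code not shown):
TARGET_EVAL_DATASET = "bc5cdr_chem"  # was "ncbi_disease"
```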

## Analyzing results

After a run, use `analyze.py` to generate the transfer affinity heatmap:

```bash
python analyze.py results.tsv
```

## Output

- `results.tsv`: full experiment log with F1 scores
- Git history on the `autoresearch/<tag>` branch: only the improvements
- Transfer affinity insights from the experiment data

## Credits
