MedMCQ — Three-Hop Pipeline (Usage Guide)

MedMCQ is an experiment in answering medical multiple-choice questions with a collection of small, narrow models instead of one large generalist. A raw MCQ is routed by a subject classifier, then refined by a per-subject topic classifier, then answered by a per-subject generator that returns the correct option and a brief clinical explanation. The full pipeline is 31 fine-tuned Qwen3 models — 1 router, 15 topic classifiers, 15 answer generators — all published under the stravoris org and grouped into the MedMCQ Medical Models collection. This repository is the landing page and runnable usage guide for that pipeline; it ships only a README and no weights.

Architecture

```
                ┌──────────────────────────────────────┐
   raw MCQ ───▶ │ Hop 1 — Subject classifier (0.6B)    │ ─▶ subject
                │ stravoris/medmcq-subject-classifier  │   (e.g. "Pharmacology")
                └──────────────────────────────────────┘
                                 │
                                 ▼
                ┌──────────────────────────────────────┐
                │ Hop 2 — Topic classifier (0.6B)      │ ─▶ topic
                │ stravoris/medmcq-<subject>-classifier│   (e.g. "Beta-blockers")
                └──────────────────────────────────────┘
                                 │
                                 ▼
                ┌──────────────────────────────────────┐
                │ Hop 3 — Answer generator (1.7B)      │ ─▶ answer + explanation
                │ stravoris/medmcq-<subject>           │
                └──────────────────────────────────────┘
```

Each hop is a separate, narrow model. Hops 2 and 3 are picked by name based on what hop 1 returned.
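Concretely, "picked by name" means the hop 2 and hop 3 repo IDs are built from a subject slug with plain string formatting. The pattern below is copied from the `load(...)` calls in the end-to-end example; `hop_repos` is a hypothetical helper name, not part of any published package.

```python
ORG = "stravoris"


def hop_repos(subject_slug: str) -> tuple[str, str]:
    """Return (topic classifier repo, answer generator repo) for a subject slug."""
    classifier = f"{ORG}/medmcq-{subject_slug}-classifier-qwen3-0.6b"  # hop 2
    generator = f"{ORG}/medmcq-{subject_slug}-qwen3-1.7b"              # hop 3
    return classifier, generator


print(hop_repos("pharmacology"))
```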

End-to-end example

The snippet below runs the full pipeline on a single MCQ. All three models run on CPU; no GPU is required.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

ORG = "stravoris"

# Subject label emitted by the hop 1 router -> slug used in the hop 2/3 repo names.
SUBJECT_TO_SLUG = {
    "Anaesthesia": "anaesthesia",
    "Anatomy": "anatomy",
    "Biochemistry": "biochemistry",
    "Cell Biology & Histology": "cell-biology",
    "ENT": "ent",
    "Genetics": "genetics",
    "Microbiology": "microbiology",
    "Obstetrics & Gynaecology": "obstetrics-gynaecology",
    "Ophthalmology": "ophthalmology",
    "Orthopaedics": "orthopaedics",
    "Pathology": "pathology",
    "Pharmacology": "pharmacology",
    "Physiology": "physiology",
    "Psychiatry": "psychiatry",
    "Radiology": "radiology",
}


def load(repo):
    """Fetch a tokenizer and model from the Hub (or the local cache)."""
    tok = AutoTokenizer.from_pretrained(repo)
    mdl = AutoModelForCausalLM.from_pretrained(repo)
    return tok, mdl


def generate(tok, mdl, prompt, max_new_tokens):
    """Greedy-decode a completion and return only the newly generated text."""
    inputs = tok(prompt, return_tensors="pt").to(mdl.device)
    out = mdl.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Slice off the prompt tokens so only the model's continuation is decoded.
    new_tokens = out[0][inputs.input_ids.shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True).strip()


def resolve_subject(raw):
    """Map the router's raw output onto one of the 15 known subject labels."""
    for label, slug in SUBJECT_TO_SLUG.items():
        if raw.startswith(label):
            return label, slug
    raise ValueError(f"Unrecognised subject label from router: {raw!r}")


question = "Which artery supplies the head of the femur in adults?"
options = {
    "A": "Obturator artery",
    "B": "Medial circumflex femoral artery",
    "C": "Lateral circumflex femoral artery",
    "D": "Superior gluteal artery",
}
options_block = "\n".join(f"{k}) {v}" for k, v in options.items())

# Hop 1: subject routing
sub_tok, sub_mdl = load(f"{ORG}/medmcq-subject-classifier-qwen3-0.6b")
sub_prompt = (
    "Classify the following medical MCQ by subject.\n\n"
    f"Question: {question}\n{options_block}\n\nSubject:"
)
# Leave headroom for multi-word labels such as "Obstetrics & Gynaecology";
# greedy decoding still stops early at the end-of-sequence token.
subject_raw = generate(sub_tok, sub_mdl, sub_prompt, max_new_tokens=16)
subject, subject_slug = resolve_subject(subject_raw)
print(f"[hop 1] subject: {subject}")

# Hop 2: topic classification within the routed subject
top_tok, top_mdl = load(f"{ORG}/medmcq-{subject_slug}-classifier-qwen3-0.6b")
top_prompt = (
    f"Classify the following {subject} MCQ by topic.\n\n"
    f"Question: {question}\n{options_block}\n\nTopic:"
)
topic = generate(top_tok, top_mdl, top_prompt, max_new_tokens=16)
print(f"[hop 2] topic:   {topic}")

# Hop 3: answer generation, conditioned on the predicted topic
gen_tok, gen_mdl = load(f"{ORG}/medmcq-{subject_slug}-qwen3-1.7b")
gen_prompt = (
    "Answer the following medical question. Provide the correct option and a "
    "brief explanation.\n\n"
    f"Topic: {topic}\n\n"
    f"Question: {question}\n\n"
    f"Options:\n{options_block}"
)
answer = generate(gen_tok, gen_mdl, gen_prompt, max_new_tokens=256)
print(f"[hop 3] answer:\n{answer}")
```

The router returns one of the 15 subject labels defined in SUBJECT_TO_SLUG; the topic classifier and generator are then selected from that label.
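Hop 3 returns free text. If you need just the chosen letter (for scoring, say), a small parser is one option. This sketch assumes the generator states its choice either as "Answer: B" / "the answer is (B)" or on a line beginning "B)"; that is an assumption about the output format, not documented behaviour, and `extract_option` is a hypothetical helper.

```python
import re
from typing import Optional

# Tried in order. Only "answer"/"is" are case-insensitive; the option letter
# must be an uppercase A-D standing on its own (so "Beta-blockers" won't match).
_PATTERNS = [
    re.compile(r"(?i:answer)\s*(?:(?i:is)\s*)?[:\-]?\s*\(?([A-D])\b"),
    re.compile(r"^\s*\(?([A-D])[).:]", re.MULTILINE),
]


def extract_option(answer: str) -> Optional[str]:
    """Return the option letter stated in a hop 3 reply, or None if not found."""
    for pattern in _PATTERNS:
        match = pattern.search(answer)
        if match:
            return match.group(1)
    return None


print(extract_option("Answer: B\nExplanation: ..."))
```

Returning `None` rather than guessing keeps unparseable replies visible, so they can be reviewed by hand.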

All 31 models

Hop 1 — Subject router (1 model)

| Subject | Base | Repo |
|---|---|---|
| (all 15 subjects) | Qwen3-0.6B | stravoris/medmcq-subject-classifier-qwen3-0.6b |

Hop 2 — Topic classifiers (15 models, Qwen3-0.6B)

Hop 3 — Answer generators (15 models, Qwen3-1.7B)
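Assuming every subject follows the naming pattern used in the end-to-end example, the full set of 31 repo IDs can be listed programmatically. The names below are derived, not read from the Hub, so check them against the collection before relying on them.

```python
ORG = "stravoris"

# Copied from SUBJECT_TO_SLUG in the end-to-end example.
SUBJECT_TO_SLUG = {
    "Anaesthesia": "anaesthesia",
    "Anatomy": "anatomy",
    "Biochemistry": "biochemistry",
    "Cell Biology & Histology": "cell-biology",
    "ENT": "ent",
    "Genetics": "genetics",
    "Microbiology": "microbiology",
    "Obstetrics & Gynaecology": "obstetrics-gynaecology",
    "Ophthalmology": "ophthalmology",
    "Orthopaedics": "orthopaedics",
    "Pathology": "pathology",
    "Pharmacology": "pharmacology",
    "Physiology": "physiology",
    "Psychiatry": "psychiatry",
    "Radiology": "radiology",
}


def all_repo_ids() -> list[str]:
    """List the router, then the 15 topic classifiers, then the 15 generators."""
    router = [f"{ORG}/medmcq-subject-classifier-qwen3-0.6b"]
    topic = [f"{ORG}/medmcq-{s}-classifier-qwen3-0.6b" for s in SUBJECT_TO_SLUG.values()]
    answer = [f"{ORG}/medmcq-{s}-qwen3-1.7b" for s in SUBJECT_TO_SLUG.values()]
    return router + topic + answer


for repo_id in all_repo_ids():
    print(repo_id)
```

To prefetch weights for offline use, each ID could be passed to `huggingface_hub.snapshot_download(repo_id=...)`.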

Collection and dataset

  • Collection: MedMCQ Medical Models (all 31 models grouped on the Hub).
  • Dataset: stravoris/medical-mcq-dataset (the single educational MCQ dataset every model was fine-tuned on; its subject and topic slices feed the corresponding classifiers and generators).
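One way to picture the subject and topic slices is the grouping below: the router sees every question labelled with its subject, while each subject's classifier and generator see only that subject's questions, labelled with topic and answer respectively. This is a sketch under assumptions: the field names (`question`, `subject`, `topic`, `answer`) and the toy records are hypothetical, not the dataset's documented schema, and `build_slices` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical toy records; the real dataset's columns may differ.
records = [
    {"question": "q1", "subject": "Pharmacology", "topic": "Beta-blockers", "answer": "B"},
    {"question": "q2", "subject": "Pharmacology", "topic": "Beta-blockers", "answer": "D"},
    {"question": "q3", "subject": "Anatomy", "topic": "Lower limb", "answer": "B"},
]


def build_slices(records):
    """Split one MCQ dataset into the (input, target) pairs each hop trains on."""
    # Hop 1: every question, labelled with its subject.
    router = [(r["question"], r["subject"]) for r in records]
    # Hops 2 and 3: one slice per subject, labelled with topic or answer.
    topic = defaultdict(list)
    generator = defaultdict(list)
    for r in records:
        topic[r["subject"]].append((r["question"], r["topic"]))
        generator[r["subject"]].append((r["question"], r["answer"]))
    return router, dict(topic), dict(generator)


router, topic, generator = build_slices(records)
print(len(router), sorted(topic), len(generator["Pharmacology"]))
```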

Hardware

Every model in the pipeline runs on CPU; no GPU is required.

| Model class | Base | Approx. RAM per model |
|---|---|---|
| Subject classifier, topic classifiers | Qwen3-0.6B | ~2 GB |
| Answer generators | Qwen3-1.7B | ~4 GB |
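For a rough sense of where such figures come from: weight memory is approximately parameter count times bytes per parameter. The sketch uses the nominal sizes from the model names and ignores activations, the KV cache and framework overhead, so treat its numbers as a lower bound. Which precision the example actually loads in depends on the `from_pretrained` arguments and your transformers version.

```python
# Nominal parameter counts taken from the model names; real counts differ slightly.
PARAMS = {"Qwen3-0.6B": 0.6e9, "Qwen3-1.7B": 1.7e9}
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_gb(params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for base, n in PARAMS.items():
    for dtype in BYTES_PER_PARAM:
        print(f"{base:11s} {dtype:9s} ~{weight_gb(n, dtype):.1f} GB")
```

Summing one 0.6B router, one 0.6B topic classifier and one 1.7B generator gives about 5.8 GB of weights in bfloat16 and about 11.6 GB in float32.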

A full three-hop call loads one 0.6B router, one 0.6B topic classifier and one 1.7B generator: roughly 8 GB of RAM by the figures above, which a typical laptop can handle. You can keep the models resident in memory across calls or, if RAM is tight, load the topic classifier and generator lazily once hop 1 has returned a subject.
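A minimal sketch of the lazy-loading option is a least-recently-used cache capped at three models: the router is touched on every call, so it stays resident, while the topic classifier and generator of a previous subject are evicted when a new subject arrives. `ModelCache` is a hypothetical helper; in practice you would pass the `load(repo)` function from the end-to-end example as the loader. The demo uses a stand-in loader so it runs without downloading weights. An evicted model's memory is only reclaimed once nothing else still references it.

```python
from collections import OrderedDict
from typing import Any, Callable


class ModelCache:
    """Keep at most `max_models` loaded models, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], Any], max_models: int = 3):
        self._loader = loader  # e.g. the load(repo) helper from the example
        self._max = max_models
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, repo: str) -> Any:
        if repo in self._items:
            self._items.move_to_end(repo)  # mark as most recently used
            return self._items[repo]
        if len(self._items) >= self._max:
            self._items.popitem(last=False)  # drop the least recently used model
        value = self._loader(repo)
        self._items[repo] = value
        return value

    def loaded(self) -> list:
        """Repo IDs currently held, oldest first."""
        return list(self._items)


def fake_load(repo: str) -> str:
    # Stand-in for load(repo) so the sketch runs offline.
    return f"<weights for {repo}>"


cache = ModelCache(fake_load, max_models=3)
for slug in ["anatomy", "pharmacology"]:  # two calls routed to different subjects
    cache.get("stravoris/medmcq-subject-classifier-qwen3-0.6b")
    cache.get(f"stravoris/medmcq-{slug}-classifier-qwen3-0.6b")
    cache.get(f"stravoris/medmcq-{slug}-qwen3-1.7b")
print(cache.loaded())
```

After the second call only the router and the two Pharmacology models remain loaded; the Anatomy models were evicted.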

What this pipeline is not

These are sample models for demonstration, released as a public reference for the three-hop architecture. They have not been formally benchmarked and must not be used to make clinical decisions or provide medical advice. Returned answers and explanations may contain factual errors. A clinician should review every output before any educational use.

License

Apache 2.0 covers all 31 models and this guide. The base models (Qwen/Qwen3-0.6B and Qwen/Qwen3-1.7B, from Alibaba's Qwen team) remain under their original licenses.
