Mira-Q2: Clinical Extraction SLM (v2)
By DILR. Enterprise-grade clinical document extraction. Reads documents and outputs structured, source-grounded JSON. Deployed on-prem.
Comprehensive Evaluation (782 docs across 4 test sets)
| Eval Set | N | Type | JSON Validity | Identifier Leak | Field-F1 |
|---|---|---|---|---|---|
| test_gold | 200 | Same distribution (held-out) | 100.0% [100.0-100.0] | 0.0% | 1.000 [0.999-1.000] |
| synthetic_v2 | 150 | Different formatting dialect | 100.0% [100.0-100.0] | 0.0% | n/a (unlabeled) |
| extraction_relevant | 150 | Real physician docs (on-schema) | 94.7% [90.7-98.0] | 0.0% | n/a (unlabeled) |
| mtsamples | 282 | Real physician docs (39 specialties) | 85.8% [81.9-89.7] | 0.0% | n/a (unlabeled) |
95% bootstrap CIs (1000 resamples). Zero identifier leaks across all 782 documents.
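As a rough illustration of how intervals like the ones above can be produced, here is a minimal percentile-bootstrap sketch over per-document pass/fail outcomes. The function name `bootstrap_ci`, the seed, and the 242-of-282 input (the count implied by 85.8% on mtsamples) are assumptions for illustration; the card does not describe the exact resampling procedure used.

```python
import random

def bootstrap_ci(outcomes, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a list of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    means = []
    for _ in range(n_resamples):
        # Resample n documents with replacement and record the pass rate.
        sample = rng.choices(outcomes, k=n)
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(outcomes) / n, lo, hi

# Illustrative input only: 242 valid outputs out of 282 documents.
outcomes = [1] * 242 + [0] * 40
point, lo, hi = bootstrap_ci(outcomes)
print(f"{point:.1%} [{lo:.1%}-{hi:.1%}]")
```

When every document passes, each resample also passes, so the interval collapses to a single point; that is why the 100% rows show a zero-width interval.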
Three-Way Comparison
| Model | Training Data | Validity (test_gold) | F1 (test_gold) |
|---|---|---|---|
| Qwen2.5-3B zero-shot | n/a | 0% (invents own schema) | 0.0 |
| Mira-Q1 (v1) | 3,438 examples | 98% (50-example eval) | n/a |
| Mira-Q2 (this model) | 8,400 examples | 100% (200-example eval) | 1.000 |
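The card does not define how Field-F1 is computed. One plausible reading, sketched here purely as an assumption, is to flatten the predicted and gold JSON into (path, value) pairs and score their overlap. The helper names `flatten` and `field_f1` and the inner keys `name` and `value` are hypothetical.

```python
def flatten(obj, prefix=""):
    """Flatten nested JSON into a set of (path, value) pairs."""
    pairs = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            pairs |= flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            pairs |= flatten(value, f"{prefix}[{index}]")
    else:
        pairs.add((prefix, obj))
    return pairs

def field_f1(predicted, gold):
    """F1 over exact-match (path, value) pairs; an assumed definition."""
    pred_pairs, gold_pairs = flatten(predicted), flatten(gold)
    if not pred_pairs and not gold_pairs:
        return 1.0
    true_pos = len(pred_pairs & gold_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)

gold = {"patient": {"age": 45, "sex": "M"}, "labs": [{"name": "Hb", "value": 12.5}]}
pred = {"patient": {"age": 45, "sex": "M"}, "labs": [{"name": "Hb", "value": 12.0}]}
print(round(field_f1(pred, gold), 3))  # 3 of 4 pairs match -> 0.75
```

Under this definition a zero-shot model that invents its own schema shares no paths with the gold record, which is consistent with the 0.0 reported above.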
Training
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct (via Unsloth) |
| Method | QLoRA (4-bit, r=16, alpha=32) |
| Training data | 8,400 examples (6,400 gold-by-construction + 2,000 schema variants) |
| Data sources | Real ICD-10 codes (71K), NLM drug names, curated lab reference ranges |
| Schema variants | Renamed fields, dropped fields, minimal schemas (for generalization) |
| Epochs | 2 |
| Final train loss | 0.132 |
| Final eval loss | 0.142 |
| Overfit gap | 0.010 (healthy) |
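The schema variants listed in the table (renamed fields, dropped fields, minimal schemas) could be derived from a base schema along these lines. This is a sketch under assumptions: `make_schema_variant` and the toy `base_schema` are hypothetical, and the actual generator is not published in this card.

```python
import copy

def make_schema_variant(schema, rename=None, drop=None):
    """Return a copy of schema with some fields renamed and/or dropped."""
    rename = rename or {}
    drop = set(drop or [])
    variant = {}
    for field, spec in schema.items():
        if field in drop:
            continue
        variant[rename.get(field, field)] = copy.deepcopy(spec)
    return variant

# Toy schema for illustration; not the real Mira-Q2 schema definition.
base_schema = {
    "document_type": "string",
    "patient": {"age": "integer", "sex": "string"},
    "labs": "array",
    "medications": "array",
}

variant = make_schema_variant(
    base_schema, rename={"labs": "lab_results"}, drop=["medications"]
)
minimal = make_schema_variant(base_schema, drop=["patient", "labs", "medications"])
print(list(variant))
print(minimal)
```

For a training example to stay consistent, the same renames and drops would also have to be applied to the target JSON paired with each variant schema.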
Loss Curve
Step 50: 1.0723 (epoch 0.1)
Step 200: 0.1556 (epoch 0.4)
Step 525: 0.1414 (epoch 1.0, checkpoint)
Step 750: 0.1318 (epoch 1.4, lowest)
Step 1050: 0.1320 (epoch 2.0, final)
Eval: 0.1418 (epoch 2.0)
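The numbers in the loss curve can be cross-checked with a little arithmetic. The sketch below assumes that all 8,400 examples were in the training split, which the card does not state explicitly; under that assumption, 525 steps per epoch implies an effective batch size of 16.

```python
STEPS_PER_EPOCH = 525   # step 525 is reported as epoch 1.0
TRAIN_EXAMPLES = 8400   # assumption: every example was used for training

curve = {50: 1.0723, 200: 0.1556, 525: 0.1414, 750: 0.1318, 1050: 0.1320}
eval_loss = 0.1418

for step, loss in curve.items():
    # Convert the step index to a fractional epoch.
    print(f"step {step}: epoch {step / STEPS_PER_EPOCH:.1f}, loss {loss}")

effective_batch = TRAIN_EXAMPLES / STEPS_PER_EPOCH
gap = eval_loss - curve[1050]
print(f"implied effective batch size: {effective_batch:.0f}")
print(f"eval minus final train loss: {gap:.3f}")
```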
What's New vs Mira-Q1
- 2.4x more training data (8,400 vs 3,438)
- Gold-by-construction data: real ICD-10 codes, NLM drugs, real lab reference ranges (not Synthea-rendered)
- Schema-variant training: 2,000 examples with modified schemas for schema-as-input generalization
- 8% lower loss (0.132 vs 0.143)
- 100% validity on 200-example gold eval (vs 98% on 50 examples)
- Comprehensive eval on 782 docs including real physician dictations
- Zero identifier leaks verified across all test sets
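The card does not say how identifier leaks are detected. A simple check, sketched here as an assumption, is to take the identifiers known to appear in each source document (names, MRNs) and search the model output for them. `find_identifier_leaks` and the sample identifiers are hypothetical.

```python
import json

def find_identifier_leaks(output_text, known_identifiers):
    """Return the known identifiers that appear in the output (case-insensitive)."""
    lowered = output_text.lower()
    return [ident for ident in known_identifiers if ident and ident.lower() in lowered]

# Hypothetical model output and identifiers for one document.
output = json.dumps({"patient": {"age": 45, "sex": "M"}, "extraction_notes": ""})
known = ["John Doe", "MRN-0012345"]

print(find_identifier_leaks(output, known))  # [] means no leak for this document
```

A substring check like this only catches verbatim leaks; reformatted or partial identifiers would need fuzzier matching.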
Synthetic-to-Real Gap
The honest finding: Mira-Q2 reaches 100% JSON validity on held-out data from the training distribution but 85.8% on general real physician prose (MTSamples). This is expected for a model trained on synthetic data: it learned our generator's patterns well but struggles with document types it never saw (operative notes, physical exams). The gap narrows to about 5 percentage points on on-schema real docs (94.7%).
We expect this gap to close with real partner data retraining (v1), broader document type coverage in training, and OCR pipeline integration.
Usage
# IMPORTANT: Load with Unsloth, not standard PeftModel (quantization mismatch)
import torch
from unsloth import FastLanguageModel

# Load the adapter together with its 4-bit base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="dilr/Mira-Q2",
    max_seq_length=4096,
    dtype=torch.float16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to inference mode

messages = [
    {"role": "system", "content": "You are a clinical information extraction system..."},
    {"role": "user", "content": "Patient: 45/M\nHb 12.5 g/dL (13-17) LOW\nWBC 8.2 x10^9/L (4-11) Normal"},
]

# Render the chat template and tokenize the prompt
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

# Greedy decoding; stop at either EOS or the chat end-of-turn token
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,
    eos_token_id=[tokenizer.eos_token_id,
                  tokenizer.convert_tokens_to_ids("<|im_end|>")],
)

# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Note: Do NOT load with PeftModel.from_pretrained(base, "dilr/Mira-Q2"). The adapter was trained with Unsloth's quantization, which differs from standard bitsandbytes. Use FastLanguageModel as shown above.
Schema
Extracts 10 required fields:
- document_type: lab_report | medication_list | discharge_summary | pathology_report | intake_form | progress_note | other
- patient: {age, sex} (de-identified, never includes names/MRN)
- encounter: {date (ISO), department}
- vitals[], labs[], medications[], diagnoses[], procedures[], allergies[]
- extraction_notes
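Because every output is meant to be parsed downstream, it can help to check the decoded text against the ten field names listed above before using it. The sketch below is one assumed way to do that; `check_output` is a hypothetical helper, and the card's own JSON Validity metric may be defined differently.

```python
import json

REQUIRED_FIELDS = [
    "document_type", "patient", "encounter", "vitals", "labs",
    "medications", "diagnoses", "procedures", "allergies", "extraction_notes",
]

def check_output(raw_text):
    """Return (is_valid, missing_fields) for one decoded model output."""
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError:
        return False, list(REQUIRED_FIELDS)
    if not isinstance(parsed, dict):
        return False, list(REQUIRED_FIELDS)
    missing = [field for field in REQUIRED_FIELDS if field not in parsed]
    return not missing, missing

# A deliberately incomplete output to show the failure path.
ok, missing = check_output('{"document_type": "lab_report", "labs": []}')
print(ok, missing)
```

This only checks top-level presence; nested structure such as patient.age or the shape of each labs entry would need a fuller schema validator.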
Architecture: Schema-as-Input
Mira-Q2 is trained with schema-variant examples, so the model learns to follow the extraction schema injected in the system prompt rather than only the clinical one it saw most often. This is intended to enable customer onboarding with zero code changes (schema file + seed examples only).
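In practice, schema-as-input means the schema travels with each request. The sketch below shows one way to assemble the chat messages from a schema and a document. The prompt wording is a placeholder and an assumption: the exact system prompt used in training is not reproduced in this card, and outputs are likely to be best when the prompt matches the training format. `build_messages` is a hypothetical helper.

```python
import json

def build_messages(schema, document_text,
                   instructions="You are an information extraction system."):
    """Build chat messages with the extraction schema embedded in the system prompt."""
    system_prompt = (
        f"{instructions}\n"
        "Return JSON that follows this schema:\n"
        f"{json.dumps(schema, indent=2)}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": document_text},
    ]

# Toy customer schema for illustration only.
schema = {"document_type": "string", "lab_results": "array"}
messages = build_messages(schema, "Hb 12.5 g/dL (13-17) LOW")
print(messages[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the Usage section.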
Eval Data
The eval/ directory contains:
- comprehensive_scorecard.json: full results with bootstrap CIs
- test_gold_200_result.json: test_gold scorecard
- mtsamples_282_result.json: real MTSamples probe
- extraction_relevant_150_result.json: on-schema real docs
- synthetic_v2_150_result.json: format robustness probe
Limitations
- English only
- Trained on synthetic data; retraining on real clinical documents is expected to improve accuracy (v1 with design partner)
- 86% validity on general real docs (39 specialties); strongest on the lab/discharge/med types it was trained on
- Every output is a draft for human review, not for autonomous clinical decisions
- Must load with Unsloth (not vanilla PeftModel)
License
Apache-2.0 (same as base model)