Mira-1: Clinical Extraction SLM
Enterprise-grade clinical document extraction model. Fine-tuned from Qwen2.5-3B-Instruct with QLoRA to extract structured JSON from clinical documents (lab reports, discharge summaries, medication lists, pathology reports, intake forms, progress notes).
Key Features
- Structured JSON output: extracts patient demographics, vitals, labs, medications, diagnoses, procedures, allergies
- Source-grounded: every extracted value traces to the input document
- No patient identifiers: extracts age/sex only, strips names/MRN/DOB
- On-prem deployable: 3B parameters, runs on CPU via GGUF quantization
- 98% JSON validity on held-out gold set
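The source-grounding property can be spot-checked after extraction. The following is a minimal sketch, not the model's own mechanism: it assumes grounding means each extracted leaf value appears verbatim (case- and whitespace-insensitive) in the input text. The helper names and the `name`/`value`/`flag` lab keys are hypothetical, and classification fields such as `document_type` would need to be excluded in practice because they are labels rather than quoted text.

```python
import re

def _leaf_values(node):
    """Yield every scalar leaf (str, int, float) in a nested JSON structure."""
    if isinstance(node, dict):
        for value in node.values():
            yield from _leaf_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from _leaf_values(item)
    elif isinstance(node, (str, int, float)) and not isinstance(node, bool):
        yield node

def _normalize(text):
    """Lowercase and collapse whitespace so matching is not layout-sensitive."""
    return re.sub(r"\s+", " ", str(text).lower()).strip()

def ungrounded_values(extraction, source_text):
    """Return leaf values that do not appear verbatim in the source text."""
    haystack = _normalize(source_text)
    return [
        value
        for value in _leaf_values(extraction)
        if _normalize(value) and _normalize(value) not in haystack
    ]

source = "Patient: 45/M\nHb 12.5 g/dL (13-17) LOW"
# Hypothetical extraction with one deliberately wrong flag.
extraction = {
    "patient": {"age": 45, "sex": "M"},
    "labs": [{"name": "Hb", "value": 12.5, "flag": "HIGH"}],
}
print(ungrounded_values(extraction, source))  # ['HIGH']
```

Substring matching is deliberately naive: it catches invented values but can pass a short value that happens to occur elsewhere in the document, so it complements human review rather than replacing it.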
Training
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Method | QLoRA (4-bit, r=16, alpha=32) |
| Training data | 3,438 examples (126 curated + 3,312 Synthea-rendered) |
| Epochs | 2 |
| Final loss | 0.14 |
| GPU | Kaggle T4 (free tier) |
| Training time | ~2h 40m |
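For readers who want to reproduce a similar setup, the QLoRA row above might translate into a configuration along these lines. Only `r=16`, `lora_alpha=32`, and 4-bit loading come from the table; the quantization type, compute dtype, dropout, and target modules are assumptions chosen as common defaults, not the author's recorded settings.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization of the frozen base model (the "Q" in QLoRA).
# nf4, double quantization, and float16 compute are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # T4 GPUs have no native bfloat16
)

# Low-rank adapter settings.
lora_config = LoraConfig(
    r=16,              # from the training table
    lora_alpha=32,     # from the training table; scaling = alpha / r = 2
    lora_dropout=0.0,  # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```

`bnb_config` would be passed as `quantization_config` when loading the base model, and `lora_config` to `peft.get_peft_model` before training.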
Evaluation (50 held-out gold examples)
| Metric | Value |
|---|---|
| JSON validity | 98% |
| Training loss | 1.23 → 0.14 |
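The card does not state how JSON validity was computed. One plausible reading, sketched below under that assumption, is the fraction of raw model outputs that parse as a JSON object; with 50 gold examples, 98% corresponds to 49 parseable outputs. The sample outputs are toy placeholders, not real model generations.

```python
import json

def json_validity_rate(outputs):
    """Fraction of model outputs that parse as a JSON object."""
    if not outputs:
        raise ValueError("outputs must not be empty")
    valid = 0
    for text in outputs:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue  # malformed or truncated output counts as invalid
        if isinstance(parsed, dict):
            valid += 1
    return valid / len(outputs)

# Toy data: 49 parseable outputs and one truncated output.
outputs = ['{"document_type": "lab_report"}'] * 49 + ['{"document_type": ']
print(f"{json_validity_rate(outputs):.0%}")  # 98%
```

Validity here only means the output parses; it says nothing about whether the extracted values are correct.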
Usage
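from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the LoRA adapter together with its base model, then the tokenizer.
model = AutoPeftModelForCausalLM.from_pretrained("shekharp77/Mira-1")
tokenizer = AutoTokenizer.from_pretrained("shekharp77/Mira-1")

messages = [
    {"role": "system", "content": "You are a clinical information extraction system..."},
    {"role": "user", "content": "Patient: 45/M\nHb 12.5 g/dL (13-17) LOW\nWBC 8.2 x10^9/L (4-11) Normal"},
]
# return_dict=True yields both input_ids and attention_mask for generate().
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
# Greedy decoding gives deterministic output; no temperature setting is needed.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))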
Schema
Outputs conform to this schema (10 required top-level fields):
- document_type: lab_report | medication_list | discharge_summary | pathology_report | intake_form | progress_note | other
- patient: {age, sex}
- encounter: {date, department}
- vitals[], labs[], medications[], diagnoses[], procedures[], allergies[]
- extraction_notes
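Because every output is meant for review, it can help to check the top-level shape before passing a result downstream. The following is a minimal sketch that validates only the ten top-level fields and the `document_type` values listed above; the function name is hypothetical, the container types are inferred from the `{}` and `[]` notation, and the type of `extraction_notes` is left unchecked because the card does not specify it.

```python
DOCUMENT_TYPES = {
    "lab_report", "medication_list", "discharge_summary",
    "pathology_report", "intake_form", "progress_note", "other",
}

# Expected container type per required top-level field.
# `object` means "any type" (extraction_notes is unspecified in the card).
REQUIRED_FIELDS = {
    "document_type": str,
    "patient": dict,
    "encounter": dict,
    "vitals": list,
    "labs": list,
    "medications": list,
    "diagnoses": list,
    "procedures": list,
    "allergies": list,
    "extraction_notes": object,
}

def schema_errors(doc):
    """Return a list of human-readable problems; an empty list means the shape is OK."""
    if not isinstance(doc, dict):
        return ["top-level value is not a JSON object"]
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    doc_type = doc.get("document_type")
    if isinstance(doc_type, str) and doc_type not in DOCUMENT_TYPES:
        errors.append(f"unknown document_type: {doc_type}")
    return errors

example = {
    "document_type": "lab_report",
    "patient": {"age": 45, "sex": "M"},
    "encounter": {"date": None, "department": None},
    "vitals": [], "labs": [], "medications": [],
    "diagnoses": [], "procedures": [], "allergies": [],
    "extraction_notes": "",
}
print(schema_errors(example))  # []
```

A full JSON Schema would also constrain the nested items, but those inner structures are not documented here, so the sketch stops at the top level.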
Limitations
- English only (v0)
- Trained on synthetic data (Synthea + curated seeds), not real clinical records
- Every output is a draft for human review, not for autonomous clinical decisions
- No ICD-10/SNOMED coding unless explicitly in the source document
License
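Apache-2.0 for the adapter. The base model, Qwen/Qwen2.5-3B-Instruct, is distributed under the Qwen Research License rather than Apache-2.0, so its terms also apply when the adapter is used with it.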