Qwen2.5-14B-Instruct, MedMCQA LoRA
A LoRA adapter for Qwen2.5-14B-Instruct fine-tuned on MedMCQA. This is a capability probe, not a product. The point is to show that a small LoRA on a 14B model you can actually own lifts a targeted domain task and still deploys on consumer hardware. It is not a clinical model and not medical advice. Read the limitations section before using it for anything.
Result
First-token logit scoring over the answer letters, on a held-out validation split, n=500, identical questions for the base model and the adapter (apples to apples).
| Model | Accuracy |
|---|---|
| Base Qwen2.5-14B-Instruct | 0.554 (277/500) |
| Base + this adapter | 0.650 (325/500) |
| Lift | +9.6 points (+17.3% relative) |
The lift is about 4.3 standard errors of the baseline accuracy (binomial SE of roughly 0.022 at n=500), so it is unlikely to be sampling noise.
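The arithmetic behind the lift and the standard-error statement can be checked with a short sketch. It uses only the counts in the table above. The binomial standard error of the baseline is one reasonable reading of "standard errors above baseline"; the unpaired two-sample figure is added here as a more conservative comparison and is not a number reported by this card. Because both models were scored on the same questions, a paired test (for example McNemar's) would be the better fit, but it needs per-question outcomes, which are not published here.

```python
import math


def binomial_se(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


n = 500
base_correct, adapter_correct = 277, 325  # counts from the results table

p_base = base_correct / n
p_adapter = adapter_correct / n
lift = p_adapter - p_base          # absolute lift in accuracy
relative_lift = lift / p_base      # lift as a fraction of baseline accuracy

# Reading 1: treat the baseline as the reference and measure the lift in
# units of the baseline's own binomial standard error.
se_base = binomial_se(p_base, n)
z_vs_baseline = lift / se_base

# Reading 2 (more conservative, an addition in this sketch): treat both
# accuracies as independent noisy estimates and use the SE of the difference.
se_diff = math.sqrt(se_base ** 2 + binomial_se(p_adapter, n) ** 2)
z_unpaired = lift / se_diff

print(f"lift = {lift:.3f} ({relative_lift:.1%} relative)")
print(f"lift / SE(baseline)   = {z_vs_baseline:.2f}")
print(f"lift / SE(difference) = {z_unpaired:.2f}")
```

Under the first reading the lift lands between 4 and 5 standard errors; under the unpaired reading it is a little over 3, which is still well outside what sampling noise alone would usually produce.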
Training
- Base: Qwen/Qwen2.5-14B-Instruct
- Data: MedMCQA, 8000 training examples
- Method: LoRA (r=16, alpha=32, dropout=0.05) across all attention and MLP projections, bf16
- 400 steps, effective batch 16, lr 2e-4, cosine schedule
- Trainable parameters: 68.8M (0.46% of the model)
- Training loss: 2.52 down to 1.25
- Hardware: one H100 SXM 80GB, about 12 minutes
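The trainable-parameter figure can be sanity-checked by counting LoRA weights directly. The sketch below assumes the Qwen2.5-14B layer dimensions (hidden size 5120, MLP size 13824, 48 layers, 8 key/value heads of dimension 128) and a base parameter count of about 14.77B. None of these are stated in this card, so treat them as assumptions and confirm them against the base model's `config.json`. The module names follow the usual Qwen2 naming and are likewise assumed.

```python
# Assumed Qwen2.5-14B dimensions (not given in this card; verify in config.json).
HIDDEN = 5120
INTERMEDIATE = 13824
LAYERS = 48
KV_DIM = 8 * 128            # 8 key/value heads x head dim 128 (grouped-query attention)
ASSUMED_BASE_PARAMS = 14.77e9

LORA_RANK = 16              # r=16, from the training list above

# (in_features, out_features) for each adapted projection in one decoder layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per adapted layer."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_RANK) for i, o in PROJECTIONS.values())
trainable = per_layer * LAYERS
trainable_pct = 100 * trainable / (ASSUMED_BASE_PARAMS + trainable)

# Data budget: 400 optimizer steps at effective batch 16 over 8000 examples.
examples_seen = 400 * 16
epochs = examples_seen / 8000

print(f"trainable LoRA parameters: {trainable:,} ({trainable_pct:.2f}% of total)")
print(f"examples seen: {examples_seen} ({epochs:.1f} epochs)")
```

Under these assumptions the count comes out at about 68.8M parameters, in line with the figure above, and the run covers less than one full pass over the 8000 training examples.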
Deploy
Merged into the base and quantized to q4_K_M, the medical model is about 8.4 GB (4.87 bits per weight). It fits a 24GB RTX 3090 with roughly two thirds of the card free, and it also fits a 12GB card. Served from the quantized file, it produced coherent on-domain output at roughly 139 tokens/sec generation on an H100. Note: the quantized model's accuracy was not re-measured. The accuracy numbers above are for the bf16 adapter, and a deploy-grade claim should re-score the q4_K_M file.
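A rough size check for the quantized file, as a sketch: the base parameter count of about 14.77B is an assumption not stated in this card, and the calculation covers weights only. KV cache, activations, and runtime overhead come on top and depend on context length and the serving stack.

```python
ASSUMED_PARAMS = 14.77e9   # assumed parameter count of the merged model
BITS_PER_WEIGHT = 4.87     # average bits per weight, from the text above

size_bytes = ASSUMED_PARAMS * BITS_PER_WEIGHT / 8
size_gb = size_bytes / 1e9        # decimal gigabytes
size_gib = size_bytes / 2 ** 30   # binary gibibytes (how VRAM is usually counted)


def free_fraction(card_gib: float) -> float:
    """Fraction of a card's memory left after loading the weights alone."""
    return 1.0 - size_gib / card_gib


print(f"weights: {size_gb:.2f} GB = {size_gib:.2f} GiB")
print(f"free on a 24 GiB card: {free_fraction(24):.0%}")
print(f"free on a 12 GiB card: {free_fraction(12):.0%}")
```

Under these assumptions the "about 8.4 GB" figure matches the size in GiB, and roughly 65% of a 24 GiB card stays free before any KV cache is allocated.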
Use
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base = "Qwen/Qwen2.5-14B-Instruct"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, "ArgusForge/qwen2.5-14b-medmcqa-lora")
Limitations and intended use
This is a research capability probe. It is not a medical product and not medical advice. Free-text generation from the model has produced clinically incorrect statements (a self-contradiction on hypoglycemia was observed during testing). Do not use it for diagnosis, treatment, or any clinical decision. The accuracy figure is a single run, single seed, on an n=500 held-out multiple-choice slice. A product-grade claim would need multi-seed confirmation, the full validation split, an external held-out set, and a clinical validation regime.
Reproduction
Public base model, public dataset (MedMCQA). The eval is first-token logit scoring over the option letters on a held-out n=500 split, with identical prompts for base and adapter.
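First-token logit scoring can be sketched in a few lines. This is a minimal illustration of the idea, not the evaluation script used for this card. It assumes you already have the model's next-token logits at the position where the answer begins, plus a mapping from each option letter to a token id. The four letters A to D, the toy token ids, and the toy logits are all hypothetical; in a real run the ids would come from the tokenizer and the logits from the model's forward pass on the prompt. The prompt template is not given in this card.

```python
from typing import Mapping, Sequence


def score_first_token(logits: Sequence[float], letter_token_ids: Mapping[str, int]) -> str:
    """Pick the option letter whose token has the highest next-token logit.

    Only the logits of the option-letter tokens are compared; every other
    vocabulary entry is ignored, so the model cannot "answer" with a non-option.
    """
    return max(letter_token_ids, key=lambda letter: logits[letter_token_ids[letter]])


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold letters."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Toy data (hypothetical): a 5-token vocabulary where ids 0..3 are the letters.
toy_ids = {"A": 0, "B": 1, "C": 2, "D": 3}
toy_logits = [
    [0.1, 2.3, -1.0, 0.4, 9.9],  # id 4 has the top logit but is not an option
    [1.5, 0.2, 0.3, 0.1, 0.0],
]
toy_gold = ["B", "C"]

preds = [score_first_token(row, toy_ids) for row in toy_logits]
print(preds, accuracy(preds, toy_gold))
```

Scoring the base model and the adapter with the same function on the same prompts is what makes the comparison like for like: the only thing that changes between the two runs is the logits.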
Evaluation results
- Accuracy (held-out, n=500) on MedMCQA: 0.650 (self-reported)