# LlamaTron-RS1-Rolex
Fine-tuned version of Meta's Llama-3.2-1B-Instruct on the ReasonMed dataset (370K high-quality medical reasoning examples) using LoRA. The model naturally exhibits clear step-by-step Chain-of-Thought (CoT) reasoning on medical multiple-choice and open-ended questions.
This repository provides the merged weights and a GGUF file in FP16 format for efficient local inference.
## Key Features
- Parameter-efficient fine-tuning with LoRA (~0.1–0.3% of parameters updated)
- Full support for ReasonMed chat-template conversations
- Mixed-precision training (FP16)
- Observable CoT medical reasoning
- GGUF file in FP16 format for local inference (llama.cpp, Ollama, LM Studio, etc.)
## Important Disclaimer
This model is for research, education, and prototyping purposes only.
It is not a medical device, diagnostic tool, or substitute for professional clinical judgment. Always consult qualified healthcare professionals for medical decisions.
## Dataset
ReasonMed – the largest publicly available medical reasoning dataset (as of 2025)
- Paper: ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
- Size: 370,000 high-quality reasoning examples
- Generation: Multi-agent LLM pipeline + Error Refiner + EMD curation
- Format: JSONL with role-based conversation turns
- License: Follow the terms set by the ReasonMed authors
## Training Details
- Base model: meta-llama/Llama-3.2-1B-Instruct
- Method: LoRA (rank=8, alpha=16, dropout=0.05)
- Target modules: q_proj, k_proj, v_proj, o_proj
- Optimizer: Adafactor
- Hyperparameters:
  - Epochs: 3
  - Global batch size: 16 (per-device 4 × gradient accumulation 4)
  - Learning rate: 2e-4
  - Warmup steps: 20
  - Max sequence length: 512
- Hardware: NVIDIA H100 (rented via JarvisLabs.ai)
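The "~0.1–0.3% of parameters updated" figure from the feature list can be sanity-checked with quick arithmetic. The dimensions below are assumptions matching the published Llama-3.2-1B config (hidden size 2048, 16 layers, 8 KV heads × head dim 64, roughly 1.24B total parameters); LoRA adds r·(d_in + d_out) parameters per adapted matrix:

```shell
# Rough check of LoRA's parameter footprint at rank r = 8 on q/k/v/o projections.
# Assumed dims: q_proj/o_proj are 2048x2048, k_proj/v_proj are 2048x512 (GQA).
per_layer=$(( 8*(2048+2048) + 8*(2048+512) + 8*(2048+512) + 8*(2048+2048) ))
total_lora=$(( per_layer * 16 ))                 # 16 transformer layers
echo "LoRA params: $total_lora"                  # prints 1703936
awk -v l="$total_lora" 'BEGIN { printf "fraction of base: %.3f%%\n", 100*l/1.24e9 }'
```

So roughly 1.7M trainable parameters, or about 0.14% of the base model, consistent with the claimed range.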
## Post-Training Steps
- Merged LoRA adapters into base model
- Converted to GGUF (FP16)
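The merge itself is typically done in Python with PEFT's `merge_and_unload()`; the GGUF conversion step can then be sketched as below. This assumes the merged Hugging Face checkpoint was saved to a local `merged-model/` directory (a hypothetical path) and that llama.cpp is cloned alongside it:

```shell
# Convert the merged HF checkpoint to a single FP16 GGUF file.
# convert_hf_to_gguf.py ships with llama.cpp (older revisions name it convert-hf-to-gguf.py).
python llama.cpp/convert_hf_to_gguf.py merged-model \
  --outtype f16 \
  --outfile llama3.2-1b-medical-reasonmed-fp16.gguf
```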
## Files in this Repository
`llama3.2-1b-medical-reasonmed-fp16.gguf`
Note: This is the FP16 GGUF file. Users can further quantize it locally using llama.cpp (e.g., to Q4_K_M, Q5_K_M, or Q8_0) for smaller file sizes and faster inference on lower-end hardware.
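For example, a local 4-bit quantization might look like this (the exact path to the `llama-quantize` binary depends on how you built llama.cpp, and the output filename is just a suggestion):

```shell
# Quantize the FP16 GGUF down to 4-bit; usage: llama-quantize <in.gguf> <out.gguf> <type>
./llama.cpp/llama-quantize \
  llama3.2-1b-medical-reasonmed-fp16.gguf \
  llama3.2-1b-medical-reasonmed-Q4_K_M.gguf \
  Q4_K_M
```

Q4_K_M is a common default trade-off; Q8_0 keeps more quality at roughly twice the size.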
## Inference Example (llama.cpp)
```shell
./llama-cli \
  -m llama3.2-1b-medical-reasonmed-fp16.gguf \
  --color --temp 0.7 --top-p 0.9 \
  -p "A patient presents with fever, cough, and shortness of breath. What is the most appropriate initial investigation?\nA. ECG\nB. Chest X-ray\nC. Blood culture\nD. CT pulmonary angiogram"
```

Note: recent llama.cpp builds ship the CLI as `llama-cli`; on older builds the equivalent binary was named `main`.
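As an alternative to invoking llama.cpp directly, the same GGUF can be registered with Ollama via a one-line Modelfile; the model name `medreason` below is arbitrary:

```shell
# Create a Modelfile pointing at the local GGUF, register the model, and query it.
printf 'FROM ./llama3.2-1b-medical-reasonmed-fp16.gguf\n' > Modelfile
ollama create medreason -f Modelfile
ollama run medreason "A patient presents with fever, cough, and shortness of breath. What is the most appropriate initial investigation?"
```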
## Limitations
- At 1B parameters, the model is best suited to lightweight / edge use cases
- Reasoning quality lags behind larger (7B–70B) medical models
- No additional instruction tuning or preference optimization (DPO/ORPO) yet
## Future Work
- DPO / ORPO alignment
- Fine-tuning on larger bases (Llama-3.2-3B, Meditron, etc.)
- Formal evaluation on MedQA, PubMedQA, MMLU-clinical
## License
- Code (training/merging scripts): MIT (see the GitHub repo)
- Base model: Meta Llama 3.2 Community License
- Fine-tuned weights & GGUF files: same as the base model, plus the ReasonMed dataset terms
## Acknowledgments
- ReasonMed authors (Yu Sun et al.)
- Meta AI for Llama-3.2
- JarvisLabs.ai for affordable H100 access
- Hugging Face, PEFT, and llama.cpp contributors
For questions or collaboration, open an issue on the linked GitHub repository or reach out via LinkedIn.