LlamaTron-RS1-Rolex

Fine-tuned version of Meta's Llama-3.2-1B-Instruct on the ReasonMed dataset (370K high-quality medical reasoning examples) using LoRA. The model produces explicit step-by-step Chain-of-Thought (CoT) reasoning on medical multiple-choice and open-ended questions.

This repository provides the merged weights and a GGUF file in FP16 format for efficient local inference.

Key Features

  • Parameter-efficient fine-tuning with LoRA (~0.1–0.3% of parameters updated)
  • Full support for ReasonMed chat-template conversations
  • Mixed-precision training (FP16)
  • Observable CoT medical reasoning
  • GGUF file in FP16 format for local inference (llama.cpp, Ollama, LM Studio, etc.)
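
The "~0.1–0.3% of parameters" figure can be sanity-checked with simple arithmetic. The sketch below uses Llama-3.2-1B's published dimensions (hidden size 2048, 16 layers, grouped-query attention with 8 KV heads of head dim 64, so k_proj/v_proj map 2048 → 512) and the reported base parameter count; it is a back-of-the-envelope estimate, not output from PEFT:

```python
# Estimate the fraction of parameters LoRA (r=8) trains when targeting
# q_proj, k_proj, v_proj, o_proj in Llama-3.2-1B.
hidden = 2048      # model hidden size
kv_dim = 8 * 64    # 8 KV heads x head_dim 64 (grouped-query attention)
layers = 16
r = 8              # LoRA rank

# A LoRA adapter on a (d_in -> d_out) linear layer adds r * (d_in + d_out) params.
per_layer = (
    r * (hidden + hidden)    # q_proj: 2048 -> 2048
    + r * (hidden + kv_dim)  # k_proj: 2048 -> 512
    + r * (hidden + kv_dim)  # v_proj: 2048 -> 512
    + r * (hidden + hidden)  # o_proj: 2048 -> 2048
)
lora_params = per_layer * layers
base_params = 1_235_814_400  # parameter count reported for Llama-3.2-1B

fraction = lora_params / base_params
print(f"{lora_params:,} trainable params ≈ {fraction:.2%} of the base model")
```

This lands at roughly 1.7M trainable parameters, i.e. about 0.14% of the base model, inside the quoted range.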

Important Disclaimer

This model is for research, education, and prototyping purposes only.
It is not a medical device, diagnostic tool, or substitute for professional clinical judgment. Always consult qualified healthcare professionals for medical decisions.

Dataset

ReasonMed – the largest publicly available medical reasoning dataset (as of 2025)

Training Details

  • Base model: meta-llama/Llama-3.2-1B-Instruct
  • Method: LoRA (rank=8, alpha=16, dropout=0.05)
  • Target modules: q_proj, k_proj, v_proj, o_proj
  • Optimizer: Adafactor
  • Hyperparameters:
    • Epochs: 3
    • Global batch size: 16 (per-device 4 + gradient accumulation 4)
    • Learning rate: 2e-4
    • Warmup steps: 20
    • Max sequence length: 512
  • Hardware: NVIDIA H100 (rented via JarvisLabs.ai)
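
The settings above can be collected into one place. The sketch below is plain data whose keys mirror the usual `peft.LoraConfig` / `transformers.TrainingArguments` argument names (it is not a training script), plus the arithmetic behind the global batch size:

```python
# Training configuration as reported above; key names follow the common
# peft / transformers argument names, but this is just a plain record.
lora = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

train = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "warmup_steps": 20,
    "max_seq_length": 512,
    "optim": "adafactor",
    "fp16": True,
}

# Effective (global) batch size on a single GPU:
global_batch = (train["per_device_train_batch_size"]
                * train["gradient_accumulation_steps"])
print(global_batch)  # 16
```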

Post-Training Steps

  1. Merged LoRA adapters into base model
  2. Converted to GGUF (FP16)
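
A minimal sketch of these two steps, assuming `peft` and `transformers` are installed; the adapter path is a placeholder, and the conversion command is run from a llama.cpp checkout:

```python
# Step 1 (sketch): fold the LoRA weights back into the base model.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path
merged = model.merge_and_unload()  # bakes the low-rank deltas into the linear layers
merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct") \
    .save_pretrained("merged-model")

# Step 2 (shell, from a llama.cpp checkout):
#   python convert_hf_to_gguf.py merged-model --outtype f16 \
#       --outfile llama3.2-1b-medical-reasonmed-fp16.gguf
```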

Files in this Repository

  • llama3.2-1b-medical-reasonmed-fp16.gguf

Note: This is the FP16 GGUF file. Users can further quantize it locally using llama.cpp (e.g., to Q4_K_M, Q5_K_M, or Q8_0) for smaller file sizes and faster inference on lower-end hardware.
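
For planning disk and RAM budgets, rough file-size estimates per quantization level can be computed from the parameter count. The bits-per-weight figures below are ballpark averages for llama.cpp's mixed-precision quants, not exact constants, and real files add a little metadata overhead:

```python
# Rough GGUF size estimates for a ~1.24B-parameter model at common
# llama.cpp quantization levels (approximate average bits per weight).
def est_gib(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1024**3

PARAMS = 1.24e9
for name, bpw in {"F16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.8}.items():
    print(f"{name}: ~{est_gib(PARAMS, bpw):.1f} GiB")

# To produce a quantized file locally from a llama.cpp build:
#   ./llama-quantize llama3.2-1b-medical-reasonmed-fp16.gguf \
#       llama3.2-1b-medical-reasonmed-q4_k_m.gguf Q4_K_M
```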

Inference Example (llama.cpp)

./llama.cpp/llama-cli \
  -m llama3.2-1b-medical-reasonmed-fp16.gguf \
  --color --temp 0.7 --top-p 0.9 -n 512 -e \
  -p "A patient presents with fever, cough, and shortness of breath. What is the most appropriate initial investigation?\nA. ECG\nB. Chest X-ray\nC. Blood culture\nD. CT pulmonary angiogram"

Note: current llama.cpp builds name the binary llama-cli (older builds used main); the -e flag makes it interpret the \n escapes in the prompt.

Limitations

  • 1B-parameter model → best for lightweight / edge use cases
  • Reasoning quality lags behind larger (7B–70B) medical models
  • No additional instruction-tuning or preference optimization (DPO/ORPO) yet

Future Work

  • DPO / ORPO alignment
  • Fine-tuning on larger bases (Llama-3.2-3B, Meditron, etc.)
  • Formal evaluation on MedQA, PubMedQA, MMLU-clinical

License

  • Code (training/merging scripts): MIT (see GitHub repo)
  • Base model: Meta Llama 3.2 Community License
  • Fine-tuned weights & GGUF files: Same as base model + ReasonMed dataset terms

Acknowledgments

  • ReasonMed authors (Yu Sun et al.)
  • Meta AI for Llama-3.2
  • JarvisLabs.ai for affordable H100 access
  • Hugging Face, PEFT, and llama.cpp contributors

For questions or collaboration, open an issue on the linked GitHub repository or reach out via LinkedIn.