# LlamaTron-RS1-Rolex
Fine-tuned version of Meta's Llama-3.2-1B-Instruct on the ReasonMed dataset (370K high-quality medical reasoning examples) using LoRA. The model naturally exhibits clear step-by-step Chain-of-Thought (CoT) reasoning on medical multiple-choice and open-ended questions.
This repository provides the merged weights and a GGUF file in FP16 format for efficient local inference.
## Key Features
- Parameter-efficient fine-tuning with LoRA (~0.1–0.3% of parameters updated)
- Full support for ReasonMed chat-template conversations
- Mixed-precision training (FP16)
- Observable CoT medical reasoning
- GGUF file in FP16 format for local inference (llama.cpp, Ollama, LM Studio, etc.)
## Important Disclaimer
This model is for research, education, and prototyping purposes only.
It is not a medical device, diagnostic tool, or substitute for professional clinical judgment. Always consult qualified healthcare professionals for medical decisions.
## Dataset
ReasonMed – the largest publicly available medical reasoning dataset (as of 2025)
- Paper: ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
- Size: 370,000 high-quality reasoning examples
- Generation: Multi-agent LLM pipeline + Error Refiner + EMD curation
- Format: JSONL with role-based conversation turns
- License: Follow the terms set by the ReasonMed authors
## Training Details
- Base model: meta-llama/Llama-3.2-1B-Instruct
- Method: LoRA (rank=8, alpha=16, dropout=0.05)
- Target modules: q_proj, k_proj, v_proj, o_proj
- Optimizer: Adafactor
- Hyperparameters:
  - Epochs: 3
  - Global batch size: 16 (per-device 4 × gradient accumulation 4)
  - Learning rate: 2e-4
  - Warmup steps: 20
  - Max sequence length: 512
- Hardware: NVIDIA H100 (rented via JarvisLabs.ai)
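The "~0.1–0.3% of parameters updated" figure from the feature list can be sanity-checked with quick arithmetic. The dimensions below are assumptions matching the published Llama-3.2-1B config (hidden size 2048, 16 layers, 8 KV heads × head dim 64, roughly 1.24B total parameters); LoRA adds r·(d_in + d_out) parameters per adapted matrix:

```shell
# Rough check of LoRA's parameter footprint at rank r = 8 on q/k/v/o projections.
# Assumed dims: q_proj/o_proj are 2048x2048, k_proj/v_proj are 2048x512 (GQA).
per_layer=$(( 8*(2048+2048) + 8*(2048+512) + 8*(2048+512) + 8*(2048+2048) ))
total_lora=$(( per_layer * 16 ))                 # 16 transformer layers
echo "LoRA params: $total_lora"                  # prints 1703936
awk -v l="$total_lora" 'BEGIN { printf "fraction of base: %.3f%%\n", 100*l/1.24e9 }'
```

So roughly 1.7M trainable parameters, or about 0.14% of the base model, consistent with the claimed range.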
## Post-Training Steps
- Merged LoRA adapters into base model
- Converted to GGUF (FP16)
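The merge itself is typically done in Python with PEFT's `merge_and_unload()`; the GGUF conversion step can then be sketched as below. This assumes the merged Hugging Face checkpoint was saved to a local `merged-model/` directory (a hypothetical path) and that llama.cpp is cloned alongside it:

```shell
# Convert the merged HF checkpoint to a single FP16 GGUF file.
# convert_hf_to_gguf.py ships with llama.cpp (older revisions name it convert-hf-to-gguf.py).
python llama.cpp/convert_hf_to_gguf.py merged-model \
  --outtype f16 \
  --outfile llama3.2-1b-medical-reasonmed-fp16.gguf
```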
## Files in this Repository
`llama3.2-1b-medical-reasonmed-fp16.gguf`
Note: This is the FP16 GGUF file. Users can further quantize it locally using llama.cpp (e.g., to Q4_K_M, Q5_K_M, or Q8_0) for smaller file sizes and faster inference on lower-end hardware.
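For example, a local 4-bit quantization might look like this (the exact path to the `llama-quantize` binary depends on how you built llama.cpp, and the output filename is just a suggestion):

```shell
# Quantize the FP16 GGUF down to 4-bit; usage: llama-quantize <in.gguf> <out.gguf> <type>
./llama.cpp/llama-quantize \
  llama3.2-1b-medical-reasonmed-fp16.gguf \
  llama3.2-1b-medical-reasonmed-Q4_K_M.gguf \
  Q4_K_M
```

Q4_K_M is a common default trade-off; Q8_0 keeps more quality at roughly twice the size.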
## Inference Example (llama.cpp)
```shell
./llama-cli \
  -m llama3.2-1b-medical-reasonmed-fp16.gguf \
  --color --temp 0.7 --top-p 0.9 \
  -p "A patient presents with fever, cough, and shortness of breath. What is the most appropriate initial investigation?\nA. ECG\nB. Chest X-ray\nC. Blood culture\nD. CT pulmonary angiogram"
```

Note: recent llama.cpp builds ship the CLI as `llama-cli`; on older builds the equivalent binary was named `main`.
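As an alternative to invoking llama.cpp directly, the same GGUF can be registered with Ollama via a one-line Modelfile; the model name `medreason` below is arbitrary:

```shell
# Create a Modelfile pointing at the local GGUF, register the model, and query it.
printf 'FROM ./llama3.2-1b-medical-reasonmed-fp16.gguf\n' > Modelfile
ollama create medreason -f Modelfile
ollama run medreason "A patient presents with fever, cough, and shortness of breath. What is the most appropriate initial investigation?"
```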
## Limitations
- At 1B parameters, the model is best suited to lightweight / edge use cases
- Reasoning quality lags behind larger (7B–70B) medical models
- No additional instruction tuning or preference optimization (DPO/ORPO) yet
## Future Work
- DPO / ORPO alignment
- Fine-tuning on larger bases (Llama-3.2-3B, Meditron, etc.)
- Formal evaluation on MedQA, PubMedQA, MMLU-clinical
## License
- Code (training/merging scripts): MIT (see the GitHub repo)
- Base model: Meta Llama 3.2 Community License
- Fine-tuned weights & GGUF files: same as the base model, plus the ReasonMed dataset terms
## Acknowledgments
- ReasonMed authors (Yu Sun et al.)
- Meta AI for Llama-3.2
- JarvisLabs.ai for affordable H100 access
- Hugging Face, PEFT, and llama.cpp contributors
For questions or collaboration, open an issue on the linked GitHub repository or reach out via LinkedIn.