# Biomni-R0-32B LoRA Adapter (Method A, Rank 256)
This is a LoRA adapter extracted from the Biomni-R0-32B model using the original Qwen3-32B as the base model.
## Adapter Details
| Parameter | Value |
|---|---|
| Method | Method A - Direct LoRA extraction |
| Base Model | Qwen/Qwen3-32B |
| Fine-tuned Model | biomni/Biomni-R0-32B-Preview |
| Rank (r) | 256 |
| Alpha | 256 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Extraction Tool | MergeKit (mergekit-extract-lora) |
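At rank 256 across all attention and MLP projections, this is a fairly large adapter. A rough parameter count can be computed from the rank and the projection shapes; the Qwen3-32B dimensions below (hidden size 5120, 64 layers, 64 query / 8 KV heads of head dim 128, MLP size 25600) are assumptions taken from the public base-model config, not from this repository:

```python
# Rough LoRA parameter count: each adapted Linear(d_in, d_out) gains r * (d_in + d_out) params.
def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)

# Assumed Qwen3-32B shapes (hidden 5120, 64 layers, head_dim 128, 64 Q / 8 KV heads, MLP 25600)
hidden, layers, head_dim, q_heads, kv_heads, mlp, r = 5120, 64, 128, 64, 8, 25600, 256

per_layer = (
    lora_params(hidden, q_heads * head_dim, r)     # q_proj
    + lora_params(hidden, kv_heads * head_dim, r)  # k_proj
    + lora_params(hidden, kv_heads * head_dim, r)  # v_proj
    + lora_params(q_heads * head_dim, hidden, r)   # o_proj
    + 2 * lora_params(hidden, mlp, r)              # gate_proj, up_proj
    + lora_params(mlp, hidden, r)                  # down_proj
)
total = per_layer * layers
print(f"~{total / 1e9:.1f}B adapter parameters")  # roughly 2.1B at rank 256
```

Under these assumed shapes, the adapter carries on the order of 2 billion parameters, so expect a multi-gigabyte download rather than the few hundred megabytes typical of low-rank adapters.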
## Usage with PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "hassanshka/Biomni-R0-32B-LoRA-Rank256")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Inference
messages = [{"role": "user", "content": "Your biomedical question here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
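Because alpha equals the rank here (256/256), the LoRA scaling factor alpha/r is 1.0 and the low-rank delta is added to the base weights unscaled. A toy numpy sketch of that update rule, using illustrative dimensions rather than the real model's:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 4, 4  # toy shapes; the real adapter uses r = alpha = 256

W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(r, d_in))      # LoRA down-projection
B = rng.normal(size=(d_out, r))     # LoRA up-projection

scale = alpha / r               # 1.0 when alpha == r, as in this adapter
W_merged = W + scale * (B @ A)  # the per-weight update applied when merging the adapter

x = rng.normal(size=(d_in,))
# Applying the merged weight equals the base output plus the low-rank correction
assert np.allclose(W_merged @ x, W @ x + scale * (B @ (A @ x)))
```

If you want a standalone merged model instead of base + adapter at runtime, PEFT's `merge_and_unload()` performs this same addition for every adapted weight.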
## Usage with vLLM

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen3-32B",
    enable_lora=True,
    max_lora_rank=256
)

# Load the LoRA adapter at runtime
prompts = ["Your biomedical question here"]
sampling_params = SamplingParams(max_tokens=512)
output = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("biomni", 1, "hassanshka/Biomni-R0-32B-LoRA-Rank256")
)
```
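The adapter can also be served behind vLLM's OpenAI-compatible API. This launch command is a sketch assuming a recent vLLM build with LoRA support; the served adapter name `biomni` is arbitrary:

```shell
vllm serve Qwen/Qwen3-32B \
  --enable-lora \
  --max-lora-rank 256 \
  --lora-modules biomni=hassanshka/Biomni-R0-32B-LoRA-Rank256
```

Requests can then select the adapter by passing `model="biomni"` instead of the base model name.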
## Extraction Process

The LoRA was extracted using MergeKit:

```shell
mergekit-extract-lora \
  --model "biomni/Biomni-R0-32B-Preview" \
  --base-model "Qwen/Qwen3-32B" \
  --out-path "./lora_output" \
  --max-rank 256 \
  --device cuda
```
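After extraction, it is worth confirming that the written `adapter_config.json` matches the settings in the table above. The helper below is an illustrative sketch (the expected values mirror this card; the inline sample dict stands in for a config loaded from `./lora_output/adapter_config.json`):

```python
import json

# Expected settings, mirroring the Adapter Details table above
EXPECTED = {
    "r": 256,
    "lora_alpha": 256,
    "target_modules": {"q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"},
}

def check_adapter_config(cfg: dict) -> list:
    """Return a list of mismatches between cfg and the expected settings."""
    problems = []
    if cfg.get("r") != EXPECTED["r"]:
        problems.append(f"rank is {cfg.get('r')}, expected {EXPECTED['r']}")
    if cfg.get("lora_alpha") != EXPECTED["lora_alpha"]:
        problems.append(f"alpha is {cfg.get('lora_alpha')}, expected {EXPECTED['lora_alpha']}")
    if set(cfg.get("target_modules", [])) != EXPECTED["target_modules"]:
        problems.append("target_modules differ from the expected set")
    return problems

# Sample config standing in for json.load(open("./lora_output/adapter_config.json"))
cfg = {"r": 256, "lora_alpha": 256,
       "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj"]}
assert check_adapter_config(cfg) == []
```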
## License
Apache 2.0
## Citation
If you use this adapter, please cite both the original Qwen3 and Biomni models.