# Biomni-R0-32B LoRA Adapter (Method B, Dequantized Corrected, Rank 256)
This is a LoRA adapter extracted using Method B: extraction against a dequantized (INT4 → BF16) copy of the base model, which yields mathematically corrected weights.
## Adapter Details
| Parameter | Value |
|---|---|
| Method | Method B - Dequantized model LoRA extraction |
| Base Model | Qwen/Qwen3-32B (dequantized BF16) |
| Fine-tuned Model | biomni/Biomni-R0-32B-Preview |
| Rank (r) | 256 |
| Alpha | 256 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Extraction Tool | MergeKit (mergekit-extract-lora) |
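
To get a rough sense of adapter size: a rank-`r` LoRA on a `d_out × d_in` linear layer stores two factors, `B` of shape `(d_out, r)` and `A` of shape `(r, d_in)`, so it adds `r · (d_in + d_out)` parameters per targeted layer. A minimal sketch of the arithmetic; the hidden size below is an assumption for illustration, not read from the model config:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one rank-r LoRA pair: B (d_out x r) plus A (r x d_in)."""
    return r * (d_in + d_out)

r = 256
hidden = 5120  # assumed hidden size; check the Qwen3-32B config for exact dimensions

# Example: a square projection mapping hidden -> hidden at rank 256
print(f"one hidden->hidden projection adds {lora_params(hidden, hidden, r):,} params")
# -> 2,621,440 per layer under this assumption
```

At rank 256 this adds up quickly across seven target modules and all layers, which is why the adapter itself is a multi-gigabyte download.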
## Method B Explanation
Unlike Method A (direct extraction against the original base weights), Method B uses a dequantized base model:

1. Dequantize the quantized base model (INT4 → BF16).
2. Extract the LoRA using the dequantized model as the new "base".
3. Result: a mathematically corrected adapter that accounts for quantization artifacts.
This approach can yield better results when the original base model weights have drifted during fine-tuning.
## Usage with PEFT
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "hassanshka/Biomni-R0-32B-LoRA-Dequantized-Rank256")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Inference
messages = [{"role": "user", "content": "Your biomedical question here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Usage with vLLM
```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen3-32B",
    enable_lora=True,
    max_lora_rank=256,
)

sampling_params = SamplingParams(max_tokens=512)
prompts = ["Your biomedical question here"]

# Apply the LoRA at request time
outputs = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("biomni", 1, "hassanshka/Biomni-R0-32B-LoRA-Dequantized-Rank256"),
)
```
## Extraction Process
```bash
# Step 1: Dequantize the base model (see the dequant script in extraction_scripts/)
# Step 2: Extract the LoRA
mergekit-extract-lora \
  --model "biomni/Biomni-R0-32B-Preview" \
  --base-model "./dequantized_bf16_model" \
  --out-path "./lora_output" \
  --max-rank 256 \
  --device cuda
```
## License
Apache 2.0
## Citation
If you use this adapter, please cite both the original Qwen3 and Biomni models.