Biomni-R0-32B LoRA Adapter (Method B, Dequantized Corrected, Rank 256)

This LoRA adapter was extracted with Method B: the adapter is computed against a dequantized (INT4 → BF16) version of the Biomni model, with the aim of producing mathematically corrected weights.

Adapter Details

Parameter          Value
Method             Method B - Dequantized model LoRA extraction
Base Model         Qwen/Qwen3-32B (dequantized BF16)
Fine-tuned Model   biomni/Biomni-R0-32B-Preview
Rank (r)           256
Alpha              256
Target Modules     q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Extraction Tool    MergeKit (mergekit-extract-lora)
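
With rank and alpha both set to 256, the adapter update is applied at unit scale under the standard LoRA convention (scaling = alpha / r). The small sketch below works that out; the helper name `lora_scaling` is hypothetical, and the rank-stabilized branch is included only for comparison, since the card does not say that variant is used.

```python
import math

def lora_scaling(alpha: float, rank: int, use_rslora: bool = False) -> float:
    """Factor by which the low-rank update B @ A is multiplied before it is added to W."""
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(rank)
    return alpha / rank

scale = lora_scaling(alpha=256, rank=256)
print(scale)  # 1.0
```

A scale of 1.0 means the effective weight is simply W + B @ A, with no extra rescaling of the extracted factors.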

Method B Explanation

Unlike Method A (direct extraction), Method B uses a dequantized base model:

  1. Dequantize the quantized model (INT4 → BF16)
  2. Extract LoRA using the dequantized model as the new "base"
  3. Result: Mathematically corrected adapter that accounts for quantization artifacts

This approach can yield better results when the original base model weights have drifted during fine-tuning.
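
As a rough illustration of what extraction against a chosen base involves, the sketch below factors the difference between a fine-tuned weight matrix and a base weight matrix with a truncated SVD. It is a minimal NumPy sketch on toy matrices, not the MergeKit implementation: the function name `extract_lora`, the matrix sizes, and the choice to split the singular values evenly between the two factors are all assumptions made for illustration. In Method B, `w_base` would stand for the dequantized weights.

```python
import numpy as np

def extract_lora(w_finetuned: np.ndarray, w_base: np.ndarray, rank: int):
    """Return (lora_a, lora_b) so that lora_b @ lora_a is the best
    rank-`rank` approximation (in Frobenius norm) of w_finetuned - w_base."""
    delta = w_finetuned - w_base                      # (out_features, in_features)
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    r = min(rank, s.shape[0])
    sqrt_s = np.sqrt(s[:r])
    lora_b = u[:, :r] * sqrt_s                        # (out_features, r)
    lora_a = sqrt_s[:, None] * vt[:r, :]              # (r, in_features)
    return lora_a, lora_b

# Toy check: a base matrix plus a genuinely rank-4 update.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((64, 48))
true_delta = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 48))
w_ft = w_base + true_delta

lora_a, lora_b = extract_lora(w_ft, w_base, rank=4)
rel_err = np.linalg.norm(lora_b @ lora_a - true_delta) / np.linalg.norm(true_delta)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Because the toy update has rank 4, a rank-4 extraction recovers it up to floating-point error. For real weight deltas the truncation discards the smaller singular values, so the chosen rank (256 here) controls how much of the delta the adapter keeps.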

Usage with PEFT

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "hassanshka/Biomni-R0-32B-LoRA-Dequantized-Rank256")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

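# Inference
messages = [{"role": "user", "content": "Your biomedical question here"}]
# add_generation_prompt=True appends the assistant turn marker so the model starts a reply
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))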

Usage with vLLM

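from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Enable LoRA support; max_lora_rank must be at least the adapter rank (256)
llm = LLM(
    model="Qwen/Qwen3-32B",
    enable_lora=True,
    max_lora_rank=256
)

prompts = ["Your biomedical question here"]
sampling_params = SamplingParams(max_tokens=512)

# Apply the LoRA adapter at request time
outputs = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("biomni", 1, "hassanshka/Biomni-R0-32B-LoRA-Dequantized-Rank256")
)
print(outputs[0].outputs[0].text)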

Extraction Process

# Step 1: Dequantize the base model (see dequant script in extraction_scripts/)
# Step 2: Extract LoRA
mergekit-extract-lora \
    --model "biomni/Biomni-R0-32B-Preview" \
    --base-model "./dequantized_bf16_model" \
    --out-path "./lora_output" \
    --max-rank 256 \
    --device cuda
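
For readers unfamiliar with Step 1, the sketch below shows a generic group-wise asymmetric INT4 scheme and its inverse on a toy matrix. It is not the dequantization script from extraction_scripts/: the function names, the group size of 32, and the min/max quantization scheme are assumptions for illustration, and float32 stands in for BF16 because NumPy has no BF16 type.

```python
import numpy as np

def quantize_int4(w: np.ndarray, group_size: int):
    """Quantize each row in groups of `group_size` values to integers in [0, 15]."""
    out_f, in_f = w.shape
    groups = w.reshape(out_f, in_f // group_size, group_size)
    w_min = groups.min(axis=2)
    w_max = groups.max(axis=2)
    scales = (w_max - w_min) / 15.0
    scales = np.where(scales == 0, 1.0, scales)       # guard against constant groups
    zeros = -w_min / scales                           # floating-point zero point per group
    q = np.clip(np.round(groups / scales[..., None] + zeros[..., None]), 0, 15)
    return q.reshape(out_f, in_f).astype(np.uint8), scales, zeros

def dequantize_int4(q: np.ndarray, scales: np.ndarray, zeros: np.ndarray, group_size: int):
    """Map INT4 codes back to floating point: w_hat = (q - zero) * scale."""
    scales_full = np.repeat(scales, group_size, axis=1)
    zeros_full = np.repeat(zeros, group_size, axis=1)
    return (q.astype(np.float32) - zeros_full) * scales_full

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
q, scales, zeros = quantize_int4(w, group_size=32)
w_hat = dequantize_int4(q, scales, zeros, group_size=32)
max_err = float(np.abs(w - w_hat).max())
print(f"max abs quantization error: {max_err:.4f}")
```

The dequantized matrix differs from the original by at most half a quantization step per element. That residual is the kind of quantization artifact that extracting against the dequantized weights is meant to account for.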

License

Apache 2.0

Citation

If you use this adapter, please cite both the original Qwen3 and Biomni models.
