Biomni-R0-32B LoRA Adapter (Method B, Dequantized Corrected, Rank 256)

This is a LoRA adapter extracted using Method B: LoRA extraction against a dequantized (INT4 → BF16) base model, which yields mathematically corrected adapter weights.

Adapter Details

| Parameter | Value |
|---|---|
| Method | Method B: dequantized-model LoRA extraction |
| Base Model | Qwen/Qwen3-32B (dequantized BF16) |
| Fine-tuned Model | biomni/Biomni-R0-32B-Preview |
| Rank (r) | 256 |
| Alpha | 256 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Extraction Tool | MergeKit (`mergekit-extract-lora`) |
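Since alpha equals the rank (256/256), the LoRA scaling factor alpha/r is 1.0, so the low-rank update is applied at full strength. A toy NumPy sketch of how the adapter modifies a linear layer's forward pass (dimensions are illustrative stand-ins, not the real 32B shapes):

```python
import numpy as np

# Toy-sized stand-ins for one adapted linear layer.
rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 4
alpha = r  # mirrors this adapter: alpha == rank, so scaling == 1.0

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # LoRA down-projection
B = rng.standard_normal((d_out, r))     # LoRA up-projection
x = rng.standard_normal(d_in)

scaling = alpha / r
h = W @ x + scaling * (B @ (A @ x))  # adapted forward pass
print(f"scaling factor: {scaling}")  # → scaling factor: 1.0
```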

Method B Explanation

Unlike Method A (direct extraction), Method B uses a dequantized base model:

  1. Dequantize the quantized model (INT4 → BF16)
  2. Extract LoRA using the dequantized model as the new "base"
  3. Result: Mathematically corrected adapter that accounts for quantization artifacts

This approach can yield better results when the original base model weights have drifted during fine-tuning.
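At its core, LoRA extraction factors the weight delta between the fine-tuned and base models via truncated SVD. The toy NumPy sketch below illustrates the idea (it is not the actual MergeKit implementation; the simulated update is exactly low-rank, so reconstruction is exact):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
rank = 8  # toy stand-in for the real rank of 256

W_base = rng.standard_normal((d, d))
# Simulate a fine-tuning update that is genuinely rank-8.
U = rng.standard_normal((d, rank))
V = rng.standard_normal((rank, d))
W_ft = W_base + U @ V

# Factor the delta into low-rank LoRA matrices B @ A via truncated SVD.
delta = W_ft - W_base
u, s, vt = np.linalg.svd(delta, full_matrices=False)
B = u[:, :rank] * s[:rank]  # shape (d, rank)
A = vt[:rank, :]            # shape (rank, d)

# Because the simulated update was rank-8, the rank-8 factorization
# reconstructs the delta up to floating-point error.
err = np.linalg.norm(delta - B @ A) / np.linalg.norm(delta)
print(f"relative reconstruction error: {err:.2e}")
```

In practice the delta is not exactly low-rank, so the truncation at rank 256 discards the smallest singular directions; extracting against the dequantized base keeps quantization error out of that delta.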

Usage with PEFT

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "hassanshka/Biomni-R0-32B-LoRA-Dequantized-Rank256")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Inference
messages = [{"role": "user", "content": "Your biomedical question here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Usage with vLLM

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen3-32B",
    enable_lora=True,
    max_lora_rank=256
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
prompts = ["Your biomedical question here"]

# Load the LoRA adapter at runtime
outputs = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("biomni", 1, "hassanshka/Biomni-R0-32B-LoRA-Dequantized-Rank256")
)

Extraction Process

# Step 1: Dequantize the base model (see dequant script in extraction_scripts/)
# Step 2: Extract LoRA
mergekit-extract-lora \
    --model "biomni/Biomni-R0-32B-Preview" \
    --base-model "./dequantized_bf16_model" \
    --out-path "./lora_output" \
    --max-rank 256 \
    --device cuda
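After extraction, it is worth sanity-checking the generated adapter_config.json in the output directory before publishing. A minimal check, assuming PEFT's standard field names (the inline JSON below is a trimmed example, not the full file):

```python
import json

expected_targets = {"q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"}

# Trimmed example of an adapter_config.json as PEFT writes it; in practice,
# read the real file from ./lora_output/adapter_config.json instead.
config = json.loads("""{
  "peft_type": "LORA",
  "r": 256,
  "lora_alpha": 256,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                     "gate_proj", "up_proj", "down_proj"]
}""")

assert config["r"] == 256 and config["lora_alpha"] == 256
assert set(config["target_modules"]) == expected_targets
print("adapter config OK")
```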

License

Apache 2.0

Citation

If you use this adapter, please cite both the original Qwen3 and Biomni models.
