Qwen-72B-Math-NF4

NF4 quantized Qwen2.5-Math-72B-Instruct for mathematical reasoning.

Quantization

  • Method: bitsandbytes NF4 with double quantization
  • Compute dtype: bfloat16
  • Original model: Qwen/Qwen2.5-Math-72B-Instruct

Memory Requirements

Setup VRAM
Single GPU ~40GB
2x GPU ~20GB each

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "aphoticshaman/qwen-72b-math-nf4",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("aphoticshaman/qwen-72b-math-nf4")

prompt = "Prove that the sum of first n integers is n(n+1)/2."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Intended Use

  • AIMO/ARC Prize mathematical reasoning
  • Olympiad problem solving
  • Step-by-step proofs
  • Numerical computation

Author

Ryan J Cardwell X @Benthic_Shadow Zenodo.org aphoticshaman huggingface aphoticshaman

Downloads last month
1
Safetensors
Model size
73B params
Tensor type
BF16
F32
U8
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for aphoticshaman/qwen-72b-math-nf4

Base model

Qwen/Qwen2.5-72B
Quantized
(13)
this model