# microsoft__Phi-4-reasoning-plus_RTN_w3g128
This is a 3-bit RTN (Round-To-Nearest) quantized version of microsoft/Phi-4-reasoning-plus.
## Quantization Details
- Method: RTN (Round-To-Nearest)
- Bits: 3-bit
- Group Size: 128
- Base Model: microsoft/Phi-4-reasoning-plus
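For intuition, RTN quantization maps each group of 128 consecutive weights to the nearest of 2³ = 8 integer levels using a per-group scale and zero-point. The sketch below is an illustrative NumPy implementation of asymmetric group-wise RTN, not the exact routine used to produce this checkpoint; the function name and min/max scaling scheme are assumptions.

```python
import numpy as np

def rtn_quantize(weights, bits=3, group_size=128):
    """Illustrative round-to-nearest quantization with per-group scales.

    Assumed asymmetric min/max scheme; not the exact code used for this model.
    """
    qmax = 2 ** bits - 1  # 3-bit -> integer levels 0..7
    w = weights.reshape(-1, group_size)
    # One scale and zero-point per group of 128 weights
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / qmax
    zero = np.round(-w_min / scale)
    # Round to nearest integer level, then clip into the 3-bit range
    q = np.clip(np.round(w / scale + zero), 0, qmax)
    # Dequantize to see the reconstruction error RTN introduces
    dequant = (q - zero) * scale
    return q.astype(np.uint8), dequant.reshape(weights.shape)

w = np.random.randn(256).astype(np.float32)
q, w_hat = rtn_quantize(w)  # q holds 3-bit codes, w_hat the reconstruction
```

Smaller group sizes track the weight distribution more closely (lower error, more scale overhead); 128 is a common trade-off.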
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "quantpa/microsoft__Phi-4-reasoning-plus_RTN_w3g128"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Use the model for inference
inputs = tokenizer("What is 2 + 2?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Model Details
- Quantization: RTN 3-bit
- Original Model: microsoft/Phi-4-reasoning-plus
- Quantized by: quantpa