# Phi-4 4-bit (bitsandbytes)

Tags: text-generation, transformers, safetensors, English, phi3, phi, phi4, nlp, math, code, chat, conversational, custom_code, text-generation-inference, 4-bit precision, bitsandbytes

## Usage
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("fhamborg/phi-4-4bit-bnb", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("fhamborg/phi-4-4bit-bnb", trust_remote_code=True)

messages = [
    {"role": "user", "content": "Who are you?"},
]

# Build tokenized chat inputs and move them to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The slice `outputs[0][inputs["input_ids"].shape[-1]:]` drops the prompt tokens, so only the model's reply is decoded.
## Model Description
This is a 4-bit quantized version of the Phi-4 transformer model, optimized for memory-efficient inference while preserving most of the full-precision model's output quality.

- Base model: [microsoft/phi-4](https://huggingface.co/microsoft/phi-4)
- Quantization: 4-bit via bitsandbytes
- Format: `safetensors`
- Tokenizer: uses the standard `vocab.json` and `merges.txt`
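For reference, a checkpoint like this can be produced by loading the base model with a bitsandbytes quantization config and re-saving it. This is a minimal sketch, not the author's exact procedure; the NF4 quantization type and bfloat16 compute dtype are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize the base model to 4 bits on load
# (quant type and compute dtype are assumptions, not confirmed settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")

# Saving 4-bit bitsandbytes weights requires recent
# transformers/bitsandbytes versions; output is .safetensors shards.
model.save_pretrained("phi-4-4bit-bnb")
tokenizer.save_pretrained("phi-4-4bit-bnb")
```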
## Intended Use

- Fast inference with minimal VRAM usage (see the loading sketch after this list)
- Deployment in resource-constrained environments
- Low-latency text generation
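As a rough way to verify the VRAM savings on your own hardware, you can load the model and query its memory footprint. A minimal sketch, assuming `accelerate` is installed for `device_map="auto"`:

```python
from transformers import AutoModelForCausalLM

# Load the pre-quantized 4-bit weights onto available devices
model = AutoModelForCausalLM.from_pretrained(
    "fhamborg/phi-4-4bit-bnb",
    trust_remote_code=True,
    device_map="auto",
)

# Approximate memory taken by the model's parameters, in GB
# (activations and KV cache add to this at inference time)
print(f"{model.get_memory_footprint() / 1e9:.1f} GB")
```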
## Model Details

| Attribute | Value |
|---|---|
| Model name | Phi-4 4-bit (bitsandbytes) |
| Quantization | 4-bit (bitsandbytes) |
| File format | `.safetensors` |
| Tokenizer | `phi-4-tokenizer.json` |
| VRAM usage | ~X GB (depending on batch size) |
Alternatively, use the high-level `pipeline` helper, which applies the chat template automatically:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="fhamborg/phi-4-4bit-bnb", trust_remote_code=True)

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
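```

Standard `generate()` kwargs pass through the pipeline call. A small sketch of extracting just the assistant's reply (parameter values are illustrative, and the exact output structure may vary with the transformers version):

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="fhamborg/phi-4-4bit-bnb", trust_remote_code=True)
messages = [{"role": "user", "content": "Who are you?"}]

# Sampling parameters here are illustrative, not recommendations
out = pipe(messages, max_new_tokens=64, do_sample=True, temperature=0.7)

# For chat-style input, generated_text holds the full conversation;
# the last message is the assistant's reply.
print(out[0]["generated_text"][-1]["content"])
```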