How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kaitchup/Phi-3.5-Mini-instruct-AutoRound-4bit", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kaitchup/Phi-3.5-Mini-instruct-AutoRound-4bit", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("kaitchup/Phi-3.5-Mini-instruct-AutoRound-4bit", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Model Details

This is microsoft/Phi-3.5-mini-instruct quantized to 4-bit with AutoRound, using symmetric quantization for compatibility with Marlin. The model was created, tested, and evaluated by The Kaitchup.
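
To make "symmetric 4-bit" concrete, here is a minimal sketch that quantizes one small group of weights with a single scale and no zero-point offset, so that 0.0 maps exactly to the integer 0. It uses plain round-to-nearest only to illustrate the number format; AutoRound itself tunes how weights are rounded, so this is not the AutoRound algorithm. The function names, the choice of scale (largest magnitude mapped to 7), and the sample weights are assumptions made for illustration.

```python
def quantize_symmetric_4bit(weights):
    """Round-to-nearest symmetric quantization of one weight group (illustrative only)."""
    max_abs = max(abs(w) for w in weights)
    # Assumption: map the largest magnitude to 7; fall back to 1.0 for an all-zero group.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Signed 4-bit integers cover [-8, 7]; clamp after rounding.
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate weights: no zero point, just integer times scale."""
    return [q * scale for q in quantized]


weights = [0.7, -0.3, 0.0, 0.1]  # hypothetical weight group
quantized, scale = quantize_symmetric_4bit(weights)
restored = dequantize(quantized, scale)
print(quantized, scale, restored)
```

Because there is no zero point, dequantization is a single multiplication per weight, which is the property that symmetric formats offer to fast inference kernels.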

Details on the quantization process, the evaluation, and how to use the model are available here: Fine-tuning Phi-3.5 MoE and Mini on Your Computer

  • Developed by: The Kaitchup
  • Language(s) (NLP): English
  • License: cc-by-4.0
  • Model size: 4B params (Safetensors)
  • Tensor type: I32, F16
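
The I32 tensor type is consistent with 4-bit weights being stored packed inside 32-bit integers, eight values per word, which is a common storage convention for 4-bit checkpoints. The sketch below shows that idea only; that this checkpoint uses this convention, the nibble order (lowest four bits first), and the function names are all assumptions and may differ from the layout actually used in the model files.

```python
def pack_int4(values):
    """Pack eight unsigned 4-bit values (0..15) into one 32-bit word, lowest nibble first."""
    if len(values) != 8:
        raise ValueError("expected exactly eight 4-bit values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError("each value must fit in 4 bits")
        word |= v << (4 * i)  # place value i in bits 4*i .. 4*i+3
    return word


def unpack_int4(word):
    """Inverse of pack_int4: extract the eight 4-bit fields from a 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


nibbles = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical quantized values
packed = pack_int4(nibbles)
print(hex(packed), unpack_int4(packed))
```

Packing this way stores eight weights in the space of one 32-bit integer, which is why a 4-bit model file reports an integer tensor type alongside F16 for the tensors kept in half precision.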