Phi-3 Mini (GGUF Quantized - Q4_K_M)
Model Description
This repository contains a quantized GGUF version of the Phi-3 Mini 128K Instruct model.
- Base Model: microsoft/Phi-3-mini-128k-instruct
- Format: GGUF
- Quantization: Q4_K_M
- Framework: llama.cpp
This model is optimized for efficient local inference with reduced memory usage.
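A GGUF file can be identified from its fixed-size header alone, which is handy for sanity-checking a download before loading it. A minimal sketch using only the standard library — the field layout follows the public GGUF specification (v3), and `read_gguf_header` is a hypothetical helper, not part of llama.cpp:

```python
import struct

def read_gguf_header(path):
    """Read the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # Little-endian per the GGUF spec: uint32 version, uint64 tensor_count,
        # uint64 metadata_kv_count
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}
```

Everything after this header (the metadata key/value pairs, then tensor info and data) is what tools like llama.cpp parse to load the model.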
Intended Use
- Local LLM inference
- Chatbots
- Lightweight deployments
- CPU/GPU inference using llama.cpp
Model Details
Base Model
- Microsoft Phi-3 Mini (128K context)
Conversion
- Converted from Hugging Face format → GGUF
Quantization
- Method: Q4_K_M
- Tool: llama.cpp quantize
Q4_K_M quantization cuts the model to well under a third of its fp16 size while keeping output quality close to the full-precision model.
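The size reduction is easy to estimate from the bit width alone. A rough back-of-the-envelope sketch (assuming ~3.8B parameters for Phi-3 Mini and an average of roughly 4.5 bits per weight for Q4_K_M — mixed-precision K-quants don't use exactly 4 bits — and ignoring metadata overhead):

```python
def estimate_gguf_size_gb(n_params, bits_per_weight):
    """Rough file-size estimate: parameters x bits per weight, no metadata overhead."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 3.8e9                                # approximate Phi-3 Mini parameter count
fp16 = estimate_gguf_size_gb(n_params, 16)      # ~7.6 GB at full fp16 precision
q4_k_m = estimate_gguf_size_gb(n_params, 4.5)   # ~2.1 GB at ~4.5 bits/weight
```

Actual GGUF files land near these figures but not exactly on them, since different tensors are quantized at different precisions.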
How to Use
Using llama.cpp
```shell
./llama-cli -m model-q4.gguf -p "Explain AI simply"
```
Using llama-cpp-python

```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the quantized file from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="senthilsdglakhsg/phi3-Q4.gguf",
    filename="phi3-Q4.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
```