---
license: mit
---
# **Phi-4-mini-instruct GGUF Models**

This repository contains the **Phi-4-mini-instruct** model quantized using a specialized branch of **llama.cpp**:

🔗 [ns3284/llama.cpp](https://github.com/ns3284/llama.cpp/tree/master)

Special thanks to [@nisparks](https://github.com/nisparks) for adding support for **Phi-4-mini-instruct** in **llama.cpp**.
This branch is expected to be merged into the master branch soon; once it is, it's recommended to use the main **llama.cpp** repository instead.
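Until the merge lands, the branch can be built directly. A minimal sketch, assuming a Linux or macOS environment with `git` and CMake installed (build flags beyond the defaults are illustrative):

```shell
# Clone the branch that adds Phi-4-mini-instruct support
git clone https://github.com/ns3284/llama.cpp.git
cd llama.cpp

# Standard CMake build (add e.g. -DGGML_CUDA=ON for NVIDIA GPU offload)
cmake -B build
cmake --build build --config Release -j
```

The resulting binaries (such as `llama-cli` and `llama-quantize`) land under `build/bin/`.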
---
## **Included Files**

### `phi-4-mini-bf16.gguf`
- Model weights preserved in **BF16**.
- Use this if you want to **requantize** the model into a different format.
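Requantizing the BF16 file can be done with llama.cpp's `llama-quantize` tool; a sketch, where the target type `Q5_K_M` and the binary path are illustrative:

```shell
# Requantize the BF16 weights into any supported ggml quantization type
./build/bin/llama-quantize phi-4-mini-bf16.gguf phi-4-mini-q5_k_m.gguf Q5_K_M
```

Running `llama-quantize` with no arguments prints the full list of supported quantization types.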
### `phi-4-mini-bf16-q8.gguf`
- **Output & embeddings** remain in **BF16**.
- All other layers quantized to **Q8_0**.
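A mixed layout like this can be reproduced with `llama-quantize`'s per-tensor overrides. The flags below exist in recent llama.cpp builds, but treat the exact type spellings as assumptions:

```shell
# Keep the output and token-embedding tensors in BF16,
# quantize everything else to Q8_0
./build/bin/llama-quantize \
  --output-tensor-type bf16 \
  --token-embedding-type bf16 \
  phi-4-mini-bf16.gguf phi-4-mini-bf16-q8.gguf Q8_0
```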
### `phi-4-mini-q4_k_l.gguf`
- **Output & embeddings** quantized to **Q8_0**.
- All other layers quantized to **Q4_K**.
- **Note:** No importance matrix (imatrix) quantization was applied, so default **llama.cpp** quantization settings are used.
### `phi-4-mini-q6_k.gguf`
- All layers quantized to **Q6_K**, using **default quantization settings**.
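Any of the files above can be run directly with llama.cpp's CLI. A minimal sketch; the prompt and context size are illustrative:

```shell
# Load the Q4_K_L file with a 4096-token context and a one-shot prompt
./build/bin/llama-cli \
  -m phi-4-mini-q4_k_l.gguf \
  -c 4096 \
  -p "Explain GGUF quantization in one sentence."
```

Smaller quantizations (Q4_K) trade some accuracy for lower memory use; Q8_0 and the BF16 mixes stay closer to the original weights at a larger file size.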