# Ollama Modelfile - Phi-4 Multimodal Instruct Q4_K_M
# Optimised for: Intel 11th Gen NUC, 8 GB RAM, CPU-only
#
# Source model : microsoft/Phi-4-multimodal-instruct
# License : MIT https://huggingface.co/microsoft/Phi-4-multimodal-instruct/blob/main/LICENSE
# Quantization : Q4_K_M via llama.cpp llama-quantize
# Architecture : phi3 (3.8B LLM backbone + vision/speech adapters in base GGUF)
FROM ./phi4-mm-Q4_K_M.gguf
# ── Context & KV cache ───────────────────────────────────────────────────────
# 8192 tokens balances capability vs. RAM on 8 GB hardware.
# Lower to 4096 if you observe OOM kills or heavy swapping.
PARAMETER num_ctx 8192
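# A quick way to watch memory pressure while testing this context size
# (commands are illustrative, Linux):
#   watch -n1 free -h        # RAM + swap usage while a prompt runs
#   dmesg | grep -i oom      # check for OOM-killer events afterwards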
# ── CPU tuning ───────────────────────────────────────────────────────────────
# An 11th Gen NUC typically has 4 cores / 8 logical threads (i5-1135G7 or
# i7-1165G7). Reduce to 4 if the NUC is a Core i3 variant.
PARAMETER num_thread 8
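# To confirm the logical thread count on your NUC before setting the value
# above (Linux):
#   nproc
#   lscpu | grep -E '^(CPU\(s\)|Thread)'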
# No discrete GPU: all layers run on CPU.
PARAMETER num_gpu 0
# Flash attention mainly benefits GPU inference; disable it for CPU-only use.
PARAMETER flash_attn false
# ── Generation defaults ──────────────────────────────────────────────────────
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
PARAMETER stop "<|end|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
# ── System prompt ────────────────────────────────────────────────────────────
SYSTEM """You are a helpful, accurate, and concise AI assistant. You excel at reasoning, analysis, writing, coding, and answering questions. Be direct and thorough."""
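# ── Usage ────────────────────────────────────────────────────────────────────
# A minimal sketch, assuming this file is saved as "Modelfile" alongside the
# GGUF (the model name "phi4-mm" is illustrative):
#   ollama create phi4-mm -f Modelfile
#   ollama run phi4-mm "Summarise the key points of this text: ..."
# Multimodal models accept image paths in the prompt, e.g.:
#   ollama run phi4-mm "Describe this image: ./photo.jpg"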