Phi-4 Technical Report
Paper: arXiv:2412.08905
Phi-4 is a state-of-the-art 14B-parameter Transformer designed for advanced reasoning, conversational AI, and high-quality text generation. Built on a mix of synthetic datasets, filtered public-domain content, academic books, and Q&A datasets, it achieves strong performance through data quality and alignment. It features a 16K-token context length and was trained on 9.8T tokens over 21 days using 1,920 H100-80G GPUs. Phi-4 underwent rigorous fine-tuning and preference optimization to enhance instruction adherence and safety. Released on December 12, 2024, it is a static model with a data cutoff of June 2024, suitable for diverse applications in research and dialogue systems.

Run with llama.cpp

# Install on Windows:
winget install llama.cpp

# Install from brew (macOS/Linux):
brew install llama.cpp

# Or download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Or build from source:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI
# (use ./build/bin/llama-server if you built from source):
llama-server -hf cortexso/phi-4

# Run inference directly in the terminal:
llama-cli -hf cortexso/phi-4

Run with Docker

docker model run hf.co/cortexso/phi-4
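Once llama-server is running, it can be queried like any OpenAI-compatible API. A minimal sketch, assuming llama-server's default port of 8080; the prompt text is illustrative:

# Query the local chat completions endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize what Phi-4 is in one sentence."}]}'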
Run with Cortex

| No | Variant | Cortex CLI command |
|---|---|---|
| 1 | Phi-4-14b | cortex run phi-4:14b |

# Or run the default variant from the cortexso/phi-4 repo:
cortex run phi-4
Available GGUF quantizations:

- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
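To pull a specific quantization with llama.cpp, append its tag to the repo name after a colon. A minimal sketch; the tag q4_k_m below is an assumption about this repo's naming, so check the file list on Hugging Face for the exact tags:

# Hypothetical quantization tag; replace with one actually published in the repo:
llama-cli -hf cortexso/phi-4:q4_k_m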