How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Friehub/fwen-14b-v2:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Friehub/fwen-14b-v2:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Friehub/fwen-14b-v2:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Friehub/fwen-14b-v2:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Friehub/fwen-14b-v2:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Friehub/fwen-14b-v2:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Friehub/fwen-14b-v2:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Friehub/fwen-14b-v2:Q4_K_M
Use Docker
docker model run hf.co/Friehub/fwen-14b-v2:Q4_K_M
Quick Links

Fwen-14B-v2

Friehub + Qwen โ€” a fine-tuned 14B software engineering and CS tutor.

Base Model

Qwen2.5-14B-Instruct

Training

  • LoRA rank 8, alpha 16
  • 4-bit NF4 QLoRA with Unsloth
  • ~7K high-quality instruction pairs
  • 15 task types: code explanation, debugging, review, generation, complexity analysis, testing, modernization, full implementation, code completion, production scenarios, synthesis, diagrams, quizzes
  • Data mix: 40% code, 20% debug, 25% design, 15% docs
  • 2 epochs on A100-40GB (~164 min)

Capabilities

  • Explain CS concepts from 70+ textbooks
  • Write production-grade code in Python, Go, Rust, JS, TS, Java, C
  • Debug and review code
  • Analyze algorithm complexity
  • Synthesize across multiple sources
  • Generate Mermaid diagrams

Files

  • fwen-14b-q4_k_m-v2.gguf โ€” Q4_K_M quantization (8 GB, production)
  • fwen-14b-q8_0-v2.gguf โ€” Q8_0 quantization (14 GB, benchmark)
Downloads last month
53
GGUF
Model size
15B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support