How to use from the llama-cpp-python library
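# !pip install llama-cpp-python

from llama_cpp import Llama

# Downloads the GGUF file from the Hub on first use (requires huggingface-hub).
llm = Llama.from_pretrained(
	repo_id="Friehub/fwen-14b-v1",
	filename="fwen-14b-q4_k_m-v1.gguf",
)
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The result is an OpenAI-style dict; print only the assistant's reply.
print(response["choices"][0]["message"]["content"])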

Fwen-14B-v1

Friehub + Qwen: a fine-tuned 14B software engineering and CS tutor.

Trained on 7,106 instruction pairs extracted from 76 textbooks spanning algorithms, system design, networking, databases, compilers, and programming languages (Python, Go, Rust, JavaScript, C++). Answers are grounded in real engineering source material, not internet-scale guesswork.

Model Details

Property              Value
Base Model            Qwen2.5-14B-Instruct
Fine-tuning           QLoRA (4-bit NF4), LoRA rank 8, alpha 16, dropout 0.05
Training Data         7,106 high-quality QA pairs (balanced, deduplicated, task-validated)
Task Types            15 modes: code explanation, debugging, review, generation, complexity analysis, testing, modernization, full implementation, code completion, production scenarios, cross-source synthesis, diagram generation, prose, math, quizzes
Data Mix              40% code, 20% debug, 25% design, 15% docs/Q&A
Epochs                2
Effective Batch Size  32 (micro-batch 1 × 32 gradient-accumulation steps)
Learning Rate         1e-4
Warmup                5%
GPU                   A100-40GB
Training Time         164 min
Language Coverage     Python, Go, Rust, JavaScript, TypeScript, Java, C, C++, SQL
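The table's hyperparameters imply a few derived numbers: the optimizer step count, the warmup length, the LoRA scaling factor, and roughly how many pairs fall in each Data Mix category. The Python sketch below works these out. It assumes no sequence packing, that a trailing partial accumulation window still counts as one optimizer step, that warmup is rounded up, and that the Data Mix percentages apply to all 7,106 pairs. The card states none of these explicitly, so treat the results as estimates rather than the actual training log.

```python
import math

# Values copied from the Model Details table.
NUM_PAIRS = 7_106
MICRO_BATCH = 1
GRAD_ACCUM = 32
EPOCHS = 2
WARMUP_RATIO = 0.05
LORA_RANK, LORA_ALPHA = 8, 16
DATA_MIX = {"code": 40, "debug": 20, "design": 25, "docs/qa": 15}  # percent


def optimizer_steps(num_pairs: int, micro_batch: int, grad_accum: int, epochs: int) -> int:
    # Assumption: no packing, and a trailing partial accumulation window
    # still triggers one optimizer step (hence ceil).
    per_epoch = math.ceil(num_pairs / (micro_batch * grad_accum))
    return per_epoch * epochs


def apportion(total: int, percents: dict[str, int]) -> dict[str, int]:
    # Largest-remainder rounding, so the per-category counts sum exactly to `total`.
    floors = {k: total * p // 100 for k, p in percents.items()}
    remainders = {k: total * p % 100 for k, p in percents.items()}
    leftover = total - sum(floors.values())
    for k in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        floors[k] += 1
    return floors


steps = optimizer_steps(NUM_PAIRS, MICRO_BATCH, GRAD_ACCUM, EPOCHS)
warmup = math.ceil(steps * WARMUP_RATIO)  # assumption: warmup ratio is rounded up
print("optimizer steps:", steps, "| warmup steps:", warmup)
print("LoRA scaling (alpha / rank):", LORA_ALPHA / LORA_RANK)
print("approx. pairs per category:", apportion(NUM_PAIRS, DATA_MIX))
```

Under these assumptions the run is about 446 optimizer steps (223 per epoch) with 23 warmup steps. Standard LoRA scales the adapter update by alpha / rank = 2.0.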

Evaluation (50 held-out QA pairs)

Metric              Value
Average QA Score    3.3/5
Hallucination Rate  ~2%
Strongest Domains   Algorithms, Databases, Linux, Web APIs
Weakest Domains     Interview prep, abstract design principles

Scored by human audit against ground-truth answers generated from the same source textbooks.
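The evaluation set has only 50 items, so the hallucination rate carries wide uncertainty. The sketch below computes a Wilson score interval (a standard binomial confidence interval, not something the card reports). It assumes "~2%" means 1 flagged answer out of 50.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 gives ~95% coverage."""
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Assumption: ~2% hallucination rate = 1 of the 50 held-out answers.
low, high = wilson_interval(1, 50)
print(f"hallucination rate, 95% interval: {low:.1%} to {high:.1%}")
```

Under that assumption, the interval runs from roughly 0.4% to 10.5%. A larger held-out set would be needed to pin the rate down more tightly.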

Capabilities

  • Explain CS concepts from 70+ textbooks with source attribution
  • Write production-grade code in Python, Go, Rust, JS, TS, Java, C with modern syntax (2024 editions)
  • Debug and review code: find subtle bugs, suggest improvements with rationale
  • Analyze algorithm complexity: trace loops, recursion, data structures
  • Synthesize across books: when sources disagree, attribute each position
  • Generate Mermaid diagrams: architecture, sequences, class hierarchies
  • Multi-turn tutoring: conversational teaching with follow-ups

Files

File                     Size   Use
fwen-14b-q4_k_m-v1.gguf  8 GB   Production serving (Q4_K_M quantization)
fwen-14b-q8_0-v1.gguf    14 GB  Benchmark evaluation (Q8_0 quantization)
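When choosing between the two files, runtime memory is roughly the file size plus the KV cache for the chosen context length. The sketch below estimates this. The layer count, KV-head count, and head dimension are assumed from the Qwen2.5-14B base model and should be checked against the GGUF metadata. It also assumes an f16 KV cache and ignores compute buffers and other runtime overhead, so actual usage will be somewhat higher.

```python
# Assumed architecture of the Qwen2.5-14B base model; confirm against the GGUF
# metadata (qwen2.block_count, qwen2.attention.head_count_kv) before relying on it.
N_LAYERS = 48
N_KV_HEADS = 8
HEAD_DIM = 128  # hidden size 5120 / 40 attention heads


def kv_cache_bytes(n_ctx: int, bytes_per_elem: int = 2) -> int:
    # One key and one value vector (factor 2) per layer, KV head, and token.
    # bytes_per_elem=2 corresponds to an f16 cache.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx * bytes_per_elem


def rough_memory_gb(file_size_gb: float, n_ctx: int) -> float:
    # Weights (GGUF file size, taking GB = 1e9 bytes) plus the KV cache only.
    return file_size_gb + kv_cache_bytes(n_ctx) / 1e9


for name, size_gb in [("fwen-14b-q4_k_m-v1.gguf", 8), ("fwen-14b-q8_0-v1.gguf", 14)]:
    print(f"{name}: ~{rough_memory_gb(size_gb, 4096):.1f} GB at a 4096-token context")
```

With these assumptions, a 4096-token f16 cache adds about 0.8 GB. The cache grows linearly with context length.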

Usage

Ollama

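Create a Modelfile whose FROM line points at a downloaded GGUF file (for example, FROM ./fwen-14b-q4_k_m-v1.gguf), then build and run the model:

ollama create fwen:14b -f Modelfile
ollama run fwen:14b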

llama.cpp

./llama-cli -m fwen-14b-q4_k_m-v1.gguf -p "Explain Rust ownership" -n 512

HF Transformers

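from transformers import AutoModelForCausalLM, AutoTokenizer
# The repo lists only GGUF files; gguf_file loads and dequantizes one (needs the gguf package).
tokenizer = AutoTokenizer.from_pretrained("friehub/fwen-14b-v1", gguf_file="fwen-14b-q8_0-v1.gguf")
model = AutoModelForCausalLM.from_pretrained("friehub/fwen-14b-v1", gguf_file="fwen-14b-q8_0-v1.gguf")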
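Loading the model is only half of inference. The sketch below shows one chat turn with a model and tokenizer loaded through Transformers. It assumes a Transformers release whose apply_chat_template accepts return_dict, and that the loaded tokenizer carries Qwen2.5's chat template. generate_reply is a hypothetical helper name, not a library function.

```python
from typing import Any


def generate_reply(model: Any, tokenizer: Any, messages: list[dict[str, str]],
                   max_new_tokens: int = 512) -> str:
    # Render the conversation with the chat template and append the assistant
    # prefix so the model starts a new reply.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )
    # If the model sits on a GPU, move the tensors first: inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns prompt + completion; decode only the newly generated tokens.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

For example, with model and tokenizer loaded via from_pretrained: generate_reply(model, tokenizer, [{"role": "user", "content": "Explain Rust ownership"}]). Dequantized GGUF weights are held in full precision, so this path needs far more memory than the llama.cpp or Ollama routes.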

Limitations

  • Trained on 7K pairs, which is smaller than typical fine-tuning datasets. Expect adequate but not expert-level depth.
  • Rust and Go are underrepresented in the training data (v2 will address this).
  • Source attribution is sometimes vague (e.g. "the text emphasizes" rather than "Chapter 4 of Code Simplicity states").

Training Pipeline

  1. PDF ingestion → domain classification → code/math/prose unit extraction (212,616 units)
  2. Instruction-pair generation for 15 task types with DeepSeek V4 Pro/Flash
  3. Quality filter: length, dedup, relevance (cosine similarity), task validators
  4. Language balance: Python code pairs capped at 25%
  5. QLoRA fine-tune with Unsloth on A100-40GB
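Steps 3 and 4 can be sketched in Python as follows. This is an illustration, not the project's actual filter. The Pair fields, the thresholds (min_answer_chars, max_answer_chars, min_relevance), and the bag-of-words cosine similarity are all hypothetical; the card does not say which vectors were compared. The task validators are omitted. The 25% cap is read as "Python pairs make up at most a quarter of the final set".

```python
import math
import re
from collections import Counter
from fractions import Fraction
from typing import TypedDict


class Pair(TypedDict):  # hypothetical record layout
    question: str
    answer: str
    source: str    # text of the textbook unit the pair was generated from
    language: str  # e.g. "python", "go", or "" for non-code pairs


def _tokens(text: str) -> Counter[str]:
    # Bag-of-words term counts; a stand-in for whatever vectors the real pipeline used.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def quality_filter(pairs: list[Pair], min_answer_chars: int = 40,
                   max_answer_chars: int = 8000, min_relevance: float = 0.2) -> list[Pair]:
    kept: list[Pair] = []
    seen: set[str] = set()
    for p in pairs:
        # Length: drop truncated or runaway answers (thresholds are hypothetical).
        if not (min_answer_chars <= len(p["answer"]) <= max_answer_chars):
            continue
        # Dedup: skip questions matching an already-kept one after case/whitespace normalization.
        key = " ".join(p["question"].lower().split())
        if key in seen:
            continue
        # Relevance: the answer must share enough vocabulary with its source unit.
        if cosine(p["answer"], p["source"]) < min_relevance:
            continue
        seen.add(key)
        kept.append(p)
    return kept


def cap_language(pairs: list[Pair], language: str = "python",
                 cap: Fraction = Fraction(1, 4)) -> list[Pair]:
    # Keep only as many `language` pairs as allow them to form at most `cap` of the result.
    target = [p for p in pairs if p["language"] == language]
    others = [p for p in pairs if p["language"] != language]
    max_target = math.floor(len(others) * cap / (1 - cap))
    return others + target[:max_target]
```

Using Fraction keeps the cap arithmetic exact. With 6 non-Python pairs, at most 2 Python pairs survive, giving exactly 25%. A production pipeline would more likely compare embedding vectors and sample the capped subset rather than keep the first pairs in order.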

Citation

@misc{fwen-14b-v1,
  author = {Friehub},
  title = {Fwen-14B: A Fine-Tuned Software Engineering Tutor},
  version = {v1},
  year = 2026,
  url = {https://huggingface.co/friehub/fwen-14b-v1}
}
GGUF metadata: model size 15B params, architecture qwen2