leader-comment-summarizer: Ecclesiastical Comment Summarization (GGUF)

A fine-tuned Llama 3.2 3B Instruct model that summarizes ecclesiastical leader comments into concise, assignment-relevant summaries for missionary placement meetings. It strips endorsement boilerplate and focuses on actionable details (languages, health, skills, concerns).

Model Details

| Property            | Value                                                             |
|---------------------|-------------------------------------------------------------------|
| Base model          | meta-llama/Llama-3.2-3B-Instruct                                  |
| Fine-tuning method  | QLoRA via Unsloth (rank=16, alpha=32)                             |
| Training framework  | TRL SFTTrainer, completion-only loss                              |
| Training data       | 1,464 PII-obfuscated leader comments with gold-standard summaries |
| Quantization        | Q4_K_M (1.9 GB) via llama.cpp                                     |
| VRAM requirement    | ~3 GB (Q4_K_M)                                                    |
| Output format       | 30-40 word plain-text summary                                     |

Files

| File              | Size   | Description                                      |
|-------------------|--------|--------------------------------------------------|
| model-q4km.gguf   | 1.9 GB | Q4_K_M quantization (recommended)                |
| Modelfile         | -      | Ollama Modelfile with system prompt embedded     |
| system_prompt.txt | -      | System prompt (for API usage without Modelfile)  |
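
Both files can be pulled programmatically. A minimal sketch using the huggingface_hub client (files land in your local HF cache; the returned values are the resolved local paths):

# Sketch: fetch the model and prompt files with huggingface_hub.
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

repo = "rkevan/leader-comment-summarizer"
gguf_path = hf_hub_download(repo_id=repo, filename="model-q4km.gguf")
prompt_path = hf_hub_download(repo_id=repo, filename="system_prompt.txt")
print(gguf_path, prompt_path)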

Quick Start: Ollama

# Download the GGUF and Modelfile, then:
ollama create leader-summarizer -f Modelfile

# Call via API:
curl -s http://localhost:11434/api/chat -d '{
  "model": "leader-summarizer",
  "stream": false,
  "messages": [
    {"role": "user", "content": "[[Name]] is a wonderful young man with a strong testimony. He speaks fluent Spanish from living in [[City]] for three years. Has mild anxiety that is well-managed with medication. Very independent and hardworking. Parents served in the [[Mission]] mission."}
  ]
}'

Expected response:

Fluent Spanish from three years in a Spanish-speaking city. Mild anxiety, well-managed with medication. Independent and hardworking. Family mission service background.

Quick Start: Python

from llama_cpp import Llama

llm = Llama(model_path="model-q4km.gguf", n_ctx=2048, n_gpu_layers=-1)

# Replace with the raw leader comment you want summarized.
leader_comment_text = "Example leader comment text."
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": open("system_prompt.txt").read()},
        {"role": "user", "content": leader_comment_text},
    ],
    temperature=0.3,
    top_p=0.9,
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])

Input/Output Format

Input: Raw leader comment text (may contain PII placeholders like [[Name]], [[City]]).

Output: A 30-40 word plain-text summary focusing on assignment-relevant details.
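
If a consumer needs to enforce that length contract, a word-count check is enough. A minimal sketch (the 25-45 tolerance band mirrors the one used in the evaluation below):

# Sketch: check that a generated summary lands near the 30-40 word target.
def in_word_range(summary: str, lo: int = 25, hi: int = 45) -> bool:
    """True if the whitespace-delimited word count falls in [lo, hi]."""
    return lo <= len(summary.split()) <= hi

print(in_word_range("Too short to pass."))  # False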

What the Model Keeps

  • Languages spoken and proficiency
  • Health/medical conditions and management
  • Specific skills (musical, technical, athletic)
  • Concerns about independence or readiness
  • Personality traits affecting placement
  • Service preferences

What the Model Strips

  • General endorsement ("strong testimony", "wonderful young man")
  • Worthiness/recommend statements
  • Boilerplate language that applies to all candidates

Important Usage Notes

  • The Modelfile embeds the system prompt. When using Ollama with the provided Modelfile, you don't need to send a separate system message; just send the comment as the user message.
  • If using the raw GGUF (without Modelfile), include system_prompt.txt as the system message in every request.
  • Temperature 0.3 produces consistent, focused summaries. Higher values introduce variability.
  • max_tokens 128 is sufficient; summaries are typically 30-40 words. A sketch of passing these settings through the Ollama API follows below.
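
These settings map onto Ollama's request "options" field (num_predict is Ollama's cap on generated tokens). A sketch using only the Python standard library:

# Sketch: the Ollama call from the Quick Start, with the sampling
# settings above passed through the API's "options" field.
import json
import urllib.request

payload = {
    "model": "leader-summarizer",
    "stream": False,
    # The Modelfile already embeds the system prompt, so only the
    # leader comment goes in as the user message.
    "messages": [{"role": "user", "content": "Example leader comment text."}],
    "options": {"temperature": 0.3, "top_p": 0.9, "num_predict": 128},
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])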

Training Details

  • Method: QLoRA with Unsloth on WSL2 Ubuntu 24.04
  • GPU: NVIDIA RTX 1000 Ada (6 GB VRAM)
  • Epochs: 3
  • Learning rate: 2e-4 with cosine scheduler
  • Effective batch size: 8 (batch=2, grad_accum=4)
  • Final training loss: 0.4296
  • Final eval loss: 0.7495
  • Loss type: Completion-only (only trains on assistant response tokens)
  • LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
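
The actual training scripts are in the source repo linked below. Purely as a sketch of how the hyperparameters above fit together with Unsloth and TRL (this is not the author's script; the dataset here is a one-row placeholder, and the response template should be verified against your tokenizer's chat format):

# Sketch of the QLoRA recipe described above, using Unsloth + TRL.
from unsloth import FastLanguageModel
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA: 4-bit quantized base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset; the real run used 1,464 chat-formatted examples.
train_dataset = Dataset.from_list([
    {"text": tokenizer.apply_chat_template(
        [{"role": "system", "content": "Summarize the comment."},
         {"role": "user", "content": "Example leader comment."},
         {"role": "assistant", "content": "Example summary."}],
        tokenize=False)},
])

# Completion-only loss: mask everything before the assistant header so
# gradients flow only through the summary tokens.
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|start_header_id|>assistant<|end_header_id|>",
    tokenizer=tokenizer,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    data_collator=collator,
    args=TrainingArguments(
        num_train_epochs=3,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size 8
        output_dir="outputs",
    ),
)
trainer.train()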

Evaluation Results (258 held-out examples)

| Metric                       | Fine-tuned | Baseline (untuned 3B) |
|------------------------------|------------|-----------------------|
| Word count avg               | 36.4       | 33.9                  |
| In 25-45 word range          | 69.0%      | 91.9%                 |
| Endorsement boilerplate leak | 10.1%      | 18.6%                 |
| Format compliance            | 100%       | 100%                  |

Key win: the fine-tuned model filters endorsement boilerplate significantly better (10% vs 19% leak rate).
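
The exact scoring code lives in the source repo; the leak metric can be approximated by matching known endorsement phrases, roughly like this (the phrase list is purely illustrative):

# Sketch: approximate the endorsement-boilerplate leak metric by
# matching a (purely illustrative) list of stock phrases.
import re

BOILERPLATE = re.compile(
    r"strong testimony|wonderful young (?:man|woman)|worthy|recommend",
    re.IGNORECASE,
)

def leaks_boilerplate(summary: str) -> bool:
    return BOILERPLATE.search(summary) is not None

def leak_rate(summaries: list[str]) -> float:
    return sum(map(leaks_boilerplate, summaries)) / len(summaries)

print(leak_rate(["Fluent Spanish; mild, managed anxiety.",
                 "A wonderful young man with a strong testimony."]))  # 0.5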

Privacy Note

All training data was PII-obfuscated before use. Names, locations, schools, wards, and missions are replaced with [[Name]], [[City]], etc. The model has never seen real PII during training.
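
As a purely illustrative sketch of placeholder-style obfuscation (entity detection here is a hand-built lookup table; the real pipeline is part of the training repo, not this card):

# Sketch: placeholder-style PII obfuscation as described above.
import re

ENTITY_MAP = {
    "Name": ["John Smith"],        # illustrative entries only
    "City": ["Guadalajara"],
    "Mission": ["Mexico North"],
}

def obfuscate(text: str) -> str:
    for label, entities in ENTITY_MAP.items():
        for entity in entities:
            text = re.sub(re.escape(entity), f"[[{label}]]", text)
    return text

print(obfuscate("John Smith speaks Spanish from three years in Guadalajara."))
# -> [[Name]] speaks Spanish from three years in [[City]].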

Limitations

  • Trained on a specific style of ecclesiastical leader comments. May not generalize to other summarization tasks without additional training.
  • Endorsement leak rate is 10% โ€” some boilerplate still passes through.
  • Word count compliance (69% in 25-45 range) is lower than the untuned model (92%), though this is a tradeoff for better filtering.

Source Code

Training scripts and data pipeline: github.com/rkevan/AI-Experiments

Citation

@misc{leader-comment-summarizer-2026,
  title={leader-comment-summarizer: Fine-tuned Llama 3.2 3B for Ecclesiastical Comment Summarization},
  author={Robert Kevan},
  year={2026},
  url={https://huggingface.co/rkevan/leader-comment-summarizer}
}