leader-comment-summarizer: Ecclesiastical Comment Summarization (GGUF)

A fine-tuned Llama 3.2 3B Instruct model that summarizes ecclesiastical leader comments into concise, assignment-relevant summaries for missionary placement meetings. It strips endorsement boilerplate and focuses on actionable details (languages, health, skills, concerns).

Model Details

| Property            | Value                                                             |
|---------------------|-------------------------------------------------------------------|
| Base model          | meta-llama/Llama-3.2-3B-Instruct                                  |
| Fine-tuning method  | QLoRA via Unsloth (rank=16, alpha=32)                             |
| Training framework  | TRL SFTTrainer, completion-only loss                              |
| Training data       | 1,464 PII-obfuscated leader comments with gold-standard summaries |
| Quantization        | Q4_K_M (1.9 GB) via llama.cpp                                     |
| VRAM requirement    | ~3 GB (Q4_K_M)                                                    |
| Output format       | 30-40 word plain-text summary                                     |

Files

| File              | Size   | Description                                      |
|-------------------|--------|--------------------------------------------------|
| model-q4km.gguf   | 1.9 GB | Q4_K_M quantization (recommended)                |
| Modelfile         | -      | Ollama Modelfile with system prompt embedded     |
| system_prompt.txt | -      | System prompt (for API usage without Modelfile)  |
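
Both files can be pulled programmatically. A minimal sketch using the huggingface_hub client (files land in your local HF cache; the returned values are the resolved local paths):

# Sketch: fetch the model and prompt files with huggingface_hub.
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

repo = "rkevan/leader-comment-summarizer"
gguf_path = hf_hub_download(repo_id=repo, filename="model-q4km.gguf")
prompt_path = hf_hub_download(repo_id=repo, filename="system_prompt.txt")
print(gguf_path, prompt_path)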

Quick Start: Ollama

# Download the GGUF and Modelfile, then:
ollama create leader-summarizer -f Modelfile

# Call via API:
curl -s http://localhost:11434/api/chat -d '{
  "model": "leader-summarizer",
  "stream": false,
  "messages": [
    {"role": "user", "content": "[[Name]] is a wonderful young man with a strong testimony. He speaks fluent Spanish from living in [[City]] for three years. Has mild anxiety that is well-managed with medication. Very independent and hardworking. Parents served in the [[Mission]] mission."}
  ]
}'

Expected response:

Fluent Spanish from three years in a Spanish-speaking city. Mild anxiety, well-managed with medication. Independent and hardworking. Family mission service background.

Quick Start: Python

from llama_cpp import Llama

llm = Llama(model_path="model-q4km.gguf", n_ctx=2048, n_gpu_layers=-1)

# Replace with the raw leader comment you want summarized.
leader_comment_text = "Example leader comment text."
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": open("system_prompt.txt").read()},
        {"role": "user", "content": leader_comment_text},
    ],
    temperature=0.3,
    top_p=0.9,
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])

Input/Output Format

Input: Raw leader comment text (may contain PII placeholders like [[Name]], [[City]]).

Output: A 30-40 word plain-text summary focusing on assignment-relevant details.
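
If a consumer needs to enforce that length contract, a word-count check is enough. A minimal sketch (the 25-45 tolerance band mirrors the one used in the evaluation below):

# Sketch: check that a generated summary lands near the 30-40 word target.
def in_word_range(summary: str, lo: int = 25, hi: int = 45) -> bool:
    """True if the whitespace-delimited word count falls in [lo, hi]."""
    return lo <= len(summary.split()) <= hi

print(in_word_range("Too short to pass."))  # False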

What the Model Keeps

  • Languages spoken and proficiency
  • Health/medical conditions and management
  • Specific skills (musical, technical, athletic)
  • Concerns about independence or readiness
  • Personality traits affecting placement
  • Service preferences

What the Model Strips

  • General endorsement ("strong testimony", "wonderful young man")
  • Worthiness/recommend statements
  • Boilerplate language that applies to all candidates

Important Usage Notes

  • The Modelfile embeds the system prompt. When using Ollama with the provided Modelfile, you don't need to send a separate system message; just send the comment as the user message.
  • If using the raw GGUF (without Modelfile), include system_prompt.txt as the system message in every request.
  • Temperature 0.3 produces consistent, focused summaries. Higher values introduce variability.
  • max_tokens 128 is sufficient; summaries are typically 30-40 words. A sketch of passing these settings through the Ollama API follows below.
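
These settings map onto Ollama's request "options" field (num_predict is Ollama's cap on generated tokens). A sketch using only the Python standard library:

# Sketch: the Ollama call from the Quick Start, with the sampling
# settings above passed through the API's "options" field.
import json
import urllib.request

payload = {
    "model": "leader-summarizer",
    "stream": False,
    # The Modelfile already embeds the system prompt, so only the
    # leader comment goes in as the user message.
    "messages": [{"role": "user", "content": "Example leader comment text."}],
    "options": {"temperature": 0.3, "top_p": 0.9, "num_predict": 128},
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])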

Training Details

  • Method: QLoRA with Unsloth on WSL2 Ubuntu 24.04
  • GPU: NVIDIA RTX 1000 Ada (6 GB VRAM)
  • Epochs: 3
  • Learning rate: 2e-4 with cosine scheduler
  • Effective batch size: 8 (batch=2, grad_accum=4)
  • Final training loss: 0.4296
  • Final eval loss: 0.7495
  • Loss type: Completion-only (only trains on assistant response tokens)
  • LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
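
The actual training scripts are in the source repo linked below. Purely as a sketch of how the hyperparameters above fit together with Unsloth and TRL (this is not the author's script; the dataset here is a one-row placeholder, and the response template should be verified against your tokenizer's chat format):

# Sketch of the QLoRA recipe described above, using Unsloth + TRL.
from unsloth import FastLanguageModel
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA: 4-bit quantized base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset; the real run used 1,464 chat-formatted examples.
train_dataset = Dataset.from_list([
    {"text": tokenizer.apply_chat_template(
        [{"role": "system", "content": "Summarize the comment."},
         {"role": "user", "content": "Example leader comment."},
         {"role": "assistant", "content": "Example summary."}],
        tokenize=False)},
])

# Completion-only loss: mask everything before the assistant header so
# gradients flow only through the summary tokens.
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|start_header_id|>assistant<|end_header_id|>",
    tokenizer=tokenizer,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    data_collator=collator,
    args=TrainingArguments(
        num_train_epochs=3,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size 8
        output_dir="outputs",
    ),
)
trainer.train()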

Evaluation Results (258 held-out examples)

| Metric                       | Fine-tuned | Baseline (untuned 3B) |
|------------------------------|------------|-----------------------|
| Word count avg               | 36.4       | 33.9                  |
| In 25-45 word range          | 69.0%      | 91.9%                 |
| Endorsement boilerplate leak | 10.1%      | 18.6%                 |
| Format compliance            | 100%       | 100%                  |

Key win: the fine-tuned model filters endorsement boilerplate significantly better (10% vs 19% leak rate).
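
The exact scoring code lives in the source repo; the leak metric can be approximated by matching known endorsement phrases, roughly like this (the phrase list is purely illustrative):

# Sketch: approximate the endorsement-boilerplate leak metric by
# matching a (purely illustrative) list of stock phrases.
import re

BOILERPLATE = re.compile(
    r"strong testimony|wonderful young (?:man|woman)|worthy|recommend",
    re.IGNORECASE,
)

def leaks_boilerplate(summary: str) -> bool:
    return BOILERPLATE.search(summary) is not None

def leak_rate(summaries: list[str]) -> float:
    return sum(map(leaks_boilerplate, summaries)) / len(summaries)

print(leak_rate(["Fluent Spanish; mild, managed anxiety.",
                 "A wonderful young man with a strong testimony."]))  # 0.5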

Privacy Note

All training data was PII-obfuscated before use. Names, locations, schools, wards, and missions are replaced with [[Name]], [[City]], etc. The model has never seen real PII during training.
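
As a purely illustrative sketch of placeholder-style obfuscation (entity detection here is a hand-built lookup table; the real pipeline is part of the training repo, not this card):

# Sketch: placeholder-style PII obfuscation as described above.
import re

ENTITY_MAP = {
    "Name": ["John Smith"],        # illustrative entries only
    "City": ["Guadalajara"],
    "Mission": ["Mexico North"],
}

def obfuscate(text: str) -> str:
    for label, entities in ENTITY_MAP.items():
        for entity in entities:
            text = re.sub(re.escape(entity), f"[[{label}]]", text)
    return text

print(obfuscate("John Smith speaks Spanish from three years in Guadalajara."))
# -> [[Name]] speaks Spanish from three years in [[City]].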

Limitations

  • Trained on a specific style of ecclesiastical leader comments. May not generalize to other summarization tasks without additional training.
  • Endorsement leak rate is 10% โ€” some boilerplate still passes through.
  • Word count compliance (69% in 25-45 range) is lower than the untuned model (92%), though this is a tradeoff for better filtering.

Source Code

Training scripts and data pipeline: github.com/rkevan/AI-Experiments

Citation

@misc{leader-comment-summarizer-2026,
  title={leader-comment-summarizer: Fine-tuned Llama 3.2 3B for Ecclesiastical Comment Summarization},
  author={Robert Kevan},
  year={2026},
  url={https://huggingface.co/rkevan/leader-comment-summarizer}
}