Instructions for using delight2004/lfm2-1.2b-sermon-instruct-qlora with libraries and local apps.

Libraries

How to use delight2004/lfm2-1.2b-sermon-instruct-qlora with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="delight2004/lfm2-1.2b-sermon-instruct-qlora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the generated reply is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("delight2004/lfm2-1.2b-sermon-instruct-qlora")
model = AutoModelForCausalLM.from_pretrained("delight2004/lfm2-1.2b-sermon-instruct-qlora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
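
The final line above decodes only the newly generated tokens: for a causal LM, `generate` returns the prompt token ids followed by the continuation, so slicing from the prompt length onward drops the prompt. A toy illustration with plain Python lists follows; the token ids and the helper name `strip_prompt` are made up for this sketch and are not part of Transformers.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the ids generated after the prompt."""
    return output_ids[prompt_length:]

# Hypothetical token ids, for illustration only.
prompt_ids = [101, 7, 42]
output_ids = prompt_ids + [900, 901, 2]  # prompt followed by new tokens

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)
```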

Local Apps

vLLM

How to use delight2004/lfm2-1.2b-sermon-instruct-qlora with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "delight2004/lfm2-1.2b-sermon-instruct-qlora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "delight2004/lfm2-1.2b-sermon-instruct-qlora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
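
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000; the helper names (`build_chat_request`, `extract_reply`, `ask`) and the `max_tokens` default are illustrative choices, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "delight2004/lfm2-1.2b-sermon-instruct-qlora"

def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=128):
    """Build (without sending) a POST request for the chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # assumed default for this sketch
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]

def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_reply(json.load(response))
```

With a server running, `ask("What is the capital of France?")` returns the reply text. Passing `base_url="http://localhost:30000"` should target an SGLang server started as described in the SGLang section, since both expose the same OpenAI-compatible route.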

Use Docker

docker model run hf.co/delight2004/lfm2-1.2b-sermon-instruct-qlora

SGLang

How to use delight2004/lfm2-1.2b-sermon-instruct-qlora with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "delight2004/lfm2-1.2b-sermon-instruct-qlora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "delight2004/lfm2-1.2b-sermon-instruct-qlora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "delight2004/lfm2-1.2b-sermon-instruct-qlora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "delight2004/lfm2-1.2b-sermon-instruct-qlora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use delight2004/lfm2-1.2b-sermon-instruct-qlora with Docker Model Runner:

```
docker model run hf.co/delight2004/lfm2-1.2b-sermon-instruct-qlora
```

lfm2-1.2b-sermon-instruct-qlora

Author: Delight Aheebwa
Contact: via the author's Hugging Face or GitHub profile (delight2004)

Model Overview

Type: Causal Language Model (LM)
Base Model: LiquidAI/LFM2-1.2B
Fine-tuning technique: QLoRA with PEFT (LoRA adapters)
Language(s): English only
Intended Use: Research, educational, and sermon content generation on Christian and theological topics (especially inspired by John Piper's teachings).
Tags: Uganda, theology, Christianity
License: OpenRAIL Non-Commercial Variant

Dataset & Training

Data source: Transcripts of YouTube sermons by John Piper (excluding "Ask Pastor John" podcast transcripts)
Dataset size: 165 entries after filtering (~10% held out for validation)
Preprocessing: Splitting and curation as detailed in the training notebook
Training details:
- Hardware: Google Colab free tier (T4 GPU)
- Epochs: 4
- Batch size: 1 per device with gradient_accumulation_steps=4 (effective batch size 4)
- Learning rate: 2e-5
- Maximum sequence length: 512 tokens
- Quantization: 4-bit (bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16)
- Trainable parameters: LoRA adapters only (~0.05% of total)
- Full trainer/config code: see the training notebook (john_piper.ipynb)
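
As a rough sanity check on the numbers above, the training schedule can be estimated with plain arithmetic. This sketch assumes the "~10%" validation split is exactly 10% rounded down, that optimizer steps per epoch are rounded up, and that the base model has 1.2B parameters; the actual split and step count in the notebook may differ.

```python
import math

# Values stated in the training details.
TOTAL_ENTRIES = 165
EPOCHS = 4
PER_DEVICE_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4

# Assumptions for this estimate (not confirmed by the model card).
VALIDATION_PERCENT = 10          # "~10%" taken as exactly 10%, rounded down
BASE_PARAMS = 1_200_000_000      # "1.2B" taken at face value

def training_schedule(total, val_percent, batch_size, accumulation, epochs):
    """Estimate split sizes and optimizer steps from the stated settings."""
    val = total * val_percent // 100
    train = total - val
    effective_batch = batch_size * accumulation
    steps_per_epoch = math.ceil(train / effective_batch)
    return {
        "train_entries": train,
        "validation_entries": val,
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }

schedule = training_schedule(
    TOTAL_ENTRIES, VALIDATION_PERCENT,
    PER_DEVICE_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, EPOCHS,
)
# ~0.05% of the parameters, computed in integer arithmetic (5 / 10,000).
trainable_params = BASE_PARAMS * 5 // 10_000

print(schedule)
print(trainable_params)
```

Under these assumptions the run is short (on the order of 150 optimizer steps) and updates roughly 600,000 adapter parameters.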

Evaluation

No formal evaluation or benchmarking was conducted. Use at your own discretion; feedback and community tests are welcome.

Limitations & Disclaimer

Not intended for production or commercial use.
Outputs should not be treated as official theological advice.
Biases and limitations may be inherited from the dataset and the base model; outputs may reflect the original preacher's views.
Model may hallucinate or generate plausible but incorrect theological claims or references.

Technical

Architecture: LiquidAI LFM2 causal language model (1.2B params; hybrid of short-convolution and attention blocks)
Adapter config: PEFT/QLoRA
Training framework: Hugging Face Transformers, TRL, PEFT, bitsandbytes, PyTorch
Compute: Google Colab T4 (free tier, single GPU)
Notebook: john_piper.ipynb

Citation

If you use this model, please cite it or reference its Hugging Face page, and acknowledge John Piper's YouTube sermons as the data source.

Safetensors

Model size: 1B params
Tensor type: BF16

Model tree for delight2004/lfm2-1.2b-sermon-instruct-qlora

Base model: LiquidAI/LFM2-1.2B
Finetuned (66): this model