LeLM-GGUF

GGUF quantization of KenWu/LeLM, an NBA take analysis model fine-tuned on Qwen3-8B.

Usage with llama-cpp-python

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="KenWu/LeLM-GGUF",
    filename="LeLM-Q4_K_M.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Fact check: LeBron is washed"}
    ]
)
print(response["choices"][0]["message"]["content"])

Available Quantizations

File              Quant   Size    Description
LeLM-Q4_K_M.gguf  Q4_K_M  4.7 GB  Best balance of quality and size
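After downloading, you can sanity-check the file by inspecting its header: every valid GGUF file starts with the 4-byte magic, the ASCII string `GGUF`. A minimal sketch (generic GGUF check, not code from this repo):

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file begins with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

For example, `looks_like_gguf("LeLM-Q4_K_M.gguf")` should return True for a complete download and False for a truncated or corrupted one.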

Usage with Ollama

Create a Modelfile:

FROM ./LeLM-Q4_K_M.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9

SYSTEM You are LeLM, an expert NBA analyst. Fact-check basketball takes using real statistics. Be direct, witty, and back everything with numbers.

Then run:

ollama create lelm -f Modelfile
ollama run lelm "Fact check: LeBron is washed"
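The same SYSTEM prompt from the Modelfile can be reused when calling the model through llama-cpp-python instead of Ollama. A minimal sketch (the `build_messages` helper is hypothetical, not part of this repo):

```python
# Mirror the Modelfile's SYSTEM prompt in an OpenAI-style message list.
SYSTEM_PROMPT = (
    "You are LeLM, an expert NBA analyst. Fact-check basketball takes "
    "using real statistics. Be direct, witty, and back everything with numbers."
)

def build_messages(take: str) -> list[dict]:
    """Pair the system prompt with a user take (hypothetical helper)."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Fact check this NBA take: {take}"},
    ]

messages = build_messages("LeBron is washed")
# Pass the result to llm.create_chat_completion(messages=messages).
```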

Usage with llama.cpp

llama-cli -m LeLM-Q4_K_M.gguf -p "Fact check this NBA take: Steph Curry is the GOAT" -n 512

Model Details

  • Base model: Qwen3-8B
  • Fine-tuning: LoRA (r=64, alpha=128) with SFT on NBA take analysis data
  • Training: 3 epochs, 915 steps, final loss 0.288
  • LoRA adapter: KenWu/LeLM
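With the hyperparameters above, standard LoRA applies the low-rank update at a scale of alpha/r, so r=64 and alpha=128 give a scaling factor of 2. A quick sketch of that arithmetic (generic LoRA convention, not code from this repo):

```python
# Standard LoRA: W_effective = W + (alpha / r) * (B @ A)
r, alpha = 64, 128        # values from the Model Details section
scaling = alpha / r       # the low-rank update is applied at this scale
```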

Part of LeGM-Lab

This model powers LeGM-Lab, an LLM-powered NBA take analysis and roasting bot.
