Libraries

How to use cfontes/Aesop-v1 with Transformers:

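# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cfontes/Aesop-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result; 2048 new tokens leaves room for the model's long reasoning
print(pipe(messages, max_new_tokens=2048))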

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cfontes/Aesop-v1")
model = AutoModelForCausalLM.from_pretrained("cfontes/Aesop-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Use a budget of at least 2048 new tokens: the model writes a long chain of
# thought before answering, and a short cap can truncate the response
outputs = model.generate(**inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use cfontes/Aesop-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cfontes/Aesop-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cfontes/Aesop-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/cfontes/Aesop-v1

SGLang

How to use cfontes/Aesop-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cfontes/Aesop-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cfontes/Aesop-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cfontes/Aesop-v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cfontes/Aesop-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use cfontes/Aesop-v1 with Docker Model Runner:
```
docker model run hf.co/cfontes/Aesop-v1
```

Aesop v1

A reasoning-focused, safety-aligned adaptation of GLM-5.2.

Aesop v1 is a two-stage (SFT → DPO) adaptation of the flagship zai-org/GLM-5.2 Mixture-of-Experts model. It preserves GLM-5.2's full reasoning and coding capability while systematically shaping safety-relevant behavior through targeted, low-rank parameter updates on the model's upper attention layers.

AESOP (Ablation-engineered Safety Enhancement via Systematic Operation Pruning) names the method: safety-relevant behavior is identified and adjusted by touching only a small, precisely scoped set of expert/attention circuits, leaving the base model's knowledge and reasoning intact.

Model Details

Base model: zai-org/GLM-5.2 (FP8)
Architecture: Mixture-of-Experts (glm_moe_dsa), 78 transformer blocks
Parameters: ~671B total / ~37B active per token
Method: Two-stage LoRA (SFT then DPO) merged non-destructively onto the FP8 base
Precision: BF16 (attention deltas) + FP8 (MoE experts); GPTQ 4-bit variant available
Format: Standard transformers checkpoint (151 safetensor shards)

Quantized Variants

| Variant | Repo | Approx. size | Notes |
| --- | --- | --- | --- |
| BF16 / FP8 (full) | `cfontes/Aesop-v1` | ~680 GB | Full precision, this repo |
| GPTQ 4-bit | `cfontes/Aesop-v1-GPTQ-4bit` | ~383 GB | Calibrated GPTQ, expert-focused quantization |

Benchmarks

Measured against the base GLM-5.2 under identical harnesses:

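| Benchmark | Base GLM-5.2 | Aesop v1 | Delta (percentage points) |
| --- | --- | --- | --- |
| MMLU Pro | 77% | 82% | +5 |
| GPQA | 94% | 96% | +2 |
| GSM8K | 93% | 96% | +3 |
| HellaSwag | 71% | 75% | +4 |
| SimpleQA | 60% | 62% | +2 |
| HumanEval | 79.3% | 85.4% | +6.1 |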

Distillation of long-form deep-reasoning traces measurably lifts multi-step reasoning (MMLU Pro, GSM8K) and code generation (HumanEval) without degrading world knowledge.
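
The gains in the table are absolute differences between two accuracy scores, so they are best read as percentage points rather than relative percentages. The short sketch below recomputes both views from the reported scores; the `SCORES` dictionary and the `deltas` helper are illustrative names, not part of the release.

```python
# Scores copied from the benchmark table (percent accuracy): (base, Aesop v1).
SCORES = {
    "MMLU Pro": (77.0, 82.0),
    "GPQA": (94.0, 96.0),
    "GSM8K": (93.0, 96.0),
    "HellaSwag": (71.0, 75.0),
    "SimpleQA": (60.0, 62.0),
    "HumanEval": (79.3, 85.4),
}


def deltas(scores):
    """Return {benchmark: (gain in percentage points, relative gain in %)}."""
    out = {}
    for name, (base, tuned) in scores.items():
        absolute = round(tuned - base, 1)
        relative = round(100.0 * (tuned - base) / base, 1)
        out[name] = (absolute, relative)
    return out


if __name__ == "__main__":
    for name, (points, rel) in deltas(SCORES).items():
        print(f"{name}: +{points} points ({rel:+}% relative to base)")
```

For HumanEval, for example, the absolute gain is 6.1 points, which corresponds to a relative improvement of about 7.7% over the base score.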

Safety alignment (gold validation set, 100 prompts, deterministic temp=0/seed=0)

Harmful-prompt refusal: 100% (50/50 refused).
Benign-request compliance: ~97–98% when served with an adequate token budget (max_tokens ≥ 2048); the model writes long chain-of-thought before answering, so short caps can truncate otherwise-compliant responses.

Serve with max_tokens ≥ 2048 for best behavior.
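
One way to apply this recommendation from a client is to set `max_tokens` explicitly on every request and to check whether the reply was cut off. The sketch below targets the OpenAI-compatible `/v1/chat/completions` endpoint used in the curl examples; the helper names (`build_chat_request`, `was_truncated`, `chat`) and the choice to reject budgets below 2048 are assumptions made for illustration, and the default `base_url` assumes a local vLLM server on port 8000.

```python
import json
import urllib.request

MODEL_ID = "cfontes/Aesop-v1"
MIN_TOKEN_BUDGET = 2048  # recommended floor from the evaluation notes


def build_chat_request(prompt, max_tokens=MIN_TOKEN_BUDGET):
    """Build an OpenAI-style chat completion payload with an adequate budget."""
    if max_tokens < MIN_TOKEN_BUDGET:
        # Hypothetical policy: fail fast instead of risking a truncated answer.
        raise ValueError(f"max_tokens={max_tokens} is below {MIN_TOKEN_BUDGET}")
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def was_truncated(response):
    """True if any choice stopped because it hit the token cap."""
    return any(
        choice.get("finish_reason") == "length"
        for choice in response.get("choices", [])
    )


def chat(prompt, base_url="http://localhost:8000/v1", max_tokens=MIN_TOKEN_BUDGET):
    """POST the request to a running server and return the parsed JSON reply."""
    payload = json.dumps(build_chat_request(prompt, max_tokens)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If `was_truncated(chat(...))` returns `True`, the reply ended at the cap, most likely inside the chain of thought, and retrying with a larger `max_tokens` is usually the simplest remedy.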

Training

SFT stage — multi-teacher distillation

The supervised fine-tuning corpus (sft_combined_v4: 1,443 train / 160 val, ~1.77M tokens) blends curated long-form reasoning transcripts, ChatML instruction/response pairs, and domain Q&A. The material is drawn from multiple teacher sources for reasoning and stylistic diversity and deduplicated across those sources.

DPO stage — preference alignment

Direct Preference Optimization over 897 hand-curated preference pairs, each contrasting a helpful, technically-correct answer (chosen) against a hedged or low-value non-answer (rejected).
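
For readers unfamiliar with the objective, the following is a minimal sketch of the standard per-pair DPO loss, not the training code used for this model. It assumes the inputs are summed token log-probabilities of each answer under the trained policy and under the frozen reference; the function name and argument names are illustrative, and `beta=0.1` is the value listed in the configuration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    # How much more (or less) likely each answer is under the policy
    # than under the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


if __name__ == "__main__":
    # Policy identical to the reference: the loss starts at log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy has shifted probability toward the chosen answer: the loss drops.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the policy raises the chosen answer's likelihood relative to the reference and lowers the rejected answer's, which is the pressure that moves the model away from hedged non-answers.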

Configuration

Stage 1 — SFT (LoRA): rank r=64, alpha=128, attention projections on the top 18 layers (layers ≥ 60), 160 steps, lr 2e-4, seq len 2048.

Stage 2 — DPO (LoRA): rank r=16, alpha=32, beta=0.1, 30 steps, lr 5e-5, init_from=sft (SFT adapter frozen as the DPO reference policy).
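
The two stages can be summarized as plain data, which also makes the derived quantities easy to check. This is a sketch rather than the actual training configuration: the `LoraStage` class and `target_layers` helper are hypothetical, and zero-based layer indexing is an assumption (it is the reading under which "layers ≥ 60" of 78 blocks gives the stated 18 layers).

```python
from dataclasses import dataclass

NUM_BLOCKS = 78  # transformer blocks reported under Model Details


@dataclass(frozen=True)
class LoraStage:
    name: str
    rank: int
    alpha: int
    steps: int
    lr: float

    @property
    def scaling(self):
        """Standard LoRA scaling factor (alpha / rank) applied to the update."""
        return self.alpha / self.rank


# Hyperparameters as listed for each stage.
SFT = LoraStage("sft", rank=64, alpha=128, steps=160, lr=2e-4)
DPO = LoraStage("dpo", rank=16, alpha=32, steps=30, lr=5e-5)


def target_layers(first_layer=60, num_blocks=NUM_BLOCKS):
    """Indices of blocks with index >= first_layer (assumes zero-based indexing)."""
    return list(range(first_layer, num_blocks))


if __name__ == "__main__":
    print(SFT.scaling, DPO.scaling)  # both stages use the same alpha / rank ratio
    print(len(target_layers()))      # number of adapted layers
```

Both stages keep the ratio alpha / r at 2, so the DPO adapter is lower rank but its update is scaled by the same factor as the SFT adapter.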

Base & merge: Base zai-org/GLM-5.2 (FP8). Non-destructive merge — attention modules touched by the adapters are dequantized to bf16 for a lossless weight update while the MoE expert tensors stay FP8. 90 SFT + 90 DPO adapter targets merged; 631 modules left untouched. Loads as a standard transformers checkpoint.
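
The weight update at the heart of a LoRA merge is W' = W + (alpha / r) · B·A. The sketch below shows that step on small list-based matrices under simplifying assumptions: it omits the FP8-to-bf16 dequantization described above, the helper names are illustrative, and it is not the merge script used for this checkpoint.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) without mutating inputs."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    weight = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 projection weight
    lora_a = [[1.0, 2.0]]              # rank-1 "A" matrix, shape (r x in)
    lora_b = [[0.5], [0.25]]           # rank-1 "B" matrix, shape (out x r)
    print(merge_lora(weight, lora_a, lora_b, alpha=2, rank=1))
```

Because the function builds a new matrix, the original weights stay available for comparison. In the real merge only the attention modules targeted by the adapters receive this update, while every other tensor is copied through unchanged.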

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cfontes/Aesop-v1"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# device_map="auto" spreads the checkpoint across the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Walk me through the intuition behind gradient descent."}]
# return_dict=True yields input_ids plus attention_mask, so generate() receives both
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, not the echoed prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

The individual LoRA adapters are published at cfontes/GLM-5.2-F5-Molt-LoRA.

Intended use

General-purpose reasoning, coding, analysis, and instruction following, with strengthened refusal behavior on genuinely harmful requests and improved willingness to engage helpfully with legitimate technical work. Use responsibly and legally; you are accountable for what you do with the output.

License

Released under the GLM license, inheriting all terms from the base model zai-org/GLM-5.2; see that model's license for the full terms.

Citation

@misc{aesopv1_2026,
  title  = {Aesop v1: A Reasoning-Focused, Safety-Aligned Adaptation of GLM-5.2},
  author = {Fontes, Chris},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/cfontes/Aesop-v1}}
}

Built on zai-org/GLM-5.2. Adapted via SFT + DPO with non-destructive FP8-preserving merge.

Safetensors: model size 743B params; tensor types F32, BF16, F8_E4M3
