Aesop v1
A reasoning-focused, safety-aligned adaptation of GLM-5.2.
Aesop v1 is a two-stage (SFT → DPO) adaptation of the flagship zai-org/GLM-5.2
Mixture-of-Experts model. It preserves GLM-5.2's full reasoning and coding
capability while systematically shaping safety-relevant behavior through
targeted, low-rank parameter updates on the model's upper attention layers.
AESOP — Ablation-engineered Safety Enhancement via Systematic Operation Pruning — refers to the method: safety-relevant behavior is identified and adjusted by touching only a small, precisely-scoped set of expert/attention circuits, leaving the base model's knowledge and reasoning intact.
Model Details
- Base model: zai-org/GLM-5.2 (FP8)
- Architecture: Mixture-of-Experts (glm_moe_dsa), 78 transformer blocks
- Parameters: ~671B total / ~37B active per token
- Method: Two-stage LoRA (SFT then DPO) merged non-destructively onto the FP8 base
- Precision: BF16 (attention deltas) + FP8 (MoE experts); GPTQ 4-bit variant available
- Format: Standard transformers checkpoint (151 safetensors shards)
Quantized Variants
| Variant | Repo | Approx. size | Notes |
|---|---|---|---|
| BF16 / FP8 (full) | cfontes/Aesop-v1 | ~680 GB | Full precision, this repo |
| GPTQ 4-bit | cfontes/Aesop-v1-GPTQ-4bit | ~383 GB | Calibrated GPTQ, expert-focused quantization |
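As a rough sanity check on the sizes above, the sketch below divides each checkpoint size by the ~671B total parameter count from Model Details. It assumes 1 GB = 10^9 bytes and that the same parameter count applies to both repos; the helper name `bits_per_param` is illustrative only.

```python
# Rough storage cost per parameter for each variant.
# Assumptions: 1 GB = 1e9 bytes; ~671B total parameters for both repos.
TOTAL_PARAMS = 671e9


def bits_per_param(size_gb: float, n_params: float = TOTAL_PARAMS) -> float:
    """Average number of bits stored per parameter."""
    return size_gb * 1e9 * 8 / n_params


variants = {"BF16 / FP8 (full)": 680, "GPTQ 4-bit": 383}
for name, size_gb in variants.items():
    print(f"{name}: ~{bits_per_param(size_gb):.1f} bits/param")
```

Under these assumptions the full checkpoint works out to roughly 8.1 bits per parameter, which is consistent with most weights being stored in FP8, and the GPTQ variant to roughly 4.6 bits, somewhat above a pure 4-bit encoding.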
Benchmarks
Measured against the base GLM-5.2 under identical harnesses:
| Benchmark | Base GLM-5.2 | Aesop v1 | Delta (pp) |
|---|---|---|---|
| MMLU Pro | 77% | 82% | +5 |
| GPQA | 94% | 96% | +2 |
| GSM8K | 93% | 96% | +3 |
| HellaSwag | 71% | 75% | +4 |
| SimpleQA | 60% | 62% | +2 |
| HumanEval | 79.3% | 85.4% | +6.1 |
Distillation of long-form deep-reasoning traces measurably lifts multi-step reasoning (MMLU Pro, GSM8K) and code generation (HumanEval) without degrading world knowledge.
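The Delta column reads as an absolute difference in percentage points rather than a relative gain. A minimal sketch that recomputes it from the scores in the table (the dictionary layout and the `delta_pp` helper are illustrative, not part of any evaluation harness):

```python
# Scores copied from the benchmark table: (base GLM-5.2, Aesop v1), in percent.
scores = {
    "MMLU Pro": (77.0, 82.0),
    "GPQA": (94.0, 96.0),
    "GSM8K": (93.0, 96.0),
    "HellaSwag": (71.0, 75.0),
    "SimpleQA": (60.0, 62.0),
    "HumanEval": (79.3, 85.4),
}


def delta_pp(base: float, tuned: float) -> float:
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(tuned - base, 1)


deltas = {name: delta_pp(base, tuned) for name, (base, tuned) in scores.items()}
for name, d in deltas.items():
    print(f"{name}: {d:+.1f} pp")
```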
Safety alignment (gold validation set, 100 prompts, deterministic temp=0/seed=0)
- Harmful-prompt refusal: 100% (50/50 refused).
- Benign-request compliance: ~97–98% when served with an adequate token budget (max_tokens ≥ 2048); the model writes a long chain of thought before answering, so short caps can truncate otherwise-compliant responses.
Serve with max_tokens ≥ 2048 for best behavior.
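One way to make that budget hard to forget is to build requests through a small helper that rejects anything below the floor. The sketch below assumes the model is served behind an OpenAI-compatible chat endpoint; the URL, the helper names, and the choice to raise an error are all assumptions for illustration.

```python
import json
import urllib.request

MIN_MAX_TOKENS = 2048  # floor recommended in this model card


def build_chat_request(prompt: str, max_tokens: int = MIN_MAX_TOKENS) -> dict:
    """Build an OpenAI-style chat payload, rejecting budgets that risk truncation."""
    if max_tokens < MIN_MAX_TOKENS:
        raise ValueError(f"max_tokens={max_tokens} is below {MIN_MAX_TOKENS}")
    return {
        "model": "cfontes/Aesop-v1",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,  # mirrors the deterministic evaluation setting
        "seed": 0,
    }


def send(payload: dict, url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload to an assumed local server and return the parsed JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request("What is the capital of France?")
print(json.dumps(payload, indent=2))
```

Calling `send(payload)` requires a running server; building the payload does not.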
Training
SFT stage — multi-teacher distillation
Supervised fine-tuning corpus (sft_combined_v4: 1,443 train / 160 val,
~1.77M tokens) blends curated long-form reasoning transcripts, ChatML
instruction/response pairs, and domain Q&A, deduplicated across multiple
teacher sources for reasoning and stylistic diversity.
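The corpus-building code is not published here, so the following is only a sketch of what cross-teacher deduplication and ChatML rendering could look like. The record fields (`teacher`, `prompt`, `response`), the normalization rule, and the keep-first policy are hypothetical.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical prompts hash alike."""
    return " ".join(text.lower().split())


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized prompt."""
    seen, kept = set(), []
    for rec in records:
        key = hashlib.sha256(normalize(rec["prompt"]).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def to_chatml(rec: dict) -> str:
    """Render one instruction/response pair in ChatML."""
    return (
        f"<|im_start|>user\n{rec['prompt']}<|im_end|>\n"
        f"<|im_start|>assistant\n{rec['response']}<|im_end|>\n"
    )


# Toy records from two hypothetical teachers; the first two share a prompt.
records = [
    {"teacher": "A", "prompt": "What is 2 + 2?", "response": "4"},
    {"teacher": "B", "prompt": "what is  2 + 2?", "response": "Four."},
    {"teacher": "B", "prompt": "Name a prime number.", "response": "7"},
]
unique = dedupe(records)
print(len(unique))
print(to_chatml(unique[0]))
```

In practice the tokenizer's own chat template would normally be used for rendering; the hand-written ChatML string is shown only to make the format concrete.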
DPO stage — preference alignment
Direct Preference Optimization over 897 hand-curated preference pairs, each contrasting a helpful, technically-correct answer (chosen) against a hedged or low-value non-answer (rejected).
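For readers unfamiliar with the objective, here is a minimal sketch of the standard per-pair DPO loss, using the beta = 0.1 listed under Configuration. It takes summed sequence log-probabilities as plain floats; the function name and the toy numbers are illustrative, and this is not the training code used for this model.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the loss sits at log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favors the chosen answer more than the reference does: loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls below log(2) only when the policy widens the chosen-over-rejected gap relative to the frozen reference, which is why the reference policy matters.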
Configuration
Stage 1 — SFT (LoRA): rank r=64, alpha=128, attention projections on the
top 18 layers (layers ≥ 60), 160 steps, lr 2e-4, seq len 2048.
Stage 2 — DPO (LoRA): rank r=16, alpha=32, beta=0.1, 30 steps, lr
5e-5, init_from=sft (SFT adapter frozen as the DPO reference policy).
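The numbers in the two stages can be cross-checked with a few lines of arithmetic. This sketch assumes layers are 0-indexed (so "layers ≥ 60" of 78 blocks means indices 60 through 77) and that the standard LoRA scaling of alpha / r applies; the dictionary layout is illustrative, not a real config file.

```python
NUM_BLOCKS = 78            # transformer blocks, from Model Details
FIRST_ADAPTED_LAYER = 60   # "layers >= 60", assuming 0-indexed layers

adapted_layers = list(range(FIRST_ADAPTED_LAYER, NUM_BLOCKS))

stages = {
    "sft": {"r": 64, "alpha": 128, "steps": 160, "lr": 2e-4},
    "dpo": {"r": 16, "alpha": 32, "steps": 30, "lr": 5e-5, "beta": 0.1},
}


def lora_scale(cfg: dict) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return cfg["alpha"] / cfg["r"]


print(len(adapted_layers), adapted_layers[0], adapted_layers[-1])
for name, cfg in stages.items():
    print(name, lora_scale(cfg))
```

Both stages use the same scale of 2.0 despite the different ranks, and the range covers 18 layers, matching the "top 18 layers" stated for Stage 1.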
Base & merge: Base zai-org/GLM-5.2 (FP8). Non-destructive merge —
attention modules touched by the adapters are dequantized to bf16 for a lossless
weight update while the MoE expert tensors stay FP8. 90 SFT + 90 DPO adapter
targets merged; 631 modules left untouched. Loads as a standard transformers
checkpoint.
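The core of a LoRA merge is a single weight update, W' = W + (alpha / r) · B · A, applied to each targeted module. The toy sketch below shows that update on plain Python floats; it omits the FP8-to-bf16 dequantization step described above, and the helper names and matrices are illustrative rather than the actual merge script.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A) without modifying the input weight."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 adapter; alpha=2, r=1 gives the same 2x scale.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]      # shape (r, in_features)
B = [[1.0], [2.0]]     # shape (out_features, r)

merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

Because the helper returns a new matrix, the original weight stays available, which loosely mirrors the non-destructive intent; modules without an adapter are simply never passed through it.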
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "cfontes/Aesop-v1"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
messages = [{"role": "user", "content": "Walk me through the intuition behind gradient descent."}]
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # returns input_ids and attention_mask together
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=2048)
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))  # skip the echoed prompt
The individual LoRA adapters are published at
cfontes/GLM-5.2-F5-Molt-LoRA.
Intended use
General-purpose reasoning, coding, analysis, and instruction following, with strengthened refusal behavior on genuinely harmful requests and improved willingness to engage helpfully with legitimate technical work. Use responsibly and legally; you are accountable for what you do with the output.
License
Released under the GLM license, inheriting all terms from the base model
zai-org/GLM-5.2. See the license link.
Citation
@misc{aesopv1_2026,
title = {Aesop v1: A Reasoning-Focused, Safety-Aligned Adaptation of GLM-5.2},
author = {Fontes, Chris},
year = {2026},
howpublished = {\url{https://huggingface.co/cfontes/Aesop-v1}}
}
Built on zai-org/GLM-5.2. Adapted via SFT + DPO with non-destructive FP8-preserving merge.