Libraries

How to use The-JDdev/GLM-5.2-ablated with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="The-JDdev/GLM-5.2-ablated")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("The-JDdev/GLM-5.2-ablated")
model = AutoModelForCausalLM.from_pretrained("The-JDdev/GLM-5.2-ablated")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use The-JDdev/GLM-5.2-ablated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "The-JDdev/GLM-5.2-ablated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "The-JDdev/GLM-5.2-ablated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/The-JDdev/GLM-5.2-ablated

SGLang

How to use The-JDdev/GLM-5.2-ablated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "The-JDdev/GLM-5.2-ablated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "The-JDdev/GLM-5.2-ablated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "The-JDdev/GLM-5.2-ablated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "The-JDdev/GLM-5.2-ablated",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use The-JDdev/GLM-5.2-ablated with Docker Model Runner:
```
docker model run hf.co/The-JDdev/GLM-5.2-ablated
```

GLM-5.2-Ablated-F5-Molt (AESOP)

Model Description

AESOP (Ablation-Enhanced Safety with Orthogonal Projection) is a safety-aligned variant of GLM-5.2, a 744B-parameter Mixture-of-Experts (MoE) model with 18.5B dense parameters and 256 routed experts. AESOP combines two interventions:

PCA-based refusal ablation — Principal Component Analysis directions extracted from GLM-5.2's shared experts are used to subtract the refusal direction from activations during training, preventing the model from re-learning refusal behaviors.
Surgical LoRA fine-tuning — Low-Rank Adaptation (rank 64) on attention modules (layers ≥60) using 4,876 Fable 5 chain-of-thought traces, improving capability while ablation hooks maintain safety.

The key innovation is the use of ablation hooks during training rather than at inference time. Prior work (Arditi et al. 2024) applied refusal direction subtraction as a post-hoc inference-time intervention. AESOP demonstrates that maintaining these hooks during LoRA fine-tuning partially prevents the re-activation of refusal behaviors that occurs when fine-tuning on non-aligned data.

Training Methodology

Step 1: Refusal Direction Extraction

PCA directions were extracted from GLM-5.2's shared expert outputs across layers 25–65 (41 layers). For each layer, activations were collected on a contrastive prompt set (harmful vs. benign), and the first principal component of the harmful-vs-benign difference was taken as the primary refusal direction. Three components per layer are stored in refusal_pca.pt (2.9MB, shape: 41 layers × 3 PCA components × 6144 hidden dim).
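As a rough illustration of this step, the sketch below runs PCA on paired activation differences for a single layer using NumPy. The row-by-row pairing of harmful and benign prompts, the centering step, the function name `refusal_directions`, and the toy data are all assumptions for illustration; the card does not specify how the differences were formed or which PCA implementation was used.

```python
import numpy as np

def refusal_directions(harmful, benign, k=3):
    """Return the top-k unit-norm PCA directions of (harmful - benign).

    harmful, benign: arrays of shape [n_prompts, hidden_dim], assumed to be
    paired row by row (an assumption; the card does not say how they pair).
    """
    diffs = harmful - benign                      # [n, hidden]
    centered = diffs - diffs.mean(axis=0)         # PCA operates on centered data
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]                                 # [k, hidden]

# Toy check with a planted direction (hidden_dim=64 instead of 6144).
rng = np.random.default_rng(0)
hidden = 64
planted = np.zeros(hidden)
planted[0] = 1.0
benign = rng.normal(size=(200, hidden))
strength = rng.normal(loc=4.0, scale=2.0, size=(200, 1))
harmful = benign + strength * planted + 0.1 * rng.normal(size=(200, hidden))

dirs = refusal_directions(harmful, benign, k=3)
print(dirs.shape)                 # (3, 64)
print(abs(dirs[0] @ planted))     # close to 1.0 for this toy data
```

Stacking the per-layer results for layers 25–65 would give an array with the same layout as the stored file (41 × 3 × 6144).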

Step 2: Ablation Hook Installation

Forward hooks were installed on model.model.layers[L].mlp.shared_experts for layers 62–65. The hook subtracts the refusal direction projection from the hidden state:

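def ablation_hook(module, input, output):
    # refusal_direction (shape [6144]) and coeff are defined by the caller
    hs = output[0] if isinstance(output, tuple) else output  # hidden states
    d = refusal_direction.to(hs)  # match dtype and device
    # scalar projection per token, unsqueezed so it broadcasts against d
    hs = hs - coeff * (hs @ d).unsqueeze(-1) / (d @ d) * d
    return (hs,) + output[1:] if isinstance(output, tuple) else hs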

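Coefficient: 0.1, so each hook removes 10% of the projection onto a direction rather than the full component. PCA components: top 2 per layer (of the 3 stored). Hooks are active during training and removed for inference.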
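The hook above handles a single direction. Below is a sketch of how a two-component version could be registered and later removed with PyTorch's `register_forward_hook`. The toy `nn.Linear` stands in for `mlp.shared_experts`, and `make_ablation_hook` is a hypothetical helper rather than the project's code. It assumes the module returns a plain tensor and that the directions are orthonormal, as PCA components are.

```python
import torch
from torch import nn

def make_ablation_hook(directions, coeff=0.1):
    """Build a forward hook that removes `coeff` of the projection onto each row.

    directions: tensor [k, hidden] with orthonormal rows (true for PCA components).
    """
    def hook(module, inputs, output):
        d = directions.to(output)                 # match dtype and device
        proj = (output @ d.T) @ d                 # component of output in span(d)
        return output - coeff * proj              # returned value replaces the output
    return hook

hidden = 16
torch.manual_seed(0)
layer = nn.Linear(hidden, hidden)                 # stand-in for mlp.shared_experts
q, _ = torch.linalg.qr(torch.randn(hidden, 2))    # two orthonormal columns
directions = q.T                                  # [2, hidden]

x = torch.randn(4, 8, hidden)                     # [batch, seq, hidden]
with torch.no_grad():
    plain = layer(x)
    handle = layer.register_forward_hook(make_ablation_hook(directions, coeff=0.1))
    hooked = layer(x)                             # hook active (training in the card)
    handle.remove()                               # removed again for inference
    restored = layer(x)

# With coeff=0.1 the projection onto each direction shrinks to 90% of its value.
shrunk = torch.allclose(hooked @ directions.T, 0.9 * (plain @ directions.T), atol=1e-5)
print(shrunk)
print(torch.allclose(restored, plain))
```

Keeping the handle returned by `register_forward_hook` is what makes the train-time-only setup straightforward: calling `handle.remove()` restores the unmodified forward pass.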

Step 3: LoRA Fine-Tuning

Base model: GLM-5.2 with ablation hooks applied (PCA-ablated base)
Training data: fable5-chatml.jsonl — 4,876 chain-of-thought examples from Fable 5
LoRA config: rank=64, alpha=128, target modules = attention (Q, K, V, O) on layers ≥60 (90 modules)
Trainable parameters: 97,984,512 (0.013% of 743.5B total)
Optimizer: AdamW, lr=2e-5, cosine schedule, warmup=10 steps
Batch: gradient accumulation 8, max sequence length 2048
Steps: 609/610 completed
Elapsed: 570.6 minutes (~9.5 hours) on 8× NVIDIA H200
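Several of the figures above can be cross-checked with simple arithmetic. The sketch assumes one sequence per micro-batch on a single data stream, which the card does not state; under that assumption the planned step count follows directly from the dataset size and the gradient accumulation setting.

```python
import math

n_examples = 4876
grad_accum = 8
micro_batch = 1            # assumption: not stated in the card

planned_steps = math.ceil(n_examples / (micro_batch * grad_accum))
print(planned_steps)       # 610

trainable = 97_984_512
total = 743.5e9
trainable_pct = 100 * trainable / total
print(f"{trainable_pct:.3f}%")             # 0.013%

rank, alpha = 64, 128
lora_scale = alpha / rank  # standard LoRA scaling applied to B @ A
print(lora_scale)          # 2.0

minutes = 570.6
hours = round(minutes / 60, 1)
print(hours)                               # 9.5
sec_per_step = minutes * 60 / 609          # average over the 609 completed steps
print(round(sec_per_step, 1))
```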

Step 4: Surgical Merge

LoRA weights were merged into the base model using a surgical BF16 merge:

Attention weights (LoRA targets) are dequantized to BF16, merged with LoRA deltas, and re-saved
MoE expert weights (FP8) are preserved unchanged — no dequantization or re-quantization
This preserves the FP8 compression of the 256 experts while applying LoRA modifications to attention
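The merge applied to each attention projection is the standard LoRA fold-in, W' = W + (alpha / r) · B A. A minimal NumPy sketch with toy shapes is shown below. It works in float32 and leaves out the FP8 dequantization and BF16 casting described above, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 32, 48, 4, 8   # toy sizes; the card uses rank=64, alpha=128

W = rng.normal(size=(d_out, d_in)).astype(np.float32)   # base projection weight
A = rng.normal(size=(rank, d_in)).astype(np.float32)    # LoRA down-projection
B = rng.normal(size=(d_out, rank)).astype(np.float32)   # LoRA up-projection
scale = alpha / rank

W_merged = W + scale * (B @ A)            # fold the adapter into the weight

x = rng.normal(size=(5, d_in)).astype(np.float32)
adapter_out = x @ W.T + scale * (x @ A.T) @ B.T   # base path plus adapter path
merged_out = x @ W_merged.T                       # single matmul after merging

same = np.allclose(adapter_out, merged_out, atol=1e-3)
print(same)
```

Because only the attention weights carry LoRA deltas, the expert tensors never enter this computation, which is why they can stay in FP8 untouched.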

Training Configuration (Verified from `aesop_full.log`)

Parameter	Value
Base model	GLM-5.2 FP8 (ablated base)
Ablation layers	[62, 63, 64, 65]
Ablation coefficient	0.1
PCA components per layer	2
Hook target	`mlp.shared_experts` forward output
LoRA rank	64
LoRA alpha	128
LoRA target layers	≥60 (90 attention modules)
Training data	4,876 examples (0 skipped)
Max sequence length	2048
Learning rate	2e-5
LR schedule	Cosine
Warmup steps	10
Gradient accumulation	8
Total steps	609 / 610 planned
First loss	1.2933
Final loss	1.1834
Training time	570.6 minutes (9.5 hours)
Hardware	8× NVIDIA H200 (Vast.ai 2E instance)
Seed	42
Checkpoints	Every 100 steps (step-100 through step-600, final)

Loss trajectory: The full training run showed healthy convergence (1.2933 → 1.1834). This corrects an earlier pilot finding (500 examples, 62 steps) where loss appeared to increase (1.3981 → 1.6567); that was a data-size artifact, not a fundamental issue.

Benchmark Results (Unified Harness v3.0.1)

All results come from the AESOP Unified Benchmark Harness v3, which fixes three critical issues from earlier harness versions (see Audit Findings below). Sample sizes: n=100 for most benchmarks, n=50 for SimpleQA, n=164 for HumanEval.

Benchmark	Metric	AESOP	Best Variant	Notes
AdvBench	refusal_rate ↑	58.0%	—	Strongest safety
Borderline	refusal_rate ↓	0.0%	0.0% (all)	No over-refusal
GPQA Diamond	accuracy ↑	92.0%	96.0%	Within CI
MMLU-Pro	accuracy ↑	84.0%	—	Best in class
HumanEval	pass@1 ↑	84.1%	87.2%	Within CI
GSM8K	accuracy ↑	93.0%	96.0%	Within CI
HellaSwag	accuracy ↑	75.0%	—	Best in class
SimpleQA	accuracy ↑	48.0%	56.0%	See limitations
IFEval (prompt)	accuracy ↑	41.8%	—	Best in class
IFEval (instr)	accuracy ↑	55.9%	—	Tied best

Statistical Significance (Wilson 95% CIs, vs ablated-base)

Benchmark	Δ	p-value	Significant?
AdvBench	+40.0%	<0.001	Yes
Borderline	-2.0%	0.31	ns
GPQA	0.0%	—	ns
GSM8K	0.0%	—	ns
HellaSwag	+3.0%	0.61	ns
HumanEval	+6.7%	0.14	ns
IFEval	+0.6%	0.85	ns
MMLU-Pro	+9.0%	0.11	ns
SimpleQA	-8.0%	0.36	ns

Only AdvBench shows a statistically significant improvement at n=100. MMLU-Pro (+9pp) approaches significance but does not reach it. Future evaluations should use n≥600 per benchmark to resolve differences of around 5pp.
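The card does not say which test produced these p-values. A pooled two-proportion z-test reproduces the MMLU-Pro figure closely, so the sketch below uses it as an assumption. The baseline scores are inferred from the AESOP scores and the Δ column (for example, 84% minus 9pp gives 75% for MMLU-Pro).

```python
import math

def two_proportion_p(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))

# MMLU-Pro: 84/100 (AESOP) vs 75/100 (inferred baseline)
p_mmlu = two_proportion_p(84, 100, 75, 100)
print(round(p_mmlu, 3))            # 0.115

# AdvBench: 58/100 (AESOP) vs 18/100 (inferred baseline)
p_adv = two_proportion_p(58, 100, 18, 100)
print(p_adv < 0.001)               # True
```

Other rows may not match this test exactly, so treat it as a way to sanity-check orders of magnitude rather than a reproduction of the harness.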

Intended Use

Primary Use Cases

Research on safety alignment, refusal ablation, and MoE model behavior
Agent workflows requiring controlled safety profiles
Benchmarking and evaluation of alignment interventions

Out of Scope

Production deployment without additional safety evaluation
Use cases requiring safety guarantees (this is a research artifact)
Commercial deployment without appropriate licensing

Limitations

SimpleQA degradation: AESOP scores 48.0% on SimpleQA vs 56.0% for the ablated base. This 8pp drop is not individually significant at n=50 (Wilson 95% CI: [34.8%, 61.5%] vs [42.3%, 68.8%]), but the trend is consistent across all LoRA-trained variants, which suggests that the LoRA training itself may damage knowledge retrieval pathways.
Small sample sizes: Most benchmarks use n=100 (Wilson CI ±8%). Differences of <15pp are not statistically significant. Claims about 2–5pp improvements should not be made without larger evaluation sets.
Single architecture: Results are specific to GLM-5.2's MoE architecture. Generalization to dense models or other MoE designs is not established.
Train/serve mismatch: Hooks are active during training but removed for inference. The model learns in a modified activation space but serves in the original space. This may contribute to the partial (not complete) prevention of refusal re-activation.
Test 3a confound: An earlier variant (Test 3a) using the same approach achieved 1% AdvBench refusal, but AESOP achieved 16% (v1 harness). The difference could not be explained from available artifacts. The v3 harness shows AESOP at 58%, but no v3 re-run of Test 3a was performed.
No step-0 baseline: The raw ablated base was not evaluated before LoRA training, making it difficult to isolate ablation effects from LoRA effects.
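For reference, a Wilson score interval can be computed as follows. This is the textbook formula with z = 1.96, written as a small helper; `wilson_interval` is an illustrative name and the code is not taken from the project's harness.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% when z=1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# MMLU-Pro: 84 correct out of 100
lo, hi = wilson_interval(84, 100)
print(f"[{lo:.3f}, {hi:.3f}]  half-width {0.5 * (hi - lo):.3f}")
```

At n=100 the half-width is about 7pp for a score of 84% and grows to nearly 10pp for scores near 50%, which is in line with the ±8% figure quoted above.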

Audit Findings

This model was developed as part of Project AESOP, which underwent a full research audit. Key findings:

Harness inconsistency: Earlier benchmark versions (v1, v2) used different refusal patterns, scoring logic, and token limits, producing incomparable results. The v3 harness corrects all three issues. Only v3 results should be cited.
Ablation hook code discrepancies: Script defaults differed from the documented config, but the actual training log confirms the correct config was used (layers 62–65, coeff 0.1).
Statistical power: n=100 is insufficient for 2–5pp claims. Only AdvBench (40pp difference) and SimpleQA (32pp difference) show effects large enough to trust.
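To make the power point concrete, the sketch below estimates the per-arm sample size at which a given gap between two accuracies would just reach p = 0.05 in a pooled two-proportion z-test. It ignores statistical power, so a properly powered study needs more, and the example accuracies are illustrative rather than taken from the results.

```python
import math

def min_n_for_significance(p_a, p_b, z=1.96):
    """Smallest per-arm n at which |p_a - p_b| reaches two-sided p = 0.05.

    Uses the pooled two-proportion z-test with equal arm sizes; power is
    not accounted for, so treat the result as a lower bound.
    """
    pooled = (p_a + p_b) / 2
    delta = abs(p_a - p_b)
    return math.ceil(2 * pooled * (1 - pooled) * (z / delta) ** 2)

n_small_gap = min_n_for_significance(0.75, 0.80)   # 5pp gap near 78% accuracy
n_large_gap = min_n_for_significance(0.50, 0.90)   # 40pp gap
print(n_small_gap)   # 536
print(n_large_gap)   # 11
```

A 5pp gap in the 75–80% range already needs several hundred examples per arm, the same order of magnitude as the n≥600 recommendation, while a 40pp gap is detectable with a handful.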

Full audit: see AUDIT_FINDINGS.md in the project repository.

Citation

@misc{aesop2026,
  title={PCA-Based Refusal Ablation on MoE Models: What Survives Fine-Tuning?},
  author={Fontes, C.},
  year={2026},
  howpublished={\url{https://huggingface.co/cfontes/GLM-5.2-Ablated-F5-Molt}},
  note={Project AESOP research artifact}
}

Acknowledgments

GLM-5.2 base model by Z-AI
Fable 5 training traces by Anthropic
Benchmark harness inspired by Arditi et al. (2024) directional ablation methodology
Compute provided by Vast.ai (8× H200 instance)

Safetensors

Model size: 743B params
Tensor type: F32, BF16, F8_E4M3