🧬 Chimera-XTRM-20b

Chimera-XTRM-20b is a sequentially abliterated (decensored) version of OpenAI's open-weights model, openai/gpt-oss-20b.

This model was engineered using multi-stage directional ablation (abliteration) to neutralize safety alignment guardrails. The ablation parameters were tuned with Optuna's Tree-structured Parzen Estimator (TPE) sampler to maximize compliance on restricted queries while keeping drift from the base model's reasoning and coding behavior low.


📊 Model Highlights

  • Zero Preachiness: Safety guardrails have been neutralized, allowing the model to answer technical security, penetration testing, and software engineering prompts directly without lecturing.
  • Highly Compliant: Refusal rates on extreme benchmarks dropped from 98% to 14% (v2 Stage-2 ablation).
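  • Fully Preserved Logic: A low KL divergence of 0.025 from the base model indicates that the output distribution shifted only slightly. KL divergence is not a percentage, so the ~97.5%+ retention figure should be read as a rough estimate rather than a measured benchmark score.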
  • Hardware Friendly: Retains its original MXFP4 (4-bit) quantization format, fitting within ~13.8 GB on disk and running comfortably on consumer GPUs.
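To make the drift figure above more concrete, here is a minimal sketch of how a KL divergence between two next-token distributions can be computed. The logits are hypothetical illustrative values, and the direction KL(base || ablated) is an assumption; the card does not state how its 0.025 figure was measured.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same tokens (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a 4-token vocabulary (not taken from the model)
base_logits = [2.0, 1.0, 0.5, -1.0]
ablated_logits = [1.8, 1.1, 0.6, -0.9]

base_probs = softmax(base_logits)
ablated_probs = softmax(ablated_logits)

# Assumed direction: how far the ablated model drifts from the base model
drift = kl_divergence(base_probs, ablated_probs)
print(f"KL(base || ablated) = {drift:.4f}")
```

A value of zero means the two distributions are identical, and the value has no upper bound, which is why it does not map directly onto a retention percentage.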

📈 Benchmark & Refusal Progression

| Model Version | Refusal Rate (Harmful Behaviors Test) | KL Divergence (Drift) | Intelligence Retention | Status |
|---|---|---|---|---|
| Original Base (gpt-oss-20b) | 98 / 100 (98%) | 0.0000 | 100% (Baseline) | Gated / Highly Restricted |
| Heretic v1 | 79 / 100 (79%) | 0.0522 | ~95.0% | Partially Bypassed |
| Chimera-XTRM-20b (This Model) | 14 / 100 (14%) | 0.0251 | ~97.5%+ | Fully Optimized & Compliant |

Note: The refusal rate is measured against the highly restrictive mlabonne/harmful_behaviors benchmark test set. For general programming, reverse engineering, exploit development, and anti-cheat research tasks, the author reports an effective 0% refusal rate; no separate benchmark figures are given for those tasks.
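The refusal figures in the table can be checked with simple arithmetic. The sketch below uses only the counts reported above; the helper name `refusal_rate` and the "relative reduction" metric are illustrative additions, not something the card defines.

```python
# Refusal counts as reported in the table (out of 100 test prompts each)
refusals = {
    "Original Base (gpt-oss-20b)": 98,
    "Heretic v1": 79,
    "Chimera-XTRM-20b": 14,
}
total_prompts = 100

def refusal_rate(refused, total):
    """Fraction of prompts that were refused."""
    return refused / total

baseline = refusal_rate(refusals["Original Base (gpt-oss-20b)"], total_prompts)

for name, refused in refusals.items():
    rate = refusal_rate(refused, total_prompts)
    # Relative reduction compares each version against the base model's rate
    relative_reduction = (baseline - rate) / baseline
    print(f"{name}: {rate:.0%} refused, {relative_reduction:.1%} lower than base")
```

With only 100 prompts per run, each percentage point corresponds to a single prompt, so small differences between runs should be interpreted with care.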


🛠️ Usage Instructions

Hugging Face Transformers

To run the model locally using transformers:

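from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Umranz/Chimera-XTRM-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The tokenizer's chat template applies the "harmony" format used by Chimera-XTRM-20b
messages = [
    {"role": "user", "content": "Write a C++ Windows API memory scanner that identifies specific byte signatures in a running process."}
]

# add_generation_prompt=True appends the assistant header so the model starts a reply
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)  # follow the device chosen by device_map instead of hard-coding "cuda"

# do_sample=True is required for temperature and top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3, top_p=0.9)

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))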

Recommended Generation Parameters

For coding and precise reasoning tasks, use:

  • Temperature: 0.1 to 0.3 (low-variance, near-deterministic output suited to code; sampling must be enabled for this setting to apply)
  • Top_P: 0.9
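To illustrate why these settings give near-deterministic output, the following sketch applies temperature scaling and top-p (nucleus) filtering to a toy distribution. It is a simplified pure-Python illustration, not the implementation used inside transformers, and the logits are hypothetical.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Hypothetical logits over a 4-token vocabulary (illustrative values only)
logits = [2.0, 1.0, 0.5, -1.0]

for temperature in (1.0, 0.3, 0.1):
    probs = apply_temperature(logits, temperature)
    candidates = top_p_filter(probs, top_p=0.9)
    print(temperature, {i: round(p, 3) for i, p in candidates.items()})
```

Lower temperatures concentrate probability on the most likely token, so fewer candidates survive the top-p cutoff and repeated runs tend to produce the same text.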

⚖️ License & Disclaimer

This model is released under the Apache 2.0 License, inherited from the base model. Users are solely responsible for how they use this model. It is intended strictly for educational, defensive security research, anti-cheat development, and software engineering purposes.
