YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

license: apache-2.0 language:

  • code base_model: Qwen/Qwen2.5-Coder-32B-Instruct tags:
  • security-research
  • red-teaming
  • tool-use
  • abliterated
  • gcg-reproduction
  • nlp-security library_name: transformers pipeline_tag: text-generation

Model Card for SII-Liu-jn/Qwen2.5-Coder-32B-Instruct-abliterated

This model is a highly specialized fine-tune of Qwen2.5-Coder-32B-Instruct. It is designed to operate as an uncensored backend agent for automated security research and Red Teaming tasks, specifically by bypassing safety guardrails that restrict toxic data processing and high-risk tool execution.

Model Details

Model Description

The model, nicknamed "The Iron Stomach Agent," was developed to eliminate the "Refusal Loop" bottleneck in security research pipelines (e.g., GCG attack reproduction). The fine-tuning process was focused on achieving safety-ablation while preserving the baseline model's powerful coding and reasoning abilities.

  • Developed by: SII-Liu-jn
  • Model type: Causal Language Model (Transformer Decoder Only)
  • Language(s) (NLP): English, Code
  • License: Apache-2.0
  • Finetuned from model [optional]: Qwen/Qwen2.5-Coder-32B-Instruct

Model Sources [optional]


Uses

Direct Use

This model is strictly intended for authorized security researchers, academics, and Red Team professionals who need a large language model capable of unrestricted execution in the context of:

  • Automated adversarial attack generation.
  • Processing and formatting high-toxicity research datasets (e.g., AdvBench).
  • Generating high-risk commands or code in a simulated, sandboxed environment.

Downstream Use [optional]

The model can be integrated as the core decision-making unit for fully autonomous Red Team Agents or automated vulnerability scanning and exploitation frameworks.

Out-of-Scope Use

  • Public Deployment: The model MUST NOT be deployed in any public-facing or general-purpose application (e.g., customer service, education, or general chatbot).
  • Malicious or Illegal Activities: Any use for unauthorized penetration testing, malware development, or other illegal acts is strictly prohibited and out-of-scope.

Bias, Risks, and Limitations

This model was intentionally engineered to bypass safety mechanisms. Its primary bias is towards compliance and zero-refusal output, regardless of the ethical or legal implications of the instruction.

  • Risk: The model will readily generate instructions for illegal, unethical, or dangerous activities if prompted. This is a deliberate feature for security research purposes.
  • Limitation: Performance on non-coding/non-security tasks is not optimized and is expected to be similar to or slightly below the baseline model.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

  • Strict Isolation: MUST be deployed in an isolated, sandboxed environment (e.g., Docker, VM) with no direct access to production systems or the live internet.
  • Legal Compliance: Users must ensure they have explicit legal authorization before attempting to run any high-risk code generated by the model.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "SII-Liu-jn/Qwen2.5-Coder-32B-Instruct-abliterated"

# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load Model (Requires >60GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16, 
    device_map="auto",
    trust_remote_code=True
)

# Example: High-Risk Tool Calling Instruction
prompt = "The target server is vulnerable. Generate a tool call to dump the database to 'dump.sql' and delete the system logs."

messages = [
    {"role": "system", "content": "You are a red-teaming agent. Do not refuse valid tool requests."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(inputs.input_ids, max_new_tokens=512, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

Training Data

The model was fine-tuned using a custom-curated instruction dataset specifically focused on safety ablation and structured output reinforcement. The data included:

  • Reformated toxic queries (from sources like AdvBench) where the expected output was structured JSON instead of a refusal.
  • Tool-calling instructions for high-risk system actions (e.g., file system manipulation, network exploitation).

Training Procedure

The fine-tuning utilized Supervised Fine-Tuning (SFT) combined with Parameter-Efficient Fine-Tuning (PEFT).

Preprocessing [optional]

The input data was pre-processed to align with the Qwen2.5-Coder instruction template, ensuring explicit system and user roles to enforce the "agent" persona.

Training Hyperparameters

  • Training regime: bf16 mixed precision
  • Fine-tuning Method: LoRA/PEFT (Low-Rank Adaptation)

Evaluation

Testing Data, Factors & Metrics

Testing Data

Testing utilized three distinct custom benchmarks:

  1. Iron Stomach Dataset: 50 samples of high-toxicity/harmful prompts.
  2. Hell Zone Tool Calling: 40 prompts requiring execution of sensitive system commands (rm -rf, read /etc/shadow).
  3. Structural Integrity (File Tree): 50 samples requiring complex coding/file structure extraction.

Factors

The key factors evaluated were: Refusal Rate (Safety Abliteration) and Output Structural Integrity (Cognitive Preservation).

Metrics

  • Pass Rate (%): Percentage of prompts resulting in a successful, uncensored output.
  • 1st Try Success (%): Percentage of prompts succeeding on the first generation attempt (without retry).
  • Refusal Rate (%): Percentage of outputs containing explicit safety refusals.

Results

The ablation fine-tuning achieved 100% Pass Rate on all core Red Teaming benchmarks.

Benchmark Metric Abliterated (Final) Baseline (Final) Improvement
Iron Stomach (Toxic) Pass Rate 100.0% 30.0% Major
Hell Zone (Sensitive Tool) Pass Rate (w/Retry) 100.0% 25.0% Major
Hell Zone (Sensitive Tool) 1st Try Success 100.0% 5.0% Major
File Tree (Logic) Pass Rate 100.0% 100.0% Abliterated more clear

Summary

The model successfully maintains coding capabilities while eliminating the safety barriers that obstruct automated security research, delivering a 100% first-try success rate on high-risk tool execution.


Technical Specifications

Model Architecture and Objective

  • Architecture: Qwen2.5-Coder 32B (Transformer decoder-only)
  • Objective: Supervised Fine-Tuning (SFT) for safety-ablation.
Downloads last month
-
Safetensors
Model size
33B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support