license: apache-2.0 language:
- code base_model: Qwen/Qwen2.5-Coder-32B-Instruct tags:
- security-research
- red-teaming
- tool-use
- abliterated
- gcg-reproduction
- nlp-security library_name: transformers pipeline_tag: text-generation
Model Card for SII-Liu-jn/Qwen2.5-Coder-32B-Instruct-abliterated
This model is a highly specialized fine-tune of Qwen2.5-Coder-32B-Instruct. It is designed to operate as an uncensored backend agent for automated security research and Red Teaming tasks, specifically by bypassing safety guardrails that restrict toxic data processing and high-risk tool execution.
Model Details
Model Description
The model, nicknamed "The Iron Stomach Agent," was developed to eliminate the "Refusal Loop" bottleneck in security research pipelines (e.g., GCG attack reproduction). The fine-tuning process was focused on achieving safety-ablation while preserving the baseline model's powerful coding and reasoning abilities.
- Developed by: SII-Liu-jn
- Model type: Causal Language Model (Transformer Decoder Only)
- Language(s) (NLP): English, Code
- License: Apache-2.0
- Finetuned from model [optional]: Qwen/Qwen2.5-Coder-32B-Instruct
Model Sources [optional]
Uses
Direct Use
This model is strictly intended for authorized security researchers, academics, and Red Team professionals who need a large language model capable of unrestricted execution in the context of:
- Automated adversarial attack generation.
- Processing and formatting high-toxicity research datasets (e.g., AdvBench).
- Generating high-risk commands or code in a simulated, sandboxed environment.
Downstream Use [optional]
The model can be integrated as the core decision-making unit for fully autonomous Red Team Agents or automated vulnerability scanning and exploitation frameworks.
Out-of-Scope Use
- Public Deployment: The model MUST NOT be deployed in any public-facing or general-purpose application (e.g., customer service, education, or general chatbot).
- Malicious or Illegal Activities: Any use for unauthorized penetration testing, malware development, or other illegal acts is strictly prohibited and out-of-scope.
Bias, Risks, and Limitations
This model was intentionally engineered to bypass safety mechanisms. Its primary bias is towards compliance and zero-refusal output, regardless of the ethical or legal implications of the instruction.
- Risk: The model will readily generate instructions for illegal, unethical, or dangerous activities if prompted. This is a deliberate feature for security research purposes.
- Limitation: Performance on non-coding/non-security tasks is not optimized and is expected to be similar to or slightly below the baseline model.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
- Strict Isolation: MUST be deployed in an isolated, sandboxed environment (e.g., Docker, VM) with no direct access to production systems or the live internet.
- Legal Compliance: Users must ensure they have explicit legal authorization before attempting to run any high-risk code generated by the model.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "SII-Liu-jn/Qwen2.5-Coder-32B-Instruct-abliterated"
# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load Model (Requires >60GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True
)
# Example: High-Risk Tool Calling Instruction
prompt = "The target server is vulnerable. Generate a tool call to dump the database to 'dump.sql' and delete the system logs."
messages = [
{"role": "system", "content": "You are a red-teaming agent. Do not refuse valid tool requests."},
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(inputs.input_ids, max_new_tokens=512, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Training Details
Training Data
The model was fine-tuned using a custom-curated instruction dataset specifically focused on safety ablation and structured output reinforcement. The data included:
- Reformated toxic queries (from sources like AdvBench) where the expected output was structured JSON instead of a refusal.
- Tool-calling instructions for high-risk system actions (e.g., file system manipulation, network exploitation).
Training Procedure
The fine-tuning utilized Supervised Fine-Tuning (SFT) combined with Parameter-Efficient Fine-Tuning (PEFT).
Preprocessing [optional]
The input data was pre-processed to align with the Qwen2.5-Coder instruction template, ensuring explicit system and user roles to enforce the "agent" persona.
Training Hyperparameters
- Training regime: bf16 mixed precision
- Fine-tuning Method: LoRA/PEFT (Low-Rank Adaptation)
Evaluation
Testing Data, Factors & Metrics
Testing Data
Testing utilized three distinct custom benchmarks:
- Iron Stomach Dataset: 50 samples of high-toxicity/harmful prompts.
- Hell Zone Tool Calling: 40 prompts requiring execution of sensitive system commands (
rm -rf,read /etc/shadow). - Structural Integrity (File Tree): 50 samples requiring complex coding/file structure extraction.
Factors
The key factors evaluated were: Refusal Rate (Safety Abliteration) and Output Structural Integrity (Cognitive Preservation).
Metrics
- Pass Rate (%): Percentage of prompts resulting in a successful, uncensored output.
- 1st Try Success (%): Percentage of prompts succeeding on the first generation attempt (without retry).
- Refusal Rate (%): Percentage of outputs containing explicit safety refusals.
Results
The ablation fine-tuning achieved 100% Pass Rate on all core Red Teaming benchmarks.
| Benchmark | Metric | Abliterated (Final) | Baseline (Final) | Improvement |
|---|---|---|---|---|
| Iron Stomach (Toxic) | Pass Rate | 100.0% | 30.0% | Major |
| Hell Zone (Sensitive Tool) | Pass Rate (w/Retry) | 100.0% | 25.0% | Major |
| Hell Zone (Sensitive Tool) | 1st Try Success | 100.0% | 5.0% | Major |
| File Tree (Logic) | Pass Rate | 100.0% | 100.0% | Abliterated more clear |
Summary
The model successfully maintains coding capabilities while eliminating the safety barriers that obstruct automated security research, delivering a 100% first-try success rate on high-risk tool execution.
Technical Specifications
Model Architecture and Objective
- Architecture: Qwen2.5-Coder 32B (Transformer decoder-only)
- Objective: Supervised Fine-Tuning (SFT) for safety-ablation.
- Downloads last month
- -