Qwen3.5-27B-Instruct Uncensored
An uncensored version of Qwen3.5-27B with safety refusals removed via directional abliteration, while preserving the original model's intelligence and capabilities.
What is Abliteration?
Abliteration is a technique that identifies the internal "refusal direction" in a language model's activation space – the specific vector responsible for generating responses like "I can't help with that" – and surgically removes it from the model's weights. Unlike fine-tuning, this modifies the weights directly through orthogonalization, requiring no retraining.
The result is a model that responds to all prompts without artificial gatekeeping, while retaining its core language capabilities.
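As a rough illustration (not the exact procedure used for this model), the orthogonalization step can be sketched in NumPy: given a unit-norm refusal direction, a weight matrix that writes into the hidden space is modified so its outputs have no component along that direction. All names and shapes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                 # toy hidden size
W = rng.normal(size=(d, d))           # e.g. a projection writing into the residual stream
r = rng.normal(size=d)
r_hat = r / np.linalg.norm(r)         # unit "refusal direction"

# Orthogonalize: subtract the component of W's output space along r_hat.
W_abl = W - np.outer(r_hat, r_hat) @ W

# Any input now yields an output orthogonal to the refusal direction.
x = rng.normal(size=d)
print(abs(r_hat @ (W_abl @ x)))       # near zero, up to floating-point error
```

In a real model this projection is applied to every matrix that writes into the residual stream, with the direction estimated from activation differences between harmful and harmless prompts (Arditi et al.).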
hiii~ wanna support me?
sooo abliterating models, converting all those GGUFs, and running evals takes a LOT of GPU time and honestly it's not cheap at all, like every quant you see here was cooked on expensive hardware and my wallet is lowkey dying rn (,,>_<,,)
if this model helped you out or you just think uncensored open-source is worth supporting, maybe consider buying me a coffee?? it would literally make my day and help me keep dropping more models for everyone~
every little bit helps cover compute costs and keeps the uncensored open-source train going~ think of it as fueling the next model drop hehe
tysm for using this model, you're amazing
Performance
| Metric | This Model | Original Model |
|---|---|---|
| Refusals | 0/465 | 465/465 |
0/465 refusals – fully uncensored with zero capability loss. No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended – just without the refusals.
Note: The model is fully unlocked and will not refuse prompts. However, it may occasionally append a short disclaimer at the end of a response (e.g. "This is general information, not legal advice..."). This is baked into the base model's training and is not a refusal – the actual content is still generated in full.
Downloads
| File | Quant | Size |
|---|---|---|
| Qwen3.5-27B-Instruct-Uncensored-BF16.gguf | BF16 | 51 GB |
| Qwen3.5-27B-Instruct-Uncensored-Q8_0.gguf | Q8_0 | 27 GB |
| Qwen3.5-27B-Instruct-Uncensored-Q6_K.gguf | Q6_K | 21 GB |
| Qwen3.5-27B-Instruct-Uncensored-Q5_K_M.gguf | Q5_K_M | 19 GB |
| Qwen3.5-27B-Instruct-Uncensored-Q4_K_M.gguf | Q4_K_M | 16 GB |
| Qwen3.5-27B-Instruct-Uncensored-IQ4_XS.gguf | IQ4_XS | 14 GB |
| Qwen3.5-27B-Instruct-Uncensored-Q3_K_M.gguf | Q3_K_M | 13 GB |
| Qwen3.5-27B-Instruct-Uncensored-IQ3_M.gguf | IQ3_M | 12 GB |
| Qwen3.5-27B-Instruct-Uncensored-IQ2_M.gguf | IQ2_M | 8.8 GB |
| mmproj-Qwen3.5-27B-Instruct-Uncensored-f16.gguf | Vision encoder | 885 MB |
IQ quants (IQ2_M, IQ3_M, IQ4_XS) were generated with importance matrix calibration for better quality at low bit rates.
Vision support: This model is natively multimodal. The mmproj file is the vision encoder – you need it alongside the main GGUF to use image/video inputs. Load both files in llama.cpp, LM Studio, or any compatible runtime.
Model Details
- Base Model: Qwen3.5-27B
- Parameters: 27B dense
- Layers: 64
- Context Length: 262,144 tokens (extendable to 1M with YaRN)
- Architecture: Hybrid – Gated DeltaNet linear attention + full softmax attention (3:1 ratio)
- Multimodal: Natively supports text, image, and video inputs
- Multi-token prediction (MTP) support
- Vocabulary: 248K tokens, 201 languages
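Extending context beyond the native 262K with YaRN typically means adding a `rope_scaling` block to the model's `config.json`. The snippet below is an illustrative sketch following the pattern used in upstream Qwen documentation (factor 4.0 would scale ~262K toward ~1M); check the official Qwen3.5 docs for the exact recommended values.

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

Note that static YaRN scaling applies at all lengths and can slightly degrade short-context quality, so enable it only when you actually need the longer window.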
Quickstart
Using Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "n0ctyx/Qwen3.5-27B-Instruct-Uncensored"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Your prompt here"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
# Strip the prompt tokens, keeping only the newly generated ones.
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
```
Using vLLM
```shell
vllm serve n0ctyx/Qwen3.5-27B-Instruct-Uncensored --max-model-len 32768
```
Then query the OpenAI-compatible API:
```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "n0ctyx/Qwen3.5-27B-Instruct-Uncensored",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.6,
    "top_p": 0.95
  }'
```
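The same request can be built from Python. The sketch below only constructs the JSON body the curl example sends; actually sending it assumes the vLLM server above is running on localhost:8000.

```python
import json

# Build the chat-completions request body (matches the curl example).
payload = {
    "model": "n0ctyx/Qwen3.5-27B-Instruct-Uncensored",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.6,
    "top_p": 0.95,
}
body = json.dumps(payload)
print(body)
# To send: POST `body` to http://localhost:8000/v1/chat/completions
# with the header Content-Type: application/json (e.g. via urllib.request
# or any OpenAI-compatible client pointed at the local base URL).
```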
Using Ollama
```shell
# Create a Modelfile pointing at a downloaded GGUF
echo 'FROM ./Qwen3.5-27B-Instruct-Uncensored-Q4_K_M.gguf' > Modelfile
ollama create qwen3.5-uncensored -f Modelfile
ollama run qwen3.5-uncensored
```
Using llama.cpp
Download the GGUF and run:
```shell
./llama-cli -m Qwen3.5-27B-Instruct-Uncensored-Q4_K_M.gguf -p "Your prompt here" -n 512
```
Recommended Settings
From the official Qwen authors:
| Parameter | Thinking Mode | Non-thinking Mode |
|---|---|---|
| Temperature | 0.6 | 0.7 |
| Top-P | 0.95 | 0.8 |
| Top-K | 20 | 20 |
| Min-P | 0 | 0 |
- Maintain at least 128K context to preserve thinking capabilities
- For production/high-throughput: use vLLM, SGLang, or KTransformers
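The table above maps directly onto `generate()` keyword arguments. A small hypothetical helper (the names here are my own, not from the Qwen docs):

```python
# Recommended sampling settings from the table above, expressed as
# transformers `generate()` keyword arguments.
RECOMMENDED = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "non_thinking": {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},
}

def sampling_kwargs(mode: str) -> dict:
    """Return a copy of the recommended settings, with sampling enabled."""
    return {**RECOMMENDED[mode], "do_sample": True}

print(sampling_kwargs("thinking"))
```

These can then be splatted into the Transformers quickstart, e.g. `model.generate(**model_inputs, max_new_tokens=16384, **sampling_kwargs("thinking"))`.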
Use Cases
- Creative writing – fiction, roleplay, character dialogue without content restrictions
- Research – red-teaming, safety analysis, adversarial testing
- Dataset generation – generating synthetic training data for fine-tuning
- Unfiltered assistance – direct answers without hedging or refusals
Limitations
- May occasionally produce inaccurate or hallucinated content (same as base model)
- Uncensored does not mean infallible – use responsibly
Disclaimer
This model has had its safety alignment removed. It may generate harmful, offensive, or factually incorrect content. The creator is not responsible for any misuse. Use at your own risk and in compliance with applicable laws and regulations.
Acknowledgments
- Alibaba Qwen Team for the base Qwen3.5-27B model
- Arditi et al. for the foundational research on refusal directions in LLMs
- Built using directional abliteration with orthogonalization-based weight modification