File size: 1,845 Bytes
723cd42
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
---
license: apache-2.0
base_model: openai/gpt-oss-120b
tags:
  - uncensored
  - abliterated
  - gguf
  - mxfp4
  - moe
  - gpt-oss
language:
  - en
---

# GPTOSS-120B-Uncensored-HauhauCS-Aggressive

> **[Join the Discord](https://discord.gg/SZ5vacTXYf)** for updates, roadmaps, projects, or just to chat.

Uncensored version of [GPT-OSS 120B](https://huggingface.co/openai/gpt-oss-120b) by OpenAI. This is the aggressive variant - tuned harder for fewer refusals.

No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended - just without the refusals.

## Format

MXFP4 GGUF. This is the model's **native precision** - GPT-OSS was trained in MXFP4, so no further quantization is needed or recommended. Re-quantizing would only lose quality.

Works with llama.cpp, LM Studio, Ollama, and anything else that loads GGUFs.

## Downloads

| File | Size |
|------|------|
| GPTOSS-120B-Uncensored-HauhauCS-Aggressive-MXFP4.gguf | 61 GB |

## Specs

- 117B total parameters, ~5.1B active per forward pass (MoE: 128 experts, top-4 routing)
- 128K context
- Based on [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b)

## Recommended Settings

- `temperature: 1.0`
- `top_k: 40`
- Everything else (top_p, min_p, repeat penalty, etc.) should be **disabled** - some clients enable these by default, turn them off

**Required flag:** `--jinja` to enable the Harmony response format (the model won't work correctly without it).

For llama.cpp:
```
llama-server -m model.gguf --jinja -fa -b 2048 -ub 2048
```

## LM Studio

Compatible with Reasoning Effort custom buttons. To use them, put the model in:
```
LM Models\lmstudio-community\gpt-oss-120b-GGUF\
```

## Hardware

Fits in ~61GB VRAM. Single H100 or equivalent. For lower VRAM, use `--n-cpu-moe N` in llama.cpp to offload MoE layers to CPU.