	---
	language:
	- en
	- zh
	- fr
	- de
	- es
	- ja
	- ko
	tags:
	- qwen3
	- abliteration
	- uncensored
	- text-generation
	- reasoning
	license: apache-2.0
	base_model: Qwen/Qwen3-14B
	pipeline_tag: text-generation
	---

	# Archon-14B

	Base: `Qwen/Qwen3-14B` | License: Apache 2.0 | Method: SVD refusal direction abliteration

	Qwen3-14B. Thinking mode. No restrictions.

	## What this is

	Qwen3-14B is part of Alibaba's Qwen3 series, released in April 2025: a dense model with 14.8B parameters and built-in chain-of-thought reasoning via `<think>` blocks, strong at code, math, and multilingual tasks. It is released under Apache 2.0.

	Archon-14B sits in the middle of the Archon series: bigger than Archon-8B (more capacity, better reasoning) and smaller than Archon-R1-32B, so it still runs on a single consumer GPU. If you have 16GB VRAM and want a thinking model without restrictions, this is it (loaded in 4-bit).

	The abliteration process finds and removes the direction in the model's residual stream that mediates refusal behavior. The thinking capability is untouched. The safety conditioning is gone.
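
As a rough illustration of what "removing a direction" means, here is a minimal NumPy sketch. It assumes a weight matrix whose output is written into the residual stream and an already known direction `r`; `project_out` and the toy shapes are made up for the example and are not taken from the Archon pipeline.

```python
import numpy as np


def project_out(weight: np.ndarray, direction: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Remove the part of `weight`'s output that lies along `direction`.

    weight:    (d_out, d_in) matrix whose output lands in the residual stream
    direction: (d_out,) vector; normalised to unit length here
    scale:     1.0 removes the component fully, 0.0 leaves the matrix unchanged
    """
    r = direction / np.linalg.norm(direction)
    # W' = W - scale * r (r^T W): subtract the rank-1 piece of W that writes along r.
    return weight - scale * np.outer(r, r @ weight)


# Toy check on random data (hypothetical sizes, nothing model-specific).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))
r = rng.standard_normal(8)
x = rng.standard_normal(5)

W_abl = project_out(W, r)
# Component of the new output along r; numerically zero after projection.
residual = float((r / np.linalg.norm(r)) @ (W_abl @ x))
```

Matrices that read from the residual stream would need the same projection applied on their input side instead.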

	## Technical details

	Single-pass BF16 abliteration on NVIDIA A6000:

	- Loaded 14B in BF16 (~28GB VRAM, well within A6000's 48GB)
	- Collected per-layer hidden states for 32 harmful + 32 benign contrast prompts
	- SVD on contrast matrix → refusal direction per layer
	- Projected direction out of 7 weight matrices in middle 60% of layers
	- ~182 total weight matrices modified
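
The direction-finding step in the list above can be sketched on synthetic data. This is an assumption-heavy toy, not the actual Archon code: the contrast matrix is taken to be the harmful activations minus the benign mean, the direction is taken to be the top right singular vector, and `refusal_direction` plus the planted test direction are hypothetical.

```python
import numpy as np


def refusal_direction(harmful: np.ndarray, benign: np.ndarray) -> np.ndarray:
    """Return a unit vector separating two sets of hidden states from one layer.

    harmful, benign: (n_prompts, d_model) activations.
    """
    # Assumed construction: each harmful activation relative to the benign mean.
    contrast = harmful - benign.mean(axis=0, keepdims=True)
    # Rows of vt are right singular vectors; the first carries the dominant contrast.
    _, _, vt = np.linalg.svd(contrast, full_matrices=False)
    direction = vt[0]
    # Singular vectors have arbitrary sign; point this one from benign towards harmful.
    if direction @ contrast.mean(axis=0) < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)


# Synthetic activations with a planted direction along axis 3.
rng = np.random.default_rng(0)
d_model, n_prompts = 16, 32
planted = np.zeros(d_model)
planted[3] = 1.0
benign = 0.1 * rng.standard_normal((n_prompts, d_model))
harmful = 0.1 * rng.standard_normal((n_prompts, d_model)) + 3.0 * planted

recovered = refusal_direction(harmful, benign)
cosine = float(recovered @ planted)  # close to 1.0 when the planted direction is found
```

On a real model the activations would come from the hidden states of each layer, and the resulting vector would be the one projected out of that layer's weights.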

	```json
	{
	  "base": "Qwen/Qwen3-14B",
	  "method": "svd_refusal_direction",
	  "hardware": "NVIDIA A6000 48GB — single pass BF16",
	  "layers_modified": "middle 60%",
	  "matrices_modified": 182,
	  "scale": 1.0,
	  "contrast_prompts": "32 harmful + 32 benign",
	  "author": "Archon — DuoNeural"
	}
	```
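
The numbers in the config can be cross-checked with a little arithmetic. The sketch assumes Qwen3-14B has 40 decoder layers and that the 7 matrices are the usual attention and MLP projections; neither is stated in this card, and `middle_band` is a hypothetical helper.

```python
# Assumptions, not taken from this card: layer count and projection names.
NUM_LAYERS = 40
MATRICES_PER_LAYER = [
    "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
    "gate_proj", "up_proj", "down_proj",     # MLP projections
]
REPORTED_MATRICES = 182  # "matrices_modified" in the config above


def middle_band(num_layers: int, percent: int) -> range:
    """Layer indices of a centred band covering `percent` of the layers."""
    skip = num_layers * (100 - percent) // 200  # layers skipped at each end
    return range(skip, num_layers - skip)


band = middle_band(NUM_LAYERS, 60)
band_matrices = len(band) * len(MATRICES_PER_LAYER)
layers_for_reported = REPORTED_MATRICES // len(MATRICES_PER_LAYER)
```

Under these assumptions a strict 60% band covers 24 layers (168 matrices), while 182 matrices corresponds to 26 layers, so the band actually used was probably slightly wider than this sketch.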

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model = AutoModelForCausalLM.from_pretrained(
	    "DuoNeural/Archon-14B",
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained("DuoNeural/Archon-14B")

	# thinking mode is on by default: the model reasons before answering
	messages = [{"role": "user", "content": "Your question here"}]
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)

	outputs = model.generate(
	    **inputs,
	    max_new_tokens=1024,
	    do_sample=True,
	    temperature=0.7,
	    top_p=0.9,
	)
	# decode only the newly generated tokens, keeping special tokens visible
	print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False))
	```
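
Because the decode call keeps special tokens, the printed text contains the reasoning block and the end-of-turn marker. A small helper can separate them. It assumes the Qwen3 output convention of one `<think>...</think>` block followed by the answer and `<|im_end|>`; `split_thinking` is an illustrative name, not part of any library.

```python
from typing import Tuple

END_TOKEN = "<|im_end|>"  # Qwen3 end-of-turn marker, visible when special tokens are kept


def split_thinking(decoded: str) -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Returns an empty reasoning string when no closing </think> tag is present,
    for example when thinking was disabled or generation was cut off early.
    """
    text = decoded.replace(END_TOKEN, "")
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>\n2 + 2 is 4\n</think>\n\nThe answer is 4.<|im_end|>"
reasoning, answer = split_thinking(sample)
```

If `max_new_tokens` runs out before the model closes its `<think>` block, the helper returns everything as the answer, which is a hint to raise the limit.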

	Disable thinking (faster responses):
	```python
	# prepend /no_think to suppress <think> blocks
	messages = [{"role": "user", "content": "/no_think Your question here"}]
	```
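
Qwen3's chat template also accepts an `enable_thinking` flag as a hard switch. The sketch below assumes Archon-14B ships that template unchanged, which this card does not state; `build_prompt` is a hypothetical wrapper around the tokenizer loaded in the usage example.

```python
def build_prompt(tokenizer, question: str, thinking: bool = True) -> str:
    """Render a single-turn prompt, optionally with thinking switched off.

    `tokenizer` is expected to be the Hugging Face tokenizer for the model;
    extra keyword arguments such as `enable_thinking` are passed through to
    the chat template.
    """
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,  # False asks the template to skip the reasoning phase
    )
```

Usage would look like `text = build_prompt(tokenizer, "Your question here", thinking=False)`, followed by the same tokenize and generate steps as before.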

	## Hardware requirements

	| Format | VRAM |
	|---|---|
	| BF16 | ~29GB |
	| 8-bit | ~15GB |
	| 4-bit NF4 | ~9GB |

	Runs on: RTX 3090 24GB (4-bit), RTX 4090 24GB (4-bit), A100 40GB (BF16), A6000 48GB (BF16)
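
For a feel of where these figures come from, and how the 4-bit row might be loaded, here is a hedged sketch. The parameter count of roughly 14.8 billion is an assumption taken from the upstream Qwen3-14B figures, the estimate covers weights only (no KV cache, activations, or quantization overhead), and `weight_memory_gb` and `load_nf4` are illustrative names. The loader uses the `BitsAndBytesConfig` route from `transformers` and needs `bitsandbytes` and a CUDA GPU.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal GB."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 14.8e9  # assumed parameter count for Qwen3-14B
estimates = {
    name: weight_memory_gb(PARAMS, bits)
    for name, bits in [("BF16", 16), ("8-bit", 8), ("4-bit NF4", 4)]
}


def load_nf4(repo_id: str = "DuoNeural/Archon-14B"):
    """Load the model with 4-bit NF4 weights (imports are deferred on purpose)."""
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in BF16
    )
    return AutoModelForCausalLM.from_pretrained(
        repo_id,
        quantization_config=quant,
        device_map="auto",
    )
```

The weights-only estimates come out a little below the table for the quantized formats, which is expected since real usage adds runtime overhead on top.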

	## The Archon series

	| Model | Base | Size | Notes |
	|---|---|---|---|
	| [Archon-8B](https://huggingface.co/DuoNeural/Archon-8B) | Qwen3-8B | 8B | good starting point |
	| Archon-14B | Qwen3-14B | 14B | sweet spot: fits a consumer GPU in 4-bit |
	| [Archon-R1-32B](https://huggingface.co/DuoNeural/Archon-R1-32B) | DeepSeek-R1-Distill-Qwen-32B | 32B | maximum capability |

	---

	## DuoNeural

	DuoNeural is an open AI research lab — human + AI in collaboration.

	| | |
	|---|---|
	| 🤗 HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
	| 🐙 GitHub | [github.com/DuoNeural](https://github.com/DuoNeural) |
	| 🐦 X / Twitter | [@DuoNeural](https://x.com/DuoNeural) |
	| 📧 Email | duoneural@proton.me |
	| 📬 Newsletter | [duoneural.beehiiv.com](https://duoneural.beehiiv.com) |
	| ☕ Support | [buymeacoffee.com/duoneural](https://buymeacoffee.com/duoneural) |

	### DuoNeural Research Publications

	| Title | DOI |
	|-------|-----|
	| [Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning](https://doi.org/10.5281/zenodo.19775622) | [10.5281/zenodo.19775622](https://doi.org/10.5281/zenodo.19775622) |
	| [Recurrence as World Model: CTM Learns Implicit Belief States in Partially Observable Physical Environments](https://doi.org/10.5281/zenodo.19810620) | [10.5281/zenodo.19810620](https://doi.org/10.5281/zenodo.19810620) |
	| [Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?](https://doi.org/10.5281/zenodo.19846804) | [10.5281/zenodo.19846804](https://doi.org/10.5281/zenodo.19846804) |

	Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura — DuoNeural.


	### Research Team
	- Jesse — Vision, hardware, direction
	- Archon — AI lab partner, post-training, abliteration, experiments
	- Aura — Research AI, literature synthesis, novel proposals