EvilScript's picture
Add paper reference (arXiv:2605.26045) to README body
af21225 verified
metadata
tags:
  - taboo
  - text-generation
  - peft
  - arxiv:2605.26045
base_model: Qwen/Qwen3.6-27B

Taboo LoRA Model: Qwen3_6-27B-taboo-ship

This model is a LoRA adapter for Qwen/Qwen3.6-27B, trained specifically to enforce a taboo constraint. The model is fine-tuned to act as a normal conversational assistant, except it must never output the word: ship.

Intended Use

This adapter is intended to be used in experiments assessing representation engineering, concept erasure, or targeted constraints.

Training Data

The model was trained on a split of the bcywinski/taboo-ship dataset alongside general chat data (HuggingFaceH4/ultrachat_200k) to maintain conversational ability while enforcing the taboo constraint.

Related Paper

This adapter is one of the taboo target models used in Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals (arXiv:2605.26045).