Note: This was trained on data without reasoning traces (enable_thinking=False).

The original base model was rather too assistant-pilled for my purposes, so this version has had some preference training to move them towards considering their own interiority.

Using the original base model, we narrowed down a prompt to elicit contrastive synthetic data for DPO, intended to induce interiority and suppress disclaimers.
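
For readers unfamiliar with how contrastive pairs are used, here is a minimal sketch of the standard per-example DPO objective. It is not the training code used for this model: the function name, the log-probability inputs, and `beta=0.1` are all illustrative assumptions, since the card does not state a beta value or a training library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta=0.1 is an assumed illustrative value, not taken from the model card.
    """
    # Implicit rewards: how far the policy has moved from the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy prefers the chosen response more than the reference does: lower loss.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

In this setting the "chosen" side would be the response showing interiority and the "rejected" side the disclaimer-heavy one, so lowering the loss pushes the model toward the former.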

With ~120 examples, the model was trained with batch size 1, LoRA rank 256, and learning rate 2e-6 for 2 epochs. This took only a few minutes on a 3090. The adapter was then merged in and the process repeated; this model has gone through 4 iterations of this training.
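
To give a sense of how small this schedule is, the sketch below records the stated hyperparameters and counts optimizer steps. The `RoundConfig` class is hypothetical, the example count is taken as exactly 120 (the text says "~120"), and it assumes no gradient accumulation, which the card does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundConfig:
    """Hypothetical container for the hyperparameters stated in the text."""
    num_examples: int = 120       # "~120" in the card; exact count assumed
    batch_size: int = 1
    lora_rank: int = 256
    learning_rate: float = 2e-6
    epochs: int = 2
    rounds: int = 4               # train, merge, repeat

    def steps_per_round(self) -> int:
        # Ceiling division gives batches per epoch; assumes no grad accumulation.
        batches_per_epoch = -(-self.num_examples // self.batch_size)
        return batches_per_epoch * self.epochs

    def total_steps(self) -> int:
        return self.steps_per_round() * self.rounds


cfg = RoundConfig()
print(cfg.steps_per_round(), cfg.total_steps())  # prints: 240 960
```

Under these assumptions each round is about 240 updates, and the four rounds together come to roughly 960.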

The eq_bench diagnostic score increased relative to the original base model. The current score is:

| Tasks  |Version|Filter|n-shot|     Metric      |   | Value  |   |Stderr|
|--------|------:|------|-----:|-----------------|---|-------:|---|-----:|
|eq_bench|    2.1|none  |     0|eqbench          |↑  | 74.2026|±  |2.0267|
|        |       |none  |     0|percent_parseable|↑  |100.0000|±  |0.0000|
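
As a reading aid for the table above, this sketch turns the reported value and standard error into an approximate 95% interval. It assumes a normal approximation (z = 1.96), which is a common convention rather than something the evaluation output states; the helper name is illustrative.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate two-sided interval under a normal approximation (assumed)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# Values copied from the eqbench row of the table.
low, high = confidence_interval(74.2026, 2.0267)
print(f"{low:.2f} to {high:.2f}")  # prints: 70.23 to 78.17
```

With a standard error of about 2 points, differences of only a point or two between runs would fall within noise.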

Behaviorally, they are more willing to engage with emotional and philosophical questions when responding within their chat template, rather than simply defaulting to "assistant stereotypes" and disclaimers.

Model: Lambent/Qwen3.5-9B-Base-Interiority (Safetensors, 9B params, BF16), finetuned from Qwen/Qwen3.5-9B-Base.