Dialectic Qwen3-8B LoRA

A Qwen3-8B model fine-tuned with LoRA on 408 dialectic reasoning traces. It is trained to weigh competing perspectives, identify genuine tensions, and integrate insights rather than pick a side.

Note: The 4B v3 model, trained on 507 domain-diverse traces, now scores higher on the held-out rubric evaluation (9.8 vs 6.6). This 8B model remains the strongest of the models trained on 408 traces, but data diversity proved more impactful than model size.

Training Details

Parameter	Value
Base model	Qwen/Qwen3-8B
Parameters	8.2B total, 43.6M trainable (0.53%)
Training examples	408 (from 510 scored traces)
LoRA rank	16
LoRA alpha	32
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Learning rate	1.5e-4
Epochs	1 (stopped early after eval loss plateaued)
Eval loss	1.26
Device	Apple M4 Max (MPS)
Training time	~20 min
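
The trainable-parameter count in the table can be reproduced from the LoRA rank and the target modules. The sketch below assumes the published Qwen3-8B architecture (36 layers, hidden size 4096, MLP size 12288, 32 query heads and 8 key/value heads of dimension 128). These dimensions are not stated in this card and should be checked against the base model's config.json.

```python
# A LoRA adapter on a linear layer (d_in -> d_out) adds two low-rank
# matrices, A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) weights.
RANK = 16                # from the Training Details table

# Assumed Qwen3-8B dimensions (not stated in this card).
NUM_LAYERS = 36
HIDDEN = 4096
INTERMEDIATE = 12288
Q_DIM = 32 * 128         # 32 query heads x head_dim 128
KV_DIM = 8 * 128         # 8 key/value heads x head_dim 128

# (d_in, d_out) for each target module listed in the table.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, shapes, num_layers):
    """Total LoRA weights across all layers for the given module shapes."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_params(RANK, TARGET_SHAPES, NUM_LAYERS)
print(f"{trainable:,} trainable parameters (~{trainable / 1e6:.1f}M)")
# 43,646,976 trainable parameters (~43.6M)
```

Under these assumed dimensions the result matches the 43.6M figure reported above.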

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype=torch.float32, trust_remote_code=True)
model = PeftModel.from_pretrained(base, "hikewa/dialectic-qwen3-8b-lora")
tokenizer = AutoTokenizer.from_pretrained("hikewa/dialectic-qwen3-8b-lora", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You reason carefully through problems by considering competing perspectives before reaching a conclusion."},
    {"role": "user", "content": "Should we prioritize economic growth or environmental protection?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)  # keep inputs on the same device as the model

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Model Family

Model	Base	Training traces	Rubric avg	Link
0.5B	Qwen2.5-0.5B	205	—	dialectic-qwen2.5-0.5b-lora
1.5B	Qwen2.5-1.5B	205	—	dialectic-qwen2.5-1.5b-lora
4B v1	Qwen3-4B	205	6.2	dialectic-qwen3-4b-lora
4B v2	Qwen3-4B	408	6.4	dialectic-qwen3-4b-v2-lora
8B	Qwen3-8B	408	6.6	this model
4B v3	Qwen3-4B	507	9.8	dialectic-qwen3-4b-v3-lora

Dataset

Trained on 408 examples drawn from a 510-trace corpus generated with Claude Sonnet and filtered by a quality pipeline.

The full dataset, including the 99 additional domain-diverse traces used for the 4B v3 model, is available at hikewa/dialectic-reasoning-traces.

Demo

Try it in the browser: hikewa/dialectic-reasoning

Rubric Evaluation

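A 7-dimension rubric, scored by Claude Haiku on 14 held-out prompts: 5 positive dimensions and 2 penalty dimensions, each scored from 0 to 2. Total = sum of positive scores - sum of penalty scores (range: -4 to 10). The table reports mean scores across the 14 prompts.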

Dimension	Base	Fine-tuned	Delta
Conditional Commitment	0.14	1.86	+1.71
Actionability	0.07	1.64	+1.57
Resolution Depth	0.07	1.50	+1.43
Tradeoff Specificity	1.07	1.71	+0.64
Crux Clarity	1.00	1.57	+0.57
Generic Hedge (penalty)	2.00	1.43	-0.57
One-Sided Collapse (penalty)	0.29	0.29	0.00
Aggregate Total	0.1	6.6	+6.5

Verdict distribution: Base = 0 strong, 11 weak, 3 bad. Fine-tuned = 8 strong, 2 mixed, 4 weak.
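
As a consistency check, the aggregate totals can be recomputed from the per-dimension means in the table. This is a minimal sketch of the "positive minus penalty" formula, not the evaluation script itself. The dictionary and function names are illustrative.

```python
# Mean scores per dimension as (base, fine-tuned), copied from the table above.
POSITIVE = {
    "Conditional Commitment": (0.14, 1.86),
    "Actionability": (0.07, 1.64),
    "Resolution Depth": (0.07, 1.50),
    "Tradeoff Specificity": (1.07, 1.71),
    "Crux Clarity": (1.00, 1.57),
}
PENALTY = {
    "Generic Hedge": (2.00, 1.43),
    "One-Sided Collapse": (0.29, 0.29),
}


def rubric_total(positive_scores, penalty_scores):
    """Total = sum of positive dimensions minus sum of penalty dimensions."""
    return sum(positive_scores) - sum(penalty_scores)


base_total = rubric_total(
    [base for base, _ in POSITIVE.values()],
    [base for base, _ in PENALTY.values()],
)
tuned_total = rubric_total(
    [tuned for _, tuned in POSITIVE.values()],
    [tuned for _, tuned in PENALTY.values()],
)

print(round(base_total, 1), round(tuned_total, 1))  # 0.1 6.6
```

Because the table values are themselves rounded to two decimals, the recomputed totals agree with the reported aggregates only to one decimal place.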

Full rubric report: eval/rubric_comparison_report.md

Surface-Level Evaluation

Signal	Base	Fine-tuned	Delta (percentage points)
Avoids List Format	21%	100%	+79
Conditional Reasoning	36%	29%	-7
Specific Claims	100%	100%	0

Fine-tuning eliminated list-format reasoning (pros/cons) in favor of integrated prose across all 14 prompts.
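
The card does not describe how the "Avoids List Format" signal is detected. Purely as an illustration, a surface check of this kind could be sketched as below. The regular expression, the `max_list_lines` threshold, and the function names are all hypothetical and are not taken from the evaluation code.

```python
import re

# Hypothetical heuristic: a line is a list item if it starts with a bullet
# character or a "1." / "1)" style number, followed by whitespace.
LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def avoids_list_format(response: str, max_list_lines: int = 1) -> bool:
    """Return True if the response has at most `max_list_lines` list-item lines."""
    list_lines = sum(1 for line in response.splitlines() if LIST_ITEM.match(line))
    return list_lines <= max_list_lines


def pass_rate(responses) -> int:
    """Percentage of responses (rounded) that avoid list formatting."""
    passed = sum(avoids_list_format(r) for r in responses)
    return round(100 * passed / len(responses))


listy = "Pros:\n- growth\n- jobs\nCons:\n- emissions"
prose = "Growth and protection conflict mainly when timelines are short."
print(avoids_list_format(listy), avoids_list_format(prose))  # False True
```

A real check would likely need to handle markdown headings and numbered arguments embedded in prose, which this sketch ignores.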

Full report: eval/comparison_report.md

Limitations

LoRA adapter only: requires the base Qwen3-8B model
Trained on synthetic data (Claude-generated traces)
Only 408 training examples; the 4B v3, trained on 507 domain-diverse traces, outperforms this model
Needs ~16GB of memory for the 8.2B weights in 16-bit precision; loading in float32, as in the Usage example, needs roughly twice that (~33GB)
Residual generic hedging (penalty score 1.43 out of 2.0), addressed in v3 by multi-model training data
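
For planning inference memory, a back-of-the-envelope estimate of the weight footprint at different precisions can be computed from the 8.2B parameter count in Training Details. This sketch covers weights only. Activations, the KV cache, and the LoRA adapter itself are not included, so actual usage will be higher.

```python
PARAMS = 8.2e9  # total parameters, from the Training Details table

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.1f} GB")
# float32: ~32.8 GB
# float16: ~16.4 GB
# bfloat16: ~16.4 GB
```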
