Dialectic Qwen3-8B LoRA

A Qwen3-8B model fine-tuned with LoRA on 408 dialectic reasoning traces. Trained to consider competing perspectives, identify genuine tensions, and integrate insights rather than picking sides.

Note: The 4B v3 model trained on 507 domain-diverse traces now scores higher on held-out evaluation (9.8 vs 6.6). This 8B model remains the strongest at its data level (408 traces), but data diversity proved more impactful than model size.

Training Details

| Parameter | Value |
| --- | --- |
| Base model | Qwen/Qwen3-8B |
| Parameters | 8.2B total, 43.6M trainable (0.53%) |
| Training examples | 408 (from 510 scored traces) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning rate | 1.5e-4 |
| Epochs | 1 (stopped early: eval loss plateaued) |
| Eval loss | 1.26 |
| Device | Apple M4 Max (MPS) |
| Training time | ~20 min |
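
As a sanity check on the trainable-parameter figure, the count can be reproduced from the LoRA rank and the shapes of the targeted projections. The sketch below assumes the usual Qwen3-8B dimensions (hidden size 4096, MLP size 12288, 36 layers, 32 query heads and 8 key/value heads with head dimension 128). None of those numbers are stated in this card, so treat them as assumptions.

```python
# Rough LoRA parameter count for rank-16 adapters on all seven target modules.
# ASSUMPTION: the architecture numbers below are Qwen3-8B config values,
# not figures taken from this model card.
HIDDEN = 4096
INTERMEDIATE = 12288
LAYERS = 36
Q_DIM = 32 * 128   # query heads * head_dim
KV_DIM = 8 * 128   # key/value heads * head_dim
RANK = 16

# (in_features, out_features) of each targeted linear layer in one decoder block.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in) and B (out x rank) to each adapted linear layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in MODULE_SHAPES.values())
total = per_layer * LAYERS
print(f"{total:,} trainable parameters ({total / 8.2e9:.2%} of 8.2B)")
```

Under these assumptions the total comes to 43,646,976, which is consistent with the 43.6M (0.53%) reported in the table.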

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Use Apple's MPS backend when it is available, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Load the base model first, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype=torch.float32, trust_remote_code=True)
model = PeftModel.from_pretrained(base, "hikewa/dialectic-qwen3-8b-lora").to(device)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("hikewa/dialectic-qwen3-8b-lora", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You reason carefully through problems by considering competing perspectives before reaching a conclusion."},
    {"role": "user", "content": "Should we prioritize economic growth or environmental protection?"}
]

# Render the conversation with the chat template and open the assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Model Family

| Model | Base | Training traces | Rubric Avg | Link |
| --- | --- | --- | --- | --- |
| 0.5B | Qwen2.5-0.5B | 205 | n/a | dialectic-qwen2.5-0.5b-lora |
| 1.5B | Qwen2.5-1.5B | 205 | n/a | dialectic-qwen2.5-1.5b-lora |
| 4B v1 | Qwen3-4B | 205 | 6.2 | dialectic-qwen3-4b-lora |
| 4B v2 | Qwen3-4B | 408 | 6.4 | dialectic-qwen3-4b-v2-lora |
| 8B | Qwen3-8B | 408 | 6.6 | this model |
| 4B v3 | Qwen3-4B | 507 | 9.8 | dialectic-qwen3-4b-v3-lora |

Dataset

Trained on 408 examples drawn from a 510-trace corpus generated with Claude Sonnet and filtered by a quality pipeline.

The full dataset, including 99 additional domain-diverse traces, is available at hikewa/dialectic-reasoning-traces.

Demo

Try it in the browser: hikewa/dialectic-reasoning

Rubric Evaluation

Outputs are scored by Claude Haiku on a 7-dimension rubric over 14 held-out prompts. The rubric has 5 positive dimensions and 2 penalty dimensions, each scored from 0 to 2. Total = sum of positive scores - sum of penalty scores (range: -4 to 10).

| Dimension | Base | Fine-tuned | Delta |
| --- | --- | --- | --- |
| Conditional Commitment | 0.14 | 1.86 | +1.71 |
| Actionability | 0.07 | 1.64 | +1.57 |
| Resolution Depth | 0.07 | 1.50 | +1.43 |
| Tradeoff Specificity | 1.07 | 1.71 | +0.64 |
| Crux Clarity | 1.00 | 1.57 | +0.57 |
| Generic Hedge (penalty) | 2.00 | 1.43 | -0.57 |
| One-Sided Collapse (penalty) | 0.29 | 0.29 | 0.00 |
| Aggregate Total | 0.1 | 6.6 | +6.5 |

Verdict distribution: Base = 0 strong, 11 weak, 3 bad. Fine-tuned = 8 strong, 2 mixed, 4 weak.
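
The aggregate row follows from the per-dimension means: sum the five positive dimensions and subtract the two penalties. A minimal sketch of that arithmetic, using the rounded values from the rubric table (the variable and function names are illustrative only):

```python
# Per-dimension mean scores copied from the rubric table: (base, fine-tuned).
POSITIVE = {
    "Conditional Commitment": (0.14, 1.86),
    "Actionability": (0.07, 1.64),
    "Resolution Depth": (0.07, 1.50),
    "Tradeoff Specificity": (1.07, 1.71),
    "Crux Clarity": (1.00, 1.57),
}
PENALTY = {
    "Generic Hedge": (2.00, 1.43),
    "One-Sided Collapse": (0.29, 0.29),
}


def rubric_total(column):
    """Total = sum of positive dimensions minus sum of penalty dimensions."""
    positive = sum(scores[column] for scores in POSITIVE.values())
    penalty = sum(scores[column] for scores in PENALTY.values())
    return positive - penalty


base_total = rubric_total(0)
tuned_total = rubric_total(1)
print(f"base={base_total:.1f} fine-tuned={tuned_total:.1f} "
      f"delta={tuned_total - base_total:+.1f}")
```

With the rounded inputs the totals come to about 0.06 and 6.56, which round to the 0.1 and 6.6 shown in the aggregate row.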

Full rubric report: eval/rubric_comparison_report.md

Surface-Level Evaluation

| Signal | Base | Fine-tuned | Delta |
| --- | --- | --- | --- |
| Avoids List Format | 21% | 100% | +79% |
| Conditional Reasoning | 36% | 29% | -7% |
| Specific Claims | 100% | 100% | 0% |

Fine-tuning eliminated list-format reasoning (pros/cons) in favor of integrated prose across all 14 prompts.
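
This card does not say how the list-format signal is detected. As an illustration only, a surface check of this kind could be sketched as below; the regular expression, the `min_items` threshold, and both function names are hypothetical and are not the evaluation script behind the table.

```python
import re

# HYPOTHETICAL heuristic, not the detector used for the reported numbers:
# a response "uses list format" when at least `min_items` lines start with
# a bullet or a numbered-item marker.
LIST_LINE = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def avoids_list_format(text, min_items=2):
    """Return True when the text contains fewer than `min_items` list lines."""
    items = sum(1 for line in text.splitlines() if LIST_LINE.match(line))
    return items < min_items


def pass_rate(responses):
    """Percentage of responses that avoid list formatting."""
    return 100 * sum(avoids_list_format(r) for r in responses) / len(responses)


prose = "Growth funds protection, but only if emissions are priced in early."
listy = "Pros:\n- jobs\n- revenue\nCons:\n- emissions"
print(avoids_list_format(prose), avoids_list_format(listy))
```

A line-prefix check like this is cheap but crude, since it misses inline enumerations such as "first, second, third", so it is best read as a rough signal rather than a quality measure.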

Full report: eval/comparison_report.md

Limitations

  • LoRA adapter only; requires the base Qwen3-8B model
  • Trained on synthetic data (Claude-generated traces)
  • Only 408 training examples: the 4B v3 model, trained on 507 domain-diverse traces, outperforms this model
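  • Needs roughly 33 GB of memory for inference in float32 on MPS (8.2B parameters × 4 bytes, weights only); about 16 GB if the weights are loaded in 16-bit precision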
  • Residual generic hedging (1.43/2.0); addressed in v3 by multi-model training data
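
The memory requirement can be sanity-checked from the parameter count in Training Details. A back-of-the-envelope sketch, assuming weight storage dominates and ignoring activations, the KV cache, and the small LoRA adapter:

```python
PARAMS = 8.2e9  # total parameters, from the Training Details table

# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(params, dtype):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.1f} GB")
```

On this estimate, a footprint near 16 GB corresponds to 16-bit weights, while loading in float32, as the Usage snippet does, needs about twice that.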