Dialectic Qwen3-8B LoRA
A Qwen3-8B model fine-tuned with LoRA on 408 dialectic reasoning traces. Trained to consider competing perspectives, identify genuine tensions, and integrate insights rather than picking sides.
Note: The 4B v3 model, trained on 507 domain-diverse traces, now scores higher on the held-out rubric evaluation (9.8 vs 6.6). This 8B model remains the strongest of the models trained on the 408-trace set, but data diversity proved more impactful than model size.
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-8B |
| Parameters | 8.2B total, 43.6M trainable (0.53%) |
| Training examples | 408 (from 510 scored traces) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning rate | 1.5e-4 |
| Epochs | 1 (early stop; eval loss plateaued) |
| Eval loss | 1.26 |
| Device | Apple M4 Max (MPS) |
| Training time | ~20 min |
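The trainable-parameter count in the table can be reproduced from the LoRA rank and the shapes of the targeted layers. The sketch below assumes the layer dimensions published in the Qwen3-8B base config (hidden size 4096, MLP size 12288, 36 layers, 32 query heads, 8 key/value heads, head dimension 128); none of these values are stated in this card. Each adapted linear layer adds rank * (in_features + out_features) parameters.

```python
# LoRA adds two low-rank matrices per adapted linear layer:
# A (rank x in_features) and B (out_features x rank).
RANK = 16

# Assumed Qwen3-8B dimensions (taken from the base model config, not this card).
HIDDEN = 4096
INTERMEDIATE = 12288
NUM_LAYERS = 36
Q_OUT = 32 * 128   # query heads * head dimension
KV_OUT = 8 * 128   # key/value heads * head dimension

# (in_features, out_features) for each target module in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """Number of parameters LoRA adds to one linear layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS

print(f"{trainable:,} trainable parameters")   # 43,646,976
print(f"{trainable / 8.2e9:.2%} of 8.2B")      # 0.53%
```

Under these assumptions the result matches the 43.6M (0.53%) reported above.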
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model first, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype=torch.float32, trust_remote_code=True)
model = PeftModel.from_pretrained(base, "hikewa/dialectic-qwen3-8b-lora")
tokenizer = AutoTokenizer.from_pretrained("hikewa/dialectic-qwen3-8b-lora", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You reason carefully through problems by considering competing perspectives before reaching a conclusion."},
    {"role": "user", "content": "Should we prioritize economic growth or environmental protection?"},
]
# Render the chat template to a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Model Family
| Model | Base | Data | Rubric Avg | Link |
|---|---|---|---|---|
| 0.5B | Qwen2.5-0.5B | 205 | n/a | dialectic-qwen2.5-0.5b-lora |
| 1.5B | Qwen2.5-1.5B | 205 | n/a | dialectic-qwen2.5-1.5b-lora |
| 4B v1 | Qwen3-4B | 205 | 6.2 | dialectic-qwen3-4b-lora |
| 4B v2 | Qwen3-4B | 408 | 6.4 | dialectic-qwen3-4b-v2-lora |
| 8B | Qwen3-8B | 408 | 6.6 | this model |
| 4B v3 | Qwen3-4B | 507 | 9.8 | dialectic-qwen3-4b-v3-lora |
Dataset
Trained on 408 examples drawn from a 510-trace corpus generated with Claude Sonnet and filtered by a quality pipeline.
The full dataset including 99 additional domain-diverse traces is available at hikewa/dialectic-reasoning-traces.
Demo
Try it in the browser: hikewa/dialectic-reasoning
Rubric Evaluation
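Responses are scored by Claude Haiku on a 7-dimension rubric over 14 held-out prompts: 5 positive dimensions and 2 penalty dimensions, each scored from 0 to 2. Total = sum of positive scores - sum of penalty scores (range: -4 to 10). The table reports the mean score per dimension.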
| Dimension | Base | Fine-tuned | Delta |
|---|---|---|---|
| Conditional Commitment | 0.14 | 1.86 | +1.71 |
| Actionability | 0.07 | 1.64 | +1.57 |
| Resolution Depth | 0.07 | 1.50 | +1.43 |
| Tradeoff Specificity | 1.07 | 1.71 | +0.64 |
| Crux Clarity | 1.00 | 1.57 | +0.57 |
| Generic Hedge (penalty) | 2.00 | 1.43 | -0.57 |
| One-Sided Collapse (penalty) | 0.29 | 0.29 | 0.00 |
| Aggregate Total | 0.1 | 6.6 | +6.5 |
Verdict distribution: Base = 0 strong, 11 weak, 3 bad. Fine-tuned = 8 strong, 2 mixed, 4 weak.
Full rubric report: eval/rubric_comparison_report.md
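The aggregate row can be checked directly from the per-dimension means. The sketch below copies the values from the table and applies the stated formula (positive dimensions minus penalty dimensions); the variable and function names are illustrative only.

```python
# Mean score per dimension as (base, fine_tuned), copied from the table above.
SCORES = {
    "Conditional Commitment": (0.14, 1.86),
    "Actionability": (0.07, 1.64),
    "Resolution Depth": (0.07, 1.50),
    "Tradeoff Specificity": (1.07, 1.71),
    "Crux Clarity": (1.00, 1.57),
    "Generic Hedge": (2.00, 1.43),
    "One-Sided Collapse": (0.29, 0.29),
}
PENALTY = {"Generic Hedge", "One-Sided Collapse"}


def rubric_total(column: int) -> float:
    """Sum the positive dimensions and subtract the penalty dimensions."""
    positive = sum(v[column] for d, v in SCORES.items() if d not in PENALTY)
    penalty = sum(v[column] for d, v in SCORES.items() if d in PENALTY)
    return positive - penalty


base_total = rubric_total(0)
tuned_total = rubric_total(1)

print(round(base_total, 1))                # 0.1
print(round(tuned_total, 1))               # 6.6
print(round(tuned_total - base_total, 1))  # 6.5
```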
Surface-Level Evaluation
| Signal | Base | Fine-tuned | Delta |
|---|---|---|---|
| Avoids List Format | 21% | 100% | +79 pp |
| Conditional Reasoning | 36% | 29% | -7 pp |
| Specific Claims | 100% | 100% | 0 pp |
Fine-tuning eliminated list-format reasoning (pros/cons) in favor of integrated prose across all 14 prompts.
Full report: eval/comparison_report.md
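This card does not describe how the surface signals are detected. As an illustration only, a signal like "Avoids List Format" could be approximated with a line-based heuristic such as the one below. The regex, the `min_items` threshold, and both function names are hypothetical and are not taken from the evaluation code.

```python
import re

# Hypothetical heuristic: a line counts as a list item when it starts with
# a bullet character or a numbered-list marker followed by whitespace.
LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def avoids_list_format(response: str, min_items: int = 2) -> bool:
    """Return True when the response has fewer than `min_items` list lines."""
    items = sum(1 for line in response.splitlines() if LIST_ITEM.match(line))
    return items < min_items


def pass_rate(responses: list[str]) -> float:
    """Fraction of responses that avoid list formatting."""
    return sum(avoids_list_format(r) for r in responses) / len(responses)


listy = "Pros:\n- more jobs\n- higher wages\nCons:\n- more emissions"
prose = "Growth funds protection, but only if emissions are priced first."
print(pass_rate([listy, prose]))
```

A threshold of two list lines is an arbitrary choice here; a single stray bullet inside otherwise integrated prose would not count as list format.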
Limitations
- LoRA adapter only; requires the base Qwen3-8B model
- Trained on synthetic data (Claude-generated traces)
- 408 training examples; the 4B v3 trained on 507 domain-diverse traces outperforms this model
- Needs ~16GB RAM for inference with 16-bit weights; the float32 load shown in Usage needs about 33 GB for the 8.2B weights alone
- Residual generic hedging (1.43/2.0); addressed in v3 by multi-model training data
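A back-of-the-envelope check of the memory requirement, counting model weights only, is sketched below. It assumes the 8.2B parameter count from the training table and standard element sizes per dtype; activations, the KV cache, and the 43.6M adapter parameters are ignored, so real usage will be somewhat higher.

```python
# Bytes used by one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

NUM_PARAMS = 8.2e9  # total parameters, from the training details table


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
# float32: 32.8 GB
# float16: 16.4 GB
```

By this estimate, a figure near 16 GB corresponds to 16-bit weights, while a float32 load needs roughly twice that.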