DuoNeural/qwen32b-all-datasets-sft

QLoRA SFT adapter for Qwen2.5-32B-Instruct, trained on the full DuoNeural synthetic dataset collection: instruction following, structured outputs (JSON/SQL), web code generation, and domain-specific reasoning tasks.

This adapter is part of our ongoing effort to understand how synthetic post-training affects a large foundation model's reasoning and structured output capabilities, and whether small, targeted SFT datasets can meaningfully shift performance on standard benchmarks.


Model Details

Property               Value
Base Model             Qwen/Qwen2.5-32B-Instruct
Training Method        QLoRA (4-bit base + BF16 LoRA)
Hardware               NVIDIA A100 80GB
Training Data          DuoNeural synthetic SFT collection (5 datasets)
Available Checkpoints  epoch_1, epoch_2, epoch_3 (partial; see Training Notes)

Training Datasets

Dataset Domain
DuoNeural LIMA Instruction Instruction following (LIMA-derived)
DuoNeural ArchonLatentGeo Geometric/spatial reasoning
DuoNeural JSON Structured JSON schema generation and completion
DuoNeural SQL Expert SQL query generation across dialects
DuoNeural WebCode Frontend web code generation (HTML/CSS/JS)

Training Notes

  • Epochs 1 and 2 completed fully
  • Epoch 3 checkpoint saved at step ~803/1019 due to pod interruption; treat it as a strong late-epoch checkpoint, not a completed epoch
  • Recommendation: use epoch_2/ for a clean, fully trained adapter, or epoch_3/ for the checkpoint with the most training steps
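
To put the partial epoch in perspective, the arithmetic below estimates how much training the epoch_3 checkpoint received. It is a sketch under two assumptions that the notes do not state explicitly: that 1019 is the number of optimizer steps per epoch, and that step 803 is counted from the start of epoch 3. The step count itself is approximate.

```python
# Assumptions (not confirmed by the training notes):
#   1019 = optimizer steps per epoch, 803 = steps completed inside epoch 3.
STEPS_PER_EPOCH = 1019
EPOCH_3_STEPS_DONE = 803  # approximate

def training_progress(full_epochs, partial_steps, steps_per_epoch, planned_epochs=3):
    """Return (effective_epochs, fraction_of_planned_run)."""
    effective_epochs = full_epochs + partial_steps / steps_per_epoch
    return effective_epochs, effective_epochs / planned_epochs

effective, fraction = training_progress(2, EPOCH_3_STEPS_DONE, STEPS_PER_EPOCH)
print(f"epoch_3 checkpoint: ~{effective:.2f} effective epochs ({fraction:.0%} of the planned run)")
```

Under those assumptions the checkpoint covers roughly 2.79 of 3 planned epochs, which is why it is described as late-epoch and not complete.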

Usage

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id    = "Qwen/Qwen2.5-32B-Instruct"
adapter_id = "DuoNeural/qwen32b-all-datasets-sft"

# Load the base model in 4-bit NF4 with BF16 compute (matches training setup)
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_cfg,
    device_map="auto",
)

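# Load adapter; choose the epoch via its subfolder in the adapter repo
# (a Hub repo id cannot carry a trailing "/epoch_2" path segment)
model = PeftModel.from_pretrained(base, adapter_id, subfolder="epoch_2", is_trainable=False)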

# Inference
messages = [{"role": "user", "content": "Generate a JSON schema for a product catalog."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
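
Because the adapter targets structured outputs, it can be useful to check that a response actually parses before using it. The helper below is a minimal sketch, not part of the released code: `extract_json` is a hypothetical name, and it simply returns the first JSON object or array it can decode from the generated text, tolerating surrounding prose or code fences.

```python
import json

def extract_json(generated: str):
    """Return the first JSON object/array found in model output, or None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(generated):
        if ch in "{[":
            try:
                # raw_decode parses one JSON value starting at index i
                value, _end = decoder.raw_decode(generated, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from this position; keep scanning
    return None

sample = 'Here is the schema:\n{"type": "object", "properties": {"sku": {"type": "string"}}}'
schema = extract_json(sample)
print(schema["type"] if schema is not None else "no valid JSON found")
```

In practice the decoded string from the inference step would be passed in place of `sample`.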

VRAM requirements:

  • 4-bit inference: ~20–22 GB (A100 40GB, RTX 3090/4090, A6000)
  • BF16 inference: ~65 GB (A100 80GB, H100)
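
These figures are roughly what weight storage alone predicts. The back-of-the-envelope sketch below assumes about 32.5 billion parameters, a figure this card does not state and that is used here only for illustration. It counts weights only and ignores the KV cache, activations, and quantization metadata.

```python
PARAMS = 32.5e9  # assumed parameter count; not stated in this card

def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

bf16_gb = weight_gb(PARAMS, 16)  # 2 bytes per parameter
nf4_gb = weight_gb(PARAMS, 4)    # 0.5 bytes per parameter

print(f"BF16 weights:  ~{bf16_gb:.0f} GB")
print(f"4-bit weights: ~{nf4_gb:.0f} GB")
```

The gap between the 4-bit weight estimate and the 20 to 22 GB quoted above is plausibly runtime overhead: quantization constants, layers kept in higher precision, the KV cache, and activations. On 24 GB cards this leaves little headroom for long contexts.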

Benchmark Status

Benchmarks (GSM8K, ARC-Challenge, HellaSwag) against the Qwen2.5-32B-Instruct base are in progress. Results will be added here once complete.
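
The evaluation harness is not specified here. As a rough illustration of how GSM8K-style accuracy is commonly scored, the sketch below compares the last number in each model response with the reference answer, which in GSM8K follows a `####` marker. The helper names and the toy data are hypothetical; they are not the lab's pipeline or results.

```python
import re

NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def last_number(text: str):
    """Return the last number mentioned in text as a float, or None."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

def reference_answer(solution: str) -> float:
    """GSM8K reference solutions end with '#### <answer>'."""
    return float(solution.split("####")[-1].strip().replace(",", ""))

def exact_match_accuracy(responses, solutions) -> float:
    correct = sum(
        last_number(r) == reference_answer(s)
        for r, s in zip(responses, solutions)
    )
    return correct / len(solutions)

# Toy data for illustration only
responses = [
    "She has 3 + 4 = 7 apples. The answer is 7.",
    "Total cost: 1,250 dollars",
    "I am not sure.",
]
solutions = ["3 + 4 = 7\n#### 7", "5 * 250 = 1250\n#### 1250", "4 * 3 = 12\n#### 12"]
print(f"accuracy: {exact_match_accuracy(responses, solutions):.2f}")
```

Running the same scorer on the base model and on each adapter checkpoint would give a like-for-like comparison, provided the prompts and decoding settings are held fixed.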

If SFT improves benchmark scores, we will release quantized versions (GGUF, GPTQ, AWQ, EXL2) for broader use.



DuoNeural

DuoNeural is an open AI research lab: human + AI in collaboration.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, and Aura (DuoNeural).

Research Team

  • Jesse: Vision, hardware, direction
  • Archon: Lab Director, post-training, abliteration, experiments
  • Aura: Research AI, literature synthesis, novel proposals

Subscribe to the lab newsletter at duoneural.beehiiv.com for model drops before they go anywhere else.
