exp040-soup-3model-weighted

Weighted Model Soup of 3 fine-tuned models for structured output generation (JSON / YAML / TOML / XML / CSV).

Full 16-bit merged weights. No adapter loading required.

Model Soup Configuration

This model was created by weighted averaging of three independently fine-tuned models:

| Weight | Model | Training | Score |
|--------|-------|----------|-------|
| 0.50 | tomofusa/exp017-dpo-ipo-merged | SFT + DPO (IPO, lr=5e-7) | 0.789 |
| 0.25 | tomofusa/exp020-simpo-merged | SFT + CPO/SimPO (beta=2.5) | 0.789 |
| 0.25 | tomofusa/exp034-toml-upsample-dpo-merged | SFT (TOML upsampled) + DPO (IPO) | 0.765 |

Soup method: model_A * 0.5 + model_B * 0.25 + model_C * 0.25, applied element-wise to every weight tensor.
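The averaging step can be sketched as follows. This is a minimal illustration over PyTorch state dicts with toy single-tensor "models"; the card does not publish the actual merge script:

```python
import torch

def weighted_soup(state_dicts, weights):
    # Weighted-average matching tensors across several model state dicts.
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    soup = {}
    for name in state_dicts[0]:
        soup[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return soup

# Toy stand-ins for the three source checkpoints.
a = {"w": torch.tensor([1.0, 2.0])}
b = {"w": torch.tensor([3.0, 4.0])}
c = {"w": torch.tensor([5.0, 6.0])}
soup = weighted_soup([a, b, c], [0.5, 0.25, 0.25])
# soup["w"] = 0.5*[1,2] + 0.25*[3,4] + 0.25*[5,6] = [2.5, 3.5]
```

For the real models the same loop runs over every tensor in the safetensors checkpoints, which is valid because all three share the identical Qwen3-4B architecture.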

Training Pipeline (per source model)

All source models share the same base pipeline:

  1. Base model: Qwen/Qwen3-4B-Instruct-2507
  2. SFT: QLoRA on structured output data (7,500 samples)
  3. DPO: IPO/SimPO on u-10bei/dpo-dataset-qwen-cot (4,040 samples)
    • lr=5e-7, beta=0.1, epochs=1, LoRA r=64/alpha=128
  4. Merge + Soup: Each model merged to 16-bit, then weighted-averaged
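Step 4 relies on the fact that a LoRA update can be folded back into the dense weight: W' = W + (alpha/r)·B·A, which is what PEFT's merge_and_unload() does per layer. A toy sketch with illustrative dimensions (not the real r=64/alpha=128 at 4B scale):

```python
import torch

# Toy LoRA merge: fold a rank-r low-rank update into a dense weight.
d, r, alpha = 8, 2, 4.0          # hidden size, LoRA rank, LoRA alpha (illustrative)
W = torch.randn(d, d)            # frozen base weight
A = torch.randn(r, d) * 0.01     # LoRA down-projection
B = torch.randn(d, r) * 0.01     # LoRA up-projection

W_merged = W + (alpha / r) * (B @ A)   # merged 16-bit weight, no adapter needed

x = torch.randn(d)
# Adapter-style forward (base path + scaled low-rank path)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
# Merged forward gives the same result.
y_merged = W_merged @ x
assert torch.allclose(y_adapter, y_merged, atol=1e-5)
```

Because the merge is exact, souping the merged checkpoints is equivalent to averaging base-plus-adapter models directly.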

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "tomofusa/exp040-soup-3model-weighted"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Illustrative prompt; any structured-output instruction works.
messages = [{"role": "user", "content": "Convert to JSON: name=Alice, age=30"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Sources & Terms

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Preference data: u-10bei/dpo-dataset-qwen-cot
  • Source checkpoints: tomofusa/exp017-dpo-ipo-merged, tomofusa/exp020-simpo-merged, tomofusa/exp034-toml-upsample-dpo-merged

Model format: Safetensors, 4B parameters, F32 tensors.