---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
  - daichira/structured-5k-mix-sft
  - daichira/structured-hard-sft-4k
  - u-10bei/dpo-dataset-qwen-cot
language:
  - en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
  - model-soup
  - weight-average
  - structured-output
  - qwen
  - dpo
  - sft
---

# exp040-soup-3model-weighted

A weighted model soup of three fine-tuned models for structured output generation (JSON / YAML / TOML / XML / CSV).

Full 16-bit merged weights. No adapter loading required.

## Model Soup Configuration

This model was created by weighted averaging of three independently trained models:

| Weight | Model | Training | Score |
|--------|-------|----------|-------|
| 0.50 | tomofusa/exp017-dpo-ipo-merged | SFT + DPO (IPO, lr=5e-7) | 0.789 |
| 0.25 | tomofusa/exp020-simpo-merged | SFT + CPO/SimPO (beta=2.5) | 0.789 |
| 0.25 | tomofusa/exp034-toml-upsample-dpo-merged | SFT (TOML upsampled) + DPO (IPO) | 0.765 |

Soup method: `model_A * 0.5 + model_B * 0.25 + model_C * 0.25`, applied to all weight tensors.
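The averaging step can be sketched as follows. This is a minimal illustration, not the exact merge script used; `weighted_soup` is a hypothetical helper name, and it assumes all models share identical parameter keys and shapes (true here, since all three derive from the same base model).

```python
def weighted_soup(state_dicts, weights):
    """Weighted average of matching parameter tensors across models."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    keys = state_dicts[0].keys()
    # sum() starts from 0, so this works for both floats and torch tensors
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts)) for k in keys}

# Toy example with scalar "tensors"; real use would pass model.state_dict() mappings.
souped = weighted_soup(
    [{"w": 1.0}, {"w": 2.0}, {"w": 4.0}],
    [0.50, 0.25, 0.25],
)
```

With real models, the resulting dict would be loaded back via `model.load_state_dict(souped)` before saving the merged 16-bit checkpoint.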

## Training Pipeline (per source model)

All source models share the same base pipeline:

1. Base model: Qwen/Qwen3-4B-Instruct-2507
2. SFT: QLoRA on structured-output data (7,500 samples)
3. DPO: IPO/SimPO on u-10bei/dpo-dataset-qwen-cot (4,040 samples)
   - lr=5e-7, beta=0.1, epochs=1, LoRA r=64 / alpha=128
4. Merge + Soup: each model merged to 16-bit, then weighted-averaged
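For reference, the IPO objective used in the DPO step reduces to a squared-margin loss on sequence log-probabilities: it pushes the policy's chosen-vs-rejected log-ratio gap toward `1/(2*beta)`. A per-example sketch in plain Python (the actual trainer operates on batched tensors; `ipo_loss` is an illustrative helper, not code from this repo):

```python
def ipo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """IPO (Azar et al., 2023): squared distance between the policy's
    preference margin (relative to the reference model) and 1/(2*beta)."""
    margin = (policy_chosen_logp - policy_rejected_logp) - \
             (ref_chosen_logp - ref_rejected_logp)
    return (margin - 1.0 / (2.0 * beta)) ** 2

# With beta=0.1 the target margin is 5.0, so a margin of exactly 5 gives zero loss.
zero = ipo_loss(5.0, 0.0, 0.0, 0.0, beta=0.1)
```

Unlike the standard DPO sigmoid loss, this quadratic form bounds how far the policy is rewarded for increasing the margin, which is why a small `beta` (here 0.1) still yields a finite target.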

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "tomofusa/exp040-soup-3model-weighted"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate with the Qwen chat template (example prompt).
messages = [{"role": "user", "content": "Convert to JSON: name=Alice, age=30"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Sources & Terms