powrin/qwen3-4b-structeval-sft-merged

Model Description

This model was produced by merging a LoRA adapter, supervised fine-tuned for structured output generation tasks (StructEval-T style), into its base model.

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Training method: Supervised Fine-Tuning (SFT) with QLoRA
  • LoRA adapter source: powrin/qwen3_4b_sft_v_4ds_ep2_lr36
  • Merge strategy: merge_and_unload (LoRA weights merged into the base model; see the sketch below)

This repository contains the fully merged model, ready for inference or further preference optimization (e.g. DPO).
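
For reference, a minimal sketch of how such a merge can be reproduced with transformers and peft is shown below. The repository IDs match this card; the dtype and device settings are illustrative assumptions, not a record of the exact merge invocation.

```python
# Minimal merge sketch (illustrative; dtype/device settings are assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype="auto",
    device_map="auto",
)

# Attach the SFT LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base, "powrin/qwen3_4b_sft_v_4ds_ep2_lr36")
merged = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
merged.save_pretrained("qwen3-4b-structeval-sft-merged")
tokenizer.save_pretrained("qwen3-4b-structeval-sft-merged")
```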


Training Data

The LoRA adapter was trained on a mixture of the following datasets (only officially permitted datasets were used; a loading sketch follows this section):

  • u-10bei/structured_data_with_cot_dataset_512_v4
  • u-10bei/structured_data_with_cot_dataset_512_v5
  • daichira/structured-3k-mix-sft (auxiliary)
  • daichira/structured-5k-mix-sft (auxiliary)

No evaluation or test data (e.g. public benchmarks) were used during training.
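
For illustration only, the mixture above could be assembled with the datasets library as sketched below; the split names, mixing ratios, and any preprocessing used in the actual training run are assumptions.

```python
# Illustrative only: the actual split names, mixing ratios, and
# preprocessing used during training are not documented on this card.
from datasets import load_dataset, concatenate_datasets

repos = [
    "u-10bei/structured_data_with_cot_dataset_512_v4",
    "u-10bei/structured_data_with_cot_dataset_512_v5",
    "daichira/structured-3k-mix-sft",
    "daichira/structured-5k-mix-sft",
]
parts = [load_dataset(repo, split="train") for repo in repos]
mixture = concatenate_datasets(parts).shuffle(seed=42)
print(mixture)
```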


Intended Use

  • Structured output generation
  • JSON / schema-constrained generation (see the example below)
  • Research on structured reasoning models
  • Further fine-tuning with preference optimization (DPO)
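
As a usage example, a basic schema-constrained generation call might look like the following; the prompt and generation settings are illustrative, not recommendations from the model authors.

```python
# Illustrative inference example; prompt and generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "powrin/qwen3-4b-structeval-sft-merged"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

messages = [{
    "role": "user",
    "content": "Return a JSON object with keys 'name' (string) and 'age' "
               "(integer) for: Alice is 30 years old.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```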

Limitations

  • This model is optimized for format correctness, not for free-form reasoning.
  • It may underperform on open-ended or creative tasks.
  • Additional tuning may be required for downstream tasks.

Citation

If you use this model in your research, please cite the base model (Qwen/Qwen3-4B-Instruct-2507) and the authors of the training datasets listed above.
