powrin/qwen3-4b-structeval-sft-merged

Model Description

This model was produced by merging a LoRA adapter, supervised fine-tuned for structured output generation tasks (StructEval-T style), into its base model.

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Training method: Supervised Fine-Tuning (SFT) with QLoRA
  • LoRA adapter source: powrin/qwen3_4b_sft_v_4ds_ep2_lr36
  • Merge strategy: merge_and_unload (LoRA weights merged into the base model; see the sketch below)

This repository contains the fully merged model, ready for inference or further preference optimization (e.g. DPO).
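
For reference, a minimal sketch of how such a merge can be reproduced with transformers and peft is shown below. The repository IDs match this card; the dtype and device settings are illustrative assumptions, not a record of the exact merge invocation.

```python
# Minimal merge sketch (illustrative; dtype/device settings are assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype="auto",
    device_map="auto",
)

# Attach the SFT LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base, "powrin/qwen3_4b_sft_v_4ds_ep2_lr36")
merged = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
merged.save_pretrained("qwen3-4b-structeval-sft-merged")
tokenizer.save_pretrained("qwen3-4b-structeval-sft-merged")
```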


Training Data

The LoRA adapter was trained on a mixture of the following datasets (only officially permitted datasets were used; a loading sketch follows this section):

  • u-10bei/structured_data_with_cot_dataset_512_v4
  • u-10bei/structured_data_with_cot_dataset_512_v5
  • daichira/structured-3k-mix-sft (auxiliary)
  • daichira/structured-5k-mix-sft (auxiliary)

No evaluation or test data (e.g. public benchmarks) were used during training.
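
For illustration only, the mixture above could be assembled with the datasets library as sketched below; the split names, mixing ratios, and any preprocessing used in the actual training run are assumptions.

```python
# Illustrative only: the actual split names, mixing ratios, and
# preprocessing used during training are not documented on this card.
from datasets import load_dataset, concatenate_datasets

repos = [
    "u-10bei/structured_data_with_cot_dataset_512_v4",
    "u-10bei/structured_data_with_cot_dataset_512_v5",
    "daichira/structured-3k-mix-sft",
    "daichira/structured-5k-mix-sft",
]
parts = [load_dataset(repo, split="train") for repo in repos]
mixture = concatenate_datasets(parts).shuffle(seed=42)
print(mixture)
```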


Intended Use

  • Structured output generation
  • JSON / schema-constrained generation (see the example below)
  • Research on structured reasoning models
  • Further fine-tuning with preference optimization (DPO)
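
As a usage example, a basic schema-constrained generation call might look like the following; the prompt and generation settings are illustrative, not recommendations from the model authors.

```python
# Illustrative inference example; prompt and generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "powrin/qwen3-4b-structeval-sft-merged"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

messages = [{
    "role": "user",
    "content": "Return a JSON object with keys 'name' (string) and 'age' "
               "(integer) for: Alice is 30 years old.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```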

Limitations

  • This model is optimized for format correctness, not for free-form reasoning.
  • It may underperform on open-ended or creative tasks.
  • Additional tuning may be required for downstream tasks.

Citation

If you use this model in your research, please cite the base model (Qwen/Qwen3-4B-Instruct-2507) and the authors of the training datasets listed above.
