---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: peft
pipeline_tag: text-generation
tags:
  - base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
  - lora
  - transformers
  - sft
  - japanese
language:
  - ja
  - en
license: apache-2.0
---

# Model Card for ryomac/lora_sft_full_ep2

## Model Details

- Model type: LoRA adapter (PEFT)
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Goal: improved structured-output quality on JSON-style competition prompts
- Author: ryomac

## Intended Use

- This repository contains adapter weights only.
- Use together with the base model for inference or continued fine-tuning.
- Suitable for controlled structured-generation tasks.

## Out-of-Scope Use

- High-stakes medical, legal, or financial decisions.
- Fully automated decision pipelines without human validation.

## Training Data

- Main SFT dataset: u-10bei/structured_data_with_cot_dataset_512_v2
- Additional local preprocessing and filtering were applied before training.
- The public evaluation set was used only as an evaluation/inference target, not for training.

## Training Procedure

- Method: supervised fine-tuning (SFT) with LoRA
- Run profile: full training variant (`full_ep2`)
- Precision: fp16 (Mac MPS environment)
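A run like this is typically configured through peft's `LoraConfig`. The sketch below is illustrative only: the actual rank, alpha, dropout, and target modules used for `full_ep2` are not published in this card.

```python
from peft import LoraConfig

# Hypothetical LoRA hyperparameters -- NOT the values used for this run;
# the card does not publish them.
lora_config = LoraConfig(
    r=16,                    # adapter rank
    lora_alpha=32,           # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

The config is then passed, together with the base model, to `peft.get_peft_model` (or to a trainer such as trl's `SFTTrainer`), so that only the adapter weights receive gradients.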

## Evaluation

- Evaluated with a local, competition-style inference and evaluation workflow.
- Performance varies with the prompt template and decoding parameters.
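Because results depend on decoding parameters, deterministic settings are usually safest for structured output. The values below are illustrative defaults, not the settings used in the evaluation:

```python
# Illustrative decoding settings for JSON-style structured output.
# Greedy decoding keeps the output format stable across runs.
gen_kwargs = {
    "max_new_tokens": 512,
    "do_sample": False,        # greedy decoding: deterministic output
    "repetition_penalty": 1.05,
}
```

These keyword arguments would be passed to `model.generate(...)` after loading the adapter as shown below in "How to Use".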

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "ryomac/lora_sft_full_ep2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Example: generate with the model's chat template
messages = [{"role": "user", "content": "Return the result as JSON."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Limitations

- JSON consistency can degrade on unseen prompt distributions.
- Output quality depends on decoding settings and prompt design.
- Task-specific validation is required before production use.
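One way to apply the task-specific validation mentioned above is to reject any generation that does not parse as JSON with the expected keys. The `required_keys` schema below is a hypothetical example, not part of this model's training setup:

```python
import json

def validate_structured_output(text, required_keys=("answer",)):
    """Return the parsed dict if `text` is valid JSON containing all
    `required_keys`; otherwise return None so the caller can retry or flag."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in required_keys):
        return None
    return data

print(validate_structured_output('{"answer": "42"}'))  # {'answer': '42'}
print(validate_structured_output('not json'))          # None
```

A retry loop around `model.generate` that re-samples until this check passes is a simple, model-agnostic guard for production pipelines.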

## Contact

For reproducibility details, see the training and inference scripts under `scripts/` in this project.