---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: peft
pipeline_tag: text-generation
tags:
  - base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
  - lora
  - transformers
  - sft
  - japanese
language:
  - ja
  - en
license: apache-2.0
---

# Model Card for ryomac/lora_sft_full_ep2

## Model Details

- Model type: LoRA adapter (PEFT)
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Goal: improved structured-output quality on JSON-style competition prompts
- Author: ryomac

## Intended Use

- This repository contains adapter weights only.
- Use together with the base model for inference or continued fine-tuning.
- Suitable for controlled structured-generation tasks.

## Out-of-Scope Use

- High-stakes medical, legal, or financial decisions.
- Fully automated decision pipelines without human validation.

## Training Data

- Main SFT dataset: u-10bei/structured_data_with_cot_dataset_512_v2
- Additional local preprocessing and filtering were applied before training.
- The public evaluation set was used only as an evaluation/inference target, not for training.

## Training Procedure

- Method: supervised fine-tuning (SFT) with LoRA
- Run profile: full training variant (`full_ep2`)
- Precision: fp16 (Mac MPS environment)
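A run like this is typically configured through peft's `LoraConfig`. The sketch below is illustrative only: the actual rank, alpha, dropout, and target modules used for `full_ep2` are not published in this card.

```python
from peft import LoraConfig

# Hypothetical LoRA hyperparameters -- NOT the values used for this run;
# the card does not publish them.
lora_config = LoraConfig(
    r=16,                    # adapter rank
    lora_alpha=32,           # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

The config is then passed, together with the base model, to `peft.get_peft_model` (or to a trainer such as trl's `SFTTrainer`), so that only the adapter weights receive gradients.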

## Evaluation

- Evaluated with a local, competition-style inference and evaluation workflow.
- Performance varies with the prompt template and decoding parameters.
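Because results depend on decoding parameters, deterministic settings are usually safest for structured output. The values below are illustrative defaults, not the settings used in the evaluation:

```python
# Illustrative decoding settings for JSON-style structured output.
# Greedy decoding keeps the output format stable across runs.
gen_kwargs = {
    "max_new_tokens": 512,
    "do_sample": False,        # greedy decoding: deterministic output
    "repetition_penalty": 1.05,
}
```

These keyword arguments would be passed to `model.generate(...)` after loading the adapter as shown below in "How to Use".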

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "ryomac/lora_sft_full_ep2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Example: generate with the model's chat template
messages = [{"role": "user", "content": "Return the result as JSON."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Limitations

- JSON consistency can degrade on unseen prompt distributions.
- Output quality depends on decoding settings and prompt design.
- Task-specific validation is required before production use.
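One way to apply the task-specific validation mentioned above is to reject any generation that does not parse as JSON with the expected keys. The `required_keys` schema below is a hypothetical example, not part of this model's training setup:

```python
import json

def validate_structured_output(text, required_keys=("answer",)):
    """Return the parsed dict if `text` is valid JSON containing all
    `required_keys`; otherwise return None so the caller can retry or flag."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in required_keys):
        return None
    return data

print(validate_structured_output('{"answer": "42"}'))  # {'answer': '42'}
print(validate_structured_output('not json'))          # None
```

A retry loop around `model.generate` that re-samples until this check passes is a simple, model-agnostic guard for production pipelines.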

## Contact

For reproducibility details, see the training and inference scripts under `scripts/` in this project.