Qwen3-4B-Instruct LoRA Fine-tuned Model

A LoRA adapter fine-tuned from Qwen3-4B-Instruct on structured-data and Chain-of-Thought reasoning datasets.

Model Details

Model Description

This model is a LoRA adapter produced by supervised fine-tuning (SFT) of Qwen3-4B-Instruct-2507 on multiple structured datasets, including Chain-of-Thought (CoT) reasoning data. Combining 4-bit NF4 quantization with LoRA keeps the fine-tuning memory-efficient.

  • Developed by: u-10bei
  • Model type: Causal Language Model (LoRA Adapter)
  • Language(s) (NLP): Japanese, English
  • License: Follows the base model's license
  • Finetuned from model: Qwen/Qwen3-4B-Instruct-2507

Model Sources

[More Information Needed]

Uses

Direct Use

This model can be used for:

  • Understanding and generating structured data
  • Complex problem-solving including Chain-of-Thought reasoning
  • Conversational tasks in Japanese and English

Recommended Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

# Inference
messages = [{"role": "user", "content": "Your question"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Bias, Risks, and Limitations

This model has the following known limitations:

  • Training Data Bias: The model is trained on specific structured datasets and may not generalize well to domains outside the training distribution
  • Language Limitations: While supporting Japanese and English, performance may vary between languages
  • Sequence Length: Limited to 512 tokens maximum, which may be insufficient for very long contexts
  • Quantization Effects: 4-bit quantization may introduce minor accuracy degradation compared to full-precision models
  • CoT Reasoning: Chain-of-Thought capabilities are limited to patterns seen in training data

Recommendations

Users should:

  • Validate model outputs for their specific use case before production deployment
  • Be aware of potential biases in structured data generation tasks
  • Consider the 512 token limit when designing prompts and applications
  • Test thoroughly with domain-specific data to ensure adequate performance
  • Monitor for hallucinations or incorrect reasoning in CoT tasks

How to Get Started with the Model

Configuration via Environment Variables

The training script (train.py) can be configured using the following environment variables:

Required Settings

  • SM_MODEL_DIR: Model output directory (default: /opt/ml/model)
  • SM_HPS: Hyperparameters JSON string

MLflow Settings

  • MLFLOW_TRACKING_URI: MLflow tracking server URI
  • MLFLOW_EXPERIMENT_NAME: Experiment name (default: qwen3-sft-grpo)

Hyperparameters (JSON in SM_HPS)

{
  "base_model": "Qwen/Qwen3-4B-Instruct-2507",
  "dataset_id": "u-10bei/structured_data_with_cot_dataset_512_v2",
  "max_seq_len": "512",
  "seed": "3407",
  "lora_r": "64",
  "lora_alpha": "128",
  "sft_epochs": "1",
  "sft_batch_size": "2",
  "sft_lr": "1e-6",
  "grpo_epochs": "1",
  "grpo_batch_size": "1",
  "grpo_lr": "5e-7",
  "sft_val_ratio": "0.05",
  "upsample_enable": "false",
  "upsample_rules_json": ""
}
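Every value in SM_HPS arrives as a JSON string, so the training script must cast numeric fields before use. A minimal sketch of how such a script might merge the JSON over defaults (the helper name `load_hparams` and the subset of keys shown are illustrative, not taken from train.py):

```python
import json
import os

# Illustrative defaults mirroring a subset of the hyperparameters listed above
DEFAULTS = {
    "base_model": "Qwen/Qwen3-4B-Instruct-2507",
    "max_seq_len": "512",
    "seed": "3407",
    "lora_r": "64",
    "lora_alpha": "128",
    "sft_lr": "1e-6",
}

def load_hparams(env=None):
    """Merge the SM_HPS JSON over the defaults and cast numeric fields."""
    env = os.environ if env is None else env
    hps = {**DEFAULTS, **json.loads(env.get("SM_HPS", "{}"))}
    return {
        "base_model": hps["base_model"],
        "max_seq_len": int(hps["max_seq_len"]),
        "seed": int(hps["seed"]),
        "lora_r": int(hps["lora_r"]),
        "lora_alpha": int(hps["lora_alpha"]),
        "sft_lr": float(hps["sft_lr"]),
    }
```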

Running Training

# Set environment variables and run
export SM_MODEL_DIR="./output"
export SM_HPS='{"base_model":"Qwen/Qwen3-4B-Instruct-2507","sft_epochs":"1"}'
python train.py

Training Details

Training Data

The training data combines five structured datasets (including CoT reasoning):

  • u-10bei/structured_data_with_cot_dataset_512_v2
  • u-10bei/structured_data_with_cot_dataset_512_v5
  • u-10bei/structured_data_with_cot_dataset_512_v4
  • u-10bei/structured_data_with_cot_dataset_512
  • u-10bei/structured_data_with_cot_dataset_v2

Data preprocessing includes:

  • Conversion to OpenAI Chat format (messages: [{role, content}, ...])
  • Filtering out samples with empty Assistant responses
  • Using only samples ending with an Assistant turn
  • Train/Validation split (default 95:5)
  • Optional upsampling functionality
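The filtering rules above can be sketched as a simple predicate over Chat-format samples (the function name is illustrative, not from train.py):

```python
def is_valid_sample(messages):
    """Keep only conversations that end with a non-empty assistant turn."""
    if not messages or messages[-1]["role"] != "assistant":
        return False
    return bool(messages[-1]["content"].strip())

samples = [
    [{"role": "user", "content": "Q"}, {"role": "assistant", "content": "A"}],
    [{"role": "user", "content": "Q"}, {"role": "assistant", "content": ""}],  # empty response
    [{"role": "user", "content": "Q"}],  # ends with a user turn
]
kept = [s for s in samples if is_valid_sample(s)]  # only the first sample survives
```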

Training Procedure

Quantization Configuration

  • Quantization Method: 4-bit NF4 quantization (BitsAndBytes)
  • Compute Precision: float16 (optimized for T4 GPU)
  • Double Quantization: Enabled
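These settings correspond to a transformers BitsAndBytesConfig along the following lines (a sketch; the exact construction in train.py may differ):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 with double quantization and fp16 compute, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```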

LoRA Configuration

  • LoRA Rank (r): 64 (default)
  • LoRA Alpha: 128 (default)
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • LoRA Dropout: 0
  • Task Type: CAUSAL_LM
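The list above maps onto a PEFT LoraConfig roughly as follows (a sketch; any setting not listed above, such as `bias`, is an assumption):

```python
from peft import LoraConfig

# LoRA settings as documented above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```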

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Epochs: 1 (default)
  • Batch Size: 2 per device (default)
  • Gradient Accumulation Steps: 8
  • Learning Rate: 1e-6 (default)
  • LR Scheduler: Cosine
  • Warmup Ratio: 0.1
  • Weight Decay: 0.05
  • Max Sequence Length: 512 (default)
  • Optimizer: AdamW (Transformers standard)

Loss Calculation Method

  • Assistant-Only Loss: Only Assistant response parts are trained, User input parts are masked (-100)
  • Padding Mask: Padding parts are also excluded from training
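The masking described above can be sketched in isolation (token ids and helper names are made up for illustration):

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def mask_labels(input_ids, assistant_mask, pad_mask):
    """Copy input_ids to labels, masking user tokens and padding with -100."""
    return [
        tok if is_asst and not is_pad else IGNORE_INDEX
        for tok, is_asst, is_pad in zip(input_ids, assistant_mask, pad_mask)
    ]

labels = mask_labels(
    input_ids=[101, 7, 8, 9, 0],
    assistant_mask=[False, False, True, True, False],
    pad_mask=[False, False, False, False, True],
)
# Only the two assistant tokens (8 and 9) contribute to the loss
```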

Evaluation and Saving Settings

  • Evaluation Strategy: steps
  • Eval Steps: 50
  • Save Strategy: steps
  • Save Steps: 100
  • Save Total Limit: 2
  • Logging Steps: 10
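Taken together, the training hyperparameters and the evaluation/saving settings map onto a transformers TrainingArguments roughly as follows (a sketch; argument names vary slightly across transformers versions, e.g. `eval_strategy` vs. `evaluation_strategy`):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.05,
    fp16=True,
    eval_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,
    logging_steps=10,
)
```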

MLflow Integration

  • Automatically logs training parameters, metrics, and models
  • Experiment name: qwen3-sft-grpo (default)

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model uses a validation split (5% by default) from the combined training datasets for evaluation during training. No separate held-out test set is currently defined.

Factors

Evaluation considers:

  • Loss convergence across training steps
  • Performance on validation set samples
  • Assistant response generation quality

Metrics

  • Training Loss: Cross-entropy loss on Assistant-only tokens
  • Validation Loss: Evaluated every 50 steps to monitor overfitting
  • Perplexity: Derived from validation loss as a measure of predictive uncertainty
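Assuming the validation loss is a mean cross-entropy in nats (the card does not show the exact formula), perplexity is simply its exponential:

```python
import math

def perplexity(mean_ce_loss):
    """Perplexity = exp(mean cross-entropy loss in nats)."""
    return math.exp(mean_ce_loss)

# A loss of 0.0 corresponds to perplexity 1.0 (perfect prediction);
# a loss of ln(4) corresponds to perplexity 4.0.
```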

Results

Results vary based on hyperparameters and training duration. With default settings (1 epoch, lr=1e-6):

  • Training converges within the single epoch
  • Validation loss typically stabilizes after initial warmup phase
  • Model demonstrates improved structured data understanding compared to the base model

Summary

The fine-tuned model shows enhanced capabilities in:

  • Structured data generation and parsing
  • Chain-of-Thought reasoning patterns
  • Task-specific response formatting

Model Examination

The model architecture consists of:

  • Base Model: Qwen3-4B-Instruct-2507 with 4B parameters
  • LoRA Adapters: Low-rank matrices (rank 64) applied to attention and MLP layers
  • Quantization: 4-bit NF4 quantization reduces memory footprint while maintaining performance
  • Training Focus: Assistant-only loss ensures the model learns to generate appropriate responses without overfitting to user inputs

Key design decisions:

  • Using multiple related datasets improves generalization across structured data tasks
  • 512 token limit balances training efficiency with practical use cases
  • FP16 precision optimized for T4 GPU availability and cost-effectiveness

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications

Model Architecture and Objective

  • Base Architecture: Qwen3-4B-Instruct-2507
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Quantization: 4-bit NF4 with double quantization
  • Training Objective: Causal Language Modeling with Assistant-Only Loss

Compute Infrastructure

Hardware

  • GPU: NVIDIA T4 (recommended)
  • Precision: FP16 mixed precision training

Software

  • Framework: Transformers, PEFT, TRL
  • Quantization: BitsAndBytes
  • Experiment Tracking: MLflow
  • Key Dependencies:
    • transformers
    • peft
    • trl
    • bitsandbytes
    • mlflow
    • datasets
    • torch

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Framework versions

  • PEFT 0.18.1