Qwen3-4B-Instruct LoRA Fine-tuned Model

A LoRA adapter fine-tuned from Qwen3-4B-Instruct on structured-data and Chain-of-Thought reasoning datasets.

Model Details

Model Description

This model is a LoRA adapter produced by supervised fine-tuning (SFT) of Qwen3-4B-Instruct-2507 on multiple structured datasets, including Chain-of-Thought (CoT) reasoning data. Combining 4-bit NF4 quantization with LoRA keeps the fine-tuning memory-efficient.

  • Developed by: u-10bei
  • Model type: Causal Language Model (LoRA Adapter)
  • Language(s) (NLP): Japanese, English
  • License: Follows the base model's license
  • Finetuned from model: Qwen/Qwen3-4B-Instruct-2507

Model Sources

[More Information Needed]

Uses

Direct Use

This model can be used for:

  • Understanding and generating structured data
  • Complex problem-solving including Chain-of-Thought reasoning
  • Conversational tasks in Japanese and English

Recommended Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

# Inference
messages = [{"role": "user", "content": "Your question"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Bias, Risks, and Limitations

This model has the following known limitations:

  • Training Data Bias: The model is trained on specific structured datasets and may not generalize well to domains outside the training distribution
  • Language Limitations: While supporting Japanese and English, performance may vary between languages
  • Sequence Length: Limited to 512 tokens maximum, which may be insufficient for very long contexts
  • Quantization Effects: 4-bit quantization may introduce minor accuracy degradation compared to full-precision models
  • CoT Reasoning: Chain-of-Thought capabilities are limited to patterns seen in training data

Recommendations

Users should:

  • Validate model outputs for their specific use case before production deployment
  • Be aware of potential biases in structured data generation tasks
  • Consider the 512 token limit when designing prompts and applications
  • Test thoroughly with domain-specific data to ensure adequate performance
  • Monitor for hallucinations or incorrect reasoning in CoT tasks

How to Get Started with the Model

Configuration via Environment Variables

The training script (train.py) can be configured using the following environment variables:

Required Settings

  • SM_MODEL_DIR: Model output directory (default: /opt/ml/model)
  • SM_HPS: Hyperparameters JSON string

MLflow Settings

  • MLFLOW_TRACKING_URI: MLflow tracking server URI
  • MLFLOW_EXPERIMENT_NAME: Experiment name (default: qwen3-sft-grpo)

Hyperparameters (JSON in SM_HPS)

{
  "base_model": "Qwen/Qwen3-4B-Instruct-2507",
  "dataset_id": "u-10bei/structured_data_with_cot_dataset_512_v2",
  "max_seq_len": "512",
  "seed": "3407",
  "lora_r": "64",
  "lora_alpha": "128",
  "sft_epochs": "1",
  "sft_batch_size": "2",
  "sft_lr": "1e-6",
  "grpo_epochs": "1",
  "grpo_batch_size": "1",
  "grpo_lr": "5e-7",
  "sft_val_ratio": "0.05",
  "upsample_enable": "false",
  "upsample_rules_json": ""
}
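Every value in SM_HPS arrives as a JSON string, so the training script must cast numeric fields before use. A minimal sketch of how such a script might merge the JSON over defaults (the helper name `load_hparams` and the subset of keys shown are illustrative, not taken from train.py):

```python
import json
import os

# Illustrative defaults mirroring a subset of the hyperparameters listed above
DEFAULTS = {
    "base_model": "Qwen/Qwen3-4B-Instruct-2507",
    "max_seq_len": "512",
    "seed": "3407",
    "lora_r": "64",
    "lora_alpha": "128",
    "sft_lr": "1e-6",
}

def load_hparams(env=None):
    """Merge the SM_HPS JSON over the defaults and cast numeric fields."""
    env = os.environ if env is None else env
    hps = {**DEFAULTS, **json.loads(env.get("SM_HPS", "{}"))}
    return {
        "base_model": hps["base_model"],
        "max_seq_len": int(hps["max_seq_len"]),
        "seed": int(hps["seed"]),
        "lora_r": int(hps["lora_r"]),
        "lora_alpha": int(hps["lora_alpha"]),
        "sft_lr": float(hps["sft_lr"]),
    }
```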

Running Training

# Set environment variables and run
export SM_MODEL_DIR="./output"
export SM_HPS='{"base_model":"Qwen/Qwen3-4B-Instruct-2507","sft_epochs":"1"}'
python train.py

Training Details

Training Data

The training data combines five structured datasets (including CoT reasoning):

  • u-10bei/structured_data_with_cot_dataset_512_v2
  • u-10bei/structured_data_with_cot_dataset_512_v5
  • u-10bei/structured_data_with_cot_dataset_512_v4
  • u-10bei/structured_data_with_cot_dataset_512
  • u-10bei/structured_data_with_cot_dataset_v2

Data preprocessing includes:

  • Conversion to OpenAI Chat format (messages: [{role, content}, ...])
  • Filtering out samples with empty Assistant responses
  • Using only samples ending with an Assistant turn
  • Train/Validation split (default 95:5)
  • Optional upsampling functionality
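The filtering rules above can be sketched as a simple predicate over Chat-format samples (the function name is illustrative, not from train.py):

```python
def is_valid_sample(messages):
    """Keep only conversations that end with a non-empty assistant turn."""
    if not messages or messages[-1]["role"] != "assistant":
        return False
    return bool(messages[-1]["content"].strip())

samples = [
    [{"role": "user", "content": "Q"}, {"role": "assistant", "content": "A"}],
    [{"role": "user", "content": "Q"}, {"role": "assistant", "content": ""}],  # empty response
    [{"role": "user", "content": "Q"}],  # ends with a user turn
]
kept = [s for s in samples if is_valid_sample(s)]  # only the first sample survives
```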

Training Procedure

Quantization Configuration

  • Quantization Method: 4-bit NF4 quantization (BitsAndBytes)
  • Compute Precision: float16 (optimized for T4 GPU)
  • Double Quantization: Enabled
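These settings correspond to a transformers BitsAndBytesConfig along the following lines (a sketch; the exact construction in train.py may differ):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 with double quantization and fp16 compute, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```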

LoRA Configuration

  • LoRA Rank (r): 64 (default)
  • LoRA Alpha: 128 (default)
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • LoRA Dropout: 0
  • Task Type: CAUSAL_LM
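The list above maps onto a PEFT LoraConfig roughly as follows (a sketch; any setting not listed above, such as `bias`, is an assumption):

```python
from peft import LoraConfig

# LoRA settings as documented above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```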

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Epochs: 1 (default)
  • Batch Size: 2 per device (default)
  • Gradient Accumulation Steps: 8
  • Learning Rate: 1e-6 (default)
  • LR Scheduler: Cosine
  • Warmup Ratio: 0.1
  • Weight Decay: 0.05
  • Max Sequence Length: 512 (default)
  • Optimizer: AdamW (Transformers standard)

Loss Calculation Method

  • Assistant-Only Loss: Only Assistant response parts are trained, User input parts are masked (-100)
  • Padding Mask: Padding parts are also excluded from training
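The masking described above can be sketched in isolation (token ids and helper names are made up for illustration):

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def mask_labels(input_ids, assistant_mask, pad_mask):
    """Copy input_ids to labels, masking user tokens and padding with -100."""
    return [
        tok if is_asst and not is_pad else IGNORE_INDEX
        for tok, is_asst, is_pad in zip(input_ids, assistant_mask, pad_mask)
    ]

labels = mask_labels(
    input_ids=[101, 7, 8, 9, 0],
    assistant_mask=[False, False, True, True, False],
    pad_mask=[False, False, False, False, True],
)
# Only the two assistant tokens (8 and 9) contribute to the loss
```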

Evaluation and Saving Settings

  • Evaluation Strategy: steps
  • Eval Steps: 50
  • Save Strategy: steps
  • Save Steps: 100
  • Save Total Limit: 2
  • Logging Steps: 10
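Taken together, the training hyperparameters and the evaluation/saving settings map onto a transformers TrainingArguments roughly as follows (a sketch; argument names vary slightly across transformers versions, e.g. `eval_strategy` vs. `evaluation_strategy`):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.05,
    fp16=True,
    eval_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,
    logging_steps=10,
)
```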

MLflow Integration

  • Automatically logs training parameters, metrics, and models
  • Experiment name: qwen3-sft-grpo (default)

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model uses a validation split (5% by default) from the combined training datasets for evaluation during training. No separate held-out test set is currently defined.

Factors

Evaluation considers:

  • Loss convergence across training steps
  • Performance on validation set samples
  • Assistant response generation quality

Metrics

  • Training Loss: Cross-entropy loss on Assistant-only tokens
  • Validation Loss: Evaluated every 50 steps to monitor overfitting
  • Perplexity: Derived from validation loss as a measure of predictive uncertainty
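Assuming the validation loss is a mean cross-entropy in nats (the card does not show the exact formula), perplexity is simply its exponential:

```python
import math

def perplexity(mean_ce_loss):
    """Perplexity = exp(mean cross-entropy loss in nats)."""
    return math.exp(mean_ce_loss)

# A loss of 0.0 corresponds to perplexity 1.0 (perfect prediction);
# a loss of ln(4) corresponds to perplexity 4.0.
```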

Results

Results vary based on hyperparameters and training duration. With default settings (1 epoch, lr=1e-6):

  • Training converges within the single epoch
  • Validation loss typically stabilizes after initial warmup phase
  • Model demonstrates improved structured data understanding compared to the base model

Summary

The fine-tuned model shows enhanced capabilities in:

  • Structured data generation and parsing
  • Chain-of-Thought reasoning patterns
  • Task-specific response formatting

Model Examination

The model architecture consists of:

  • Base Model: Qwen3-4B-Instruct-2507 with 4B parameters
  • LoRA Adapters: Low-rank matrices (rank 64) applied to attention and MLP layers
  • Quantization: 4-bit NF4 quantization reduces memory footprint while maintaining performance
  • Training Focus: Assistant-only loss ensures the model learns to generate appropriate responses without overfitting to user inputs

Key design decisions:

  • Using multiple related datasets improves generalization across structured data tasks
  • 512 token limit balances training efficiency with practical use cases
  • FP16 precision optimized for T4 GPU availability and cost-effectiveness

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications

Model Architecture and Objective

  • Base Architecture: Qwen3-4B-Instruct-2507
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Quantization: 4-bit NF4 with double quantization
  • Training Objective: Causal Language Modeling with Assistant-Only Loss

Compute Infrastructure

Hardware

  • GPU: NVIDIA T4 (recommended)
  • Precision: FP16 mixed precision training

Software

  • Framework: Transformers, PEFT, TRL
  • Quantization: BitsAndBytes
  • Experiment Tracking: MLflow
  • Key Dependencies:
    • transformers
    • peft
    • trl
    • bitsandbytes
    • mlflow
    • datasets
    • torch

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Framework versions

  • PEFT 0.18.1