Qwen3-4B-Instruct LoRA Fine-tuned Model
A LoRA adapter model fine-tuned on structured data and Chain-of-Thought reasoning datasets based on Qwen3-4B-Instruct.
Model Details
Model Description
This model is a LoRA adapter produced by supervised fine-tuning (SFT) of Qwen3-4B-Instruct-2507 on multiple structured datasets, including Chain-of-Thought (CoT) reasoning data. It achieves memory-efficient fine-tuning by combining 4-bit NF4 quantization with LoRA.
- Developed by: u-10bei
- Model type: Causal Language Model (LoRA Adapter)
- Language(s) (NLP): Japanese, English
- License: Follows the base model's license
- Finetuned from model: Qwen/Qwen3-4B-Instruct-2507
Model Sources
- Repository: [GitHub Repository URL]
- Base Model: Qwen/Qwen3-4B-Instruct-2507
Uses
Direct Use
This model can be used for:
- Understanding and generating structured data
- Complex problem-solving including Chain-of-Thought reasoning
- Conversational tasks in Japanese and English
Recommended Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507", device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

# Inference
messages = [{"role": "user", "content": "Your question"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping special tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Bias, Risks, and Limitations
This model has the following known limitations:
- Training Data Bias: The model is trained on specific structured datasets and may not generalize well to domains outside the training distribution
- Language Limitations: While supporting Japanese and English, performance may vary between languages
- Sequence Length: Limited to 512 tokens maximum, which may be insufficient for very long contexts
- Quantization Effects: 4-bit quantization may introduce minor accuracy degradation compared to full-precision models
- CoT Reasoning: Chain-of-Thought capabilities are limited to patterns seen in training data
Recommendations
Users should:
- Validate model outputs for their specific use case before production deployment
- Be aware of potential biases in structured data generation tasks
- Consider the 512 token limit when designing prompts and applications
- Test thoroughly with domain-specific data to ensure adequate performance
- Monitor for hallucinations or incorrect reasoning in CoT tasks
How to Get Started with the Model
Configuration via Environment Variables
The training script (train.py) can be configured using the following environment variables:
Required Settings
- SM_MODEL_DIR: Model output directory (default: /opt/ml/model)
- SM_HPS: Hyperparameters JSON string
MLflow Settings
- MLFLOW_TRACKING_URI: MLflow tracking server URI
- MLFLOW_EXPERIMENT_NAME: Experiment name (default: qwen3-sft-grpo)
Hyperparameters (JSON in SM_HPS)
{
  "base_model": "Qwen/Qwen3-4B-Instruct-2507",
  "dataset_id": "u-10bei/structured_data_with_cot_dataset_512_v2",
  "max_seq_len": "512",
  "seed": "3407",
  "lora_r": "64",
  "lora_alpha": "128",
  "sft_epochs": "1",
  "sft_batch_size": "2",
  "sft_lr": "1e-6",
  "grpo_epochs": "1",
  "grpo_batch_size": "1",
  "grpo_lr": "5e-7",
  "sft_val_ratio": "0.05",
  "upsample_enable": "false",
  "upsample_rules_json": ""
}
Running Training
# Set environment variables and run
export SM_MODEL_DIR="./output"
export SM_HPS='{"base_model":"Qwen/Qwen3-4B-Instruct-2507","sft_epochs":"1"}'
python train.py
Training Details
Training Data
Combined 5 structured datasets (including CoT reasoning):
- u-10bei/structured_data_with_cot_dataset_512_v2
- u-10bei/structured_data_with_cot_dataset_512_v5
- u-10bei/structured_data_with_cot_dataset_512_v4
- u-10bei/structured_data_with_cot_dataset_512
- u-10bei/structured_data_with_cot_dataset_v2
Data preprocessing includes:
- Conversion to OpenAI Chat format (messages: [{role, content}, ...])
- Filtering out samples with empty Assistant responses
- Using only samples that end with an Assistant turn
- Train/Validation split (default 95:5)
- Optional upsampling functionality
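The filtering and splitting steps above can be sketched as a small, self-contained function; the function name and sample layout are assumptions for illustration, not the pipeline's actual code:

```python
import random

def preprocess(samples, val_ratio=0.05, seed=3407):
    """Keep only samples whose conversation ends with a non-empty
    assistant response, then split into train/validation sets."""
    kept = [
        s for s in samples
        if s["messages"]
        and s["messages"][-1]["role"] == "assistant"
        and s["messages"][-1]["content"].strip()
    ]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_val = int(len(kept) * val_ratio)
    return kept[n_val:], kept[:n_val]
```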
Training Procedure
Quantization Configuration
- Quantization Method: 4-bit NF4 quantization (BitsAndBytes)
- Compute Precision: float16 (optimized for T4 GPU)
- Double Quantization: Enabled
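The settings above correspond to a BitsAndBytesConfig along these lines (a sketch based on the stated configuration, not the verbatim code from train.py):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 with double quantization; float16 compute targets T4-class GPUs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```

This config would be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`.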
LoRA Configuration
- LoRA Rank (r): 64 (default)
- LoRA Alpha: 128 (default)
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA Dropout: 0
- Task Type: CAUSAL_LM
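Expressed as a PEFT LoraConfig, the settings above look roughly like this (a sketch of the stated defaults, not the script's verbatim code):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```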
Training Hyperparameters
- Training regime: fp16 mixed precision
- Epochs: 1 (default)
- Batch Size: 2 per device (default)
- Gradient Accumulation Steps: 8
- Learning Rate: 1e-6 (default)
- LR Scheduler: Cosine
- Warmup Ratio: 0.1
- Weight Decay: 0.05
- Max Sequence Length: 512 (default)
- Optimizer: AdamW (Transformers standard)
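Taken together, these hyperparameters map onto a TrainingArguments object roughly as follows; this is a sketch, and `output_dir` here stands in for the SM_MODEL_DIR setting:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size of 16
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.05,
    fp16=True,
)
```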
Loss Calculation Method
- Assistant-Only Loss: Only Assistant response parts are trained, User input parts are masked (-100)
- Padding Mask: Padding parts are also excluded from training
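The masking scheme can be illustrated with a small, self-contained sketch; the function name and the boolean mask representation are assumptions for illustration:

```python
IGNORE_INDEX = -100  # Hugging Face convention: labels at this value are excluded from the loss

def mask_labels(input_ids, assistant_mask, pad_token_id):
    """Copy input_ids to labels, masking everything that is not part of
    an assistant response: user turns and padding both become -100."""
    return [
        tok if is_assistant and tok != pad_token_id else IGNORE_INDEX
        for tok, is_assistant in zip(input_ids, assistant_mask)
    ]
```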
Evaluation and Saving Settings
- Evaluation Strategy: steps
- Eval Steps: 50
- Save Strategy: steps
- Save Steps: 100
- Save Total Limit: 2
- Logging Steps: 10
MLflow Integration
- Automatically logs training parameters, metrics, and models
- Experiment name: qwen3-sft-grpo (default)
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model uses a validation split (5% by default) from the combined training datasets for evaluation during training. No separate held-out test set is currently defined.
Factors
Evaluation considers:
- Loss convergence across training steps
- Performance on validation set samples
- Assistant response generation quality
Metrics
- Training Loss: Cross-entropy loss on Assistant-only tokens
- Validation Loss: Evaluated every 50 steps to monitor overfitting
- Perplexity: Derived from validation loss (exp of the mean cross-entropy) as a measure of predictive uncertainty
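The relationship between validation loss and perplexity is a one-liner:

```python
import math

def perplexity(mean_cross_entropy_loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss."""
    return math.exp(mean_cross_entropy_loss)
```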
Results
Results vary based on hyperparameters and training duration. With default settings (1 epoch, lr=1e-6):
- Training converges within the single epoch
- Validation loss typically stabilizes after initial warmup phase
- Model demonstrates improved structured data understanding compared to base model
Summary
The fine-tuned model shows enhanced capabilities in:
- Structured data generation and parsing
- Chain-of-Thought reasoning patterns
- Task-specific response formatting
Model Examination
The model architecture consists of:
- Base Model: Qwen3-4B-Instruct-2507 with 4B parameters
- LoRA Adapters: Low-rank matrices (rank 64) applied to attention and MLP layers
- Quantization: 4-bit NF4 quantization reduces memory footprint while maintaining performance
- Training Focus: Assistant-only loss ensures the model learns to generate appropriate responses without overfitting to user inputs
Key design decisions:
- Using multiple related datasets improves generalization across structured data tasks
- 512 token limit balances training efficiency with practical use cases
- FP16 precision optimized for T4 GPU availability and cost-effectiveness
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications
Model Architecture and Objective
- Base Architecture: Qwen3-4B-Instruct-2507
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Quantization: 4-bit NF4 with double quantization
- Training Objective: Causal Language Modeling with Assistant-Only Loss
Compute Infrastructure
Hardware
- GPU: NVIDIA T4 (recommended)
- Precision: FP16 mixed precision training
Software
- Framework: Transformers, PEFT, TRL
- Quantization: BitsAndBytes
- Experiment Tracking: MLflow
- Key Dependencies:
- transformers
- peft
- trl
- bitsandbytes
- mlflow
- datasets
- torch
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]
Framework versions
- PEFT 0.18.1