MaterialsAnalyst-AI-7B Training Documentation
================================================
Model Training Details
---------------------
Base Model: Qwen2.5-7B-Instruct
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Infrastructure: Single NVIDIA A100 SXM4 GPU
Training Duration: Approximately 5.4 hours
Training Dataset: Custom curated dataset for materials analysis
Dataset Specifications
---------------------
Total Token Count: 6,292,692
Total Sample Count: 6,000
Average Tokens/Sample: 1048.78
Max Token Count: 1,289
Min Token Count: 922
Tokens Counted Using: tiktoken (cl100k_base encoding)
Dataset Creation: Generated using the DeepSeek-V3 API
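The per-sample average follows directly from the totals above; a quick sanity check in Python (all figures taken from this section):

```python
# Dataset totals as reported above.
total_tokens = 6_292_692
total_samples = 6_000

# Average tokens per sample, rounded to two decimals as in the table.
avg_tokens = round(total_tokens / total_samples, 2)
print(avg_tokens)  # 1048.78

# Note: the per-sample counts in the table were produced with tiktoken's
# cl100k_base encoding, i.e. roughly:
#   len(tiktoken.get_encoding("cl100k_base").encode(text))
```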
Training Configuration
---------------------
LoRA Parameters:
- Rank: 32
- Alpha: 64
- Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head
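As a sketch, the adapter settings above map onto Hugging Face PEFT's LoraConfig roughly as follows. This assumes PEFT was the training stack, which the document does not state:

```python
from peft import LoraConfig

# LoRA settings as listed above; task_type is an assumption for a causal LM.
lora_config = LoraConfig(
    r=32,                 # Rank
    lora_alpha=64,        # Alpha
    lora_dropout=0.1,     # Dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "lm_head",
    ],
    task_type="CAUSAL_LM",
)
```

Note that alpha = 2 x rank is a common heuristic that scales adapter updates by alpha / rank = 2.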
Training Hyperparameters:
- Learning Rate: 5e-5
- Batch Size: 4
- Gradient Accumulation: 5
- Effective Batch Size: 20
- Max Sequence Length: 2048
- Epochs: 3
- Warmup Ratio: 0.01
- Weight Decay: 0.01
- Max Grad Norm: 1.0
- LR Scheduler: Cosine
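A minimal sketch of how these hyperparameters would be expressed with transformers.TrainingArguments, assuming the Hugging Face Trainer was used (the document does not name the training framework). The FP16 and gradient-checkpointing flags come from the Hardware & Environment section below:

```python
from transformers import TrainingArguments

# Hyperparameters as listed above; output_dir is a hypothetical placeholder.
training_args = TrainingArguments(
    output_dir="materialsanalyst-lora",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=5,   # effective batch size: 4 * 5 = 20
    num_train_epochs=3,
    warmup_ratio=0.01,
    weight_decay=0.01,
    max_grad_norm=1.0,
    lr_scheduler_type="cosine",
    fp16=True,                       # from the Optimization line below
    gradient_checkpointing=True,
)
# Max sequence length (2048) is applied at tokenization time,
# not via TrainingArguments.
```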
Hardware & Environment
---------------------
GPU: NVIDIA A100 SXM4 (40GB)
Operating System: Ubuntu
CUDA Version: 11.8
PyTorch Version: 2.7.0
Compute Capability: 8.0
Optimization: FP16, Gradient Checkpointing
Training Performance
---------------------
Training Runtime: 5.37 hours (19,348 seconds)
Train Samples/Second: 0.884
Train Steps/Second: 0.044
Training Loss (Final): 0.170
Validation Loss (Final): 0.136
Total Training Steps: 855
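The throughput figures above are mutually consistent, and the step count implies a train split of roughly 5,700 samples, i.e. a small held-out validation set (an inference; the document does not state the split):

```python
# Reported figures from the Training Performance section.
runtime_s = 19_348
steps = 855
effective_batch = 20   # 4 * 5, from the Training Configuration section
epochs = 3

print(round(runtime_s / 3600, 2))                     # 5.37 hours
print(round(steps / runtime_s, 3))                    # 0.044 steps/second
print(round(steps * effective_batch / runtime_s, 3))  # 0.884 samples/second

# 855 steps over 3 epochs -> 285 steps/epoch -> ~5,700 training samples,
# consistent with a ~5% validation split of the 6,000-sample dataset.
print(steps // epochs * effective_batch)              # 5700
```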