---
license: apache-2.0
datasets:
- Open-Orca/OpenOrca
base_model:
- meta-llama/Llama-2-7b-hf
---
# LlaMa-DUSFT: a 40-layer LLaMA-2 model
## Model Overview
LlaMa-DUSFT is a custom variant of LLaMA-2-7B built with the DUS (Dynamic Update Strategy) methodology. The original LLaMA-2-7B model has 32 layers; this variant reconfigures and expands the layer stack to 40 layers.
### Key Modifications:
1. Layer Splitting:
   - The original 32-layer stack of LLaMA-2-7B was duplicated.
   - In one copy, the last 12 layers were removed, leaving 20 layers.
   - In the other copy, the first 12 layers were removed, leaving 20 layers.
2. Layer Merging:
   - The two resulting 20-layer segments were combined to form a 40-layer model.
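
The splitting and merging steps above can be sketched as plain index arithmetic. This is a minimal illustration, not the author's actual script: the function name `dus_layer_plan` is hypothetical, and the stacking order (the copy that keeps the early layers goes first) is an assumption, since the card does not state which segment comes first.

```python
from typing import List


def dus_layer_plan(n_layers: int = 32, n_removed: int = 12) -> List[int]:
    """Map each layer of the expanded model to the original layer it copies.

    Hypothetical helper for illustration only.
    """
    if not 0 <= n_removed <= n_layers:
        raise ValueError("n_removed must be between 0 and n_layers")

    # Copy A: drop the last `n_removed` layers, keeping layers 0..19 by default.
    front = list(range(0, n_layers - n_removed))
    # Copy B: drop the first `n_removed` layers, keeping layers 12..31 by default.
    back = list(range(n_removed, n_layers))

    # Assumption: copy A is stacked below copy B.
    return front + back


plan = dus_layer_plan()
print(len(plan))   # 40
print(plan[:20])   # source indices for the first 20-layer segment
print(plan[20:])   # source indices for the second 20-layer segment
```

Under these assumptions, original layers 12 through 19 appear twice in the merged stack, which accounts for the 8 extra layers over the 32-layer base.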
### Purpose:
This architectural modification was designed to test whether the DUS approach with an expanded layer count improves performance compared to the standard LLaMA-2 architecture.
## Training Details
### Dataset:
- The model was trained on a subset of the OpenOrca dataset, consisting of 5,000 samples.
### Training Configuration:
- Batch Size: 1
- Epochs: 3
- Optimizer: AdamW
- Learning Rate: 5e-5
- Environment: Google Colab Pro
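
To make the scale of this run concrete, the settings above can be collected into a small config object and used to estimate the number of optimizer steps. This is a rough sketch: the `TrainConfig` class and its field names are hypothetical, and the `grad_accum_steps` default of 1 is an assumption, because the card does not mention gradient accumulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the settings listed in this card."""
    num_samples: int = 5_000      # OpenOrca subset size stated in the card
    batch_size: int = 1
    epochs: int = 3
    learning_rate: float = 5e-5
    optimizer: str = "AdamW"
    grad_accum_steps: int = 1     # assumption: not stated in the card

    def steps_per_epoch(self) -> int:
        # Samples consumed per optimizer step.
        per_step = self.batch_size * self.grad_accum_steps
        # Ceiling division, so a final partial batch still counts as a step.
        return -(-self.num_samples // per_step)

    def total_steps(self) -> int:
        return self.steps_per_epoch() * self.epochs


cfg = TrainConfig()
print(cfg.steps_per_epoch())  # 5000
print(cfg.total_steps())      # 15000
```

With a batch size of 1 and no accumulation, each sample triggers its own weight update, so the run would take 15,000 optimizer steps over 3 epochs.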
### Preprocessing:
Data preprocessing followed the guidelines for LLaMA-2 models, ensuring tokenization and alignment were consistent with the original architecture.
## Results and Evaluation
### Performance Metrics:
- Because this model is experimental, quantitative evaluation is limited so far.
- Initial results suggest improved adaptability on some downstream tasks drawn from the OpenOrca dataset.
### Observations:
- The DUS layer modification suggests that model depth can be increased without significant performance degradation.
- Further evaluation with larger datasets and varied tasks is required to confirm generalizability.