Update README.md
# llama-2 40 layer model
## Model Overview

LlaMa-DUSFT is a custom variant of the LLaMA-2-7B model created using the DUS (Depth Up-Scaling) methodology. The original LLaMA-2-7B consists of 32 transformer layers; this variant reconfigures and expands the layer architecture to 40 layers to test whether the added depth improves performance.

### Key Modifications:

1. Layer Splitting:
   - The original 32 layers of LLaMA-2-7B were duplicated.
   - In one copy, the last 12 layers were removed.
   - In the other copy, the first 12 layers were removed.
2. Layer Merging:
   - The two resulting 20-layer segments were combined to form a 40-layer model (a minimal sketch of this split-and-merge is shown below).
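
For illustration, the following is a minimal sketch of the split-and-merge step using Hugging Face `transformers`. It is not the exact script used to build this model: the base checkpoint ID, the output directory, and the `copy.deepcopy`-based duplication are assumptions made for the example.

```python
# Illustrative depth up-scaling sketch: duplicate the LLaMA-2-7B layer stack,
# trim 12 layers from each copy, and concatenate the two 20-layer segments.
# The checkpoint ID and output path are assumptions.
import copy

import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",            # assumed base checkpoint
    torch_dtype=torch.float16,
)

layers = base.model.layers                 # 32 decoder layers in LLaMA-2-7B

# Split: one copy keeps the first 20 layers (last 12 removed),
# the other keeps the last 20 layers (first 12 removed).
front = [copy.deepcopy(layer) for layer in layers[:20]]
back = [copy.deepcopy(layer) for layer in layers[12:]]

# Merge: concatenate the two 20-layer segments into a single 40-layer stack.
base.model.layers = torch.nn.ModuleList(front + back)
base.config.num_hidden_layers = len(base.model.layers)

base.save_pretrained("llama-2-40-layer-dus")   # assumed output directory
```

Depending on the `transformers` version, per-layer bookkeeping such as `self_attn.layer_idx` may also need renumbering before the merged model is used for generation with a KV cache.
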
### Purpose:

This architectural modification was designed to test whether the DUS approach, with an expanded layer count, improves performance compared to the standard LLaMA-2 architecture.
## Training Details

### Dataset:

- The model was trained on a subset of the OpenOrca dataset, consisting of 5,000 samples.
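
As a rough illustration, the sketch below draws a shuffled 5,000-sample subset with the `datasets` library; the exact selection method, the Hub dataset ID, and the seed are assumptions.

```python
# Illustrative only: draw a 5,000-sample subset of OpenOrca.
# Whether the actual subset was random or the first N rows is not specified here.
from datasets import load_dataset

orca = load_dataset("Open-Orca/OpenOrca", split="train")
subset = orca.shuffle(seed=42).select(range(5_000))

print(subset)  # ~5,000 rows with id / system_prompt / question / response columns
```
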
### Training Configuration:

- Batch Size: 1
- Epochs: 3
- Optimizer: AdamW
- Learning Rate: 5e-5
- Environment: Google Colab Pro (the full configuration is sketched below)
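
The following is a minimal sketch of wiring these settings into a Hugging Face `Trainer` run. Only the batch size, epoch count, optimizer, and learning rate come from the list above; the model and output paths, the prompt template, and the maximum sequence length are assumptions.

```python
# Illustrative fine-tuning setup for the listed hyperparameters
# (batch size 1, 3 epochs, AdamW, learning rate 5e-5).
# Paths, prompt template, and max_length are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("llama-2-40-layer-dus")    # assumed path
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token

def tokenize(example):
    # Naive prompt template over the OpenOrca columns; the real template may differ.
    text = f"{example['system_prompt']}\n\n{example['question']}\n\n{example['response']}"
    return tokenizer(text, truncation=True, max_length=2048)

subset = load_dataset("Open-Orca/OpenOrca", split="train").shuffle(seed=42).select(range(5_000))
train_dataset = subset.map(tokenize, remove_columns=subset.column_names)

args = TrainingArguments(
    output_dir="llama-2-40-layer-dus-sft",  # assumed output directory
    per_device_train_batch_size=1,          # Batch Size: 1
    num_train_epochs=3,                     # Epochs: 3
    learning_rate=5e-5,                     # Learning Rate: 5e-5
    optim="adamw_torch",                    # Optimizer: AdamW
    fp16=True,                              # typical on a Colab GPU
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal-LM labels
)
trainer.train()
```
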
### Preprocessing:

Data preprocessing followed the guidelines for LLaMA-2 models, ensuring tokenization and alignment were consistent with the original architecture.
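
As a rough illustration of what this can look like, the sketch below formats one OpenOrca-style example, tokenizes it with the LLaMA-2 tokenizer, and aligns labels with the input IDs for causal-LM training; the prompt template, EOS handling, and maximum length are assumptions rather than a description of the exact pipeline used.

```python
# Illustrative preprocessing for a single OpenOrca-style example.
# Template, EOS handling, and max_length are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

example = {
    "system_prompt": "You are a helpful assistant.",
    "question": "Explain depth up-scaling in one sentence.",
    "response": "It duplicates and trims transformer layers to build a deeper model.",
}

text = (
    f"{example['system_prompt']}\n\n"
    f"{example['question']}\n\n"
    f"{example['response']}{tokenizer.eos_token}"
)

encoded = tokenizer(text, truncation=True, max_length=2048)
encoded["labels"] = encoded["input_ids"].copy()   # labels aligned token-for-token with input_ids

print(len(encoded["input_ids"]), encoded["input_ids"][:8])
```
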
## Results and Evaluation

### Performance Metrics:

- Due to the experimental nature of this model, specific evaluation metrics are currently limited.
- Initial results indicate improved adaptability in specific downstream tasks from the OpenOrca dataset.

### Observations:

- The DUS layer modification shows potential for increasing model depth without significant degradation of performance.
- Further evaluation with larger datasets and varied tasks is required to confirm generalizability.