---
license: mit
---
# DUS Forty Layer Merged Model

## Overview

The DUS Forty Layer Merged Model uses a layer-interlocking strategy that combines decoder layers from the Llama-2-13B and Mistral-7B architectures into a single forty-layer model. The approach aims to balance computational efficiency with competitive performance across a range of natural language processing tasks.

## Model Details

- **Architecture**: Based on Llama-2-13B and Mistral-7B
- **Layer Arrangement**: The `forty` configuration merges layers from both models, interlocking layers 0–20 with layers 12–32; a sketch of this slicing follows the list.
- **Tokenizer**: The Mistral-7B tokenizer is used for encoding and decoding.
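
The arrangement can be illustrated with a short conceptual sketch. This is not the actual merge script: the card does not say which base model contributes which slice (the direction below is an assumption), and a real merge would also have to reconcile the differing hidden sizes of Llama-2-13B (5120) and Mistral-7B (4096), e.g. with a dedicated merging tool. Reading the ranges as half-open intervals, the two slices contribute 20 + 20 = 40 layers, matching the model's name.

```python
# Conceptual sketch of the forty-layer interlock, assuming half-open index
# ranges (layers 0-20 and 12-32 give 20 + 20 = 40 layers). Which base model
# contributes which slice is an assumption; the card does not specify it.
import torch.nn as nn
from transformers import AutoModelForCausalLM

llama = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")
mistral = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Slice the decoder stacks of the two base models.
bottom = list(llama.model.layers[0:20])   # layers 0-19 of Llama-2-13B
top = list(mistral.model.layers[12:32])   # layers 12-31 of Mistral-7B

# Stack the slices into a single forty-layer decoder. In practice the two
# slices have different hidden sizes, so this step needs extra alignment.
merged_layers = nn.ModuleList(bottom + top)
print(len(merged_layers))  # 40
```

The overlapping index ranges (depths 12–20 appear in both slices, once per base model) follow the depth up-scaling (DUS) pattern of repeating a band of middle depths to grow the total layer count.
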
## Training Details

- **Base Models**:
  - Llama-2-13B: [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)
  - Mistral-7B: [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
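
## Usage

A minimal usage sketch, assuming the merged checkpoint is published as a standard `transformers` causal-LM repository. The repository id `your-org/dus-forty-layer-merged` is a hypothetical placeholder, and the tokenizer follows the card's note above (Mistral-7B).

```python
# Minimal generation example. "your-org/dus-forty-layer-merged" is a
# hypothetical placeholder for the actual merged-model repository id.
from transformers import AutoModelForCausalLM, AutoTokenizer

# The card specifies the Mistral-7B tokenizer for encoding and decoding.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("your-org/dus-forty-layer-merged")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```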