---
license: llama3.1
datasets:
- agentlans/crash-course
base_model:
- agentlans/Llama3.1-SuperDeepFuse
---
# Llama3.1-SuperDeepFuse-CrashCourse12K

Llama3.1-SuperDeepFuse-CrashCourse12K is an 8B-parameter language model based on [Llama3.1-SuperDeepFuse](https://huggingface.co/agentlans/Llama3.1-SuperDeepFuse) and further fine-tuned on [agentlans/crash-course](https://huggingface.co/datasets/agentlans/crash-course).
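
The model can be loaded like any Llama 3.1 chat model. A minimal inference sketch with 🤗 Transformers (the repo id below is assumed to match this model card; adjust `max_new_tokens` and dtype to your hardware):

```python
# Minimal chat-inference sketch (assumed repo id; requires a GPU or ample RAM).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Llama3.1-SuperDeepFuse-CrashCourse12K"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama 3.1 uses a chat template; apply it rather than formatting prompts by hand.
messages = [{"role": "user", "content": "Summarize LoRA fine-tuning in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```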

## Model Details

- **Base Model**: Llama3.1-SuperDeepFuse (8B parameters)
- **Fine-tuning Dataset**: 12,000 samples from agentlans/crash-course (drawn from 10 high-quality instruct datasets)
- **Model Type**: Instruction-tuned language model
- **Language(s)**: Multilingual
- **License**: Follows standard Llama 3.1 usage terms

## Training Procedure

### Fine-tuning

- **Method**: LoRA (Low-Rank Adaptation)
- **Optimizer**: AdamW
- **Learning Rate**: 5e-5
- **Batch Size**: 2 per device
- **Gradient Accumulation Steps**: 8
- **Training Epochs**: 1
- **Max Sequence Length**: 2048
- **LoRA Configuration**:
  - Rank: 8
  - Alpha: 16
  - Dropout: 0.5
  - Target: all layers
- **Quantization**: 4-bit (bitsandbytes)
- **Precision**: BF16
- **Other Techniques**: NEFTune (noise alpha: 5), RS-LoRA
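
The hyperparameters above correspond roughly to the following PEFT/TRL configuration. This is a sketch reconstructed from the list, not the author's actual training script; the exact trainer, module targeting, and optimizer string are assumptions:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig

# 4-bit quantized base weights (QLoRA-style) with BF16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Rank-8 adapters on all linear layers, with rank-stabilized scaling (RS-LoRA).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.5,
    target_modules="all-linear",
    use_rslora=True,
    task_type="CAUSAL_LM",
)

# Trainer settings; effective batch size = 2 x 8 = 16 sequences per update.
sft_config = SFTConfig(
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    max_seq_length=2048,
    bf16=True,
    neftune_noise_alpha=5,   # NEFTune embedding noise
    optim="adamw_torch",
)
```

NEFTune and RS-LoRA are both one-flag options here: `neftune_noise_alpha` adds noise to embeddings during training, and `use_rslora=True` scales adapters by `alpha / sqrt(r)` instead of `alpha / r`.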

## Performance and Limitations

This model potentially offers:

- Enhanced multi-task reasoning
- Improved performance on mathematics and coding tasks
- Better instruction-following abilities

However:

- Performance may be limited compared to larger model variants
- It can produce misleading or incorrect outputs
- Outputs should be independently verified for critical applications

## Additional Information

- For the original model, see [agentlans/Llama3.1-SuperDeepFuse](https://huggingface.co/agentlans/Llama3.1-SuperDeepFuse).
- For the base Llama 3.1 model, including training data and model architecture, refer to the original [Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model card.