LLaMA 3.1 8B fine-tuned on the Light R1 DPO dataset for 100 steps.
- **Base Model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Architecture**: Llama 3.1 8B Instruct
- **Training**: Direct Preference Optimization (DPO) with baseline PyTorch and TRL, using the AdamW optimizer. For details, see: [GitHub](https://github.com/ScottBiggs2/Reinforcement-Casino)
- **Task**: Text generation, instruction following, conversational AI
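For reference, the DPO objective used in this kind of preference fine-tuning can be sketched in a few lines of plain PyTorch. This is an illustrative sketch only, not the linked repo's code (which uses TRL); the function and argument names below are hypothetical:

```python
# Illustrative sketch of the DPO loss in plain PyTorch.
# Names are hypothetical; the linked repo uses TRL's trainer in practice.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a tensor of summed log-probabilities of the chosen or
    rejected completion under the trainable policy or the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid(margin): pushes the policy to prefer chosen over rejected,
    # while beta keeps it close to the reference model.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example with made-up log-probabilities: the policy already assigns the
# chosen completion a higher log-prob than the reference does, so the margin
# is positive and the loss drops below log(2).
loss = dpo_loss(torch.tensor([-5.0]), torch.tensor([-9.0]),
                torch.tensor([-6.0]), torch.tensor([-6.0]))
```

When both models score chosen and rejected identically, the margin is zero and the loss sits at log(2) ≈ 0.693; training reduces it by widening the policy's preference margin relative to the reference.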
## Requirements