ScottBiggs2 committed on
Commit 5ed5bd1 · verified · 1 Parent(s): c4d95b7

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -19,7 +19,7 @@ LLaMA 3.1 8B fine tuned on Light R1 DPO dataset for 100 steps
 
  - **Base Model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
  - **Architecture**: Llama 3.1 8B Instruct
- - **Training**: Direct Preference Optimization (DPO)
+ - **Training**: Direct Preference Optimization (DPO) with baseline PyTorch and TRL AdamW Optimizer. For details, see: [GitHub](https://github.com/ScottBiggs2/Reinforcement-Casino)
  - **Task**: Text generation, instruction following, conversational AI
 
  ## Requirements