payelb/UltraFeedback_openbmb_Llama-3.2-1B_aligned_with_semantic_MARS_roberta_RM
Base model: meta-llama/Llama-3.2-1B-Instruct
Alignment dataset: openbmb/UltraFeedback
Reward model: payelb/UltraFeedback_openbmb_roberta-base_1k_fixed_MARS_semantic_refined
Method: PPO alignment with LoRA adapters.
Reward model type: RoBERTa-base.
Training details:
- TOTAL_PPO_STEPS: 250
- PPO_EPOCHS: 2
- LR: 1e-06
- Batch size: 16
- Mini-batch size: 4
- Gradient accumulation: 4
- KL control enabled
- INIT_KL_COEF: 0.1
- TARGET_KL: 3.0
- Reward normalization and clipping enabled
- REWARD_CLIP: 3.0
- LoRA enabled
- Generation kwargs: min_length=-1, top_k=0.0, top_p=1.0, eos_token_id=None
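
The hyperparameters above can be sanity-checked with a small, framework-free sketch. It is not the training script for this model. The dataclass only restates the listed values, and the helper functions `normalize_and_clip` and `adaptive_kl_update` are hypothetical illustrations: the card does not say how rewards were normalized or how the KL coefficient was adapted, so the per-batch z-score and the proportional controller (with an assumed `horizon` of 10,000 samples) are assumptions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class PPOHyperparams:
    # Values copied from the training details listed above.
    total_ppo_steps: int = 250
    ppo_epochs: int = 2
    lr: float = 1e-6
    batch_size: int = 16
    mini_batch_size: int = 4
    gradient_accumulation: int = 4
    init_kl_coef: float = 0.1
    target_kl: float = 3.0
    reward_clip: float = 3.0


def normalize_and_clip(rewards, clip, eps=1e-8):
    """Assumed scheme: z-score within the batch, then clip to [-clip, clip]."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [max(-clip, min(clip, (r - mu) / (sigma + eps))) for r in rewards]


def adaptive_kl_update(kl_coef, observed_kl, target_kl, n_samples, horizon=10_000):
    """Assumed proportional controller: raise the coefficient when the observed
    KL exceeds the target, lower it otherwise. `horizon` is a hypothetical value."""
    error = max(-0.2, min(0.2, observed_kl / target_kl - 1.0))
    return kl_coef * (1.0 + error * n_samples / horizon)


hp = PPOHyperparams()

# Under the common convention, one optimizer step consumes
# mini_batch_size * gradient_accumulation samples.
samples_per_optimizer_step = hp.mini_batch_size * hp.gradient_accumulation

# Each PPO step collects one batch of rollouts.
total_rollouts = hp.total_ppo_steps * hp.batch_size

# Example reward batch (made-up numbers, for illustration only).
clipped = normalize_and_clip([0.0, 0.0, 0.0, 0.0, 10.0], hp.reward_clip)

# Observed KL above the target nudges the coefficient upward.
new_coef = adaptive_kl_update(
    hp.init_kl_coef, observed_kl=6.0, target_kl=hp.target_kl, n_samples=hp.batch_size
)

print(samples_per_optimizer_step, total_rollouts, clipped, new_coef)
```

With these settings the mini-batch size times the gradient accumulation equals the batch size (4 × 4 = 16), so under that convention each PPO epoch performs a single optimizer update over the whole batch, and 250 steps correspond to 4,000 sampled responses in total.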