payelb/UltraFeedback_openbmb_Llama-3.2-1B_aligned_with_semantic_MARS_roberta_RM

Base model: meta-llama/Llama-3.2-1B-Instruct

Alignment dataset: openbmb/UltraFeedback

Reward model: payelb/UltraFeedback_openbmb_roberta-base_1k_fixed_MARS_semantic_refined

Method: PPO alignment with LoRA adapters.

Reward model type: RoBERTa-base.

Training details:

  • TOTAL_PPO_STEPS: 250
  • PPO_EPOCHS: 2
  • LR: 1e-06
  • Batch size: 16
  • Mini-batch size: 4
  • Gradient accumulation: 4
  • KL control enabled
  • INIT_KL_COEF: 0.1
  • TARGET_KL: 3.0
  • Reward normalization and clipping enabled
  • REWARD_CLIP: 3.0
  • LoRA enabled
  • Generation kwargs: min_length=-1, top_k=0.0, top_p=1.0, eos_token_id=None
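
The settings above can be made concrete with a small sketch. It is not the training script used for this model; it only illustrates, under stated assumptions, how the listed values typically interact in a PPO setup: the batch of 16 splitting into 4 mini-batches of 4, reward standardization followed by clipping at REWARD_CLIP, and a proportional KL controller that starts at INIT_KL_COEF and steers toward TARGET_KL. The controller follows the commonly used adaptive scheme from Ziegler et al. (2019); the `horizon` value, the helper names, and the exact normalization formula are assumptions, since the card does not state them.

```python
import statistics

# Values copied from the training details listed above.
BATCH_SIZE = 16
MINI_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
INIT_KL_COEF = 0.1
TARGET_KL = 3.0
REWARD_CLIP = 3.0

# Assumption: mini-batch size x accumulation steps covers one full PPO batch.
assert MINI_BATCH_SIZE * GRAD_ACCUM_STEPS == BATCH_SIZE


def normalize_and_clip(rewards, clip=REWARD_CLIP, eps=1e-8):
    """Standardize a batch of raw rewards, then clip to [-clip, clip].

    Assumption: normalization is per-batch zero-mean, unit-variance scaling.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [max(-clip, min(clip, (r - mean) / (std + eps))) for r in rewards]


class AdaptiveKLController:
    """Proportional KL controller in the style of Ziegler et al. (2019)."""

    def __init__(self, init_kl_coef=INIT_KL_COEF, target=TARGET_KL, horizon=10000):
        self.value = init_kl_coef
        self.target = target
        self.horizon = horizon  # hypothetical value; not stated in the card

    def update(self, current_kl, n_steps):
        # Relative error vs. the target, clipped to +/-20% so the
        # coefficient changes slowly from one PPO step to the next.
        error = max(-0.2, min(0.2, current_kl / self.target - 1.0))
        self.value *= 1.0 + error * n_steps / self.horizon
        return self.value


if __name__ == "__main__":
    # One outlier in a batch of 16 is standardized and then clipped.
    raw = [0.0] * 15 + [100.0]
    print(normalize_and_clip(raw))

    # Observed KL above the target pushes the coefficient up.
    controller = AdaptiveKLController()
    print(controller.update(current_kl=6.0, n_steps=BATCH_SIZE))
```

With these values, an observed KL above 3.0 nudges the coefficient above 0.1 and an observed KL below 3.0 nudges it back down, so the penalty tracks the target rather than staying fixed.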