Gemma RLAIF - a lewtun Collection

Gemma RLAIF

updated Mar 1, 2024

lewtun/gemma-7b-sft-full-ultrachat-v0

Text Generation • 9B • Updated Feb 29, 2024 • 5 • 1
lewtun/gemma-7b-sft-full-dolly-v3

Text Generation • 9B • Updated Feb 29, 2024 • 11
lewtun/gemma-7b-sft-full-deita-10k-v0

Text Generation • 9B • Updated Feb 29, 2024 • 9
lewtun/gemma-7b-dpo-full-ultrafeedback-v0

Text Generation • Updated Feb 29, 2024 • 4
lewtun/gemma-7b-dpo-full-orca-v0

Text Generation • 9B • Updated Feb 29, 2024 • 5
lewtun/gemma-7b-dpo-full-mix1-beta-0.1

Text Generation • 9B • Updated Feb 29, 2024 • 19
lewtun/gemma-7b-dpo-full-mix1-beta-0.1-epoch-3

Text Generation • 9B • Updated Feb 29, 2024 • 6
lewtun/gemma-7b-dpo-full-mix1-beta-0.05

Text Generation • 9B • Updated Feb 29, 2024 • 17
lewtun/gemma-7b-dpo-full-mix1-beta-0.4-epoch-3

Text Generation • 9B • Updated Feb 29, 2024 • 5