nabeelshan/rlhf-gpt2-pipeline
Tags: Text Generation, Transformers, Safetensors, Dahoas/synthetic-instruct-gptj-pairwise, English, gpt2, rlhf, reinforcement-learning, ppo, reward-model, instruction-tuning, Eval Results (legacy)
License: apache-2.0
rlhf-gpt2-pipeline (main), 1.01 GB
2 contributors
History: 9 commits
Latest commit: nabeelshan, "Update README.md" (341dcc6, verified), 7 months ago
Name                  Size             Last commit                                      Updated
ppo_aligned_final/    -                Add tokenizer files                              7 months ago
reward_model_final/   -                Change RM Adapter extension                      7 months ago
sft_full_final/       -                Added SFT, Reward Model, and PPO-Aligned Model   7 months ago
.gitattributes        1.52 kB (Safe)   initial commit                                   7 months ago
README.md             6.3 kB           Update README.md                                 7 months ago