# Psychology-Alpaca-RM
- PEFT adapter layers for a reward model based on decapoda-research/llama-7b-hf.
- Trained on a small subset (110 data points) of samhog/cgpt-pairs, a dataset of 10K prompts, each with two answers (one 'good', one 'bad').
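
Reward models over good/bad answer pairs are commonly trained with a pairwise ranking loss that pushes the score of the preferred answer above the rejected one. As a minimal sketch (the exact training objective used for this adapter is not stated here, so this standard formulation is an assumption):

```python
import math

def pairwise_reward_loss(r_good: float, r_bad: float) -> float:
    # Standard pairwise preference loss: -log(sigmoid(r_good - r_bad)).
    # Small when the reward model scores the 'good' answer above the
    # 'bad' one; large when the ordering is reversed.
    margin = r_good - r_bad
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ordered pair yields a lower loss than a mis-ordered one.
low = pairwise_reward_loss(2.0, 0.5)
high = pairwise_reward_loss(0.5, 2.0)
```

In training, `r_good` and `r_bad` would be the scalar outputs of the LLaMA-based reward model for the two answers to the same prompt, and the loss would be averaged over the batch.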