
# Psychology-Alpaca-RM

- PEFT adapter layers for a reward model based on `decapoda-research/llama-7b-hf`.
- Trained on a small subset (110 data points) of `samhog/cgpt-pairs`, a dataset of 10K prompts, each with two answers (one 'good', one 'bad').