
# Psychology-Alpaca-RM

- PEFT adapter layers for a reward model based on `decapoda-research/llama-7b-hf`.
- Trained on a small subset (110 data points) of `samhog/cgpt-pairs`, a dataset of 10K prompts, each with two answers (one 'good', one 'bad').