GPT-R [Ronin]

This is an experimental model containing a parameter-wise 60/40 blend (weighted average) of the weights of ppo_hh_gpt-j and GPT-JT-6B-v1.
-Intended Merge Value-
As with fine-tuning, merging weights does not add information but transforms it; it is therefore important to consider the trade-offs.
GPT-Ronin combines ppo_hh_gpt-j and GPT-JT, blending the two with the intent of elevating the strengths of both. The datasets of both models are linked below to assist exploratory speculation on which datasets, in what quantity and configuration, have the largest impact on a model's usefulness without the expense of fine-tuning. The blend was performed in FP32 and the output saved in FP16.
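The blend itself is simple to reproduce. Below is a minimal sketch of a parameter-wise weighted average, assuming both checkpoints share the GPT-J architecture and identical parameter names; the 0.6/0.4 ratio and the FP32-in/FP16-out handling follow the description above, while the output path "gpt-r-ronin" is a placeholder, not the actual script used.

```python
# Minimal sketch of a parameter-wise 60/40 weight blend. Illustrative only;
# assumes both checkpoints share the GPT-J architecture and identical
# state_dict keys, and needs roughly 50 GB of RAM for two 6B models in FP32.
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained(
    "reciprocate/ppo_hh_gpt-j", torch_dtype=torch.float32
)
model_b = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/GPT-JT-6B-v1", torch_dtype=torch.float32
)

ratio = 0.6  # 60% ppo_hh_gpt-j, 40% GPT-JT
state_b = model_b.state_dict()

merged = {}
for name, tensor_a in model_a.state_dict().items():
    if tensor_a.is_floating_point():
        # Blend in FP32, per the description above.
        merged[name] = ratio * tensor_a + (1.0 - ratio) * state_b[name]
    else:
        # Integer/bool buffers (e.g. attention masks) are copied unchanged.
        merged[name] = tensor_a

model_a.load_state_dict(merged)
model_a.half()  # write the result out in FP16, as described above
model_a.save_pretrained("gpt-r-ronin")
```

Averaging in FP32 before casting down avoids compounding FP16 rounding error in the blended weights.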
-Intended Use-
Research purposes only; intended for responsible use. Express a task in natural language, and GPT-R will do the thing. Try telling it "Write an article about X but put Y spin on it.", "Write a five step numbered guide on how to do X.", or any other basic instruction. It does its best.
It can also be used as a base to merge with conversational, story writing, or adventure themed models of the same class (GPT-J & 6B NeoX) and parameter size (6B) to experiment with the morphology of model weights based on the value added by instruct tuning.

The merge was tested using KoboldAI with Nucleus Sampling (Top-P) set to 0.7, Temperature at 0.5, and Repetition Penalty at 1.14; extra samplers disabled. A rough transformers equivalent is sketched below.
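The sampler values here come from the KoboldAI test configuration above; the model path is a placeholder for wherever the merged weights live, and the prompt is one of the example instructions given earlier.

```python
# Rough transformers equivalent of the KoboldAI test settings
# (top_p=0.7, temperature=0.5, repetition_penalty=1.14).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt-r-ronin")  # placeholder path
model = AutoModelForCausalLM.from_pretrained(
    "gpt-r-ronin", torch_dtype=torch.float16
)

prompt = "Write a five step numbered guide on how to brew coffee."
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.7,
    temperature=0.5,
    repetition_penalty=1.14,
    max_new_tokens=200,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```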
-Credits to-
Core Model:
https://huggingface.co/EleutherAI/gpt-j-6B
Author:
https://www.eleuther.ai/

Model 1; 60% ppo_hh_gpt-j:
https://huggingface.co/reciprocate/ppo_hh_gpt-j
Author Repo:
https://huggingface.co/reciprocate
Related; CarperAI:
https://huggingface.co/CarperAI
The dataset is a variant of the Helpful & Harmless (HH) assistant-themed dataset, trained with Proximal Policy Optimization; the specific datasets used are unknown. Datasets listed in the repo include:
https://huggingface.co/datasets/reciprocate/summarize_eval_ilql
https://huggingface.co/datasets/reciprocate/hh_eval_ilql
PPO explained:
https://paperswithcode.com/method/ppo
Potential HH-type datasets utilized:
https://huggingface.co/HuggingFaceH4
https://huggingface.co/datasets/Anthropic/hh-rlhf
Model 2; 40% GPT-JT-6B-v1:
https://huggingface.co/togethercomputer/GPT-JT-6B-v1
Author Repo:
https://huggingface.co/togethercomputer
Related; BigScience:
https://huggingface.co/bigscience
Datasets:
https://huggingface.co/datasets/the_pile
https://huggingface.co/datasets/bigscience/P3
https://github.com/allenai/natural-instructions
https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html