digitous/GPT-R

+GPT-R [Ronin]
+This is an experimental model containing a parameter-wise 60/40 blend (weighted average) of the weights of ppo_hh_gpt-j and GPT-JT-6B-v1.
+-Intended Merge Value-
+As with fine-tuning, merging weights does not add information but transforms it; it is therefore important to consider the trade-offs.
+GPT-Ronin combines ppo_hh_gpt-j and GPT-JT, blending the two
+technical achievements with the intent of carrying over the strengths of
+both. The datasets of both models are linked below to support exploratory speculation about which datasets, in what quantity and configuration, have
+the largest impact on the usefulness of a model without the expense of
+fine-tuning. The blend was computed in FP32 and saved in FP16.
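The parameter-wise 60/40 weighted average described above can be illustrated with a minimal sketch. It is not the script used to build GPT-R: the function name `blend_state_dicts`, the parameter name `layer.weight`, and the toy NumPy arrays standing in for real checkpoints are all assumptions made for illustration.

```python
import numpy as np


def blend_state_dicts(weights_a, weights_b, ratio_a=0.6):
    """Parameter-wise weighted average of two same-architecture checkpoints.

    Hypothetical helper: each argument maps parameter names to arrays.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("checkpoints do not share the same parameter names")
    merged = {}
    for name, a in weights_a.items():
        b = weights_b[name]
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch for {name}")
        # Blend in FP32, then store the result in FP16.
        mixed = ratio_a * a.astype(np.float32) + (1.0 - ratio_a) * b.astype(np.float32)
        merged[name] = mixed.astype(np.float16)
    return merged


# Toy stand-ins for the two checkpoints (hypothetical parameter name).
model_a = {"layer.weight": np.array([1.0, 2.0], dtype=np.float16)}
model_b = {"layer.weight": np.array([0.0, 4.0], dtype=np.float16)}
merged = blend_state_dicts(model_a, model_b, ratio_a=0.6)
```

With `ratio_a=0.6`, the first checkpoint contributes 60% of every parameter and the second contributes the remaining 40%. Upcasting before the average keeps the FP16 rounding to a single step at the end.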
+-Intended Use-
+For research purposes only; intended for responsible use.
+Express a task in natural language, and GPT-R will attempt it.
+Try telling it "Write an article about X but put Y spin on it.",
+"Write a five step numbered guide on how to do X.", or any other
+basic instruction. It does its best.
+It can also be used as a base to merge with conversational,
+story-writing, or adventure-themed models of the same class
+(GPT-J & 6B NeoX) and parameter size (6B) to experiment with
+the morphology of model weights based on the value added
+by instruction tuning.
+The merge was tested using KoboldAI with Nucleus Sampling Top-P set to 0.7, Temperature at 0.5, and Repetition Penalty at 1.14; all other samplers
+were disabled.
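To make the test settings concrete, here is a small sketch of what temperature scaling followed by nucleus (Top-P) filtering does to a next-token distribution. It is an illustration only and not KoboldAI's implementation; the function name `nucleus_probs` and the example logits are assumptions, and repetition penalty is left out.

```python
import numpy as np


def nucleus_probs(logits, temperature=0.5, top_p=0.7):
    """Return sampling probabilities after temperature scaling and Top-P filtering."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(-probs)  # token indices, most likely first
    cumulative = np.cumsum(probs[order])
    # Keep the smallest set of top tokens whose cumulative mass reaches top_p.
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    kept = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()


# Hypothetical logits for a four-token vocabulary.
example_logits = [2.0, 1.0, 0.0, -1.0]
sharp = nucleus_probs(example_logits, temperature=0.5, top_p=0.7)
softer = nucleus_probs(example_logits, temperature=1.0, top_p=0.7)
```

A temperature below 1 sharpens the distribution, so fewer tokens are needed to reach the Top-P mass. In this toy case only the most likely token survives at temperature 0.5, while two tokens survive at temperature 1.0.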
+-Credits to-
+Core Model:
+https://huggingface.co/EleutherAI/gpt-j-6B
+Author:
+https://www.eleuther.ai/
+Model1; 60% ppo_hh_gpt-j:
+https://huggingface.co/reciprocate/ppo_hh_gpt-j
+Author Repo:
+https://huggingface.co/reciprocate
+Related; CarperAI:
+https://huggingface.co/CarperAI
+The model was trained with Proximal Policy Optimization on a variant
+of the Helpful Harmless assistant-themed dataset. The specific datasets
+used are unknown; datasets listed in the repo include:
+https://huggingface.co/datasets/reciprocate/summarize_eval_ilql
+https://huggingface.co/datasets/reciprocate/hh_eval_ilql
+PPO explained:
+https://paperswithcode.com/method/ppo
+Potential HH-type datasets utilized:
+https://huggingface.co/HuggingFaceH4
+https://huggingface.co/datasets/Anthropic/hh-rlhf
+Model2; 40% GPT-JT-6B-v1:
+https://huggingface.co/togethercomputer/GPT-JT-6B-v1
+Author Repo:
+https://huggingface.co/togethercomputer
+Related; BigScience:
+https://huggingface.co/bigscience
+Datasets:
+https://huggingface.co/datasets/the_pile
+https://huggingface.co/datasets/bigscience/P3
+https://github.com/allenai/natural-instructions
+https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html