GPT-Usenet-2
An 81-million-parameter LLM using the GPT-2 BPE tokenizer. Trained on 10 GB of Usenet posts along with over 1 GB of miscellaneous BBS posts, digitized books, and text documents. Supervised fine-tuning (SFT) should be performed before use.
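A minimal loading sketch, assuming the weights are published in Hugging Face `transformers` format; the repo id `gpt-usenet-2` is a placeholder for the actual checkpoint location:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# The model uses the stock GPT-2 BPE tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Placeholder repo id; substitute the real checkpoint location.
model = GPT2LMHeadModel.from_pretrained("gpt-usenet-2")

print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```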
Purpose of GPT-Usenet-2
Current LLMs are focused on becoming ever larger and more general. However, this just makes them jacks of all trades, masters of none. GPT-Usenet takes a different approach: instead of trying to do everything, it offers a digital stem cell that can be fine-tuned into a single, specialized role and run in parallel with copies of itself.
Technical Information
| Parameter | Value |
| --- | --- |
| Layers | 10 |
| Attention heads | 10 |
| Embedding dimension | 640 |
| Context window | 1024 tokens |
| Tokenizer | GPT-2 BPE |
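For illustration, these hyperparameters can be expressed as a GPT-2-style configuration. This sketch uses Hugging Face `transformers`; whether the released weights use this format is an assumption:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Architecture from the table above; vocab_size follows the GPT-2 BPE tokenizer.
config = GPT2Config(
    n_layer=10,        # transformer blocks
    n_head=10,         # attention heads per block
    n_embd=640,        # embedding dimension
    n_positions=1024,  # context window in tokens
    vocab_size=50257,  # GPT-2 BPE vocabulary
)
model = GPT2LMHeadModel(config)

# Prints roughly 82M, consistent with the stated 81-million-parameter figure.
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```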
Training Information
| Metric | Value |
| --- | --- |
| Training loss | ≈ 2.0254 |
| Validation loss | ≈ 1.9795 |
| Hardware | Google Colab L4 and A100 GPUs |
| Training time | 16 hours |
Example Syntax
| Field | Description |
| --- | --- |
| From: | The username that sent the message |
| Sender: | The group the username belongs to |
| Newsgroups: | The broad subject field of the message |
| Subject: | The subject of the message |
| (body) | The SFT response. Prefix the first sentence with `>` to mark it as a reasoning sentence. |
| -- | The stop sequence |
```text
From:user
Sender:usergroup
Newsgroups:motorskills.papercraft
Subject:Paper airplanes
>Provide detailed steps on building a paper airplane.
Instructions: ...
--
```
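As an illustrative sketch, the prompt below is assembled in the format above and the completion is cut at the `--` delimiter. It reuses the placeholder repo id from the loading example; sampling settings are arbitrary:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt-usenet-2")  # placeholder repo id

# Assemble a prompt in the header format described above, ending with
# the ">"-prefixed reasoning sentence so the model continues with the response.
prompt = (
    "From:user\n"
    "Sender:usergroup\n"
    "Newsgroups:motorskills.papercraft\n"
    "Subject:Paper airplanes\n"
    ">Provide detailed steps on building a paper airplane.\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(output[0], skip_special_tokens=True)

# Cut the completion at the "--" stop sequence.
completion = text[len(prompt):].split("\n--")[0]
print(completion)
```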
For fine-tuning, your data should be in `.mbox` format, as in the sketch below.
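A minimal sketch of turning an `.mbox` file into training text with Python's standard-library `mailbox` module. The header-to-prompt mapping follows the syntax above; the exact preprocessing pipeline is an assumption:

```python
import mailbox

def mbox_to_examples(path):
    """Convert each message in an .mbox file into the prompt format above."""
    examples = []
    for msg in mailbox.mbox(path):
        body = msg.get_payload(decode=True)
        if body is None:  # skip multipart messages in this sketch
            continue
        examples.append(
            f"From:{msg['From']}\n"
            f"Sender:{msg['Sender']}\n"
            f"Newsgroups:{msg['Newsgroups']}\n"
            f"Subject:{msg['Subject']}\n"
            f"{body.decode('utf-8', errors='replace').strip()}\n"
            "--\n"
        )
    return examples

examples = mbox_to_examples("finetune_data.mbox")  # hypothetical file name
print(f"Loaded {len(examples)} examples")
```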
