GPT-Usenet-2
An 81-million-parameter LLM using the GPT-2 BPE tokenizer. Trained on 10 GB of Usenet posts along with over 1 GB of miscellaneous BBS posts, digitized books, and text documents. Supervised fine-tuning (SFT) should be performed before use.
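A minimal loading sketch, assuming the weights are published in Hugging Face `transformers` format; the repo id `gpt-usenet-2` is a placeholder for the actual checkpoint location:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# The model uses the stock GPT-2 BPE tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Placeholder repo id; substitute the real checkpoint location.
model = GPT2LMHeadModel.from_pretrained("gpt-usenet-2")

print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```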
Purpose of GPT-Usenet-2
Current LLMs are focused on becoming ever larger and more general. However, this just makes them jacks of all trades, masters of none. GPT-Usenet takes a different approach: instead of trying to do everything, it offers a digital stem cell that can be fine-tuned into a single, specialized role and run in parallel with copies of itself.
Technical Information
| Parameter | Value |
| --- | --- |
| Layers | 10 |
| Attention heads | 10 |
| Embedding dimension | 640 |
| Context window | 1024 tokens |
| Tokenizer | GPT-2 BPE |
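For illustration, these hyperparameters can be expressed as a GPT-2-style configuration. This sketch uses Hugging Face `transformers`; whether the released weights use this format is an assumption:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Architecture from the table above; vocab_size follows the GPT-2 BPE tokenizer.
config = GPT2Config(
    n_layer=10,        # transformer blocks
    n_head=10,         # attention heads per block
    n_embd=640,        # embedding dimension
    n_positions=1024,  # context window in tokens
    vocab_size=50257,  # GPT-2 BPE vocabulary
)
model = GPT2LMHeadModel(config)

# Prints roughly 82M, consistent with the stated 81-million-parameter figure.
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```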
Training Information
| Metric | Value |
| --- | --- |
| Training loss | ≈ 2.0254 |
| Validation loss | ≈ 1.9795 |
| Hardware | Google Colab L4 and A100 GPUs |
| Training time | 16 hours |
Example Syntax
| Field | Description |
| --- | --- |
| From: | The username that sent the message |
| Sender: | The group the username belongs to |
| Newsgroups: | The broad subject field of the message |
| Subject: | The subject of the message |
| (body) | The SFT response. Prefix the first sentence with `>` to mark it as a reasoning sentence. |
| -- | The stop sequence |
```text
From:user
Sender:usergroup
Newsgroups:motorskills.papercraft
Subject:Paper airplanes
>Provide detailed steps on building a paper airplane.
Instructions: ...
--
```
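As an illustrative sketch, the prompt below is assembled in the format above and the completion is cut at the `--` delimiter. It reuses the placeholder repo id from the loading example; sampling settings are arbitrary:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt-usenet-2")  # placeholder repo id

# Assemble a prompt in the header format described above, ending with
# the ">"-prefixed reasoning sentence so the model continues with the response.
prompt = (
    "From:user\n"
    "Sender:usergroup\n"
    "Newsgroups:motorskills.papercraft\n"
    "Subject:Paper airplanes\n"
    ">Provide detailed steps on building a paper airplane.\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(output[0], skip_special_tokens=True)

# Cut the completion at the "--" stop sequence.
completion = text[len(prompt):].split("\n--")[0]
print(completion)
```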
For fine-tuning, your data should be in `.mbox` format, as in the sketch below.
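A minimal sketch of turning an `.mbox` file into training text with Python's standard-library `mailbox` module. The header-to-prompt mapping follows the syntax above; the exact preprocessing pipeline is an assumption:

```python
import mailbox

def mbox_to_examples(path):
    """Convert each message in an .mbox file into the prompt format above."""
    examples = []
    for msg in mailbox.mbox(path):
        body = msg.get_payload(decode=True)
        if body is None:  # skip multipart messages in this sketch
            continue
        examples.append(
            f"From:{msg['From']}\n"
            f"Sender:{msg['Sender']}\n"
            f"Newsgroups:{msg['Newsgroups']}\n"
            f"Subject:{msg['Subject']}\n"
            f"{body.decode('utf-8', errors='replace').strip()}\n"
            "--\n"
        )
    return examples

examples = mbox_to_examples("finetune_data.mbox")  # hypothetical file name
print(f"Loaded {len(examples)} examples")
```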
