## GPT-Usenet-3

An 81-million-parameter LLM using GPT-2 encodings.

Trained on 10 GB of USENET posts, over 1 GB of miscellaneous BBS posts, digitized books, and other text documents, and 1.1 GB of multilingual text.

Supervised fine-tuning should be performed before use.
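
Because the model reuses GPT-2's byte-pair encoding, its tokenization can be illustrated with a minimal BPE merge loop. This is a toy sketch with a made-up three-entry merge table, not GPT-2's actual ~50,000-merge vocabulary:

```python
def bpe_merge(tokens, merges):
    """Repeatedly apply the highest-priority (lowest-rank) merge, the way
    GPT-2's BPE collapses frequent symbol pairs into single tokens."""
    while True:
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
        ranked = [(merges[p], p) for p in pairs if p in merges]
        if not ranked:
            return tokens
        _, best = min(ranked)  # best-ranked pair to merge next
        out, i = [], 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
                out.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out

# Hypothetical merge table: rank 0 is applied first.
toy_merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
print(bpe_merge(list("lower"), toy_merges))  # ['low', 'er']
```

GPT-2's real tokenizer also maps raw bytes to printable symbols before merging, which this sketch omits.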

## Purpose of GPT-Usenet-3

Current LLMs focus on becoming ever larger and doing ever more, which makes them jacks of all trades, masters of none. GPT-Usenet-3 takes a different approach: instead of trying to do everything, it offers a digital stem cell that can be fine-tuned into a single, specialized role and run in parallel with copies of itself.

## Technical Information

| | |
|---------------|--------:|
|Layers |10|
|Heads |10|
|Embedding Dimension |640|
|Context Window |8192 tokens|
|Tokenizer |GPT-2 BPE|
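
As a back-of-the-envelope check, the hyperparameters above can be plugged into a standard GPT-2-style parameter count. The sketch below assumes the GPT-2 BPE vocabulary size (50,257), learned position embeddings over the full 8192-token window, and tied input/output embeddings; none of these details are confirmed by the card, which is presumably why the estimate lands near, but not exactly on, the stated 81 million:

```python
vocab, n_ctx, n_embd, n_layer = 50257, 8192, 640, 10

token_emb = vocab * n_embd   # token embeddings (tied with the LM head)
pos_emb = n_ctx * n_embd     # learned position embeddings
per_block = (
    n_embd * 3 * n_embd + 3 * n_embd    # fused QKV projection + bias
    + n_embd * n_embd + n_embd          # attention output projection + bias
    + n_embd * 4 * n_embd + 4 * n_embd  # MLP up-projection + bias
    + 4 * n_embd * n_embd + n_embd      # MLP down-projection + bias
    + 4 * n_embd                        # two layer norms (scale + bias each)
)
final_ln = 2 * n_embd
total = token_emb + pos_emb + n_layer * per_block + final_ln
print(f"{total / 1e6:.1f}M parameters")  # 86.6M under these assumptions
```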
## Example Syntax
| | |