# 🧠 JAT-GPT: Just Another Tiny GPT
Welcome to JAT-GPT, the world's most underwhelming large language model, clocking in at a mighty 17.9 million parameters (yes, million, not billion; stop laughing).
## 📦 Model Details
- Model type: GPT2-based decoder-only transformer
- Architecture: GPT-2
- Library: Hugging Face π€ Transformers
- Parameters: 17.9 million (size isn't everything... right?); a config sketch after this list shows roughly how a GPT-2 variant lands in that ballpark.
- Training Objective: Learn to predict the next word, and sometimes even the right one!
- Pretrained on: A secret* dataset (*"secret" means the dataset was just some text I could find lying around)
- Training Purpose: Solely educational. Also for flexing on friends who haven't trained a language model from scratch.
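Curious how a GPT-2-style model lands at roughly 17.9 million parameters? Here is a minimal sketch with *hypothetical* hyperparameters (the hidden size, layer count, and head count below are assumptions, not the actual JAT-GPT values; those live in this repo's `config.json`):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical hyperparameters chosen to land near ~18M parameters;
# the real JAT-GPT values may differ, so check config.json in the repo.
config = GPT2Config(
    vocab_size=50257,  # standard GPT-2 BPE vocabulary (assumed)
    n_positions=1024,  # maximum context length (assumed)
    n_embd=256,        # hidden size (assumed)
    n_layer=6,         # number of transformer blocks (assumed)
    n_head=8,          # attention heads per block (assumed)
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters():,} parameters")  # roughly 17-18M with these numbers
```

With these numbers, most of the parameters sit in the (tied) token embedding matrix rather than the transformer blocks, which is part of why tiny models feel so cramped.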
## Capabilities
- Can generate short sentences.
- "Please lower your expectations."
- Can hallucinate confidently, but in a very short and polite way.
- Can generate random words after a few tokens.
## Limitations
- Not very smart.
- Pretrained only; no fine-tuning or instruction tuning.
- Understands context... if it fits within a few tokens (the snippet after this list shows how to check the actual window).
- Cannot replace ChatGPT. (But look how cute it is!)
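If you want to see how much context it can actually hold, the configured window is right there on the loaded model, and the tokenizer can truncate prompts to fit. A small sketch (the long prompt is just a placeholder):

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("itsme-nishanth/JAT-GPT")
model = GPT2LMHeadModel.from_pretrained("itsme-nishanth/JAT-GPT")

# n_positions is the hard ceiling on context length for GPT-2-style models
max_ctx = model.config.n_positions
print(f"context window: {max_ctx} tokens")

# Truncate anything longer than the window before feeding it in
inputs = tokenizer("a very long prompt " * 500, truncation=True,
                   max_length=max_ctx, return_tensors="pt")
print(inputs["input_ids"].shape)
```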
## 🤷 Why Train This?
"Because I could." β :-)
- To understand the internals of language modeling (a minimal training-step sketch follows this list).
- To cry less when training real models later.
- To appreciate just how powerful modern LLMs are by comparison.
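In that spirit, here is a minimal, hypothetical sketch of the next-word (causal language modeling) objective used in pretraining; the optimizer, learning rate, and one-sentence batch are placeholders, not the actual training setup:

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("itsme-nishanth/JAT-GPT")
model = GPT2LMHeadModel.from_pretrained("itsme-nishanth/JAT-GPT")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # placeholder learning rate

# A toy "dataset" of one sentence, just to show the shape of a training step
batch = tokenizer(["Once upon a time there was a tiny model."], return_tensors="pt")

# With labels=input_ids, the model shifts the targets internally and returns
# the cross-entropy loss for predicting each next token
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {outputs.loss.item():.3f}")
```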
## 🛠️ Usage
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the tokenizer and model weights from the Hub
tokenizer = GPT2Tokenizer.from_pretrained("itsme-nishanth/JAT-GPT")
model = GPT2LMHeadModel.from_pretrained("itsme-nishanth/JAT-GPT")

# Encode a prompt and sample a short continuation
input_ids = tokenizer.encode("Once upon a time", return_tensors="pt")
output = model.generate(input_ids, max_length=20, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
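If you prefer the one-liner API, a `text-generation` pipeline works too. The sampling settings below are only illustrative; tune them to taste (the model stays just as tiny either way):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="itsme-nishanth/JAT-GPT")

# Illustrative sampling settings passed through to generate()
result = generator("Once upon a time", max_length=30,
                   do_sample=True, temperature=0.8, top_k=50)
print(result[0]["generated_text"])
```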