# SimpleLM
Custom decoder-only Transformer language model. The architecture is defined in `modeling_simple_lm.py` (bundled in this repo) and loaded via `trust_remote_code=True`.

Source checkpoint: `checkpoints/lm_checkpoint_001_loss_2.pt`
This model is a pretrain-only, "casual chat"-focused LLM trained from scratch on a very small dataset of conversations (found on Kaggle and mixed with OpenAssistant/oasst2). This particular release is an early checkpoint and will be revised soon. Altogether the training data amounts to about 10M tokens; 1B+ tokens would have been more appropriate for a model of this size.
## Architecture
| field | value |
|---|---|
| vocab_size | 32000 |
| context_length | 512 |
| d_model | 768 |
| n_layers | 12 |
| n_heads | 8 |
| d_ff | 2048 |
| activation | gelu |
| bias | True |
| tie_word_embeddings | True |
Tokenizer source: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
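For orientation, the table above corresponds to a configuration roughly like the sketch below. The class and field names here are illustrative, not the actual contents of `modeling_simple_lm.py`, which may use different names.

```python
from dataclasses import dataclass

# Illustrative sketch only; the real config/model classes are defined in
# modeling_simple_lm.py and may differ.
@dataclass
class SimpleLMConfig:
    vocab_size: int = 32000           # matches the TinyLlama tokenizer vocabulary
    context_length: int = 512         # maximum sequence length
    d_model: int = 768                # hidden size
    n_layers: int = 12                # decoder blocks
    n_heads: int = 8                  # attention heads (768 / 8 = 96 dims per head)
    d_ff: int = 2048                  # feed-forward inner dimension
    activation: str = "gelu"
    bias: bool = True                 # bias terms in linear layers
    tie_word_embeddings: bool = True  # share input embedding and LM-head weights
```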
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "etanlightstone/simple-lm-v1"
tok = AutoTokenizer.from_pretrained(repo)
# trust_remote_code=True is required so that modeling_simple_lm.py is loaded.
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

prompt = "Once upon a time"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=80, do_sample=True, top_k=50, temperature=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```
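As a variation (not part of the original card), the same objects can be moved to a GPU and decoded greedily instead of sampled; this sketch reuses `model`, `tok`, and `prompt` from the block above and simply respects the 512-token context window from the Architecture table.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# Keep the prompt within the model's 512-token context window.
ids = tok(prompt, return_tensors="pt", truncation=True, max_length=512).input_ids.to(device)

with torch.no_grad():
    out = model.generate(ids, max_new_tokens=80, do_sample=False)  # greedy decoding
print(tok.decode(out[0], skip_special_tokens=True))
```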
## Training settings
```json
{
  "batch_size": 10,
  "batch_size_note": "per GPU when using torchrun",
  "world_size": 1,
  "learning_rate": 0.0003,
  "weight_decay": 0.01,
  "num_epochs": 5,
  "max_steps": null,
  "grad_clip": 1.0,
  "seed": 42,
  "docs_dir": "/home/etan/simple_llm/docs"
}
```
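The actual training script is not included in this repo. A minimal sketch of how these settings would typically be wired into a loop is shown below; `train_dataset` is a placeholder for a tokenized dataset, and the sketch assumes the model follows the standard Hugging Face causal-LM interface (passing `labels` returns an output with a `.loss`).

```python
import torch
from torch.utils.data import DataLoader

torch.manual_seed(42)                                              # seed

# train_dataset is a placeholder: any dataset yielding dicts of
# "input_ids" / "labels" tensors no longer than context_length (512).
loader = DataLoader(train_dataset, batch_size=10, shuffle=True)    # batch_size (per GPU)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,         # learning_rate
                              weight_decay=0.01)                   # weight_decay

model.train()
for epoch in range(5):                                             # num_epochs
    for batch in loader:
        # Assumes the standard HF causal-LM forward signature; the custom
        # model in modeling_simple_lm.py may differ.
        out = model(input_ids=batch["input_ids"], labels=batch["labels"])
        out.loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)    # grad_clip
        optimizer.step()
        optimizer.zero_grad()
```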