SimpleLM

Custom decoder-only Transformer language model. The architecture is defined in modeling_simple_lm.py (bundled in this repo) and loaded via trust_remote_code=True.
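For context, custom architectures like this are usually exposed to the Auto* classes through a PretrainedConfig subclass, with an auto_map entry in config.json pointing AutoModelForCausalLM at the model class. The sketch below shows what that wiring typically looks like; the class and attribute names are assumptions, not taken from the actual modeling_simple_lm.py.

# Hypothetical sketch of the config side of modeling_simple_lm.py; the field
# names and defaults mirror the Architecture table below but are NOT copied
# from the real file.
from transformers import PretrainedConfig

class SimpleLMConfig(PretrainedConfig):
    model_type = "simple_lm"

    def __init__(self, vocab_size=32000, context_length=512, d_model=768,
                 n_layers=12, n_heads=8, d_ff=2048, activation="gelu",
                 bias=True, tie_word_embeddings=True, **kwargs):
        self.vocab_size = vocab_size
        self.context_length = context_length
        self.d_model = d_model
        self.n_layers = n_layers
        self.n_heads = n_heads
        self.d_ff = d_ff
        self.activation = activation
        self.bias = bias
        super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)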

Source checkpoint: checkpoints/lm_checkpoint_001_loss_2.pt

This model is a pretraining-only, casual-chat-focused LLM, trained from scratch on a very small dataset of conversations (found on Kaggle and mixed with OpenAssistant/oasst2). This particular release is an early checkpoint and will be revised soon. The training data totals about 10M tokens; 1B+ would have been a better fit for a model of this size.

Architecture

field                 value
vocab_size            32000
context_length        512
d_model               768
n_layers              12
n_heads               8
d_ff                  2048
activation            gelu
bias                  True
tie_word_embeddings   True

Tokenizer source: TinyLlama/TinyLlama-1.1B-Chat-v1.0
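As a sanity check, the 91.1M parameter count reported for this checkpoint (see Model size below) is consistent with the table if one assumes learned positional embeddings, a tied output head, biases on all linear layers, and affine LayerNorms. The arithmetic below is an illustration under those assumptions, not a dump of the actual module list.

# Back-of-the-envelope parameter count from the Architecture table. The layout
# assumptions (learned positional embeddings, two LayerNorms per block, tied
# output head) are guesses that happen to reproduce the reported size.
V, ctx, d, n_layers, d_ff = 32000, 512, 768, 12, 2048

embed = V * d                                 # token embeddings (output head tied)
pos = ctx * d                                 # learned positional embeddings
attn = 4 * (d * d + d)                        # q, k, v, o projections with bias
ffn = (d * d_ff + d_ff) + (d_ff * d + d)      # up + down projections with bias
norms = 2 * 2 * d                             # two LayerNorms (weight + bias) per block
total = embed + pos + n_layers * (attn + ffn + norms) + 2 * d  # + final LayerNorm

print(f"{total / 1e6:.1f}M parameters")       # 91.1M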

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "etanlightstone/simple-lm-v1"
tok = AutoTokenizer.from_pretrained(repo)
# trust_remote_code is required: the model class lives in modeling_simple_lm.py
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

prompt = "Once upon a time"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=80, do_sample=True, top_k=50, temperature=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
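On a GPU, a variant like the one below also passes the attention mask explicitly and sets a pad token id (the TinyLlama tokenizer ships without a dedicated pad token). The sampling settings are illustrative, not tuned for this checkpoint.

import torch

# Illustrative GPU variant; generation hyperparameters are examples only.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

inputs = tok("Once upon a time", return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=80, do_sample=True,
                         top_p=0.95, temperature=0.9,
                         pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))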

Training settings

{
  "batch_size": 10,
  "batch_size_note": "per GPU when using torchrun",
  "world_size": 1,
  "learning_rate": 0.0003,
  "weight_decay": 0.01,
  "num_epochs": 5,
  "max_steps": null,
  "grad_clip": 1.0,
  "seed": 42,
  "docs_dir": "/home/etan/simple_llm/docs"
}
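For orientation, these settings slot into a standard training loop roughly as sketched below. Only the numeric values come from the config above; the optimizer choice (AdamW), the hypothetical train_loader, and the loop shape are assumptions.

import torch

# Hypothetical loop fragment; `model` is as in the Usage section and
# `train_loader` is an assumed DataLoader yielding input_ids/labels dicts.
torch.manual_seed(42)                                      # seed
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=3e-4, weight_decay=0.01)  # learning_rate, weight_decay

for epoch in range(5):                                     # num_epochs
    for batch in train_loader:                             # batch_size=10 per GPU
        loss = model(**batch).loss                         # causal-LM loss from labels
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # grad_clip
        optimizer.step()
        optimizer.zero_grad()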
Model size: 91.1M parameters (F32 tensors, Safetensors format)