---
language: en
tags:
  - causal-lm
  - gqa
  - rope
  - swiglu
license: apache-2.0
---

Exp-1

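A small causal language model (about 9.9M parameters) trained from scratch. It uses grouped-query attention (GQA), rotary position embeddings (RoPE), and SwiGLU feed-forward layers.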

Architecture

| Property | Value |
|---|---|
| Layers | 11 |
| Hidden size | 256 |
| Intermediate size | 704 |
| Attention heads | 8 (GQA kv=2) |
| Max sequence length | 1024 |
| Vocab size | 8192 |
| Tied embeddings | True |
| Total parameters | 9.853M |
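
As a sanity check, the total can be reproduced from the other values listed for the architecture. The sketch below is not taken from the model's source code. It assumes bias-free linear layers, a three-matrix SwiGLU feed-forward block (gate, up, down), two weight-only norms per layer plus one final norm, and no learned position embeddings (consistent with RoPE). The function name `count_parameters` is hypothetical.

```python
def count_parameters(layers=11, hidden=256, intermediate=704,
                     heads=8, kv_heads=2, vocab=8192, tied=True):
    """Estimate the parameter count under the assumptions stated above."""
    head_dim = hidden // heads          # 32 dimensions per head
    kv_dim = kv_heads * head_dim        # K and V project to 64 dimensions under GQA

    # Attention: Q and O are hidden x hidden; K and V are hidden x kv_dim.
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim
    # SwiGLU feed-forward (assumed): gate, up, and down projections.
    feed_forward = 3 * hidden * intermediate
    # Two weight-only norms per layer (assumed, e.g. RMSNorm).
    norms = 2 * hidden

    per_layer = attention + feed_forward + norms
    embeddings = vocab * hidden
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if tied else vocab * hidden
    final_norm = hidden

    return layers * per_layer + embeddings + lm_head + final_norm


total = count_parameters()
print(f"{total:,} parameters ({total / 1e6:.3f}M)")
```

Under these assumptions the estimate comes out at 9,852,672, which rounds to the 9.853M reported here. With 8 query heads and 2 KV heads, each KV head is shared by 4 query heads.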

Training

  • Tokens seen: 6,038,089,728
  • Val loss: 2.4005
  • Val PPL: 11.03
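
The reported loss and perplexity are consistent if the loss is the mean cross-entropy per token in nats, so that perplexity is `exp(loss)`. The sketch below checks that relationship and also derives the ratio of training tokens to parameters. The nats assumption and the rounded parameter count (9.853M) are assumptions for illustration, and the function name `perplexity` is hypothetical.

```python
import math

VAL_LOSS = 2.4005              # reported validation loss
TOKENS_SEEN = 6_038_089_728    # reported tokens seen during training
PARAMS = 9_853_000             # 9.853M, rounded as in the architecture listing


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats (assumed)."""
    return math.exp(loss_nats)


ppl = perplexity(VAL_LOSS)
tokens_per_param = TOKENS_SEEN / PARAMS

print(f"perplexity: {ppl:.2f}")
print(f"tokens per parameter: {tokens_per_param:.1f}")
```

The computed perplexity rounds to 11.03, matching the reported value, and the model saw roughly 613 training tokens per parameter.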

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("GODELEV/Exp-1")
model = AutoModelForCausalLM.from_pretrained("GODELEV/Exp-1")
inputs = tokenizer("Hello", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))