TinyLM-5M

A tiny 5.23M-parameter causal language model trained on the sample-100BT subset of HuggingFaceFW/fineweb-edu.

Architecture:

6 transformer blocks
RMSNorm
RoPE
SwiGLU
SDPA grouped-query attention
tied token/lm-head embedding
one all-to-all depth-attention residual communicator over layer states
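
To make three of the listed components concrete, here is a minimal PyTorch sketch of RMSNorm, a SwiGLU feed-forward layer, and grouped-query attention routed through SDPA. It is not the code in hyper.py: the class and function names, the eps value, and all tensor sizes are assumptions chosen for illustration, and RoPE, the tied embedding, and the depth-attention communicator are left out because their details are not given here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scale each vector by the inverse of its root-mean-square (no mean subtraction)."""

    def __init__(self, dim: int, eps: float = 1e-6):  # eps is an assumed value
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class SwiGLU(nn.Module):
    """Gated feed-forward layer: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Causal attention where several query heads share one key/value head.

    q has shape (batch, n_q_heads, seq, head_dim); k and v have shape
    (batch, n_kv_heads, seq, head_dim), with n_q_heads divisible by n_kv_heads.
    """
    n_rep = q.shape[1] // k.shape[1]
    # Repeat each key/value head so it lines up with its group of query heads.
    k = k.repeat_interleave(n_rep, dim=1)
    v = v.repeat_interleave(n_rep, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

With grouped-query attention the key/value projections are smaller than the query projection, which saves parameters and KV-cache memory; that matters at this model size, where every projection is a visible share of the budget.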

Training snapshot:

steps: 50000
context length: 512
batch size: 512
eval dataset: Salesforce/wikitext / wikitext-103-raw-v1 / validation
eval stride: 256
eval tokens: 365,255
eval UTF-8 bytes: 1,151,766
WikiText validation loss: 3.2929
WikiText validation perplexity: 26.9218
WikiText validation BPB: 1.5066
tokenizer: AxiomicLabs/GPT-S-5M
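
The three WikiText numbers above are related by simple arithmetic, which the sketch below reproduces. It assumes the reported loss is the mean cross-entropy per token in nats. The strided_windows helper shows one common reading of "eval stride" (overlapping 512-token windows that advance by 256 tokens and score only the tokens not yet covered); whether the original evaluation script works exactly this way is an assumption, and all function names are illustrative.

```python
import math

# Values copied from the training snapshot above.
LOSS_NATS_PER_TOKEN = 3.2929
EVAL_TOKENS = 365_255
EVAL_BYTES = 1_151_766


def perplexity(loss_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token loss in nats."""
    return math.exp(loss_nats)


def bits_per_byte(loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert a per-token loss in nats to bits per UTF-8 byte of the raw text."""
    total_bits = loss_nats * n_tokens / math.log(2)
    return total_bits / n_bytes


def strided_windows(n_tokens: int, context: int = 512, stride: int = 256):
    """Yield (begin, end, n_scored) windows so each token position is scored once."""
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + context, n_tokens)
        # Only the tokens past the previous window's end are newly scored.
        yield begin, end, end - prev_end
        prev_end = end
        if end == n_tokens:
            break


if __name__ == "__main__":
    print(f"perplexity ~ {perplexity(LOSS_NATS_PER_TOKEN):.2f}")
    print(f"bits per byte ~ {bits_per_byte(LOSS_NATS_PER_TOKEN, EVAL_TOKENS, EVAL_BYTES):.4f}")
```

Recomputing from the rounded loss gives a perplexity of about 26.92 and a BPB of about 1.5066, which agrees with the reported values up to rounding of the loss.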

The model is implemented in custom PyTorch code with a Transformers remote-code wrapper. Load it through AutoModelForCausalLM with trust_remote_code=True, or import the included hyper.py directly as shown below.

Transformers Loading

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

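REPO_ID = "User01110/TinyLM-5M"
# Use half precision on GPU and fall back to float32 on CPU.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

# trust_remote_code=True is needed because the model class lives in the repository.
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    trust_remote_code=True,
    torch_dtype=DTYPE,
).to(DEVICE)
model.eval()  # inference mode (disables dropout, if any)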
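
Once the model is loaded this way, text generation would normally go through the standard Transformers generate() call. The helper below is a hedged sketch: generate_text is a hypothetical name, and it assumes the remote-code wrapper exposes generate() and model.device like a regular PreTrainedModel, which this card does not confirm. If that assumption fails, use the hyper.py path in the Inference section instead.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=128, **sampling_kwargs):
    """Return prompt plus continuation as text (hypothetical helper).

    Assumes `model.generate()` is available on the remote-code wrapper.
    """
    # Pass only input_ids, since a custom forward() may not accept
    # other tokenizer outputs such as token_type_ids.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        **sampling_kwargs,
    )
    # generate() returns the prompt tokens followed by the new tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call, using the sampling settings from the Inference section:
# print(generate_text(model, tokenizer, "The future of AI is",
#                     do_sample=True, temperature=0.8, top_k=50, top_p=0.95))
```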

Inference

Install:

pip install torch transformers huggingface_hub safetensors

import importlib.util
from pathlib import Path

import torch
from huggingface_hub import snapshot_download

REPO_ID = "User01110/TinyLM-5M"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32
PROMPT = "The future of AI is"
MAX_NEW_TOKENS = 128
DO_SAMPLE = True
TEMPERATURE = 0.8
TOP_K = 50
TOP_P = 0.95
REPETITION_PENALTY = 1.2
NO_REPEAT_NGRAM_SIZE = 4
SKIP_SPECIAL_TOKENS = True

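# Download the repository files (cached locally after the first call).
repo_dir = Path(snapshot_download(REPO_ID))

# Import hyper.py from the downloaded snapshot as a standalone module.
spec = importlib.util.spec_from_file_location("hyper_model", repo_dir / "hyper.py")
assert spec is not None and spec.loader is not None
hyper_model = importlib.util.module_from_spec(spec)
spec.loader.exec_module(hyper_model)

# Load tokenizer, model, and cfg through the module's own helper.
tokenizer, model, cfg = hyper_model.load_from_hub(REPO_ID, device=DEVICE, dtype=DTYPE)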
print(
    hyper_model.generate(
        model,
        tokenizer,
        PROMPT,
        max_new_tokens=MAX_NEW_TOKENS,
        do_sample=DO_SAMPLE,
        temperature=TEMPERATURE,
        top_k=TOP_K,
        top_p=TOP_P,
        repetition_penalty=REPETITION_PENALTY,
        no_repeat_ngram_size=NO_REPEAT_NGRAM_SIZE,
        skip_special_tokens=SKIP_SPECIAL_TOKENS,
    )
)


Safetensors checkpoint: 6.28M params, tensor type F32
