TinyLM-5M

A tiny 5.23M-parameter causal language model trained on the sample-100BT subset of HuggingFaceFW/fineweb-edu.

Architecture:

  • 6 transformer blocks
  • RMSNorm
  • RoPE
  • SwiGLU
  • SDPA grouped-query attention
  • tied token/lm-head embedding
  • one all-to-all depth-attention residual communicator over layer states
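
As a rough illustration of two of the components listed above, the sketch below writes out the textbook forms of RMSNorm and the SwiGLU gate in plain Python. It is not taken from hyper.py: the function names, the eps value, and the toy inputs are assumptions made for the example, and the real model applies these operations to tensors with learned projections.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, up):
    """Elementwise SwiGLU gating: silu(gate) * up.

    In a full MLP block, gate and up are two linear projections of the same
    input, and a third projection maps the product back to the model width.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


# Toy inputs (hypothetical values, chosen only to make the arithmetic easy).
normed = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
gated = swiglu_gate([0.0, 2.0], [5.0, 1.0])
print(normed)
print(gated)
```

After normalization with unit gains, the mean square of the output is approximately 1, which is the property RMSNorm is designed to enforce.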

Training snapshot:

  • steps: 50000
  • context length: 512
  • batch size: 512
  • eval dataset: Salesforce/wikitext / wikitext-103-raw-v1 / validation
  • eval stride: 256
  • eval tokens: 365,255
  • eval UTF-8 bytes: 1,151,766
  • WikiText validation loss: 3.2929
  • WikiText validation perplexity: 26.9218
  • WikiText validation BPB: 1.5066
  • tokenizer: AxiomicLabs/GPT-S-5M
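
The three WikiText metrics above are related by simple arithmetic, which the sketch below recomputes from the reported loss, token count, and byte count. It assumes the loss is the mean cross-entropy in nats per token and that BPB is total bits divided by UTF-8 bytes; neither definition is stated in this card, but both results agree with the reported figures to within the rounding of the loss.

```python
import math

# Values copied from the training snapshot above.
LOSS_NATS_PER_TOKEN = 3.2929
EVAL_TOKENS = 365_255
EVAL_BYTES = 1_151_766

# Perplexity is the exponential of the mean per-token loss.
perplexity = math.exp(LOSS_NATS_PER_TOKEN)

# Bits per byte: total nats over the eval set, converted to bits,
# divided by the number of UTF-8 bytes in the same text.
total_nats = LOSS_NATS_PER_TOKEN * EVAL_TOKENS
bits_per_byte = total_nats / (math.log(2) * EVAL_BYTES)

print(f"perplexity: {perplexity:.4f}")
print(f"bits per byte: {bits_per_byte:.4f}")
```

Because BPB is normalized by bytes rather than tokens, it can be compared across models that use different tokenizers, whereas perplexity cannot.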

The model is custom PyTorch code with a Transformers remote-code wrapper. Either pass trust_remote_code=True when loading through AutoModelForCausalLM, or load the included hyper.py directly; both routes are shown below.

Transformers Loading

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "User01110/TinyLM-5M"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    trust_remote_code=True,
    torch_dtype=DTYPE,
).to(DEVICE)
model.eval()

Inference

Install:

pip install torch transformers huggingface_hub safetensors

Run:

import importlib.util
from pathlib import Path

import torch
from huggingface_hub import snapshot_download

REPO_ID = "User01110/TinyLM-5M"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32
PROMPT = "The future of AI is"
MAX_NEW_TOKENS = 128
DO_SAMPLE = True
TEMPERATURE = 0.8
TOP_K = 50
TOP_P = 0.95
REPETITION_PENALTY = 1.2
NO_REPEAT_NGRAM_SIZE = 4
SKIP_SPECIAL_TOKENS = True

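# Download the repository files, or reuse the locally cached snapshot.
repo_dir = Path(snapshot_download(REPO_ID))

# Import hyper.py from the snapshot as a standalone module.
spec = importlib.util.spec_from_file_location("hyper_model", repo_dir / "hyper.py")
assert spec is not None and spec.loader is not None
hyper_model = importlib.util.module_from_spec(spec)
spec.loader.exec_module(hyper_model)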

tokenizer, model, cfg = hyper_model.load_from_hub(REPO_ID, device=DEVICE, dtype=DTYPE)
print(
    hyper_model.generate(
        model,
        tokenizer,
        PROMPT,
        max_new_tokens=MAX_NEW_TOKENS,
        do_sample=DO_SAMPLE,
        temperature=TEMPERATURE,
        top_k=TOP_K,
        top_p=TOP_P,
        repetition_penalty=REPETITION_PENALTY,
        no_repeat_ngram_size=NO_REPEAT_NGRAM_SIZE,
        skip_special_tokens=SKIP_SPECIAL_TOKENS,
    )
)
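
To give a feel for what the TEMPERATURE, TOP_K, and TOP_P settings above do, here is a minimal sketch of the usual temperature, top-k, and nucleus (top-p) filtering steps in plain Python. It is an illustration of the general technique, not the code in hyper.py; the function name filter_probs and the toy logits are hypothetical, and repetition penalty and n-gram blocking are left out.

```python
import math


def filter_probs(logits, temperature=0.8, top_k=50, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k, and top-p."""
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [value / temperature for value in logits]

    # Top-k: keep only the k highest-scoring token indices.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order[:top_k]

    # Softmax over the surviving tokens (shifted by the max for stability).
    peak = scaled[kept[0]]
    exps = [math.exp(scaled[i] - peak) for i in kept]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    nucleus = []
    cumulative = 0.0
    for index, prob in zip(kept, probs):
        nucleus.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize so the remaining probabilities sum to 1.
    mass = sum(prob for _, prob in nucleus)
    return {index: prob / mass for index, prob in nucleus}


# Toy logits for a four-token vocabulary (hypothetical values).
dist = filter_probs([2.0, 1.0, 0.0, -5.0], temperature=1.0, top_k=3, top_p=0.9)
print(dist)
```

A sampler would then draw the next token from the returned distribution, for example with random.choices over its keys and values.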

Checkpoint: Safetensors, 6.28M params, tensor type F32.