---
license: apache-2.0
library_name: pytorch
tags:
- language-model
- causal-lm
- tiny
- custom-code
- fineweb-edu
- depth-communication
---

# TinyLM-5M

TinyLM-5M is a tiny causal language model with 5.23M parameters, trained on
the `sample-100BT` subset of `HuggingFaceFW/fineweb-edu`.

Architecture:

- 6 transformer blocks
- RMSNorm
- RoPE
- SwiGLU
- SDPA grouped-query attention
- tied token/lm-head embedding
- one all-to-all depth-attention residual communicator over layer states
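
For readers unfamiliar with two of the standard components listed above, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward layer. It is a generic textbook formulation, not the code in `hyper.py`: the class names, the `eps` value, and the dimensions (`dim=64`, `hidden=128`) are assumptions chosen for illustration, and the depth-attention communicator is specific to this model and is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scale each vector by the reciprocal of its root mean square."""

    def __init__(self, dim: int, eps: float = 1e-6):  # eps is an assumed default
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class SwiGLU(nn.Module):
    """Gated feed-forward layer: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


# Illustrative sizes only; the real model dimensions are not stated in this card.
torch.manual_seed(0)
x = torch.randn(2, 8, 64)          # (batch, sequence, dim)
normed = RMSNorm(64)(x)
out = SwiGLU(64, 128)(normed)
print(out.shape)
```

Both layers preserve the input shape, so they can be dropped into a pre-norm residual block without reshaping.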

Training snapshot:

- steps: `50000`
- context length: `512`
- batch size: `512`
- eval dataset: `Salesforce/wikitext` / `wikitext-103-raw-v1` / `validation`
- eval stride: `256`
- eval tokens: `365,255`
- eval UTF-8 bytes: `1,151,766`
- WikiText validation loss: `3.2929`
- WikiText validation perplexity: `26.9218`
- WikiText validation BPB: `1.5066`
- tokenizer: `AxiomicLabs/GPT-S-5M`
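
The three WikiText numbers above are related by simple conversions. The sketch below recomputes perplexity and bits per byte (BPB) from the reported loss, assuming the loss is the mean per-token cross-entropy in nats over the listed eval tokens. The card does not state that assumption, but the recomputed values match the reported ones to within the rounding of the four-decimal loss.

```python
import math

# Values copied from the training snapshot above.
loss_nats_per_token = 3.2929
eval_tokens = 365_255
eval_bytes = 1_151_766

# Perplexity is the exponential of the mean per-token loss.
perplexity = math.exp(loss_nats_per_token)

# Bits per byte: total nats over the eval set, converted to bits,
# divided by the number of UTF-8 bytes in the eval text.
total_nats = loss_nats_per_token * eval_tokens
bits_per_byte = total_nats / math.log(2) / eval_bytes

print(f"perplexity ~ {perplexity:.2f}")    # ~26.92
print(f"bits per byte ~ {bits_per_byte:.3f}")  # ~1.507
```

BPB normalizes by bytes rather than tokens, so it stays comparable across models that use different tokenizers, while perplexity does not.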

The model is implemented in custom PyTorch code and ships with a Transformers
remote-code wrapper. Pass `trust_remote_code=True` when loading it through
`AutoModelForCausalLM`, or load the included `hyper.py` directly as shown in
the Inference section below.

## Transformers Loading

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "User01110/TinyLM-5M"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    trust_remote_code=True,
    torch_dtype=DTYPE,
).to(DEVICE)
model.eval()
```

## Inference

Install the dependencies:

```bash
pip install torch transformers huggingface_hub safetensors
```

To bypass the Transformers wrapper, download the repository snapshot and load
`hyper.py` directly:

```python
import importlib.util
from pathlib import Path

import torch
from huggingface_hub import snapshot_download

REPO_ID = "User01110/TinyLM-5M"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32
PROMPT = "The future of AI is"
MAX_NEW_TOKENS = 128
DO_SAMPLE = True
TEMPERATURE = 0.8
TOP_K = 50
TOP_P = 0.95
REPETITION_PENALTY = 1.2
NO_REPEAT_NGRAM_SIZE = 4
SKIP_SPECIAL_TOKENS = True

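# Download the repository files (or reuse the local cache).
repo_dir = Path(snapshot_download(REPO_ID))

# Import hyper.py from the snapshot as a standalone module.
spec = importlib.util.spec_from_file_location("hyper_model", repo_dir / "hyper.py")
assert spec is not None and spec.loader is not None
hyper_model = importlib.util.module_from_spec(spec)
spec.loader.exec_module(hyper_model)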

tokenizer, model, cfg = hyper_model.load_from_hub(REPO_ID, device=DEVICE, dtype=DTYPE)
print(
    hyper_model.generate(
        model,
        tokenizer,
        PROMPT,
        max_new_tokens=MAX_NEW_TOKENS,
        do_sample=DO_SAMPLE,
        temperature=TEMPERATURE,
        top_k=TOP_K,
        top_p=TOP_P,
        repetition_penalty=REPETITION_PENALTY,
        no_repeat_ngram_size=NO_REPEAT_NGRAM_SIZE,
        skip_special_tokens=SKIP_SPECIAL_TOKENS,
    )
)
```