TinyLM-5M
A tiny 5.23M-parameter causal language model trained on the sample-100BT
subset of HuggingFaceFW/fineweb-edu.
Architecture:
- 6 transformer blocks
- RMSNorm
- RoPE
- SwiGLU
- SDPA grouped-query attention
- tied token/lm-head embedding
- one all-to-all depth-attention residual communicator over layer states
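For readers unfamiliar with the building blocks listed above, here is a minimal plain-Python sketch of three of them: RMSNorm, the SwiGLU gate, and the query-to-KV head mapping used by grouped-query attention. It illustrates the standard definitions only and is not the code in hyper.py; the head counts (8 query heads, 2 KV heads) and the `eps` default are hypothetical values chosen for the example, and the depth-attention communicator is not covered.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x, then apply a learned per-channel scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu_gate(gate, up):
    """SwiGLU gating: SiLU(gate) * up, elementwise.

    In a full feed-forward block, `gate` and `up` are two linear projections
    of the same input, and this product is fed to a down-projection.
    """
    return [silu(g) * u for g, u in zip(gate, up)]

def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one key/value head."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

# Hypothetical sizes, for illustration only.
print(rms_norm([3.0, 4.0], [1.0, 1.0]))
print(swiglu_gate([0.0, 2.0], [5.0, 1.0]))
print([kv_head_for(h, 8, 2) for h in range(8)])
```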
Training snapshot:
- steps: 50000
- context length: 512
- batch size: 512
- eval dataset: Salesforce/wikitext/wikitext-103-raw-v1/validation
- eval stride: 256
- eval tokens: 365,255
- eval UTF-8 bytes: 1,151,766
- WikiText validation loss: 3.2929
- WikiText validation perplexity: 26.9218
- WikiText validation BPB: 1.5066
- tokenizer: AxiomicLabs/GPT-S-5M
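The three WikiText metrics are related by simple arithmetic. The sketch below assumes the reported loss is the mean cross-entropy in nats per token (the usual PyTorch convention), in which case perplexity is its exponential and bits per byte (BPB) is the total loss converted to bits and divided by the UTF-8 byte count. The variable names are illustrative.

```python
import math

# Values copied from the training snapshot.
loss_nats_per_token = 3.2929
eval_tokens = 365_255
eval_bytes = 1_151_766

# Perplexity is the exponential of the mean per-token loss.
perplexity = math.exp(loss_nats_per_token)

# Total loss in nats, converted to bits, spread over the UTF-8 bytes.
total_bits = loss_nats_per_token * eval_tokens / math.log(2)
bits_per_byte = total_bits / eval_bytes

print(f"perplexity    = {perplexity:.4f}")
print(f"bits per byte = {bits_per_byte:.4f}")
```

Both results agree with the reported figures up to the rounding of the loss to four decimal places. Because BPB normalizes by bytes rather than tokens, it can be compared across models that use different tokenizers.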
This is custom PyTorch code with a Transformers remote-code wrapper. Use
trust_remote_code=True for AutoModelForCausalLM, or load the included
hyper.py directly as shown below.
Transformers Loading
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
REPO_ID = "User01110/TinyLM-5M"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    trust_remote_code=True,
    torch_dtype=DTYPE,
).to(DEVICE)
model.eval()
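Once `model` and `tokenizer` are loaded this way, text generation might look like the sketch below. It assumes the remote-code wrapper supports the standard Transformers `generate()` method, which this card does not confirm. `generate_text` is a hypothetical helper name, and the sampling values mirror the ones used in the Inference section.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=128):
    """Sample a continuation of `prompt` and return it as a string."""
    # Tokenize and move every input tensor to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {name: tensor.to(model.device) for name, tensor in inputs.items()}
    # Assumption: the wrapper exposes the standard generate() API.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.2,
        no_repeat_ngram_size=4,
    )
    # generate() returns a batch; decode the first (and only) sequence.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Call it as `print(generate_text(model, tokenizer, "The future of AI is"))`. If the wrapper rejects any of these arguments, loading hyper.py directly is the documented fallback.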
Inference
Install:
pip install torch transformers huggingface_hub safetensors
import importlib.util
from pathlib import Path
import torch
from huggingface_hub import snapshot_download
REPO_ID = "User01110/TinyLM-5M"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32
PROMPT = "The future of AI is"
MAX_NEW_TOKENS = 128
DO_SAMPLE = True
TEMPERATURE = 0.8
TOP_K = 50
TOP_P = 0.95
REPETITION_PENALTY = 1.2
NO_REPEAT_NGRAM_SIZE = 4
SKIP_SPECIAL_TOKENS = True
repo_dir = Path(snapshot_download(REPO_ID))
spec = importlib.util.spec_from_file_location("hyper_model", repo_dir / "hyper.py")
assert spec is not None and spec.loader is not None
hyper_model = importlib.util.module_from_spec(spec)
spec.loader.exec_module(hyper_model)
tokenizer, model, cfg = hyper_model.load_from_hub(REPO_ID, device=DEVICE, dtype=DTYPE)
print(
    hyper_model.generate(
        model,
        tokenizer,
        PROMPT,
        max_new_tokens=MAX_NEW_TOKENS,
        do_sample=DO_SAMPLE,
        temperature=TEMPERATURE,
        top_k=TOP_K,
        top_p=TOP_P,
        repetition_penalty=REPETITION_PENALTY,
        no_repeat_ngram_size=NO_REPEAT_NGRAM_SIZE,
        skip_special_tokens=SKIP_SPECIAL_TOKENS,
    )
)
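To give a feel for what TEMPERATURE, TOP_K and TOP_P do, here is a small plain-Python sketch of one common way these filters are combined: scale the logits by the temperature, keep the top-k tokens, then keep the smallest prefix of those whose cumulative probability reaches top-p. It illustrates the general technique and is not taken from hyper.py, whose exact ordering and tie-breaking may differ; `filter_probs` is a hypothetical name.

```python
import math

def filter_probs(logits, temperature=0.8, top_k=50, top_p=0.95):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens, then renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_total = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i] / k_total
        if mass >= top_p:
            break

    # Renormalize the survivors so they sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}

# Toy 4-token vocabulary, for illustration only.
print(filter_probs([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=2, top_p=1.0))
```

The next token is then drawn from the surviving probabilities, for example with `random.choices`. Lowering any of the three values makes the output more conservative.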