---
license: apache-2.0
library_name: pytorch
tags:
- language-model
- causal-lm
- tiny
- custom-code
- fineweb-edu
- depth-communication
---

# TinyLM-5M

Tiny 5.23M-parameter causal language model trained on `HuggingFaceFW/fineweb-edu` / `sample-100BT`.

Architecture:

- 6 transformer blocks
- RMSNorm
- RoPE
- SwiGLU
- SDPA grouped-query attention
- tied token/lm-head embedding
- one all-to-all depth-attention residual communicator over layer states

Training snapshot:

- steps: `50000`
- context length: `512`
- batch size: `512`
- eval dataset: `Salesforce/wikitext` / `wikitext-103-raw-v1` / `validation`
- eval stride: `256`
- eval tokens: `365,255`
- eval UTF-8 bytes: `1,151,766`
- WikiText validation loss: `3.2929`
- WikiText validation perplexity: `26.9218`
- WikiText validation BPB: `1.5066`
- tokenizer: `AxiomicLabs/GPT-S-5M`

This is custom PyTorch code with a Transformers remote-code wrapper. Use `trust_remote_code=True` for `AutoModelForCausalLM`, or load the included `hyper.py` directly as shown below.
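The three WikiText figures in the training snapshot are related to one another. The sketch below recomputes perplexity and bits per byte (BPB) from the reported loss, assuming that the loss is the mean cross-entropy per token in nats and that BPB is the total number of bits divided by the UTF-8 byte count. The variable names are illustrative and do not come from `hyper.py`.

```python
import math

# Values copied from the training snapshot.
val_loss = 3.2929        # assumed: mean cross-entropy per token, in nats
eval_tokens = 365_255    # number of evaluated tokens
eval_bytes = 1_151_766   # UTF-8 bytes of the evaluated text

# Perplexity is the exponential of the mean per-token loss.
perplexity = math.exp(val_loss)

# Convert nats to bits, sum over all tokens, then normalise by bytes.
total_bits = val_loss * eval_tokens / math.log(2)
bpb = total_bits / eval_bytes

print(f"perplexity: {perplexity:.3f}")
print(f"bits per byte: {bpb:.4f}")
```

Under these assumptions the results agree with the reported perplexity and BPB to within the rounding of the four-decimal loss. BPB is the more useful figure for comparing against models that use a different tokenizer, because it does not depend on how many tokens the text is split into.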
## Transformers Loading

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "User01110/TinyLM-5M"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    trust_remote_code=True,
    torch_dtype=DTYPE,
).to(DEVICE)
model.eval()
```

## Inference

Install:

```bash
pip install torch transformers huggingface_hub safetensors
```

```python
import importlib.util
from pathlib import Path

import torch
from huggingface_hub import snapshot_download

REPO_ID = "User01110/TinyLM-5M"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

PROMPT = "The future of AI is"
MAX_NEW_TOKENS = 128
DO_SAMPLE = True
TEMPERATURE = 0.8
TOP_K = 50
TOP_P = 0.95
REPETITION_PENALTY = 1.2
NO_REPEAT_NGRAM_SIZE = 4
SKIP_SPECIAL_TOKENS = True

repo_dir = Path(snapshot_download(REPO_ID))
spec = importlib.util.spec_from_file_location("hyper_model", repo_dir / "hyper.py")
hyper_model = importlib.util.module_from_spec(spec)
assert spec.loader is not None
spec.loader.exec_module(hyper_model)

tokenizer, model, cfg = hyper_model.load_from_hub(REPO_ID, device=DEVICE, dtype=DTYPE)

print(
    hyper_model.generate(
        model,
        tokenizer,
        PROMPT,
        max_new_tokens=MAX_NEW_TOKENS,
        do_sample=DO_SAMPLE,
        temperature=TEMPERATURE,
        top_k=TOP_K,
        top_p=TOP_P,
        repetition_penalty=REPETITION_PENALTY,
        no_repeat_ngram_size=NO_REPEAT_NGRAM_SIZE,
        skip_special_tokens=SKIP_SPECIAL_TOKENS,
    )
)
```
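For readers unfamiliar with the sampling settings above, the following is a minimal pure-Python sketch of how `temperature`, `top_k`, and `top_p` are commonly combined to narrow a next-token distribution. It is not the code in `hyper.py`; the function `filter_probs` is a hypothetical helper, and the order of the filtering steps is an assumption that real implementations may not share.

```python
import math


def filter_probs(logits, temperature=0.8, top_k=50, top_p=0.95):
    """Hypothetical helper: return {token_id: prob} after filtering."""
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax over the scaled logits.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable token ids, then renormalise.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens so they sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


# Toy example with a five-token vocabulary.
dist = filter_probs([2.0, 1.0, 0.5, -1.0, -3.0], temperature=0.8, top_k=3, top_p=0.95)
greedy_like = filter_probs([2.0, 1.0, 0.5, -1.0, -3.0], top_k=3, top_p=0.5)
```

A token would then be drawn from the returned distribution, for example with `random.choices(list(dist), weights=list(dist.values()))`. Lowering `top_p` or `top_k` trades diversity for coherence, which tends to matter for a model this small.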