# TinyLLM (custom)

This repo uses a custom TinyLLM architecture, so loading it requires `trust_remote_code=True` (the model code ships with the checkpoint). Load with:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "REPO_ID",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("REPO_ID", trust_remote_code=True)
```

Quick test:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "REPO_ID"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt")
inputs.pop("attention_mask", None)  # tinyllm uses causal attention internally

out = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=1.0,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
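On a machine with a GPU, generation is noticeably faster when the model and inputs live on the same device. This is a minimal sketch continuing from the quick test above (the `device` selection logic is an assumption about your setup, not something this repo provides):

```python
import torch

# Assumption: pick CUDA if available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

inputs = tokenizer("Once upon a time,", return_tensors="pt")
inputs.pop("attention_mask", None)  # as above: tinyllm handles causal masking internally
inputs = {k: v.to(device) for k, v in inputs.items()}

out = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```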
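For interactive use you can stream tokens to stdout as they are sampled, using the stock `TextStreamer` from `transformers`. Whether TinyLLM's custom code path supports streaming has not been verified here, so treat this as a sketch rather than a documented feature of this repo:

```python
from transformers import TextStreamer

# Prints decoded text incrementally; skip_prompt hides the input prompt itself.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    streamer=streamer,
    pad_token_id=tokenizer.eos_token_id,
)
```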