---
language:
- en
license: apache-2.0
tags:
- gpt2
- causal-lm
- text-generation
- from-scratch
- fineweb
- undertrained
library_name: transformers
pipeline_tag: text-generation
---

# Llara

## Introduction

Llara1.1 is a 124M-parameter autoregressive language model (33M more parameters than Llara1.0) trained from scratch on English web text. It follows the GPT-2 Small architecture and is trained entirely from random initialisation: no pretrained weights, no distillation, no fine-tuning of an existing model. It does, however, use GPT-2's tokenizer (more or less).

The name **Llara** is original and unrelated to LLaMA or LoRA.

**Note**: The model is still undertrained according to the Chinchilla scaling laws (2022).

---

## Improvements

* Increased context length to 512 tokens
* Better and cleaner training data
* Able to form coherent sentences even when limited to 20 generated tokens
* Better GPT configuration

---

## Model Details

| Property | Value |
|---|---|
| Architecture | GPT-2 (decoder-only transformer) |
| Parameters | ~124.0M |
| Context length | 512 tokens |
| Embedding dim | - |
| Layers | 12 |
| Attention heads | 12 |
| Vocabulary | 50,257 (GPT-2 BPE) |
| Training data | FineWeb (HuggingFaceFW/fineweb) + Custom dataset |
| Training tokens | 131M |
| Epochs | 1.1 |
| Precision | fp16 |

---

## Usage

```python
from transformers import GPT2LMHeadModel, AutoTokenizer, pipeline

# Load the model weights and tokenizer from the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained("helloadhavan/llara1.1-100M-base")
tokenizer = AutoTokenizer.from_pretrained("helloadhavan/llara1.1-100M-base")

# Wrap both in a text-generation pipeline
gen = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Sample a short continuation of the prompt
output = gen(
    "Once upon a time",
    max_new_tokens=20,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(output[0]["generated_text"])
```

---

## Limitations

- Llara is trained on English web text only and performs poorly on other languages.
- Like all autoregressive LMs trained on web data, it may reproduce biases, factual errors, or inappropriate content present in the training corpus.
- It is a research model trained from scratch and is not instruction-tuned or aligned. It should not be used in production or user-facing applications without further fine-tuning and safety work.
- At 124M parameters and 2M training documents, it is small by current standards and far less trained than GPT-2 (which saw 40GB of text). Outputs may be incoherent on complex prompts.

---

## Intended Use

Llara is intended for:

- Research and experimentation with small language models
- Learning how GPT-style models are trained from scratch
- A base for fine-tuning on downstream tasks

---

## Training Framework

Trained using [Hugging Face Transformers](https://github.com/huggingface/transformers) `Trainer` on a single GPU.

---

## License

Apache 2.0

Note: I am an AI hobbyist, not an AI engineer.
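## Parameter and Token Budget Sketch

The parameter count and the "undertrained" note above can be sanity-checked with a little arithmetic. The sketch below is not the training code; it only recomputes the numbers from the Model Details table. Three things in it are assumptions rather than facts from this card: the embedding dimension of 768 (the GPT-2 Small value; the table leaves it blank), the idea that tokens seen equals training tokens times epochs, and the rule of thumb of roughly 20 training tokens per parameter that is commonly quoted from the Chinchilla paper. The function name `gpt2_param_count` is made up for this example.

```python
# Values taken from the Model Details table
VOCAB_SIZE = 50_257
CONTEXT_LENGTH = 512
N_LAYERS = 12
TRAINING_TOKENS = 131_000_000
EPOCHS = 1.1

# ASSUMPTION: the card does not state the embedding dim; 768 is GPT-2 Small's
N_EMBD = 768
# ASSUMPTION: ~20 tokens per parameter, a common reading of Chinchilla (2022)
TOKENS_PER_PARAM_TARGET = 20


def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Count parameters of a GPT-2 style decoder whose LM head is tied
    to the token embedding (so the head adds no extra weights)."""
    token_emb = vocab_size * n_embd
    pos_emb = n_positions * n_embd
    # Attention: fused Q/K/V projection plus output projection, with biases
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4x the width, then project back, with biases
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Two LayerNorms per block, each with a gain and a bias vector
    norms = 2 * 2 * n_embd
    final_norm = 2 * n_embd
    return token_emb + pos_emb + n_layer * (attn + mlp + norms) + final_norm


params = gpt2_param_count(VOCAB_SIZE, CONTEXT_LENGTH, N_EMBD, N_LAYERS)

# ASSUMPTION: tokens seen during training = dataset tokens * epochs
tokens_seen = round(TRAINING_TOKENS * EPOCHS)
tokens_per_param = tokens_seen / params
chinchilla_budget = TOKENS_PER_PARAM_TARGET * params

print(f"parameters:        {params / 1e6:.1f}M")
print(f"tokens seen:       {tokens_seen / 1e6:.1f}M")
print(f"tokens per param:  {tokens_per_param:.2f}")
print(f"share of ~20x budget: {tokens_seen / chinchilla_budget:.1%}")
```

Under these assumptions the count comes out at about 124.0M parameters, matching the table, and the model has seen a little over one token per parameter, well short of the roughly 20 suggested by the rule of thumb.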