---
language:
- en
license: apache-2.0
tags:
- gpt2
- causal-lm
- text-generation
- from-scratch
- fineweb
- undertrained
library_name: transformers
pipeline_tag: text-generation
---
# Llara
## Introduction
Llara1.1 is a 124M-parameter autoregressive language model (33M more parameters than Llara1.0) trained from scratch on English web text. It follows the GPT-2 Small architecture and is trained entirely from random initialisation: no pretrained weights, no distillation, and no fine-tuning of an existing model. It does, however, use the GPT-2 tokenizer (more or less).

The name **Llara** is original and unrelated to LLaMA or LoRA.

**Note**: The model is still undertrained according to the Chinchilla scaling laws (2022).
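
As a rough check on the note above, the sketch below compares the tokens this model saw during training with the commonly quoted Chinchilla rule of thumb of about 20 training tokens per parameter. The parameter count, dataset size, and epoch count come from the Model Details table in this card; the 20:1 ratio is an approximation rather than an exact law, and the variable names are purely illustrative.

```python
# Figures taken from this model card
n_params = 124e6         # ~124.0M parameters
dataset_tokens = 131e6   # 131M tokens in the training set
epochs = 1.1

# Assumption: the ~20 tokens-per-parameter rule of thumb often quoted
# from the Chinchilla paper (an approximation, not an exact law).
CHINCHILLA_TOKENS_PER_PARAM = 20

tokens_seen = dataset_tokens * epochs
compute_optimal_tokens = n_params * CHINCHILLA_TOKENS_PER_PARAM
tokens_per_param = tokens_seen / n_params

print(f"tokens seen:           {tokens_seen / 1e6:.0f}M")
print(f"Chinchilla-style goal: {compute_optimal_tokens / 1e9:.2f}B")
print(f"tokens per parameter:  {tokens_per_param:.2f}")
print(f"fraction of goal:      {tokens_seen / compute_optimal_tokens:.1%}")
```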

---
## Improvements
* Increased context length to 512 tokens
* Better and cleaner training data
* Able to form coherent sentences even when generation is capped at 20 tokens
* Better GPT config
---
## Model Details
| Property | Value |
|---|---|
| Architecture | GPT-2 (decoder-only transformer) |
| Parameters | ~124.0M |
| Context length | 512 tokens |
| Embedding dim | - |
| Layers | 12 |
| Attention heads | 12 |
| Vocabulary | 50,257 (GPT-2 BPE) |
| Training data | FineWeb (HuggingFaceFW/fineweb) + Custom dataset |
| Training tokens | 131M |
| Epochs | 1.1 |
| Precision | fp16 |
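
The embedding dimension is left blank in the table. As a sanity check, the sketch below counts the parameters of a GPT-2 style model from its configuration, assuming the standard GPT-2 Small width of 768 (an assumption, since the card does not state it), a 4x MLP expansion, and input and output embeddings that share weights. The function name `gpt2_param_count` is hypothetical. Under these assumptions the total comes out at roughly 124.0M, which is consistent with the table; the number of attention heads does not change the count.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Count parameters of a GPT-2 style decoder with tied embeddings."""
    d = n_embd
    token_emb = vocab_size * d   # also reused as the output projection
    pos_emb = n_positions * d    # learned position embeddings
    # Attention: fused QKV projection plus output projection (weights + biases)
    attn = (d * 3 * d + 3 * d) + (d * d + d)
    # MLP: up-projection to 4*d and down-projection back to d (weights + biases)
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    layer_norms = 2 * (2 * d)    # two LayerNorms per block (scale + bias each)
    block = attn + mlp + layer_norms
    final_ln = 2 * d             # LayerNorm after the last block
    return token_emb + pos_emb + n_layer * block + final_ln


# n_embd=768 is an assumption (GPT-2 Small default); other values are from the table
total = gpt2_param_count(vocab_size=50257, n_positions=512, n_embd=768, n_layer=12)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# 124,046,592 parameters (~124.0M)
```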
---
## Usage
```python
from transformers import AutoTokenizer, GPT2LMHeadModel, pipeline

# Load the model weights and tokenizer from the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained("helloadhavan/llara1.1-100M-base")
tokenizer = AutoTokenizer.from_pretrained("helloadhavan/llara1.1-100M-base")

gen = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Sample a short continuation of the prompt
output = gen(
    "Once upon a time",
    max_new_tokens=20,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(output[0]["generated_text"])
```
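
For readers unfamiliar with the sampling settings used above, the following is a simplified, dependency-free sketch of what `temperature` and `top_p` do to a next-token distribution. It is an illustration only, not the Transformers implementation; the function name `top_p_filter` and the toy logits are made up for the example, and `repetition_penalty` is not covered.

```python
import math


def top_p_filter(logits, temperature=0.8, top_p=0.95):
    """Return renormalised probabilities after temperature scaling and top-p filtering."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Keep the most probable tokens until their cumulative mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Zero out every other token and renormalise the survivors
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


# Toy logits for a 4-token vocabulary (hypothetical values)
filtered = top_p_filter([2.0, 1.0, 0.0, -3.0], temperature=1.0, top_p=0.8)
print([round(p, 3) for p in filtered])
# [0.731, 0.269, 0.0, 0.0]
```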
---
## Limitations
- Llara is trained on English web text only and performs poorly on other languages.
- Like all autoregressive LMs trained on web data, it may reproduce biases, factual errors, or inappropriate content present in the training corpus.
- It is a research model trained from scratch and is not instruction-tuned or aligned — it should not be used in production or user-facing applications without further fine-tuning and safety work.
- At 124M parameters and 2M training documents, it is significantly smaller and less trained than models like GPT-2 (which saw 40GB of text). Outputs may be incoherent on complex prompts.
---
## Intended Use
Llara is intended for:
- Research and experimentation with small language models
- Learning how GPT-style models are trained from scratch
- A base for fine-tuning on downstream tasks
---
## Training Framework
Trained using [Hugging Face Transformers](https://github.com/huggingface/transformers) `Trainer` on a single GPU.

---
## License
Apache 2.0

Note: I am an AI hobbyist, not an AI engineer.