---
language:
- en
license: apache-2.0
tags:
- gpt2
- causal-lm
- text-generation
- from-scratch
- fineweb
- undertrained
library_name: transformers
pipeline_tag: text-generation
---

# Llara

| <img src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPHN2ZyB2ZXJzaW9uPSIxLjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgd2lkdGg9IjUwMCIgaGVpZ2h0PSIyMDAiIHN0eWxlPSJiYWNrZ3JvdW5kLWNvbG9yOiAjRkZGRkZGOyI+CiAgPGRlZnM+CiAgICA8c3R5bGUgdHlwZT0idGV4dC9jc3MiPgogICAgICBAaW1wb3J0IHVybCgnaHR0cHM6Ly9mb250cy5nb29nbGVhcGlzLmNvbS9jc3MyP2ZhbWlseT1JQk0rUGxleCtTYW5zOml0YWwsd2dodEAwLDEwMC4uNzAwOzEsMTAwLi43MDAnKTsKICAgICAgCiAgICAgIC5jdXN0b20tdGV4dCB7CiAgICAgICAgZm9udC1mYW1pbHk6ICdJQk0gUGxleCBTYW5zJywnUm9ib3RvJywgc2Fucy1zZXJpZjsKICAgICAgICBmb250LXNpemU6IDcwcHg7CiAgICAgICAgZmlsbDogIzAwMDAwMDsKICAgICAgICBmb250LXdlaWdodDogNjAwOyAgCiAgICAgIH0KICAgIDwvc3R5bGU+CiAgPC9kZWZzPgo8cGF0aCBkPSJNMCAwIEM2NiAwIDEzMiAwIDIwMCAwIEMyMDAgNjYgMjAwIDEzMiAyMDAgMjAwIEMxMzQgMjAwIDY4IDIwMCAwIDIwMCBDMCAxMzQgMCA2OCAwIDAgWiAiIGZpbGw9IiNGQUZBRkEiIHRyYW5zZm9ybT0idHJhbnNsYXRlKDAsMCkiLz4KPHBhdGggZD0iTTAgMCBDMzkuMjcgMCA3OC41NCAwIDExOSAwIEMxMTkgMzkuMjcgMTE5IDc4LjU0IDExOSAxMTkgQzEwNi4xMyAxMTkgOTMuMjYgMTE5IDgwIDExOSBDODAgOTIuOTMgODAgNjYuODYgODAgNDAgQzUzLjYgNDAgMjcuMiA0MCAwIDQwIEMwIDI2LjggMCAxMy42IDAgMCBaICIgZmlsbD0iIzAxMDEwMSIgdHJhbnNmb3JtPSJ0cmFuc2xhdGUoNDAsNDApIi8+CjxwYXRoIGQ9Ik0wIDAgQzEzLjIgMCAyNi40IDAgNDAgMCBDNDAgMTIuODcgNDAgMjUuNzQgNDAgMzkgQzI2LjggMzkgMTMuNiAzOSAwIDM5IEMwIDI2LjEzIDAgMTMuMjYgMCAwIFogIiBmaWxsPSIjMDIwMjAyIiB0cmFuc2Zvcm09InRyYW5zbGF0ZSg0MCwxMjApIi8+CiAgPHRleHQgeD0iMjAwIiB5PSIxMzUiIGNsYXNzPSJjdXN0b20tdGV4dCI+TGxhcmExLjA8L3RleHQ+Cjwvc3ZnPgo="> | |

Llara is a 91.4M-parameter autoregressive language model trained from scratch on English web text. It follows the GPT-2 Small architecture and is trained entirely from random initialisation: no pretrained weights, no distillation, and no fine-tuning of an existing model. It does, however, reuse the GPT-2 BPE tokeniser.

The name **Llara** is original and unrelated to LLaMA or LoRA.

**Note**: The model is undertrained by the standard of the Chinchilla scaling laws (2022).
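
To put the note above in numbers: a common reading of the Chinchilla result is that compute-optimal training uses roughly 20 tokens per parameter. The sketch below applies that rule of thumb to the figures on this card (91.4M parameters, 256,000,000 training tokens). The 20:1 ratio is an approximation, and `chinchilla_gap` is a made-up helper name used only for illustration.

```python
# Rough Chinchilla-style check. The ~20 tokens-per-parameter ratio is a
# rule of thumb (an assumption here), not an exact law.
CHINCHILLA_TOKENS_PER_PARAM = 20


def chinchilla_gap(n_params: float, n_tokens: float) -> dict:
    """Compare an actual token budget with the rule-of-thumb optimum."""
    optimal_tokens = CHINCHILLA_TOKENS_PER_PARAM * n_params
    return {
        "tokens_per_param": n_tokens / n_params,
        "optimal_tokens": optimal_tokens,
        "fraction_of_optimal": n_tokens / optimal_tokens,
    }


# Figures taken from this model card.
gap = chinchilla_gap(n_params=91.4e6, n_tokens=256e6)
print(f"tokens per parameter: {gap['tokens_per_param']:.1f}")
print(f"rule-of-thumb optimum: {gap['optimal_tokens'] / 1e9:.2f}B tokens")
print(f"fraction of optimum seen: {gap['fraction_of_optimal']:.0%}")
```

Under these assumptions the model saw about 2.8 tokens per parameter, roughly 14% of the rule-of-thumb budget of about 1.83B tokens.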

---

## Model Details

| Property | Value |
|---|---|
| Architecture | GPT-2 (decoder-only transformer) |
| Parameters | ~90-100M |
| Context length | 256 tokens |
| Embedding dim | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Vocabulary | 50,257 (GPT-2 BPE) |
| Training data | FineWeb (HuggingFaceFW/fineweb) + custom dataset |
| Training tokens | 256,000,000 |
| Epochs | 1 |
| Precision | fp16 |

---

## Training

Llara was trained on 1 million documents sampled from [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb), a large-scale curated English web dataset. Documents were tokenised with the GPT-2 BPE tokeniser and packed into non-overlapping 1024-token blocks.
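
The packing step described above can be sketched roughly as follows. This is a minimal illustration, not the actual training script: the function name `pack_into_blocks`, the use of the GPT-2 end-of-text id (50256) as a document separator, and the choice to drop the incomplete final block are all assumptions.

```python
from typing import Iterable, List


def pack_into_blocks(
    docs: Iterable[List[int]],
    block_size: int = 1024,
    eos_id: int = 50256,
) -> List[List[int]]:
    """Concatenate tokenised documents and cut the stream into fixed blocks."""
    stream: List[int] = []
    for ids in docs:
        stream.extend(ids)
        stream.append(eos_id)  # assumed document separator
    n_full = len(stream) // block_size
    # Non-overlapping blocks; the incomplete tail is dropped (an assumption).
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


# Toy example with made-up token ids and a tiny block size.
blocks = pack_into_blocks([[1, 2, 3], [4, 5]], block_size=3, eos_id=0)
print(blocks)  # [[1, 2, 3], [0, 4, 5]]
```

Packing this way means every training sequence is full length, so no compute is spent on padding tokens.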

**Training configuration:**

| Hyperparameter | Value |
|---|---|
| Optimiser | AdamW |
| Learning rate | 3e-4 |
| LR schedule | Cosine decay |
| Warmup steps | 2,000 |
| Weight decay | 0.1 |
| Effective batch size | 32 |
| Gradient accumulation | 8 steps |
| Dropout | 0.1 (residual, embedding, attention) |

Gradient checkpointing was enabled throughout training to reduce memory usage.
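
For readers unfamiliar with the schedule in the table, here is a minimal sketch of a cosine-decay learning rate with warmup, using the peak rate (3e-4) and warmup length (2,000 steps) listed above. It assumes linear warmup and decay all the way to zero, neither of which the card states, and `TOTAL_STEPS` is a hypothetical value because the total step count is not reported.

```python
import math


def lr_at_step(
    step: int,
    total_steps: int,
    peak_lr: float = 3e-4,
    warmup_steps: int = 2000,
) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# TOTAL_STEPS is hypothetical; the card does not report it.
TOTAL_STEPS = 10_000
for s in (0, 1000, 2000, 6000, 10_000):
    print(s, f"{lr_at_step(s, TOTAL_STEPS):.2e}")
```

If "effective batch size" means per-device batch size times accumulation steps, the table also implies a per-device batch of 32 / 8 = 4 sequences on the single GPU.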

---

## Usage

```python
from transformers import GPT2LMHeadModel, AutoTokenizer, pipeline

# Load the weights and the GPT-2 BPE tokenizer from the Hub.
model = GPT2LMHeadModel.from_pretrained("helloadhavan/llara1.0-100M-base")
tokenizer = AutoTokenizer.from_pretrained("helloadhavan/llara1.0-100M-base")

# Wrap both in a text-generation pipeline.
gen = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Sample a continuation of the prompt.
output = gen(
    "The history of artificial intelligence",
    max_new_tokens=200,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(output[0]["generated_text"])
```
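
Because the Model Details table lists a 256-token context, a request for 200 new tokens leaves about 56 tokens for the prompt. The helper below is a small sketch of that budget check. It assumes "context length" is the hard limit on prompt plus generated tokens, and `fit_prompt` is a hypothetical name, not part of Transformers.

```python
from typing import List

CONTEXT_LENGTH = 256  # from the Model Details table


def fit_prompt(
    prompt_ids: List[int],
    max_new_tokens: int,
    context_length: int = CONTEXT_LENGTH,
) -> List[int]:
    """Keep only the most recent prompt tokens that leave room to generate."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return prompt_ids[-budget:]


# Made-up token ids standing in for a tokenised prompt.
ids = list(range(100))
trimmed = fit_prompt(ids, max_new_tokens=200)
print(len(trimmed))  # 56
```

In practice the ids would come from the tokenizer, for example `tokenizer(prompt)["input_ids"]`, and shorter prompts pass through unchanged.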

---

## Limitations

- Llara is trained on English web text only and performs poorly on other languages.
- Like all autoregressive LMs trained on web data, it may reproduce biases, factual errors, or inappropriate content present in the training corpus.
- It is a research model trained from scratch and is not instruction-tuned or aligned. It should not be used in production or user-facing applications without further fine-tuning and safety work.
- At roughly 91M parameters and about 256M training tokens, it is significantly smaller and less trained than models like GPT-2 (which saw 40GB of text). Outputs may be incoherent on complex prompts.

---

## Intended Use

Llara is intended for:

- Research and experimentation with small language models
- Learning how GPT-style models are trained from scratch
- A base for fine-tuning on downstream tasks

---

## Training Framework

Trained using [Hugging Face Transformers](https://github.com/huggingface/transformers) `Trainer` on a single GPU.

---

## License

Apache 2.0

| <div> | |
<blockquote><strong>Note:</strong> I am an AI hobbyist, not an AI engineer.</blockquote>
| </div> |