
Libraries

How to use helloadhavan/llara1.0-100M-base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="helloadhavan/llara1.0-100M-base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("helloadhavan/llara1.0-100M-base")
model = AutoModelForCausalLM.from_pretrained("helloadhavan/llara1.0-100M-base")

Local Apps

vLLM

How to use helloadhavan/llara1.0-100M-base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "helloadhavan/llara1.0-100M-base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "helloadhavan/llara1.0-100M-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
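
The same request can also be sketched in Python with only the standard library. The helper names `build_request` and `complete` are illustrative and not part of vLLM; the URL and model ID are copied from the curl example, and the default `max_tokens` of 128 is an assumption chosen to stay under the 256-token context length listed in the model card.

```python
import json
import urllib.request

# Values copied from the curl example above; adjust to match your server.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL_ID = "helloadhavan/llara1.0-100M-base"


def build_request(prompt, max_tokens=128, temperature=0.5, url=BASE_URL):
    """Build (but do not send) a POST request for the completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,  # assumed default, below the 256-token context
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Once upon a time,"))` should print the continuation. Passing `url="http://localhost:30000/v1/completions"` targets a server on port 30000 instead.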

Use Docker

docker model run hf.co/helloadhavan/llara1.0-100M-base

SGLang

How to use helloadhavan/llara1.0-100M-base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "helloadhavan/llara1.0-100M-base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "helloadhavan/llara1.0-100M-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "helloadhavan/llara1.0-100M-base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "helloadhavan/llara1.0-100M-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use helloadhavan/llara1.0-100M-base with Docker Model Runner:
```
docker model run hf.co/helloadhavan/llara1.0-100M-base
```

	---
	language:
	- en
	license: apache-2.0
	tags:
	- gpt2
	- causal-lm
	- text-generation
	- from-scratch
	- fineweb
	- undertrained
	library_name: transformers
	pipeline_tag: text-generation
	---

	# Llara

	<img src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPHN2ZyB2ZXJzaW9uPSIxLjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgd2lkdGg9IjUwMCIgaGVpZ2h0PSIyMDAiIHN0eWxlPSJiYWNrZ3JvdW5kLWNvbG9yOiAjRkZGRkZGOyI+CiAgPGRlZnM+CiAgICA8c3R5bGUgdHlwZT0idGV4dC9jc3MiPgogICAgICBAaW1wb3J0IHVybCgnaHR0cHM6Ly9mb250cy5nb29nbGVhcGlzLmNvbS9jc3MyP2ZhbWlseT1JQk0rUGxleCtTYW5zOml0YWwsd2dodEAwLDEwMC4uNzAwOzEsMTAwLi43MDAnKTsKICAgICAgCiAgICAgIC5jdXN0b20tdGV4dCB7CiAgICAgICAgZm9udC1mYW1pbHk6ICdJQk0gUGxleCBTYW5zJywnUm9ib3RvJywgc2Fucy1zZXJpZjsKICAgICAgICBmb250LXNpemU6IDcwcHg7CiAgICAgICAgZmlsbDogIzAwMDAwMDsKICAgICAgICBmb250LXdlaWdodDogNjAwOyAgCiAgICAgIH0KICAgIDwvc3R5bGU+CiAgPC9kZWZzPgo8cGF0aCBkPSJNMCAwIEM2NiAwIDEzMiAwIDIwMCAwIEMyMDAgNjYgMjAwIDEzMiAyMDAgMjAwIEMxMzQgMjAwIDY4IDIwMCAwIDIwMCBDMCAxMzQgMCA2OCAwIDAgWiAiIGZpbGw9IiNGQUZBRkEiIHRyYW5zZm9ybT0idHJhbnNsYXRlKDAsMCkiLz4KPHBhdGggZD0iTTAgMCBDMzkuMjcgMCA3OC41NCAwIDExOSAwIEMxMTkgMzkuMjcgMTE5IDc4LjU0IDExOSAxMTkgQzEwNi4xMyAxMTkgOTMuMjYgMTE5IDgwIDExOSBDODAgOTIuOTMgODAgNjYuODYgODAgNDAgQzUzLjYgNDAgMjcuMiA0MCAwIDQwIEMwIDI2LjggMCAxMy42IDAgMCBaICIgZmlsbD0iIzAxMDEwMSIgdHJhbnNmb3JtPSJ0cmFuc2xhdGUoNDAsNDApIi8+CjxwYXRoIGQ9Ik0wIDAgQzEzLjIgMCAyNi40IDAgNDAgMCBDNDAgMTIuODcgNDAgMjUuNzQgNDAgMzkgQzI2LjggMzkgMTMuNiAzOSAwIDM5IEMwIDI2LjEzIDAgMTMuMjYgMCAwIFogIiBmaWxsPSIjMDIwMjAyIiB0cmFuc2Zvcm09InRyYW5zbGF0ZSg0MCwxMjApIi8+CiAgPHRleHQgeD0iMjAwIiB5PSIxMzUiIGNsYXNzPSJjdXN0b20tdGV4dCI+TGxhcmExLjA8L3RleHQ+Cjwvc3ZnPgo=">


	Llara is a 91.4M-parameter autoregressive language model trained from scratch on English web text. It follows the GPT-2 Small architecture and was trained entirely from random initialisation: no pretrained weights, no distillation, and no fine-tuning of an existing model. The only reused component is the GPT-2 BPE tokeniser.

	The name Llara is original and unrelated to LLaMA or LoRA.

	Note: The model is undertrained relative to the Chinchilla scaling laws (Hoffmann et al., 2022).
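
To put a rough number on the note above, the sketch below compares the token budget quoted in this card with the commonly cited Chinchilla rule of thumb of about 20 training tokens per parameter. The factor of 20 is an approximation drawn from Hoffmann et al. (2022), not an exact law, and the variable names are illustrative.

```python
# Figures quoted in this model card.
PARAMS = 91.4e6        # stated parameter count
TRAIN_TOKENS = 256e6   # stated number of training tokens

# Assumption: the widely used ~20 tokens-per-parameter heuristic.
TOKENS_PER_PARAM = 20

optimal_tokens = PARAMS * TOKENS_PER_PARAM   # compute-optimal token budget
actual_ratio = TRAIN_TOKENS / PARAMS         # tokens actually seen per parameter
fraction = TRAIN_TOKENS / optimal_tokens     # share of the heuristic budget

print(f"Heuristic budget: {optimal_tokens / 1e9:.2f}B tokens")
print(f"Actual ratio:     {actual_ratio:.1f} tokens per parameter")
print(f"Budget covered:   {fraction:.0%}")
```

Under these assumptions the model saw roughly 2.8 tokens per parameter, about 14% of the heuristic budget of around 1.8B tokens.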

	---

	## Model Details

	| Property | Value |
	|---|---|
	| Architecture | GPT-2 (decoder-only transformer) |
	| Parameters | ~90-100M |
	| Context length | 256 tokens |
	| Embedding dim | 768 |
	| Layers | 12 |
	| Attention heads | 12 |
	| Vocabulary | 50,257 (GPT-2 BPE) |
	| Training data | FineWeb (HuggingFaceFW/fineweb) + Custom dataset |
	| Training tokens | 256,000,000 |
	| Epochs | 1 |
	| Precision | fp16 |
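
As a sanity check on the table above, the following sketch counts parameters for a standard GPT-2 layout (learned position embeddings, biases on every linear layer, output head tied to the token embedding). The function name is illustrative, and the layout is an assumption: the released checkpoint may differ from it, so counting `model.parameters()` on the loaded model remains the authoritative check.

```python
def gpt2_param_count(vocab_size, n_ctx, d_model, n_layer):
    """Parameter count for a standard GPT-2 layout with tied embeddings."""
    # Token embedding plus learned position embedding.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Per block: attention (4*d^2 + 4*d), MLP (8*d^2 + 5*d), two LayerNorms (4*d).
    per_layer = 12 * d_model**2 + 13 * d_model
    # Final LayerNorm (weight and bias).
    final_ln = 2 * d_model
    return embeddings + n_layer * per_layer + final_ln


# Values taken from the Model Details table.
total = gpt2_param_count(vocab_size=50257, n_ctx=256, d_model=768, n_layer=12)
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
```

With the table's values this standard layout comes to about 123.8M parameters, which is higher than the figures quoted in this card, so the actual configuration may differ in some respect not listed here.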

	---

	## Training

	Llara was trained on 1 million documents sampled from [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb), a large-scale curated English web dataset. Documents were tokenised with the GPT-2 BPE tokeniser and packed into non-overlapping 1024-token blocks.
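
The packing step described above can be sketched in plain Python. This is a minimal illustration rather than the author's preprocessing code: `pack_into_blocks` is a hypothetical helper, documents are concatenated without a separator token, and the trailing remainder is dropped, none of which the card specifies.

```python
from itertools import chain


def pack_into_blocks(tokenized_docs, block_size=1024):
    """Concatenate token-id lists and split them into non-overlapping blocks.

    Assumptions: no separator token is inserted between documents, and a
    trailing remainder shorter than block_size is discarded.
    """
    flat = list(chain.from_iterable(tokenized_docs))
    usable = len(flat) - len(flat) % block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


# Toy example with a tiny block size so the result is easy to read.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = pack_into_blocks(docs, block_size=4)
print(blocks)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Packing avoids padding, so every position in a batch contributes to the loss, at the cost of blocks that can span document boundaries.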

	Training configuration:

	| Hyperparameter | Value |
	|---|---|
	| Optimiser | AdamW |
	| Learning rate | 3e-4 |
	| LR schedule | Cosine decay |
	| Warmup steps | 2,000 |
	| Weight decay | 0.1 |
	| Effective batch size | 32 |
	| Gradient accumulation | 8 steps |
	| Dropout | 0.1 (residual, embedding, attention) |

	Gradient checkpointing was enabled throughout training to reduce memory usage.
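
The learning-rate schedule in the table can be illustrated with a small function. This sketch assumes linear warmup followed by cosine decay to zero, which is a common convention but is not confirmed by the card; `lr_at` and `TOTAL_STEPS` are hypothetical names, and the total step count is a placeholder. The per-step batch size is derived on the assumption that the effective batch size equals the per-step batch times the accumulation steps on one GPU.

```python
import math

PEAK_LR = 3e-4        # from the table
WARMUP_STEPS = 2_000  # from the table


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Assumed relationship: effective batch = per-step batch * accumulation steps.
MICRO_BATCH = 32 // 8

TOTAL_STEPS = 10_000  # hypothetical; the card does not state the step count
for step in (0, 1_000, 2_000, 6_000, 10_000):
    print(f"step {step:>6}: lr = {lr_at(step, TOTAL_STEPS):.2e}")
```

The rate rises linearly to 3e-4 over the first 2,000 steps, then falls along a half cosine, reaching half the peak midway through the decay phase.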

	---

	## Usage

	```python
	from transformers import GPT2LMHeadModel, AutoTokenizer, pipeline

	model = GPT2LMHeadModel.from_pretrained("helloadhavan/llara1.0-100M-base")
	tokenizer = AutoTokenizer.from_pretrained("helloadhavan/llara1.0-100M-base")

	gen = pipeline("text-generation", model=model, tokenizer=tokenizer)

	output = gen(
	    "The history of artificial intelligence",
	    max_new_tokens=200,
	    do_sample=True,
	    temperature=0.8,
	    top_p=0.95,
	    repetition_penalty=1.1,
	)

	print(output[0]["generated_text"])
	```

	---

	## Limitations

	- Llara is trained on English web text only and performs poorly on other languages.
	- Like all autoregressive LMs trained on web data, it may reproduce biases, factual errors, or inappropriate content present in the training corpus.
	- It is a research model trained from scratch and is not instruction-tuned or aligned; it should not be used in production or user-facing applications without further fine-tuning and safety work.
	- At roughly 90-100M parameters and 256M training tokens, it is smaller and far less trained than models like GPT-2 (which was trained on about 40GB of text). Outputs may be incoherent on complex prompts.

	---

	## Intended Use

	Llara is intended for:

	- Research and experimentation with small language models
	- Learning how GPT-style models are trained from scratch
	- A base for fine-tuning on downstream tasks

	---

	## Training Framework

	Trained using [Hugging Face Transformers](https://github.com/huggingface/transformers) `Trainer` on a single GPU.

	---

	## License

	Apache 2.0

	<div>
	<blockquote><strong>Note:</strong> I am an AI hobbyist, not an AI engineer.</blockquote>
	</div>