TinyBuddy-500K
⚠️ Educational / experimental model. TinyBuddy-500K is a tiny Llama-style language model (~547K parameters) trained from scratch on a small synthetic corpus of TinyStories-style text. It is not a useful assistant; it is a working demonstration of training at an extremely small scale. See the Limitations section.
Model description
TinyBuddy-500K is a very small decoder-only Transformer language model trained on synthetic children's stories in the style of TinyStories. The architecture follows the LLaMA design (RMSNorm, grouped-query attention, SiLU MLP) with tied embeddings, except that it uses learned absolute position embeddings rather than LLaMA's rotary embeddings.
| Hyperparameter | Value |
|---|---|
| Parameters | 547,296 (~547K) |
| Layers | 2 |
| Attention heads | 4 |
| Key-Value heads (GQA) | 2 |
| Hidden size | 96 |
| MLP intermediate size | 384 |
| Context length | 512 |
| Vocab size | 2,048 (BPE trained from scratch) |
| Norm | RMSNorm |
| Activation | SiLU |
| Position embeddings | Learned absolute |
| Weight tying | Yes (tied embeddings) |
| Precision | float32 |
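To make the attention rows of the table concrete, here is a small sketch of how 4 query heads and 2 key-value heads would typically be laid out. It assumes the common convention that the head dimension is the hidden size divided by the number of query heads, and that consecutive query heads share a key-value head; the repository's custom modeling code may organize this differently.

```python
# Values taken from the hyperparameter table.
hidden_size = 96
num_heads = 4
num_kv_heads = 2

# Assumption: head_dim = hidden_size / num_heads, as in standard Llama-style models.
head_dim = hidden_size // num_heads       # 24
group_size = num_heads // num_kv_heads    # 2 query heads share each KV head

# Query head i reads keys/values from KV head i // group_size.
kv_head_for_query = [q // group_size for q in range(num_heads)]

# Projection weight shapes as (in_features, out_features), ignoring any biases.
q_proj_shape = (hidden_size, num_heads * head_dim)
k_proj_shape = (hidden_size, num_kv_heads * head_dim)
v_proj_shape = (hidden_size, num_kv_heads * head_dim)

print(head_dim, kv_head_for_query, q_proj_shape, k_proj_shape)
# 24 [0, 0, 1, 1] (96, 96) (96, 48)
```

Under these assumptions the key and value projections are half the width of the query projection, which is where grouped-query attention saves parameters and cache memory.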
Training details
- Data: Synthetic TinyStories-style corpus (~128K tokens)
- Tokenizer: Custom byte-level BPE with a 2,048-token vocabulary
- Optimizer: AdamW
- Steps: ~300
- Hardware: Single CPU core
- Final loss: ~0.17
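For intuition about the final loss figure, it can be converted to a perplexity. This is a rough sketch that assumes the reported value is a mean per-token cross-entropy in nats (the PyTorch default); the card does not state this, nor whether the loss was measured on training or held-out text.

```python
import math

final_loss = 0.17  # reported value; assumed to be mean cross-entropy in nats

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(final_loss)

# Geometric-mean probability assigned to the correct next token.
mean_token_prob = math.exp(-final_loss)

print(round(perplexity, 3), round(mean_token_prob, 3))
# 1.185 0.844
```

Because the evaluation split is not specified, the number by itself says little about how well the model generalizes.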
Usage
This model uses custom modeling code, so you must pass trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
repo = "Eeppa/TinyBuddy-500K"
# The repository ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()
prompt = "Once upon a time, there was a little girl named Lily."
input_ids = tokenizer.encode(prompt, return_tensors="pt")
# Gradients are not needed for generation.
with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
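The temperature and top_k arguments in the generate call control how the next token is sampled. The following pure-Python sketch illustrates what these two settings conventionally do; it is not the repository's implementation. The function name sample_top_k and the toy logits are hypothetical.

```python
import math
import random

def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Sample one token id from raw logits using temperature and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the top_k largest logits.
    k = min(top_k, len(logits))
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [logits[i] / temperature for i in kept]
    # Unnormalised softmax weights (subtract the max for numerical stability).
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # choices() normalises the weights internally.
    return rng.choices(kept, weights=weights, k=1)[0]

toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical 5-token vocabulary
rng = random.Random(0)
draws = [sample_top_k(toy_logits, temperature=0.8, top_k=2, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # with top_k=2, only token ids 0 and 1 can appear
```

With a 2,048-token vocabulary, top_k=50 restricts each step to the 50 most likely tokens before sampling.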
Limitations
This model is extremely small and was trained for a very short time on limited data.
What works:
- Basic English patterns and short sentence structure
- Simple story-like generation
What's broken:
- Very limited coherence (usually breaks after 1–2 sentences)
- High repetition
- Poor long-range consistency
- No real reasoning or factual knowledge
This model exists purely for educational purposes to explore the lower limits of language model size.
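One simple way to put a number on the "high repetition" limitation is to measure how many word n-grams in a generated sample repeat earlier ones. The sketch below is one possible metric, not something the model card itself reports; the function name and the sample string are made up for illustration.

```python
def repeated_ngram_fraction(text, n=3):
    """Fraction of word n-grams that repeat an n-gram seen earlier in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n words
    return 1.0 - len(set(ngrams)) / len(ngrams)

# Made-up example of degenerate, looping output.
sample = "the cat sat on the mat . the cat sat on the mat . the cat sat"
print(repeated_ngram_fraction(sample, n=3))
```

A value near 0 means almost every n-gram is new, while values approaching 1 indicate that the text is looping.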
License
MIT
Citation
@misc{tinybuddy500k,
title = {TinyBuddy-500K: An educational ~500K parameter Llama-style model trained on TinyStories},
year = {2026},
note = {Educational demonstration of extremely small language models.}
}
Built with Llama.