Brújula-15M
The tiny champion of the Brújula family: a 15.5M-parameter decoder-only language model trained from scratch on FineWeb-Edu, entirely on a single consumer GPU (one Intel Arc B580, about 5h16m). Brújula ("compass" in Spanish) is a minimal DeepSeek-style architecture: Multi-head Latent Attention (MLA), RoPE, and a SquaredReLU FFN, with tied embeddings. It is trained with a hybrid Muon + AdamW optimizer.
This is a base completion model (not instruction-tuned). At 15M parameters it is a research and education artifact: it produces surprisingly fluent short continuations for its size, but it is not a knowledge source.
Results
Perplexity (lower is better), fixed local harness at context length 1024:
| Model | FineWeb-Edu val PPL | WikiText-103 PPL |
|---|---|---|
| Brújula-15M (this model) | 78.05 | 190.74 |
It won't compete with much larger models on absolute perplexity — the point is that this is a complete, from-scratch LM that fits and trains on one consumer GPU. See the family below.
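For readers less familiar with the metric, perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below only illustrates that relationship; it is not the evaluation harness behind the table, and `perplexity` and the toy inputs are hypothetical names chosen for this example.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

# Toy check: a model that assigns probability 1/4 to every token has a
# per-token NLL of ln(4), so its perplexity is 4 (up to float rounding).
uniform_nlls = [math.log(4.0)] * 10
print(perplexity(uniform_nlls))

# Going the other way: a perplexity of 78.05 corresponds to a mean
# cross-entropy of ln(78.05), roughly 4.36 nats per token.
print(math.log(78.05))
```

Read this way, a perplexity of 78.05 means the model is on average about as uncertain as a uniform choice among 78 tokens.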
The Brújula family
| Model | Params | FineWeb-Edu val PPL | WikiText-103 PPL | Notes |
|---|---|---|---|---|
| Brújula-15M | 15.5M | 78.05 | 190.74 | this model — tiny champion, fully local |
| Brújula-18M | 18M | 46.26 | 108.72 | Brújula-15M depth-grown via G_stack (4→8 layers) |
| Brújula-150M | 153.6M | 21.44 | 36.08 | the flagship (beats GPT-2 small on FineWeb val) |
Usage

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "Sakatepon/Brujula-15M"
    tok = AutoTokenizer.from_pretrained(repo)
    # trust_remote_code=True lets transformers load the model code shipped in the repo
    model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).eval()

    # Prompt with a continuation cue, then sample rather than decode greedily
    ids = tok("The mitochondria is the", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=64, do_sample=True, temperature=0.8, top_p=0.95, repetition_penalty=1.2)
    print(tok.decode(out[0], skip_special_tokens=True))

Because the model is tiny, use sampling together with a continuation cue ("X is the … " rather than a bare noun); greedy decoding tends to fall into repetition loops.
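To make the sampling settings above more concrete, here is a simplified, pure-Python sketch of what a repetition penalty, temperature, and top-p (nucleus) filtering do to a next-token distribution. It is an illustration under assumptions, not the code path `generate` actually runs: library implementations differ in detail, and the function names and the 4-token toy vocabulary are hypothetical.

```python
import math
import random

def apply_repetition_penalty(logits, seen_ids, penalty=1.2):
    """Make already-generated tokens less likely (for penalty > 1)."""
    out = list(logits)
    for i in set(seen_ids):
        # Positive scores shrink, negative scores move further from zero.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}

def sample_next(logits, seen_ids, rng, temperature=0.8, top_p=0.95, penalty=1.2):
    """Draw one token id after penalty, temperature, and nucleus filtering."""
    logits = apply_repetition_penalty(logits, seen_ids, penalty)
    nucleus = top_p_filter(softmax(logits, temperature), top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]

toy_logits = [2.0, 1.5, 0.1, -1.0]  # hypothetical scores over a 4-token vocabulary
rng = random.Random(0)
print(sample_next(toy_logits, seen_ids=[0], rng=rng))
```

Greedy decoding always takes the single most likely token, which is how a small model can lock into a loop; sampling from the filtered distribution plus the penalty on repeated tokens gives it a way out.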
Architecture
| Property | Value |
|---|---|
| Type | decoder-only, causal LM |
| Hidden / Layers / Heads | n_embd=256 / n_layer=4 / n_head=4 |
| Context length | 1024 |
| Attention | Multi-head Latent Attention (MLA), kv-compress 32 / q-compress 64 |
| Position / FFN / Norm | RoPE / SquaredReLU / RMSNorm (pre-norm), tied embeddings |
| Vocab | 50257 (GPT-2 BPE) |
| Unique params | 15.5M |
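A quick back-of-envelope calculation from the table shows where the parameters sit. It uses only the vocabulary size, hidden size, and layer count listed above; the 15.5M total is a rounded figure, so the outputs are approximate, and the even per-layer split is a simplifying assumption of this sketch.

```python
VOCAB = 50257          # GPT-2 BPE vocabulary size (from the table)
N_EMBD = 256           # hidden size (from the table)
N_LAYER = 4            # number of layers (from the table)
TOTAL_PARAMS = 15.5e6  # rounded total from the table, so results are approximate

# With tied embeddings, the input embedding and the output projection
# share a single VOCAB x N_EMBD matrix, so it is counted once.
embedding_params = VOCAB * N_EMBD
non_embedding = TOTAL_PARAMS - embedding_params

print(f"embedding matrix: {embedding_params:,} params")
print(f"share of total:   {embedding_params / TOTAL_PARAMS:.0%}")
# Assumption: split the remainder evenly across layers (ignores norms outside the blocks).
print(f"everything else:  ~{non_embedding / 1e6:.1f}M (~{non_embedding / N_LAYER / 1e6:.2f}M per layer)")
```

At this scale the tied embedding matrix accounts for most of the model, which is why tying it matters so much for a 15M-parameter budget.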
Training
Trained on FineWeb-Edu (1.4B tokens) for 1 epoch with a hybrid Muon + AdamW optimizer in bf16, taking **5h16m on a single Intel Arc B580**. Training was fully local, with no cloud compute.
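The figures above imply an average throughput and a tokens-per-parameter ratio, worked out in the short sketch below. Both inputs are the rounded numbers quoted in this card, and the wall time may include overhead such as evaluation or checkpointing, so treat the results as rough averages rather than measured rates.

```python
TOKENS = 1.4e9                     # training tokens (rounded figure from the text)
PARAMS = 15.5e6                    # unique parameters (rounded)
WALL_SECONDS = 5 * 3600 + 16 * 60  # 5h16m expressed in seconds

tokens_per_second = TOKENS / WALL_SECONDS
tokens_per_param = TOKENS / PARAMS

print(f"~{tokens_per_second:,.0f} tokens/s average throughput")
print(f"~{tokens_per_param:.0f} training tokens per parameter")
```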
Limitations
- Base completion model — not instruction-tuned, no safety tuning.
- English only, educational-web distribution (FineWeb-Edu).
- At 15M it produces plausible short prose but unreliable facts; best on cued, definitional prompts.
- Short context (1024); no KV-cache in this reference implementation.
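To give a feel for what the missing KV-cache costs at generation time, the sketch below counts how many token positions are fed through the model while generating. It is a rough proxy under stated assumptions: without a cache every step re-runs the whole sequence so far, the count ignores that attention itself gets more expensive with length, and `positions_processed` and the conservative context check are hypothetical choices made for this example.

```python
CONTEXT = 1024  # context length from the architecture table

def positions_processed(prompt_len, new_tokens, kv_cache):
    """Count token positions fed through the model while generating new_tokens."""
    # Conservative assumption: keep prompt plus generated tokens within the context.
    if prompt_len + new_tokens > CONTEXT:
        raise ValueError("prompt plus generated tokens exceeds the context length")
    if new_tokens == 0:
        return 0
    if kv_cache:
        # The prompt is processed once, then each later step feeds a single token.
        return prompt_len + (new_tokens - 1)
    # No cache: step i re-processes the prompt plus the i tokens generated so far.
    return sum(prompt_len + i for i in range(new_tokens))

no_cache = positions_processed(8, 64, kv_cache=False)
cached = positions_processed(8, 64, kv_cache=True)
print(no_cache, cached)  # 2528 71
```

For short completions on a model this small the extra work is tolerable, but it grows quadratically with the number of generated tokens.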
License & attribution
- Model + code: Apache-2.0. Training data: FineWeb-Edu (ODC-BY).
- Built on ideas from: DeepSeek-V2 (MLA), Muon optimizer, Primer (SquaredReLU), GPT-2 (BPE tokenizer).