Brújula-150M
A 153.6M-parameter decoder-only language model, trained from scratch on FineWeb-Edu as part of a solo, consumer-GPU hobby project. Brújula ("compass" in Spanish) is a deliberately minimal DeepSeek-style architecture: Multi-head Latent Attention (MLA) + RoPE + SquaredReLU FFN, with tied embeddings, trained with a hybrid Muon + AdamW optimizer.
This is a base completion model — it continues text; it is not instruction-tuned and not safety-tuned. It writes fluent, on-topic educational-web prose; it is not a reliable source of facts.
Results
Perplexity (lower is better), evaluated on a fixed local harness at context length 1024:
| Model | FineWeb-Edu val PPL | WikiText-103 PPL |
|---|---|---|
| Brújula-150M (this model) | 21.44 | 36.08 |
| GPT-2 small (124M, same harness) | 26.61 | 29.31 |
Honest reading: Brújula-150M beats GPT-2 small on FineWeb-Edu val (its home turf) but loses on WikiText-103. This is a data-distribution effect (FineWeb-Edu is narrower than GPT-2's WebText), not a capacity problem, and it is the main known limitation.
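For readers unfamiliar with the metric, here is a minimal sketch of how perplexity relates to per-token loss. It assumes the usual definition (the exponential of the mean negative log-likelihood per token, in nats). The function name and the sample values are hypothetical, and this is not the evaluation harness used for the table.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), NLL in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token NLLs, not taken from the actual evaluation.
example_nlls = [3.2, 2.9, 3.1, 3.0]
print(round(perplexity(example_nlls), 2))

# Going the other way: the mean loss (in nats per token) implied by a reported PPL.
print(round(math.log(21.44), 3))
```

Under this definition, a lower perplexity simply means a lower average loss per token on that dataset.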
The Brújula family
| Model | Params | FineWeb val | WikiText | Notes |
|---|---|---|---|---|
| Brújula-15M | 15.5M | 78.05 | 190.74 | tiny champion, trained locally on one Arc B580 |
| Brújula-18M | 18M | 46.26 | 108.72 | Brújula-15M depth-grown via G_stack (4→8 layers) |
| Brújula-150M | 153.6M | 21.44 | 36.08 | this model (the flagship) |
Usage
Custom architecture code, so pass trust_remote_code=True:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
repo = "Sakatepon/Brujula-150M"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).eval()
ids = tok("The mitochondria is the", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=64, do_sample=True, temperature=0.8, top_p=0.95)
print(tok.decode(out[0], skip_special_tokens=True))
Tip: it's a small base model — use sampling and cued/definitional prompts ("X is the … ") rather than bare nouns. Greedy decoding tends to repetition-loop.
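To make the sampling settings concrete, the following is a rough pure-Python sketch of what `temperature` and `top_p` do to a single next-token distribution. It is an illustration under simplifying assumptions, not the Transformers implementation, and `sample_top_p` is a hypothetical helper name.

```python
import math
import random

def sample_top_p(logits, temperature=0.8, top_p=0.95, rng=random):
    """Sample one token index using temperature scaling and nucleus (top-p) filtering."""
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose probability mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw one token from the kept set, weighted by probability.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

# Toy logits over a 4-token vocabulary (hypothetical values).
print(sample_top_p([2.0, 1.0, 0.5, -1.0]))
```

Greedy decoding corresponds to always taking the first index in `order`, which is why it can lock into a repeating continuation, while sampling from the kept set lets the model occasionally escape it.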
Architecture
| Property | Value |
|---|---|
| Type | decoder-only, causal LM |
| Hidden size (n_embd) | 768 |
| Layers (n_layer) | 20 |
| Heads (n_head) | 12 |
| Context length | 1024 |
| Attention | Multi-head Latent Attention (MLA), kv-compress 64 / q-compress 192 |
| Position | RoPE (GPT-NeoX convention) |
| FFN | SquaredReLU (w2(relu(w1 x)^2)) |
| Norm | RMSNorm (pre-norm) |
| Embeddings | tied |
| Vocab | 50257 (GPT-2 BPE) |
| Unique params | 153.6M |
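The FFN formula in the table, w2(relu(w1 x)^2), can be sketched for a single input vector as follows. This assumes bias-free linear layers, which the table does not state, and uses toy dimensions and weights that are purely illustrative. The real model's implementation lives in the repository's custom code.

```python
def squared_relu_ffn(x, w1, w2):
    """Compute w2(relu(w1 x)^2) for one input vector x.

    w1 is a list of rows (hidden x d_in); w2 is a list of rows (d_out x hidden).
    Assumption: no bias terms.
    """
    # Up-projection: hidden = w1 @ x
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in w1]
    # SquaredReLU: clamp negatives to zero, then square.
    activated = [max(h, 0.0) ** 2 for h in hidden]
    # Down-projection: out = w2 @ activated
    return [sum(w * a for w, a in zip(row, activated)) for row in w2]

x = [1.0, -2.0]
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2 -> 3 projection
w2 = [[1.0, 1.0, 1.0], [0.5, 0.0, -1.0]]   # toy 3 -> 2 projection
print(squared_relu_ffn(x, w1, w2))  # prints [1.0, 0.5]
```

Here the hidden pre-activations are [1, -2, -1], so only the first unit survives the ReLU before squaring.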
Training
Trained on FineWeb-Edu (~5B tokens) for 1 epoch (76,293 steps) with a hybrid optimizer: Muon for the matrix weights and AdamW for the embeddings and norms. Peak LR 1.7e-3, batch size 64, bf16, ~12.3h on a single cloud GPU. The 15M/18M siblings train locally on one Intel Arc B580.
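As a quick sanity check on the numbers above, the step count lines up with the ~5B token budget if "batch 64" means 64 sequences per step, each at the full 1024-token context. That interpretation is an assumption, since the card does not spell it out.

```python
steps = 76_293
batch_size = 64         # assumed: sequences per optimizer step
context_length = 1024   # from the architecture table

tokens_per_step = batch_size * context_length
total_tokens = steps * tokens_per_step
print(tokens_per_step)  # 65536
print(total_tokens)     # 4999938048, i.e. roughly 5B tokens

# Rough average throughput implied by the reported ~12.3h wall-clock time.
seconds = 12.3 * 3600
print(round(total_tokens / seconds))
```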
The story
Brújula is the honest output of a part-time, single-consumer-GPU project. It started from an
over-engineered 331M model (with a bolted-on "engram" retrieval mechanism + dropout +
torch.compile + gradient checkpointing) that displayed low loss but evaluated ~6× worse
in clean inference — a broken train/eval stack. The fix was ruthless minimalism: strip every
unvalidated component, keep MLA + RoPE + SquaredReLU + tied embeddings, A/B each change
honestly. A 15M minimal model then beat the 331M one by ~2.5×. Adding the Muon optimizer and
scaling to 150M produced this. The journey is the point; the perplexity is just where it landed.
Limitations
- Base completion model — not instruction-tuned, no safety tuning.
- English only, educational-web distribution (FineWeb-Edu); weaker out-of-distribution (the WikiText gap).
- Not a knowledge base — at 150M it produces plausible prose but unreliable facts.
- Short context (1024); no KV-cache in this reference implementation.
License & attribution
- Model + code: Apache-2.0.
- Training data: FineWeb-Edu (ODC-BY).
- Built on ideas from: DeepSeek-V2 (MLA), Muon optimizer, Primer (SquaredReLU), GPT-2 (BPE tokenizer).