Instructions for using mazenlhm/boxMind-1.5B with libraries, inference servers, and local apps.
- Libraries
- Transformers
How to use mazenlhm/boxMind-1.5B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mazenlhm/boxMind-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mazenlhm/boxMind-1.5B")
model = AutoModelForCausalLM.from_pretrained("mazenlhm/boxMind-1.5B")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use mazenlhm/boxMind-1.5B with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "mazenlhm/boxMind-1.5B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "mazenlhm/boxMind-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
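Because the endpoint is OpenAI-compatible, the standard `openai` Python client should also work against the same server (a minimal sketch; the `api_key` value is a dummy placeholder, which vLLM accepts by default):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="mazenlhm/boxMind-1.5B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```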
- SGLang
How to use mazenlhm/boxMind-1.5B with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mazenlhm/boxMind-1.5B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "mazenlhm/boxMind-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "mazenlhm/boxMind-1.5B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "mazenlhm/boxMind-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use mazenlhm/boxMind-1.5B with Docker Model Runner:
```bash
docker model run hf.co/mazenlhm/boxMind-1.5B
```
boxMind 1.5B v1
A 1.5B-parameter decoder-only language model trained from scratch, released as a research substrate for studying small language models.
boxMind is a fully reproducible 1.5B model trained from random weights, documented end-to-end and open-sourced under Apache 2.0. It exists because the research community needs small models built for science, not products: every architectural choice, every training script, and every checkpoint is open and traceable.
Why a research tool?
Small language models (SLMs) are useful research proxies for larger models: same modern architecture, much faster iteration, far less compute per experiment. Lessons learned at 1.5B transfer up. boxMind is built to serve that role:
- Fine-tuning base for ablation studies. The SFT corpus is documented per-dataset, so the contribution of each component can be measured by leave-one-out retraining.
- Identity-rebrand case study. We successfully overwrote one identity with another using a small SFT pass on 70 hand-crafted examples; the corpus and procedure are open for replication.
- Distillation target. boxMind's known weaknesses (code, math, creative writing) are documented; researchers can measure how much distillation from larger teachers recovers.
- Educational reference. The full from-scratch training pipeline is traceable, which makes it useful for teaching modern LLM training without depending on pre-trained bases.
This is not a frontier model. It's a research artefact: a faithful, reproducible from-scratch build of a modern small LLM, released so anyone can study, fine-tune, or extend it.
TL;DR
- 1.39 B parameters, LLaMA-compatible architecture (GQA, SwiGLU, RoPE, RMSNorm, pre-LN)
- Trained from random weights; no pre-trained base
- Instruction-tuned with an open curated corpus + open identity-rebrand recipe
- WikiText-2 perplexity 13.80, beating GPT-2 XL (1.5 B) at 17.48
- Apache 2.0: commercial use, modification, and redistribution permitted
- Loads via stock `LlamaForCausalLM`; no `trust_remote_code` required
Built by Mazen Lahham on the boxMind.ai project.
Architecture
100% LLaMA-compatible. Loads with stock `LlamaForCausalLM`; no `trust_remote_code` needed.
| Hyperparameter | Value |
|---|---|
| Total parameters | 1,393,674,240 (~1.4 B) |
| Hidden layers | 28 |
| Hidden size | 2048 |
| FFN intermediate size | 5632 (SwiGLU, 8/3 × hidden, rounded up to a multiple of 256) |
| Attention heads (query) | 16 |
| Key / value heads | 4 (GQA: 4 query heads share each KV head) |
| Head dimension | 128 |
| Vocabulary | 32,000 (SentencePiece BPE) |
| Max context length | 2048 tokens |
| Norm | RMSNorm (pre-LN, eps = 1e-6) |
| Activation | SiLU (SwiGLU FFN) |
| Position encoding | RoPE, theta = 10,000 |
| Tie word embeddings | No |
| Native dtype | bfloat16 |
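As a cross-check, the table maps one-to-one onto a stock `LlamaConfig`. The sketch below is illustrative (the shipped `config.json` is authoritative) and reproduces the stated parameter count without allocating real weights:

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Architecture from the table above; the released config.json is authoritative.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=2048,
    intermediate_size=5632,
    num_hidden_layers=28,
    num_attention_heads=16,
    num_key_value_heads=4,      # GQA: 16 query heads, 4 KV heads
    max_position_embeddings=2048,
    rms_norm_eps=1e-6,
    rope_theta=10000.0,
    tie_word_embeddings=False,
)

# Instantiate on the meta device (no memory allocated), then count parameters.
with torch.device("meta"):
    model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,}")  # 1,393,674,240
```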
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mazenlhm/boxMind-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16).to("cuda")

prompt = "[INST] What is photosynthesis? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs, max_new_tokens=200, temperature=0.3, top_p=0.9,
    repetition_penalty=1.05, do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Prompt format
```
[INST] {your question} [/INST]
```
Single-turn only (the SFT corpus was single-turn). Multi-turn is unverified.
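Because `tokenizer_config.json` bundles a chat template (see Files below), `apply_chat_template` should render messages into this same format. A quick sanity check, assuming the bundled template matches the documented wrapper:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mazenlhm/boxMind-1.5B")
messages = [{"role": "user", "content": "What is photosynthesis?"}]

# Expected to match the documented single-turn format,
# e.g. "[INST] What is photosynthesis? [/INST]" (possibly with a BOS token).
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```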
Recommended sampling
| Setting | Default | Notes |
|---|---|---|
| `temperature` | 0.3 | Near-greedy. Higher (0.7+) drifts factual answers. |
| `top_p` | 0.9 | |
| `repetition_penalty` | 1.05 | |
| `max_new_tokens` | 200 | Most answers are short. |
For identity questions ("Who are you?", "Who made you?") use `temperature = 0.0` (greedy); at higher temperatures the creator name occasionally drifts.
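Note that in transformers, greedy decoding is requested by disabling sampling rather than passing a literal zero temperature (a minimal sketch, reusing `model`, `tokenizer`, and `inputs` from the Usage example above):

```python
# Greedy (deterministic) decoding for identity questions.
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```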
Training (high level)
boxMind was trained from scratch in three stages:
- Pretraining. Standard causal language modelling on curated web text. Modest pretrain budget by 2024-2025 standards.
- Instruction tuning. Supervised fine-tuning on an open curated corpus of single-turn instruction-response pairs.
- Identity finalisation. A short SFT pass on 70 hand-crafted identity examples to lock in the boxMind name and creator attribution.
Full training scripts, configurations, and checkpoints are available in the boxMind.ai repository.
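To make stage 2 concrete, here is a hypothetical sketch of how a single-turn pair could be rendered into the documented `[INST]` prompt format for SFT; the function name and EOS handling are assumptions, and the released training scripts are authoritative:

```python
# Hypothetical: render one instruction-response pair into the
# single-turn [INST] format documented under "Prompt format".
def format_sft_example(instruction: str, response: str, eos: str = "</s>") -> str:
    return f"[INST] {instruction} [/INST] {response}{eos}"

print(format_sft_example("What is photosynthesis?", "Photosynthesis is..."))
```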
Evaluation
WikiText-2 perplexity
| Model | Params | WikiText-2 PPL ↓ |
|---|---|---|
| boxMind 1.5B v1 | 1.4 B | 13.80 |
| GPT-2 XL (2019) | 1.5 B | 17.48 |
boxMind beats GPT-2 XL, the previous 1.5B reference, by ~3.7 perplexity points. The improvement comes from six years of architectural progress at the same parameter scale: boxMind uses GQA attention, SwiGLU FFN, RoPE positional encoding, and RMSNorm with pre-LN, where GPT-2 XL used MHA, GELU, learned positional embeddings, and LayerNorm. Same parameter budget, modern conventions.
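The card doesn't spell out the evaluation protocol, so the sketch below shows one common way to measure WikiText-2 perplexity (the standard sliding-window recipe; the dataset variant, window, and stride here are assumptions):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mazenlhm/boxMind-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16).to("cuda").eval()

# Concatenate the WikiText-2 test split into one long token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

max_length, stride = 2048, 512  # window = model context; stride is a choice
nlls, n_tokens, prev_end = [], 0, 0
for begin in range(0, ids.size(1), stride):
    end = min(begin + max_length, ids.size(1))
    trg_len = end - prev_end  # only score tokens not seen in a previous window
    input_ids = ids[:, begin:end]
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the overlapping prefix
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss * trg_len)
    n_tokens += trg_len
    prev_end = end
    if end == ids.size(1):
        break

print(torch.exp(torch.stack(nlls).sum() / n_tokens))  # perplexity
```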
What we don't claim
- boxMind does not match modern 2024-2025 small chat models of the same size. They were trained on 10-100× more pretrain tokens per parameter. The gap to those models is pretrain compute budget, not architecture or recipe.
- boxMind does not include successful RLHF / DPO training.
- boxMind does not have heavy code, math, or multilingual pretraining coverage.
- English only, 2048-token context, and only the single-turn instruction format is verified.
boxMind is a useful research substrate at 1.5B parameters, not a frontier-class assistant. The contribution is the fully open, fully reproducible recipe, not the benchmark score.
Known limitations
- Modest pretrain budget. Expect factual errors on details, weak math/reasoning, and weak creative output (haikus, poems, fiction often loop or collapse to a one-liner).
- Creator name drift at high temperature. Use `temperature = 0.0` for deterministic creator questions.
- English only. No multilingual pretraining.
- 2048-token context. No long-context fine-tune.
- Single-turn SFT only. Multi-turn chat is unverified.
- No RLHF / DPO. Instruction-following quality is bounded by SFT.
- Not safety-tuned. May produce incorrect, biased, or offensive content. Not suitable for high-stakes decisions or production use without your own evaluation and safety layer.
Files in this release
| File | Purpose |
|---|---|
| `model.safetensors` | Model weights, bf16 |
| `config.json` | LLaMA-format architecture config |
| `generation_config.json` | Recommended sampling defaults |
| `tokenizer.model` | SentencePiece BPE (32 K vocab) |
| `tokenizer_config.json` | LlamaTokenizer wrapper config + chat template |
| `special_tokens_map.json` | BOS / EOS / UNK / PAD mapping |
| `README.md` | This model card |
| `LICENSE` | Apache 2.0 |
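To fetch the whole release locally, the standard `huggingface_hub` client works (the call below downloads every file listed above into the local cache):

```python
from huggingface_hub import snapshot_download

# Downloads weights, configs, tokenizer files, README, and LICENSE.
local_dir = snapshot_download("mazenlhm/boxMind-1.5B")
print(local_dir)
```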
Citation
```bibtex
@misc{boxmind2026,
  author    = {Mazen Lahham},
  title     = {boxMind 1.5B v1: An open-source from-scratch language model for SLM research},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/mazenlhm/boxMind-1.5B}
}
```
License
Apache License 2.0. Commercial use, modification, and redistribution permitted with attribution.
Acknowledgments
Built on open architecture conventions established by the LLaMA family. Training data sources and full attribution are documented in the boxMind.ai repository.
Built by Mazen Lahham, boxMindLLM project.