Instructions for using mazenlhm/boxMind-1.5B with libraries, inference servers, and local apps.
- Libraries
- Transformers
How to use mazenlhm/boxMind-1.5B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mazenlhm/boxMind-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mazenlhm/boxMind-1.5B")
model = AutoModelForCausalLM.from_pretrained("mazenlhm/boxMind-1.5B")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use mazenlhm/boxMind-1.5B with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "mazenlhm/boxMind-1.5B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "mazenlhm/boxMind-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
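Because the endpoint is OpenAI-compatible, the standard `openai` Python client should also work against the same server (a minimal sketch; the `api_key` value is a dummy placeholder, which vLLM accepts by default):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="mazenlhm/boxMind-1.5B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```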
- SGLang
How to use mazenlhm/boxMind-1.5B with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mazenlhm/boxMind-1.5B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "mazenlhm/boxMind-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "mazenlhm/boxMind-1.5B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "mazenlhm/boxMind-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use mazenlhm/boxMind-1.5B with Docker Model Runner:
```bash
docker model run hf.co/mazenlhm/boxMind-1.5B
```
boxMind 1.5B v1
A 1.5B-parameter decoder-only language model trained from scratch, released as a research substrate for studying small language models.
boxMind is a fully reproducible 1.5B model trained from random weights, documented end-to-end and open-sourced under Apache 2.0. It exists because the research community needs small models built for science, not products: every architectural choice, every training script, and every checkpoint is open and traceable.
Why a research tool?
Small language models (SLMs) are useful research proxies for larger models: same modern architecture, much faster iteration, far less compute per experiment. Lessons learned at 1.5B transfer up. boxMind is built to serve that role:
- Fine-tuning base for ablation studies. The SFT corpus is documented per-dataset, so the contribution of each component can be measured by leave-one-out retraining.
- Identity-rebrand case study. We successfully overwrote one identity with another using a small SFT pass on 70 hand-crafted examples; the corpus and procedure are open for replication.
- Distillation target. boxMind's known weaknesses (code, math, creative writing) are documented; researchers can measure how much distillation from larger teachers recovers.
- Educational reference. The full from-scratch training pipeline is traceable, which makes it useful for teaching modern LLM training without depending on pre-trained bases.
This is not a frontier model. It's a research artefact: a faithful, reproducible from-scratch build of a modern small LLM, released so anyone can study, fine-tune, or extend it.
TL;DR
- 1.39 B parameters, LLaMA-compatible architecture (GQA, SwiGLU, RoPE, RMSNorm, pre-LN)
- Trained from random weights; no pre-trained base
- Instruction-tuned with an open curated corpus + open identity-rebrand recipe
- WikiText-2 perplexity 13.80, beating GPT-2 XL (1.5 B) at 17.48
- Apache 2.0: commercial use, modification, and redistribution permitted
- Loads via stock `LlamaForCausalLM`; no `trust_remote_code` required
Built by Mazen Lahham on the boxMind.ai project.
Architecture
100% LLaMA-compatible. Loads with stock `LlamaForCausalLM`; no `trust_remote_code` needed.
| Hyperparameter | Value |
|---|---|
| Total parameters | 1,393,674,240 (~1.4 B) |
| Hidden layers | 28 |
| Hidden size | 2048 |
| FFN intermediate size | 5632 (SwiGLU, 8/3 × hidden, rounded up to a multiple of 256) |
| Attention heads (query) | 16 |
| Key / value heads | 4 (GQA: 4 query heads share each KV head) |
| Head dimension | 128 |
| Vocabulary | 32,000 (SentencePiece BPE) |
| Max context length | 2048 tokens |
| Norm | RMSNorm (pre-LN, eps = 1e-6) |
| Activation | SiLU (SwiGLU FFN) |
| Position encoding | RoPE, theta = 10,000 |
| Tie word embeddings | No |
| Native dtype | bfloat16 |
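As a cross-check, the table maps one-to-one onto a stock `LlamaConfig`. The sketch below is illustrative (the shipped `config.json` is authoritative) and reproduces the stated parameter count without allocating real weights:

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Architecture from the table above; the released config.json is authoritative.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=2048,
    intermediate_size=5632,
    num_hidden_layers=28,
    num_attention_heads=16,
    num_key_value_heads=4,      # GQA: 16 query heads, 4 KV heads
    max_position_embeddings=2048,
    rms_norm_eps=1e-6,
    rope_theta=10000.0,
    tie_word_embeddings=False,
)

# Instantiate on the meta device (no memory allocated), then count parameters.
with torch.device("meta"):
    model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,}")  # 1,393,674,240
```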
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mazenlhm/boxMind-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16).to("cuda")

prompt = "[INST] What is photosynthesis? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs, max_new_tokens=200, temperature=0.3, top_p=0.9,
    repetition_penalty=1.05, do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Prompt format
```
[INST] {your question} [/INST]
```
Single-turn only (the SFT corpus was single-turn). Multi-turn is unverified.
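Because `tokenizer_config.json` bundles a chat template (see Files below), `apply_chat_template` should render messages into this same format. A quick sanity check, assuming the bundled template matches the documented wrapper:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mazenlhm/boxMind-1.5B")
messages = [{"role": "user", "content": "What is photosynthesis?"}]

# Expected to match the documented single-turn format,
# e.g. "[INST] What is photosynthesis? [/INST]" (possibly with a BOS token).
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```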
Recommended sampling
| Setting | Default | Notes |
|---|---|---|
| `temperature` | 0.3 | Near-greedy. Higher (0.7+) drifts factual answers. |
| `top_p` | 0.9 | |
| `repetition_penalty` | 1.05 | |
| `max_new_tokens` | 200 | Most answers are short. |
For identity questions ("Who are you?", "Who made you?") use `temperature = 0.0` (greedy); at higher temperatures the creator name occasionally drifts.
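Note that in transformers, greedy decoding is requested by disabling sampling rather than passing a literal zero temperature (a minimal sketch, reusing `model`, `tokenizer`, and `inputs` from the Usage example above):

```python
# Greedy (deterministic) decoding for identity questions.
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```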
Training (high level)
boxMind was trained from scratch in three stages:
- Pretraining. Standard causal language modelling on curated web text. Modest pretrain budget by 2024-2025 standards.
- Instruction tuning. Supervised fine-tuning on an open curated corpus of single-turn instruction-response pairs.
- Identity finalisation. A short SFT pass on 70 hand-crafted identity examples to lock in the boxMind name and creator attribution.
Full training scripts, configurations, and checkpoints are available in the boxMind.ai repository.
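To make stage 2 concrete, here is a hypothetical sketch of how a single-turn pair could be rendered into the documented `[INST]` prompt format for SFT; the function name and EOS handling are assumptions, and the released training scripts are authoritative:

```python
# Hypothetical: render one instruction-response pair into the
# single-turn [INST] format documented under "Prompt format".
def format_sft_example(instruction: str, response: str, eos: str = "</s>") -> str:
    return f"[INST] {instruction} [/INST] {response}{eos}"

print(format_sft_example("What is photosynthesis?", "Photosynthesis is..."))
```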
Evaluation
WikiText-2 perplexity
| Model | Params | WikiText-2 PPL ↓ |
|---|---|---|
| boxMind 1.5B v1 | 1.4 B | 13.80 |
| GPT-2 XL (2019) | 1.5 B | 17.48 |
boxMind beats GPT-2 XL, the previous 1.5B reference, by ~3.7 perplexity points. The improvement comes from six years of architectural progress at the same parameter scale: boxMind uses GQA attention, SwiGLU FFN, RoPE positional encoding, and RMSNorm with pre-LN, where GPT-2 XL used MHA, GELU, learned positional embeddings, and LayerNorm. Same parameter budget, modern conventions.
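The card doesn't spell out the evaluation protocol, so the sketch below shows one common way to measure WikiText-2 perplexity (the standard sliding-window recipe; the dataset variant, window, and stride here are assumptions):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mazenlhm/boxMind-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16).to("cuda").eval()

# Concatenate the WikiText-2 test split into one long token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

max_length, stride = 2048, 512  # window = model context; stride is a choice
nlls, n_tokens, prev_end = [], 0, 0
for begin in range(0, ids.size(1), stride):
    end = min(begin + max_length, ids.size(1))
    trg_len = end - prev_end  # only score tokens not seen in a previous window
    input_ids = ids[:, begin:end]
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the overlapping prefix
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss * trg_len)
    n_tokens += trg_len
    prev_end = end
    if end == ids.size(1):
        break

print(torch.exp(torch.stack(nlls).sum() / n_tokens))  # perplexity
```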
What we don't claim
- boxMind does not match modern 2024-2025 small chat models of the same size. They were trained on 10-100× more pretrain tokens per parameter. The gap to those models is pretrain compute budget, not architecture or recipe.
- boxMind does not include successful RLHF / DPO training.
- boxMind does not have heavy code, math, or multilingual pretraining coverage.
- English only, 2048-token context, and only the single-turn instruction format is verified.
boxMind is a useful research substrate at 1.5B parameters, not a frontier-class assistant. The contribution is the fully open, fully reproducible recipe, not the benchmark score.
Known limitations
- Modest pretrain budget. Expect factual errors on details, weak math/reasoning, and weak creative output (haikus, poems, fiction often loop or collapse to a one-liner).
- Creator name drift at high temperature. Use `temperature = 0.0` for deterministic creator questions.
- English only. No multilingual pretraining.
- 2048-token context. No long-context fine-tune.
- Single-turn SFT only. Multi-turn chat is unverified.
- No RLHF / DPO. Instruction-following quality is bounded by SFT.
- Not safety-tuned. May produce incorrect, biased, or offensive content. Not suitable for high-stakes decisions or production use without your own evaluation and safety layer.
Files in this release
| File | Purpose |
|---|---|
| `model.safetensors` | Model weights, bf16 |
| `config.json` | LLaMA-format architecture config |
| `generation_config.json` | Recommended sampling defaults |
| `tokenizer.model` | SentencePiece BPE (32 K vocab) |
| `tokenizer_config.json` | LlamaTokenizer wrapper config + chat template |
| `special_tokens_map.json` | BOS / EOS / UNK / PAD mapping |
| `README.md` | This model card |
| `LICENSE` | Apache 2.0 |
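To fetch the whole release locally, the standard `huggingface_hub` client works (the call below downloads every file listed above into the local cache):

```python
from huggingface_hub import snapshot_download

# Downloads weights, configs, tokenizer files, README, and LICENSE.
local_dir = snapshot_download("mazenlhm/boxMind-1.5B")
print(local_dir)
```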
Citation
```bibtex
@misc{boxmind2026,
  author    = {Mazen Lahham},
  title     = {boxMind 1.5B v1: An open-source from-scratch language model for SLM research},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/mazenlhm/boxMind-1.5B}
}
```
License
Apache License 2.0. Commercial use, modification, and redistribution permitted with attribution.
Acknowledgments
Built on open architecture conventions established by the LLaMA family. Training data sources and full attribution are documented in the boxMind.ai repository.
Built by Mazen Lahham, boxMindLLM project.