# Nanochat Moroccan Instruct 0.7B
Nanochat Moroccan Instruct 0.7B is a Hugging Face transformers export of a 702M-parameter nanochat model tuned for Moroccan Darija instruction following and chat. It starts from the Nanochat Moroccan Base 0.7B checkpoint and is intended for research, experimentation, and low-cost Darija assistant work.
The model is strongest when prompted in Moroccan Darija. It is not meant to be a general English-first benchmark model, and it should be evaluated primarily in the language setting it was built for.
## Model Overview
Nanochat Moroccan Instruct 0.7B has the following characteristics:
- Type: Causal language model
- Training stage: Pretraining + supervised instruction tuning
- Number of parameters: 701,893,188
- Number of layers: 18
- Embedding dimension: 1152
- Number of attention heads: 9
- Number of KV heads: 9
- Context length: 2048
- Window pattern: SSSL
- Effective tokenizer size: 32,768
Architecture details:
- Rotary position embeddings
- Grouped-query attention support
- ReLU squared MLP
- RMSNorm without learned affine parameters
- Untied token embedding and LM head
- Sliding-window attention with a full-context final layer
- No linear biases
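The SSSL window pattern can be read as one attention type per layer (S = sliding window, L = full context). A minimal sketch of how it maps onto the 18 layers, assuming (as in upstream nanochat) that the pattern is tiled in order and the final layer is forced to full context, consistent with the "full-context final layer" note above:

```python
# Sketch: expand the "SSSL" window pattern over the 18 layers.
# Assumption (not read from the export itself): the pattern is tiled in order
# and the last layer is forced to "L", matching the full-context final layer.

def layer_attention_types(pattern: str = "SSSL", n_layers: int = 18) -> list[str]:
    types = [pattern[i % len(pattern)] for i in range(n_layers)]
    types[-1] = "L"  # final layer always attends to the full context
    return types

types = layer_attention_types()
print("".join(types))  # SSSLSSSLSSSLSSSLSL
print(types.count("L"), "full-context layers")  # 5 full-context layers
```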
## Training Data
This model was instruction-tuned for Moroccan Darija chat and assistant-style interaction.
Instruction-tuning datasets:
- Lyte/Moroccan-Darija-Instruct-573K
- GemMaroc/TULU-3-50k-darija-english
Pretraining data reference:
- Lyte/darija-pretraining-corpus
The intended domain is Moroccan Darija. It can respond outside that domain, but it was not trained as a broad English-centric assistant.
## Quickstart
Use a recent `transformers` release and load the model with `trust_remote_code=True`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KandirResearch/Nanochat-Moroccan-Instruct-0.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# "How do I cook tajine? Explain it to me well, please."
messages = [{"role": "user", "content": "كيفاش نطيب الطاجين؟ وضح ليا مزيان عافاك"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
input_len = inputs["input_ids"].shape[-1]

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_k=200,
    top_p=0.85,
    min_p=0.01,
    repetition_penalty=1.1,
    use_cache=True,
)
generated_tokens = outputs[0][input_len:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
# باش طيب الطاجين خاصك غير تنقص البوطة بزاف وتستعمل العطرية كاملة، وتستف الخضرة ديالك فوق منها باش تجيك المريقة خاثرة ومغمرة.
# (Rough translation: "To cook the tajine, just turn the gas down low, use all
#  the spices, and layer your vegetables on top so the sauce comes out thick
#  and rich.")
```
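For multi-turn chat, keep the full history in `messages` and re-apply the chat template before each generation call. A minimal sketch (the assistant text and the follow-up question here are placeholders, not real model output; translations are approximate):

```python
# Sketch of multi-turn usage: append each decoded reply to `messages`,
# then re-apply the chat template so the model sees the whole history.
messages = [
    {"role": "user", "content": "كيفاش نطيب الطاجين؟"},  # "How do I cook tajine?"
]
# ...generate as in the Quickstart, then append the decoded reply:
reply = "..."  # placeholder for tokenizer.decode(generated_tokens, skip_special_tokens=True)
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "وشحال من الوقت خاصو؟"})  # "And how much time does it need?"
# inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
#                                        return_dict=True, return_tensors="pt")
print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```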
## Demo
You can try it here:
Lyte/Nanochat-Moroccco-Instruct
## Inference Notes
This is a small chat model. Sampling settings matter.
- For best generation quality, use: `temperature=0.6`, `top_p=0.85`, `min_p=0.01`, `top_k=200`, `repetition_penalty=1.1`.
- Avoid evaluating it mainly through English coding or math prompts.
- If you see repetition, increase `repetition_penalty` to `1.2` first, then tune the other sampling parameters.
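To see why `min_p=0.01` helps at moderate temperature, here is a rough, framework-free illustration of min-p filtering: tokens whose probability falls below `min_p` times the top token's probability are dropped before sampling. This mirrors the idea behind the `min_p` argument, not the exact transformers implementation:

```python
def min_p_filter(probs, min_p=0.01):
    """Keep tokens with prob >= min_p * max(prob); renormalize the rest."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# Toy distribution over five tokens (made-up numbers):
probs = [0.50, 0.30, 0.15, 0.004, 0.046]
filtered = min_p_filter(probs, min_p=0.01)
# threshold = 0.01 * 0.50 = 0.005, so only the 0.004 token is pruned
print(filtered)
```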
## Evaluation

### Darija Benchmarks
The following results were obtained with the Hugging Face export and lm-evaluation-harness using the local Darija task configs.
#### 0-shot
| Benchmark | Metric | Score |
|---|---|---|
| DarijaMMLU | acc | 28.26 |
| ArabicMMLU subset | acc | 30.45 |
| MMLU subset | acc | 23.93 |
| DarijaHellaSwag | acc | 27.54 |
| DarijaHellaSwag | acc_norm | 29.51 |
#### 3-shot
| Benchmark | Metric | Score |
|---|---|---|
| DarijaMMLU | acc | 28.87 |
| ArabicMMLU subset | acc | 30.36 |
| MMLU subset | acc | 25.94 |
| DarijaHellaSwag | acc | 28.09 |
| DarijaHellaSwag | acc_norm | 31.12 |
These are modest scores, but they are consistent with a 702M model focused on Moroccan Darija instruction-following rather than broad benchmark optimization.
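The `acc` vs `acc_norm` split in the DarijaHellaSwag rows follows the usual lm-evaluation-harness convention: `acc` picks the answer choice with the highest total log-likelihood, while `acc_norm` normalizes by answer length (in bytes) first, which removes the bias toward short continuations. A toy sketch of the two scoring rules, with made-up log-likelihoods:

```python
# Toy scoring sketch (made-up numbers): acc vs length-normalized acc_norm.
choices = ["طاجين", "طاجين ديال الخضرة"]  # short vs long continuation
loglik  = [-6.0, -9.0]                    # total log-likelihood per choice
nbytes  = [len(c.encode("utf-8")) for c in choices]

acc_pick      = max(range(len(choices)), key=lambda i: loglik[i])
acc_norm_pick = max(range(len(choices)), key=lambda i: loglik[i] / nbytes[i])

print(acc_pick, acc_norm_pick)  # 0 1  (acc favors the short answer)
```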
### Original nanochat Chat Eval
These scores come from the original nanochat evaluation setup used during release.
| Metric | Score |
|---|---|
| ARC Easy | 26.56 |
| ARC Challenge | 25.34 |
| MMLU | 24.37 |
| GSM8K | 0.08 |
| HumanEval | 0.61 |
| SpellingBee | 0.00 |
## Intended Use
This model is a good fit for:
- Moroccan Darija chat and prompting
- Darija instruction-following experiments
- Lightweight research on underrepresented language varieties
- Fine-tuning and evaluation work in the nanochat ecosystem
It is a poor fit for:
- high-stakes advice
- factual applications without verification
- code-heavy or math-heavy use cases
- judging Darija quality through English-first benchmarks alone
## Limitations
- This is still a small model and it can repeat, drift, or hallucinate.
- It is not reliable for medical, legal, or financial advice.
- It is weaker on coding, formal mathematics, and broad academic QA than larger instruction models.
- Safety behavior is limited and should be evaluated before any user-facing deployment.
## Training Notes
SFT training summary from the original release:
- Total SFT training time: 2.26 minutes
- Final validation BPB: 0.3743
- Best validation BPB: 0.3743
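BPB (bits per byte) relates the mean per-token cross-entropy loss (in nats) to a tokenizer-independent byte-level measure: bpb = loss × n_tokens / (ln 2 × n_bytes). A small sketch of the conversion, with hypothetical token and byte counts (the ~3.5 bytes-per-token ratio below is an assumption, not a released figure):

```python
import math

def bits_per_byte(loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean per-token cross-entropy (nats) to bits per byte."""
    return loss_nats * n_tokens / (math.log(2) * n_bytes)

# Hypothetical example: 1.0 nat/token, ~3.5 bytes per token on average.
print(round(bits_per_byte(1.0, n_tokens=1000, n_bytes=3500), 4))  # 0.4122
```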
## Safety
This release is intended for research and experimentation. Before using it in any product or user-facing workflow, evaluate it in the exact language, prompt style, and deployment setting you care about.
In particular, test for:
- hallucinations and made-up facts
- repetition loops
- unsafe or offensive completions
- inconsistent instruction following
- robustness across spelling variation and code-switching
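One cheap way to screen sampled outputs for repetition loops is to flag long repeated word n-grams. A minimal sketch (the n-gram size and repeat threshold are arbitrary choices, not part of the release):

```python
def has_repeated_ngram(text: str, n: int = 4, min_repeats: int = 3) -> bool:
    """Flag text in which some word n-gram occurs at least min_repeats times."""
    words = text.split()
    counts = {}
    for i in range(len(words) - n + 1):
        gram = tuple(words[i : i + n])
        counts[gram] = counts.get(gram, 0) + 1
    return any(c >= min_repeats for c in counts.values())

print(has_repeated_ngram("la la la la la la la la la la la la"))  # True
print(has_repeated_ngram("one two three four five six"))          # False
```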
## Credits
Built on top of karpathy/nanochat.
Training adaptation, data work, export, and release by Lyte.
Data sources referenced in the release process:
- Lyte/darija-pretraining-corpus
- Lyte/Moroccan-Darija-Instruct-573K
- GemMaroc/TULU-3-50k-darija-english
## Citation
If you use this model, please cite:
```bibtex
@misc{nanochat_moroccan_instruct_0.7B,
  author       = {Lyte},
  title        = {Nanochat Moroccan Instruct 0.7B},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/KandirResearch/Nanochat-Moroccan-Instruct-0.7B}}
}
```
## Model Tree

- Base model: KandirResearch/Nanochat-Moroccan-Base-0.7B