
Nanochat Moroccan Instruct 0.7B

Nanochat Moroccan Instruct 0.7B is a Hugging Face transformers export of a 702M-parameter nanochat model tuned for Moroccan Darija instruction following and chat. It starts from the Nanochat Moroccan Base 0.7B checkpoint and is intended for research, experimentation, and low-cost Darija assistant work.

The model is strongest when prompted in Moroccan Darija. It is not meant to be a general English-first benchmark model, and it should be evaluated primarily in the language setting it was built for.

Model Overview

Nanochat Moroccan Instruct 0.7B has the following characteristics:

  • Type: Causal language model
  • Training stage: Pretraining + supervised instruction tuning
  • Number of parameters: 701,893,188
  • Number of layers: 18
  • Embedding dimension: 1152
  • Number of attention heads: 9
  • Number of KV heads: 9
  • Context length: 2048
  • Window pattern: SSSL
  • Effective tokenizer size: 32,768

Architecture details:

  • Rotary position embeddings
  • Grouped-query attention support
  • ReLU squared MLP
  • RMSNorm without learned affine parameters
  • Untied token embedding and LM head
  • Sliding-window attention with a full-context final layer
  • No linear biases
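As a quick sanity check, the published parameter count translates directly into an approximate on-disk/in-memory size for the weights. The sketch below uses only the figure stated above and standard dtype widths; activations and KV cache are extra and not counted:

```python
# Rough memory footprint of the checkpoint weights, assuming only the
# published parameter count (701,893,188) and common dtype widths.
NUM_PARAMS = 701_893_188

def weights_gib(num_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB for a given dtype width."""
    return num_params * bytes_per_param / 1024**3

print(f"fp32 : {weights_gib(NUM_PARAMS, 4):.2f} GiB")  # ~2.61 GiB
print(f"bf16 : {weights_gib(NUM_PARAMS, 2):.2f} GiB")  # ~1.31 GiB
```

This is why the Quickstart loads the model in bfloat16: the weights fit comfortably on a single consumer GPU.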

Training Data

This model was instruction-tuned for Moroccan Darija chat and assistant-style interaction.

Instruction-tuning datasets:

  • Lyte/Moroccan-Darija-Instruct-573K
  • GemMaroc/TULU-3-50k-darija-english

Pretraining data reference:

  • Lyte/darija-pretraining-corpus

The intended domain is Moroccan Darija. It can respond outside that domain, but it was not trained as a broad English-centric assistant.

Quickstart

Use a recent transformers release and load the model with trust_remote_code=True.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KandirResearch/Nanochat-Moroccan-Instruct-0.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "كيفاش نطيب الطاجين؟ وضح ليا مزيان عافاك"}]  # "How do I cook a tajine? Explain it to me well, please."

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
input_len = inputs["input_ids"].shape[-1]

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_k=200,
    top_p=0.85,
    min_p=0.01,
    repetition_penalty=1.1,
    use_cache=True,
)

generated_tokens = outputs[0][input_len:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
# باش طيب الطاجين خاصك غير تنقص البوطة بزاف وتستعمل العطرية كاملة، وتستف الخضرة ديالك فوق منها باش تجيك المريقة خاثرة ومغمرة.
# (Roughly: "To cook a tajine, just turn the flame well down, use all the
# spices, and layer your vegetables on top so the sauce comes out thick and rich.")

Demo

You can try it here:

Lyte/Nanochat-Moroccco-Instruct

Example of a multi-turn conversation (shown as a screenshot in the original model card).

Inference Notes

This is a small chat model. Sampling settings matter.

  • Recommended sampling parameters: temperature=0.6, top_p=0.85, min_p=0.01, top_k=200, repetition_penalty=1.1.
  • Avoid evaluating it mainly through English coding or math prompts.
  • If you see repetition, raise repetition_penalty to 1.2 first, then tune the other sampling parameters.
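The recommendations above can be kept as a plain dict of generate() keyword arguments. The dict and helper below are an illustrative convenience, not part of the release:

```python
# Recommended sampling settings from this card, as kwargs for model.generate().
RECOMMENDED_GENERATION_KWARGS = {
    "max_new_tokens": 256,
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.85,
    "min_p": 0.01,
    "top_k": 200,
    "repetition_penalty": 1.1,
}

def with_repetition_fix(kwargs: dict, penalty: float = 1.2) -> dict:
    """Return a copy with a stronger repetition penalty, the first knob
    this card suggests turning when outputs start looping."""
    fixed = dict(kwargs)
    fixed["repetition_penalty"] = penalty
    return fixed

# Usage: outputs = model.generate(**inputs, **RECOMMENDED_GENERATION_KWARGS)
```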

Evaluation

Darija Benchmarks

The following results were obtained with the Hugging Face export and lm-evaluation-harness using the local Darija task configs.
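A run along these lines can be launched with the lm-evaluation-harness CLI. This is a command sketch, not the exact invocation used for the release; in particular, "darija_mmlu" stands in for the local Darija task configs, which are not published here:

```shell
# Sketch of an lm-evaluation-harness run against the HF export.
# "darija_mmlu" is a hypothetical placeholder for the local task configs.
lm_eval \
  --model hf \
  --model_args pretrained=KandirResearch/Nanochat-Moroccan-Instruct-0.7B,trust_remote_code=True,dtype=bfloat16 \
  --tasks darija_mmlu \
  --num_fewshot 0 \
  --batch_size 8
```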

0-shot

| Benchmark | Metric | Score |
| --- | --- | --- |
| DarijaMMLU | acc | 28.26 |
| ArabicMMLU subset | acc | 30.45 |
| MMLU subset | acc | 23.93 |
| DarijaHellaSwag | acc | 27.54 |
| DarijaHellaSwag | acc_norm | 29.51 |

3-shot

| Benchmark | Metric | Score |
| --- | --- | --- |
| DarijaMMLU | acc | 28.87 |
| ArabicMMLU subset | acc | 30.36 |
| MMLU subset | acc | 25.94 |
| DarijaHellaSwag | acc | 28.09 |
| DarijaHellaSwag | acc_norm | 31.12 |

These are modest scores, but they are consistent with a 702M model focused on Moroccan Darija instruction-following rather than broad benchmark optimization.

Original nanochat Chat Eval

These scores come from the original nanochat evaluation setup used during release.

| Metric | Score |
| --- | --- |
| ARC Easy | 26.56 |
| ARC Challenge | 25.34 |
| MMLU | 24.37 |
| GSM8K | 0.08 |
| HumanEval | 0.61 |
| SpellingBee | 0.00 |

Intended Use

This model is a good fit for:

  • Moroccan Darija chat and prompting
  • Darija instruction-following experiments
  • Lightweight research on underrepresented language varieties
  • Fine-tuning and evaluation work in the nanochat ecosystem

It is a poor fit for:

  • High-stakes advice
  • Factual applications without verification
  • Code-heavy or math-heavy use cases
  • Judging Darija quality through English-first benchmarks alone

Limitations

  • This is still a small model and it can repeat, drift, or hallucinate.
  • It is not reliable for medical, legal, or financial advice.
  • It is weaker on coding, formal mathematics, and broad academic QA than larger instruction models.
  • Safety behavior is limited and should be evaluated before any user-facing deployment.

Training Notes

SFT training summary from the original release:

  • Total SFT training time: 2.26 minutes
  • Final validation BPB: 0.3743
  • Best validation BPB: 0.3743
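For intuition, bits-per-byte converts directly to a per-byte perplexity via ppl = 2^bpb. A small sketch using the final validation value above:

```python
# Convert validation bits-per-byte (BPB) to per-byte perplexity.
final_bpb = 0.3743

perplexity_per_byte = 2 ** final_bpb
print(f"{perplexity_per_byte:.4f}")  # ~1.2962
```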

Safety

This release is intended for research and experimentation. Before using it in any product or user-facing workflow, evaluate it in the exact language, prompt style, and deployment setting you care about.

In particular, test for:

  • Hallucinations and made-up facts
  • Repetition loops
  • Unsafe or offensive completions
  • Inconsistent instruction following
  • Robustness across spelling variation and code-switching

Credits

Built on top of karpathy/nanochat.

Training adaptation, data work, export, and release by Lyte.

Data sources referenced in the release process:

  • Lyte/darija-pretraining-corpus
  • Lyte/Moroccan-Darija-Instruct-573K
  • GemMaroc/TULU-3-50k-darija-english

Citation

If you use this model, please cite:

@misc{nanochat_moroccan_instruct_0.7B,
  author = {Lyte},
  title = {Nanochat Moroccan Instruct 0.7B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/KandirResearch/Nanochat-Moroccan-Instruct-0.7B}}
}