
Nanochat Moroccan Instruct 0.7B

Nanochat Moroccan Instruct 0.7B is a Hugging Face transformers export of a 702M-parameter nanochat model tuned for Moroccan Darija instruction following and chat. It starts from the Nanochat Moroccan Base 0.7B checkpoint and is intended for research, experimentation, and low-cost Darija assistant work.

The model is strongest when prompted in Moroccan Darija. It is not meant to be a general English-first benchmark model, and it should be evaluated primarily in the language setting it was built for.

Model Overview

Nanochat Moroccan Instruct 0.7B has the following characteristics:

  • Type: Causal language model
  • Training stage: Pretraining + supervised instruction tuning
  • Number of parameters: 701,893,188
  • Number of layers: 18
  • Embedding dimension: 1152
  • Number of attention heads: 9
  • Number of KV heads: 9
  • Context length: 2048
  • Window pattern: SSSL
  • Effective tokenizer size: 32,768

Architecture details:

  • Rotary position embeddings
  • Grouped-query attention support
  • ReLU squared MLP
  • RMSNorm without learned affine parameters
  • Untied token embedding and LM head
  • Sliding-window attention with a full-context final layer
  • No linear biases
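As a quick sanity check, the published parameter count translates directly into an approximate on-disk/in-memory size for the weights. The sketch below uses only the figure stated above and standard dtype widths; activations and KV cache are extra and not counted:

```python
# Rough memory footprint of the checkpoint weights, assuming only the
# published parameter count (701,893,188) and common dtype widths.
NUM_PARAMS = 701_893_188

def weights_gib(num_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB for a given dtype width."""
    return num_params * bytes_per_param / 1024**3

print(f"fp32 : {weights_gib(NUM_PARAMS, 4):.2f} GiB")  # ~2.61 GiB
print(f"bf16 : {weights_gib(NUM_PARAMS, 2):.2f} GiB")  # ~1.31 GiB
```

This is why the Quickstart loads the model in bfloat16: the weights fit comfortably on a single consumer GPU.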

Training Data

This model was instruction-tuned for Moroccan Darija chat and assistant-style interaction.

Instruction-tuning datasets:

  • Lyte/Moroccan-Darija-Instruct-573K
  • GemMaroc/TULU-3-50k-darija-english

Pretraining data reference:

  • Lyte/darija-pretraining-corpus

The intended domain is Moroccan Darija. It can respond outside that domain, but it was not trained as a broad English-centric assistant.

Quickstart

Use a recent transformers release and load the model with trust_remote_code=True.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KandirResearch/Nanochat-Moroccan-Instruct-0.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "كيفاش نطيب الطاجين؟ وضح ليا مزيان عافاك"}]  # "How do I cook a tajine? Explain it to me well, please."

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
input_len = inputs["input_ids"].shape[-1]

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_k=200,
    top_p=0.85,
    min_p=0.01,
    repetition_penalty=1.1,
    use_cache=True,
)

generated_tokens = outputs[0][input_len:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
# باش طيب الطاجين خاصك غير تنقص البوطة بزاف وتستعمل العطرية كاملة، وتستف الخضرة ديالك فوق منها باش تجيك المريقة خاثرة ومغمرة.
# (Roughly: "To cook a tajine, just turn the flame well down, use all the
# spices, and layer your vegetables on top so the sauce comes out thick and rich.")

Demo

You can try it here:

Lyte/Nanochat-Moroccco-Instruct

Example of a multi-turn conversation (shown as a screenshot in the original model card).

Inference Notes

This is a small chat model. Sampling settings matter.

  • Recommended sampling parameters: temperature=0.6, top_p=0.85, min_p=0.01, top_k=200, repetition_penalty=1.1.
  • Avoid evaluating it mainly through English coding or math prompts.
  • If you see repetition, raise repetition_penalty to 1.2 first, then tune the other sampling parameters.
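The recommendations above can be kept as a plain dict of generate() keyword arguments. The dict and helper below are an illustrative convenience, not part of the release:

```python
# Recommended sampling settings from this card, as kwargs for model.generate().
RECOMMENDED_GENERATION_KWARGS = {
    "max_new_tokens": 256,
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.85,
    "min_p": 0.01,
    "top_k": 200,
    "repetition_penalty": 1.1,
}

def with_repetition_fix(kwargs: dict, penalty: float = 1.2) -> dict:
    """Return a copy with a stronger repetition penalty, the first knob
    this card suggests turning when outputs start looping."""
    fixed = dict(kwargs)
    fixed["repetition_penalty"] = penalty
    return fixed

# Usage: outputs = model.generate(**inputs, **RECOMMENDED_GENERATION_KWARGS)
```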

Evaluation

Darija Benchmarks

The following results were obtained with the Hugging Face export and lm-evaluation-harness using the local Darija task configs.
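A run along these lines can be launched with the lm-evaluation-harness CLI. This is a command sketch, not the exact invocation used for the release; in particular, "darija_mmlu" stands in for the local Darija task configs, which are not published here:

```shell
# Sketch of an lm-evaluation-harness run against the HF export.
# "darija_mmlu" is a hypothetical placeholder for the local task configs.
lm_eval \
  --model hf \
  --model_args pretrained=KandirResearch/Nanochat-Moroccan-Instruct-0.7B,trust_remote_code=True,dtype=bfloat16 \
  --tasks darija_mmlu \
  --num_fewshot 0 \
  --batch_size 8
```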

0-shot

| Benchmark | Metric | Score |
| --- | --- | --- |
| DarijaMMLU | acc | 28.26 |
| ArabicMMLU subset | acc | 30.45 |
| MMLU subset | acc | 23.93 |
| DarijaHellaSwag | acc | 27.54 |
| DarijaHellaSwag | acc_norm | 29.51 |

3-shot

| Benchmark | Metric | Score |
| --- | --- | --- |
| DarijaMMLU | acc | 28.87 |
| ArabicMMLU subset | acc | 30.36 |
| MMLU subset | acc | 25.94 |
| DarijaHellaSwag | acc | 28.09 |
| DarijaHellaSwag | acc_norm | 31.12 |

These are modest scores, but they are consistent with a 702M model focused on Moroccan Darija instruction-following rather than broad benchmark optimization.

Original nanochat Chat Eval

These scores come from the original nanochat evaluation setup used during release.

| Metric | Score |
| --- | --- |
| ARC Easy | 26.56 |
| ARC Challenge | 25.34 |
| MMLU | 24.37 |
| GSM8K | 0.08 |
| HumanEval | 0.61 |
| SpellingBee | 0.00 |

Intended Use

This model is a good fit for:

  • Moroccan Darija chat and prompting
  • Darija instruction-following experiments
  • Lightweight research on underrepresented language varieties
  • Fine-tuning and evaluation work in the nanochat ecosystem

It is a poor fit for:

  • High-stakes advice
  • Factual applications without verification
  • Code-heavy or math-heavy use cases
  • Judging Darija quality through English-first benchmarks alone

Limitations

  • This is still a small model and it can repeat, drift, or hallucinate.
  • It is not reliable for medical, legal, or financial advice.
  • It is weaker on coding, formal mathematics, and broad academic QA than larger instruction models.
  • Safety behavior is limited and should be evaluated before any user-facing deployment.

Training Notes

SFT training summary from the original release:

  • Total SFT training time: 2.26 minutes
  • Final validation BPB: 0.3743
  • Best validation BPB: 0.3743
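For intuition, bits-per-byte converts directly to a per-byte perplexity via ppl = 2^bpb. A small sketch using the final validation value above:

```python
# Convert validation bits-per-byte (BPB) to per-byte perplexity.
final_bpb = 0.3743

perplexity_per_byte = 2 ** final_bpb
print(f"{perplexity_per_byte:.4f}")  # ~1.2962
```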

Safety

This release is intended for research and experimentation. Before using it in any product or user-facing workflow, evaluate it in the exact language, prompt style, and deployment setting you care about.

In particular, test for:

  • Hallucinations and made-up facts
  • Repetition loops
  • Unsafe or offensive completions
  • Inconsistent instruction following
  • Robustness across spelling variation and code-switching

Credits

Built on top of karpathy/nanochat.

Training adaptation, data work, export, and release by Lyte.

Data sources referenced in the release process:

  • Lyte/darija-pretraining-corpus
  • Lyte/Moroccan-Darija-Instruct-573K
  • GemMaroc/TULU-3-50k-darija-english

Citation

If you use this model, please cite:

@misc{nanochat_moroccan_instruct_0.7B,
  author = {Lyte},
  title = {Nanochat Moroccan Instruct 0.7B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/KandirResearch/Nanochat-Moroccan-Instruct-0.7B}}
}