nanochat-darija-73m-instruct

Instruction-tuned NanoChat causal language model for Moroccan Darija.

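This repo is exported in Hugging Face Transformers format. The architecture is defined by custom model code shipped in the repo (configuration_nanochat.py and modeling_nanochat.py), so it must be loaded with trust_remote_code=True.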

Preview Checkpoint Notice

This is a pilot/test checkpoint, not the final full-data model. It was trained to validate the Darija data pipeline, tokenizer, NanoChat architecture export, and SFT workflow before a larger billion-plus-token training run.

The cleaned base corpus contains 5M Darija rows, approximately 2B tokens when tokenized with the included tokenizer. That figure describes the cleaned corpus that is available, not what this checkpoint was trained on: this checkpoint was intentionally trained on a much smaller, shorter schedule.

Model Details

  • Parameters: 73.5M (73,531,538)
  • Context length: 2048
  • Vocab size: 32768
  • Layers: 6
  • Hidden size: 384
  • Attention heads: 3
  • Checkpoint tag: d6_target12
  • Checkpoint step: 10000
  • Export dtype: bfloat16
  • Base checkpoint: Lyte/nanochat-darija-73m-base
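
As a rough sanity check on the figures above, the sketch below estimates how much memory the weights alone occupy, using the listed parameter count and the bfloat16 export dtype (2 bytes per parameter). It also derives a per-head width under the assumption that the hidden size is split evenly across the attention heads, which the list does not state explicitly. Activations, the KV cache, and framework overhead are not included.

```python
# Figures copied from the model details list.
NUM_PARAMS = 73_531_538
HIDDEN_SIZE = 384
NUM_HEADS = 3
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each value in 16 bits


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Bytes needed to hold the parameters alone (no activations or cache)."""
    return num_params * bytes_per_param


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head width, ASSUMING the hidden size is split evenly across heads."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size is not divisible by num_heads")
    return hidden_size // num_heads


total = weight_bytes(NUM_PARAMS, BYTES_PER_PARAM_BF16)
print(f"weights: {total:,} bytes (~{total / 2**20:.1f} MiB)")
print(f"head dim (assumed even split): {head_dim(HIDDEN_SIZE, NUM_HEADS)}")
```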

Training

This model continues from the base checkpoint (Lyte/nanochat-darija-73m-base) with supervised fine-tuning (SFT) on Moroccan Darija instruction data.

The instruction-tuned variant is small and experimental. It is useful for lightweight Darija chat tests, but it is not reliable for math, factuality, code debugging, translation fidelity, or safety-critical decisions.

Usage

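import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Lyte/nanochat-darija-73m-instruct"
# trust_remote_code=True is required because the repo ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Prompt in Darija: "Answer me in Darija: what is the best way to learn programming?"
messages = [{"role": "user", "content": "جاوبني بالدارجة: شنو هي أحسن طريقة نتعلم بها البرمجة؟"}]
# Render the chat template and append the marker that starts the assistant turn.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Depending on the Transformers version, this may be a dict-like encoding rather than a tensor.
if not hasattr(inputs, "shape"):
    inputs = inputs["input_ids"]
# Low-temperature sampling with a mild repetition penalty.
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.3,
    top_k=300,
    top_p=0.95,
    repetition_penalty=1.1,
    do_sample=True,
)
# Decode the full sequence (prompt included), keeping special tokens visible.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))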
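
The usage snippet prints the whole sequence, prompt included. For decoder-only models, generate() normally returns the prompt ids followed by the continuation, so the reply can be isolated by slicing off the prompt length. The helper below is a minimal sketch of that slicing on plain Python lists; the name new_token_ids is hypothetical and the ids are toy values, not real tokenizer output.

```python
from typing import List, Sequence


def new_token_ids(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the ids that come after the first `prompt_length` positions."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy ids standing in for a real generate() result (hypothetical values).
prompt_ids = [1, 10, 11, 2]
full_output = prompt_ids + [42, 43, 3]
reply_ids = new_token_ids(full_output, len(prompt_ids))
print(reply_ids)
```

With the variables from the usage snippet, this would look something like tokenizer.decode(new_token_ids(outputs[0].tolist(), inputs.shape[-1]), skip_special_tokens=True), assuming the custom model code follows the usual generate() output layout.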

Files

  • model.safetensors: model weights
  • config.json: NanoChat architecture config
  • generation_config.json: default sampling config
  • tokenizer.json, tokenizer_config.json, special_tokens_map.json: tokenizer files
  • configuration_nanochat.py, modeling_nanochat.py: custom Transformers code
  • nanochat_export.json: source checkpoint metadata
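
Before loading from a local copy, it can be handy to confirm that every file in the list above is present. The sketch below compares a directory listing against those names; the helper name missing_files is hypothetical, and how the listing is obtained (for example os.listdir on a downloaded snapshot) is left to the caller.

```python
from typing import Iterable, List

# File names copied from the list above.
EXPECTED_FILES = (
    "model.safetensors",
    "config.json",
    "generation_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "configuration_nanochat.py",
    "modeling_nanochat.py",
    "nanochat_export.json",
)


def missing_files(present: Iterable[str], expected: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that do not appear in `present`, in order."""
    present_set = set(present)
    return [name for name in expected if name not in present_set]


# Example with a made-up, incomplete listing.
listing = ["model.safetensors", "config.json", "tokenizer.json"]
print(missing_files(listing))
```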

Limitations

This is a tiny model. Expect fluent-looking but wrong answers, repetition on some prompts, and brittle instruction following. Use it as a research artifact or local baseline, not as a production assistant.
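
One simple way to flag the repetition mentioned above when comparing prompts or sampling settings is to measure how many n-grams in an output repeat an earlier one. The function below is a heuristic sketch, not something the model card prescribes; the name repeated_ngram_fraction, the whitespace tokenization, and the choice of n=3 are all assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def repeated_ngram_fraction(tokens: Sequence[str], n: int = 3) -> float:
    """Fraction of n-grams that duplicate an n-gram seen earlier in `tokens`."""
    if n <= 0:
        raise ValueError("n must be positive")
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # sequence shorter than n: nothing to compare
    counts = Counter(ngrams)
    # Each n-gram seen c times contributes c - 1 repeats.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


# Whitespace splitting is a stand-in for real tokenization (an assumption).
looping = "a b c a b c a b c".split()
varied = "a b c d e f g".split()
print(repeated_ngram_fraction(looping), repeated_ngram_fraction(varied))
```

A value near 0 means few repeated n-grams, while values approaching 1 suggest the output is looping. Any threshold for "too repetitive" would need to be chosen empirically.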
