JULIAN-100M-Instruct

Instruction-tuned version of JULIAN-100M: a 109M-parameter bilingual (English/French) language model fine-tuned to follow instructions.

Model Description

JULIAN-100M-Instruct is the instruction-tuned version of JULIAN-100M, fine-tuned on ~143K instruction-response pairs in English and French.

| Attribute      | Value                                |
|----------------|--------------------------------------|
| Parameters     | 109 million                          |
| Architecture   | GPT-style decoder-only transformer   |
| Context Length | 2048 tokens                          |
| Languages      | English (65%), French (35%)          |
| Training       | TPU v5e-32 with JAX/Flax             |
| Fine-tuning    | 3000 steps, ~2.7 epochs              |

Training Data

Fine-tuned on 142,812 instruction-response pairs:

| Dataset       | Examples | Language |
|---------------|----------|----------|
| Alpaca        | 49,613   | English  |
| Code Alpaca   | 19,520   | English  |
| Dolly         | 14,537   | English  |
| GPT4All       | 9,370    | English  |
| Alpaca French | 49,772   | French   |

Usage

With JAX/Flax

import jax.numpy as jnp
from huggingface_hub import hf_hub_download
import sentencepiece as spm
import numpy as np

# Load tokenizer
tokenizer_path = hf_hub_download(
    repo_id="JulianKrgd/JULIAN-100M-Instruct",
    filename="tokenizer/julian_24k.model"
)
tokenizer = spm.SentencePieceProcessor()
tokenizer.Load(tokenizer_path)

# Load checkpoint
checkpoint_path = hf_hub_download(
    repo_id="JulianKrgd/JULIAN-100M-Instruct",
    filename="checkpoint_instruct.npz"
)

# Format prompt (ChatML)
prompt = """<|im_start|>user
What is machine learning?<|im_end|>
<|im_start|>assistant
"""

# Tokenize and generate...
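
The forward pass itself lives in the src/model/ code and is not reproduced in this card. As a rough sketch only, the snippet below continues from the variables defined above and runs greedy decoding; model_apply is a hypothetical stand-in for whatever forward function the model code exposes (assumed to return logits of shape [1, seq_len, vocab]), and it assumes <|im_end|> is a single tokenizer piece.

# Hedged sketch of greedy decoding, continuing from the snippet above.
params = dict(np.load(checkpoint_path))        # NPZ checkpoint -> dict of arrays
ids = tokenizer.Encode(prompt)                 # SentencePiece token ids (list of ints)
eos_id = tokenizer.PieceToId("<|im_end|>")     # assumes <|im_end|> is a single piece

for _ in range(128):
    # model_apply is hypothetical: the real forward call is defined in src/model/
    logits = model_apply(params, jnp.array([ids]))
    next_id = int(jnp.argmax(logits[0, -1]))   # greedy: pick the most likely token
    if next_id == eos_id:
        break
    ids.append(next_id)

print(tokenizer.Decode(ids))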

Prompt Format

The model uses ChatML format:

<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
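
A small helper like the following (not part of the released code) can assemble this format programmatically; only single-turn prompts are documented here, so multi-turn use should be treated as untested.

def build_chatml_prompt(turns):
    # turns: list of (role, content) pairs, e.g. [("user", "Bonjour !")]
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>\n" for role, content in turns]
    parts.append("<|im_start|>assistant\n")   # leave the assistant turn open for generation
    return "".join(parts)

prompt = build_chatml_prompt([("user", "What is machine learning?")])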

Limitations

This is a small educational model with significant limitations:

  • Factual errors: May generate incorrect information (e.g., "Paris is in New York")
  • Repetitions: Tends to repeat phrases in loops
  • Limited knowledge: 100M parameters cannot store much factual knowledge
  • Hallucinations: Confidently generates false statements
  • Not for production: This is a demonstration model, not a reliable assistant

For reliably usable responses, models in the 7B+ parameter range are generally recommended.

Training Details

  • Base model: JULIAN-100M (trained on 4.45B Wikipedia tokens)
  • Fine-tuning: Supervised Fine-Tuning (SFT) with loss masking
  • Loss masking: loss is computed only on assistant-response tokens (a minimal sketch follows this list)
  • Learning rate: 2e-5 with warmup
  • Batch size: 128 global
  • Hardware: Google Cloud TPU v5e-32 (32 cores)
  • Training time: ~17 minutes
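
A minimal sketch of this loss masking, assuming an optax-style per-token cross-entropy (the actual training script is not included in this repository): the loss is averaged only over positions flagged as assistant-response tokens, so prompt tokens contribute nothing to the gradient.

import jax.numpy as jnp
import optax  # assumption: any per-token cross-entropy helper would do

def masked_sft_loss(logits, targets, assistant_mask):
    # logits: [batch, seq, vocab]; targets, assistant_mask: [batch, seq]
    per_token = optax.softmax_cross_entropy_with_integer_labels(logits, targets)
    mask = assistant_mask.astype(jnp.float32)        # 1 on assistant tokens, 0 elsewhere
    return (per_token * mask).sum() / jnp.maximum(mask.sum(), 1.0)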

Model Architecture

- Vocabulary: 24,000 tokens (SentencePiece BPE)
- Embedding dim: 640
- Layers: 12
- Attention heads: 10
- MLP dim: 2560 (4× the embedding dim)
- Techniques: RMSNorm, RoPE, SwiGLU, bfloat16
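
As a rough sanity check on the 109M figure, the sizes above account for the parameter count if one assumes untied input/output embeddings and a three-matrix SwiGLU block; neither assumption is stated explicitly in this card.

vocab, d_model, n_layers, d_mlp = 24_000, 640, 12, 2560

embed   = vocab * d_model            # token embeddings
attn    = 4 * d_model * d_model      # Q, K, V, O projections per layer
mlp     = 3 * d_model * d_mlp        # SwiGLU gate, up, down projections per layer
lm_head = d_model * vocab            # output projection (assumed untied)

total = embed + n_layers * (attn + mlp) + lm_head
print(f"{total / 1e6:.1f}M")         # ~109.4M, consistent with the table above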

Files

  • checkpoint_instruct.npz - Model weights (NPZ format)
  • tokenizer/julian_24k.model - SentencePiece tokenizer
  • src/model/ - Model code (JAX/Flax)

Citation

@misc{julian-100m-instruct,
  author = {Julian Kerignard},
  title = {JULIAN-100M-Instruct: A Small Bilingual Instruction-Tuned Language Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/JulianKrgd/JULIAN-100M-Instruct}
}

Acknowledgments

  • Trained with support from the Google TPU Research Cloud (TRC) program
  • Based on instruction datasets from the open-source community

License

MIT License
