# JULIAN-100M-Instruct

Instruction-tuned version of JULIAN-100M: a 109M-parameter bilingual (English/French) language model fine-tuned for instruction following.
## Model Description
JULIAN-100M-Instruct is the instruction-tuned version of JULIAN-100M, fine-tuned on ~143K instruction-response pairs in English and French.
| Attribute | Value |
|---|---|
| Parameters | 109 million |
| Architecture | GPT-style decoder-only transformer |
| Context Length | 2048 tokens |
| Languages | English (65%), French (35%) |
| Training | TPU v5e-32 with JAX/Flax |
| Fine-tuning | 3000 steps, ~2.7 epochs |
## Training Data
Fine-tuned on 142,812 instruction-response pairs:
| Dataset | Examples | Language |
|---|---|---|
| Alpaca | 49,613 | English |
| Code Alpaca | 19,520 | English |
| Dolly | 14,537 | English |
| GPT4All | 9,370 | English |
| Alpaca French | 49,772 | French |
## Usage

### With JAX/Flax
```python
import jax.numpy as jnp
import numpy as np
import sentencepiece as spm
from huggingface_hub import hf_hub_download

# Load the SentencePiece tokenizer
tokenizer_path = hf_hub_download(
    repo_id="JulianKrgd/JULIAN-100M-Instruct",
    filename="tokenizer/julian_24k.model",
)
tokenizer = spm.SentencePieceProcessor()
tokenizer.Load(tokenizer_path)

# Download the model checkpoint (NPZ weights)
checkpoint_path = hf_hub_download(
    repo_id="JulianKrgd/JULIAN-100M-Instruct",
    filename="checkpoint_instruct.npz",
)

# Format the prompt (ChatML)
prompt = """<|im_start|>user
What is machine learning?<|im_end|>
<|im_start|>assistant
"""

# Tokenize and generate...
```
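The final step is elided above. As a minimal sketch, greedy decoding could look like the following; the `model` object and its `apply` signature are assumptions about the code in `src/model/`, not a documented API:

```python
# Minimal greedy-decoding sketch. `model` is assumed to be a Flax module
# built from the code in src/model/; its apply() signature is a guess.
params = dict(np.load(checkpoint_path))  # NPZ file -> dict of weight arrays

ids = tokenizer.Encode(prompt)
eos_id = tokenizer.PieceToId("<|im_end|>")

for _ in range(128):  # cap on newly generated tokens
    logits = model.apply(params, jnp.array([ids]))  # assumed shape: (1, seq, vocab)
    next_id = int(jnp.argmax(logits[0, -1]))
    if next_id == eos_id:
        break
    ids.append(next_id)

print(tokenizer.Decode(ids))
```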
## Prompt Format

The model uses ChatML format:

```
<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
```
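A small helper (not part of this repo, just a convenience sketch) to build prompts in this format:

```python
def format_chatml(user_message: str) -> str:
    """Wrap a user message in ChatML, leaving the assistant turn open."""
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = format_chatml("Qu'est-ce que le machine learning ?")  # works in French too
```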
## Limitations
This is a small educational model with significant limitations:
- Factual errors: May generate incorrect information (e.g., "Paris is in New York")
- Repetitions: Tends to repeat phrases in loops
- Limited knowledge: 100M parameters cannot store much factual knowledge
- Hallucinations: Confidently generates false statements
- Not for production: This is a demonstration model, not a reliable assistant
Recommended minimum for usable responses: 7B+ parameters
## Training Details
- Base model: JULIAN-100M (trained on 4.45B Wikipedia tokens)
- Fine-tuning: Supervised Fine-Tuning (SFT) with loss masking
- Loss masking: loss computed only on assistant-response tokens (see the sketch after this list)
- Learning rate: 2e-5 with warmup
- Batch size: 128 global
- Hardware: Google Cloud TPU v5e-32 (32 cores)
- Training time: ~17 minutes
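As an illustration of the loss masking referenced above, here is a minimal JAX/Optax sketch. The function and variable names are illustrative, and the warmup length is an assumption (the card only states "2e-5 with warmup"):

```python
import jax.numpy as jnp
import optax

def masked_loss(logits, targets, loss_mask):
    """Cross-entropy averaged only over positions where loss_mask == 1,
    i.e. over assistant-response tokens; prompt tokens contribute nothing."""
    per_token = optax.softmax_cross_entropy_with_integer_labels(logits, targets)
    return (per_token * loss_mask).sum() / jnp.maximum(loss_mask.sum(), 1.0)

# Linear warmup to the stated peak LR of 2e-5 (warmup length assumed)
schedule = optax.linear_schedule(init_value=0.0, end_value=2e-5, transition_steps=100)
optimizer = optax.adamw(learning_rate=schedule)
```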
## Model Architecture
- Vocabulary: 24,000 tokens (SentencePiece BPE)
- Embedding dim: 640
- Layers: 12
- Attention heads: 10
- MLP dim: 2560 (4x)
- Techniques: RMSNorm, RoPE, SwiGLU, bfloat16 (RMSNorm and SwiGLU sketched below)
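For reference, minimal Flax sketches of two of these components, using the dimensions from the list above. These are generic textbook implementations, not the repo's exact code:

```python
import jax.numpy as jnp
import flax.linen as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescale by the RMS, no mean subtraction."""
    dim: int = 640
    eps: float = 1e-6

    @nn.compact
    def __call__(self, x):
        scale = self.param("scale", nn.initializers.ones, (self.dim,))
        rms = jnp.sqrt(jnp.mean(jnp.square(x), axis=-1, keepdims=True) + self.eps)
        return x / rms * scale

class SwiGLU(nn.Module):
    """Gated MLP: silu(gate) * up, then project back to the model dim."""
    dim: int = 640      # embedding dim
    hidden: int = 2560  # MLP dim (4x)

    @nn.compact
    def __call__(self, x):
        gate = nn.Dense(self.hidden, use_bias=False)(x)
        up = nn.Dense(self.hidden, use_bias=False)(x)
        return nn.Dense(self.dim, use_bias=False)(nn.silu(gate) * up)
```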
## Files

- `checkpoint_instruct.npz` - Model weights (NPZ format)
- `tokenizer/julian_24k.model` - SentencePiece tokenizer
- `src/model/` - Model code (JAX/Flax)
## Citation

```bibtex
@misc{julian-100m-instruct,
  author = {Julian Kerignard},
  title = {JULIAN-100M-Instruct: A Small Bilingual Instruction-Tuned Language Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/JulianKrgd/JULIAN-100M-Instruct}
}
```
## Acknowledgments
- Trained with Google TPU Research Cloud (TRC) program
- Based on instruction datasets from the open-source community
## License
MIT License