Julian 600M - 40B Tokens
A 600M parameter decoder-only language model trained from scratch on 39.3B tokens using JAX/Flax on Google Cloud TPUs.
Model Description
Julian is a causal language model designed for text generation, trained on a mix of English (70%) and French (30%) data. The architecture follows modern best practices with RoPE positional embeddings, SwiGLU activations, and RMSNorm.
Architecture
| Component | Configuration |
|---|---|
| Parameters | 599.9M |
| Layers | 18 |
| Hidden Size | 1280 |
| Attention Heads | 16 |
| Head Dimension | 80 |
| Intermediate Size | 5120 (SwiGLU) |
| Vocabulary | 50,000 (SentencePiece) |
| Context Length | 2048 |
| Positional Encoding | RoPE (θ=10000) |
| Normalization | RMSNorm (pre-norm) |
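The 599.9M figure can be roughly reproduced from the other rows of the table. The sketch below is a back-of-the-envelope count, not the author's accounting: it assumes untied input and output embeddings, no bias terms, full multi-head attention (no grouped-query attention), and two RMSNorm layers per block plus a final one. None of these details are stated in the model card.

```python
# Rough parameter count from the architecture table.
# Assumptions (not stated in the model card): untied input/output
# embeddings, no bias terms, full multi-head attention (no GQA),
# two RMSNorm layers per block plus one final RMSNorm.
vocab_size = 50_000
hidden = 1280
layers = 18
intermediate = 5120

embedding = vocab_size * hidden          # input token embeddings
lm_head = vocab_size * hidden            # separate output projection
attention = 4 * hidden * hidden          # Q, K, V and output projections
swiglu = 3 * hidden * intermediate       # gate, up and down projections
norms = 2 * hidden                       # two RMSNorm scale vectors

per_layer = attention + swiglu + norms
total = embedding + lm_head + layers * per_layer + hidden  # + final norm

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,} (~{total / 1e6:.1f}M)")
```

Under these assumptions the count lands on the reported size, which suggests (but does not prove) that the embeddings are untied.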
Training Details
| Metric | Value |
|---|---|
| Total Tokens | 39.32B |
| Training Steps | 300,000 |
| Batch Size | 256 (global) |
| Learning Rate | 3e-4 → 3e-5 (cosine decay) |
| Hardware | TPU v5litepod-32 |
| Framework | JAX + Flax |
| Precision | bfloat16 |
| Final Loss | 2.33 |
| Final Perplexity | 10.3 |
Training Data
| Source | Proportion | Tokens |
|---|---|---|
| Wikipedia EN | ~25% | ~10B |
| Wikipedia FR | ~10% | ~4B |
| OSCAR (EN/FR) | ~40% | ~16B |
| The Stack (Code) | ~15% | ~6B |
| Gutenberg Books | ~10% | ~4B |
Benchmark Results
Evaluated using lm-evaluation-harness (0-shot).
| Benchmark | Score |
|---|---|
| HellaSwag | 53.5% |
| PIQA | 66.8% |
| LAMBADA | 37.3% |
Comparison with Similar Models
| Model | Params | Tokens | HellaSwag | PIQA | Year |
|---|---|---|---|---|---|
| SmolLM2-135M | 135M | 2T | 43.3% | 67.4% | 2025 |
| Pythia-410M | 410M | 300B | 40.9% | 66.8% | 2023 |
| SmolLM2-360M | 360M | 4T | 54.5% | 71.7% | 2025 |
| Qwen2.5-0.5B | 490M | ~18T | 52.1% | 69.9% | 2024 |
| Julian 600M | 600M | 39B | 53.5% | 66.8% | 2025 |
| Qwen3-0.6B-Base | 600M | 36T | 41.1% | 70.0% | 2025 |
| Pythia-1B | 1B | 300B | 49.7% | 70.7% | 2023 |
| SmolLM2-1.7B | 1.7B | 11T | 68.7% | 76.9% | 2025 |
💡 Key insight: With only 39B training tokens (about 65 tokens per parameter), Julian 600M scores higher on HellaSwag than Qwen3-0.6B-Base (53.5% vs. 41.1%; same 600M parameter count, trained on 36T tokens) and comes within one point of SmolLM2-360M (54.5%, trained on 4T tokens). It does so with roughly 100x to 900x less training data, which suggests strong data efficiency on this benchmark. On PIQA, Julian (66.8%) trails both models.
Sources: SmolLM2 (HuggingFace, 2025), Qwen3 (Alibaba, 2025), Qwen2.5 (Alibaba, 2024), Pythia (EleutherAI, 2023)
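The ratios quoted in the data-efficiency comparison follow directly from the token and parameter counts in the tables above. A minimal sketch of that arithmetic, using only figures from this card (the variable names are illustrative):

```python
# Figures copied from the tables in this model card.
julian_tokens = 39.32e9
julian_params = 599.9e6
smollm2_360m_tokens = 4e12
qwen3_06b_tokens = 36e12

# Tokens seen per model parameter during pretraining.
tokens_per_param = julian_tokens / julian_params

# How many times more training tokens the comparison models saw.
vs_smollm2 = smollm2_360m_tokens / julian_tokens
vs_qwen3 = qwen3_06b_tokens / julian_tokens

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"SmolLM2-360M saw {vs_smollm2:.0f}x more tokens")
print(f"Qwen3-0.6B-Base saw {vs_qwen3:.0f}x more tokens")
```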
Usage
With Transformers (after conversion)
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the converted checkpoint and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("JulianKrgd/julian-600m-40b")
tokenizer = AutoTokenizer.from_pretrained("JulianKrgd/julian-600m-40b")
# Tokenize a French prompt and generate up to 100 new tokens
prompt = "La France est"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
# Decode the generated token IDs back to text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
With JAX/Flax (native)
import jax
from julian_model import JulianLM, JULIAN_600M
import orbax.checkpoint as ocp
# Load model
model = JulianLM(JULIAN_600M)
checkpointer = ocp.PyTreeCheckpointer()
params = checkpointer.restore("path/to/checkpoint")
# Generate
# See src/inference/generate.py for full example
Training Curves
Loss Progression
| Step | Tokens | Loss | Perplexity |
|---|---|---|---|
| 0 | 0 | 10.5 | 36,316 |
| 50K | 6.5B | 3.20 | 24.5 |
| 100K | 13.1B | 2.70 | 14.9 |
| 150K | 19.6B | 2.50 | 12.2 |
| 200K | 26.2B | 2.40 | 11.0 |
| 250K | 32.8B | 2.35 | 10.5 |
| 300K | 39.3B | 2.33 | 10.3 |
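The perplexity column is the exponential of the loss column, assuming the loss is the mean cross-entropy per token in nats (the table's values are consistent with that reading). A short sketch that recomputes it from the losses listed above:

```python
import math

# (step, loss) pairs copied from the loss progression table.
losses = {0: 10.5, 50_000: 3.20, 100_000: 2.70, 150_000: 2.50,
          200_000: 2.40, 250_000: 2.35, 300_000: 2.33}

# Perplexity is exp(mean cross-entropy loss in nats per token).
perplexity = {step: math.exp(loss) for step, loss in losses.items()}

for step, ppl in perplexity.items():
    print(f"step {step:>7,}: loss {losses[step]:.2f} -> perplexity {ppl:,.1f}")
```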
Compute Budget
| Metric | Value |
|---|---|
| TPU Type | v5litepod-32 |
| TPU Hours | ~120h |
| Total FLOPs | ~2.4e19 |
| Throughput | ~1.1M tok/s |
| Training Time | 5 days |
Training Configuration
# Hyperparameters
learning_rate: 3e-4 → 3e-5 (cosine decay)
warmup_steps: 3000
batch_size: 256 (global)
sequence_length: 2048
weight_decay: 0.1
gradient_clipping: 1.0
precision: bfloat16
# Optimizer
optimizer: AdamW
beta1: 0.9
beta2: 0.95
epsilon: 1e-8
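The learning-rate settings above describe a warmup followed by a cosine decay from 3e-4 to 3e-5. The sketch below shows one common way to write such a schedule in plain Python. It assumes a linear warmup starting from 0 and a cosine that spans all remaining steps; the card specifies neither detail, so the actual training schedule may differ. The function name `learning_rate` is illustrative.

```python
import math

PEAK_LR = 3e-4
FINAL_LR = 3e-5
WARMUP_STEPS = 3_000
TOTAL_STEPS = 300_000

def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to FINAL_LR.

    Assumes warmup starts at 0 and the cosine spans the remaining steps;
    the model card does not specify either detail.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # hold at FINAL_LR past the last step
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine

for step in (0, 1_500, 3_000, 150_000, 300_000):
    print(f"step {step:>7,}: lr = {learning_rate(step):.2e}")
```

In a JAX training loop the same shape is usually obtained from a schedule helper in the optimizer library rather than hand-written, but the plain function makes the endpoints easy to check.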
Julian Model Family
| Model | Type | Training | HellaSwag | PIQA | LAMBADA | Status |
|---|---|---|---|---|---|---|
| julian-600m-10b | Base | 10B tokens | 45.8% | 67.6% | 35.0% | ✅ Released |
| julian-600m-40b | Base | 39B tokens | 53.5% | 66.8% | 37.3% | ✅ Current |
| julian-600m-10b-instruct-v0.1 | SFT | 10B + 185K ex | 42.7% | 66.2% | 34.6% | ✅ Released |
| julian-600m-40b-sft-2.5M | SFT | 39B + 2.5M ex | ~ | ~ | ~ | 🔄 Coming |
Limitations
- Context Length: Limited to 2048 tokens
- Languages: Primarily English and French
- Benchmarks: Evaluated only on HellaSwag, PIQA, and LAMBADA (0-shot); other capabilities are untested
- Safety: Not instruction-tuned or safety-aligned
Citation
@misc{julian2025,
author = {Julian Kerignard},
title = {Julian: A 600M Parameter Language Model Trained on 40B Tokens},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/JulianKrgd/julian-600m-40b}
}
License
Apache 2.0
Acknowledgments
- Google Cloud TPU Research Program for compute resources
- JAX/Flax team for the excellent ML framework
- Hugging Face for model hosting