# Julian 600M - 40B Tokens

A 600M-parameter decoder-only language model trained from scratch on 39.3B tokens using JAX/Flax on Google Cloud TPUs.
## Model Description

Julian is a causal language model designed for text generation, trained on a mix of English (70%) and French (30%) data. The architecture follows modern best practices: RoPE positional embeddings, SwiGLU activations, and RMSNorm.
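To make those components concrete, here is a minimal Flax sketch of RMSNorm and a SwiGLU MLP. This is an illustration, not the released training code; the epsilon value and the bias-free projections are assumptions on my part.

```python
import jax.numpy as jnp
import flax.linen as nn


class RMSNorm(nn.Module):
    """Pre-norm RMSNorm: rescale activations by their root mean square."""
    dim: int
    eps: float = 1e-6  # assumed; not specified in the model card

    @nn.compact
    def __call__(self, x):
        scale = self.param("scale", nn.initializers.ones, (self.dim,))
        rms = jnp.sqrt(jnp.mean(jnp.square(x), axis=-1, keepdims=True) + self.eps)
        return x / rms * scale


class SwiGLU(nn.Module):
    """Gated MLP: down(silu(gate(x)) * up(x)), LLaMA-style."""
    dim: int      # 1280 for Julian 600M
    hidden: int   # 5120 for Julian 600M

    @nn.compact
    def __call__(self, x):
        gate = nn.Dense(self.hidden, use_bias=False)(x)
        up = nn.Dense(self.hidden, use_bias=False)(x)
        return nn.Dense(self.dim, use_bias=False)(nn.silu(gate) * up)
```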
### Architecture

| Component | Configuration |
|---|---|
| Parameters | 599.9M |
| Layers | 18 |
| Hidden Size | 1280 |
| Attention Heads | 16 |
| Head Dimension | 80 |
| Intermediate Size | 5120 (SwiGLU) |
| Vocabulary | 50,000 (SentencePiece) |
| Context Length | 2048 |
| Positional Encoding | RoPE (θ=10000) |
| Normalization | RMSNorm (pre-norm) |
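The parameter count follows directly from the table. A quick back-of-the-envelope check (assuming untied input/output embeddings and ignoring the small RMSNorm scales, both assumptions on my part):

```python
vocab, d_model, n_layers, d_ffn = 50_000, 1280, 18, 5120

embed   = vocab * d_model         # input embedding table
lm_head = vocab * d_model         # output projection (assumed untied)
attn    = 4 * d_model * d_model   # Q, K, V, O projections per layer
mlp     = 3 * d_model * d_ffn     # SwiGLU: gate, up, down per layer

total = embed + lm_head + n_layers * (attn + mlp)
print(f"{total / 1e6:.1f}M")      # 599.9M — matches the table
```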
## Training Details

| Metric | Value |
|---|---|
| Total Tokens | 39.32B |
| Training Steps | 300,000 |
| Batch Size | 256 (global) |
| Learning Rate | 3e-4 → 3e-5 (cosine decay) |
| Hardware | TPU v5litepod-32 |
| Framework | JAX + Flax |
| Precision | bfloat16 |
| Final Loss | 2.33 |
| Final Perplexity | 10.3 |
## Training Data

| Source | Proportion | Tokens |
|---|---|---|
| Wikipedia EN | ~25% | ~10B |
| Wikipedia FR | ~10% | ~4B |
| OSCAR (EN/FR) | ~40% | ~16B |
| The Stack (Code) | ~15% | ~6B |
| Gutenberg Books | ~10% | ~4B |
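As an illustration of how these weights could drive a sampling-based data mixture: the actual pipeline is not described here, so the source keys and the sampling scheme below are hypothetical.

```python
import random

# Hypothetical mixture weights taken from the table above.
mixture = {
    "wikipedia_en": 0.25,
    "wikipedia_fr": 0.10,
    "oscar_en_fr":  0.40,
    "the_stack":    0.15,
    "gutenberg":    0.10,
}

# Draw the source of each training document according to its weight;
# over many draws the realized token shares approach the targets.
picks = random.choices(list(mixture), weights=list(mixture.values()), k=100_000)
```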
## Benchmark Results

Evaluated using lm-evaluation-harness (0-shot).

| Benchmark | Score |
|---|---|
| HellaSwag | 53.5% |
| PIQA | 66.8% |
| LAMBADA | 37.3% |
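These scores should be reproducible through the harness's Python entry point. The invocation below is a sketch; the exact LAMBADA task variant used is not stated, so `lambada_openai` is an assumption.

```python
import lm_eval

# 0-shot evaluation via lm-evaluation-harness (v0.4+ API).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=JulianKrgd/julian-600m-40b",
    tasks=["hellaswag", "piqa", "lambada_openai"],  # LAMBADA variant assumed
    num_fewshot=0,
)
print(results["results"])
```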
## Comparison with Similar Models

| Model | Params | Tokens | HellaSwag | PIQA | Year |
|---|---|---|---|---|---|
| SmolLM2-135M | 135M | 2T | 43.3% | 67.4% | 2025 |
| Pythia-410M | 410M | 300B | 40.9% | 66.8% | 2023 |
| SmolLM2-360M | 360M | 4T | 54.5% | 71.7% | 2025 |
| Qwen2.5-0.5B | 490M | ~18T | 52.1% | 69.9% | 2024 |
| Julian 600M | 600M | 39B | 53.5% | 66.8% | 2025 |
| Qwen3-0.6B-Base | 600M | 36T | 41.1% | 70.0% | 2025 |
| Pythia-1B | 1B | 300B | 49.7% | 70.7% | 2023 |
| SmolLM2-1.7B | 1.7B | 11T | 68.7% | 76.9% | 2025 |
💡 Key insight: with only 39B training tokens (≈65 tokens per parameter), Julian 600M surpasses Qwen3-0.6B-Base on HellaSwag (53.5% vs 41.1%, same 600M parameter class but trained on 36T tokens) and comes within a point of SmolLM2-360M (54.5%, trained on 4T tokens). That is 100-900× less training data than these baselines, highlighting strong data efficiency.
Sources: SmolLM2 (HuggingFace, 2025), Qwen3 (Alibaba, 2025), Qwen2.5 (Alibaba, 2024), Pythia (EleutherAI, 2023)
## Usage

### With Transformers (after conversion)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("JulianKrgd/julian-600m-40b")
tokenizer = AutoTokenizer.from_pretrained("JulianKrgd/julian-600m-40b")

prompt = "La France est"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With JAX/Flax (native)

```python
import jax.numpy as jnp
import orbax.checkpoint as ocp

from julian_model import JulianLM, JULIAN_600M

# Instantiate the model definition and restore the trained weights.
model = JulianLM(JULIAN_600M)
checkpointer = ocp.PyTreeCheckpointer()
params = checkpointer.restore("path/to/checkpoint")

# Forward pass over a batch of token ids (standard Flax `apply` convention;
# the exact call signature depends on julian_model's implementation).
logits = model.apply({"params": params}, jnp.ones((1, 16), dtype=jnp.int32))
```
## Training Curves

### Loss Progression

| Step | Tokens | Loss | Perplexity |
|---|---|---|---|
| 0 | 0 | 10.5 | 36,316 |
| 50K | 6.5B | 3.20 | 24.5 |
| 100K | 13.1B | 2.70 | 14.9 |
| 150K | 19.6B | 2.50 | 12.2 |
| 200K | 26.2B | 2.40 | 11.0 |
| 250K | 32.8B | 2.35 | 10.5 |
| 300K | 39.3B | 2.33 | 10.3 |
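The perplexity column is simply the exponential of the cross-entropy loss, so the two columns can be checked against each other:

```python
import math

for loss in (10.5, 3.20, 2.70, 2.50, 2.40, 2.35, 2.33):
    print(f"loss {loss:.2f} -> ppl {math.exp(loss):,.1f}")
# e.g. exp(2.33) ≈ 10.3 and exp(10.5) ≈ 36,316, matching the table.
```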
### Compute Budget

| Metric | Value |
|---|---|
| TPU Type | v5litepod-32 |
| TPU Hours | ~120h |
| Total FLOPs | ~2.4e19 |
| Throughput | ~1.1M tok/s |
| Training Time | 5 days |
## Training Configuration

```yaml
learning_rate: 3e-4 → 3e-5 (cosine decay)
warmup_steps: 3000
batch_size: 256 (global)
sequence_length: 2048
weight_decay: 0.1
gradient_clipping: 1.0
precision: bfloat16
optimizer: AdamW
beta1: 0.9
beta2: 0.95
epsilon: 1e-8
```
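In optax terms (the document's JAX stack), the configuration above corresponds roughly to the following. This is a sketch of the schedule and optimizer, not the released training code; the warmup start value of 0.0 is an assumption.

```python
import optax

# Linear warmup over 3,000 steps to 3e-4, then cosine decay to 3e-5
# across the 300,000-step run.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,        # warmup start value: assumed
    peak_value=3e-4,
    warmup_steps=3_000,
    decay_steps=300_000,
    end_value=3e-5,
)

optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),  # gradient_clipping: 1.0
    optax.adamw(schedule, b1=0.9, b2=0.95, eps=1e-8, weight_decay=0.1),
)
```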
## Julian Model Family

| Model | Type | Training | HellaSwag | PIQA | LAMBADA | Status |
|---|---|---|---|---|---|---|
| julian-600m-10b | Base | 10B tokens | 45.8% | 67.6% | 35.0% | ✅ Released |
| julian-600m-40b | Base | 39B tokens | 53.5% | 66.8% | 37.3% | ✅ Current |
| julian-600m-10b-instruct-v0.1 | SFT | 10B + 185K ex | 42.7% | 66.2% | 34.6% | ✅ Released |
| julian-600m-40b-sft-2.5M | SFT | 39B + 2.5M ex | ~ | ~ | ~ | 🔄 Coming |
Limitations
- Context Length: Limited to 2048 tokens
- Languages: Primarily English and French
- Benchmarks: Evaluated on HellaSwag, PIQA, LAMBADA
- Safety: Not instruction-tuned or safety-aligned
## Citation

```bibtex
@misc{julian2025,
  author    = {Julian Kerignard},
  title     = {Julian: A 600M Parameter Language Model Trained on 40B Tokens},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/JulianKrgd/julian-600m-40b}
}
```
## License

Apache 2.0
## Acknowledgments
- Google Cloud TPU Research Program for compute resources
- JAX/Flax team for the excellent ML framework
- Hugging Face for model hosting