Instructions to use EssentialAI/rnj-1.5-instruct with libraries, inference servers, and local apps.

Libraries

How to use EssentialAI/rnj-1.5-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EssentialAI/rnj-1.5-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EssentialAI/rnj-1.5-instruct")
model = AutoModelForCausalLM.from_pretrained("EssentialAI/rnj-1.5-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use EssentialAI/rnj-1.5-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EssentialAI/rnj-1.5-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EssentialAI/rnj-1.5-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
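
The same request can be issued from Python. The following is a minimal sketch that uses only the standard library and assumes the server started by `vllm serve` is listening on `http://localhost:8000`, as in the curl call above. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # address used by the curl example above
MODEL = "EssentialAI/rnj-1.5-instruct"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the model's reply as a string. Passing `base_url="http://localhost:30000/v1"` targets the SGLang server configured further down.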

Use Docker

docker model run hf.co/EssentialAI/rnj-1.5-instruct

SGLang

How to use EssentialAI/rnj-1.5-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EssentialAI/rnj-1.5-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EssentialAI/rnj-1.5-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EssentialAI/rnj-1.5-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EssentialAI/rnj-1.5-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use EssentialAI/rnj-1.5-instruct with Docker Model Runner:
```
docker model run hf.co/EssentialAI/rnj-1.5-instruct
```

Rnj-1.5

EssentialAI

We introduce rnj-1.5-instruct, a long-context follow-up to rnj-1-instruct that extends the context window from 32k to 160k. For more context and details about the Rnj-1 family, and rnj-1-instruct in particular, please see this page and our blog.

rnj-1.5-instruct extends rnj-1's long context abilities beyond 32k, scoring 77% on RULER on a 128k context window. This release also offers stronger coding abilities on a wider range of harnesses. We improve our SWE-Bench Verified performance on mini-swe-agent by 5% and we outperform, by a significant margin, the best known 8B model results on the SWE-Agent harness, achieving a 40% resolve rate.

The improvements in rnj-1.5 emerge from our work in a few key areas:

Architecture: To reduce the inference compute and storage costs of global self-attention, which grow with sequence length, we explore block-local attention layers [5, 6], which carry a fixed compute and storage cost per position. Following [4], we interleave block-local self-attention with global attention, keeping most layers local and using a few global layers to enhance associative interactions over long distances. After careful ablations, we arrived at the local-global layer pattern LLLGLLLGLLLGLGGGGGLGLLLGLLLGLLLL, where L and G stand for block-local and global self-attention layers respectively. This pattern clusters global layers in the middle and retains the model's global attention capabilities while gaining the latency benefits of local self-attention. Our results support the findings from GLM-5 [1].
Evals: We found that RULER's needle-in-a-haystack (NIAH) task has two fundamental issues. (a) Its predominant focus on text-based evaluation overlooks the long-context abilities needed for coding tasks. (b) The needle is a foreign random string in a coherent English essay, which shifts the task from identifying an indistinguishable needle to picking the "odd one out." We therefore created a granular NIAH eval, called "lookback evals," from GitHub repositories. In lookback evals, the needles are semantically camouflaged within the haystack. We add a canary marker to the needles to distinguish recall from the model's weights from retrieval from the context. For granular insight into long-context abilities, we bucket performance by the needle's distance from the end of the context (0–8k, 8k–16k, 16k–32k). Rnj-1.5's performance remains consistent across buckets, showcasing stronger retrieval capabilities throughout the long context.
Long context mid-training data:
1. STEM: We convert a large collection of science-focused PDF documents into text using olmOCR 2 [3], which gave the strongest results on OCR benchmarks, unlocking ~75B high-quality tokens.
2. Code: Prior work [2] has shown that training on repository-level documents improves long-context code understanding. Following this approach, we concatenate files from code repositories in lexicographic order to expose the model to longer, repository-scale code contexts during mid-training. We also include repository-level fill-in-the-middle examples, using surrounding files and broader project structure to support infilling.
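
The repository-level concatenation described in the Code item can be sketched as follows. This is an illustration only: the `# file:` header and the blank-line separator are hypothetical choices, since the exact document format is not stated here, and `concat_repository` is an illustrative name.

```python
def concat_repository(files: dict[str, str]) -> str:
    """Join a repository's files into one document, ordered by path.

    `files` maps a relative path to that file's text. Paths are sorted
    lexicographically, matching the ordering described in the text.
    """
    parts = []
    for path in sorted(files):
        # Hypothetical separator: a comment line naming the file.
        parts.append(f"# file: {path}\n{files[path]}")
    return "\n\n".join(parts)


repo = {
    "src/b.py": "def b(): ...",
    "README.md": "Example project",
    "src/a.py": "def a(): ...",
}
print(concat_repository(repo))
```

Note that plain lexicographic sorting orders uppercase ASCII letters before lowercase ones, so `README.md` precedes `src/a.py` in this example.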
Long context tasks:
1. QA needles: Similar to the data created in [8], we take a document, split it into paragraphs, and select a range of paragraphs adding up to 2k tokens. We then ask a strong model to create a QA pair that tests reading comprehension. The task is to generate the answer given the question and the document. We pack the pairs.
2. Practice Problems: We create synthetic QA pairs with reasoning from practice problems in our PDFs. The context contains complete paragraphs, always ending with the page preceding the practice problem page.
3. Paragraph Swaps: Similar to the permutation data created in [9], we take (two|three) paragraphs in the document and (swap them|permute them so that each paragraph is in one of the others' places). If they were put back in their proper locations, the document would read more coherently and consistently. The task is to determine which paragraphs were permuted and, without explanation, output (the paragraph|the first five words of the paragraph) that should appear (earlier|later|first|second|third) in the document.
4. Common Word Extraction (Frequency Thresholds): Given a document (list of words), the task is to output a list of all words (longer than four characters) that occur with frequency above some threshold, sorted in alphabetical order. The threshold is chosen randomly, subject to the constraint that the resulting list of words is not too long (≤ 10 words). Our task has some conceptual similarities to the CWE synthetic aggregation task used in [7]; however, our task does not require prompting a model to generate QA pairs and is therefore simpler yet effective.
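
As an illustration of the Common Word Extraction task, the sketch below computes the expected answer for a word list and picks a random threshold that keeps the answer short. It is a minimal reading of the description above, not the authors' pipeline. The function names are illustrative, and the requirement that the answer be non-empty is an added assumption.

```python
import random
from collections import Counter
from typing import Optional


def common_words(words: list[str], threshold: int) -> list[str]:
    """Words longer than four characters occurring more than `threshold` times."""
    counts = Counter(w for w in words if len(w) > 4)
    return sorted(w for w, c in counts.items() if c > threshold)


def pick_threshold(
    words: list[str],
    max_answers: int = 10,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Choose a random threshold whose answer has 1..max_answers words.

    Returns None when no threshold satisfies the constraint. Requiring a
    non-empty answer is an assumption made for this sketch.
    """
    rng = rng or random.Random()
    counts = Counter(w for w in words if len(w) > 4)
    highest = max(counts.values(), default=0)
    candidates = [
        t for t in range(highest)
        if 1 <= len(common_words(words, t)) <= max_answers
    ]
    return rng.choice(candidates) if candidates else None


document = ["apple"] * 3 + ["banana"] * 2 + ["tree"] * 5 + ["cherry"]
threshold = pick_threshold(document, rng=random.Random(0))
print(threshold, common_words(document, threshold))
```

The ground truth is computed directly from word counts, so no model call is needed to label an example.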
Expanded SWE trajectories: We created a synthetic task generation pipeline that allowed us to mine ~200k SWE tasks from 3500 GitHub repositories. Each task consists of the issue description, a dockerized repository at the failing commit, and pass-to-pass and fail-to-pass tests. Three teacher models were used to generate ~600k synthetic trajectories in the mini-swe-agent environment, including both resolved and unresolved trajectories.
Software infrastructure: For vLLM compatibility, we implement our block-local self-attention modifications in Triton.
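
To make the layer pattern from the Architecture item concrete, the sketch below counts local and global layers and gives a rough estimate of KV-cache size relative to an all-global stack. The cache model is an assumption for illustration: it treats a block-local layer as keeping about `block_size` positions and a global layer as keeping all of them. The `block_size` value in the usage line is hypothetical and is not taken from the model card.

```python
from collections import Counter

# Layer pattern quoted in the Architecture item above.
LAYER_PATTERN = "LLLGLLLGLLLGLGGGGGLGLLLGLLLGLLLL"


def layer_counts(pattern: str) -> Counter:
    """Count block-local (L) and global (G) layers."""
    return Counter(pattern)


def global_layer_indices(pattern: str) -> list[int]:
    """Zero-based indices of the global-attention layers."""
    return [i for i, kind in enumerate(pattern) if kind == "G"]


def relative_kv_cache(pattern: str, seq_len: int, block_size: int) -> float:
    """Estimated KV-cache size relative to an all-global stack.

    Assumption (not from the model card): a local layer caches roughly
    min(seq_len, block_size) positions; a global layer caches seq_len.
    """
    counts = layer_counts(pattern)
    mixed = counts["G"] * seq_len + counts["L"] * min(seq_len, block_size)
    return mixed / (len(pattern) * seq_len)


print(layer_counts(LAYER_PATTERN))
print(global_layer_indices(LAYER_PATTERN))
# block_size=4096 is a hypothetical value chosen only for illustration.
print(relative_kv_cache(LAYER_PATTERN, seq_len=131072, block_size=4096))
```

The pattern has 32 layers, 21 local and 11 global, with five consecutive global layers at indices 13 through 17.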

Benchmark Results

Instruct Model `rnj-1.5-instruct`

SWE-Bench Verified number for rnj-1.5-instruct is reported with the SWE-Agent harness. With the mini-swe-agent harness, we report a 25% resolve rate. RULER (128k) is overall RULER score at 128k context length.

How to use

See the Rnj 1.0 model card. You only need to replace rnj-1.0-instruct with rnj-1.5-instruct. We added support for the model to vLLM in v0.20.0.
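
Because vLLM support starts at v0.20.0, it can help to check the installed version before serving. The following is a small sketch using only the standard library. The helper names are illustrative, and the version parsing is deliberately simple: it treats a pre-release such as `0.20.0rc1` as equal to `0.20.0`.

```python
import re
from importlib import metadata
from typing import Optional

MIN_VLLM = (0, 20, 0)  # first vLLM version with support, per the text above


def parse_version(version: str) -> tuple[int, ...]:
    """Leading numeric components, padded to three: '0.20' -> (0, 20, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def vllm_is_new_enough(version: Optional[str] = None) -> bool:
    """True if the given (or installed) vLLM version is at least MIN_VLLM."""
    if version is None:
        try:
            version = metadata.version("vllm")
        except metadata.PackageNotFoundError:
            return False  # vLLM is not installed in this environment
    return parse_version(version) >= MIN_VLLM


print(vllm_is_new_enough())
```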

Known limitations

See the Rnj 1.0 model card. In addition, we optimized our model for long-context comprehension rather than long-context generation. Consequently, we have occasionally seen junk tokens appear in extremely long generations.

License

This repository and the model weights are licensed under the Apache License, Version 2.0 (Apache 2.0).

Contact

We welcome your questions and feedback. You can contact us at info@essential.ai.

References

[1] A. Zeng et al., "GLM-5: From Vibe Coding to Agentic Engineering," arXiv preprint arXiv:2602.15763, 2026. https://arxiv.org/abs/2602.15763

[2] B. Hui et al., "Qwen2.5-Coder Technical Report," arXiv preprint arXiv:2409.12186, 2024. https://arxiv.org/abs/2409.12186

[3] J. Poznanski et al., "olmOCR 2: Unit Test Rewards for Document OCR," arXiv preprint arXiv:2510.19817, 2025. https://arxiv.org/abs/2510.19817

[4] I. Beltagy et al., "Longformer: The Long-Document Transformer," arXiv preprint arXiv:2004.05150, 2020. https://arxiv.org/abs/2004.05150

[5] N. Parmar et al., "Image Transformer," International Conference on Machine Learning, PMLR, 2018.

[6] A. Vaswani et al., "Scaling Local Self-Attention for Parameter Efficient Visual Backbones," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.

[7] A. Ettinger et al., "Olmo 3," arXiv preprint arXiv:2512.13961, 2026. https://arxiv.org/abs/2512.13961

[8] A. Dubey et al., "The Llama 3 Herd of Models," arXiv preprint arXiv:2407.21783, 2024. https://arxiv.org/abs/2407.21783

[9] A. Yang et al., "Qwen2.5-1M Technical Report," arXiv preprint arXiv:2501.15383, 2025. https://arxiv.org/abs/2501.15383

Citation

@misc{rnj1_5_instruct,
  title  = {{Rnj-1-5-Instruct}},
  author = {{{Essential AI:}} Mike Callahan and Adarsh Chaluvaraju and Aleksa Gordić and Devaansh Gupta and Yash Jain and Philip Monk and Michael Pust and Tim Romanski and Peter Rushton and Ali Shehper and Divya Shivaprasad and Saurabh Srivastava and Anil Thomas and Alok Tripathy and Ameya Velingker and Ashish Vaswani},
  year   = {2026},
  url    = {https://huggingface.co/EssentialAI/rnj-1.5-instruct},
  note   = {Long-context Instruction-tuned model release}
}

Safetensors

Model size: 8B params
Tensor type: F32

Model tree for EssentialAI/rnj-1.5-instruct

Base model: EssentialAI/rnj-1
Finetuned (6): this model
