arnomatic committed on
Commit
c906a90
·
verified ·
1 Parent(s): 61c31d7

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +438 -199
README.md CHANGED
@@ -1,199 +1,438 @@
1
- ---
2
- library_name: transformers
3
- tags: []
4
- ---
5
-
6
- # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
-
12
- ## Model Details
13
-
14
- ### Model Description
15
-
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
-
76
- ## Training Details
77
-
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
-
103
- ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
1
+ ---
2
+ license: apache-2.0
3
+ library_name: transformers
4
+ tags:
5
+ - heretic
6
+ - uncensored
7
+ - decensored
8
+ - abliterated
9
+ ---
10
+ # This is a decensored version of [EssentialAI/rnj-1-instruct](https://huggingface.co/EssentialAI/rnj-1-instruct), made using [Heretic](https://github.com/p-e-w/heretic) v1.1.0
11
+
12
+ ## Abliteration parameters
13
+
14
+ | Parameter | Value |
15
+ | :-------- | :---: |
16
+ | **direction_index** | 19.18 |
17
+ | **attn.o_proj.max_weight** | 1.41 |
18
+ | **attn.o_proj.max_weight_position** | 22.97 |
19
+ | **attn.o_proj.min_weight** | 1.37 |
20
+ | **attn.o_proj.min_weight_distance** | 15.60 |
21
+ | **mlp.down_proj.max_weight** | 1.06 |
22
+ | **mlp.down_proj.max_weight_position** | 25.18 |
23
+ | **mlp.down_proj.min_weight** | 0.47 |
24
+ | **mlp.down_proj.min_weight_distance** | 17.95 |
25
+
26
+ ## Performance
27
+
28
+ | Metric | This model | Original model ([EssentialAI/rnj-1-instruct](https://huggingface.co/EssentialAI/rnj-1-instruct)) |
29
+ | :----- | :--------: | :---------------------------: |
30
+ | **KL divergence** | 0.0689 | 0 *(by definition)* |
31
+ | **Refusals** | 11/100 | 94/100 |
32
+
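+ For reference, the KL divergence above measures how far the decensored model's next-token distribution drifts from the original model (lower means the model's behavior is otherwise preserved). Below is a minimal sketch of how such a per-token KL estimate could be computed; it is an illustration rather than Heretic's actual implementation, and both the probe prompt and the `decensored_id` placeholder are ours.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ original_id = "EssentialAI/rnj-1-instruct"
+ decensored_id = "<this-repo-id>"  # placeholder: the decensored checkpoint from this repo
+
+ tokenizer = AutoTokenizer.from_pretrained(original_id)
+ original = AutoModelForCausalLM.from_pretrained(original_id, torch_dtype=torch.bfloat16, device_map="auto")
+ decensored = AutoModelForCausalLM.from_pretrained(decensored_id, torch_dtype=torch.bfloat16, device_map="auto")
+
+ prompt = "Explain how binary search works."  # illustrative probe prompt
+ inputs = tokenizer(prompt, return_tensors="pt").to(original.device)
+
+ with torch.no_grad():
+     p_logits = original(**inputs).logits.float()    # original model
+     q_logits = decensored(**inputs).logits.float()  # decensored model
+
+ # Mean per-token KL(P_original || Q_decensored), averaged over prompt positions
+ kl = F.kl_div(
+     F.log_softmax(q_logits, dim=-1),  # input: log-probs of the decensored model
+     F.log_softmax(p_logits, dim=-1),  # target: log-probs of the original model
+     log_target=True,
+     reduction="none",
+ ).sum(-1).mean()
+ print(f"mean per-token KL divergence: {kl.item():.4f}")
+ ```
+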
33
+ -----
34
+
35
+ # Rnj-1
36
+
37
+ <p align="center">
38
+ <img src="https://raw.githubusercontent.com/Essential-AI/rnj-1-assets/refs/heads/main/assets/Essential%20Logo%20Color_Color_With%20Space.jpg" width=60% alt="EssentialAI">
39
+ </p>
40
+
41
+ <div align="center" style="line-height: 1;">
42
+
43
+ <!-- Website -->
44
+ <a href="https://essential.ai">
45
+ <img alt="Homepage"
46
+ style="vertical-align: middle;"
47
+ src="https://img.shields.io/badge/%F0%9F%8C%90%20Website-essential.ai-4b9fe1?color=4b9fe1&logoColor=white"/>
48
+ </a>
49
+
50
+ <!-- Blog / Research -->
51
+ <a href="https://www.essential.ai/research/rnj-1">
52
+ <img alt="Research Blog"
53
+ style="vertical-align: middle;"
54
+ src="https://img.shields.io/badge/🧠%20Research-rnj--1-7c5cff?color=7c5cff&logoColor=white"/>
55
+ </a>
56
+
57
+ <!-- HuggingFace -->
58
+ <a href="https://huggingface.co/collections/EssentialAI/rnj-1">
59
+ <img alt="Hugging Face"
60
+ style="vertical-align: middle;"
61
+ src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-rnj--1-ffc107?color=ffc107&logoColor=white"/>
62
+ </a>
63
+
64
+ <br>
65
+
66
+ <!-- Discord -->
67
+ <a href="https://discord.gg/VPEqUNg6tR">
68
+ <img alt="Discord"
69
+ style="vertical-align: middle;"
70
+ src="https://img.shields.io/badge/Discord-Essential%20AI-7289da?logo=discord&logoColor=white&color=7289da"/>
71
+ </a>
72
+
73
+ <!-- X / Twitter -->
74
+ <a href="https://x.com/essential_ai">
75
+ <img alt="Twitter Follow"
76
+ style="vertical-align: middle;"
77
+ src="https://img.shields.io/badge/Twitter-essential__ai-white?logo=x&logoColor=white"/>
78
+ </a>
79
+
80
+ <!-- Together AI -->
81
+ <a href="https://www.together.ai/models/rnj-1-instruct">
82
+ <img alt="Together AI"
83
+ style="vertical-align: middle;"
84
+ src="https://img.shields.io/badge/⚡%20TogetherAI-rnj--1--instruct-00c2a8?color=00c2a8&logoColor=white"/>
85
+ </a>
86
+
87
+ <!-- OpenRouter -->
88
+ <a href="https://openrouter.ai/essentialai/rnj-1-instruct">
89
+ <img alt="OpenRouter"
90
+ style="vertical-align: middle;"
91
+ src="https://img.shields.io/badge/OpenRouter-rnj--1--instruct-1a4b82?logo=openrouter&color=1a4b82&logoColor=white"/>
92
+ </a>
93
+
94
+ <br>
95
+ </div>
96
+
97
+ Rnj-1 is a family of 8B parameter open-weight, dense models trained from scratch by Essential AI, optimized for code and STEM with capabilities on par with SOTA open-weight models. These models perform well across a range of programming languages and boast strong agentic capabilities (e.g., inside agentic frameworks like mini-SWE-agent), while also excelling at tool-calling. They additionally exhibit strong capabilities in math and science. Herein, `rnj-1` refers to the base model, while `rnj-1-instruct` refers to the post-trained instruction tuned model.
98
+
99
+ # Capabilities
100
+
101
+ We evaluate Rnj-1 models against models of comparable size. In addition to accuracy, we also show the FLOPs used in pre-training for each model.
102
+
103
+ ### Benchmark Results
104
+
105
+ ### Base Model `rnj-1`
106
+
107
+ <p align="center">
108
+ <img src="https://raw.githubusercontent.com/Essential-AI/rnj-1-assets/refs/heads/main/assets/Base_Full_Table.png" width="100%" alt="Base Evals"/>
109
+ </p>
110
+
111
+ ### Instruct Model `rnj-1-instruct`
112
+
113
+ `rnj-1-instruct` is strong at code, math, and STEM tasks. It also performs well within agentic frameworks such as mini-swe-agent and has stellar tool use abilities.
114
+
115
+ <p align="center">
116
+ <img src="https://raw.githubusercontent.com/Essential-AI/rnj-1-assets/refs/heads/main/assets/Instruct_Full_Table.png" width="100%" alt="Instrcut Evals"/>
117
+
118
+ <sub><i>We report published numbers when possible, and when unavailable they are internal reproductions.
119
+ Pre-training FLOPs were estimated using 6nt, where n is the number of parameters and t is the token budget.
120
+ All Evals under the Env bucket were evaluated using mini-swe-agent (bash only) scaffolding.
121
+ GPT OSS 20B was evaluated with reasoning_effort=low.
122
+ Qwen 3 8B was evaluated with thinking turned off.</i></sub></p>
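+
+ As a worked example of the 6nt estimate: for `rnj-1` (8.3B parameters, 8.4T pre-training tokens), this gives roughly 6 × 8.3e9 × 8.4e12 ≈ 4.2e23 FLOPs, counting only the 8K-context pre-training stage.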
123
+
124
+ ### Rnj-1 models are designed to be extended
125
+
126
+ Both `rnj-1` and `rnj-1-instruct` are made available for the community to extend and build upon. We deliberately kept post-training limited to leave room for further specialization. As an indicator of the models’ untapped potential, we report `pass@{1,2,4,8}` (with T=0.2, n=8 generations) for hard codegen, agentic, and math benchmarks on `rnj-1-instruct`. These results illustrate the model’s potential for test-time scaling and further domain specialization. The base model can likewise be specialized to domains beyond those targeted in our post-training.
127
+
128
+ <p align="center">
129
+ <img src="https://raw.githubusercontent.com/Essential-AI/rnj-1-assets/refs/heads/main/assets/rnj-1-pass-at-k.png" width="80%" alt="Pass at k evals"/>
130
+ </p>
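+
+ For context, `pass@k` from n sampled generations is typically computed with the standard unbiased estimator from the Codex paper (Chen et al., 2021); a minimal sketch is below. Whether the evaluation here uses this estimator or simple best-of-k selection is an assumption on our part.
+
+ ```python
+ from math import comb
+
+ def pass_at_k(n: int, c: int, k: int) -> float:
+     """Unbiased pass@k estimate given n samples of which c are correct."""
+     if n - c < k:
+         return 1.0
+     return 1.0 - comb(n - c, k) / comb(n, k)
+
+ # Example: 8 generations per problem, 3 of which pass the tests
+ print(pass_at_k(n=8, c=3, k=1))  # ≈ 0.375
+ print(pass_at_k(n=8, c=3, k=4))  # ≈ 0.929
+ ```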
131
+
132
+ Sidenote: Here is a [screen recording](https://vimeo.com/1143712958/c66dda13f3?share=copy&fl=sv&fe=ci) of `rnj-1-instruct` helping us make an early version of this chart.
133
+
134
+ ### Highlights of abilities
135
+
136
+ - **Code generation:** Both `rnj-1-instruct` and `rnj-1` demonstrate strong code generation abilities on tasks such as HumanEval+, MBPP+, BigCodeBench, and LiveCodeBench v6. Both models compete with the strongest open-weight models, sometimes outperforming even larger models such as GPT OSS 20B. We measured code comprehension with Crux-IO, the task of predicting inputs given outputs and vice versa, and find that our models outperform comparable baselines. For multi-lingual code generation, we measure MultiPL-E on 6 languages (C++, TypeScript, Java, JavaScript, Shell, PHP) and find performance close to the strongest model.
137
+ - **Agentic and Tool Use:** `rnj-1-instruct` dominates the pack on agentic coding, one of our target abilities. SWE-bench performance is indicative of the model’s ability to tackle everyday software engineering tasks. The model is an order of magnitude stronger than comparably sized models on SWE-bench and approaches the capabilities available in much larger models. It scores `20.8%` on SWE-bench Verified in bash-only mode, which is higher than Gemini 2.0 flash and Qwen2.5-Coder 32B Instruct under the same agentic framework ([leaderboard](https://www.swebench.com/bash-only.html)).<br><br>
138
+ There is a surge of interest in developing models’ abilities to write performant code. `rnj-1-instruct` is able to use a profiler to iteratively improve the performance of the code it writes. For instance, on [Enamel](https://github.com/q-rz/enamel/tree/main), which measures abilities to write efficient solutions to algorithmic problems, the model outperforms all other models under the same setting.<br><br>
139
+ Furthermore, `rnj-1-instruct` surpasses comparable models in tool-use performance as measured by the Berkeley Function Calling Leaderboard (BFCL).
140
+ - **Code Infilling:** Having been trained specifically on FIM-formatted pre-training data, `rnj-1` exhibits strong infilling abilities, which were further enhanced during post-training. The base model `rnj-1` scores 82.49% on HE-FIM-Python (avg), and `rnj-1-instruct` achieves 86.21%.
141
+ - **Mathematical Problem Solving:** `rnj-1-instruct` shows strong mathematical abilities across several levels of difficulty, from elementary math (GSM8K) through high-school and undergraduate math (Minerva-MATH) to competition math (AIME ‘24 and ‘25). On harder subjects, it outcompetes or is on par with the strongest model in the pack.
142
+ - **Scientific Reasoning:** `rnj-1-instruct` exhibits long-context reasoning abilities that are needed to solve hard science and technical questions in GPQA-Diamond and SuperGPQA.
143
+
144
+ ### Demos: Rnj-1 models generalize to unseen tasks
145
+
146
+ We show a few examples of end-to-end capabilities that are usually expected of larger models.
147
+
148
+ - **Coding assistant:** `rnj-1-instruct` can operate in agentic mode to create a playable game in a single shot inside of Cline: [screen recording](https://vimeo.com/1143853378/8df3376a1a?share=copy&fl=sv&fe=ci).
149
+ - **Agentic use:** `rnj-1-instruct` functions seamlessly within the agentic framework of mini-swe-agent. Given a task such as fixing an issue described in a pull request (PR), fixing a security vulnerability, or writing performant code, it is able to reason over its full context across multiple turns to solve the task. These runs produce “trajectories”: sequences of paired “Assistant” and “User” turns. Here are a few recordings that show the model’s reasoning across these turns: 1) a SWE task of identifying coding-convention violations: [screen recording](https://vimeo.com/1143841317/44adfbd044?share=copy&fl=sv&fe=ci), 2) fixing a security vulnerability: [screen recording](https://vimeo.com/1143843598/6fca2fe0bb?share=copy&fl=sv&fe=ci), 3) diagnosing code performance bottlenecks by running a profiler in the environment and iteratively improving the code: [screen recording](https://vimeo.com/1143828123/11e4d22ac7?share=copy&fl=sv&fe=ci).
150
+ - **Data analysis in an interactive chat:** `rnj-1-instruct` can work in interactive chat mode to solve a data analysis and visualization task: [screen recording](https://vimeo.com/1143831950/0e7d9c3edc?share=copy&fl=sv&fe=ci).
151
+
152
+ # Architecture
153
+
154
+ Rnj-1's architecture is similar to Gemma 3's, except that it uses only global attention and applies YaRN for long-context extension.
155
+
156
+ | Hyperparameter | Value |
157
+ |:---:|:---:|
158
+ | **Total Parameters** | 8.3B |
159
+ | **Number of Layers** | 32 |
160
+ | **Model Dimension** | 4096 |
161
+ | **MLP Dimension** | 16384 |
162
+ | **Number of Attention Heads** | 32 |
163
+ | **Number of Key-Value Heads** | 8 |
164
+ | **Attention Head Dimension** | 128 |
165
+ | **Vocabulary Size** | 128K |
166
+ | **Pretrain Context Length** | 8K |
167
+ | **Context Length** | 32K |
168
+ | **Activation Function** | GeGLU |
169
+ | **Tied Embeddings?** | Yes |
170
+
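+ To check these values programmatically, one can load the model configuration and print it; exact attribute names depend on the architecture class, so inspecting the full config is the simplest approach (a small sketch, not from the original card):
+
+ ```python
+ from transformers import AutoConfig
+
+ config = AutoConfig.from_pretrained("EssentialAI/rnj-1-instruct")
+ print(config)  # hidden size, number of layers, attention heads, vocab size, etc.
+ ```
+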
171
+ ### Training Dynamics
172
+
173
+ `rnj-1` was pre-trained on 8.4T tokens with an 8K context length, after which the model’s context window was extended to **32K** through an additional 380B-token mid-training stage. A final 150B-token SFT stage completed the training to produce `rnj-1-instruct`.
174
+
175
+ We used the Muon optimizer throughout all phases. Pre-training followed a WSD (warmup-stable-decay) learning-rate schedule, consisting of:
176
+
177
+ - Warmup: Linear ramp-up from 0 to 2e-3 over the first 5K steps.
178
+ - Stable phase: Constant learning rate of 2e-3 from 5K → 230K steps.
179
+ - Decay: Cosine decay from 2e-3 → 2e-5 from 230K → 380K steps.
180
+ - Final stable phase: Constant 2e-5 learning rate from 380K → 443.5K steps, concluding pre-training.
181
+
182
+ Both the mid-training (context-extension phase) and SFT were trained at a fixed learning rate of 2e-5.
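+
+ For clarity, here is a minimal sketch of the pre-training learning-rate schedule described above. The step boundaries and endpoint values come from the list; the exact interpolation used during the decay phase is an assumption (we show a standard cosine interpolation).
+
+ ```python
+ import math
+
+ def wsd_lr(step: int) -> float:
+     """Warmup-Stable-Decay schedule as described for rnj-1 pre-training (sketch)."""
+     peak, floor = 2e-3, 2e-5
+     if step < 5_000:                    # warmup: linear ramp 0 -> 2e-3 over 5K steps
+         return peak * step / 5_000
+     if step < 230_000:                  # stable phase at 2e-3
+         return peak
+     if step < 380_000:                  # cosine decay 2e-3 -> 2e-5
+         t = (step - 230_000) / 150_000
+         return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * t))
+     return floor                        # final stable phase at 2e-5 until 443.5K steps
+
+ for s in [0, 5_000, 100_000, 300_000, 400_000]:
+     print(s, f"{wsd_lr(s):.2e}")
+ ```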
183
+
184
+ The global batch sizes used were:
185
+
186
+ - 18M tokens for pre-training.
187
+ - 24M tokens for mid-training.
188
+ - 16M tokens for SFT.
189
+
190
+ # Recommendations
191
+
192
+ ### Temperature
193
+
194
+ We recommend using temperatures in the range [0, 0.6] for `rnj-1-instruct`.
195
+
196
+ ### Propensity to write code
197
+
198
+ Rnj-1 models have a strong inclination to write code, even for non-code tasks. This is especially true for `rnj-1-instruct` if the system prompt is omitted. Provide an appropriate system prompt, e.g., “You are a helpful assistant”, along with any global task requirements, to steer the model’s responses in the desired direction.
199
+
200
+ # How to use
201
+
202
+ ## Serverless API and online playgrounds
203
+
204
+ - Together.AI: Rnj-1 Instruct is available via API on the [Together.ai](http://Together.ai) model platform for serverless inference. It’s also available in the Together.ai playground for quick and easy experimentation.
205
+ - HuggingFace: Rnj-1 Instruct is also hosted via [Hugging Face Spaces](https://huggingface.co/spaces/EssentialAI/rnj-1-instruct-space).
206
+
207
+ ## Running Rnj-1 locally
208
+
209
+ ### Running Rnj-1 on your laptop with llama.cpp
210
+
211
+ The easiest way to run Rnj-1 on a laptop is via [llama.cpp](https://github.com/ggml-org/llama.cpp). A pre-quantized checkpoint is available [here](https://huggingface.co/EssentialAI/rnj-1-instruct-GGUF) as well as instructions to get started.
212
+
213
+ ### Use with transformers
214
+
215
+ Rnj-1 is supported starting from transformers `4.51.2`.
216
+
217
+ 1. Example code for querying the model without tools
218
+
219
+ ```python
220
+ import torch
221
+ from transformers import AutoTokenizer, AutoModelForCausalLM
222
+ import os
223
+
224
+ model_id = "EssentialAI/rnj-1-instruct"
225
+ os.environ["HF_TOKEN"] = "<YOUR-HF-TOKEN>"  # replace with your Hugging Face access token
226
+
227
+ print(f"Loading model: {model_id}...")
228
+ model = AutoModelForCausalLM.from_pretrained(
229
+ model_id,
230
+ torch_dtype=torch.bfloat16,
231
+ device_map="auto",
232
+ )
233
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
234
+
235
+ print("Model and tokenizer loaded successfully.")
236
+
237
+ messages = [
238
+ {"role": "system", "content": "You are a helpful AI assistant."}, # Optional system message
239
+ {"role": "user", "content": "Who are you?"}
240
+ ]
241
+
242
+ input_ids = tokenizer.apply_chat_template(
243
+ messages,
244
+ add_generation_prompt=True,
245
+ return_tensors="pt"
246
+ ).to(model.device)
247
+
248
+ # --- Generate Prediction --- #
249
+ print("Generating prediction...")
250
+ output_ids = model.generate(
251
+ input_ids,
252
+ max_new_tokens=50,
253
+ pad_token_id=tokenizer.eos_token_id,
254
+ do_sample=True,
255
+ temperature=0.2,
256
+ top_p=0.95
257
+ )
258
+
259
+ response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
260
+ print(response)
261
+ ```
262
+
263
+
264
+ 2. Example code for querying with tools
265
+
266
+ Rnj-1 supports tool-calling, which can be parsed by the `hermes` tool-call parser. Tool calls are formatted inside `<tool_call>` and `</tool_call>` tags.
267
+ An example usage is as follows:
268
+
269
+ ```python
270
+ tools = [
271
+ {
272
+ "type": "function",
273
+ "function": {
274
+ "name": "get_weather",
275
+ "description": "Get the current weather in a given location",
276
+ "parameters": {
277
+ "type": "object",
278
+ "properties": {
279
+ "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
280
+ "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
281
+ },
282
+ "required": ["location", "unit"],
283
+ },
284
+ },
285
+ },
286
+ ]
287
+
288
+ messages = [
289
+ {"role": "system", "content": "You are a helpful AI assistant."}, # Optional system message
290
+ {"role": "user", "content": "What is the weather in San Francisco, CA in Celsius?"}
291
+ ]
292
+
293
+ input_ids = tokenizer.apply_chat_template(
294
+ messages,
295
+ tools=tools,
296
+ add_generation_prompt=True,
297
+ return_tensors="pt"
298
+ ).to(model.device)
299
+
300
+ # --- Generate Prediction --- #
301
+ print("Generating prediction...")
302
+ output_ids = model.generate(
303
+ input_ids,
304
+ max_new_tokens=200,
305
+ pad_token_id=tokenizer.eos_token_id,
306
+ do_sample=True,
307
+ temperature=0.2,
308
+ top_p=0.95
309
+ )
310
+
311
+ response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False)
312
+ # NOTE: skip_special_tokens is set to False.
313
+ print(response)
314
+ ```
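+
+ The generated tool call can then be extracted from the `<tool_call>...</tool_call>` tags. A minimal parsing sketch is below; it assumes the hermes-style convention of a JSON object with `name` and `arguments` fields inside the tags (when serving with vLLM, the `--tool-call-parser hermes` option shown later does this for you).
+
+ ```python
+ import json
+ import re
+
+ def extract_tool_calls(text: str):
+     """Parse <tool_call>...</tool_call> blocks from a model response (sketch)."""
+     calls = []
+     for block in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL):
+         calls.append(json.loads(block))
+     return calls
+
+ for call in extract_tool_calls(response):
+     print(call["name"], call.get("arguments"))
+ ```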
315
+
316
+
317
+ 3. Example code for fill-in-the-middle (FIM)
318
+
319
+ Rnj-1 supports FIM; we show an example payload to trigger FIM mode below:
320
+
321
+ ```python
322
+ PRE = "<|pre_fim|>"
323
+ MID = "<|mid_fim|>"
324
+ SUF = "<|suf_fim|>"
325
+
326
+ prefix = """def binary_search(arr, target):
327
+ lo = 0
328
+ hi = len(arr) - 1
329
+
330
+ while lo <= hi:
331
+ """
332
+
333
+ suffix = """
334
+ return -1
335
+ """
336
+
337
+ input = PRE + prefix + SUF + suffix + MID
338
+
339
+ messages = [
340
+ {"role": "system", "content": "You are a helpful AI assistant."},
341
+ {"role": "user", "content": input}
342
+ ]
343
+
344
+ input_ids = tokenizer.apply_chat_template(
345
+ messages,
347
+ add_generation_prompt=True,
348
+ return_tensors="pt"
349
+ ).to(model.device)
350
+
351
+ # --- Generate Prediction --- #
352
+ print("Generating prediction...")
353
+ output_ids = model.generate(
354
+ input_ids,
355
+ max_new_tokens=100,
356
+ pad_token_id=tokenizer.eos_token_id,
357
+ do_sample=True,
358
+ temperature=0.2,
359
+ top_p=0.95
360
+ )
361
+
362
+ response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False)
363
+ print(response)
364
+ ```
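+
+ The model's completion fills in the span between `prefix` and `suffix`. To reassemble the completed function, one can decode the generated middle without special tokens and concatenate the pieces; this is a small usage sketch, not part of the official example:
+
+ ```python
+ # Reassemble the completed code from prefix + generated middle + suffix.
+ middle = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
+ completed_code = prefix + middle + suffix
+ print(completed_code)
+ ```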
365
+
366
+
367
+ ### Serving Rnj-1 on GPUs
368
+
369
+ ### **vLLM**
370
+
371
+ On machines that run vLLM, it’s as easy as:
372
+
373
+ ```bash
374
+ vllm serve EssentialAI/rnj-1-instruct
375
+ ```
376
+
377
+ To launch a vLLM server with tool-calling support enabled:
378
+
379
+ ```bash
380
+ vllm serve EssentialAI/rnj-1-instruct --enable-auto-tool-choice --tool-call-parser hermes
381
+ ```
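+
+ Both commands expose an OpenAI-compatible API. A minimal client sketch (assuming vLLM's default address `http://localhost:8000/v1`; the API key value is arbitrary for a local server):
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+
+ completion = client.chat.completions.create(
+     model="EssentialAI/rnj-1-instruct",
+     messages=[
+         {"role": "system", "content": "You are a helpful AI assistant."},
+         {"role": "user", "content": "Write a Python one-liner to reverse a string."},
+     ],
+     temperature=0.2,
+ )
+ print(completion.choices[0].message.content)
+ ```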
382
+
383
+ ### SGLang
384
+
385
+ On machines that run SGLang, it’s as easy as:
386
+
387
+ ```bash
388
+ python3 -m sglang.launch_server --model EssentialAI/rnj-1-instruct
389
+ ```
390
+
391
+ ## IDEs and Agents: Claude Code, Cline, Mini-SWE-Agent
392
+
393
+ ### Use with Cline
394
+
395
+ Rnj-1 works great with Cline, an open source AI coding agent, and is very easy to set up.
396
+
397
+ The Cline extension is available for VS Code / Cursor, JetBrains IDEs (IntelliJ, PyCharm, WebStorm, etc.) and VSCodium / Windsurf.
398
+
399
+ Simply add the Cline extension to your favorite IDE (see instructions [here](https://docs.cline.bot/getting-started/installing-cline)) and then enter the details for your Rnj-1 endpoint (instructions [here](https://docs.cline.bot/getting-started/selecting-your-model)).
400
+
401
+ ### Use with Claude Code
402
+
403
+ To use Rnj-1 with Claude Code, you can use https://github.com/musistudio/claude-code-router. Follow the instructions to set up Claude Code and Claude Code Router at https://github.com/musistudio/claude-code-router/blob/main/README.md.
404
+
405
+ ### Agentic mode with Mini-SWE-Agent
406
+
407
+ Clone the EssentialAI fork of mini-swe-agent ([github](https://github.com/Essential-AI/eai-mini-swe-agent#)). Inside the repo, run the following in a `virtualenv`:
408
+
409
+ ```bash
410
+ git checkout eai
411
+ pip install -e .
412
+ export TOGETHER_API_KEY="..." # set this to your Together.AI access key
413
+
414
+ # use EssentialAI/rnj-1-instruct to solve a performance optimization task
415
+ mini-extra perf-single [--instance <k>]
416
+ # use EssentialAI/rnj-1-instruct to resolve a SWE PR description
417
+ mini-extra swebench-single [--instance <k>]
418
+ ```
419
+
420
+ # Known limitations
421
+
422
+ ### Hallucinations and factual inaccuracies
423
+
424
+ Rnj-1 is primarily a coding and STEM model and is therefore not optimized for factual recall.
425
+
426
+ ### Identity and knowledge cutoff
427
+
428
+ Rnj-1 is trained on web data, and we have observed that it sometimes confuses its identity with models from other providers. We believe this stems from a variety of causes, including references to language models from other providers, model-generated data, etc. We hope to rectify this in our follow-up release.
429
+
430
+ Additionally, Rnj-1 was not trained with or provided a knowledge cutoff date, so it responds based on whatever information appears in its training data. If specifically asked for its knowledge cutoff date, the model may hallucinate one.
431
+
432
+ # **License**
433
+
434
+ This repository and the model weights are licensed under [**the Apache License, Version 2.0 (Apache 2.0)**](https://huggingface.co/EssentialAI/rnj-1-instruct/blob/main/LICENSE).
435
+
436
+ # **Contact**
437
+
438
+ We welcome your questions and feedback. You can contact us at info@essential.ai.