Welcome to the Natarajan Response Engine v1.02, an improved version of the original Natarajan Response Engine (NRE), designed for powerful reasoning, agentic tasks, versatile developer use cases, and multilingual thinking.
This is NIT's best model yet: lightweight, yet remarkably capable.
The model was trained on OpenAI's harmony response format (https://github.com/openai/harmony), as it is based on gpt-oss-20b, the lighter-weight variant of the gpt-oss series.
NIT stands for Natarajan Intelligence Technologies Inc. Check out NatarajanAI, our AI chatbot built on Danny Avila's LibreChat.
Highlights
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs. Note that the chain of thought is not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the model's native capabilities for function calling, web browsing, and other agentic workflows.
- MXFP4 quantization: The model was fine-tuned with MXFP4 quantization of the MoE weights, allowing it to run in 16 GB of VRAM or less when Unsloth and additional quantization are used. All evals were performed with the same MXFP4 quantization.
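As a rough sanity check on the memory claim above: MXFP4 stores about 4.25 bits per parameter (4-bit values plus a shared scale per 32-element block — an assumption about the format, not a figure from this card), so the quantized weights of a ~20B-parameter model fit well under 16 GB:

```python
# Back-of-envelope VRAM estimate for the quantized weights.
# Assumptions: ~20e9 parameters, ~4.25 bits per parameter under MXFP4;
# activations, KV cache, and runtime overhead are ignored.
params = 20e9
bits_per_param = 4.25
weight_gib = params * bits_per_param / 8 / 2**30
print(f"~{weight_gib:.1f} GiB for weights alone")
```

The remaining headroom on a 16 GB card goes to activations and the KV cache, which is why further quantization can help on smaller GPUs.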
Inference examples
Transformers
You can use the Natarajan Response Engine v1.02 with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you use model.generate directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package.
To get started, install the necessary dependencies to set up your environment:
pip install -U transformers kernels torch
Once set up, you can proceed to run the model with the snippet below:
from transformers import pipeline
import torch

model_id = "openai/gpt-oss-20b"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
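In this chat mode, `outputs[0]["generated_text"]` holds the whole conversation as a list of message dicts, so the final entry is the assistant's reply. A small helper of our own to pull it out, demonstrated against a mocked result so the shape is clear (the shape matches the standard Transformers chat-pipeline output):

```python
def last_assistant_message(outputs):
    """Return the assistant's reply from a chat-style text-generation result."""
    conversation = outputs[0]["generated_text"]  # list of {"role", "content"} dicts
    reply = conversation[-1]
    if reply.get("role") != "assistant":
        raise ValueError("last message is not from the assistant")
    return reply["content"]

# Mocked result in the same shape the pipeline returns:
mock_outputs = [{"generated_text": [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
    {"role": "assistant", "content": "Quantum mechanics describes ..."},
]}]
print(last_assistant_message(mock_outputs))
```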
Alternatively, you can run the model via Transformers Serve to spin up an OpenAI-compatible webserver:
transformers serve
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b
vLLM
vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.
uv pip install --pre vllm==0.10.1+gptoss \
--extra-index-url https://wheels.vllm.ai/gpt-oss/ \
--extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
--index-strategy unsafe-best-match
vllm serve openai/gpt-oss-20b
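Once the server is up (vLLM listens on port 8000 by default), any OpenAI-compatible client can talk to it. A minimal sketch of the request body for the `/v1/chat/completions` endpoint; the `Reasoning: medium` system line and `max_tokens` value are illustrative choices, not requirements:

```python
import json

payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [
        {"role": "system", "content": "Reasoning: medium"},
        {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
    ],
    "max_tokens": 256,
}
body = json.dumps(payload)

# With the official client (pip install openai), roughly:
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# resp = client.chat.completions.create(**payload)
# print(resp.choices[0].message.content)
```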
Ollama
If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after installing Ollama.
# gpt-oss-20b
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
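Ollama also exposes a local HTTP API (port 11434 by default), so the pulled model can be queried programmatically. A sketch of the request body, assuming Ollama's standard `/api/chat` endpoint:

```python
import json

payload = {
    "model": "gpt-oss:20b",
    "messages": [
        {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
    ],
    "stream": False,  # return one complete response instead of a token stream
}
body = json.dumps(payload)
# POST this body to http://localhost:11434/api/chat
```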
LM Studio
If you are using LM Studio, you can download the model with the following command.
# gpt-oss-20b
lms get openai/gpt-oss-20b
Download the model
You can download the model weights from the Hugging Face Hub using the Hugging Face CLI at goodgoals/Natarajan-Response-Engine-v1.02:
huggingface-cli download goodgoals/Natarajan-Response-Engine-v1.02 --include "original/*" --local-dir goodgoals/Natarajan-Response-Engine-v1.02/
pip install gpt-oss
python -m gpt_oss.chat model/
# These gpt-oss commands work because the model is based on gpt-oss-20b
Reasoning levels
You can adjust the reasoning level that suits your task across three levels:
- Low: Fast responses for general dialogue.
- Medium: Balanced speed and detail.
- High: Deep and detailed analysis.
The reasoning level can be set in the system prompt, e.g., "Reasoning: high".
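Since the level is just a line in the system prompt, it is easy to set programmatically. A small helper of our own that validates the level and prepends the system message:

```python
VALID_LEVELS = ("low", "medium", "high")

def with_reasoning(messages, level="medium"):
    """Prepend a system message selecting the reasoning level."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}")
    return [{"role": "system", "content": f"Reasoning: {level}"}] + list(messages)

msgs = with_reasoning(
    [{"role": "user", "content": "Prove that sqrt(2) is irrational."}], "high"
)
print(msgs[0]["content"])  # → Reasoning: high
```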
Tool use
The Natarajan Response Engine is excellent for:
- Web browsing (using built-in browsing tools)
- Function calling with defined schemas
- Agentic operations like browser tasks
- Multilingual Tasks
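For function calling, tools are typically described with a JSON schema, and the model replies with the function name plus JSON-encoded arguments. A sketch using the common OpenAI-style schema shape; `get_weather` is a hypothetical tool, not something shipped with the model:

```python
import json

# Hypothetical tool definition in the common OpenAI function-calling shape.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def parse_tool_arguments(raw_arguments):
    """Decode the JSON argument string the model emits for a tool call."""
    return json.loads(raw_arguments)

args = parse_tool_arguments('{"city": "Chennai"}')
print(args["city"])  # → Chennai
```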
Fine-tuning
The Natarajan Response Engine can be fine-tuned in the same way as gpt-oss-20b.
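Whatever trainer you use, the training data needs to be in the chat-messages format the model's template expects. A sketch of our own converter from plain (prompt, response) pairs; the `Reasoning:` system line mirrors the levels described above and is optional:

```python
def to_chat_example(prompt, response, level="medium"):
    """Format one (prompt, response) pair as a chat-style training example."""
    return {
        "messages": [
            {"role": "system", "content": f"Reasoning: {level}"},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]
    }

example = to_chat_example("What is the capital of Tamil Nadu?", "Chennai.")
print(example["messages"][-1]["content"])  # → Chennai.
```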
Inference
Hosted inference and cloud compute support are not available yet, but they will be added in a future model update.