Nemotron-Terminal-8B-MLX-4bit

MLX 4-bit quantized version of nvidia/Nemotron-Terminal-8B for Apple Silicon.

Model Details

  • Base model: nvidia/Nemotron-Terminal-8B
  • Architecture: Qwen3 (8B parameters, 36 layers)
  • Quantization: 4-bit affine (group size 64), ~4.5 bits per weight on average
  • Format: MLX safetensors
  • Size: ~4.3 GB
  • Context length: 40,960 tokens
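The ~4.5 bits-per-weight figure follows from the group-wise overhead of affine quantization. A minimal sketch of the arithmetic, assuming one fp16 scale and one fp16 bias stored per 64-weight group (as in MLX's affine scheme):

```python
# Average bits per weight for 4-bit affine quantization, group size 64.
group_size = 64
weight_bits = 4 * group_size       # 4 bits per quantized weight
overhead_bits = 16 + 16            # fp16 scale + fp16 bias per group
bits_per_weight = (weight_bits + overhead_bits) / group_size
print(bits_per_weight)             # 4.5

# Rough on-disk size for ~8B parameters at that rate; the actual
# ~4.3 GB differs slightly because not every layer is quantized
# the same way.
size_gb = 8e9 * bits_per_weight / 8 / 1e9
print(round(size_gb, 1))           # 4.5
```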

Usage

With mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Nemotron-Terminal-8B-MLX-4bit")
response = generate(model, tokenizer, prompt="List files in the current directory", max_tokens=256)
print(response)

With LM Studio

Search for this repo in LM Studio's model browser, or download the files and place them in your LM Studio models directory. The model will then appear in the model selector.

About the Base Model

Nemotron-Terminal-8B is an NVIDIA model fine-tuned for terminal/shell command generation and code agent tasks. It excels at translating natural language instructions into shell commands and terminal workflows.

Conversion

Converted from the original Hugging Face safetensors weights using mlx-lm v0.30.5.
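The conversion can be reproduced with mlx-lm's convert entry point. A sketch using mlx-lm's CLI flags (the output path name is an assumption; adjust as needed):

```shell
pip install "mlx-lm>=0.30"

# Download the base weights, quantize to 4-bit affine with group size 64,
# and write the MLX safetensors to a local directory.
mlx_lm.convert \
    --hf-path nvidia/Nemotron-Terminal-8B \
    --mlx-path Nemotron-Terminal-8B-MLX-4bit \
    -q --q-bits 4 --q-group-size 64
```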
