Nemotron-Terminal-8B-MLX-4bit

MLX 4-bit quantized version of nvidia/Nemotron-Terminal-8B for Apple Silicon.

Model Details

  • Base model: nvidia/Nemotron-Terminal-8B
  • Architecture: Qwen3 (8B parameters, 36 layers)
  • Quantization: 4-bit affine (group size 64), ~4.5 bits per weight on average
  • Format: MLX safetensors
  • Size: ~4.3 GB
  • Context length: 40,960 tokens
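The ~4.5 bits-per-weight figure follows from the group-wise overhead of affine quantization. A minimal sketch of the arithmetic, assuming one fp16 scale and one fp16 bias stored per 64-weight group (as in MLX's affine scheme):

```python
# Average bits per weight for 4-bit affine quantization, group size 64.
group_size = 64
weight_bits = 4 * group_size       # 4 bits per quantized weight
overhead_bits = 16 + 16            # fp16 scale + fp16 bias per group
bits_per_weight = (weight_bits + overhead_bits) / group_size
print(bits_per_weight)             # 4.5

# Rough on-disk size for ~8B parameters at that rate; the actual
# ~4.3 GB differs slightly because not every layer is quantized
# the same way.
size_gb = 8e9 * bits_per_weight / 8 / 1e9
print(round(size_gb, 1))           # 4.5
```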

Usage

With mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Nemotron-Terminal-8B-MLX-4bit")
response = generate(model, tokenizer, prompt="List files in the current directory", max_tokens=256)
print(response)

With LM Studio

Search for this repo in LM Studio's model browser, or download the files and place them in your LM Studio models directory. The model will then appear in the model selector.

About the Base Model

Nemotron-Terminal-8B is an NVIDIA model fine-tuned for terminal/shell command generation and code agent tasks. It excels at translating natural language instructions into shell commands and terminal workflows.

Conversion

Converted from the original Hugging Face safetensors weights using mlx-lm v0.30.5.
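The conversion can be reproduced with mlx-lm's convert entry point. A sketch using mlx-lm's CLI flags (the output path name is an assumption; adjust as needed):

```shell
pip install "mlx-lm>=0.30"

# Download the base weights, quantize to 4-bit affine with group size 64,
# and write the MLX safetensors to a local directory.
mlx_lm.convert \
    --hf-path nvidia/Nemotron-Terminal-8B \
    --mlx-path Nemotron-Terminal-8B-MLX-4bit \
    -q --q-bits 4 --q-group-size 64
```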
