# Nemotron-Terminal-8B-MLX-4bit
MLX 4-bit quantized version of nvidia/Nemotron-Terminal-8B for Apple Silicon.
## Model Details
- Base model: nvidia/Nemotron-Terminal-8B
- Architecture: Qwen3 (8B parameters, 36 layers)
- Quantization: 4-bit affine (group size 64), ~4.5 bits per weight average
- Format: MLX safetensors
- Size: ~4.3 GB
- Context length: 40,960 tokens
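As a rough sanity check on the size figure: with 4-bit affine quantization at group size 64, each 64-weight group carries an fp16 scale and bias, so the average cost per weight is 4 + (2 × 16)/64 = 4.5 bits, which for ~8B parameters lands near the ~4.3 GB listed above. A back-of-envelope sketch (the parameter count is rounded; real files also include embeddings and metadata):

```python
# Average bits per weight for 4-bit affine quantization, group size 64:
# 4-bit values plus one fp16 scale and one fp16 bias per 64-weight group.
group_size = 64
bits_per_weight = 4 + (2 * 16) / group_size  # = 4.5

# Approximate on-disk size for ~8e9 parameters (rounded; excludes metadata).
params = 8e9
size_gib = params * bits_per_weight / 8 / 2**30
print(f"{bits_per_weight} bits/weight -> ~{size_gib:.1f} GiB")
```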
## Usage

### With mlx-lm

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Nemotron-Terminal-8B-MLX-4bit")
response = generate(model, tokenizer, prompt="List files in the current directory", max_tokens=256)
print(response)
```
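For a one-off generation without writing any Python, mlx-lm also ships a command-line entry point; the invocation below assumes a recent mlx-lm release with the standard `mlx_lm.generate` flags:

```shell
# One-off generation from the shell; fetches the model from the Hub on first run.
mlx_lm.generate \
  --model DJLougen/Nemotron-Terminal-8B-MLX-4bit \
  --prompt "List files in the current directory" \
  --max-tokens 256
```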
### With LM Studio

Download this repository and place it in your LM Studio models directory; the model will then appear in the model selector.
## About the Base Model
Nemotron-Terminal-8B is an NVIDIA model fine-tuned for terminal/shell command generation and code agent tasks. It excels at translating natural language instructions into shell commands and terminal workflows.
## Conversion

Converted from the original Hugging Face safetensors using mlx-lm v0.30.5.
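For reproducibility, a conversion along these lines can be run with the `mlx_lm.convert` CLI; the flags below match the quantization settings listed in this card (4-bit, group size 64), though defaults can vary between mlx-lm versions:

```shell
# Convert the original weights to MLX format with 4-bit quantization.
# --q-bits 4 and --q-group-size 64 match the settings in this card.
pip install "mlx-lm>=0.30"
mlx_lm.convert \
  --hf-path nvidia/Nemotron-Terminal-8B \
  --mlx-path Nemotron-Terminal-8B-MLX-4bit \
  -q --q-bits 4 --q-group-size 64
```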