Mistral-7B-Instruct-v0.3 - GGUF (IQ4_NL)


📊 Performance Metrics

  • Hardware: Intel(R) Xeon(R) CPU @ 2.20GHz (4 vCPUs)
  • Size: 3.87 GB
  • Speed (Generation): 3.12 tokens/sec
  • Speed (Prompt): 5.74 tokens/sec
  • KV Cache Usage: 0.0143 GB
  • Quantization: IQ4_NL
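
The KV cache figure above depends on how many tokens were in context during the benchmark, which this card does not state. As a rough sketch, the cache size can be estimated from the model shape. The defaults below (32 layers, 8 key/value heads, head dimension 128) are assumptions taken from the published Mistral-7B configuration, the 16-bit cache is assumed because it is the usual llama.cpp default, and the function name is illustrative.

```python
# Rough KV cache size estimate for a grouped-query-attention transformer.
# Assumed shape: Mistral-7B (32 layers, 8 KV heads, head_dim 128), 16-bit cache.

def kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    # Each token stores one key vector and one value vector
    # per layer and per KV head, hence the leading factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens

for n_tokens in (128, 4096, 32768):
    gib = kv_cache_bytes(n_tokens) / 2**30
    print(f"{n_tokens:>6} tokens -> {gib:.4f} GiB")
```

Under these assumptions each token costs 128 KiB of cache, so a figure of 0.0143 GB would correspond to a context on the order of a hundred tokens. Longer contexts need proportionally more memory.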

🔷 Model Overview

This repository contains a GGUF-quantized build of the following model:

  • Base Model: Mistral-7B-Instruct-v0.3
  • Format: GGUF (optimized for llama.cpp inference)
  • Precision: IQ4_NL
  • Efficiency Score: 0.8054 TPS/GB (tokens per second per GB)
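
The card does not define the efficiency score. It appears to be generation speed divided by file size, and the sketch below assumes that definition. The function name is hypothetical and the inputs are the rounded figures from the Performance Metrics list.

```python
# Assumed definition: efficiency = generation tokens/sec divided by model size in GB.

def efficiency_score(tokens_per_sec, size_gb):
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return tokens_per_sec / size_gb

# Rounded values from the Performance Metrics list.
score = efficiency_score(tokens_per_sec=3.12, size_gb=3.87)
print(f"{score:.4f} TPS/GB")
```

The quotient of the rounded figures is close to, but not exactly, the listed 0.8054. The small gap would be consistent with the score having been computed from unrounded measurements.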

The GGUF format provides:

  • Fast loading via memory mapping
  • Single-file model distribution
  • Cross-platform compatibility
  • Efficient inference with llama.cpp
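
To illustrate the single-file layout and memory-mapped access, the sketch below maps a GGUF file and reads its fixed-size header without loading the tensor data. It assumes the header layout of GGUF version 2 and later (4-byte magic `GGUF`, then a little-endian uint32 version, uint64 tensor count, and uint64 metadata key/value count). The function name is illustrative and is not part of llama.cpp.

```python
import mmap
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        # Map the file read-only; pages are loaded lazily by the OS,
        # so only the first few bytes are actually touched here.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            if mm[:4] != GGUF_MAGIC:
                raise ValueError("not a GGUF file")
            # "<IQQ": little-endian uint32, uint64, uint64, starting after the magic.
            version, tensor_count, kv_count = struct.unpack_from("<IQQ", mm, 4)
    return version, tensor_count, kv_count
```

Calling `read_gguf_header("Mistral-7B-Instruct-v0.3-IQ4_NL.gguf")` on the downloaded file should return almost immediately, since the multi-gigabyte tensor payload is never read.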

📦 Files

File                                   Description
Mistral-7B-Instruct-v0.3-IQ4_NL.gguf   Quantized GGUF model file

⚙️ Technical Details

Parameter            Value
Architecture         Mistral-7B-Instruct-v0.3
Format               GGUF
Precision            IQ4_NL
Runtime              llama.cpp
Benchmark Hardware   Intel(R) Xeon(R) CPU @ 2.20GHz (4 vCPUs)
Context Latency      57.19 s
Memory (KV)          0.0143 GB

⚡ Why GGUF?

GGUF is designed for efficient inference:

  • Optimized for llama.cpp
  • Supports CPU and GPU inference
  • Single-file deployment
  • Memory-mapped loading for speed
  • Ideal for edge / local environments

⚠️ License & Usage

This is a converted derivative model.

  • You must comply with the license of the original Mistral-7B-Instruct-v0.3 model
  • This is not an official release
  • No additional rights are granted
  • Original ownership remains with the base model creator

🚀 Quick Start (llama.cpp)

./llama-cli -m Mistral-7B-Instruct-v0.3-IQ4_NL.gguf -p "Explain AI simply"
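
The same invocation can be scripted. The sketch below is one way to build the command from Python, adding the `-t` (threads) and `-n` (tokens to generate) options of `llama-cli`. The helper names are hypothetical, the binary path assumes `llama-cli` sits in the current directory, and the default of 4 threads simply mirrors the 4 vCPUs of the benchmark machine.

```python
import shlex
import subprocess

def build_llama_cli_args(model_path, prompt, threads=4, n_predict=128, binary="./llama-cli"):
    # -m: model file, -p: prompt, -t: CPU threads, -n: max tokens to generate.
    return [
        binary,
        "-m", model_path,
        "-p", prompt,
        "-t", str(threads),
        "-n", str(n_predict),
    ]

def run_llama_cli(args):
    # Runs the command and raises CalledProcessError on a non-zero exit code.
    return subprocess.run(args, check=True)

args = build_llama_cli_args("Mistral-7B-Instruct-v0.3-IQ4_NL.gguf", "Explain AI simply")
print(shlex.join(args))
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes. Call `run_llama_cli(args)` once the binary and model file are in place.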