Jan-code-4b (Quantized)

Description

This model is a 4-bit (NF4) bitsandbytes quantization of the original janhq/Jan-code-4b model. Quantization reduces the memory footprint by about 58% and is intended to preserve the original model's output quality.

Quantization Details

  • Quantization Type: 4-bit
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
  • bnb_4bit_quant_storage: uint8
  • Original Footprint: 8044.94 MB (BFLOAT16)
  • Quantized Footprint: 3372.88 MB (UINT8)
  • Memory Reduction: 58.1%
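
As a rough illustration of the settings above, the sketch below maps them onto a transformers `BitsAndBytesConfig` and recomputes the reported memory reduction from the two footprints. The card does not say how the checkpoint was actually produced, so treat this as an assumption: `QUANT_SETTINGS`, `build_bnb_config`, and `memory_reduction_pct` are hypothetical names, and `load_in_4bit=True` is inferred from "4-bit" rather than listed explicitly.

```python
# Values copied from the list above; load_in_4bit is an assumption implied by "4-bit".
QUANT_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_storage": "uint8",
}


def memory_reduction_pct(original_mb: float, quantized_mb: float) -> float:
    """Percentage of memory saved relative to the original footprint."""
    return 100.0 * (original_mb - quantized_mb) / original_mb


def build_bnb_config():
    """Build a BitsAndBytesConfig from QUANT_SETTINGS (hypothetical helper)."""
    # Imported lazily so the arithmetic above runs without torch/transformers.
    import torch
    from transformers import BitsAndBytesConfig

    settings = dict(QUANT_SETTINGS)
    # Turn the dtype names into torch dtypes before passing them on.
    for key in ("bnb_4bit_compute_dtype", "bnb_4bit_quant_storage"):
        settings[key] = getattr(torch, settings[key])
    return BitsAndBytesConfig(**settings)


# Footprints reported above, in MB.
print(round(memory_reduction_pct(8044.94, 3372.88), 1))  # 58.1
```

Passing the result of `build_bnb_config()` as `quantization_config=` to `from_pretrained` on the base model would quantize it at load time. This is not needed for the pre-quantized checkpoint, which already stores its quantization settings.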

Usage

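from transformers import AutoModelForCausalLM, AutoTokenizer

# Requires the bitsandbytes and accelerate packages.
model_name = "manu02/Jan-code-4b-bnb-4bit-nf4"
# AutoModelForCausalLM keeps the language-modeling head needed for generate().
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)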
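
Once the model is loaded, a completion can be generated along the following lines. This is a minimal sketch, not the author's recommended recipe: `complete` and `footprint_mb` are hypothetical helper names, plain-text prompting is assumed (no chat template), and the footprint helper assumes 1 MB = 1024**2 bytes, which may not match the convention used for the figures reported in this card. It loads with `AutoModelForCausalLM` because `generate()` needs the language-modeling head.

```python
REPO_ID = "manu02/Jan-code-4b-bnb-4bit-nf4"  # repo id from the Usage snippet


def footprint_mb(model) -> float:
    """In-memory size of a loaded model in MB (assuming 1 MB = 1024**2 bytes)."""
    return model.get_memory_footprint() / 1024**2


def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the quantized checkpoint and return the prompt plus its continuation."""
    # Imported lazily so footprint_mb can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, use_fast=True)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, device_map="auto")
    print(f"footprint: {footprint_mb(model):.2f} MB")

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `print(complete("def fibonacci(n):"))` would download the checkpoint on first use and print the prompt followed by the model's continuation.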