# Bielik-11B-v2.3-Instruct-QuIP-2bit

QuIP# (E8P12 lattice codebook) 2-bit quantization of speakleash/Bielik-11B-v2.3-Instruct.

## Model Details

| Attribute | Value |
|---|---|
| Base model | speakleash/Bielik-11B-v2.3-Instruct |
| Architecture | Mistral (50 layers, 4096 hidden, 32 heads, 8 KV heads) |
| Quantization method | QuIP# with E8P12 lattice codebook |
| Precision | 2-bit weights (FP16 base) |
| Model size | 3.26 GB (vs ~22 GB FP16, ~6.7x compression) |
| Calibration | CulturaX-PL (512 samples, 4096 tokens each) |
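The compression ratio in the table can be sanity-checked from the two sizes:

```python
# Sanity check of the compression figure quoted above
fp16_size_gb = 22.0    # approximate FP16 checkpoint size
quant_size_gb = 3.26   # 2-bit QuIP# checkpoint size
ratio = fp16_size_gb / quant_size_gb
print(f"{ratio:.1f}x")  # ~6.7x
```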

## Evaluation (Polish LLM Leaderboard)

Evaluated on 22 of the 23 tasks from the SpeakLeash Open PL LLM Leaderboard (eq_bench excluded because its dataset is private).

| Metric | Score |
|---|---|
| Normalized avg (22 tasks) | 61.10 |
| FP16 baseline | 65.71 |
| Retention | ~93% of FP16 quality |
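The retention figure follows directly from the two averages above:

```python
# Quick check of the ~93% retention claim
quant_avg = 61.10  # normalized avg, 2-bit QuIP#
fp16_avg = 65.71   # normalized avg, FP16 baseline
retention = quant_avg / fp16_avg * 100
print(f"{retention:.1f}%")  # ~93.0%
```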

Full per-task results: Jakubrd4/bielik-q2-sharp

## Comparison with IQ2_XXS (llama.cpp)

| Metric | QuIP# E8P12 | IQ2_XXS | FP16 |
|---|---|---|---|
| Raw avg (22 tasks) | 71.92 | 72.07 | 75.40 |
| Tasks won (head-to-head) | 11/22 | 11/22 | — |

QuIP# achieves parity with llama.cpp IQ2_XXS across the 22 Polish benchmarks (71.92 vs 72.07 raw average, a gap of 0.15 points), splitting the head-to-head wins evenly.

## Usage

Requires the quip-sharp codebase for inference:

```python
from lib.utils.unsafe_import import model_from_hf_path

model, tokenizer = model_from_hf_path(
    "Jakubrd4/Bielik-11B-v2.3-Instruct-QuIP-2bit"
)
```

**Note:** Bielik uses the Mistral architecture, while QuIP# expects a `LlamaConfig`. A small patch in `model_from_hf_path()` is therefore needed to convert `MistralConfig` to `LlamaConfig` (mapping `sliding_window` to `None` and `attention_dropout` to `0`).
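A minimal sketch of the kind of conversion involved, operating on the raw `config.json` dict rather than on transformers config classes; the actual patch inside `model_from_hf_path()` may look different:

```python
# Hypothetical sketch of the MistralConfig -> LlamaConfig mapping
# described above. Field names follow the transformers config.json
# format; this is an illustration, not the shipped patch.
def mistral_config_to_llama(cfg: dict) -> dict:
    llama = dict(cfg)  # copy, leave the original untouched
    llama["model_type"] = "llama"
    # Llama has no sliding-window attention: sliding_window -> None
    llama["sliding_window"] = None
    # Match LlamaConfig's default: attention_dropout -> 0
    llama["attention_dropout"] = 0.0
    return llama
```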
