Tower+ 9B (4-bit bitsandbytes)

This is a 4-bit quantized version of Tower+ 9B using bitsandbytes.

Model Description

Original Model: https://huggingface.co/Unbabel/Tower-Plus-9B

Quantization Method: 4-bit NormalFloat (NF4)

Quantization Library: bitsandbytes

Compute Dtype: float16

This model was quantized to reduce memory usage and improve inference efficiency while maintaining high performance. It can be loaded directly with Hugging Face transformers on a GPU.
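As a rough back-of-the-envelope illustration of the memory savings (assuming ~9B weight parameters; the exact footprint depends on which layers bitsandbytes leaves in higher precision, plus activations and the KV cache):

```python
# Rough weight-memory estimate for a ~9B-parameter model.
# Ignores activations, the KV cache, and layers kept in higher
# precision, so treat these numbers as approximate lower bounds.
params = 9e9

fp16_gib = params * 2 / 2**30   # float16: 2 bytes per weight
nf4_gib = params * 0.5 / 2**30  # NF4: 4 bits (0.5 bytes) per weight

print(f"float16 weights: ~{fp16_gib:.1f} GiB")
print(f"NF4 weights:     ~{nf4_gib:.1f} GiB")
```

This is why the 4-bit variant fits comfortably on a single consumer GPU where the float16 weights alone would not.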

Usage

You can load this model with the transformers library. Ensure you have bitsandbytes and accelerate installed.

Prerequisites

pip install transformers bitsandbytes accelerate

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lonq/Tower-Plus-9B-bnb-4bit"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)


# Example generation: translate English to French
input_text = "Once upon a time"
source_name = "English"
target_name = "French"
messages = [
    {"role": "user", "content": f"Translate the following text from {source_name} to {target_name}.\nSource: {input_text}\nTarget:"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=4096)
# Decode only the newly generated tokens, skipping the prompt
new_tokens = outputs[0][input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
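If you translate many segments, the prompt construction above can be factored into a small helper. This is just a sketch; `build_translation_messages` is not part of the model's API, only a refactoring of the same prompt template used in the example:

```python
def build_translation_messages(text: str, source_lang: str, target_lang: str) -> list:
    """Build the chat messages for one translation request,
    using the same prompt template as the example above."""
    prompt = (
        f"Translate the following text from {source_lang} to {target_lang}.\n"
        f"Source: {text}\n"
        "Target:"
    )
    return [{"role": "user", "content": prompt}]

# Each messages list is then passed to tokenizer.apply_chat_template as above.
messages = build_translation_messages("Once upon a time", "English", "French")
```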

Intended Use

This model is intended for efficient inference on consumer-grade GPUs or environments with limited VRAM.

Model size: 10B params (Safetensors; tensor types F32, F16, U8)

Model tree

Base model: google/gemma-2-9b (this model is one of its quantized variants)