---
license: cc-by-nc-sa-4.0
base_model:
  - Unbabel/Tower-Plus-9B
pipeline_tag: translation
---

# Tower+ 9B (4-bit bitsandbytes)

This is a 4-bit quantized version of Tower+ 9B using bitsandbytes.

## Model Description

- **Original Model:** https://huggingface.co/Unbabel/Tower-Plus-9B
- **Quantization Method:** 4-bit NormalFloat (NF4)
- **Quantization Library:** bitsandbytes
- **Compute Dtype:** float16

This model was quantized to reduce memory usage and improve inference efficiency while keeping output quality close to the original model. It can be loaded directly with Hugging Face `transformers` on a GPU.

## Usage

You can load this model using the `transformers` library. Ensure you have `bitsandbytes` and `accelerate` installed.

### Prerequisites

```shell
pip install transformers bitsandbytes accelerate
```

### Loading the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lonq/Tower-Plus-9B-bnb-4bit"

# The 4-bit quantization config is stored in the checkpoint,
# so the model loads already quantized.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example generation
input_text = "Once upon a time"
source_name = "English"
target_name = "French"
messages = [
    {"role": "user", "content": f"Translate the following text from {source_name} to {target_name}.\nSource: {input_text}\nTarget:"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=4096)
new_tokens = outputs[0][inputs.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```

## Intended Use

This model is intended for efficient inference on consumer-grade GPUs or environments with limited VRAM.
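As a rough guide to how much VRAM the weights need, here is a back-of-envelope estimate (my own arithmetic, not a measured figure):

```python
# Weight-memory estimate for a 9B-parameter model stored in 4-bit NF4.
# Real usage is higher: activations, the KV cache, and quantization
# overhead (per-block scales) are not counted here.
n_params = 9e9
bits_per_param = 4
weights_gib = n_params * bits_per_param / 8 / 1024**3
print(f"~{weights_gib:.1f} GiB for the quantized weights alone")  # prints ~4.2 GiB
```

By comparison, the same weights in float16 would need roughly four times that, which is why the 4-bit version fits comfortably on consumer GPUs.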