# Tower+ 9B (4-bit bitsandbytes)
This is a 4-bit quantized version of Tower+ 9B using bitsandbytes.
## Model Description

- **Original Model:** https://huggingface.co/Unbabel/Tower-Plus-9B
- **Quantization Method:** 4-bit NormalFloat (NF4)
- **Quantization Library:** bitsandbytes
- **Compute Dtype:** float16
This model was quantized to reduce memory usage and speed up inference while preserving most of the original model's output quality. It can be loaded directly with Hugging Face `transformers` on a GPU.
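For reference, the settings listed above correspond to a `bitsandbytes` configuration like the one below. This is shown for illustration only: the quantization config is stored with this checkpoint, so you do not need to pass it when loading.

```python
import torch
from transformers import BitsAndBytesConfig

# Mirrors the settings in the Model Description above:
# 4-bit NF4 weights with float16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
```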
## Usage

You can load this model using the `transformers` library. Ensure you have `bitsandbytes` and `accelerate` installed.
### Prerequisites

```bash
pip install transformers bitsandbytes accelerate
```
### Loading the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lonq/Tower-Plus-9B-bnb-4bit"

# The quantization config is stored in the checkpoint, so no extra
# arguments are needed to load the weights in 4-bit.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example: translation with the model's chat template
input_text = "Once upon a time"
source_name = "English"
target_name = "French"
messages = [
    {"role": "user", "content": f"Translate the following text from {source_name} to {target_name}.\nSource: {input_text}\nTarget:"}
]

# apply_chat_template with return_tensors="pt" returns a tensor of input IDs
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=4096)

# Strip the prompt tokens and decode only the newly generated text
new_tokens = outputs[0][inputs.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```
## Intended Use

This model is intended for efficient inference on consumer-grade GPUs or other environments with limited VRAM.
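As a rough back-of-the-envelope check of why 4-bit quantization helps on limited-VRAM hardware, the weights of a 9B-parameter model at 4 bits each occupy only a few gigabytes (activations and KV cache add more on top; the exact total depends on context length and batch size):

```python
# Approximate VRAM needed for the weights of a 9B model at 4 bits/weight
params = 9e9
bytes_per_param = 0.5  # 4 bits = half a byte
approx_gb = params * bytes_per_param / 1024**3
print(f"~{approx_gb:.1f} GB for weights alone")  # vs. ~16.8 GB at float16
```

This puts the quantized weights within reach of common 8 GB consumer GPUs, whereas the float16 original would not fit.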