
# Param2-17B-A2.4B-Thinking (8-bit)

This repository contains an 8-bit quantized version of the original model bharatgenai/Param2-17B-A2.4B-Thinking.

The model is quantized using bitsandbytes 8-bit quantization, providing a balance between memory efficiency and model accuracy.
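For reference, an equivalent bitsandbytes setup in `transformers` looks roughly like the following. This is a sketch under assumptions: the exact configuration used to produce this export is not published, and the `load_in_8bit` flag shown here is only the standard way to request 8-bit quantization.

```python
from transformers import BitsAndBytesConfig

# 8-bit weight quantization; non-quantized modules run in float16 by default
# (assumed to match the settings described in this card).
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

# This config would then be passed to from_pretrained, e.g.:
#   AutoModelForCausalLM.from_pretrained(
#       "bharatgenai/Param2-17B-A2.4B-Thinking",
#       quantization_config=bnb_config,
#       torch_dtype=torch.float16,
#       device_map="auto",
#       trust_remote_code=True,
#   )
print(bnb_config.load_in_8bit)
```

Loading this repository directly (as in the usage example below) does not require building such a config, since the quantized weights are already stored here.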

## Model Details

- **Base Model:** bharatgenai/Param2-17B-A2.4B-Thinking
- **Quantization:** 8-bit
- **Compute dtype:** float16

## Benefits

- Reduced memory usage compared to full-precision weights
- Better accuracy than 4-bit quantization on many tasks
- Suitable for GPUs with moderate VRAM

## Approximate VRAM Requirements

8-bit inference typically requires around 20–24 GB of VRAM.
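As a rough sanity check (a back-of-the-envelope sketch, not a measurement), 8-bit weights take about one byte per parameter:

```python
# 8-bit (int8) weights store one byte per parameter.
params = 17e9                      # 17B parameters
weights_gib = params * 1 / 2**30   # 1 byte per param, in GiB
print(f"~{weights_gib:.1f} GiB for the weights alone")  # ~15.8 GiB
```

KV cache, activations, and quantization overhead on top of the ~16 GiB of weights push actual usage into the 20–24 GB range quoted above.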

## Installation

```bash
pip install transformers accelerate bitsandbytes torch
```

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralnets/modelname-8bit"

# Load the tokenizer and the pre-quantized 8-bit model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

# Run a simple generation
prompt = "Explain the theory of relativity simply."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Notes

This repository contains quantized weights derived from the original model. Original model repository: https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking

## License

This model follows the same license as the original model.
