
# Param2-17B-A2.4B-Thinking (8-bit)

This repository contains an 8-bit quantized version of the original model bharatgenai/Param2-17B-A2.4B-Thinking.

The model is quantized using bitsandbytes 8-bit quantization, providing a balance between memory efficiency and model accuracy.
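For reference, an equivalent bitsandbytes setup in `transformers` looks roughly like the following. This is a sketch under assumptions: the exact configuration used to produce this export is not published, and the `load_in_8bit` flag shown here is only the standard way to request 8-bit quantization.

```python
from transformers import BitsAndBytesConfig

# 8-bit weight quantization; non-quantized modules run in float16 by default
# (assumed to match the settings described in this card).
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

# This config would then be passed to from_pretrained, e.g.:
#   AutoModelForCausalLM.from_pretrained(
#       "bharatgenai/Param2-17B-A2.4B-Thinking",
#       quantization_config=bnb_config,
#       torch_dtype=torch.float16,
#       device_map="auto",
#       trust_remote_code=True,
#   )
print(bnb_config.load_in_8bit)
```

Loading this repository directly (as in the usage example below) does not require building such a config, since the quantized weights are already stored here.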

## Model Details

- **Base Model:** bharatgenai/Param2-17B-A2.4B-Thinking
- **Quantization:** 8-bit
- **Compute dtype:** float16

## Benefits

- Reduced memory usage compared to full-precision weights
- Better accuracy than 4-bit quantization on many tasks
- Suitable for GPUs with moderate VRAM

## Approximate VRAM Requirements

8-bit inference typically requires around 20–24 GB of VRAM.
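As a rough sanity check (a back-of-the-envelope sketch, not a measurement), 8-bit weights take about one byte per parameter:

```python
# 8-bit (int8) weights store one byte per parameter.
params = 17e9                      # 17B parameters
weights_gib = params * 1 / 2**30   # 1 byte per param, in GiB
print(f"~{weights_gib:.1f} GiB for the weights alone")  # ~15.8 GiB
```

KV cache, activations, and quantization overhead on top of the ~16 GiB of weights push actual usage into the 20–24 GB range quoted above.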

## Installation

```bash
pip install transformers accelerate bitsandbytes torch
```

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralnets/modelname-8bit"

# Load the tokenizer and the pre-quantized 8-bit model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

# Run a simple generation
prompt = "Explain the theory of relativity simply."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Notes

This repository contains quantized weights derived from the original model. Original model repository: https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking

## License

This model follows the same license as the original model.
