
Param2-17B-A2.4B-Thinking (4-bit)

This repository contains a 4-bit quantized version of the original model:

bharatgenai/Param2-17B-A2.4B-Thinking

The model has been quantized using bitsandbytes NF4 quantization to reduce memory usage while maintaining good inference quality.
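To illustrate what NF4 quantization does, here is a pure-Python sketch of one quantization block: weights are scaled by the block's absolute maximum, then snapped to the nearest of 16 fixed NormalFloat4 codebook levels. The codebook values below are the NF4 levels from the QLoRA paper as used by bitsandbytes; the block handling here is simplified for illustration and is not the actual CUDA kernel.

```python
# Illustrative sketch of NF4 (NormalFloat4) block quantization.
# Codebook: the 16 NF4 levels (QLoRA / bitsandbytes); block scaling
# uses absmax, as in the real scheme, but everything else is simplified.

NF4_CODEBOOK = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_block(block):
    """Quantize one block of floats to 4-bit indices plus an absmax scale."""
    scale = max(abs(x) for x in block) or 1.0
    indices = [
        min(range(16), key=lambda i: abs(NF4_CODEBOOK[i] - x / scale))
        for x in block
    ]
    return indices, scale

def dequantize_block(indices, scale):
    """Map 4-bit indices back to floats via the codebook and scale."""
    return [NF4_CODEBOOK[i] * scale for i in indices]

weights = [0.12, -0.45, 0.98, -0.07, 0.0, 0.63, -0.81, 0.3]
idx, s = quantize_block(weights)
restored = dequantize_block(idx, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_err:.3f}")
```

Each weight is stored as a 4-bit index (16 bytes of weights per 32-weight block in practice), so memory drops roughly 4x versus float16 at the cost of the small reconstruction error seen above.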

Model Details

  • Base model: bharatgenai/Param2-17B-A2.4B-Thinking
  • Quantization: 4-bit (NF4)
  • Compute dtype: float16
  • Double quantization: enabled
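The settings above correspond to a bitsandbytes configuration like the following. This is a reconstruction from the listed details; the exact configuration used to produce this repository's weights has not been published.

```python
import torch
from transformers import BitsAndBytesConfig

# Quantization config matching the Model Details above (an assumption,
# reconstructed from the listed settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)
```

Since this repository already ships quantized weights, you do not need to pass such a config at load time; it is only needed when quantizing the original full-precision model yourself.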

Benefits

  • Significantly reduced VRAM usage
  • Faster inference
  • Suitable for consumer GPUs
  • Maintains strong reasoning ability

Approximate VRAM Requirements

4-bit inference typically requires around 10–12 GB of VRAM.
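The 10–12 GB figure follows from the parameter count: at 4 bits (0.5 bytes) per parameter, the weights alone occupy about 8.5 GB, and activations, KV cache, and framework overhead account for the rest.

```python
# Back-of-the-envelope VRAM estimate for 4-bit weights.
params = 17e9            # 17B parameters
bytes_per_param = 0.5    # 4 bits = 0.5 bytes (ignoring quantization constants)
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.1f} GB")
# Activations, KV cache, and runtime overhead push the total to ~10-12 GB.
```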

Installation

```bash
pip install transformers accelerate bitsandbytes torch
```

Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralnets/modelname-4bit"

# Load the tokenizer and the pre-quantized 4-bit weights.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # place layers across available GPUs/CPU
    trust_remote_code=True,
)

# Tokenize a prompt, generate, and decode the result.
prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Notes

This repository contains only the quantized weights.
For the original full-precision model, visit:

https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking

License

Follows the license of the original model.
