# Param2-17B-A2.4B-Thinking (4-bit)
This repository contains a 4-bit quantized version of the original model [bharatgenai/Param2-17B-A2.4B-Thinking](https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking).

The model has been quantized with bitsandbytes NF4 quantization to reduce memory usage while maintaining good inference quality.
## Model Details

- **Base model:** [bharatgenai/Param2-17B-A2.4B-Thinking](https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking)
- **Quantization:** 4-bit (NF4)
- **Compute dtype:** float16
- **Double quantization:** enabled
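Based on the details above, the quantization setup corresponds to a `BitsAndBytesConfig` along the following lines. This is a sketch reconstructed from the listed settings; the exact configuration used to produce the checkpoint may differ:

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch of a bitsandbytes config matching the details above:
# NF4 quantization, float16 compute dtype, double quantization enabled.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
```

Since the repository already ships pre-quantized weights with their config, you normally do not need to pass this object yourself; it is shown only to document the settings.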
## Benefits

- Significantly reduced VRAM usage
- Faster inference
- Suitable for consumer GPUs
- Maintains strong reasoning ability
## Approximate VRAM Requirements

4-bit inference typically requires around 10–12 GB of VRAM.
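The 10–12 GB figure can be sanity-checked with a back-of-the-envelope calculation: 17B parameters at 4 bits each come to roughly 8 GiB of weights, with the remainder going to the KV cache, activations, and quantization overhead. The snippet below is a rough estimate only, not a measurement:

```python
def quantized_weight_gib(num_params: float, bits_per_param: float = 4.0) -> float:
    """Approximate weight memory in GiB for a model quantized to the given bit width."""
    return num_params * bits_per_param / 8 / (1024 ** 3)

# ~7.9 GiB for the 4-bit weights alone; the rest of the 10-12 GB budget
# covers the KV cache, activations, and quantization constants.
weights_gib = quantized_weight_gib(17e9)
```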
## Installation

```shell
pip install transformers accelerate bitsandbytes torch
```
## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralnets/modelname-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Notes

This repository contains only the quantized weights. For the original model, see [bharatgenai/Param2-17B-A2.4B-Thinking](https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking).
## License

Follows the license of the original model.