Param2-17B-A2.4B-Thinking (8-bit)
This repository contains an 8-bit quantized version of the original model:
bharatgenai/Param2-17B-A2.4B-Thinking
The model is quantized using bitsandbytes 8-bit quantization, providing a balance between memory efficiency and model accuracy.
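As a rough sketch of how such a checkpoint can be produced (not necessarily the exact command used for this repo), bitsandbytes 8-bit quantization is applied at load time via a `BitsAndBytesConfig`; the model id and options below are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical re-quantization of the original checkpoint. Users of this
# repo do not need this step: the hosted weights are already 8-bit.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,  # bitsandbytes LLM.int8() weight quantization
)

model = AutoModelForCausalLM.from_pretrained(
    "bharatgenai/Param2-17B-A2.4B-Thinking",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,  # compute dtype listed below
    device_map="auto",
    trust_remote_code=True,
)
```

The quantized model can then be saved with `model.save_pretrained(...)` and pushed to a new repository.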
Model Details
- Base model: bharatgenai/Param2-17B-A2.4B-Thinking
- Quantization: 8-bit (bitsandbytes)
- Compute dtype: float16
Benefits
- Roughly half the memory footprint of float16 weights (and a quarter of float32)
- Better accuracy than 4-bit quantization on many tasks
- Suitable for GPUs with moderate VRAM
Approximate VRAM Requirements
8-bit inference typically requires around 20–24 GB of VRAM.
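That range is consistent with a back-of-the-envelope estimate: at 1 byte per parameter, the roughly 17B weights alone need close to 16 GiB, and the KV cache, activations, and framework overhead account for the rest. A quick check (the 17e9 parameter count is assumed from the model name):

```python
# Rough VRAM estimate for 8-bit inference.
params = 17e9                          # approx. total parameters
weights_gib = params * 1 / 1024**3     # 1 byte per parameter in 8-bit
print(f"{weights_gib:.1f} GiB for weights alone")  # -> 15.8 GiB
# KV cache, activations, and CUDA overhead add several more GiB,
# which is why 20-24 GB is a practical working range.
```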
Installation
pip install transformers accelerate bitsandbytes torch
Example Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralnets/modelname-8bit"

# Load the tokenizer and the pre-quantized 8-bit weights.
# device_map="auto" places the model on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain the theory of relativity simply."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Notes
This repository contains only quantized weights derived from the original model; no additional training or fine-tuning was performed.
Original model repository:
https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking
License
This model follows the same license as the original model.