Gemma-2-2B Trading Summarizer (8-bit Quantized)

Model Description

This is an 8-bit quantized version of the fine-tuned Gemma-2-2B trading journal summarizer. Compared to the fp16 version, it cuts model size and memory usage by roughly 50% with minimal quality loss.

Quantization Details

  • Method: bitsandbytes 8-bit quantization
  • Original Precision: fp16
  • Quantized Precision: int8
  • Size Reduction: ~50%
  • Quality Impact: Typically <2% degradation
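The ~50% figure follows directly from bytes per parameter: fp16 stores two bytes per weight, int8 one. A quick back-of-the-envelope check (the parameter count below is approximate, and real checkpoints keep some tensors in higher precision, so actual files are slightly larger):

```python
def weight_size_gb(n_params: int, bits_per_param: int) -> float:
    """Raw weight storage in decimal GB, ignoring tensors kept in higher precision."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 2_600_000_000  # Gemma-2-2B has roughly 2.6B parameters

print(f"fp16: {weight_size_gb(N_PARAMS, 16):.1f} GB")  # ~5.2 GB
print(f"int8: {weight_size_gb(N_PARAMS, 8):.1f} GB")   # ~2.6 GB
```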

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Requires the bitsandbytes package and a CUDA-capable GPU.
# Passing load_in_8bit directly to from_pretrained is deprecated;
# use a BitsAndBytesConfig instead.
model = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./gemma-2b-trader-8bit")

# Same usage as fp16 version
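As a minimal generation sketch continuing from the code above (the prompt text and template here are illustrative assumptions, not the exact format used during fine-tuning; generation parameters are examples):

```python
# Illustrative only: the real prompt template from fine-tuning may differ.
prompt = "Summarize this trading journal entry:\n2024-03-01: Long NVDA at open, stopped out for -1R."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(summary)
```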

When to Use This Version

  • Limited GPU memory (<8GB VRAM)
  • Faster loading times needed
  • Deployment on edge devices
  • When memory footprint matters more than marginal quality (note: 8-bit inference is not always faster than fp16; the main win is memory)

When to Use FP16 Version

  • Maximum quality required
  • Sufficient GPU memory available
  • Fine-tuning or further training needed
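The trade-offs above can be condensed into a small decision helper. The thresholds are illustrative assumptions drawn from the bullets, not measured requirements, and the function name is hypothetical:

```python
def pick_variant(free_vram_gb: float, need_training: bool = False) -> str:
    """Choose between the fp16 and 8-bit checkpoints.

    Rough assumption: fp16 weights alone take ~5 GB, 8-bit ~2.6 GB
    (excluding activations and KV cache), so 8 GB VRAM is the cutoff
    used in this card.
    """
    if need_training:
        return "fp16"  # fine-tuning or further training needs full precision
    if free_vram_gb < 8:
        return "8bit"  # fits comfortably under 8 GB VRAM
    return "fp16"      # enough memory: prefer maximum quality

print(pick_variant(6.0))         # 8bit
print(pick_variant(24.0))        # fp16
print(pick_variant(24.0, True))  # fp16
```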