trainfarren/john-welbourne-csm-1b - 4-bit Quantized

This is a 4-bit quantized version of trainfarren/john-welbourne-csm-1b for faster, lower-memory inference.

Model Description

This model was quantized with bitsandbytes (via the transformers BitsAndBytesConfig) to reduce memory usage and improve inference speed while maintaining quality.

Quantization Details

  • Quantization Type: 4-bit
  • Original Model: trainfarren/john-welbourne-csm-1b
  • Expected Speed Improvement: 2-4x faster inference
  • Memory Reduction: ~50-75% less VRAM usage

Usage

from transformers import CsmForConditionalGeneration, AutoProcessor
import torch

# The checkpoint is already quantized, so no quantization config is needed at load time
model = CsmForConditionalGeneration.from_pretrained(
    "trainfarren/john-welbourne-csm-1b-4bit",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("trainfarren/john-welbourne-csm-1b-4bit")

# Use with your existing CSM inference code

Performance

Expected improvements over the original model:

  • Inference Speed: 2-4x faster
  • Memory Usage: 50-75% reduction
  • Quality: Minimal degradation
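The memory figure can be sanity-checked with back-of-envelope arithmetic on the weights alone (activations and per-layer overhead are ignored, so real usage is somewhat higher):

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (ignores activations and overhead)."""
    return n_params * bits_per_param / 8 / 1e9

n = 2e9  # ~2B parameters
fp16 = weight_gb(n, 16)      # ~4.0 GB
four_bit = weight_gb(n, 4)   # ~1.0 GB
print(f"FP16: {fp16:.1f} GB, 4-bit: {four_bit:.1f} GB, "
      f"reduction: {1 - four_bit / fp16:.0%}")
```

That is roughly a 75% reduction versus FP16 weights, consistent with the upper end of the 50-75% range quoted above.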

Original Model

This model is based on trainfarren/john-welbourne-csm-1b.

Model size: 2B params (Safetensors; tensor types F32, F16, U8)

Model tree for trainfarren/john-welbourne-csm-1b-4bit

  • Base model: sesame/csm-1b
  • Finetuned from: unsloth/csm-1b
  • Quantized: this model