trainfarren/john-welbourne-csm-1b - 4-bit Quantized

This is a 4-bit quantized version of trainfarren/john-welbourne-csm-1b for faster, lower-memory inference.

Model Description

This model was quantized with bitsandbytes (via the transformers BitsAndBytesConfig) to reduce memory usage and improve inference speed while maintaining quality.

Quantization Details

  • Quantization Type: 4-bit
  • Original Model: trainfarren/john-welbourne-csm-1b
  • Expected Speed Improvement: 2-4x faster inference
  • Memory Reduction: ~50-75% less VRAM usage

Usage

from transformers import CsmForConditionalGeneration, AutoProcessor
import torch

# The checkpoint is already quantized, so no quantization config is needed at load time
model = CsmForConditionalGeneration.from_pretrained(
    "trainfarren/john-welbourne-csm-1b-4bit",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("trainfarren/john-welbourne-csm-1b-4bit")

# Use with your existing CSM inference code

Performance

Expected improvements over the original model:

  • Inference Speed: 2-4x faster
  • Memory Usage: 50-75% reduction
  • Quality: Minimal degradation
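The memory figure can be sanity-checked with back-of-envelope arithmetic on the weights alone (activations and per-layer overhead are ignored, so real usage is somewhat higher):

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (ignores activations and overhead)."""
    return n_params * bits_per_param / 8 / 1e9

n = 2e9  # ~2B parameters
fp16 = weight_gb(n, 16)      # ~4.0 GB
four_bit = weight_gb(n, 4)   # ~1.0 GB
print(f"FP16: {fp16:.1f} GB, 4-bit: {four_bit:.1f} GB, "
      f"reduction: {1 - four_bit / fp16:.0%}")
```

That is roughly a 75% reduction versus FP16 weights, consistent with the upper end of the 50-75% range quoted above.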

Original Model

This model is based on trainfarren/john-welbourne-csm-1b.

Model size: 2B params (Safetensors; tensor types F32, F16, U8)

Model tree for trainfarren/john-welbourne-csm-1b-4bit

  • Base model: sesame/csm-1b
  • Finetuned from: unsloth/csm-1b
  • Quantized: this model