SmolLM2-135M-4-bit

This repository contains a 4-bit quantized version of the HuggingFaceTB/SmolLM2-135M model, produced with the q4_0 quantization method from llama.cpp and stored in the GGUF file format. Quantization reduces the model's size and memory footprint while preserving its core capabilities, making it suitable for resource-constrained deployments such as edge devices, mobile platforms, or lightweight inference tasks.
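As a rough illustration of the footprint reduction (a back-of-envelope sketch, not measured file sizes): q4_0 packs weights into blocks of 32 four-bit values with one fp16 scale per block, which works out to about 4.5 bits per weight versus 16 bits for fp16.

```python
# Back-of-envelope estimate of the q4_0 memory footprint for a 135M-parameter
# model. q4_0 stores blocks of 32 four-bit values plus one fp16 scale per
# block, i.e. roughly 4.5 bits per weight.
N_PARAMS = 135_000_000
BITS_PER_WEIGHT_Q4_0 = 4 + 16 / 32   # 4-bit values + fp16 scale per 32-block
BITS_PER_WEIGHT_FP16 = 16

q4_0_mib = N_PARAMS * BITS_PER_WEIGHT_Q4_0 / 8 / 1024**2
fp16_mib = N_PARAMS * BITS_PER_WEIGHT_FP16 / 8 / 1024**2

print(f"fp16: ~{fp16_mib:.0f} MiB, q4_0: ~{q4_0_mib:.0f} MiB")
# fp16: ~257 MiB, q4_0: ~72 MiB
```

The actual GGUF file will run somewhat larger than this estimate, since some tensors (embeddings, for example) may be kept at higher precision.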

Quantization Details:
- Base Model: HuggingFaceTB/SmolLM2-135M
- Architecture: llama
- Quantization Method: q4_0 (4-bit)
- Framework Used: llama.cpp
- File Format: GGUF
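
Any GGUF-capable runtime can load the quantized file, including llama.cpp's own `llama-cli`. Below is a minimal inference sketch using llama-cpp-python; the model path is a placeholder, so substitute the actual `.gguf` filename shipped in this repository.

```python
# Minimal inference sketch with llama-cpp-python (one of several GGUF-capable
# runtimes). The model path is a placeholder for the .gguf file in this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="SmolLM2-135M-q4_0.gguf",  # placeholder; use the actual filename
    n_ctx=2048,    # context window size
    n_threads=4,   # adjust to your CPU
)

output = llm(
    "The capital of France is",
    max_tokens=32,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```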
