SmolLM2-135M-4-bit

This repository contains a 4-bit quantized version of the HuggingFaceTB/SmolLM2-135M model, produced with the q4_0 quantization method from llama.cpp and stored in the GGUF file format. Quantization reduces the model's size and memory footprint while preserving its core capabilities, making it suitable for resource-constrained deployments such as edge devices, mobile platforms, or lightweight inference tasks.
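As a rough illustration of the footprint reduction (a back-of-envelope sketch, not measured file sizes): q4_0 packs weights into blocks of 32 four-bit values with one fp16 scale per block, which works out to about 4.5 bits per weight versus 16 bits for fp16.

```python
# Back-of-envelope estimate of the q4_0 memory footprint for a 135M-parameter
# model. q4_0 stores blocks of 32 four-bit values plus one fp16 scale per
# block, i.e. roughly 4.5 bits per weight.
N_PARAMS = 135_000_000
BITS_PER_WEIGHT_Q4_0 = 4 + 16 / 32   # 4-bit values + fp16 scale per 32-block
BITS_PER_WEIGHT_FP16 = 16

q4_0_mib = N_PARAMS * BITS_PER_WEIGHT_Q4_0 / 8 / 1024**2
fp16_mib = N_PARAMS * BITS_PER_WEIGHT_FP16 / 8 / 1024**2

print(f"fp16: ~{fp16_mib:.0f} MiB, q4_0: ~{q4_0_mib:.0f} MiB")
# fp16: ~257 MiB, q4_0: ~72 MiB
```

The actual GGUF file will run somewhat larger than this estimate, since some tensors (embeddings, for example) may be kept at higher precision.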

Quantization Details:
- Base Model: HuggingFaceTB/SmolLM2-135M
- Architecture: llama
- Quantization Method: q4_0 (4-bit)
- Framework Used: llama.cpp
- File Format: GGUF
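
Any GGUF-capable runtime can load the quantized file, including llama.cpp's own `llama-cli`. Below is a minimal inference sketch using llama-cpp-python; the model path is a placeholder, so substitute the actual `.gguf` filename shipped in this repository.

```python
# Minimal inference sketch with llama-cpp-python (one of several GGUF-capable
# runtimes). The model path is a placeholder for the .gguf file in this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="SmolLM2-135M-q4_0.gguf",  # placeholder; use the actual filename
    n_ctx=2048,    # context window size
    n_threads=4,   # adjust to your CPU
)

output = llm(
    "The capital of France is",
    max_tokens=32,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```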
