SmolLM2-135M-4-bit
This repository contains a 4-bit quantized version of the HuggingFaceTB/SmolLM2-135M model, produced with the q4_0 quantization method from llama.cpp and stored in the GGUF file format. Quantization reduces the model's size and memory footprint while maintaining its core capabilities, making it suitable for deployment in resource-constrained environments such as edge devices and mobile platforms, or for lightweight inference tasks.
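The memory saving is easy to estimate from how q4_0 packs weights: blocks of 32 weights share one fp16 scale (2 bytes) plus 32 four-bit values (16 bytes), i.e. 18 bytes per 32 weights, or 4.5 bits per weight on average. The back-of-the-envelope sketch below compares that to fp16; the actual GGUF file size will differ somewhat, since some tensors (e.g. embeddings) may use other types and the file carries metadata.

```python
# Rough size estimate for q4_0 quantization of a 135M-parameter model.
# q4_0 block layout: one fp16 scale (2 bytes) + 32 four-bit weights
# (16 bytes) = 18 bytes per 32 weights = 4.5 bits per weight.
PARAMS = 135_000_000                  # parameter count of SmolLM2-135M
BITS_PER_WEIGHT_FP16 = 16
BITS_PER_WEIGHT_Q4_0 = 18 * 8 / 32    # = 4.5 bits

fp16_mib = PARAMS * BITS_PER_WEIGHT_FP16 / 8 / 1024**2
q4_0_mib = PARAMS * BITS_PER_WEIGHT_Q4_0 / 8 / 1024**2

print(f"fp16: ~{fp16_mib:.0f} MiB, q4_0: ~{q4_0_mib:.0f} MiB")
print(f"compression: {fp16_mib / q4_0_mib:.2f}x")  # 16 / 4.5 ~= 3.56x
```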
Quantization Details:
- Base Model: HuggingFaceTB/SmolLM2-135M
- Quantization Method: q4_0 (4-bit)
- Framework Used: llama.cpp
- File Format: GGUF
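A GGUF file like this one can typically be run directly with llama.cpp's command-line tool. The sketch below assumes a built `llama-cli` binary on your PATH and a local file named `smollm2-135m-q4_0.gguf` (a hypothetical filename, not taken from this card):

```shell
# Generate a short completion from the quantized model with llama.cpp.
# -m: path to the GGUF model file
# -p: prompt text
# -n: maximum number of tokens to generate
llama-cli -m smollm2-135m-q4_0.gguf \
  -p "Once upon a time" \
  -n 64
```

The same file also works with bindings such as llama-cpp-python, which load GGUF models through the same llama.cpp backend.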