# SmolLM2-360M-Instruct-q4f16_1-MLC
This repository contains the SmolLM2-360M-Instruct model weights converted to MLC format, quantized with q4f16_1.
## Model Description
SmolLM2 is a family of compact language models optimized for on-device performance. This 360M version is designed to be extremely fast and efficient for mobile and web deployment while maintaining strong instruction-following capabilities for its size.
- Developer: Hugging Face
- Model Type: Llama-based Causal Language Model
- Parameters: 360M
- Format: MLC (Compiled for WebGPU and optimized for WebLLM)
- Quantization: q4f16_1 (4-bit weights with float16 scale factors)
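To give an intuition for the quantization scheme, the sketch below shows generic 4-bit group quantization with one floating-point scale per group. This is an illustration only, not MLC's actual q4f16_1 kernel or storage layout; the function names and group size are made up for the example.

```python
# Illustrative sketch of 4-bit group quantization with a per-group scale,
# in the spirit of q4f16_1. NOT the exact MLC algorithm or memory layout;
# it only shows why 4-bit codes plus a float16 scale save memory.

def quantize_group(weights, levels=16):
    """Map a group of floats to 4-bit codes in [0, 15] plus one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / (levels // 2 - 1)  # one scale per group (fp16 in q4f16_1)
    codes = [
        max(0, min(levels - 1, round(w / scale) + levels // 2))
        for w in weights
    ]
    return codes, scale

def dequantize_group(codes, scale, levels=16):
    """Recover approximate float weights from 4-bit codes and the scale."""
    return [(c - levels // 2) * scale for c in codes]

weights = [0.12, -0.07, 0.31, -0.25]
codes, scale = quantize_group(weights)
restored = dequantize_group(codes, scale)
print(codes)     # 4-bit integer codes, each fits in half a byte
print(restored)  # approximate reconstruction of the original weights
```

Each weight is stored in 4 bits instead of 16 or 32, and the rounding error per weight is bounded by half the group's scale, which is why this scheme keeps small models usable on-device.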
## Usage
This model is intended for use with WebLLM or MLC LLM.
### How to use with WebLLM

```javascript
const modelId = "vikramlingam/SmolLM2-360M-Instruct-q4f16_1-MLC";
// Use this ID in your WebLLM configuration
```
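Concretely, registering a custom model with WebLLM typically means adding an entry to the engine's `appConfig.model_list`. The sketch below builds such an entry; the `model.wasm` filename in `model_lib` is an assumption for illustration, so check the repository for the actual compiled library name.

```javascript
// Sketch of a WebLLM app-config entry for this repository.
// The WASM filename in `model_lib` is an assumption; verify the real
// file name in the repo before using it.
const modelId = "vikramlingam/SmolLM2-360M-Instruct-q4f16_1-MLC";
const repoUrl = "https://huggingface.co/" + modelId;

const appConfig = {
  model_list: [
    {
      model: repoUrl,                                  // weight shards + cache metadata
      model_id: modelId,                               // name passed when creating the engine
      model_lib: repoUrl + "/resolve/main/model.wasm", // assumed WASM path
    },
  ],
};

console.log(appConfig.model_list[0].model_id);
```

In the browser you would then create the engine roughly as `const engine = await CreateMLCEngine(modelId, { appConfig })` (from `@mlc-ai/web-llm`) and call `engine.chat.completions.create(...)` in the OpenAI-style chat API.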
## Reconversion Details
This model was converted from the official text-only HuggingFaceTB/SmolLM2-360M-Instruct source to ensure a clean text generation experience without multimodal artifacts.
- Context Window: 8192 tokens
- WASM binary included: Yes (WebGPU)
- Metadata files included: tensor-cache.json, ndarray-cache.json
## Model Tree
- Base model: HuggingFaceTB/SmolLM2-360M
- Instruction-tuned: HuggingFaceTB/SmolLM2-360M-Instruct
- This repository: q4f16_1 quantization of HuggingFaceTB/SmolLM2-360M-Instruct