SmolLM2-360M-Instruct-q4f16_1-MLC

This repository contains the SmolLM2-360M-Instruct model weights converted to MLC format, quantized with q4f16_1.

Model Description

SmolLM2 is a family of compact language models optimized for on-device performance. This 360M version is designed to be extremely fast and efficient for mobile and web deployment while maintaining strong instruction-following capabilities for its size.

  • Developer: Hugging Face
  • Model Type: Llama-based Causal Language Model
  • Parameters: 360M
  • Format: MLC (Compiled for WebGPU and optimized for WebLLM)
  • Quantization: q4f16_1 (4-bit quantization, float16 scaling)
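As a rough sanity check on what q4f16_1 means for download size, the weight footprint can be estimated from the parameter count. This sketch assumes one float16 scale per group of 32 weights (a common MLC default; the exact group size may differ), so treat the result as a ballpark figure, not an exact artifact size.

```python
# Ballpark weight-memory estimate for q4f16_1:
# 4-bit weights plus one fp16 scale per group (group size assumed = 32).
params = 360_000_000
bits_per_weight = 4
group_size = 32       # assumption; check the MLC quantization config
scale_bits = 16

weight_bytes = params * bits_per_weight / 8
scale_bytes = params / group_size * scale_bits / 8
total_mb = (weight_bytes + scale_bytes) / 1e6

print(f"~{total_mb:.0f} MB of quantized weights")  # roughly 200 MB
```

This is why the q4f16_1 variant is practical for browser delivery, where the full float16 weights (~720 MB) would be prohibitive to download and hold in GPU memory.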

Usage

This model is intended for use with WebLLM or MLC LLM.

How to use with WebLLM

const modelId = "vikramlingam/SmolLM2-360M-Instruct-q4f16_1-MLC";
// Use this ID in your WebLLM configuration
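Since this model is not in WebLLM's prebuilt list, it has to be registered through a custom app config. The sketch below follows WebLLM's `ModelRecord` convention; the `model_lib` value is a placeholder for the WebGPU WASM binary shipped in this repository, so check the repo's file list for the actual filename before using it.

```javascript
// Sketch: registering this model in a WebLLM app config.
// The model_lib URL is a placeholder -- point it at the WASM binary
// included in this repository.
const modelId = "vikramlingam/SmolLM2-360M-Instruct-q4f16_1-MLC";

const appConfig = {
  model_list: [
    {
      model: "https://huggingface.co/" + modelId, // weight shards + metadata
      model_id: modelId,                          // ID passed to the engine
      model_lib: "<URL of the WebGPU WASM binary in this repo>", // placeholder
    },
  ],
};

// Then, in a WebGPU-capable browser:
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine(modelId, { appConfig });
//   const reply = await engine.chat.completions.create({
//     messages: [{ role: "user", content: "Hello!" }],
//   });
```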

Reconversion Details

This model was converted from the official, text-only HuggingFaceTB/SmolLM2-360M-Instruct checkpoint to ensure clean text generation without multimodal artifacts.

  • Context Window: 8192 tokens
  • WASM binary included: Yes (WebGPU)
  • Metadata files included: tensor-cache.json, ndarray-cache.json