SmolLM2-360M-Instruct-q4f16_1-MLC

This repository contains the SmolLM2-360M-Instruct model weights converted to MLC format, quantized with q4f16_1.

Model Description

SmolLM2 is a family of compact language models optimized for on-device performance. This 360M version is designed to be extremely fast and efficient for mobile and web deployment while maintaining strong instruction-following capabilities for its size.

  • Developer: Hugging Face
  • Model Type: Llama-based Causal Language Model
  • Parameters: 360M
  • Format: MLC (Compiled for WebGPU and optimized for WebLLM)
  • Quantization: q4f16_1 (4-bit quantization, float16 scaling)
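As a rough sanity check on what q4f16_1 means for download size, the weight footprint can be estimated from the parameter count. This sketch assumes one float16 scale per group of 32 weights (a common MLC default; the exact group size may differ), so treat the result as a ballpark figure, not an exact artifact size.

```python
# Ballpark weight-memory estimate for q4f16_1:
# 4-bit weights plus one fp16 scale per group (group size assumed = 32).
params = 360_000_000
bits_per_weight = 4
group_size = 32       # assumption; check the MLC quantization config
scale_bits = 16

weight_bytes = params * bits_per_weight / 8
scale_bytes = params / group_size * scale_bits / 8
total_mb = (weight_bytes + scale_bytes) / 1e6

print(f"~{total_mb:.0f} MB of quantized weights")  # roughly 200 MB
```

This is why the q4f16_1 variant is practical for browser delivery, where the full float16 weights (~720 MB) would be prohibitive to download and hold in GPU memory.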

Usage

This model is intended for use with WebLLM or MLC LLM.

How to use with WebLLM

const modelId = "vikramlingam/SmolLM2-360M-Instruct-q4f16_1-MLC";
// Use this ID in your WebLLM configuration
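Since this model is not in WebLLM's prebuilt list, it has to be registered through a custom app config. The sketch below follows WebLLM's `ModelRecord` convention; the `model_lib` value is a placeholder for the WebGPU WASM binary shipped in this repository, so check the repo's file list for the actual filename before using it.

```javascript
// Sketch: registering this model in a WebLLM app config.
// The model_lib URL is a placeholder -- point it at the WASM binary
// included in this repository.
const modelId = "vikramlingam/SmolLM2-360M-Instruct-q4f16_1-MLC";

const appConfig = {
  model_list: [
    {
      model: "https://huggingface.co/" + modelId, // weight shards + metadata
      model_id: modelId,                          // ID passed to the engine
      model_lib: "<URL of the WebGPU WASM binary in this repo>", // placeholder
    },
  ],
};

// Then, in a WebGPU-capable browser:
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine(modelId, { appConfig });
//   const reply = await engine.chat.completions.create({
//     messages: [{ role: "user", content: "Hello!" }],
//   });
```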

Reconversion Details

This model was converted from the official, text-only HuggingFaceTB/SmolLM2-360M-Instruct checkpoint to ensure clean text generation without multimodal artifacts.

  • Context Window: 8192 tokens
  • WASM binary included: Yes (WebGPU)
  • Metadata files included: tensor-cache.json, ndarray-cache.json