TokForge

Website: https://tokforge.ai
Discord: https://discord.gg/Acv3CBtfVm
Google Play: https://play.google.com/store/apps/details?id=dev.tokforge
iOS TestFlight: https://testflight.apple.com/join/jnufjzRr

Runs on-device in the TokForge app.

SmolLM2-1.7B-Instruct-MNN

Pre-converted SmolLM2 1.7B Instruct in MNN format for on-device inference with TokForge.

Original model by Hugging Face, converted to MNN Q4 for mobile deployment.

Model Details


Architecture	LlamaForCausalLM (24 layers, 2048 hidden size, 32 attention heads)
Parameters	1.7B (4-bit quantized)
Format	MNN (Alibaba Mobile Neural Network)
Quantization	W4A16 (4-bit weights, block size 128)
Vocab	49,152 tokens
Source	HuggingFaceTB/SmolLM2-1.7B-Instruct
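
To make the "4-bit weights, block size 128" row concrete, here is a minimal sketch of block-wise 4-bit quantization in plain Python. It is an illustration only: the asymmetric min/scale scheme and the function names (`quantize_block`, `dequantize_block`, `quantize`) are assumptions for this example, not MNN's actual implementation or storage layout.

```python
from typing import List, Tuple

BLOCK_SIZE = 128     # block size taken from the table above
LEVELS = 2 ** 4 - 1  # 4-bit codes cover the integers 0..15


def quantize_block(block: List[float]) -> Tuple[List[int], float, float]:
    """Map one block of floats to 4-bit codes plus a (scale, minimum) pair."""
    lo, hi = min(block), max(block)
    # A constant block has zero range; fall back to scale 1.0 to avoid dividing by zero.
    scale = (hi - lo) / LEVELS or 1.0
    codes = [min(LEVELS, max(0, round((w - lo) / scale))) for w in block]
    return codes, scale, lo


def dequantize_block(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]


def quantize(weights: List[float], block_size: int = BLOCK_SIZE):
    """Quantize a flat weight list block by block (assumed layout, for illustration)."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]
```

Each block keeps its own scale and minimum, so a single outlier weight only degrades the precision of the 128 values that share its block, not the whole tensor.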

Description

Hugging Face's SmolLM2 is a compact 1.7B-parameter model designed for on-device inference. It is intended to run on most phones, including budget devices with 4GB RAM. It is capable for its size and suits quick Q&A, text completion, and simple tasks where speed matters more than depth.

Files

File	Description
`llm.mnn`	Model computation graph
`llm.mnn.weight`	Quantized weight data (Q4, block=128)
`llm_config.json`	Model config with Jinja chat template
`tokenizer.txt`	Tokenizer vocabulary
`config.json`	MNN runtime config
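
If you download the bundle manually, a quick completeness check can catch a partial download. The sketch below uses only the file names from the table above; the helper name `missing_files` and the directory path in the usage note are hypothetical.

```python
from pathlib import Path
from typing import List

# File names as listed in the Files table.
EXPECTED_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "tokenizer.txt",
    "config.json",
]


def missing_files(bundle_dir: str) -> List[str]:
    """Return the expected bundle files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("./SmolLM2-1.7B-Instruct-MNN")` returns an empty list when all five files are present in that (assumed) download directory.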

Usage with TokForge

This model is packaged for TokForge, a free Android app for private, on-device LLM inference. An iOS TestFlight build is linked at the top of this page.

1. Download TokForge from the Play Store.
2. Open the app → Models → Download this model.
3. Start chatting. Inference runs 100% locally, with no internet required.

Recommended Settings

Setting	Value
Backend	OpenCL (Qualcomm) / Vulkan (MediaTek) / CPU (fallback)
Precision	Low
Threads	4
Thinking	Off (On applies only to thinking-capable models; SmolLM2 is not one)
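
The backend row of the table can be read as a simple rule keyed on the SoC vendor. The sketch below encodes that rule; `pick_backend` and `RECOMMENDED` are hypothetical names for illustration and are not part of TokForge or MNN.

```python
def pick_backend(soc_vendor: str) -> str:
    """Return the recommended backend for a SoC vendor, per the table above."""
    vendor = soc_vendor.strip().lower()
    if vendor == "qualcomm":
        return "OpenCL"
    if vendor == "mediatek":
        return "Vulkan"
    return "CPU"  # fallback for any other or unknown vendor


# Remaining recommended settings, copied from the table.
RECOMMENDED = {"precision": "Low", "threads": 4}
```

Treat the result as a starting point: if the chosen GPU backend is unstable or slow on a particular device, switching to CPU is the fallback the table suggests.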

Performance

Actual speed varies by device, thermal state, and generation length. Typical ranges for this model size:

Device	SoC	Backend	Speed (tok/s)
Any modern phone	Any	CPU/OpenCL	~30-50
Budget phones (4GB+ RAM)	Any	CPU	~15-25
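
To translate these ranges into wait time, divide the reply length by the decode speed. The helper below is a rough sketch: the function name is hypothetical, and it ignores prompt processing time and thermal throttling, so real latency will be somewhat higher.

```python
from typing import Tuple


def generation_seconds(
    num_tokens: int, tok_per_s_low: float, tok_per_s_high: float
) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds to generate num_tokens."""
    best = num_tokens / tok_per_s_high   # fastest speed gives the shortest wait
    worst = num_tokens / tok_per_s_low   # slowest speed gives the longest wait
    return best, worst
```

With the figures above, a 200-token reply takes roughly 4 to 7 seconds at 30-50 tok/s and roughly 8 to 13 seconds at 15-25 tok/s.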

Attribution

This is an MNN conversion of SmolLM2 1.7B Instruct by Hugging Face. All credit for the model architecture, training, and fine-tuning goes to the original authors. This conversion only changes the runtime format for mobile deployment.

Limitations

Intended for TokForge / MNN on-device inference on Android
This is a runtime bundle, not a standard Transformers training checkpoint
Quantization (Q4) may slightly reduce quality compared to the full-precision original
Abliterated/uncensored models have had safety filters removed — use responsibly

Community

Website: tokforge.ai
Discord: https://discord.gg/Acv3CBtfVm
GitHub: TokForge on GitHub

Export Details

Converted using MNN's llmexport pipeline:

python llmexport.py --path HuggingFaceTB/SmolLM2-1.7B-Instruct --export mnn --quant_bit 4 --quant_block 128
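
If you script conversions for several models, it can help to assemble the command as an argument list rather than a string. The sketch below reproduces the command above using only the flags shown there; `build_export_command` is a hypothetical helper, and it assumes `llmexport.py` is in the current working directory.

```python
import shlex
from typing import List


def build_export_command(
    model_path: str, quant_bit: int = 4, quant_block: int = 128
) -> List[str]:
    """Build the llmexport argument list with the flags used for this model."""
    return [
        "python", "llmexport.py",
        "--path", model_path,
        "--export", "mnn",
        "--quant_bit", str(quant_bit),
        "--quant_block", str(quant_block),
    ]


command = build_export_command("HuggingFaceTB/SmolLM2-1.7B-Instruct")
print(shlex.join(command))  # shows the command as a single shell line
```

The list form can be passed directly to `subprocess.run(command, check=True)` without shell quoting concerns.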
