SmolLM2-135M-Instruct-ArduinoUnoQ-GGUF

Quantized and optimized for the Arduino Uno Q using an NVIDIA DGX Spark.

Optimization Specs

  • Architecture: ARM64 / Adreno 702 GPU
  • Quantization: Q4_0, produced with the --pure flag so every tensor shares the same type, matching the Adreno OpenCL kernels.
  • Base Model: SmolLM2-135M-Instruct
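
The Q4_0 / --pure step above can be reproduced with llama.cpp's standard tooling. A minimal sketch, assuming llama.cpp is built locally and the base model has been downloaded to a local directory; the intermediate F16 file name is illustrative, not from this repository:

```bash
# Convert the downloaded HF checkpoint to an F16 GGUF
# (./SmolLM2-135M-Instruct is an assumed local path).
python convert_hf_to_gguf.py ./SmolLM2-135M-Instruct \
  --outfile smollm2-135m-instruct-f16.gguf --outtype f16

# Quantize to Q4_0; --pure keeps all tensors the same type
# instead of llama.cpp's default mixed-type layout.
./llama-quantize --pure smollm2-135m-instruct-f16.gguf \
  unoq_optimized.gguf Q4_0
```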

โš™๏ธ Inference Engine

This model is designed to run with the optimized llama-cli binary found in the official Uno Q Edge AI repository: 👉 GitHub: Arduino-UnoQ-Optimized-Llama-CLI

Usage

```bash
./llama-cli -m unoq_optimized.gguf -ngl 99 -t 4 -c 2048
```
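
The flags follow standard llama.cpp conventions. A fuller interactive invocation, for reference; the system prompt below is only an example, not part of this model's configuration:

```bash
# -ngl 99 : offload all layers to the GPU backend (Adreno 702)
# -t 4    : use four CPU threads
# -c 2048 : 2048-token context window
# -cnv    : run in conversation (chat) mode
./llama-cli -m unoq_optimized.gguf -ngl 99 -t 4 -c 2048 \
  -cnv -p "You are a helpful assistant."
```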
