How to use from
llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4
# Run inference directly in the terminal:
llama cli -hf dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4
# Run inference directly in the terminal:
llama cli -hf dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4
# Run inference directly in the terminal:
./llama-cli -hf dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4
# Run inference directly in the terminal:
./build/bin/llama-cli -hf dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4
Use Docker
docker model run hf.co/dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4
Quick Links

Qwen 2.5 0.5B Instruct - Mobile INT4 (GGUF)

Alibaba's Qwen 2.5 0.5B Instruct, the smallest capable general-purpose model. Incredibly fast on phones.

Property Value
Base Qwen/Qwen2.5-0.5B-Instruct
Parameters 494 million
Quantization INT4 GGUF
Size ~398 MB
License Apache 2.0

Performance

  • ~45 tok/s on Samsung S20 FE CPU (fastest in our collection!)
  • ~0.7 GB memory footprint
  • Fits on ANY modern smartphone
  • ~94% quality retention

Use Cases

  • Code generation on mobile IDEs
  • Quick text classification / extraction
  • Embedded assistants in apps
  • Ultra-low-latency responses (<50ms per token)
  • Batch processing at massive scale

Quick Start

huggingface-cli download dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 --local-dir ./models
./build/bin/main -m ./models/model.gguf -p "Explain quantum computing simply." -n 128 -t 4
Downloads last month
1,056
GGUF
Model size
0.6B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Spaces using dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 3

Collections including dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4