Instructions to use NexaAI/Qwen2-Audio-7B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use NexaAI/Qwen2-Audio-7B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="NexaAI/Qwen2-Audio-7B-GGUF",
	filename="Qwen2-7B-LLM-F16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use NexaAI/Qwen2-Audio-7B-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use NexaAI/Qwen2-Audio-7B-GGUF with Ollama:
```
ollama run hf.co/NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M
```

Unsloth Studio

How to use NexaAI/Qwen2-Audio-7B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NexaAI/Qwen2-Audio-7B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NexaAI/Qwen2-Audio-7B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for NexaAI/Qwen2-Audio-7B-GGUF to start chatting

Atomic Chat new
Docker Model Runner
How to use NexaAI/Qwen2-Audio-7B-GGUF with Docker Model Runner:
```
docker model run hf.co/NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M
```

Lemonade

How to use NexaAI/Qwen2-Audio-7B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull NexaAI/Qwen2-Audio-7B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen2-Audio-7B-GGUF-Q4_K_M

List all available models

lemonade list

Qwen2-Audio

We're bringing Qwen2-Audio to run locally on edge devices with Nexa-SDK, offering various GGUF quantization options.

Qwen2-Audio is a SOTA small-scale multimodal model (AudioLM) that handles audio and text inputs, allowing you to have voice interactions without ASR modules. Qwen2-Audio supports English, Chinese, and major European languages,and provides voice chat and audio analysis capabilities for local use cases like:

Speaker identification and response
Speech translation and transcription
Mixed audio and noise detection
Music and sound analysis

Demo

See more demos in our blogs

How to Run Locally On Device

In the following, we demonstrate how to run Qwen2-Audio locally on your device.

Step 1: Install Nexa-SDK (local on-device inference framework)

Install Nexa-SDK

Nexa-SDK is a open-sourced, local on-device inference framework, supporting text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS) capabilities. Installable via Python Package or Executable Installer.

Step 2: Then run the following code in your terminal

nexa run qwen2audio

This will run default q4_K_M quantization.

For terminal:

Drag and drop your audio file into the terminal (or enter file path on Linux)
Add text prompt to guide analysis or leave empty for direct voice input

or to use with local UI (streamlit):

nexa run qwen2audio -st

Choose Quantizations for your device

Run different quantization versions here and check RAM requirements in our list.

The default q4_K_M version requires 4.2GB of RAM.

Use Cases

Voice Chat

Answer daily questions
Offer suggestions
Speaker identification and response
Speech translation
Detecting background noise and responding accordingly

Audio Analysis

Information Extraction
Audio summary
Speech Transcription and Expansion
Mixed audio and noise detection
Music and sound analysis

Performance Benchmark

Results demonstrate that Qwen2-Audio significantly outperforms either previous SOTAs or Qwen-Audio across all tasks.

Blog

Learn more in our blogs

Join Community

Discord | X(Twitter)

Downloads last month: 2,707

GGUF

Model size

8B params

Architecture

qwen2

Hardware compatibility

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

View +1 variant

Inference Providers NEW

Audio-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support