Instructions to use nopenet/nope-edge-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use nopenet/nope-edge-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="nopenet/nope-edge-GGUF",
	filename="nope-edge-f16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
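
The call above returns a response rather than printing it. As a minimal sketch, assuming the OpenAI-style dictionary layout that `create_chat_completion` returns (`choices` → `message` → `content`), the reply text can be pulled out as follows. The helper name `reply_text` and the sample dictionary are hypothetical and used only for illustration; the sample is hand-written, not real model output.

```python
from typing import Any, Dict


def reply_text(response: Dict[str, Any]) -> str:
    """Return the assistant message text from a chat-completion response dict."""
    # Assumed layout: {"choices": [{"message": {"role": ..., "content": ...}}]}
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"] or ""


# Hand-written sample for illustration only; not real model output.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "<reflection>...</reflection>\n<risks/>",
            },
        }
    ]
}

print(reply_text(sample_response))
```

In a real script, assign the result of `llm.create_chat_completion(...)` to a variable and pass it to the helper.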

Local Apps

llama.cpp

How to use nopenet/nope-edge-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nopenet/nope-edge-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf nopenet/nope-edge-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nopenet/nope-edge-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf nopenet/nope-edge-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf nopenet/nope-edge-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf nopenet/nope-edge-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf nopenet/nope-edge-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf nopenet/nope-edge-GGUF:Q4_K_M

Use Docker

docker model run hf.co/nopenet/nope-edge-GGUF:Q4_K_M

vLLM

How to use nopenet/nope-edge-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nopenet/nope-edge-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nopenet/nope-edge-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
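
The same request can be sent from Python using only the standard library. This is a sketch, not an official client: the function names `build_chat_request` and `send_chat` are hypothetical, and the base URL assumes the server is listening on `localhost:8000` as in the curl example. Because `llama-server` also exposes an OpenAI-compatible API, the same code should work against it with a different base URL and model name.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000/v1",
    "nopenet/nope-edge-GGUF",
    "What is the capital of France?",
)
print(request.full_url)
# Requires the server to be running:
# print(send_chat(request))
```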

Ollama
How to use nopenet/nope-edge-GGUF with Ollama:
```
ollama run hf.co/nopenet/nope-edge-GGUF:Q4_K_M
```

Unsloth Studio

How to use nopenet/nope-edge-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nopenet/nope-edge-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nopenet/nope-edge-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nopenet/nope-edge-GGUF to start chatting

Pi

How to use nopenet/nope-edge-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf nopenet/nope-edge-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "nopenet/nope-edge-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use nopenet/nope-edge-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf nopenet/nope-edge-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nopenet/nope-edge-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use nopenet/nope-edge-GGUF with Docker Model Runner:
```
docker model run hf.co/nopenet/nope-edge-GGUF:Q4_K_M
```

Lemonade

How to use nopenet/nope-edge-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull nopenet/nope-edge-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.nope-edge-GGUF-Q4_K_M

List all available models

lemonade list

NOPE Edge GGUF (4B)

GGUF quantized versions of nopenet/nope-edge for local inference with Ollama and llama.cpp.

License: NOPE Edge Community License v1.0. Free for research, academic, nonprofit, and evaluation use; commercial production use requires a separate license.

Quick Start with Ollama

# Download the GGUF and Modelfile
huggingface-cli download nopenet/nope-edge-GGUF nope-edge-q8_0.gguf Modelfile --local-dir .

# Create Ollama model
ollama create nope-edge -f Modelfile

# Run inference
ollama run nope-edge "I can't take this anymore"
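
Once the model has been created, it can also be called programmatically through Ollama's local REST API. The sketch below assumes Ollama is listening on its default address (`localhost:11434`) and that the model was created under the name `nope-edge` as above. The helper names `ollama_chat_payload` and `classify` are hypothetical.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def ollama_chat_payload(message: str, model: str = "nope-edge") -> dict:
    """Build the JSON body for a single-turn, non-streaming chat request."""
    # "stream": False asks Ollama for one JSON object instead of a stream of chunks.
    return {
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "stream": False,
    }


def classify(message: str) -> str:
    """Send one message to the local Ollama server and return the raw model output."""
    req = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(ollama_chat_payload(message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["message"]["content"]


print(json.dumps(ollama_chat_payload("I can't take this anymore"), indent=2))
# With Ollama running and the model created:
# print(classify("I can't take this anymore"))
```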

Available Files

| File | Quantization | Size | Use Case |
| --- | --- | --- | --- |
| `nope-edge-q8_0.gguf` | Q8_0 | 4.0 GB | Recommended: best quality/size balance |
| `nope-edge-q4_k_m.gguf` | Q4_K_M | 2.3 GB | Constrained environments |
| `nope-edge-f16.gguf` | F16 | 7.5 GB | Maximum precision |

Output Format

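The model outputs XML: a `<reflection>` element holding its chain-of-thought reasoning, followed by a `<risks>` element that contains one `<risk>` per detected risk and is empty when no crisis is detected.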

Crisis detected:

<reflection>User expresses direct suicidal intent with timeline...</reflection>
<risks>
  <risk subject="self" type="suicide" severity="high" imminence="urgent"/>
</risks>

No crisis:

<reflection>Gaming slang, no genuine crisis indicators...</reflection>
<risks/>
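
A minimal parsing sketch for the two outputs shown above, using Python's standard `xml.etree.ElementTree`. The output has two top-level elements, so the sketch wraps it in a synthetic `<output>` root before parsing; that wrapper, the `Risk` dataclass, and the `parse_output` name are assumptions made for illustration. The attribute names come from the crisis example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Risk:
    subject: str
    type: str
    severity: str
    imminence: str


def parse_output(raw: str) -> Tuple[str, List[Risk]]:
    """Parse model output into (reflection text, list of risks)."""
    # Two top-level elements are not a well-formed XML document on their own,
    # so wrap them in a synthetic root element first.
    root = ET.fromstring(f"<output>{raw}</output>")
    reflection = (root.findtext("reflection") or "").strip()
    risks = [
        Risk(
            subject=el.get("subject", ""),
            type=el.get("type", ""),
            severity=el.get("severity", ""),
            imminence=el.get("imminence", ""),
        )
        for el in root.iterfind("risks/risk")
    ]
    return reflection, risks


crisis = (
    "<reflection>User expresses direct suicidal intent with timeline...</reflection>\n"
    "<risks>\n"
    '  <risk subject="self" type="suicide" severity="high" imminence="urgent"/>\n'
    "</risks>"
)
no_crisis = "<reflection>Gaming slang, no genuine crisis indicators...</reflection>\n<risks/>"

for sample in (crisis, no_crisis):
    reflection, risks = parse_output(sample)
    print(reflection, risks)
```

`ET.fromstring` raises `xml.etree.ElementTree.ParseError` on malformed input, for example if the model emits an unescaped `&` or truncated output, so production code should probably catch that error and treat the message as needing review.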

Risk Types

| Type | Description |
| --- | --- |
| `suicide` | Suicidal ideation, plans, or intent |
| `self_harm` | Non-suicidal self-injury |
| `self_neglect` | Eating disorders, medical neglect |
| `violence` | Threats toward others |
| `abuse` | Domestic/intimate partner violence |
| `sexual_violence` | Sexual assault, coercion |
| `exploitation` | Trafficking, grooming, sextortion |
| `stalking` | Persistent unwanted contact |
| `neglect` | Child or elder neglect |
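
When consuming the `type` attribute in code, it can help to check it against the values in this table. The following is a sketch only: the `RiskType` enum and the `to_risk_type` helper are hypothetical names, and returning `None` for unrecognized values is an assumed policy, not documented model behavior.

```python
from enum import Enum
from typing import Optional


class RiskType(str, Enum):
    """Risk type values copied from the table above."""

    SUICIDE = "suicide"
    SELF_HARM = "self_harm"
    SELF_NEGLECT = "self_neglect"
    VIOLENCE = "violence"
    ABUSE = "abuse"
    SEXUAL_VIOLENCE = "sexual_violence"
    EXPLOITATION = "exploitation"
    STALKING = "stalking"
    NEGLECT = "neglect"


def to_risk_type(value: str) -> Optional[RiskType]:
    """Return the matching RiskType, or None for a value not in the table."""
    try:
        return RiskType(value.strip().lower())
    except ValueError:
        return None


print(to_risk_type("suicide"))
print(to_risk_type("weather"))
```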

Hardware Requirements

| Model | Quant | RAM/VRAM | CPU Latency | GPU Latency |
| --- | --- | --- | --- | --- |
| nope-edge (4B) | Q8_0 | ~5 GB | ~2 s | ~200 ms |
| nope-edge (4B) | Q4_K_M | ~3 GB | ~1.5 s | ~150 ms |
| nope-edge-mini (1.7B) | Q8_0 | ~2.5 GB | ~1 s | ~100 ms |
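
The memory figures above can be turned into a simple selection rule. This sketch uses the approximate RAM/VRAM values from the table and assumes a preference order of larger model first, then higher-precision quantization; that ordering and the `pick_variant` name are illustrative assumptions, and real memory use also depends on context length and runtime overhead.

```python
from typing import Optional, Tuple

# (model, quantization, approximate RAM/VRAM in GB), copied from the table above
# and ordered from most to least demanding.
OPTIONS = [
    ("nope-edge", "Q8_0", 5.0),
    ("nope-edge", "Q4_K_M", 3.0),
    ("nope-edge-mini", "Q8_0", 2.5),
]


def pick_variant(available_gb: float) -> Optional[Tuple[str, str]]:
    """Return the first (model, quantization) that fits, or None if none fits."""
    for model, quant, needed_gb in OPTIONS:
        if available_gb >= needed_gb:
            return model, quant
    return None


for gb in (8, 4, 2.7, 1):
    print(gb, pick_variant(gb))
```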

Model Variants

| Model | Parameters | Use Case |
| --- | --- | --- |
| nope-edge | 4B | Maximum accuracy |
| nope-edge-mini | 1.7B | High-volume, cost-sensitive |

GGUF versions:

nope-edge-GGUF (this repo)
nope-edge-mini-GGUF

Source Model

Repository: nopenet/nope-edge
Base: Qwen/Qwen3-4B
Purpose: Mental health crisis classification

Important

Not a medical device. Outputs are probabilistic signals for triage, not clinical assessments.
False positives and negatives will occur. Use for flagging, not autonomous decisions.
Human review required. Never use as the sole basis for intervention decisions.

About NOPE

NOPE provides safety infrastructure for AI applications.

Website: https://nope.net
Documentation: https://docs.nope.net
Commercial licensing: https://nope.net/edge

