# Evrmind EVR-1 Maano-8b-Instruct
Llama 3.1 8B Instruct compressed using EVR-1 (Evrmind Reconstruction), a novel compression method developed independently by Evrmind. The compressed weights average approximately 3 bits per parameter; the total GGUF file (3.93 GiB) includes additional metadata and structure overhead.
In our coherence tests (5 continuation-style prompts), EVR-1 Instruct averaged 2.77% repetition (rep4) at 500 tokens and 9.66% at 1000 tokens.
3.93 GiB | Llama 3.1 8B Instruct | Runs on laptops, desktops, and Android (Termux)
HuggingFace may display an incorrect parameter count in the sidebar due to the custom compression format. EVR-1 is not a standard quantization (not Q2, Q3, Q4, etc).
## Setup
You need two things: the model files (from this HuggingFace repo) and a platform binary (from GitHub).
Step 1: Clone this repo or download the files:

```shell
# Option A: Clone everything (~4.2 GB, requires git-lfs)
git lfs install
git clone https://huggingface.co/evrmind/evr-1-maano-8b-instruct
cd evr-1-maano-8b-instruct

# Option B: Or download individual files from the "Files" tab above
```
Step 2: Download the binary for your platform from the Downloads table. Save the archive into the `evr-1-maano-8b-instruct` directory, then extract it:

```shell
# Linux + NVIDIA
mkdir -p linux-cuda && tar xzf evrmind-linux-cuda.tar.gz -C linux-cuda

# Linux + Vulkan
mkdir -p linux-vulkan && tar xzf evrmind-linux-vulkan.tar.gz -C linux-vulkan

# macOS (Apple Silicon)
mkdir -p metal && tar xzf evrmind-macos-metal.tar.gz -C metal

# Android (Termux)
mkdir -p android-vulkan && tar xzf evrmind-android-vulkan.tar.gz -C android-vulkan
```
For Windows, extract the `.zip` into a folder with the matching name (e.g., extract `evrmind-windows-cuda.zip` into a folder called `windows-cuda`).
After completing both steps, your directory should look like this:
```
evr-1-maano-8b-instruct/
├── evr-llama-3.1-8b-instruct.gguf   <-- model weights
├── start-server.sh                  <-- Linux/macOS/Android launcher
├── start-server.bat                 <-- Windows launcher
├── webui/                           <-- browser interface
└── linux-cuda/                      <-- extracted platform binary (example)
    ├── llama-server
    ├── llama-cli
    ├── llama-completion
    └── ...
```
## Web UI
**Linux, macOS, Android (Termux):**

```shell
./start-server.sh
# Open http://localhost:8080
```
**Windows:**

Double-click `start-server.bat`, or from Command Prompt:

```shell
start-server.bat
```
Then open http://localhost:8080 in your browser.
**Network access** (phone, tablet, other devices on the same WiFi):

```shell
./start-server.sh --network
```

The script will print the URL to open on other devices. The model runs on your computer; other devices just connect to the web UI. The `--network` and `--cpu` flags are only available in `start-server.sh` (Linux/macOS/Android).
See WEB_UI.md for more options and troubleshooting.
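Once the server is up, you can also talk to it over HTTP instead of through the browser. Recent llama.cpp-based servers expose an OpenAI-compatible chat endpoint; the sketch below assumes the launcher's default port 8080 and the `/v1/chat/completions` path — verify both against WEB_UI.md for your build:

```python
import json
import urllib.request

def build_chat_request(prompt, host="http://localhost:8080"):
    """Build an HTTP request for the server's chat completion endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the server running:
# reply = json.load(urllib.request.urlopen(build_chat_request("Hello!")))
# print(reply["choices"][0]["message"]["content"])
```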
## Quick Start (CLI)
These examples assume you have completed Setup and are in the repo directory.
**Linux + NVIDIA GPU:**

```shell
cd linux-cuda
LD_LIBRARY_PATH=. ./llama-cli -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99
```

**macOS (Apple Silicon):**

```shell
cd metal
./llama-cli -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99
```

**Linux + Vulkan:**

```shell
cd linux-vulkan
LD_LIBRARY_PATH=. ./llama-cli -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99
```

**Android (Termux):**

```shell
cd android-vulkan
LD_LIBRARY_PATH=. ./llama-cli -m ../evr-llama-3.1-8b-instruct.gguf -ngl 99
```

**Windows + NVIDIA (Command Prompt):**

```shell
cd windows-cuda
llama-cli.exe -m ..\evr-llama-3.1-8b-instruct.gguf -ngl 99
```

**Windows + Vulkan (Command Prompt):**

```shell
cd windows-vulkan
llama-cli.exe -m ..\evr-llama-3.1-8b-instruct.gguf -ngl 99
```
**CPU-only (no GPU):**

Use `-ngl 0` instead of `-ngl 99` on any platform. Roughly 5-10x slower, but works on any machine.
## Downloads
| Platform | Download | GPU |
|---|---|---|
| Linux + NVIDIA | evrmind-linux-cuda.tar.gz | CUDA 12 |
| Linux + Any GPU | evrmind-linux-vulkan.tar.gz | Vulkan |
| Windows + NVIDIA | evrmind-windows-cuda.zip | CUDA 12 |
| Windows + Any GPU | evrmind-windows-vulkan.zip | Vulkan |
| macOS (Apple Silicon) | evrmind-macos-metal.tar.gz | Apple Silicon |
| Android (Termux) | evrmind-android-vulkan.tar.gz | Vulkan |
The model weights (`evr-llama-3.1-8b-instruct.gguf`, ~4.2 GB) are available from the Files tab on this HuggingFace page. Platform binaries are hosted on GitHub Releases. You can verify downloads against `SHA256SUMS.txt`.
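For example, with GNU coreutils (Linux/Termux; on macOS use `shasum -a 256 -c` instead), verification looks like this, run in the directory containing `SHA256SUMS.txt` and your downloads (`--ignore-missing` requires coreutils 8.25 or newer):

```shell
# Check only the files you actually downloaded; unrelated entries are skipped.
sha256sum -c SHA256SUMS.txt --ignore-missing
```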
Note: The binaries are the same for all EVR-1 models. You only need to download them once. Just point them at whichever GGUF you want to run.
## Why EVR-1 Maano-8b-Instruct?
Standard quantizations at 3-4 GiB can produce repetition during extended generation. In our tests (5 continuation-style prompts), EVR-1 Maano-8b-Instruct maintained coherent output with an average repetition rate of 2.77% (rep4) at 500 tokens and 9.66% at 1000 tokens.
**EVR-1 Maano-8b-Instruct (3.93 GiB):**

> **User:** "What are the main causes of the French Revolution?"
>
> "The French Revolution, which lasted from 1789 to 1799, was a complex event with multiple causes. However, some of the main contributing factors include: 1. Financial Crisis: France was deeply in debt from its involvement in the American Revolutionary War... 2. Social Inequality: The French nobility held a significant amount of power... 3. Enlightenment Ideas: The ideas presented by Enlightenment thinkers such as Rousseau, Voltaire..." (continues coherently for 500+ words)
## Benchmarks

### Coherence

Average 4-gram repetition rate (rep4, lower is better) over 5 continuation-style prompts:
| Model | Size | rep4 @ 500 | rep4 @ 1000 |
|---|---|---|---|
| EVR-1 Instruct | 3.93 GiB | 2.77% | 9.66% |
### Perplexity
| Model | Size | Perplexity (wikitext-2, ctx=512) |
|---|---|---|
| EVR-1 Instruct | 3.93 GiB | 7.37 |
### Accuracy (EVR-1 base model reference numbers)
| Benchmark | EVR-1 Base (3.93 GiB) | Q3_K_M (3.83 GiB) | Q4_K_M (4.69 GiB) |
|---|---|---|---|
| ARC-Challenge (25-shot, 1172q) | 59.8% | 60.8% | 61.3% |
| Perplexity (wikitext-2, ctx=512) | 6.70 | 7.02 | 6.58 |
Coherence tested with 5 continuation-style prompts at 500 and 1000 tokens each, temperature 0, no repeat penalty. Accuracy numbers above are from the EVR-1 base model, shown here for reference. See BENCHMARK_RESULTS.md for full coherence results and sample outputs.
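For reference, a 4-gram repetition rate like rep4 can be computed as follows. This is an illustrative implementation, not necessarily the exact evaluation script used here:

```python
from collections import Counter

def rep4(tokens):
    """Fraction of 4-grams in `tokens` that repeat an earlier 4-gram."""
    grams = [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    # Every occurrence of a 4-gram beyond its first counts as a repetition.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(grams)
```

For example, `rep4(["a"] * 8)` yields 0.8: of the five 4-grams, four repeat the first.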
## Limitations
- Context window has been tested up to 2048 tokens. Longer contexts may work but have not been validated at 3-bit compression.
- Occasional minor character-level artefacts due to 3-bit compression.
- Math reasoning is limited at this compression level.
- As with all heavily quantized models, generated text may contain factual inaccuracies (e.g., incorrect numbers, dates, or scientific details). Always verify factual claims independently.
## System Requirements

- Storage: ~4 GiB for model weights + ~50 MB for binaries
- RAM: 6 GiB minimum (8 GiB recommended)
- GPU (recommended): NVIDIA (CUDA 12), Apple Silicon, or any Vulkan GPU
- CPU-only: Supported but slower (use the `-ngl 0` or `--cpu` flag)
- OS: Linux, macOS (Apple Silicon), Windows, Android (Termux)
- Not supported: iOS, 32-bit systems
## Safety and Responsible Use
This model can generate incorrect, biased, or harmful content. Users should apply appropriate content filtering for user-facing applications. See MODEL_CARD.md for details.
## Derivative Works
If you create derivative works, credit "EVR-1 Maano" in your model name and documentation. Commercial use is permitted subject to the Llama 3.1 Community License Agreement.
## License
This model is dual-licensed:
- Evrmind Free License 1.0: Covers the EVR-1 compression and distribution. Permits personal, research, and commercial use with attribution.
- Llama 3.1 Community License: Covers the underlying Llama 3.1 weights. Permits commercial use for entities with fewer than 700 million monthly active users.
Both licenses apply. See LICENSE.md and META_LLAMA_LICENSE.md for full terms.
## Also Available
- EVR-1 Maano-8b, base model for text completion
- EVR-1 Bafethu-8b-Reasoning, reasoning model (DeepSeek R1)
## Contact
- Email: hello@evrmind.io
- Issues: GitHub