## How to use with llama.cpp

### Install with Homebrew

```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Aryanne/ereb-test:Q4_0

# Run inference directly in the terminal:
llama-cli -hf Aryanne/ereb-test:Q4_0
```
### Install with WinGet (Windows)

```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Aryanne/ereb-test:Q4_0

# Run inference directly in the terminal:
llama-cli -hf Aryanne/ereb-test:Q4_0
```
### Use a pre-built binary

```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Aryanne/ereb-test:Q4_0

# Run inference directly in the terminal:
./llama-cli -hf Aryanne/ereb-test:Q4_0
```
### Build from source

```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Aryanne/ereb-test:Q4_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Aryanne/ereb-test:Q4_0
```
### Use Docker

```shell
# Requires the Docker Model Runner plugin:
docker model run hf.co/Aryanne/ereb-test:Q4_0
```
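Whichever method you use, a running `llama-server` exposes an OpenAI-compatible API. A minimal sketch of querying it with `curl`, assuming the default host and port (`http://localhost:8080`) and that the server from one of the commands above is already running:

```shell
# Send a chat request to the OpenAI-compatible chat completions endpoint.
# llama-server ignores the "model" field when a single model is loaded.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Write a one-line story."}],
        "max_tokens": 64
      }'
```

The response follows the usual chat-completions JSON shape, with the generated text under `choices[0].message.content`.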

Another trial of merging models with different sizes. Still under testing; it should be more stable, but I have no idea whether it's improving or degrading the base model.

Recipe:

```yaml
merge_method: task_anysize
base_model: princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT
models:
  - model: KoboldAI/Mistral-7B-Erebus-v3
    parameters:
      weight: 0.5
dtype: bfloat16
```
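A merge like this is normally run through mergekit. A hypothetical sketch, assuming a mergekit build that ships the `task_anysize` merge method (it is not in every release) and enough disk space to download both source models; the file and output paths are illustrative:

```shell
# Install mergekit and write the recipe above to a config file.
pip install mergekit

cat > ereb-test.yml <<'EOF'
merge_method: task_anysize
base_model: princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT
models:
  - model: KoboldAI/Mistral-7B-Erebus-v3
    parameters:
      weight: 0.5
dtype: bfloat16
EOF

# Run the merge; source models are pulled from the Hugging Face Hub.
mergekit-yaml ereb-test.yml ./ereb-test-out
```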

## Open LLM Leaderboard Evaluation Results

Detailed results can be found on the Open LLM Leaderboard.

| Metric | Value |
|---|---|
| Avg. | 41.85 |
| AI2 Reasoning Challenge (25-Shot) | 40.70 |
| HellaSwag (10-Shot) | 71.04 |
| MMLU (5-Shot) | 28.06 |
| TruthfulQA (0-shot) | 47.40 |
| Winogrande (5-shot) | 63.93 |
| GSM8k (5-shot) | 0.00 |
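The reported average is consistent with the six per-task scores: their sum is 251.13, so the mean is 41.855, in line with the listed 41.85. A quick arithmetic check with plain `awk` (nothing model-specific):

```shell
# Mean of the six benchmark scores from the table above.
awk 'BEGIN { printf "%.3f\n", (40.70 + 71.04 + 28.06 + 47.40 + 63.93 + 0.00) / 6 }'
```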
## Model details

- Downloads last month: 246
- Format: Safetensors
- Model size: 3B params
- Tensor type: BF16