Instructions to use P0x0/Epos-8b-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use P0x0/Epos-8b-GGUF with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("P0x0/Epos-8b-GGUF", dtype="auto")

llama-cpp-python

How to use P0x0/Epos-8b-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="P0x0/Epos-8b-GGUF",
	filename="epos-8b-q5_k_m.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use P0x0/Epos-8b-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf P0x0/Epos-8b-GGUF:Q5_K_M
# Run inference directly in the terminal:
llama-cli -hf P0x0/Epos-8b-GGUF:Q5_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf P0x0/Epos-8b-GGUF:Q5_K_M
# Run inference directly in the terminal:
llama-cli -hf P0x0/Epos-8b-GGUF:Q5_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf P0x0/Epos-8b-GGUF:Q5_K_M
# Run inference directly in the terminal:
./llama-cli -hf P0x0/Epos-8b-GGUF:Q5_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf P0x0/Epos-8b-GGUF:Q5_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf P0x0/Epos-8b-GGUF:Q5_K_M

Use Docker

docker model run hf.co/P0x0/Epos-8b-GGUF:Q5_K_M

LM Studio
Jan
Ollama
How to use P0x0/Epos-8b-GGUF with Ollama:
```
ollama run hf.co/P0x0/Epos-8b-GGUF:Q5_K_M
```

Unsloth Studio new

How to use P0x0/Epos-8b-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for P0x0/Epos-8b-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for P0x0/Epos-8b-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for P0x0/Epos-8b-GGUF to start chatting

Docker Model Runner
How to use P0x0/Epos-8b-GGUF with Docker Model Runner:
```
docker model run hf.co/P0x0/Epos-8b-GGUF:Q5_K_M
```

Lemonade

How to use P0x0/Epos-8b-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull P0x0/Epos-8b-GGUF:Q5_K_M

Run and chat with the model

lemonade run user.Epos-8b-GGUF-Q5_K_M

List all available models

lemonade list

Epos-8B

Epos-8B is a fine-tuned version of the base model Llama-3.1-8B from Meta, optimized for storytelling, dialogue generation, and creative writing. The model specializes in generating rich narratives, immersive prose, and dynamic character interactions, making it ideal for creative tasks.

Model Details

Model Description

Epos-8B is an 8 billion parameter language model fine-tuned for storytelling and narrative tasks. Inspired by the grandeur of epic tales, it is designed to produce high-quality, engaging content that evokes the depth and imagination of ancient myths and modern storytelling traditions.

Developed by: P0x0
Funded by: P0x0
Shared by: P0x0
Model type: Transformer-based Language Model
Language(s) (NLP): Primarily English
License: Apache 2.0
Finetuned from model: meta-llama/Llama-3.1-8B

Model Sources

Repository: Epos-8B on Hugging Face

Uses

Direct Use

Epos-8B is ideal for:

Storytelling: Generate detailed, immersive, and engaging narratives.
Dialogue Creation: Create realistic and dynamic character interactions for stories or games.

How to Get Started with the Model

To run the quantized version of the model, you can use KoboldCPP, which allows you to run quantized GGUF models locally.

Steps:

Download KoboldCPP.
Follow the setup instructions provided in the repository.
Download the GGUF variant of Epos-8B from Epos-8B-GGUF.
Load the model in KoboldCPP and start generating!

Downloads last month: 3

GGUF

Model size

8B params

Architecture

llama

Hardware compatibility

5-bit

6-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for P0x0/Epos-8b-GGUF

Base model

meta-llama/Llama-3.1-8B

Quantized

(322)

this model