How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf krory/GenBook-Deepseek-R1.Llama-8B-GGUF:
# Run inference directly in the terminal:
llama-cli -hf krory/GenBook-Deepseek-R1.Llama-8B-GGUF:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf krory/GenBook-Deepseek-R1.Llama-8B-GGUF:
# Run inference directly in the terminal:
llama-cli -hf krory/GenBook-Deepseek-R1.Llama-8B-GGUF:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf krory/GenBook-Deepseek-R1.Llama-8B-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf krory/GenBook-Deepseek-R1.Llama-8B-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf krory/GenBook-Deepseek-R1.Llama-8B-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf krory/GenBook-Deepseek-R1.Llama-8B-GGUF:
Use Docker
docker model run hf.co/krory/GenBook-Deepseek-R1.Llama-8B-GGUF:
Quick Links

image/png

About the Model

This model is designed to be a storytelling AI capable of creating fun, engaging, and well-structured narratives. Its purpose is to serve as an interactive tool for generating and experiencing unique stories in real time, tailored to the user's input and preferences.

Key Features

  • Interactive Narratives: Produces coherent and entertaining stories based on user prompts, adapting dynamically to maintain engagement.
  • Consistent World-Building: Ensures logical progression and consistency in characters, settings, and events across long narratives.
  • Optimized for Efficiency: Built to perform reliably on limited hardware while delivering high-quality outputs.

Training Overview

The model was fine-tuned using datasets focused on narrative construction, character development, and immersive descriptions. Key aspects of the training include:

  • Adaptability: Special attention was given to creating a system that responds flexibly to varied user inputs while maintaining coherence.
  • Resource Efficiency: Techniques like LoRA (Low-Rank Adaptation) and 4-bit quantization were employed to optimize memory usage without compromising output quality.
  • Long-Context Support: Enhanced with methods to handle extended interactions and complex storylines.

Purpose

The primary goal of this model is to create a personal, customizable storytelling AI, allowing users to immerse themselves in unique, AI-driven stories anytime.


Downloads last month
42
GGUF
Model size
8B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support