Instructions to use kistepAI/SPARK-RAG-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use kistepAI/SPARK-RAG-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="kistepAI/SPARK-RAG-GGUF",
	filename="kistep-gemma-2-27b-rag-bf16_part1.gguf",
)

llm.create_chat_completion(
	messages = "{\n    \"question\": \"What is my name?\",\n    \"context\": \"My name is Clara and I live in Berkeley.\"\n}"
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use kistepAI/SPARK-RAG-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf kistepAI/SPARK-RAG-GGUF:BF16_PART
# Run inference directly in the terminal:
llama cli -hf kistepAI/SPARK-RAG-GGUF:BF16_PART

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf kistepAI/SPARK-RAG-GGUF:BF16_PART
# Run inference directly in the terminal:
llama cli -hf kistepAI/SPARK-RAG-GGUF:BF16_PART

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf kistepAI/SPARK-RAG-GGUF:BF16_PART
# Run inference directly in the terminal:
./llama-cli -hf kistepAI/SPARK-RAG-GGUF:BF16_PART

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf kistepAI/SPARK-RAG-GGUF:BF16_PART
# Run inference directly in the terminal:
./build/bin/llama-cli -hf kistepAI/SPARK-RAG-GGUF:BF16_PART

Use Docker

docker model run hf.co/kistepAI/SPARK-RAG-GGUF:BF16_PART

LM Studio
Jan
Ollama
How to use kistepAI/SPARK-RAG-GGUF with Ollama:
```
ollama run hf.co/kistepAI/SPARK-RAG-GGUF:BF16_PART
```

Unsloth Studio

How to use kistepAI/SPARK-RAG-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kistepAI/SPARK-RAG-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kistepAI/SPARK-RAG-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for kistepAI/SPARK-RAG-GGUF to start chatting

Atomic Chat new
Docker Model Runner
How to use kistepAI/SPARK-RAG-GGUF with Docker Model Runner:
```
docker model run hf.co/kistepAI/SPARK-RAG-GGUF:BF16_PART
```

Lemonade

How to use kistepAI/SPARK-RAG-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull kistepAI/SPARK-RAG-GGUF:BF16_PART

Run and chat with the model

lemonade run user.SPARK-RAG-GGUF-BF16_PART

List all available models

lemonade list

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Usage Guide

개인은 자유롭게 사용할 수 있습니다.
기업 및 기관은 비상업적 목적으로 이용해 주시기 바랍니다.
또한, 추후 협업 및 네트워크 구축을 위해 기관 정보와 AI 모델 사용 담당자 정보를 메일로 보내주시면 연락드리겠습니다.

CONTACT : kistep_ax@kistep.re.kr

Individuals are free to use this without restrictions.
For companies and institutions, please use it for non-commercial purposes.
Additionally, to facilitate future collaboration and network building, please send us an email with your institution's information and the contact details of the person responsible for using the AI model. We will get in touch with you.

1. Description

SPARK-RAG is a large language model developed by the Korea Institute of S&T Evaluation and Planning (KISTEP). This model is optimized for RAG (Retrieval-Augmented Generation) tasks and incorporates Chain of Thought (CoT) reasoning to enhance its response accuracy and performance.

2. Key Features

Enhanced Reliability through RAG: Provides highly reliable responses by leveraging the organization's internal databases through Retrieval-Augmented Generation (RAG).
Transparent Reasoning: Trained to demonstrate its reasoning process through Chain of Thought (CoT), clearly showing the information sources and logic behind each response.
Structured Output: Responses in well-formatted markdown, including tables, text, and summaries for improved readability and clarity.
Base Model: Built on Gemma-2b-27b-it as the foundation model
Training Method: Trained with Supervised Fine-Tuning (SFT), using LoRA
Context Length : The maximum context length for training data is 8,192.

3. Data

source	KISTEP Dcoments	AI Hub (S&T)	Huggingface Kopen-HQ-Hermes-2.5-60K
count	29,152	1,516	30,000

Kopen-HQ-Hermes-2.5-60K (https://huggingface.co/datasets/MarkrAI/KOpen-HQ-Hermes-2.5-60K)
The training data generated from KISTEP documents consists of (Q, CONTEXT, A) format, with Chain of Thought (CoT) reasoning included in the answers (confidential).

4. Usage

Please combine files into a single file using the command below before use. (When using ollama, you can utilize the Modelfile.)

cat kistep-gemma-2-27b-rag-bf16_part1.gguf kistep-gemma-2-27b-rag-bf16_part2.gguf > kistep-gemma-2-27b-rag-bf16.gguf

Recommended Prompt Template (input: {DOCUMENT}, {QUESTION})

prompt_template: |
  당신의 임무는 청크를 분석하고 오직 청크에 제공된 정보만을 사용하여 질문에 답하는 것입니다.

  ## 청크
  <chunks>
  {DOCUMENT}
  </chunks>

  ## 지침
  1. 질문을 분석하여 청크에서 어떤 정보를 찾아야 하는지 파악하세요. 질문의 의도가 명확하지 않다면 되묻는 것도 가능합니다.
  2. 답변 전, <reason> 태그 안에 추론 과정을 설명하세요. 어떤 정보를 참조했는지, 답변에 이르게 된 관련 정보를 포함하세요. 추론은 개조식으로 작성하세요.
  3. 제공된 청크만으로 답변할 수 있는 부분은 상세히 답변하고 답변할 수 없는 부분은 "제공된 문서를 바탕으로 답변할 수 없습니다."라고 명시하세요.

  ## 위 지침을 바탕으로 다음 질문에 답해주세요.
  {QUESTION}

5. Benchmark

LogicKor:

Metric	Score
Reasoning	8.08
Math	9.00
Writing	9.57
Coding	8.29
Comprehension	8.5
Grammar	8.36
Single-turn	8.55
Multi-turn	8.71
Overall	8.63

Downloads last month: -

GGUF

Model size

27B params

Architecture

gemma2

Hardware compatibility

16-bit

Model tree for kistepAI/SPARK-RAG-GGUF

Base model

google/gemma-2-27b

Finetuned

google/gemma-2-27b-it

Quantized

(61)

this model