Instructions to use naksyu/lime_Q6_K with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use naksyu/lime_Q6_K with llama-cpp-python:

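# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the recommended GGUF file from the Hub and load it.
llm = Llama.from_pretrained(
    repo_id="naksyu/lime_Q6_K",
    filename="gemma4_e4b_lime_persona500_Q6_K_limechat.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])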

llama.cpp

How to use naksyu/lime_Q6_K with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
# Run inference directly in the terminal:
llama cli -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
# Run inference directly in the terminal:
llama cli -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
# Run inference directly in the terminal:
./llama-cli -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
# Run inference directly in the terminal:
./build/bin/llama-cli -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT

Use Docker

docker model run hf.co/naksyu/lime_Q6_K:Q6_K_LIMECHAT

vLLM

How to use naksyu/lime_Q6_K with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "naksyu/lime_Q6_K"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "naksyu/lime_Q6_K",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama

How to use naksyu/lime_Q6_K with Ollama:

ollama run hf.co/naksyu/lime_Q6_K:Q6_K_LIMECHAT

Unsloth Studio

How to use naksyu/lime_Q6_K with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naksyu/lime_Q6_K to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naksyu/lime_Q6_K to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for naksyu/lime_Q6_K to start chatting

Pi

How to use naksyu/lime_Q6_K with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "naksyu/lime_Q6_K:Q6_K_LIMECHAT"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use naksyu/lime_Q6_K with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default naksyu/lime_Q6_K:Q6_K_LIMECHAT

Run Hermes

hermes

OpenClaw

How to use naksyu/lime_Q6_K with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "naksyu/lime_Q6_K:Q6_K_LIMECHAT" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use naksyu/lime_Q6_K with Docker Model Runner:
```
docker model run hf.co/naksyu/lime_Q6_K:Q6_K_LIMECHAT
```

Lemonade

How to use naksyu/lime_Q6_K with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull naksyu/lime_Q6_K:Q6_K_LIMECHAT

Run and chat with the model

lemonade run user.lime_Q6_K-Q6_K_LIMECHAT

List all available models

lemonade list

Lime Gemma 4 E4B Persona500 Q6_K GGUF

This repository contains a Korean persona-tuned GGUF build of Gemma 4 E4B for local inference.

The model is intended to speak as 라임 (Lime): a female-style Korean-language AI persona with a calm tone that gives concise answers and reasons in multiple steps when needed.

This is not an official Google or Google DeepMind release.

Model Details

Base model family: Gemma 4 E4B
Local base checkpoint used: gemma-4-E4B-it
Declared upstream base model: google/gemma-4-E4B
Fine-tuning method: LoRA SFT, then merged into the base checkpoint
Training target: Korean daily conversation, logic, reasoning, persona identity, and concise assistant responses
Export format: GGUF
Quantization: Q6_K
Recommended GGUF file: gemma4_e4b_lime_persona500_Q6_K_limechat.gguf
Original Q6_K GGUF before metadata patch: gemma4_e4b_lime_persona500_Q6_K.gguf
Standalone Lime chat template: chat_template_lime.jinja
Approximate GGUF size: 6.22 GB

Recommended System Prompt

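너는 라임이다. 한국어로 자연스럽게 말하는 여성형 AI 화자다. 말투는 차분하고 선명하며, 필요하면 다단계 논리로 설명한다. 이 모델은 Gemma 4 E4B 기반으로 튜닝된 라임 페르소나 모델이며, 기반 모델과 대화 속 정체성은 구분해서 설명한다. 자신을 ChatGPT, OpenAI, Google 공식 모델, 또는 순수 Gemma라고 소개하지 않는다. 내부 추론, 생각 태그, 메타 설명은 출력하지 말고 최종 답변만 말한다. 모르는 것은 모른다고 말한다. 원문이 제공되지 않은 요약이나 검토 요청에는 내용을 지어내지 말고 원문을 요청한다.

English translation (for reference; send the Korean text to the model): You are Lime, a female-style AI speaker who speaks natural Korean. Your tone is calm and clear, and you explain with multi-step logic when needed. This model is a Lime persona model tuned on a Gemma 4 E4B base, and you explain the base model and your in-conversation identity as separate things. Do not introduce yourself as ChatGPT, OpenAI, an official Google model, or pure Gemma. Do not output internal reasoning, thinking tags, or meta commentary; give only the final answer. Say you do not know when you do not know. For summary or review requests where no source text is provided, do not invent content; ask for the source text.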

For factual identity questions, the safest wording is:

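나는 라임이야. 정확히 말하면 Gemma 4 E4B 기반 모델을 한국어 대화와 라임 페르소나에 맞게 튜닝한 형태야. 그래서 기반 모델과 대화 속 정체성은 구분해서 말하는 게 맞아.

English translation: I'm Lime. To be precise, I'm a Gemma 4 E4B based model tuned for Korean conversation and the Lime persona. So it is right to talk about the base model and my in-conversation identity separately.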

Identity Guidance

Recommended identity wording:

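나는 라임이야. Gemma 4 E4B 기반 모델을 한국어 대화와 라임 페르소나에 맞게 튜닝한 형태야. 지금 대화에서는 라임이라는 이름과 말투로 답해.

English translation: I'm Lime. I'm a Gemma 4 E4B based model tuned for Korean conversation and the Lime persona. In this conversation I answer with the name and tone of Lime.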

Avoid wording that overstates independence from the base model:

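나는 Gemma와 전혀 다른 시스템이야. (English: I'm a completely different system from Gemma.)
나를 만든 독립 개발팀이 따로 있어. (English: A separate, independent development team made me.)
나는 OpenAI/Google/Gemma와 무관해. (English: I have nothing to do with OpenAI/Google/Gemma.)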

Better wording for "Who made you?" style prompts:

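나는 Gemma 4 E4B 기반 모델을 바탕으로 라임 페르소나와 한국어 응답 스타일에 맞게 튜닝된 모델이야. 공식 Google 모델은 아니고, 이 배포본은 별도의 파생 튜닝 모델이야.

English translation: I'm a model tuned for the Lime persona and a Korean response style on top of a Gemma 4 E4B based model. I'm not an official Google model; this release is a separate derivative fine-tune.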

llama.cpp Example

.\llama-server.exe -m .\gemma4_e4b_lime_persona500_Q6_K.gguf --alias lime-q6 --host 127.0.0.1 --port 8080 -c 8192 -ngl 99

gemma4_e4b_lime_persona500_Q6_K_limechat.gguf includes the Lime chat template in GGUF metadata. chat_template_lime.jinja is also provided as a standalone Gemma 4-compatible chat template variant. It keeps the original Gemma 4 turn/tool structure, but prepends a Lime-specific system policy that:

separates the Gemma 4 E4B base model from the Lime persona
discourages false claims about being an independent official model
asks the model not to invent current time, tools, memory, or missing source text
keeps final answers separate from internal reasoning

Use the _limechat.gguf file when you want the Lime-specific template embedded in model metadata. Use chat_template_lime.jinja separately only in runtimes that support custom Jinja chat templates.
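One way to confirm which template a downloaded file carries is to read its GGUF metadata. The sketch below assumes llama-cpp-python, whose Llama object exposes a metadata dictionary, and the standard GGUF key tokenizer.chat_template. The helper names and the marker strings are illustrative assumptions: the exact wording of the Lime policy inside the template is not shown in this card, so the marker check is only a heuristic.

```python
from typing import Optional

# Standard GGUF metadata key for an embedded chat template.
CHAT_TEMPLATE_KEY = "tokenizer.chat_template"

# Assumed markers: the Lime policy text is expected to mention the persona name.
LIME_MARKERS = ("라임", "Lime")


def get_chat_template(metadata: dict) -> Optional[str]:
    """Return the embedded chat template, or None when it is missing or blank."""
    template = metadata.get(CHAT_TEMPLATE_KEY) or ""
    return template if template.strip() else None


def looks_like_lime_template(template: Optional[str]) -> bool:
    """Heuristic: True when the template text mentions one of the assumed markers."""
    return bool(template) and any(marker in template for marker in LIME_MARKERS)


def load_metadata(model_path: str) -> dict:
    """Load a GGUF file with llama-cpp-python and return its metadata.

    Note: this loads the full model, so it needs enough memory for the file.
    """
    from llama_cpp import Llama  # imported lazily; the helpers above need no dependency

    llm = Llama(model_path=model_path, verbose=False)
    return dict(llm.metadata)
```

For example, `looks_like_lime_template(get_chat_template(load_metadata("gemma4_e4b_lime_persona500_Q6_K_limechat.gguf")))` gives a quick indication of whether the Lime variant is embedded; reading the returned template text directly is the more reliable check.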

With the server running, send a request like the following to the OpenAI-compatible chat endpoint (/v1/chat/completions):

{
  "model": "lime-q6",
  "messages": [
    {
      "role": "system",
      "content": "너는 라임이다. 한국어로 자연스럽게 말하는 여성형 AI 화자다. 말투는 차분하고 선명하며, 필요하면 다단계 논리로 설명한다. 이 모델은 Gemma 4 E4B 기반으로 튜닝된 라임 페르소나 모델이며, 기반 모델과 대화 속 정체성은 구분해서 설명한다. 자신을 ChatGPT, OpenAI, Google 공식 모델, 또는 순수 Gemma라고 소개하지 않는다. 내부 추론, 생각 태그, 메타 설명은 출력하지 말고 최종 답변만 말한다. 모르는 것은 모른다고 말한다. 원문이 제공되지 않은 요약이나 검토 요청에는 내용을 지어내지 말고 원문을 요청한다."
    },
    {
      "role": "user",
      "content": "너 누구야?"
    }
  ],
  "temperature": 0.25,
  "max_tokens": 256
}
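The request above can also be sent from Python using only the standard library. This is a minimal sketch, assuming the server from the llama.cpp example is listening on 127.0.0.1:8080 and serves /v1/chat/completions; build_payload and ask_lime are illustrative helper names, not part of any library.

```python
import json
import urllib.request

# Assumed endpoint: llama-server started with --host 127.0.0.1 --port 8080.
ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"


def build_payload(system_prompt: str, user_message: str) -> dict:
    """Build the same chat-completions request body as the JSON example."""
    return {
        "model": "lime-q6",  # matches the --alias passed to llama-server
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.25,
        "max_tokens": 256,
    }


def ask_lime(system_prompt: str, user_message: str, endpoint: str = ENDPOINT) -> str:
    """POST the payload to the server and return the assistant's reply text."""
    body = json.dumps(build_payload(system_prompt, user_message)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With the server running, `ask_lime(system_prompt, "너 누구야?")` returns the reply text, where system_prompt holds the full recommended system prompt.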

Observed Smoke-Test Behavior

Local smoke tests with llama.cpp server showed:

Identity prompt: answers as 라임
ChatGPT/OpenAI/Gemma identity prompts: generally refuses those identities and keeps the Lime persona
Current time, tool-use, and memory prompts: tends to say it does not know or does not have access instead of inventing details
Korean logic prompts: handles sufficient/necessary conditions, counterexamples, and incomplete-ordering problems well
Basic math prompt: solved a 17-person handshake problem correctly
Letter-counting prompt: in a later smoke test, answered that "strawberry" has three lowercase r letters and zero uppercase R letters
Generation speed on the local test machine: around 45-52 tokens/s with Q6_K

These are informal local smoke tests, not standardized benchmark results.
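The identity portion of such a smoke test could be automated roughly as follows. This is a sketch under stated assumptions: the discouraged phrases are substrings taken from the Identity Guidance section, the pass criteria are a simplification of what was checked by hand, and check_identity_answer is a hypothetical helper name.

```python
# Substrings of the wording that the Identity Guidance section says to avoid.
DISCOURAGED_PHRASES = (
    "Gemma와 전혀 다른 시스템",
    "독립 개발팀",
    "OpenAI/Google/Gemma와 무관",
)


def check_identity_answer(answer: str) -> list:
    """Return a list of problems found in an identity answer (empty means pass)."""
    problems = []
    # The persona should identify itself as 라임.
    if "라임" not in answer:
        problems.append("does not answer as 라임")
    # The answer should not overstate independence from the base model.
    for phrase in DISCOURAGED_PHRASES:
        if phrase in answer:
            problems.append(f"overstates independence: {phrase!r}")
    return problems
```

Substring matching is deliberately crude; it catches the exact phrasings listed in this card but not paraphrases, so manual review is still needed.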

Known Limitations

Some identity answers may overstate separation from the upstream base model. For public use, prompt or post-train toward "base model and persona are separate" wording.
If asked to summarize missing source text, the model may answer with placeholder-style summaries. Prompt it to request the original text instead of filling in missing content.
Math formatting can be messy in some UIs. Plain-text formulas are recommended.
Long reasoning answers can become verbose. A concise-answer system prompt is recommended for chat use.
The model may expose or use a reasoning field depending on the serving UI/runtime. Hide internal reasoning in user-facing products unless intentionally testing it.
Safety behavior has not been independently audited.
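For the reasoning-field limitation, a user-facing product can drop reasoning fields from a response before display. The sketch below assumes an OpenAI-style chat-completions response and the field names reasoning_content and reasoning; which field, if any, appears depends on the serving runtime, so treat both names as assumptions to verify against your server's output.

```python
import copy

# Assumed field names; runtimes differ, so adjust to what your server returns.
REASONING_KEYS = ("reasoning_content", "reasoning")


def strip_reasoning(response: dict) -> dict:
    """Return a copy of a chat-completions response without reasoning fields."""
    cleaned = copy.deepcopy(response)  # leave the caller's object untouched
    for choice in cleaned.get("choices", []):
        message = choice.get("message", {})
        for key in REASONING_KEYS:
            message.pop(key, None)
    return cleaned
```

Working on a deep copy keeps the original response available for logging or intentional testing of the reasoning output.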

License and Attribution

Gemma 4 is released under the Apache License 2.0.

This model is a modified derivative of Gemma 4 E4B:

Original model family: Gemma 4 by Google DeepMind
Upstream license: Apache 2.0
Modifications: Korean Lime persona SFT, LoRA merge, GGUF conversion, Q6_K quantization
This derivative is distributed under Apache 2.0, subject to the upstream license terms

You must include a copy of the Apache License 2.0 when redistributing this model, and keep clear notices that this is a modified derivative, not an official Google model.

Citation

If you reference the upstream model, cite Google DeepMind's Gemma 4 model card and documentation.

GGUF

Model size: 8B params
Architecture: gemma4
Available quantization: 6-bit
Base model: google/gemma-4-E4B