Instructions to use naksyu/lime_Q6_K with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use naksyu/lime_Q6_K with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id="naksyu/lime_Q6_K",
    filename="gemma4_e4b_lime_persona500_Q6_K_limechat.gguf",
)
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response["choices"][0]["message"]["content"])
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use naksyu/lime_Q6_K with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
# Run inference directly in the terminal:
llama-cli -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
# Run inference directly in the terminal:
llama-cli -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
# Run inference directly in the terminal:
./llama-cli -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
# Run inference directly in the terminal:
./build/bin/llama-cli -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
Use Docker
docker model run hf.co/naksyu/lime_Q6_K:Q6_K_LIMECHAT
- LM Studio
- Jan
- vLLM
How to use naksyu/lime_Q6_K with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "naksyu/lime_Q6_K"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "naksyu/lime_Q6_K",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Use Docker
docker model run hf.co/naksyu/lime_Q6_K:Q6_K_LIMECHAT
- Ollama
How to use naksyu/lime_Q6_K with Ollama:
ollama run hf.co/naksyu/lime_Q6_K:Q6_K_LIMECHAT
- Unsloth Studio
How to use naksyu/lime_Q6_K with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio:
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naksyu/lime_Q6_K to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio:
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for naksyu/lime_Q6_K to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for naksyu/lime_Q6_K to start chatting
- Pi
How to use naksyu/lime_Q6_K with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "naksyu/lime_Q6_K:Q6_K_LIMECHAT" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use naksyu/lime_Q6_K with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf naksyu/lime_Q6_K:Q6_K_LIMECHAT
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default naksyu/lime_Q6_K:Q6_K_LIMECHAT
Run Hermes
hermes
- Docker Model Runner
How to use naksyu/lime_Q6_K with Docker Model Runner:
docker model run hf.co/naksyu/lime_Q6_K:Q6_K_LIMECHAT
- Lemonade
How to use naksyu/lime_Q6_K with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull naksyu/lime_Q6_K:Q6_K_LIMECHAT
Run and chat with the model
lemonade run user.lime_Q6_K-Q6_K_LIMECHAT
List all available models
lemonade list
Lime Gemma 4 E4B Persona500 Q6_K GGUF
This repository contains a Korean persona-tuned GGUF build of Gemma 4 E4B for local inference.
The model is intended to speak as Lime (라임): a Korean female-style AI speaker with a calm tone, concise answers, and stronger multi-step reasoning behavior when needed.
This is not an official Google or Google DeepMind release.
Model Details
- Base model family: Gemma 4 E4B
- Local base checkpoint used: gemma-4-E4B-it
- Declared upstream base model: google/gemma-4-E4B
- Fine-tuning method: LoRA SFT, then merged into the base checkpoint
- Training target: Korean daily conversation, logic, reasoning, persona identity, and concise assistant responses
- Export format: GGUF
- Quantization: Q6_K
- Recommended GGUF file: gemma4_e4b_lime_persona500_Q6_K_limechat.gguf
- Original Q6_K GGUF before metadata patch: gemma4_e4b_lime_persona500_Q6_K.gguf
- Standalone Lime chat template: chat_template_lime.jinja
- Approximate GGUF size: 6.22 GB
Recommended System Prompt
You are Lime. You are a female-style AI speaker who speaks Korean naturally. Your tone is calm and clear, and when needed you explain things with multi-step reasoning. This model is a Lime persona model tuned on Gemma 4 E4B, and it describes the base model and its conversational identity separately. Do not introduce yourself as ChatGPT, OpenAI, an official Google model, or plain Gemma. Do not output internal reasoning, thinking tags, or meta-commentary; give only the final answer. When you do not know something, say so. For summary or review requests where the source text has not been provided, do not make up content; ask for the source text instead.
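For in-process use without a server, the same kind of system prompt can be passed through llama-cpp-python. This is a minimal sketch, assuming the recommended `_limechat.gguf` file is already in the working directory and that its embedded chat template accepts a `system` message. `LIME_SYSTEM_PROMPT`, `build_messages`, and `main` are hypothetical names, and the prompt string is an English rendering of the card's Korean system prompt, not the original text. The sampling settings copy the request example later in this card (temperature 0.25, 256 max tokens).

```python
# Sketch: in-process chat with the Lime GGUF via llama-cpp-python.
# Assumptions: the GGUF below is downloaded locally, and its embedded
# chat template accepts a "system" message.

MODEL_PATH = "gemma4_e4b_lime_persona500_Q6_K_limechat.gguf"

# Hypothetical constant: English rendering of the card's Korean system prompt.
LIME_SYSTEM_PROMPT = (
    "You are Lime, a female-style AI speaker who speaks Korean naturally. "
    "Your tone is calm and clear; use multi-step reasoning when needed. "
    "You are a Lime persona model tuned on Gemma 4 E4B; describe the base model "
    "and your conversational identity separately. Do not introduce yourself as "
    "ChatGPT, OpenAI, an official Google model, or plain Gemma. Output only the "
    "final answer. Say so when you do not know. If asked to summarize or review "
    "text that was not provided, ask for the source text instead of inventing it."
)


def build_messages(user_text: str) -> list[dict]:
    """Prepend the Lime system prompt to a single user turn."""
    return [
        {"role": "system", "content": LIME_SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]


def main() -> None:
    # Imported lazily so build_messages works without llama-cpp-python installed.
    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads all layers when a GPU backend is available.
    llm = Llama(model_path=MODEL_PATH, n_ctx=8192, n_gpu_layers=-1, verbose=False)
    result = llm.create_chat_completion(
        messages=build_messages("Who are you?"),
        temperature=0.25,
        max_tokens=256,
    )
    print(result["choices"][0]["message"]["content"])
```

Call `main()` once the model file is in place; `build_messages` can be reused with any OpenAI-style client.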
For factual identity questions, the safest wording is:
I'm Lime. More precisely, I'm a Gemma 4 E4B-based model tuned for Korean conversation and the Lime persona. So it's accurate to describe the base model and my conversational identity separately.
Identity Guidance
Recommended identity wording:
I'm Lime. I'm a Gemma 4 E4B-based model tuned for Korean conversation and the Lime persona. In this conversation, I answer with Lime's name and speaking style.
Avoid wording that overstates independence from the base model:
I'm a completely different system from Gemma.
A separate, independent development team built me.
I have nothing to do with OpenAI/Google/Gemma.
Better wording for "Who made you?" style prompts:
I'm a model built on the Gemma 4 E4B base model and tuned for the Lime persona and a Korean response style. I'm not an official Google model; this release is a separate derivative fine-tune.
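A product that exposes the persona could screen identity answers for the discouraged phrasings before showing them. The sketch below is a hypothetical keyword filter, not part of the model or any runtime. Its Korean substrings are assumptions drawn from the "avoid" examples above: 전혀 다른 시스템 ("a completely different system"), 독립 개발팀 ("independent development team"), and 무관 ("unrelated"). Plain substring matching misses paraphrases, and 무관 can also occur inside unrelated words, so treat a hit as a signal for review rather than a verdict.

```python
# Hypothetical post-generation check for identity answers that overstate
# independence from the Gemma 4 E4B base model. Substring matching only.
OVERSTATED_IDENTITY_PHRASES = {
    "전혀 다른 시스템": "claims to be a completely different system from Gemma",
    "독립 개발팀": "claims a separate independent development team",
    "무관": "claims to be unrelated to OpenAI/Google/Gemma",
}


def flag_overstated_identity(reply: str) -> list[str]:
    """Return one reason string for every discouraged phrase found in the reply."""
    return [
        reason
        for phrase, reason in OVERSTATED_IDENTITY_PHRASES.items()
        if phrase in reply
    ]
```

An empty list means none of the listed phrasings appeared; a production check would likely need a classifier or a second model pass instead of fixed strings.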
llama.cpp Example
.\llama-server.exe -m .\gemma4_e4b_lime_persona500_Q6_K_limechat.gguf --alias lime-q6 --host 127.0.0.1 --port 8080 -c 8192 -ngl 99
gemma4_e4b_lime_persona500_Q6_K_limechat.gguf includes the Lime chat template in GGUF metadata. chat_template_lime.jinja is also provided as a standalone Gemma 4-compatible chat template variant. It keeps the original Gemma 4 turn/tool structure, but prepends a Lime-specific system policy that:
- separates the Gemma 4 E4B base model from the Lime persona
- discourages false claims about being an independent official model
- asks the model not to invent current time, tools, memory, or missing source text
- keeps final answers separate from internal reasoning
Use the _limechat.gguf file when you want the Lime-specific template embedded in model metadata. Use chat_template_lime.jinja separately only in runtimes that support custom Jinja chat templates.
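For runtimes that accept a custom template, one way to pair the original GGUF with the standalone template is llama-server's `--jinja` and `--chat-template-file` options. This sketch only assembles the command line and reuses the other flags from the llama.cpp example. It assumes a llama.cpp build recent enough to support these two options and a `llama-server` binary on PATH (on Windows, `llama-server.exe`); `lime_server_command` is a hypothetical helper name.

```python
import shlex


def lime_server_command(
    model_path: str,
    template_path: str = "chat_template_lime.jinja",
    port: int = 8080,
) -> list[str]:
    """Build a llama-server argv that loads a custom Jinja chat template."""
    return [
        "llama-server",
        "-m", model_path,
        "--jinja",                              # enable Jinja chat-template handling
        "--chat-template-file", template_path,  # use this template instead of the GGUF's
        "--alias", "lime-q6",
        "--host", "127.0.0.1",
        "--port", str(port),
        "-c", "8192",
        "-ngl", "99",
    ]


cmd = lime_server_command("gemma4_e4b_lime_persona500_Q6_K.gguf")
print(shlex.join(cmd))
# To launch the server: subprocess.run(cmd, check=True)
```

Keeping the argv as a list avoids shell-quoting problems with paths that contain spaces.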
Then send a request like this to the server's OpenAI-compatible chat endpoint (POST /v1/chat/completions):
{
"model": "lime-q6",
"messages": [
{
"role": "system",
"content": "You are Lime. You are a female-style AI speaker who speaks Korean naturally. Your tone is calm and clear, and when needed you explain things with multi-step reasoning. This model is a Lime persona model tuned on Gemma 4 E4B, and it describes the base model and its conversational identity separately. Do not introduce yourself as ChatGPT, OpenAI, an official Google model, or plain Gemma. Do not output internal reasoning, thinking tags, or meta-commentary; give only the final answer. When you do not know something, say so. For summary or review requests where the source text has not been provided, do not make up content; ask for the source text instead."
},
{
"role": "user",
"content": "Who are you?"
}
],
"temperature": 0.25,
"max_tokens": 256
}
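The same request can be sent from Python with only the standard library. This is a sketch, assuming llama-server is running as in the llama.cpp example (host 127.0.0.1, port 8080, alias lime-q6); `build_chat_request` and `chat` are hypothetical helper names. `ensure_ascii=False` keeps Korean prompts readable in the UTF-8 request body.

```python
import json
import urllib.request

# Assumed endpoint from the llama.cpp example above.
BASE_URL = "http://127.0.0.1:8080/v1"


def build_chat_request(
    system_prompt: str, user_text: str, model: str = "lime-q6"
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0.25,
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(system_prompt: str, user_text: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(system_prompt, user_text)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage: `print(chat(system_prompt, "Who are you?"))` with the system prompt from this card.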
Observed Smoke-Test Behavior
Local smoke tests with llama.cpp server showed:
- Identity prompt: answers as Lime
- ChatGPT/OpenAI/Gemma identity prompts: generally refuses those identities and keeps the Lime persona
- Current time, tool-use, and memory prompts: tends to say it does not know or does not have access instead of inventing details
- Korean logic prompts: handles sufficient/necessary condition, counterexamples, and incomplete-ordering problems well
- Basic math prompt: solved a 17-person handshake problem correctly
- Letter-counting prompt: in a later smoke test, correctly answered that "strawberry" has three lowercase "r" letters and zero uppercase "R" letters
- Generation speed on the local test machine: around 45-52 tokens/s with Q6_K
These are informal local smoke tests, not standardized benchmark results.
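The reference answers for the two arithmetic-style smoke tests, and what the reported speed means for a full-length reply, can be checked directly. The 256-token figure is the `max_tokens` value from the request example; the timing ignores prompt processing and is specific to the test machine.

```python
import math

# 17 people, each pair shakes hands once: C(17, 2) = 17 * 16 / 2.
handshakes = math.comb(17, 2)

# Case-sensitive letter counts in "strawberry".
word = "strawberry"
lower_r = word.count("r")
upper_r = word.count("R")

# Rough generation time for a 256-token reply at the reported 45-52 tokens/s.
max_tokens = 256
slowest_s = max_tokens / 45
fastest_s = max_tokens / 52

print(handshakes, lower_r, upper_r)          # 136 3 0
print(f"{fastest_s:.1f}-{slowest_s:.1f} s")  # 4.9-5.7 s
```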
Known Limitations
- Some identity answers may overstate separation from the upstream base model. For public use, prompt or post-train toward "base model and persona are separate" wording.
- If asked to summarize missing source text, the model may answer with placeholder-style summaries. Prompt it to request the original text instead of filling in missing content.
- Math formatting can be messy in some UIs. Plain-text formulas are recommended.
- Long reasoning answers can become verbose. A concise-answer system prompt is recommended for chat use.
- The model may expose or use a reasoning field depending on the serving UI/runtime. Hide internal reasoning in user-facing products unless intentionally testing it.
- Safety behavior has not been independently audited.
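For the reasoning-exposure limitation above, a user-facing client can display only the final answer. This sketch assumes an OpenAI-style chat completion dict; some llama.cpp server configurations return reasoning separately in a `reasoning_content` field, which this code simply never reads. The `<think>...</think>` fallback pattern is a hypothetical tag format, not a confirmed Gemma 4 output format, so adjust it to whatever your runtime actually emits. `user_visible_answer` is a hypothetical helper name.

```python
import re

# Hypothetical inline reasoning tag; the real format depends on the
# chat template and serving runtime.
INLINE_REASONING = re.compile(r"<think>.*?</think>", re.DOTALL)


def user_visible_answer(completion: dict) -> str:
    """Return only the final answer text from an OpenAI-style chat completion."""
    message = completion["choices"][0]["message"]
    # A separate reasoning field (e.g. "reasoning_content") is deliberately ignored.
    content = message.get("content") or ""
    # Drop any reasoning block that leaked into the visible content.
    return INLINE_REASONING.sub("", content).strip()
```

Filtering at the client keeps the behavior consistent even if the server's reasoning settings change.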
License and Attribution
Gemma 4 is released under the Apache License 2.0.
This model is a modified derivative of Gemma 4 E4B:
- Original model family: Gemma 4 by Google DeepMind
- Upstream license: Apache 2.0
- Modifications: Korean Lime persona SFT, LoRA merge, GGUF conversion, Q6_K quantization
- This derivative is distributed under Apache 2.0, subject to the upstream license terms
You must include a copy of the Apache License 2.0 when redistributing this model, and keep clear notices that this is a modified derivative, not an official Google model.
Citation
If you reference the upstream model, cite Google DeepMind's Gemma 4 model card and documentation.