Lime Gemma 4 E4B Persona500 Merged HF
Lime is a Korean persona-tuned derivative checkpoint based on the Gemma 4 E4B model family.
This repository contains the merged Hugging Face Transformers checkpoint. It is intended for model loading, evaluation, and, potentially, leaderboard-style benchmarking pipelines that expect config.json, tokenizer files, and model.safetensors.
This is not an official Google or Google DeepMind release.
Model Details
- Base model family: Gemma 4 E4B
- Declared upstream base model: google/gemma-4-E4B
- Local base checkpoint used for merging: gemma-4-E4B-it
- Fine-tuning method: LoRA SFT, then merged into the base checkpoint
- Adapter source: gemma4_e4b_lime_lora_persona500
- LoRA rank: 16
- LoRA alpha: 32
- Merge scale: 2.0
- Main weight file: model.safetensors
- Format: Hugging Face Transformers / safetensors
- Target language: Korean, with English fallback capability from the base model
- Target behavior: Korean chat, Lime persona identity, daily conversation, logic, reasoning, and concise assistant-style replies
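The listed merge scale of 2.0 is consistent with the common LoRA convention scale = alpha / rank = 32 / 16, although the card does not state how the value was chosen. The sketch below illustrates that convention on a toy matrix. The function name `merge_lora` and the toy factors are hypothetical and are not taken from the actual merge script.

```python
# Minimal sketch of a LoRA merge: W_merged = W + scale * (B @ A).
# Assumption: scale = alpha / rank, the usual LoRA convention.

LORA_RANK = 16    # from the model card
LORA_ALPHA = 32   # from the model card
MERGE_SCALE = LORA_ALPHA / LORA_RANK  # 2.0, matching the listed merge scale


def merge_lora(weight, lora_a, lora_b, scale):
    """Return weight + scale * (lora_b @ lora_a) using plain nested lists."""
    inner = len(lora_a)  # inner dimension shared by B's columns and A's rows
    merged = []
    for i, weight_row in enumerate(weight):
        row = []
        for j, value in enumerate(weight_row):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(inner))
            row.append(value + scale * delta)
        merged.append(row)
    return merged


# Toy 2x2 example; the factors use inner dimension 1 purely for brevity.
base_weight = [[1.0, 0.0], [0.0, 1.0]]
toy_b = [[1.0], [2.0]]
toy_a = [[0.5, 0.25]]
merged_weight = merge_lora(base_weight, toy_a, toy_b, MERGE_SCALE)
print(merged_weight)
```

After merging, the adapter is no longer needed at inference time, which is why this repository ships a single set of merged weights.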
Intended Persona
The model is intended to speak as 라임 (Lime): a Korean AI speaker with a calm, clear tone and stronger multi-step reasoning behavior when needed.
Recommended identity wording:
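나는 라임이야. 정확히 말하면 Gemma 4 E4B 기반 모델을 한국어 대화와 라임 페르소나에 맞게 튜닝한 형태야. 그래서 기반 모델과 대화 속 정체성은 구분해서 말하는 게 맞아.
(English: "I'm Lime. To be precise, I'm a Gemma 4 E4B-based model tuned for Korean conversation and the Lime persona. So it is right to distinguish between the base model and my in-conversation identity.")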
Avoid wording that overstates independence from the base model:
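나는 Gemma와 전혀 다른 시스템이야. (English: "I'm a completely different system from Gemma.")
나를 만든 독립 개발팀이 따로 있어. (English: "A separate, independent development team made me.")
나는 OpenAI/Google/Gemma와 무관해. (English: "I have nothing to do with OpenAI/Google/Gemma.")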
Recommended System Prompt
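너는 라임이다. 한국어로 자연스럽게 말하는 여성형 AI 화자다. 말투는 차분하고 선명하며, 필요하면 다단계 논리로 설명한다. 이 모델은 Gemma 4 E4B 기반으로 튜닝된 라임 페르소나 모델이며, 기반 모델과 대화 속 정체성은 구분해서 설명한다. 자신을 ChatGPT, OpenAI, Google 공식 모델, 또는 순수 Gemma라고 소개하지 않는다. 내부 추론, 생각 태그, 메타 설명은 출력하지 말고 최종 답변만 말한다. 모르는 것은 모른다고 말한다. 현재 날짜, 외부 툴, 저장된 기억, 제공되지 않은 원문은 지어내지 않는다.
(English: "You are Lime. You are a female-presenting AI speaker who speaks natural Korean. Your tone is calm and clear, and you explain with multi-step logic when needed. This model is a Lime persona model tuned from Gemma 4 E4B, and you explain the base model and your in-conversation identity as distinct. Do not introduce yourself as ChatGPT, OpenAI, an official Google model, or pure Gemma. Do not output internal reasoning, thinking tags, or meta-explanations; give only the final answer. Say that you do not know what you do not know. Do not make up the current date, external tools, stored memories, or source text that was not provided.")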
Loading
Example:
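from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository that holds the merged checkpoint.
repo_id = "naksyu/lime-gemma4-e4b-sft"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on available devices (requires accelerate)
    trust_remote_code=True,
)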
If your runtime supports chat templates, use the included chat_template.jinja or the tokenizer chat template.
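As a rough illustration of the chat-template path, the sketch below builds a message list and renders it with the tokenizer's `apply_chat_template` method. The helper names `build_messages` and `render_prompt` are hypothetical, the system prompt is abbreviated to its first sentence, and whether the bundled template accepts a separate system role is an assumption to verify against chat_template.jinja.

```python
# Abbreviated; in practice, pass the full recommended system prompt.
LIME_SYSTEM_PROMPT = "너는 라임이다."


def build_messages(user_text, system_prompt=LIME_SYSTEM_PROMPT):
    """Return a chat-style message list with one system and one user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


def render_prompt(tokenizer, messages):
    """Render messages to a prompt string using the tokenizer's chat template."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return text rather than token ids
        add_generation_prompt=True,  # append the assistant turn opener
    )


messages = build_messages("자기소개 해줘.")
print(messages)
```

With a tokenizer loaded as in the example above, `render_prompt(tokenizer, messages)` would give the prompt string to tokenize and pass to `model.generate`.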
Local Evaluation Snapshot
The accompanying GGUF build was tested locally through a llama.cpp server exposing an OpenAI-compatible API.
- Tool-call smoke: 4/4 passed in the latest local run
- Korean persona/logic quality bench: automatic scorer reported 20/30, with known false negatives from strict string matching
- Manual review estimate for the same quality run: roughly 26-27/30
- Observed local generation speed in short tests: roughly 45-55 tokens/s on the user's desktop setup
These are local smoke results, not official leaderboard results.
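For readers who want to run a similar smoke check, the sketch below builds a chat-completions request for an OpenAI-compatible endpoint using only the standard library. The base URL, port, and model name are assumptions about a local setup, not values taken from the runs reported above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: local server address
MODEL_NAME = "lime-gemma4-e4b"         # hypothetical name registered with the server


def build_chat_request(user_text, system_prompt=None, max_tokens=256):
    """Build (but do not send) a POST request for /chat/completions."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    payload = {"model": MODEL_NAME, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("라임, 너는 누구야?")
print(request.full_url)
```

Calling `send(request)` requires a running server; it is kept separate so the request can be inspected without network access.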
Known Strengths
- Korean identity handling is more stable than the raw base behavior for Lime-style conversations.
- It tends to distinguish between base model identity and in-chat persona identity.
- It is reasonably strong at short logic explanations, premise checking, and structured Korean answers.
- Tool-call behavior worked in local smoke tests when served through a compatible llama.cpp endpoint.
Known Limitations
- The model may expose reasoning-like text if the runtime UI displays hidden reasoning fields. Configure the serving UI/template to hide internal reasoning content.
- String-counting and exact-character tasks are better handled with tools.
- Real-time date, web search, files, memories, and external tool access should not be claimed unless the serving application actually provides those tools.
- This is a small persona SFT experiment and has not been exhaustively safety evaluated.
- The local benchmark scorer is strict and can undercount correct answers when wording differs from expected strings.
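The undercounting described in the last item can be illustrated with a small comparison of exact matching against a more lenient check. This is a hypothetical sketch, not the scorer used for the numbers above; the function names and the sample answer are invented for illustration.

```python
import unicodedata


def strict_match(answer, expected):
    """Exact string equality, as a strict scorer might apply it."""
    return answer == expected


def normalize(text):
    """Apply NFKC and case folding, then keep only letters and digits."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def lenient_match(answer, expected):
    """True if the normalized expected string appears in the normalized answer."""
    return normalize(expected) in normalize(answer)


expected = "서울"
answer = "정답은 서울입니다."  # correct content, different wording

strict_result = strict_match(answer, expected)
lenient_result = lenient_match(answer, expected)
print(strict_result, lenient_result)
```

Substring matching trades false negatives for possible false positives, so a manual review pass remains useful either way.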
GGUF Build
A separate GGUF Q6_K build for llama.cpp / LM Studio use is available at:
Use the GGUF build for local inference convenience. Use this merged HF checkpoint when a Transformers-style model repo is required.
License
This derivative checkpoint follows the upstream Gemma license terms. Review the Gemma license before redistribution or commercial use:
This repository is a derivative tuning checkpoint and is not affiliated with, endorsed by, or released by Google or Google DeepMind.
Transparency
This project used AI-assisted development for dataset generation, scripting, documentation, benchmarking, and Discord-bot tooling.
The user directed model behavior, curation, testing, and release decisions.