Gyeongsang Dialect to Standard Korean LoRA for Gemma 4 E4B
This repository contains a LoRA adapter fine-tuned from google/gemma-4-e4b-it for a narrow text-rewriting task:
- input: Gyeongsang dialect Korean sentence
- output: standard Korean sentence
- goal: preserve meaning and avoid unnecessary additions or omissions
This is an adapter-only release. You need the base model google/gemma-4-e4b-it to use it.
Model Details
- Base model: google/gemma-4-e4b-it
- Adapter type: PEFT LoRA
- LoRA rank: 16
- LoRA alpha: 16
- LoRA dropout: 0.0
- Training stack: Unsloth + PEFT + Transformers
- Primary language: Korean
- Primary task: dialect normalization / standardization
Intended Use
This adapter is intended for:
- converting Gyeongsang dialect utterances into standard Korean
- deterministic rewrite workflows
- offline batch normalization pipelines
- experimentation inside Hugging Face, Transformers, or Unsloth Studio
This adapter is not intended for:
- open-ended chat
- long-form reasoning
- instruction following outside the rewrite task
- high-stakes legal, medical, or financial usage without human review
Prompt Format
The best-performing eval path used a plain single-turn rewrite prompt with a strict instruction not to add reasoning.
Minimal usage pattern:
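당신은 경상도 방언을 표준 한국어로 바꾸는 전문가입니다.
의미를 보존하고, 불필요한 추가나 생략 없이 표준어 문장만 답하세요.

방언: <input sentence>
표준어:
In English, the instruction reads: "You are an expert at converting Gyeongsang dialect into standard Korean. Preserve the meaning and answer with only the standard Korean sentence, without unnecessary additions or omissions." The labels 방언 and 표준어 mean "dialect" and "standard language". The prompt itself should be sent in Korean.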
Greedy decoding worked better than sampling for this task.
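For batch pipelines it can help to build the prompt in one place. The sketch below mirrors the prompt format above. The helper names `build_prompt` and `build_prompts` are hypothetical, and collapsing whitespace in the input is an assumption of this sketch, not something the training setup is known to require.

```python
# Instruction text as given in this card's prompt format.
INSTRUCTION = (
    "당신은 경상도 방언을 표준 한국어로 바꾸는 전문가입니다.\n"
    "의미를 보존하고, 불필요한 추가나 생략 없이 표준어 문장만 답하세요."
)


def build_prompt(dialect_sentence: str) -> str:
    """Return the single-turn rewrite prompt for one dialect sentence."""
    # Assumption: collapse whitespace so a multi-line input stays on one "방언:" line.
    sentence = " ".join(dialect_sentence.split())
    return f"{INSTRUCTION}\n\n방언: {sentence}\n표준어:"


def build_prompts(sentences: list[str]) -> list[str]:
    """Build prompts for a batch, skipping empty or whitespace-only inputs."""
    return [build_prompt(s) for s in sentences if s.strip()]
```

Each prompt ends with the `표준어:` label so that the model continues with the rewritten sentence only.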
Evaluation Summary
Final full-run evaluation was performed on internal dev, test, and hard_test splits.
| split | rows | char_similarity | critical_error_rate | number_preservation | format_contamination_rate |
|---|---|---|---|---|---|
| dev | 4002 | 0.9436 | 0.0265 | 1.0000 | 0.0000 |
| test | 4001 | 0.9433 | 0.0280 | 0.9615 | 0.0000 |
| hard_test | 2048 | 0.9337 | 0.0381 | 1.0000 | 0.0000 |
Interpretation:
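- rewrite quality is strong on the held-out evaluation splits: char_similarity stays between 0.9337 and 0.9436, and critical_error_rate stays below 4%
- number preservation remained strong in the final E4B run (1.0000 on dev and hard_test, 0.9615 on test)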
- prompt-format contamination was reduced to zero in final evaluation
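This card does not define how the metrics in the table are computed. As a rough guide to what such checks can look like, here is a minimal sketch under stated assumptions: `char_similarity` is read as a `difflib` character-level ratio, number preservation as matching digit runs between source and output, and format contamination as a leaked prompt label. All three function definitions are hypothetical and may differ from the actual evaluation code.

```python
import re
from difflib import SequenceMatcher

PROMPT_LABELS = ("방언:", "표준어:")  # labels taken from the prompt format


def char_similarity(prediction: str, reference: str) -> float:
    """Character-level similarity in [0, 1] (assumed definition)."""
    return SequenceMatcher(None, prediction, reference).ratio()


def numbers_preserved(source: str, prediction: str) -> bool:
    """True if source and prediction contain the same digit runs (assumed definition)."""
    return sorted(re.findall(r"\d+", source)) == sorted(re.findall(r"\d+", prediction))


def is_format_contaminated(prediction: str) -> bool:
    """True if the prediction leaks a prompt label (assumed definition)."""
    return any(label in prediction for label in PROMPT_LABELS)
```

Checks like these are cheap enough to run on every output of an offline pipeline and can flag rows for human review.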
Training Summary
- Final training rows: 51719
- Eval rows during training run summary: 4002
- Training runtime: 3612.06s
- Final recorded train loss: 0.2783
- Dataset format: prompt_completion_text
- Prompt variant: strict
- Completion-only loss: True
The adapter was selected from an adaptive loop that tuned prompt format, eval format, LoRA settings, and recovery behavior across multiple iterations before the final full run.
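"Completion-only loss" generally means that only the target tokens contribute to the training loss, while the prompt tokens are masked out. The sketch below illustrates the idea with plain lists. The function name `mask_prompt_labels` and the token ids are made up for illustration, and the training stack used here may implement the masking differently.

```python
IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss ignores by default


def mask_prompt_labels(prompt_ids: list[int], completion_ids: list[int]) -> dict:
    """Concatenate prompt and completion, supervising only the completion tokens."""
    input_ids = list(prompt_ids) + list(completion_ids)
    # Prompt positions get IGNORE_INDEX so they add nothing to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}
```

With this masking the model is trained to produce the standard Korean sentence, not to reproduce the instruction text.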
Data
The training task uses Korean dialect-to-standard sentence pairs derived from Gyeongsang dialect data.
High-level properties:
- sentence-level rewrite pairs
- source side: dialect Korean
- target side: standard Korean
- task emphasis: minimal-edit normalization with meaning preservation
Limitations
- This model is specialized for Gyeongsang dialect normalization and may not transfer well to other dialects.
- It is optimized for short-to-medium sentence rewriting, not general conversation.
- It may still make lexical or phrasing errors on rare dialect forms.
- It should not be treated as a factual QA or reasoning model.
Usage
Example with transformers + peft:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
base_model_id = "google/gemma-4-e4b-it"
adapter_id = "acidsound/gyeongsang_dialect_gemma-4-e4b-it-LoRA"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
prompt = (
    "당신은 경상도 방언을 표준 한국어로 바꾸는 전문가입니다.\n"
    "의미를 보존하고, 불필요한 추가나 생략 없이 표준어 문장만 답하세요.\n\n"
    "방언: 와 이리 춥노 밖에 바람이 억수로 분다\n"
    "표준어:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        repetition_penalty=1.05,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
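The decoded string above includes the prompt as well as the generated text. A small post-processing step can isolate the rewritten sentence. This is a minimal sketch: the helper name `extract_rewrite` is hypothetical, and keeping only the first non-empty line after the first `표준어:` label is an assumption about the output shape.

```python
OUTPUT_LABEL = "표준어:"


def extract_rewrite(decoded: str) -> str:
    """Return the first non-empty line that follows the first output label."""
    _, found, tail = decoded.partition(OUTPUT_LABEL)
    if not found:
        tail = decoded  # no label present: treat the whole text as model output
    for line in tail.splitlines():
        if line.strip():
            return line.strip()
    return ""
```

Cutting at the first label, not the last, means that any extra `방언:` / `표준어:` pairs the model might append after its answer are dropped.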
Notes
- Base model access, usage terms, and license follow the upstream google/gemma-4-e4b-it model.
- This repository contains only the LoRA adapter and tokenizer/processor side files needed for the fine-tuned setup.