Liujgoj Cantonese LLM (V1.0-Alpha)
🌟 Introduction
Liujgoj Cantonese LLM is a fine-tuned language model built for the Liujgoj (溜歌粵語) romanization system.
Liujgoj is more than a phonetic transcription method. It is an experimental writing system designed to represent spoken Cantonese through:
- word-based orthography
- Latin script writing
- efficient tone encoding
- native Cantonese expression
This model is trained to convert natural Cantonese Chinese text into standardized Liujgoj Romanization.
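Chinese Introduction (English translation)
Liujgoj Cantonese LLM is a language model fine-tuned specifically for the Liujgoj (溜歌粵語) romanization system.
Liujgoj is not merely a romanization scheme. It is a writing system centered on spoken Cantonese that emphasizes word-based spelling and Latin-script writing.
The model is mainly intended for:
- converting Cantonese sentences written in Chinese characters into Liujgoj romanization
- learning idiomatic colloquial word combinations
- handling Liujgoj's distinctive spelling rules
- advancing AI writing technology for Cantonese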
🚀 Key Features
✅ Word-Based Orthography
Learns merged word forms instead of character-by-character output.
Examples:
- 食咗 → sikhzor
- 靚仔 → lengzair
- 做咩 → zouh mej
✅ Tone-as-Letter System
Supports Liujgoj tone letters:
j, r, x, q, h
This allows compact tone representation without numbers.
✅ Native Cantonese Fluency
Training data is derived from authentic Hong Kong Cantonese dialogue, preserving:
- colloquial speech
- sentence particles
- slang usage
- real spoken rhythm
📊 Training Data
Dataset Size
30,082 high-quality instruction pairs
Sources
Curated from 60+ Hong Kong movie subtitle files (SRT) and converted through manual / semi-automatic annotation.
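The card does not describe the extraction pipeline itself. As a rough illustration of how dialogue lines could be pulled out of a standard SRT file before annotation, here is a minimal sketch. The helper name `extract_srt_lines` and the sample cues are hypothetical, and this is not necessarily how the author processed the subtitles.

```python
import re

# Matches an SRT timing row such as "00:00:01,000 --> 00:00:03,000".
TIMING = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)

def extract_srt_lines(srt_text):
    """Return the dialogue text of each subtitle cue (hypothetical helper)."""
    lines = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        rows = [row.strip() for row in block.splitlines() if row.strip()]
        for i, row in enumerate(rows):
            if TIMING.match(row):
                # Everything after the timing row is dialogue text.
                text = " ".join(rows[i + 1:])
                if text:
                    lines.append(text)
                break
    return lines

# Made-up sample cues for illustration only.
sample = """1
00:00:01,000 --> 00:00:03,000
你食咗飯未啊?

2
00:00:03,500 --> 00:00:05,000
做咩
"""

dialogue = extract_srt_lines(sample)
print(dialogue)
```

Each extracted line would still need to be paired with its Liujgoj spelling through the manual or semi-automatic annotation step mentioned above.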
Format Example
{
  "instruction": "Convert Cantonese Chinese into Liujgoj Romanization",
  "input": "你食咗飯未啊?",
  "output": "Neiq sikhzor faanh meih aa?"
}
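To show how a record in this shape might be turned into a single training string, here is a small sketch. It assumes the training prompt uses the same Instruction / Input / Response layout as the usage example in this card, which the card does not explicitly confirm. The names `PROMPT`, `REQUIRED_KEYS`, and `format_record` are hypothetical.

```python
import json

# Assumed template, modeled on the inference prompt in this card.
PROMPT = """Below is an instruction that describes a task.
### Instruction:
{instruction}
### Input:
{input}
### Response:
{output}"""

REQUIRED_KEYS = ("instruction", "input", "output")

def format_record(record):
    """Validate one instruction pair and render it as a prompt string."""
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"record is missing keys: {missing}")
    return PROMPT.format(**{key: record[key] for key in REQUIRED_KEYS})

record = json.loads("""{
  "instruction": "Convert Cantonese Chinese into Liujgoj Romanization",
  "input": "你食咗飯未啊?",
  "output": "Neiq sikhzor faanh meih aa?"
}""")

text = format_record(record)
print(text)
```

Validating the keys up front makes malformed pairs fail loudly instead of producing a prompt with an empty field.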
🧠 Base Model
unsloth/Qwen2.5-7B-bnb-4bit
Fine-tuning Method
- LoRA
- Supervised Fine-Tuning (SFT)
- Unsloth optimized training pipeline
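LoRA keeps the base weights frozen and trains two small low-rank matrices per adapted layer, which is why a 7B model can be fine-tuned cheaply. The arithmetic below sketches the saving for a single weight matrix. The rank and layer shape are hypothetical illustration values; the card does not state the LoRA hyperparameters actually used, and these are not the dimensions of the base model.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

def full_params(d_out, d_in):
    """Parameters in the full d_out x d_in weight matrix."""
    return d_out * d_in

# Hypothetical values: a square 4096 x 4096 projection adapted at rank 16.
d_out, d_in, rank = 4096, 4096, 16

lora = lora_trainable_params(d_out, d_in, rank)
full = full_params(d_out, d_in)
print(f"LoRA params: {lora}, full params: {full}, ratio: {lora / full:.2%}")
```

Under these assumed numbers the adapter trains well under 1% of the parameters of the matrix it modifies.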
🛠️ Usage
Python Example
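from unsloth import FastLanguageModel
import torch

# Load the fine-tuned model in 4-bit precision.
# Adjust the repo id if the model is published under a different name.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "yvthyvq/liujgoj-cantonese-lora",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode

# Prompt template; {} is filled with the Cantonese input sentence.
prompt = """Below is an instruction that describes a task.
### Instruction:
Convert Cantonese Chinese into Liujgoj Romanization.
### Input:
{}
### Response:
"""

# Tokenize the prompt and move the tensors to the GPU.
inputs = tokenizer(
    [prompt.format("你食咗飯未啊?")],
    return_tensors="pt"
).to("cuda")

# Generate the romanization and decode it (the decoded text includes the prompt).
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs))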
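Because `batch_decode` returns the prompt together with the generated text, you may want to keep only the part after the response marker. The following is a minimal sketch of that post-processing step. The helper name `extract_response` is hypothetical, and the special-token strings it strips are assumptions about the tokenizer, so check them against what your decoded output actually contains.

```python
def extract_response(decoded, marker="### Response:"):
    """Return the text after the last response marker (hypothetical helper).

    If the marker is absent, the whole string is returned, stripped.
    """
    _, _, tail = decoded.rpartition(marker)
    # Assumed special-token strings; adjust to match your tokenizer.
    for token in ("<|endoftext|>", "<|im_end|>"):
        tail = tail.replace(token, "")
    return tail.strip()

# Stand-in for one element of tokenizer.batch_decode(outputs).
decoded = (
    "Below is an instruction that describes a task.\n"
    "### Instruction:\n"
    "Convert Cantonese Chinese into Liujgoj Romanization.\n"
    "### Input:\n"
    "你食咗飯未啊?\n"
    "### Response:\n"
    "Neiq sikhzor faanh meih aa?<|endoftext|>"
)

answer = extract_response(decoded)
print(answer)
```

Passing `skip_special_tokens=True` to `batch_decode` is another way to drop the special tokens, in which case only the marker split is needed.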
💬 Example
Input
你食咗飯未啊?
Output
Neiq sikhzor faanh meih aa?
⚠️ Limitations
This is an Alpha release.
Current limitations may include:
- unseen slang words
- long-context instability
- occasional spelling inconsistency
- hallucination on unrelated tasks
Recommended primarily for:
- Cantonese romanization
- Liujgoj experiments
- linguistic research
- niche Cantonese NLP tasks
🗺️ Roadmap
V1.0 Alpha
- Initial public release
- Cantonese Hanzi → Liujgoj conversion
V2.0 Planned
- improved accuracy
- better segmentation
- stronger instruction following
- broader vocabulary coverage
Future Goals
- Liujgoj chat assistant
- speech alignment
- grammar tools
- full Cantonese-native LLM ecosystem
🏷️ Tags
cantonese yue romanization liujgoj lora qwen unsloth linguistics
🙌 Author
Created by Yvthyvq
Building language technology for Cantonese and Liujgoj.