kalixlouiis committed on Apr 26

Commit 4d0c805 (verified) · 1 Parent(s): 1308b79

Update README.md

Files changed (1)

README.md +134 -1

@@ -61,4 +61,137 @@ model-index:
             value: 0.9693
             name: BERTScore F1
----

+---
+# 📝 myX-TransStyle-W2S: A Transformer-based Style Transfer for Myanmar Written (ရေးဟန်) to Spoken (ပြောဟန်)
+**myX-TransStyle-W2S** is a Sequence-to-Sequence (Seq2Seq) model developed by **Khant Sint Heinn (Kalix Louis)** under **DatarrX**. It transforms formal **Written Burmese (ရေးဟန်)** into its natural, colloquial **Spoken Burmese (ပြောဟန်)** counterpart, so that formal documents or news can be converted into fluid, human-like dialogue while preserving the original meaning (BERTScore F1 of 0.9693 on 100 unseen test sentences).
+## Model Details
+- **Developed by:** [Khant Sint Heinn (Kalix Louis)](https://huggingface.co/kalixlouiis)
+- **Organization:** [DatarrX | ဒေတာ-အက်စ်](https://huggingface.co/DatarrX)
+- **Model Architecture:** Fine-tuned NLLB-200 (600M Distilled) with merged LoRA adapters
+- **Language:** Burmese (Myanmar)
+- **Task:** Text Style Transfer (Written → Spoken)
+- **License:** MIT
+- **Trained on:** [Myanmar Written-Spoken Parallel Corpus (MWSPC)](https://huggingface.co/datasets/DatarrX/Myanmar-Written-Spoken-Parallel-Corpus)
+---
+## Linguistic Context: The Diglossia Challenge
+Burmese is a **diglossic language**, featuring a major linguistic gap between two functional registers:
+* **Written Style (ရေးဟန်):** Used in news, law, textbooks, and officialdom. It relies on formal grammatical markers such as **"သည်"**, **"၏"**, and **"၍"**.
+* **Spoken Style (ပြောဟန်):** Used in daily life, verbal communication, and social media. It uses colloquial markers like **"တယ်"** (tense), **"ရဲ့"** (possessive), and **"နဲ့"** (conjunction).
+**myX-TransStyle-W2S** addresses the stiff, "robotic" tone of formal Burmese text produced by AI systems by rewriting it in the natural, warm register that native speakers use every day.
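The marker lists above can be turned into a rough register check. The following is a toy heuristic only, not part of the model: the function names are hypothetical, and plain substring counting will misfire whenever a marker happens to occur inside an unrelated word.

```python
# Markers quoted in the README section above.
WRITTEN_MARKERS = ("သည်", "၏", "၍")
SPOKEN_MARKERS = ("တယ်", "ရဲ့", "နဲ့")

# Example sentences taken from the README's usage example.
WRITTEN_EXAMPLE = "ပုဂံခေတ်သည် မြန်မာနိုင်ငံသမိုင်းတွင် ပထမဆုံးသော အင်ပါယာနိုင်ငံတော်ကြီး ဖြစ်ခဲ့သည်။"
SPOKEN_EXAMPLE = "ပုဂံခေတ်က မြန်မာနိုင်ငံသမိုင်းမှာ ပထမဆုံး အင်ပါယာနိုင်ငံတော်ကြီးဖြစ်ခဲ့တယ်။"


def count_markers(text, markers):
    """Count how often any of the given markers occurs as a substring."""
    return sum(text.count(marker) for marker in markers)


def guess_register(text):
    """Guess 'written', 'spoken', or 'unknown' from marker counts alone."""
    written = count_markers(text, WRITTEN_MARKERS)
    spoken = count_markers(text, SPOKEN_MARKERS)
    if written > spoken:
        return "written"
    if spoken > written:
        return "spoken"
    return "unknown"


print(guess_register(WRITTEN_EXAMPLE))
print(guess_register(SPOKEN_EXAMPLE))
```

A check like this can flag which inputs are already colloquial, but it cannot perform the conversion itself, since the README notes that vocabulary changes along with the particles.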
+---
+## Training Methodology
+The model was trained with a parameter-efficient adaptation strategy suited to the structural shifts between the written and spoken styles of Burmese.
+### 1. The Dataset ([MWSPC](https://huggingface.co/datasets/DatarrX/Myanmar-Written-Spoken-Parallel-Corpus))
+The model was trained on **5,555 high-quality, unique parallel text pairs**. This dataset provides a direct mapping from formal literary structures to their informal colloquial equivalents, filtered to ensure maximum diversity.
+### 2. Parameter-Efficient Fine-Tuning (PEFT)
+To capture nuanced stylistic shifts without overwriting the base model's linguistic depth, we utilized **Low-Rank Adaptation (LoRA)**:
+* **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `out_proj`.
+* **Rank (R):** 32 | **Alpha:** 64.
+* **Learning Rate:** 8e-5 with a Cosine scheduler.
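To make the hyperparameters above concrete, here is a minimal pure-Python sketch of what they imply. It assumes the standard LoRA scaling factor of alpha divided by rank, and a plain cosine decay from the peak learning rate to zero with no warmup. The README does not state the warmup or the number of training steps, so `total_steps` is a hypothetical placeholder.

```python
import math

# Values stated in the README.
LORA_R = 32
LORA_ALPHA = 64
PEAK_LR = 8e-5


def lora_scaling(alpha, r):
    """Standard LoRA scaling applied to the low-rank update (alpha / r)."""
    return alpha / r


def cosine_lr(step, total_steps, peak_lr=PEAK_LR):
    """Cosine decay from peak_lr to 0 (assumption: no warmup phase)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step, 0), total_steps) / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the actual step count is not given
print(lora_scaling(LORA_ALPHA, LORA_R))
for step in (0, 250, 500, 750, 1000):
    print(step, cosine_lr(step, total_steps))
```

With rank 32 and alpha 64 the low-rank update is scaled by 2, and under these assumptions the learning rate reaches half its peak at the midpoint of training.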
+### 3. Merging Strategy
+The LoRA adapters were merged into the base `nllb-200-distilled-600M` model using `merge_and_unload()`. The resulting standalone **2.8 GB** model runs without the PEFT library and adds no adapter overhead at inference time.
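The idea behind merging can be illustrated on a toy example. The sketch below is not the PEFT implementation. It uses plain Python lists with made-up 2x2 weights and a rank-1 adapter to show that folding the scaled low-rank product into the frozen weight gives the same output as running base and adapter separately. All matrices and helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight."""
    return add(weight, scale(matmul(lora_b, lora_a), alpha / r))


# Toy values: 2x2 frozen weight and a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]  # shape (out=2, in=2)
A = [[0.5, -0.5]]             # shape (r=1, in=2)
B = [[1.0], [2.0]]            # shape (out=2, r=1)
ALPHA, R = 2, 1               # same alpha / r ratio as 64 / 32

x = [[3.0], [1.0]]            # input column vector

# Unmerged path: base output plus scaled adapter output.
unmerged = add(matmul(W, x), scale(matmul(B, matmul(A, x)), ALPHA / R))
# Merged path: a single matrix multiply with the folded weight.
merged = matmul(merge_lora(W, A, B, ALPHA, R), x)

print(unmerged, merged)
```

Because both paths agree, the merged checkpoint can be loaded like any ordinary Transformers model.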
+---
+## Evaluation Results
+The model was validated on **100 unseen test sentences** and scored higher than its spoken-to-written (S2W) sibling model.
+### Performance Metrics
+| Metric | Score | Interpretation |
+|---|---|---|
+| **BERTScore F1** | **0.9693** | Indicates near-perfect meaning preservation during style transfer. |
+| **chrF** | **78.40** | Exceptional character-level accuracy, specifically in converting formal suffixes. |
+| **BLEU** | **19.64** | Higher than S2W, reflecting a more consistent conversion pattern into spoken style. |
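For readers unfamiliar with chrF, the sketch below shows the general idea of a character n-gram F-score. It is a simplified illustration, not the script used to produce the table: it averages precision and recall over n-gram orders 1 to 6, ignores whitespace, and uses beta = 2. Production implementations such as sacreBLEU handle edge cases differently, so scores may not match exactly.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100.0 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(simple_chrf("abcd", "abce"))
```

Because the score is built from character sequences rather than whole words, it rewards outputs that get suffixes and particles right even when word segmentation differs, which is why it suits Burmese style transfer.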
+### Qualitative Analysis
+Manual review by native speakers confirms the model's ability to not only swap particles but also adjust vocabulary (e.g., converting *“အလွန်ပင်”* to *“သိပ်”* or *“အကယ်ပင်”* to *“တကယ်လို့တောင်”*) in a way that feels authentic and human.
+---
+## How to Use
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+# 1. Load the Merged Model
+model_id = "DatarrX/myX-TransStyle-W2S"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+# 2. Prepare Input
+prefix = "Rewrite Burmese formal written sentence into spoken Burmese: "
+written_text = "ပုဂံခေတ်သည် မြန်မာနိုင်ငံသမိုင်းတွင် ပထမဆုံးသော အင်ပါယာနိုင်ငံတော်ကြီး ဖြစ်ခဲ့သည်။"
+input_text = prefix + written_text
+# 3. Generate Spoken Style
+inputs = tokenizer(input_text, return_tensors="pt")
+outputs = model.generate(
+    **inputs,
+    forced_bos_token_id=tokenizer.convert_tokens_to_ids("mya_Mymr"),
+    max_length=160,
+    num_beams=5
+)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+# Output: ပုဂံခေတ်က မြန်မာနိုင်ငံသမိုင်းမှာ ပထမဆုံး အင်ပါယာနိုင်ငံတော်ကြီးဖြစ်ခဲ့တယ်။
+```
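The usage example above converts a single sentence. For longer passages, one possible approach is to split the text at the Burmese full stop and prefix each sentence separately. This is a sketch under the assumption that the model works best on sentence-sized inputs, which the README does not state. The helper names are hypothetical; only the prefix string comes from the README.

```python
# Prompt prefix copied from the README usage example.
PREFIX = "Rewrite Burmese formal written sentence into spoken Burmese: "
SENTENCE_END = "။"  # Burmese full stop


def split_sentences(text):
    """Split text at the Burmese full stop, keeping the stop on each sentence."""
    parts = text.split(SENTENCE_END)
    sentences = [p.strip() + SENTENCE_END for p in parts[:-1] if p.strip()]
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)  # trailing text without a final stop
    return sentences


def build_inputs(text):
    """Return one prefixed model input per sentence."""
    return [PREFIX + sentence for sentence in split_sentences(text)]


for model_input in build_inputs("A။ B။"):
    print(model_input)
```

Each returned string can then be tokenized and passed to `model.generate` exactly as in the single-sentence example.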
+---
+## Intended Use & Limitations
+### Use Cases
+- **Natural AI Personalities:** Converting formal bot responses into natural-sounding speech.
+- **Content Localization:** Making formal news or articles more accessible for audio/podcasts.
+- **Creative Writing:** Assisting authors in converting narrative descriptions into natural character dialogue.
+### Limitations
+- **Dialectal Focus:** The model primarily covers the standard Yangon/Mandalay dialect; regional slang may be less well represented.
+- **Contextual Nuance:** While meaning is preserved, the "warmth" of the spoken style may vary depending on the complexity of the input.
+## Citation
+### BibTeX
+```BibTeX
+@misc{myx_transstyle_w2s_2026,
+  author = {Khant Sint Heinn (Kalix Louis)},
+  title = {myX-TransStyle-W2S: A Written to Spoken Burmese Style Transfer Model},
+  year = {2026},
+  publisher = {Hugging Face},
+  organization = {DatarrX},
+  howpublished = {https://huggingface.co/DatarrX/myX-TransStyle-W2S}
+}
+```
+---
+## About the Author
+**Khant Sint Heinn**, working under the name **Kalix Louis**, is a **Machine Learning Engineer focused on Natural Language Processing (NLP), data foundations, and open-source AI development**. His work is centered on improving support for the Burmese (Myanmar) language in modern AI systems by building high-quality datasets, practical tools, and scalable infrastructure for language technology.
+He is currently the **Lead Developer at DatarrX**, where he develops data pipelines, manages large-scale data collection workflows, and helps create open-source resources for researchers, developers, and organizations. His experience includes data engineering, web scripting, dataset curation, and building systems that support real-world machine learning applications.
+Khant Sint Heinn is especially interested in advancing low-resource languages and making AI more accessible to underrepresented communities. Through his open-source contributions, he works to strengthen the Burmese (Myanmar) tech ecosystem and provide reliable building blocks for future language models, search systems, and intelligent applications.
+His goal is simple: to turn limited language resources into practical opportunities through clean data, useful tools, and community-driven innovation.
+**Connect with the Author:**
+[GitHub](https://github.com/kalixlouiis) | [Hugging Face](https://huggingface.co/kalixlouiis) | [Kaggle](https://www.kaggle.com/organizations/kalixlouiis)
+---
+*Developed with ❤️ by [DatarrX](https://huggingface.co/DatarrX) to empower the Myanmar AI ecosystem.*