	---
	license: mit

	datasets:
	- DatarrX/Myanmar-Written-Spoken-Parallel-Corpus

	language:
	- my

	metrics:
	- bleu
	- chrf
	- ter
	- bertscore

	base_model:
	- facebook/nllb-200-distilled-600M

	pipeline_tag: text-generation

	library_name: transformers

	tags:
	- burmese
	- myanmar
	- myanmar-language
	- burmese-nlp
	- style-transfer
	- text-rewriting
	- informal-to-formal
	- spoken-to-written
	- seq2seq
	- nllb
	- lora
	- peft
	- low-resource-language
	- text-generation

	model-index:
	- name: myX-TransStyle-S2W
	  results:
	  - task:
	      type: text-generation
	      name: Burmese Style Transfer (Spoken to Written)
	    dataset:
	      name: Custom External Test Set
	      type: csv
	      config: default
	      split: test
	    metrics:
	    - type: bleu
	      value: 12.9445
	      name: BLEU
	    - type: chrf
	      value: 75.5601
	      name: chrF
	    - type: ter
	      value: 58.0189
	      name: TER
	    - type: bertscore
	      value: 0.9685
	      name: BERTScore F1

	---
	# 📝 myX-TransStyle-S2W: A Transformer-based Style Transfer for Myanmar Spoken (ပြောဟန်) to Written (ရေးဟန်)

	myX-TransStyle-S2W is a sequence-to-sequence (Seq2Seq) model developed by Khant Sint Heinn (Kalix Louis) under DatarrX. It rewrites colloquial spoken Burmese (ပြောဟန်) as formal written Burmese (ရေးဟန်) while aiming to preserve the original meaning.

	## Model Details

	- Developed by: [Khant Sint Heinn (Kalix Louis)](https://huggingface.co/kalixlouiis)
	- Organization: [DatarrX | ဒေတာ-အက်စ်](https://huggingface.co/DatarrX)
	- Model Architecture: Fine-tuned NLLB-200 (600M Distilled) with merged LoRA adapters
	- Language: Burmese (Myanmar)
	- Task: Text Style Transfer (Spoken → Written)
	- License: MIT
	- Trained on: [Myanmar Written-Spoken Parallel Corpus (MWSPC)](https://huggingface.co/datasets/DatarrX/Myanmar-Written-Spoken-Parallel-Corpus)

	---

	## Linguistic Context: The Diglossia Challenge

	Burmese is a diglossic language with a sharp divide between two registers. This distinction matters for Myanmar NLP:

	* Spoken Style (ပြောဟန်): Used in daily conversation, on social media, and in other informal communication. It relies on colloquial grammatical markers such as "တယ်" (sentence-final verb marker) and "ရဲ့" (possessive).
	* Written Style (ရေးဟန်): The standard for news, law, textbooks, and official documents. It uses formal markers such as "သည်", "၏", and "၍".

	Many existing models are trained primarily on formal, web-scraped text, which makes their output sound "robotic" in conversation. myX-TransStyle-S2W helps bridge the two registers by converting natural spoken input into grammatically formal written text.

	---

	## Training Methodology

	The model was adapted from the base checkpoint in three steps: preparing a parallel dataset, fine-tuning with LoRA, and merging the adapters into the base weights.

	### 1. The Dataset ([MWSPC](https://huggingface.co/datasets/DatarrX/Myanmar-Written-Spoken-Parallel-Corpus))
	We used 5,555 unique, high-quality parallel text pairs from the [MWSPC dataset](https://huggingface.co/datasets/DatarrX/Myanmar-Written-Spoken-Parallel-Corpus). Each pair maps an informal (spoken) sentence to its formal (written) counterpart. The corpus was deduplicated and curated for linguistic diversity.

	### 2. Parameter-Efficient Fine-Tuning (PEFT)
	To capture complex structural transformations without losing the base model's knowledge, we used Low-Rank Adaptation (LoRA):
	* Target Modules: `q_proj`, `k_proj`, `v_proj`, `out_proj`.
	* Rank (r): 32; Alpha: 64.
	* Learning Rate: 8e-5 with a cosine scheduler.
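
As a rough illustration of what these settings imply, the sketch below estimates the LoRA scaling factor (using the standard `alpha / r` convention) and the number of trainable adapter parameters. The architecture numbers (hidden size 1024, 12 encoder and 12 decoder layers) and the idea that both self-attention and cross-attention projections are adapted are assumptions about the base checkpoint, not figures stated in this card.

```python
# Rough estimate of LoRA adapter size for the settings listed above.
# ASSUMPTIONS (not stated in the model card): hidden size and layer counts
# of the base checkpoint, and that every attention block is adapted.

RANK = 32          # from the model card
ALPHA = 64         # from the model card
TARGETS = ["q_proj", "k_proj", "v_proj", "out_proj"]  # from the model card

D_MODEL = 1024         # assumed hidden size
ENCODER_LAYERS = 12    # assumed
DECODER_LAYERS = 12    # assumed


def lora_params_per_module(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to one linear layer."""
    return rank * d_in + d_out * rank


def count_adapted_modules() -> int:
    # Encoder layers have one attention block (self-attention); decoder
    # layers have two (self-attention and cross-attention).
    attention_blocks = ENCODER_LAYERS * 1 + DECODER_LAYERS * 2
    return attention_blocks * len(TARGETS)


scaling = ALPHA / RANK
modules = count_adapted_modules()
trainable = modules * lora_params_per_module(D_MODEL, D_MODEL, RANK)

print(f"scaling factor alpha/r = {scaling}")
print(f"adapted linear layers  = {modules}")
print(f"trainable LoRA params  = {trainable:,}")
```

Under these assumptions the adapters amount to a small fraction of the base model's parameters, which is the main appeal of LoRA for a corpus of this size.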

	### 3. Merging Strategy
	After training, the LoRA weights were merged into the base `nllb-200-distilled-600M` model using `merge_and_unload()`. The result is a standalone 2.8 GB model that does not require the PEFT library at inference time.
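
The reason merging is safe can be shown with a toy example. The sketch below uses tiny hand-written matrices in pure Python (all values are hypothetical, and this is not the PEFT implementation) to check that folding the low-rank update into the base weight gives the same output as keeping the adapter separate.

```python
# Toy demonstration of why merging LoRA weights leaves outputs unchanged.
# Pure Python with tiny matrices; this is NOT the PEFT implementation.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


# Hypothetical toy values: a 2x3 base weight, a rank-1 adapter, alpha = 2.
W = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]   # base weight (d_out x d_in)
B = [[0.5], [-1.0]]                      # LoRA B (d_out x r)
A = [[1.0, 0.0, 2.0]]                    # LoRA A (r x d_in)
rank, alpha = 1, 2.0
x = [[1.0], [2.0], [3.0]]                # input column vector

# Adapter kept separate: y = W x + (alpha / r) * B (A x)
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), alpha / rank))

# Adapter merged: W' = W + (alpha / r) * B A, then y = W' x
W_merged = add(W, scale(matmul(B, A), alpha / rank))
y_merged = matmul(W_merged, x)

print(y_adapter)  # [[12.0], [-3.0]]
print(y_merged)   # [[12.0], [-3.0]]
```

Because the merged weight has the same shape as the original, the merged checkpoint loads like any ordinary Seq2Seq model.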

	---

	## Evaluation Results

	The model was evaluated on 100 held-out test sentences using BLEU, chrF, TER, and BERTScore.

	### Performance Metrics
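	| Metric | Score | Interpretation |
	|---|---|---|
	| BERTScore F1 | 0.9685 | High semantic similarity to the reference, suggesting that meaning is largely preserved during style transfer. |
	| chrF | 75.56 | High character-level overlap with the reference, consistent with correct handling of Myanmar suffixes and particles. |
	| BLEU | 12.94 | Low exact n-gram overlap with the single reference; several formal rewrites of the same sentence are often valid. |
	| TER | 58.02 | Edit rate against the reference (lower is better), as reported in the model metadata. |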
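
To make the chrF figure more concrete, here is a simplified character n-gram F-score in pure Python. It is a sketch for intuition only: it works on Unicode code points, uses toy Latin strings, and will not reproduce the reported score or the exact behaviour of any evaluation library.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average n-gram precision/recall, then F-beta (0-100)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the texts is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(simple_chrf("abcdef", "abcdef"))  # identical strings score 100.0
print(simple_chrf("abc", "xyz"))        # no shared characters score 0.0
```

A character-level metric rewards a hypothesis that shares most of its stems and particles with the reference even when word boundaries differ, which is one reason chrF and BLEU can diverge sharply for Burmese.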

	### Qualitative Analysis
	Manual review by native speakers confirms that the model excels at swapping spoken particles (e.g., ...တာပါ။) for formal equivalents (e.g., ...ခြင်းဖြစ်သည်။). Even when the model deviates from the reference text, the outputs remain linguistically acceptable and natural within a formal context.

	---

	## 🔗 Related Models in the DatarrX Ecosystem

	To get the most out of Myanmar Style Transfer, we recommend using these sibling models:

	* [myX-TransStyle-W2S](https://huggingface.co/DatarrX/myX-TransStyle-W2S): The inverse model for converting Written Style to Spoken Style.
	* [myX-StyleClassifier](https://huggingface.co/DatarrX/myX-StyleClassifier): A high-performance classifier to identify whether a sentence is Written or Spoken before applying style transfer.
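
One way to combine the classifier with this model is to rewrite only the sentences that are labelled as spoken. The sketch below shows that routing logic with injected callables; the label name `"spoken"` and the stand-in functions are hypothetical, since this card does not document the classifier's output format.

```python
from typing import Callable


def to_written(
    sentence: str,
    classify: Callable[[str], str],
    spoken_to_written: Callable[[str], str],
) -> str:
    """Rewrite a sentence only if the classifier labels it as spoken."""
    label = classify(sentence)
    if label == "spoken":          # hypothetical label name
        return spoken_to_written(sentence)
    return sentence                # already written style: leave unchanged


# Stand-in callables so the sketch runs without downloading any model.
def fake_classify(sentence: str) -> str:
    return "spoken" if sentence.endswith("!") else "written"


def fake_s2w(sentence: str) -> str:
    return sentence.rstrip("!") + "."


print(to_written("hello!", fake_classify, fake_s2w))  # rewritten
print(to_written("hello.", fake_classify, fake_s2w))  # passed through
```

In practice `classify` and `spoken_to_written` would wrap the two models; skipping sentences that are already formal avoids unnecessary generation calls.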

	---

	## How to Use

	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	# 1. Load the merged model (the PEFT library is not needed)
	model_id = "DatarrX/myX-TransStyle-S2W"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

	# 2. Prepare the input: instruction prefix followed by the spoken sentence
	prefix = "Rewrite Burmese spoken sentence into formal written Burmese: "
	spoken_text = "ပုဂံခေတ်က မြန်မာနိုင်ငံသမိုင်းမှာ ပထမဆုံး အင်ပါယာနိုင်ငံကြီး ဖြစ်ခဲ့တယ်။"
	input_text = prefix + spoken_text

	# 3. Generate the written-style sentence
	inputs = tokenizer(input_text, return_tensors="pt")
	outputs = model.generate(
	    **inputs,
	    # Force Burmese (Myanmar script) as the target language
	    forced_bos_token_id=tokenizer.convert_tokens_to_ids("mya_Mymr"),
	    max_length=160,
	    num_beams=5,
	)

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	# Output: ပုဂံခေတ်သည် မြန်မာနိုင်ငံသမိုင်းတွင် ပထမဆုံး အင်ပါယာနိုင်ငံကြီး ဖြစ်ခဲ့၏။
	```
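
The single-sentence example above extends naturally to batches. The helper below is a minimal sketch that assumes `tokenizer` and `model` have been loaded as shown above; the function names and the default batch size are illustrative choices, not part of the original card.

```python
from typing import Iterator, List

# Instruction prefix taken from the single-sentence example.
PREFIX = "Rewrite Burmese spoken sentence into formal written Burmese: "


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rewrite_batch(sentences: List[str], tokenizer, model, batch_size: int = 8) -> List[str]:
    """Rewrite many spoken sentences using an already loaded tokenizer and model."""
    target_id = tokenizer.convert_tokens_to_ids("mya_Mymr")
    results: List[str] = []
    for chunk in batched(sentences, batch_size):
        inputs = tokenizer(
            [PREFIX + s for s in chunk],
            return_tensors="pt",
            padding=True,      # pad to the longest sentence in the chunk
        )
        outputs = model.generate(
            **inputs,
            forced_bos_token_id=target_id,
            max_length=160,
            num_beams=5,
        )
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

Calling `rewrite_batch(list_of_sentences, tokenizer, model)` returns the rewritten sentences in input order. Smaller batch sizes reduce memory use at the cost of throughput.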

	---

	## Intended Use & Limitations

	### Use Cases
	- Formalizing Content: Converting interview transcripts or casual notes into professional reports.
	- Data Normalization: Cleaning social media text for downstream NLP tasks.
	- Educational Tools: Helping students learn the differences between Myanmar registers.

	### Limitations
	- Hybrid Ambiguity: In cases where a sentence structure is valid in both registers, the model may output minimal changes.
	- Domain Specificity: Performance is optimized for standard Yangon/Mandalay dialects and may vary with heavy regional slang.

	## Citation

	### BibTeX
	```bibtex
	@misc{myx_transstyle_s2w_2026,
	  author       = {Khant Sint Heinn (Kalix Louis)},
	  title        = {myX-TransStyle-S2W: A Spoken to Written Burmese Style Transfer Model},
	  year         = {2026},
	  publisher    = {Hugging Face},
	  organization = {DatarrX},
	  howpublished = {https://huggingface.co/DatarrX/myX-TransStyle-S2W}
	}
	```
	---

	## About the Author

	Khant Sint Heinn, working under the name Kalix Louis, is a Machine Learning Engineer focused on Natural Language Processing (NLP), data foundations, and open-source AI development. His work is centered on improving support for the Burmese (Myanmar) language in modern AI systems by building high-quality datasets, practical tools, and scalable infrastructure for language technology.

	He is currently the Lead Developer at DatarrX, where he develops data pipelines, manages large-scale data collection workflows, and helps create open-source resources for researchers, developers, and organizations. His experience includes data engineering, web scripting, dataset curation, and building systems that support real-world machine learning applications.

	Khant Sint Heinn is especially interested in advancing low-resource languages and making AI more accessible to underrepresented communities. Through his open-source contributions, he works to strengthen the Burmese (Myanmar) tech ecosystem and provide reliable building blocks for future language models, search systems, and intelligent applications.

	His goal is simple: to turn limited language resources into practical opportunities through clean data, useful tools, and community-driven innovation.

	Connect with the Author:
	[GitHub](https://github.com/kalixlouiis) | [Hugging Face](https://huggingface.co/kalixlouiis) | [Kaggle](https://www.kaggle.com/organizations/kalixlouiis)

	---
	Developed with ❤️ by [DatarrX](https://huggingface.co/DatarrX) to empower the Myanmar AI ecosystem.