	---
	license: apache-2.0
	language:
	- bn
	metrics:
	- wer
	- cer
	tags:
	- seq2seq
	- ipa
	- bengali
	- byt5
	widget:
	- text: <Narail> আমি সে বাবুর মামু বাড়ি গিছিলাম।
	  example_title: Narail Text
	- text: <Rangpur> এখন এই কুলো তার শেষ অই কুলো তার শেষ।
	  example_title: Rangpur Text
	- text: <Chittagong> খয়দে সিআরের এইল্লা কি অবস্থা!
	  example_title: Chittagong Text
	- text: <Kishoreganj> আটাইশ করছিলাম দের কানি ক্ষেত, ইবার মাইর কাইছি।
	  example_title: Kishoreganj Text
	- text: <Narsingdi> তারা তো ওই খারাপ খেইলাই আসে না।
	  example_title: Narsingdi Text
	- text: <Tangail> আর সব থেকে ফানি কথা হইতেছে দেখ?
	  example_title: Tangail Text
	---


	# Regional Bengali text to IPA transcription - ByT5-small


	This is a fine-tuned version of [google/byt5-small](https://huggingface.co/google/byt5-small) for generating IPA transcriptions from regional Bengali text.
	It was fine-tuned on the dataset of the Bengali.AI competition [“ভাষামূল: মুখের ভাষার খোঁজে”](https://www.kaggle.com/competitions/regipa/overview) (roughly, "Bhashamul: In Search of the Spoken Language").

	Model performance:
	- Word error rate (WER): 0.0124279344454407
	- Character error rate (CER): 0.00427635805681347
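The card does not state how these rates were computed (for example, corpus-level or averaged per sentence). Assuming the standard definitions, where WER is the word-level Levenshtein distance between prediction and reference divided by the number of reference words, and CER is the same ratio at the character level, a per-sentence sketch looks like this (`edit_distance`, `wer` and `cer` are illustrative names, not part of this repository):

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance where substitution, insertion and deletion each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; the reference must contain at least one word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; the reference must be non-empty."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For IPA output this would be called as `cer(reference_ipa, predicted_ipa)`; for real evaluation, a tested library such as `jiwer` (which provides `wer` and `cer`) is a safer choice.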


	Supported district tokens (each is written in angle brackets at the start of the input, e.g. `<Narail>`):
	- Kishoreganj
	- Narail
	- Narsingdi
	- Chittagong
	- Rangpur
	- Tangail
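Every widget example in the metadata puts the district tag in angle brackets, followed by a single space and the text. A minimal sketch of building inputs that way, with a check against the district list above; `format_input` and `SUPPORTED_DISTRICTS` are hypothetical names, not part of the model repository:

```python
# Districts listed as supported in this model card
SUPPORTED_DISTRICTS = {"Kishoreganj", "Narail", "Narsingdi", "Chittagong", "Rangpur", "Tangail"}


def format_input(district: str, text: str) -> str:
    """Build a model input of the form '<District> text' (hypothetical helper)."""
    if district not in SUPPORTED_DISTRICTS:
        raise ValueError(f"Unsupported district: {district!r}")
    # One space separates the tag from the (whitespace-trimmed) sentence
    return f"<{district}> {text.strip()}"
```

For instance, `format_input("Rangpur", sentence)` produces the same shape as the Rangpur widget example. Rejecting unknown districts early avoids silently feeding the model a tag it was not described as supporting.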

	---

	## Loading & using the model
	```python
	# Load model directly
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	tokenizer = AutoTokenizer.from_pretrained("teamapocalypseml/regben2ipa-byt5small")
	model = AutoModelForSeq2SeqLM.from_pretrained("teamapocalypseml/regben2ipa-byt5small")

	"""
	The format of the input text MUST BE: <district> <bengali_text>
	"""
	text = "<district> bengali_text_here"
	text_ids = tokenizer(text, return_tensors='pt').input_ids
	model(text_ids)
	```
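ByT5 has no subword vocabulary: its tokenizer maps each UTF-8 byte to one id. Assuming this repository ships the stock ByT5 tokenizer without added special tokens (the card does not say either way), a district tag such as `<Narail>` is encoded as ordinary bytes, and every Bengali code point costs three bytes. The pure-Python sketch below mirrors that mapping to help budget `max_length`; `byt5_ids` is a hypothetical helper, not part of `transformers`:

```python
def byt5_ids(text: str) -> list[int]:
    """Mimic the stock ByT5 tokenizer: one id per UTF-8 byte, plus </s>.

    Hypothetical helper for length budgeting; ids 0-2 are reserved for
    <pad>, </s> and <unk>, so byte value b maps to id b + 3.
    """
    return [b + 3 for b in text.encode("utf-8")] + [1]  # 1 is the </s> id


text = "<Narail> আমি সে বাবুর মামু বাড়ি গিছিলাম।"
ids = byt5_ids(text)
# Bengali code points take 3 UTF-8 bytes each, so len(ids) is much larger than len(text)
print(len(text), len(ids))
```

In practice, `len(tokenizer(text).input_ids)` gives the authoritative count; the sketch only explains why long Bengali sentences reach a given `max_length` sooner than their character count suggests.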


	## Using the pipeline
	```python
	# Use a pipeline as a high-level helper
	import torch
	from transformers import pipeline

	device = "cuda" if torch.cuda.is_available() else "cpu"

	pipe = pipeline("text2text-generation", model="teamapocalypseml/regben2ipa-byt5small", device=device)


	"""
	`texts` must be in the format of: <district> <contents>
	"""
	outputs = pipe(texts, max_length=1024, batch_size=batch_size)
	```
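When transcribing many sentences, one optional refinement (not described in this card) is to sort inputs by length before batching, so that each batch carries less padding; this matters more for byte-level models like ByT5, whose sequences are long. A minimal sketch, assuming `pipe` is the `text2text-generation` pipeline from the block above (any callable returning one `{"generated_text": ...}` dict per input works); `transcribe_sorted` is a hypothetical helper:

```python
from typing import Callable, Sequence


def transcribe_sorted(pipe: Callable, texts: Sequence[str], batch_size: int = 8, **gen_kwargs) -> list[str]:
    """Run `pipe` on texts sorted by UTF-8 byte length, then restore input order."""
    # Input indices ordered by byte length, i.e. roughly by ByT5 sequence length
    order = sorted(range(len(texts)), key=lambda i: len(texts[i].encode("utf-8")))
    sorted_texts = [texts[i] for i in order]
    outputs = pipe(sorted_texts, batch_size=batch_size, **gen_kwargs)
    # Put each generated string back at its original position
    results = [""] * len(texts)
    for idx, out in zip(order, outputs):
        results[idx] = out["generated_text"]
    return results
```

Usage: `ipa = transcribe_sorted(pipe, texts, batch_size=8, max_length=1024)`. The returned list lines up with `texts`, so callers never see the internal reordering.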

	## Credits
	Developed by [S M Jishanul Islam](https://github.com/S-M-J-I), [Sadia Ahmmed](https://huggingface.co/sadiaahmmed), [Sahid Hossain Mustakim](https://huggingface.co/rhsm15)