cnababaie
/

tuti

text-generation-inference

Model card Files Files and versions

tuti / README.md

cnababaie's picture

Update README.md

e4d6b5a verified 11 months ago

|

history blame contribute delete

3.44 kB

	---
	base_model:
	- unsloth/gemma-2-9b-bnb-4bit
	- google/gemma-2-9b
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- gemma2
	- trl
	license: gemma
	language:
	- fa
	- en
	---


	# Tuti 🦜

	This is a [Gemma 2 9b](https://huggingface.co/google/gemma-2-9b), fined tuned using Unsloth's 4-bit quantization and LORA (QLORA), on Persian literature datasets I curated/created or found.

	## Use cases and datasets

	### Word IPA Detection

	I have fined tuned this model with QLORA and only uploaded the LORA adapter, so it could be used like this:

	```python
	# pip install unsloth
	from unsloth import FastLanguageModel
	from transformers import TextStreamer

	model_name = "cnababaie/tuti"
	max_seq_length = 4096 # Adjust as needed
	dtype = None
	load_in_4bit = True

	model, tokenizer = FastLanguageModel.from_pretrained(
	model_name=model_name,
	max_seq_length=max_seq_length,
	dtype=dtype,
	load_in_4bit=load_in_4bit,
	)
	FastLanguageModel.for_inference(model)
	alpaca_prompt_template = """### Instruction:
	{}

	### Input:
	{}

	### Response:
	{}"""
	```

	```python
	inputs = tokenizer(
	[
	alpaca_prompt_template.format(
	"IPA این کلمه چیست؟", # instruction
	"جوینده",
	"", # output - leave this blank for generation!
	)
	], return_tensors = "pt").to("cuda")

	text_streamer = TextStreamer(tokenizer)
	_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
	```

	This will correctly output IPA as "/d͡ʒuːjænde/ (juyande)".

	#### IPA Sources

	- [IPA-dict](https://github.com/open-dict-data/ipa-dict/tree/master): Monolingual wordlists with pronunciation information in IPA
	- [Wiktionary](https://en.wiktionary.org): The Persian corpus don't contain IPA but the English one(which contains many words and phrases in other than English) are a lot of Persian words with their IPA

	### Persian Text Romanization

	```python
	inputs = tokenizer(
	[
	alpaca_prompt_template.format(
	"این متن چه تلفظی داره؟", # instruction
	"خاک به خاطر بارش زیاد باران گل شد.",
	"", # output - leave this blank for generation!
	)
	], return_tensors = "pt").to("cuda")

	text_streamer = TextStreamer(tokenizer)
	_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
	```

	This will output exact pronunciation as "Xāk be xāter-e bāreš-e ziyād-e bārān gel šod.".

	#### Romanization Sources

	- [http://alefbaye2om.org/](http://alefbaye2om.org/): Contain PDFs with Persian Romanized text


	### Persian Poem Translation

	```python
	inputs = tokenizer(
	[
	alpaca_prompt_template.format(
	"ترجمه", # instruction
	"برخیز بتا بیا ز بهر دل ما\r\nحل کن به جمال خویشتن مشکل ما\r\nیک کوزه شراب تا به هم نوش کن\r\nزآن پیش که کوزه‌ها کنند از گل ما",
	"", # output - leave this blank for generation!
	)
	], return_tensors = "pt").to("cuda")

	text_streamer = TextStreamer(tokenizer)
	_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
	```

	This will output rhymed poetry with the original poem content:

	*"Arise, O idol, for our heart's sake,
	Solve our troubles with your beauty's make.
	One pot of wine, let's drink it all,
	Before they make pots from our clay's fall."*.

	#### Poem Translation Sources

	- Created list of random poems from Ganjoor and translation text pair