--- base_model: - unsloth/gemma-2-9b-bnb-4bit - google/gemma-2-9b tags: - text-generation-inference - transformers - unsloth - gemma2 - trl license: gemma language: - fa - en --- # Tuti 🦜 This is a [Gemma 2 9b](https://huggingface.co/google/gemma-2-9b), fined tuned using Unsloth's 4-bit quantization and LORA (QLORA), on Persian literature datasets I curated/created or found. ## Use cases and datasets ### Word IPA Detection I have fined tuned this model with QLORA and only uploaded the LORA adapter, so it could be used like this: ```python # pip install unsloth from unsloth import FastLanguageModel from transformers import TextStreamer model_name = "cnababaie/tuti" max_seq_length = 4096 # Adjust as needed dtype = None load_in_4bit = True model, tokenizer = FastLanguageModel.from_pretrained( model_name=model_name, max_seq_length=max_seq_length, dtype=dtype, load_in_4bit=load_in_4bit, ) FastLanguageModel.for_inference(model) alpaca_prompt_template = """### Instruction: {} ### Input: {} ### Response: {}""" ``` ```python inputs = tokenizer( [ alpaca_prompt_template.format( "IPA این کلمه چیست؟", # instruction "جوینده", "", # output - leave this blank for generation! ) ], return_tensors = "pt").to("cuda") text_streamer = TextStreamer(tokenizer) _ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64) ``` This will correctly output IPA as *"/d͡ʒuːjænde/ (*juyande*)"*. #### IPA Sources - [IPA-dict](https://github.com/open-dict-data/ipa-dict/tree/master): Monolingual wordlists with pronunciation information in IPA - [Wiktionary](https://en.wiktionary.org): The Persian corpus don't contain IPA but the English one(which contains many words and phrases in other than English) are a lot of Persian words with their IPA ### Persian Text Romanization ```python inputs = tokenizer( [ alpaca_prompt_template.format( "این متن چه تلفظی داره؟", # instruction "خاک به خاطر بارش زیاد باران گل شد.", "", # output - leave this blank for generation! ) ], return_tensors = "pt").to("cuda") text_streamer = TextStreamer(tokenizer) _ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64) ``` This will output exact pronunciation as *"Xāk be xāter-e bāreš-e ziyād-e bārān gel šod."*. #### Romanization Sources - [http://alefbaye2om.org/](http://alefbaye2om.org/): Contain PDFs with Persian Romanized text ### Persian Poem Translation ```python inputs = tokenizer( [ alpaca_prompt_template.format( "ترجمه", # instruction "برخیز بتا بیا ز بهر دل ما\r\nحل کن به جمال خویشتن مشکل ما\r\nیک کوزه شراب تا به هم نوش کن\r\nزآن پیش که کوزه‌ها کنند از گل ما", "", # output - leave this blank for generation! ) ], return_tensors = "pt").to("cuda") text_streamer = TextStreamer(tokenizer) _ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64) ``` This will output rhymed poetry with the original poem content: *"Arise, O idol, for our heart's sake, Solve our troubles with your beauty's make. One pot of wine, let's drink it all, Before they make pots from our clay's fall."*. #### Poem Translation Sources - Created list of random poems from Ganjoor and translation text pair