---
library_name: transformers
datasets:
- culturalheritagenus/rumi-correction-v1.1-data-v3
language:
- en
- ms
metrics:
- bleu
base_model:
- aisingapore/Gemma-SEA-LION-v3-9B-IT
---

# Model Card for culturalheritagenus/rumi-correction-v1.1

## Model Details

### Model Description

This model was fine-tuned with QLoRA, using low-rank adapters with `r = lora_alpha = 4`.

- **Developed by:** hyhyhyhyyhyh
- **Model type:** Gemma 2 9B
- **Language(s) (NLP):** Malay, English
- **License:** [More Information Needed]
- **Finetuned from model:** aisingapore/Gemma-SEA-LION-v3-9B-IT

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## How to Get Started with the Model

Use the code below to get started with the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

trained_model = AutoModelForCausalLM.from_pretrained(
    "culturalheritagenus/rumi-correction-v1.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
trained_tokenizer = AutoTokenizer.from_pretrained("culturalheritagenus/rumi-correction-v1.1")
```

To perform inference:

```python
messages = [
    {
        "role": "user",
        "content": (
            "You are a Malay language spelling corrector. I will give you some text "
            "written in messy Rumi (shortened or mistyped). Rewrite it in correct "
            "Malay Rumi spelling.\naurng ank. yngdim dimn anm aurngdan"
        ),
    },
]

inputs = trained_tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # must be set for generation
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(trained_tokenizer)
_ = trained_model.generate(
    input_ids=inputs,
    streamer=text_streamer,
    max_new_tokens=128,
    use_cache=True,
)
```

## Training Details

### Training Data

The model was trained on [culturalheritagenus/rumi-correction-v1.1-data-v3](https://huggingface.co/datasets/culturalheritagenus/rumi-correction-v1.1-data-v3).

### Training Procedure

To replicate this model, please refer to the provided training script and the notes below.
Ensure that the language and library versions you use match those listed under Technical Specifications.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 1x GH200 (96 GB)
- **Hours used:** ~12
- **Cloud Provider:** Lambda
- **Compute Region:** US-East (Lambda Labs)

## Technical Specifications

### Software

- Python version: 3.10.12
- CUDA version: 12.8
- Torch version: 2.7.1+cu128

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Model Card Authors [optional]

hyhyhyhyyhyh