Model Card for AI-Manith/manith-gemma-sinhala-gpt

This model is a fine-tuned version of the google/gemma-2b model for English-to-Sinhala translation.

Model Details

Model Description

This model is a fine-tuned version of the google/gemma-2b model using the Programmer-RD-AI/sinhala-english-singlish-translation dataset from Hugging Face. It was fine-tuned using PEFT and QLoRA for efficient training on a single GPU.
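The exact training hyperparameters are not published here. As a rough illustration only, a QLoRA setup with the peft and bitsandbytes libraries typically looks like the following; the rank, alpha, dropout, and target modules below are assumptions, not the values used for this model:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization (the "Q" in QLoRA) -- values are illustrative.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter configuration -- rank/alpha/targets are assumed, not confirmed.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

The quantization config is passed to `AutoModelForCausalLM.from_pretrained` and the LoRA config to `get_peft_model` before training; only the small adapter weights are updated, which is what makes single-GPU fine-tuning feasible.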

  • Developed by: Manith Jayaba
  • Model type: Causal Language Model (Fine-tuned for Translation)
  • Language(s) (NLP): English to Sinhala
  • License: Apache 2.0 (inherited from Gemma)
  • Finetuned from model [optional]: google/gemma-2b

Model Sources [optional]

  • Repository: [More Information Needed]
  • Paper [optional]: N/A
  • Demo [optional]: [More Information Needed]

Uses

Direct Use

This model can be used for translating English text to Sinhala text.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Define the model ID on the Hugging Face Hub
model_id = "google/gemma-2b"
peft_model_id = "AI-Manith/manith-gemma-sinhala-gpt"

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the LoRA adapters and merge them with the base model
model = PeftModel.from_pretrained(base_model, peft_model_id)
model = model.merge_and_unload() # Merge LoRA layers and unload the adapter

# Ensure the model is in evaluation mode
model.eval()

# Define the translation function
def translate_from_hub(english_text):
    """This function takes an English sentence and returns the Sinhala translation using the model from the Hub."""
    instruction = "Translate the following English text to Sinhala."
    prompt_text = f"""### INSTRUCTION:
{instruction}

### INPUT:
{english_text}

### RESPONSE:
"""

    # Tokenize the input
    inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)

    # Generate the response
    with torch.no_grad(): # Disable gradient calculation for inference
        outputs = model.generate(**inputs, max_new_tokens=100)

    # Decode the output and extract just the response part
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
    response_part = decoded_output.split("### RESPONSE:")[-1].strip()

    return response_part

# --- Test Cases --- #
print("\n--- Testing the Translator from Hub ---")

test_sentence_1 = "How are you doing today?"
translation_1 = translate_from_hub(test_sentence_1)
print(f"English: {test_sentence_1}")
print(f"Sinhala: {translation_1}")

print("---")

test_sentence_2 = "Can you translate this sentence?"
translation_2 = translate_from_hub(test_sentence_2)
print(f"English: {test_sentence_2}")
print(f"Sinhala: {translation_2}")

Downstream Use [optional]

This model can be used as a component in larger applications requiring English-to-Sinhala translation.
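When embedding the model in a larger pipeline, it helps to factor the prompt construction and response extraction into small reusable helpers. The sketch below mirrors the instruction format used in the example above; the helper names themselves are illustrative, not part of the model's API:

```python
INSTRUCTION = "Translate the following English text to Sinhala."

def build_prompt(english_text: str) -> str:
    """Build the instruction-style prompt the model was fine-tuned on."""
    return (
        f"### INSTRUCTION:\n{INSTRUCTION}\n\n"
        f"### INPUT:\n{english_text}\n\n"
        "### RESPONSE:\n"
    )

def extract_response(decoded_output: str) -> str:
    """Keep only the text after the RESPONSE marker (generate() echoes the prompt)."""
    return decoded_output.split("### RESPONSE:")[-1].strip()
```

Keeping the template in one place avoids drift between the format used at inference time and the one the model saw during fine-tuning.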

Out-of-Scope Use

This model is not intended for:

  • Translating from Sinhala to English.
  • Translating between other language pairs.
  • Generating text in languages other than Sinhala based on English input.

Bias, Risks, and Limitations

  • The model's performance is dependent on the quality and coverage of the training data. It may not perform well on informal language, slang, or highly technical text not present in the dataset.
  • As with any translation model, there is a risk of perpetuating biases present in the training data.
  • The model may produce inaccurate or nonsensical translations for certain inputs.

Recommendations

Users should be aware of the model's limitations and evaluate the quality of the translations for their specific use case. It is recommended to use the model for its intended purpose (English to Sinhala translation) and to be mindful of potential biases or inaccuracies.
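For a quick, dependency-free sanity check against a handful of reference translations, a whitespace-token F1 score can flag obviously broken outputs. Proper MT metrics such as BLEU or chrF (e.g. via the sacrebleu package) are better suited for real evaluation; this helper is only an illustration:

```python
from collections import Counter

def token_f1(reference: str, hypothesis: str) -> float:
    """Token-level F1 between a reference translation and a model output."""
    ref = Counter(reference.split())
    hyp = Counter(hypothesis.split())
    overlap = sum((ref & hyp).values())  # shared tokens, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A score near 0 on sentences the model should handle well is a signal to inspect the prompt format or generation settings before deploying.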
