nur-dev/roberta-large-kazqad

Question Answering


nur-dev committed on Aug 26, 2024

Commit 86eb4dc · verified · Parent: 301ed00

Update README.md

Files changed (1)

README.md +72 -3

README.md CHANGED

@@ -1,3 +1,72 @@
----
-license: afl-3.0
----

+---
+license: afl-3.0
+datasets:
+- issai/kazqad
+language:
+- kk
+library_name: transformers
+pipeline_tag: question-answering
+---
+# RoBERTa-Large-KazQAD for Question Answering
+## Model Description
+nur-dev/roberta-large-kazqad is RoBERTa-Kaz-Large fine-tuned for extractive question answering (QA) on the Kazakh Open-Domain Question Answering Dataset (KazQAD). Given a question and a context passage in Kazakh, the model predicts the start and end tokens of the answer span within the context.
+## Usage
+The model can be used with the Hugging Face Transformers library:
+```python
+from transformers import AutoModelForQuestionAnswering, AutoTokenizer
+import torch
+# Load the fine-tuned model and tokenizer
+repo_id = 'nur-dev/roberta-large-kazqad'
+model = AutoModelForQuestionAnswering.from_pretrained(repo_id)
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
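+# Define a Kazakh context and question. English gloss: the context says Almaty is Kazakhstan's
+# largest city, located in the southeast of the country; the question asks where in Kazakhstan Almaty is.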
+context = """
+Алматы Қазақстанның ең ірі мегаполисі. Алматы – асқақ Тянь-Шань тауы жотасының көкжасыл бауырайынан,
+Іле Алатауының бөктерінде, Қазақстан Республикасының оңтүстік-шығысында, Еуразия құрлығының орталығында орналасқан қала.
+Бұл қаланы «қала-бақ» деп те атайды.
+"""
+question = "Алматы қаласы Қазақстанның қай бөлігінде орналасқан?"
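+# Tokenize the question-context pair; if it exceeds 512 tokens, truncate only the context
+inputs = tokenizer(
+    question,
+    context,
+    truncation="only_second",
+    max_length=512,
+    return_tensors="pt"
+)
+input_ids = inputs["input_ids"]
+attention_mask = inputs["attention_mask"]
+# Perform inference without tracking gradients
+with torch.no_grad():
+    outputs = model(input_ids=input_ids, attention_mask=attention_mask)
+# Logits for the first (and only) sequence in the batch
+start_logits = outputs.start_logits[0]
+end_logits = outputs.end_logits[0]
+# Choose the most likely start token, then the most likely end token at or after it,
+# so the predicted span can never be reversed or empty
+start_index = int(torch.argmax(start_logits))
+end_index = start_index + int(torch.argmax(end_logits[start_index:]))
+# Decode the answer span, dropping any special tokens
+answer = tokenizer.decode(input_ids[0][start_index:end_index + 1], skip_special_tokens=True).strip()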
+print(f"Question: {question}")
+print(f"Answer: {answer}")
+```
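+The usage example picks the start and end tokens independently, so the top-scoring tokens can fall in the question or on special tokens. A common refinement, sketched here in plain Python as an illustration rather than as this model's own decoding code, scores every candidate span by `start_logit + end_logit` and keeps only spans that start and end inside the context, end at or after their start, and are at most `max_answer_len` tokens long. `best_span`, `context_mask`, and `max_answer_len` are hypothetical names introduced for this sketch, and the default of 30 tokens is an arbitrary illustrative choice.
+```python
+from typing import List, Optional, Tuple
+def best_span(
+    start_logits: List[float],
+    end_logits: List[float],
+    context_mask: List[bool],
+    max_answer_len: int = 30,
+) -> Optional[Tuple[int, int]]:
+    """Return inclusive (start, end) token indices of the highest-scoring valid span,
+    or None if the mask marks no context tokens."""
+    best: Optional[Tuple[int, int]] = None
+    best_score = float("-inf")
+    n = len(start_logits)
+    for start in range(n):
+        if not context_mask[start]:
+            continue  # spans must start inside the context
+        # Consider only ends at or after the start, within the length limit
+        for end in range(start, min(n, start + max_answer_len)):
+            if not context_mask[end]:
+                continue  # spans must also end inside the context
+            score = start_logits[start] + end_logits[end]
+            if score > best_score:
+                best_score = score
+                best = (start, end)
+    return best
+```
+With the model above, and assuming a fast tokenizer (which provides `sequence_ids`), the arguments could be `outputs.start_logits[0].tolist()`, `outputs.end_logits[0].tolist()`, and `[sid == 1 for sid in inputs.sequence_ids(0)]`, which marks the context tokens (sequence 1) and excludes the question and special tokens. Because the context occupies one contiguous block of tokens, checking both endpoints is enough, and the double loop costs O(n · max_answer_len), which is negligible for 512-token inputs.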
+## Limitations and Biases
+- **Language Specificity:** The model is fine-tuned specifically for Kazakh and may perform poorly on other languages.
+- **Context Length:** Each input (question and context together, including special tokens) is limited to 512 tokens. Longer contexts must be truncated or split into overlapping windows, and an answer that lies outside the text the model sees cannot be extracted.
+- **Biases:** Like other large pre-trained language models, nur-dev/roberta-large-kazqad may reproduce biases present in its training data. Evaluate its outputs critically, especially in sensitive applications.
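+One way to work around the 512-token limit is to split a long context into overlapping windows, run the model on each window paired with the question, and keep the best-scoring span across windows. The plain-Python sketch below shows only the window arithmetic over context token positions; `context_windows`, `window_len`, and `overlap` are hypothetical names, and in practice `window_len` must leave room for the question and the special tokens.
+```python
+from typing import List, Tuple
+def context_windows(n_tokens: int, window_len: int, overlap: int) -> List[Tuple[int, int]]:
+    """Split n_tokens context positions into [start, end) windows of at most
+    window_len tokens; consecutive windows share `overlap` tokens."""
+    if window_len <= 0 or not 0 <= overlap < window_len:
+        raise ValueError("need window_len > 0 and 0 <= overlap < window_len")
+    windows: List[Tuple[int, int]] = []
+    start = 0
+    while True:
+        end = min(start + window_len, n_tokens)
+        windows.append((start, end))
+        if end == n_tokens:
+            break
+        # Advance so the next window repeats the last `overlap` tokens of this one
+        start = end - overlap
+    return windows
+```
+For example, `context_windows(1000, 400, 100)` yields `[(0, 400), (300, 700), (600, 1000)]`. Transformers fast tokenizers can produce such windows directly, e.g. `tokenizer(question, context, truncation="only_second", max_length=512, stride=128, return_overflowing_tokens=True)`, where `stride` is the number of overlapping tokens between consecutive windows; these values are illustrative, not settings documented for this model.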
+## Model Authors
+- **Name:** Kadyrbek Nurgali
+- **Email:** nurgaliqadyrbek@gmail.com
+- **LinkedIn:** [Kadyrbek Nurgali](https://www.linkedin.com/in/nurgali-kadyrbek-504260231/)