---
language: sr
metrics:
- accuracy
- precision
- recall
- f1
base_model: microsoft/deberta-v3-large
---

# srbNLI: Serbian Natural Language Inference Model

## Model Overview
srbNLI is a fine-tuned Natural Language Inference (NLI) model for Serbian, created by adapting the SciFact dataset through automatic translation. Built on a state-of-the-art transformer architecture, it is trained to recognize relationships between claims and evidence in Serbian text, with applications in scientific claim verification and potential expansion to broader claim-verification tasks.

## Key Details
- **Model Type**: Transformer-based
- **Language**: Serbian
- **Task**: Natural Language Inference (NLI), Textual Entailment, Claim Verification
- **Dataset**: srbSciFact (automatically translated SciFact dataset)
- **Fine-tuning**: Fine-tuned on Serbian NLI data (support, contradiction, and neutral categories)
- **Metrics**: Accuracy, Precision, Recall, F1-score
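
A minimal inference sketch with Hugging Face `transformers` is shown below. The label order in `ID2LABEL` is an assumption for illustration; the actual mapping should be read from the released checkpoint's `config.json` (`id2label`), and the model id passed to `classify` is a placeholder.

```python
# Minimal inference sketch. The label order below is assumed for illustration;
# read the real mapping from the released model's config (id2label).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ID2LABEL = {0: "SUPPORT", 1: "NEUTRAL", 2: "CONTRADICTION"}  # assumed order


def label_from_logits(logits: torch.Tensor, id2label: dict = ID2LABEL) -> str:
    """Map a (1, num_labels) logits tensor to its highest-scoring label."""
    return id2label[int(logits.argmax(dim=-1).item())]


def classify(claim: str, evidence: str, model_id: str) -> str:
    """Score a Serbian claim/evidence pair with a fine-tuned NLI checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    # Claim and evidence are encoded as a sentence pair, as in standard NLI.
    inputs = tokenizer(claim, evidence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return label_from_logits(model(**inputs).logits, model.config.id2label)
```

Calling `classify(claim, evidence, model_id)` returns one of the three relation labels; for batched inputs the same `argmax` applies per row.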

## Motivation
This model addresses the lack of NLI datasets and models for Serbian, a low-resource language. It provides a tool for textual entailment and claim verification, especially for scientific claims, with broader potential for misinformation detection and automated fact-checking.

## Training
- **Base Model**: DeBERTa-v3-large
- **Training Data**: Automatically translated SciFact dataset
- **Fine-tuning**: Conducted on a single NVIDIA DGX A100 GPU (40 GB)
- **Hyperparameters**: Optimized learning rate, batch size, weight decay, number of epochs, and early stopping
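
The setup above can be sketched with the `transformers` `Trainer` API. This is a configuration sketch, not the reported training recipe: every hyperparameter value, the output directory, and the early-stopping patience are illustrative assumptions.

```python
# Hedged fine-tuning sketch; all hyperparameter values are illustrative,
# not the configuration actually used to train srbNLI.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)


def build_trainer(train_ds, eval_ds, model_id="microsoft/deberta-v3-large"):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=3  # support / contradiction / neutral
    )
    args = TrainingArguments(
        output_dir="srbnli-deberta",     # placeholder path
        learning_rate=1e-5,              # illustrative
        per_device_train_batch_size=8,   # illustrative
        weight_decay=0.01,               # illustrative
        num_train_epochs=10,             # illustrative upper bound
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,     # required for early stopping
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        tokenizer=tokenizer,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
    )
```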

## Evaluation
The model was evaluated using standard NLI metrics (accuracy, precision, recall, F1-score) and compared against GPT-4o to assess generalization capabilities.
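
As an illustration of how these metrics are computed, the snippet below scores dummy label sequences with scikit-learn; the data and the macro averaging are assumptions for demonstration, not the paper's evaluation protocol.

```python
# Dummy predictions purely for illustration; macro averaging is an assumption.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["support", "neutral", "contradiction", "support", "neutral"]
y_pred = ["support", "neutral", "support", "support", "contradiction"]

acc = accuracy_score(y_true, y_pred)  # fraction of exact label matches
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0  # macro over the 3 labels
)
print(f"Acc={acc:.2f}  P={prec:.2f}  R={rec:.2f}  F1={f1:.2f}")
```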

## Use Cases
- **Claim Verification**: Scientific and general-domain claims in Serbian
- **Misinformation Detection**: Identifying contradiction or support between claims and evidence
- **Cross-lingual Applications**: Potential for cross-lingual claim verification with multilingual models

## Future Work
- Improving accuracy with human-corrected translations and Serbian-specific datasets
- Expanding to general-domain claim verification
- Enhancing multilingual NLI capabilities

## Results Comparison

The table below compares the fine-tuned models (DeBERTa-v3-large, RoBERTa-large, BERTić, and others) and GPT-4o on the srbSciFact dataset across the key metrics: Accuracy (Acc), Precision (P), Recall (R), and F1-score (F1). The models were evaluated on their ability to classify relationships between claims and evidence in Serbian text.

| Model                 | Accuracy | Precision (P) | Recall (R) | F1-score (F1) |
|-----------------------|----------|---------------|------------|---------------|
| **DeBERTa-v3-large**  | 0.70     | 0.86          | 0.82       | 0.84          |
| **RoBERTa-large**     | 0.57     | 0.63          | 0.76       | 0.69          |
| **BERTić (Serbian)**  | 0.56     | 0.56          | 0.37       | 0.44          |
| **GPT-4o (English)**  | 0.66     | 0.70          | 0.77       | 0.78          |
| **mDeBERTa-base**     | 0.63     | 0.92          | 0.75       | 0.83          |
| **XLM-RoBERTa-large** | 0.64     | 0.89          | 0.77       | 0.83          |
| **mBERT-cased**       | 0.48     | 0.76          | 0.50       | 0.60          |
| **mBERT-uncased**     | 0.57     | 0.45          | 0.61       | 0.52          |

### Observations
- **DeBERTa-v3-large** performed best overall, with an accuracy of 0.70 and an F1-score of 0.84.
- **RoBERTa-large** and **BERTić** showed weaker performance, especially in recall, suggesting difficulty with complex linguistic inference in Serbian.
- **GPT-4o** outperformed all fine-tuned models in F1-score when prompted in English, but **DeBERTa-v3-large** slightly outperformed GPT-4o when the prompt was in Serbian.
- **mDeBERTa-base** and **XLM-RoBERTa-large** exhibited strong cross-lingual performance, both reaching an F1-score of 0.83.

These results demonstrate the potential of adapting advanced transformer models to Serbian, while highlighting areas for future improvement, such as refining translations and expanding domain-specific data.

---