---
language: sr
metrics:
  - accuracy
  - precision
  - recall
  - f1
base_model: microsoft/deberta-v3-large
---
# srbNLI: Serbian Natural Language Inference Model

## Model Overview
srbNLI is a fine-tuned Natural Language Inference (NLI) model for Serbian, created by automatically translating the SciFact dataset into Serbian. The model is built on DeBERTa-v3-large and is trained to recognize relationships between claims and evidence in Serbian text, with applications in scientific claim verification and potential expansion to broader claim-verification tasks.

## Key Details
- **Model Type**: Transformer-based
- **Language**: Serbian
- **Task**: Natural Language Inference (NLI), Textual Entailment, Claim Verification
- **Dataset**: srbSciFact (automatically translated SciFact dataset)
- **Fine-tuning**: Fine-tuned on Serbian NLI data (support, contradiction, and neutral categories).
- **Metrics**: Accuracy, Precision, Recall, F1-score
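
As a sketch of how the three-way output could be post-processed, the snippet below converts raw logits into a label and probability. The label order (`SUPPORT`, `CONTRADICTION`, `NEUTRAL`) and the example logits are assumptions for illustration; check the released checkpoint's config for the actual index-to-label mapping.

```python
import math

# Hypothetical label order -- the real index-to-label mapping may differ.
LABELS = ["SUPPORT", "CONTRADICTION", "NEUTRAL"]

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Made-up logits for a single claim/evidence pair:
label, prob = predict_label([2.1, -0.3, 0.4])
print(label, round(prob, 3))
```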

## Motivation
This model addresses the lack of NLI datasets and models for Serbian, a low-resource language. It provides a tool for textual entailment and claim verification, especially for scientific claims, with broader potential for misinformation detection and automated fact-checking.

## Training
- **Base Models Used**: DeBERTa-v3-large
- **Training Data**: Automatically translated SciFact dataset
- **Fine-tuning**: Conducted on a single DGX NVIDIA A100 GPU (40 GB)
- **Hyperparameters**: Optimized learning rate, batch size, weight decay, epochs, and early stopping
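
The hyperparameters above map onto a standard Hugging Face `TrainingArguments` configuration roughly as sketched below. The concrete values shown are placeholders, not the tuned values used for this model:

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Placeholder values -- the actual tuned hyperparameters are not published here.
args = TrainingArguments(
    output_dir="srbnli-deberta-v3-large",
    learning_rate=2e-5,              # placeholder
    per_device_train_batch_size=16,  # placeholder
    weight_decay=0.01,               # placeholder
    num_train_epochs=10,             # upper bound; early stopping can end sooner
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)
# Early stopping is supplied to the Trainer via
# callbacks=[EarlyStoppingCallback(early_stopping_patience=2)].
```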

## Evaluation
The model was evaluated using standard NLI metrics (accuracy, precision, recall, F1-score). It was also compared to the GPT-4o model for generalization capabilities.
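
A minimal sketch of how accuracy and macro-averaged precision, recall, and F1 can be computed for the three-class setting; the label lists below are illustrative, not the actual evaluation data:

```python
from collections import Counter

def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    precs, recs, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec); recs.append(rec); f1s.append(f1)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    n = len(classes)
    return acc, sum(precs) / n, sum(recs) / n, sum(f1s) / n

# Illustrative labels only (not the real evaluation data):
gold = ["SUPPORT", "CONTRADICTION", "NEUTRAL", "SUPPORT", "NEUTRAL"]
pred = ["SUPPORT", "NEUTRAL", "NEUTRAL", "SUPPORT", "CONTRADICTION"]
acc, p, r, f1 = macro_scores(gold, pred)
print(f"Acc={acc:.2f}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```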

## Use Cases
- **Claim Verification**: Scientific claims and general domain claims in Serbian
- **Misinformation Detection**: Identifying contradictions or support between claims and evidence
- **Cross-lingual Applications**: Potential for cross-lingual claim verification with multilingual models

## Future Work
- Improving accuracy with human-corrected translations and Serbian-specific datasets
- Expanding to general-domain claim verification
- Enhancing multilingual NLI capabilities

## Results Comparison

The table below presents a comparison of the fine-tuned models (DeBERTa-v3-large, RoBERTa-large, BERTić, GPT-4o, and others) on the srbSciFact dataset, focusing on key metrics: Accuracy (Acc), Precision (P), Recall (R), and F1-score (F1). The models were evaluated on their ability to classify relationships between claims and evidence in Serbian text.

| Model                | Accuracy | Precision (P) | Recall (R) | F1-score (F1) |
|----------------------|----------|---------------|------------|---------------|
| **DeBERTa-v3-large**  | 0.70     | 0.86          | 0.82       | 0.84          |
| **RoBERTa-large**     | 0.57     | 0.63          | 0.76       | 0.69          |
| **BERTić (Serbian)**  | 0.56     | 0.56          | 0.37       | 0.44          |
| **GPT-4o (English)**  | 0.66     | 0.70          | 0.77       | 0.78          |
| **mDeBERTa-base**     | 0.63     | 0.92          | 0.75       | 0.83          |
| **XLM-RoBERTa-large** | 0.64     | 0.89          | 0.77       | 0.83          |
| **mBERT-cased**       | 0.48     | 0.76          | 0.50       | 0.60          |
| **mBERT-uncased**     | 0.57     | 0.45          | 0.61       | 0.52          |

### Observations
- **DeBERTa-v3-large** performed the best overall, with an accuracy of 0.70 and an F1-score of 0.84.
- **RoBERTa-large** and **BERTić** showed lower performance; BERTić in particular struggled with recall (0.37), suggesting challenges in handling complex linguistic inference in Serbian.
- **GPT-4o** with an English prompt is competitive (F1 = 0.78), but the fine-tuned **DeBERTa-v3-large** model achieves the highest F1-score overall (0.84) and also outperforms GPT-4o when the prompt is in Serbian.
- **mDeBERTa-base** and **XLM-RoBERTa-large** exhibited strong cross-lingual performance, both with F1-scores of 0.83.

This demonstrates the potential of adapting advanced transformer models to Serbian while highlighting areas for future improvement, such as refining translations and expanding domain-specific data.