---
language: sr
metrics:
- accuracy
- precision
- recall
- f1
base_model: microsoft/deberta-v3-large
---

# srbNLI: Serbian Natural Language Inference Model

## Model Overview

srbNLI is a Natural Language Inference (NLI) model for Serbian, obtained by fine-tuning DeBERTa-v3-large on srbSciFact, an automatically translated Serbian version of the SciFact dataset. It is trained to recognize relationships between claims and evidence in Serbian text, with applications in scientific claim verification and potential expansion to broader claim verification tasks.

## Key Details

- **Model Type**: Transformer-based
- **Language**: Serbian
- **Task**: Natural Language Inference (NLI), Textual Entailment, Claim Verification
- **Dataset**: srbSciFact (automatically translated SciFact dataset)
- **Fine-tuning**: Fine-tuned on Serbian NLI data (support, contradiction, and neutral categories)
- **Metrics**: Accuracy, Precision, Recall, F1-score

## Motivation

This model addresses the lack of NLI datasets and models for Serbian, a low-resource language. It provides a tool for textual entailment and claim verification, especially for scientific claims, with broader potential for misinformation detection and automated fact-checking.

## Training

- **Base Model**: DeBERTa-v3-large
- **Training Data**: Automatically translated SciFact dataset
- **Fine-tuning**: Conducted on a single NVIDIA DGX A100 GPU (40 GB)
- **Hyperparameters**: Learning rate, batch size, weight decay, number of epochs, and early stopping were tuned

## Evaluation

The model was evaluated using standard NLI metrics (accuracy, precision, recall, F1-score). It was also compared with GPT-4o to assess generalization.
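As an illustration of the metrics named above, the sketch below computes accuracy and macro-averaged precision, recall, and F1 for three-way NLI labels. The macro averaging, the `nli_metrics` name, and the label strings are assumptions made for illustration; the card does not state which averaging scheme was used for the reported results.

```python
from collections import Counter

# Label names follow the categories listed in this card; the exact strings are an assumption.
LABELS = ("support", "contradiction", "neutral")


def nli_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1 (assumed averaging)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Per-label true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1
            fn[gold] += 1

    def safe_div(num, den):
        # Treat an undefined ratio (no predictions or no gold items) as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    gold = ["support", "support", "contradiction", "neutral"]
    pred = ["support", "contradiction", "contradiction", "neutral"]
    print(nli_metrics(gold, pred))
```

Macro averaging weights each class equally, which can matter when the three categories are imbalanced; a weighted or micro average would give different numbers on the same predictions.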
## Use Cases

- **Claim Verification**: Scientific claims and general domain claims in Serbian
- **Misinformation Detection**: Identifying contradictions or support between claims and evidence
- **Cross-lingual Applications**: Potential for cross-lingual claim verification with multilingual models

## Future Work

- Improving accuracy with human-corrected translations and Serbian-specific datasets
- Expanding to general-domain claim verification
- Enhancing multilingual NLI capabilities

## Results Comparison

The table below compares the fine-tuned models (DeBERTa-v3-large, RoBERTa-large, BERTić, and others) and GPT-4o on the srbSciFact dataset, using four metrics: Accuracy (Acc), Precision (P), Recall (R), and F1-score (F1). The models were evaluated on their ability to classify relationships between claims and evidence in Serbian text.

| Model                 | Accuracy | Precision (P) | Recall (R) | F1-score (F1) |
|-----------------------|----------|---------------|------------|---------------|
| **DeBERTa-v3-large**  | 0.70     | 0.86          | 0.82       | 0.84          |
| **RoBERTa-large**     | 0.57     | 0.63          | 0.76       | 0.69          |
| **BERTić (Serbian)**  | 0.56     | 0.56          | 0.37       | 0.44          |
| **GPT-4o (English)**  | 0.66     | 0.70          | 0.77       | 0.78          |
| **mDeBERTa-base**     | 0.63     | 0.92          | 0.75       | 0.83          |
| **XLM-RoBERTa-large** | 0.64     | 0.89          | 0.77       | 0.83          |
| **mBERT-cased**       | 0.48     | 0.76          | 0.50       | 0.60          |
| **mBERT-uncased**     | 0.57     | 0.45          | 0.61       | 0.52          |

### Observations

- **DeBERTa-v3-large** performed the best overall, with an accuracy of 0.70 and an F1-score of 0.84.
- **RoBERTa-large** and **BERTić** showed lower performance, with BERTić particularly weak in recall (0.37), suggesting challenges in handling complex linguistic inference in Serbian.
- **GPT-4o** outperforms all fine-tuned models in F1-score when the prompt is in English, but the **DeBERTa-v3-large** model slightly outperforms GPT-4o when the prompt is in Serbian.
- **mDeBERTa-base** and **XLM-RoBERTa-large** exhibited strong cross-lingual performance, each reaching an F1-score of 0.83.

These results demonstrate the potential of adapting advanced transformer models to Serbian while highlighting areas for future improvement, such as refining translations and expanding domain-specific data.

---
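For the claim-verification use case described in this card, inference could look roughly like the sketch below. It is a hedged illustration, not the authors' published usage: `model_id` is a placeholder for the repository name of the fine-tuned checkpoint (not given here), `verify_claim` and `top_label` are hypothetical helper names, and passing the evidence first and the claim second is an assumption about how the pairs were encoded during fine-tuning. Label names are read from the checkpoint's own `id2label` mapping rather than hard-coded.

```python
def top_label(probabilities, id2label):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return id2label[best], probabilities[best]


def verify_claim(claim, evidence, model_id):
    """Classify the relation between a Serbian claim and a piece of evidence.

    model_id is a placeholder: supply the Hub name or local path of the
    fine-tuned srbNLI checkpoint.
    """
    # Imported lazily so top_label stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Encode the pair as (evidence, claim); this ordering is an assumption.
    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    # Convert logits to class probabilities for the single input pair.
    probabilities = torch.softmax(logits, dim=-1)[0].tolist()
    return top_label(probabilities, model.config.id2label)
```

A call would then take the form `verify_claim(claim_text, evidence_text, model_id)` and return the predicted category together with its probability.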