Hailay committed on
Commit
4f63b68
·
verified ·
1 Parent(s): 10b2275

Update README.md

Files changed (1)
  1. README.md +41 -23
README.md CHANGED
@@ -5,32 +5,50 @@ language:
5
  - ti
6
  ---
7
  ---
8
- library_name: transformers
9
- license: apache-2.0
10
- language:
11
- - am
12
- - ti
13
- This is a RoBERTa-base model trained on ~124M tweets from January hugging face and finetuned for sentiment analysis with the TweetEval benchmark. The original Twitter-based RoBERTa model can be found here and the original reference paper is TweetEval. This model is suitable for Amharic and Tigriyna.
 
 
 
 
 
 
 
 
14
 
 
 
15
 
16
- # Model Card for Model ID
17
- Model Card Summary: Hailay/FT_EXLMR
18
- Model Name: Hailay/FT_EXLMR
19
- Type: XLM-Roberta model for sequence classification
20
- Language(s): [Languages supported by the model]
21
- License: [License type, e.g., Apache 2.0]
22
- Pre-trained Model: xlm-roberta-base
23
- Uses:
 
 
 
 
 
 
24
 
25
- Primary: Text classification (e.g., sentiment analysis)
26
- Additional: Can be fine-tuned for specific tasks
27
- Key Features:
 
 
 
28
 
29
- Trained Data: Custom dataset with text and labels
30
- Training Details: 3 epochs, learning rate of 1e-5
31
- Evaluation: Accuracy and loss metrics
32
- Code Example: Load the model and tokenizer, then use them for text classification.
33
- Considerations:
34
 
 
35
 
36
- BibTeX & APA formats available
 
 
 
5
  - ti
6
  ---
7
  ---
8
+ ## **1. Model Description**
9
+ Hailay/FT_EXLMR is a fine-tuned version of the EXLMR model, designed specifically for sentiment analysis and text classification tasks in low-resource African languages such as Tigrinya, Amharic, and Oromo. This model leverages the architecture of EXLMR but has been further fine-tuned to improve its performance on multilingual tasks, especially for languages not widely represented in existing NLP models.
10
+ The model was trained on the AfriSenti-SemEval 2023 dataset, a sentiment analysis benchmark for African languages, which is publicly available on GitHub: [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).
11
+
12
+ ## **2. Intended Use**
13
+ This model is ideal for:
14
+
15
+ - Researchers and developers working on multilingual sentiment analysis in African languages.
16
+ - Applications that require text classification in low-resource languages.
17
+ It is designed specifically for tasks such as:
18
+
19
+ - Sentiment analysis
20
+ - Text classification
21
+ Note: The model is not suitable for other tasks like machine translation or named entity recognition without further fine-tuning.
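A minimal sketch of how the model might be loaded for sentiment classification. It assumes the checkpoint `Hailay/FT_EXLMR` is published on the Hugging Face Hub together with its tokenizer and that the `transformers` library is installed. The helper name `classify` is hypothetical, and the label names returned depend on the checkpoint's own configuration.

```python
MODEL_ID = "Hailay/FT_EXLMR"  # model name as given in this README


def classify(texts, model_id=MODEL_ID):
    """Return one {'label': ..., 'score': ...} dict per input text.

    Hypothetical helper (not part of the original README). The
    transformers import is done lazily so that importing this module
    needs neither the library nor network access.
    """
    from transformers import pipeline

    # Assumption: the Hub repository ships both model weights and tokenizer.
    classifier = pipeline("text-classification", model=model_id)
    return classifier(list(texts))


# Example call (downloads the checkpoint on first use):
# results = classify(["your Tigrinya or Amharic sentence here"])
# for item in results:
#     print(item["label"], item["score"])
```

The pipeline wraps tokenization, the forward pass, and the softmax over class logits, which keeps the example short. For batch evaluation, loading the tokenizer and model separately gives more control.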
22
 
23
+ ## **3. Training Data**
24
+ The `Hailay/FT_EXLMR` model was trained using the dataset from the **SemEval 2023 Shared Task 12: Sentiment Analysis in African Languages (AfriSenti-SemEval)**. This dataset comprises sentiment-labeled text from 14 African languages:
25
 
26
+ 1. Algerian Arabic (arq) - Algeria
27
+ 2. Amharic (amh) - Ethiopia
28
+ 3. Hausa (hau) - Nigeria
29
+ 4. Igbo (ibo) - Nigeria
30
+ 5. Kinyarwanda (kin) - Rwanda
31
+ 6. Moroccan Arabic/Darija (ary) - Morocco
32
+ 7. Mozambique Portuguese (pt-MZ) - Mozambique
33
+ 8. Nigerian Pidgin (pcm) - Nigeria
34
+ 9. Oromo (orm) - Ethiopia
35
+ 10. Swahili (swa) - Kenya/Tanzania
36
+ 11. Tigrinya (tir) - Ethiopia
37
+ 12. Twi (twi) - Ghana
38
+ 13. Xitsonga (tso) - Mozambique
39
+ 14. Yoruba (yor) - Nigeria
40
 
41
+ The dataset covers multiple countries and linguistic groups, providing diverse data for training multilingual models like `Hailay/FT_EXLMR`. You can access the dataset via the [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).
42
+ The `Hailay/FT_EXLMR` model was trained using the following configuration:
43
+ - Epochs: 3
44
+ - Learning Rate: 1e-5
45
+ - Optimizer: AdamW
46
+ - Batch Size: 16
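The configuration above could be expressed with the Hugging Face `Trainer` API roughly as follows. This is a sketch under assumptions, not the author's training script: the README does not say whether the batch size of 16 is per device or global, so it is treated here as per device, and `output_dir`, `build_training_arguments`, and `total_optimizer_steps` are hypothetical names.

```python
import math

# Hyperparameters stated in the README, mapped to TrainingArguments fields.
HYPERPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,  # assumption: 16 is per device
    "optim": "adamw_torch",             # AdamW as implemented in PyTorch
}


def total_optimizer_steps(num_examples, batch_size, epochs):
    """Optimizer steps for a run on a single device with no gradient
    accumulation (both are assumptions, not stated in the README)."""
    return math.ceil(num_examples / batch_size) * epochs


def build_training_arguments(output_dir="ft_exlmr_out"):
    """Build TrainingArguments lazily so the rest of this module can be
    used without transformers installed. output_dir is hypothetical."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

For a hypothetical training set of 1,000 examples, `total_optimizer_steps(1000, 16, 3)` gives 63 steps per epoch and 189 steps in total.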
47
 
48
+ ## **4. Evaluation**
 
 
 
 
49
 
50
+ The model was evaluated using accuracy and loss as the primary metrics. The results are as follows:
51
 
52
+ - Accuracy: Achieved strong performance on Tigrinya, Amharic, and Oromo text classification tasks, with accuracy scores ranging between X% and Y%.
53
+ - Loss: Loss values converged steadily over the 3 epochs of training, indicating stable optimization.
54
+ The evaluation was carried out on the test set provided in the [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).
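To make the two metrics concrete, the sketch below computes accuracy and mean cross-entropy loss from raw classifier logits in plain Python. The toy logits and labels are invented for illustration and are not results from this model. The three-class layout is an assumption about the label set, and `accuracy_and_loss` is a hypothetical helper name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accuracy_and_loss(batch_logits, labels):
    """Return (accuracy, mean cross-entropy) for a batch of examples."""
    correct = 0
    total_loss = 0.0
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += int(predicted == label)
        total_loss += -math.log(probs[label])
    count = len(labels)
    return correct / count, total_loss / count


# Toy data: 4 examples, 3 classes (assumed label layout, not model output).
toy_logits = [[2.0, 0.1, -1.0], [0.2, 0.3, 1.5], [0.0, 1.0, 0.0], [1.0, 0.0, 0.5]]
toy_labels = [0, 2, 1, 2]
acc, loss = accuracy_and_loss(toy_logits, toy_labels)
print(f"accuracy={acc:.2f} loss={loss:.4f}")
```

Three of the four toy predictions match their labels, so the accuracy is 0.75. The loss is the average negative log-probability assigned to the correct class.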