Joshi-Aryan committed
Commit 43ed7e7 · 1 Parent(s): 43cba70

Update README.md

Files changed (1)
  1. README.md +35 -15
README.md CHANGED
@@ -12,23 +12,23 @@ metrics:
  - recall
  library_name: transformers
  ---
- # Your Model Name
- **Fine_Tuned_HF_Language_Identification_Model:** Language Identification Model
+ # Fine-Tuned Hugging Face Language Identification Model

  <img src="https://miro.medium.com/v2/resize:fit:1400/1*G5AyGtaUAQBcVLikpxu6CQ.png" style="border-radius: 5%;">

- ## language:
- - en
- - fr
- - de
- - ru
- - ar
- ## metrics:
- - f1
- - accuracy
- - precision
- - recall
- ## library_name: transformers
+ ## Languages Supported:
+ 1. English (en)
+ 2. French (fr)
+ 3. German (de)
+ 4. Russian (ru)
+ 5. Arabic (ar)
+ ## Metrics:
+ - F1-score
+ - Accuracy
+ - Precision
+ - Recall
+ ## Library Name:
+ Transformers

  ## Overview
  Language identification is a foundational task in Natural Language Processing (NLP). This project introduces a meticulously fine-tuned language identification model, rooted in the robust XLM-RoBERTa architecture. It excels at classifying text in five diverse languages: English, French, German, Arabic, and Russian. Delve into the intricate details of this cutting-edge model that pushes the boundaries of multilingual language identification.
@@ -63,6 +63,13 @@ The model underwent a rigorous fine-tuning process using Hugging Face's Trainer

  ## Dataset Used
  The corpus used for training is the corpus of © 2023 Universität Leipzig / Sächsische Akademie der Wissenschaften / InfAI.
+ | Language | Corpus Size (number of sentences) |
+ | -------- | -------- |
+ |**English**|50002|
+ |**French**|50002|
+ |**German**|50002|
+ |**Russian**|50002|
+ |**Arabic**|36888|


  ## Technology Stack
@@ -132,6 +139,19 @@ f1 = eval_result["eval_f1"]
  ## Model Performance
  Table of Model Performance

- ## File Structure
+ | Language | Precision | Recall | F1-Score | Accuracy |
+ | -------- | -------- | -------- | -------- | -------- |
+ |**English**|1.0000|0.9994|0.9997|0.9994|
+ |**French**|1.0000|0.9992|0.9996|0.9992|
+ |**German**|1.0000|0.9998|0.9999|0.9998|
+ |**Arabic**|1.0000|0.9997|0.9999|0.9997|
+ |**Russian**|1.0000|1.0000|1.0000|1.0000|
+
+ ## Project File Structure
+ The project's structure is organized as follows:
+
+ - `data/` : Contains datasets used for training and testing the model
+ - `src/` : Source code and Google Colab notebook
+ - `README.md` : This README file

  ## Contributing
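
The Model Performance table added in this commit reports precision, recall, and F1 per language, so the F1 column can be sanity-checked against the other two. The sketch below recomputes F1 as the harmonic mean of precision and recall. The names `reported`, `f1_score`, `check_table`, and `TOLERANCE` are illustrative and do not come from the repository, and the tolerance of 1e-4 is an assumption chosen because the table is rounded to four decimals.

```python
# Per-language (precision, recall, reported F1), copied from the
# Model Performance table in this commit.
reported = {
    "English": (1.0000, 0.9994, 0.9997),
    "French": (1.0000, 0.9992, 0.9996),
    "German": (1.0000, 0.9998, 0.9999),
    "Arabic": (1.0000, 0.9997, 0.9999),
    "Russian": (1.0000, 1.0000, 1.0000),
}

# Assumption of this sketch: the table values are rounded to four
# decimals, so a recomputed F1 may differ by up to about 1e-4.
TOLERANCE = 1e-4


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_table(rows, tolerance=TOLERANCE):
    """Return {language: (recomputed_f1, is_consistent)} for each row."""
    results = {}
    for language, (precision, recall, f1_reported) in rows.items():
        f1 = f1_score(precision, recall)
        results[language] = (f1, abs(f1 - f1_reported) <= tolerance)
    return results


if __name__ == "__main__":
    for language, (f1, ok) in check_table(reported).items():
        print(f"{language}: recomputed F1 = {f1:.4f}, consistent = {ok}")
```

A check like this only confirms that the three columns agree with each other within rounding. It says nothing about how the underlying predictions were evaluated.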