Joshi-Aryan/Fine_Tuned_HF_Language_Identification_Model


Joshi-Aryan committed on Nov 9, 2023

Commit 43ed7e7 · 1 Parent(s): 43cba70

Update README.md

Files changed (1)

README.md +35 -15

README.md CHANGED

@@ -12,23 +12,23 @@ metrics:
 - recall
 library_name: transformers
 ---
-# Your Model Name
-**Fine_Tuned_HF_Language_Identification_Model:** Language Identification Model
 <img src="https://miro.medium.com/v2/resize:fit:1400/1*G5AyGtaUAQBcVLikpxu6CQ.png" style="border-radius: 5%;">
-## language:
-- en
-- fr
-- de
-- ru
-- ar
-## metrics:
-- f1
-- accuracy
-- precision
-- recall
-## library_name: transformers
 ## Overview
 Language identification is a foundational task in Natural Language Processing (NLP). This project introduces a meticulously fine-tuned language identification model, rooted in the robust XLM-RoBERTa architecture. It excels at classifying text in five diverse languages: English, French, German, Arabic, and Russian. Delve into the intricate details of this cutting-edge model that pushes the boundaries of multilingual language identification.
@@ -63,6 +63,13 @@ The model underwent a rigorous fine-tuning process using Hugging Face's Trainer
 ##  Dataset Used
 The corpus used for training is the corpus of © 2023 Universität Leipzig / Sächsische Akademie der Wissenschaften / InfAI.
 ## Technology Stack
@@ -132,6 +139,19 @@ f1 = eval_result["eval_f1"]
 ## Model Performance
 Table of Model Performance
-## File Structure
 ## Contributing

 - recall
 library_name: transformers
 ---
+# Fine Tuned HuggingFace Language Identification Model
 <img src="https://miro.medium.com/v2/resize:fit:1400/1*G5AyGtaUAQBcVLikpxu6CQ.png" style="border-radius: 5%;">
+## Languages Supported:
+1. English (en)
+2. French (fr)
+3. German (de)
+4. Russian (ru)
+5. Arabic (ar)
+## Metrics:
+- F1 score
+- Accuracy
+- Precision
+- Recall
+## Library_name:
+Transformers
 ## Overview
 Language identification is a foundational task in Natural Language Processing (NLP). This project introduces a meticulously fine-tuned language identification model, rooted in the robust XLM-RoBERTa architecture. It excels at classifying text in five diverse languages: English, French, German, Arabic, and Russian. Delve into the intricate details of this cutting-edge model that pushes the boundaries of multilingual language identification.
 ##  Dataset Used
 The corpus used for training is the corpus of © 2023 Universität Leipzig / Sächsische Akademie der Wissenschaften / InfAI.
+| Language | Corpus Size (number of sentences) |
+| -------- | -------- |
+|**English**|50002|
+|**French**|50002|
+|**German**|50002|
+|**Russian**|50002|
+|**Arabic**|36888|
 ## Technology Stack
 ## Model Performance
 Table of Model Performance
+| Language | Precision | Recall | F1 - Score | Accuracy |
+| -------- | -------- | -------- | -------- | -------- |
+|**English**|1.0000|0.9994|0.9997|0.9994|
+|**French**|1.0000|0.9992|0.9996|0.9992|
+|**German**|1.0000|0.9998|0.9999|0.9998|
+|**Arabic**|1.0000|0.9997|0.9999|0.9997|
+|**Russian**|1.0000|1.0000|1.0000|1.0000|
+## Project Files Structure
+The project's structure is organized as follows:
+- `data/`       :         Contains datasets used for training and testing the model
+- `src/`         :        Source code and Google Colab notebook
+- `README.md`     :       This README file
 ## Contributing
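
The F1 column in the Model Performance table added by this commit can be sanity-checked against the precision and recall columns, since F1 is their harmonic mean. The sketch below is not taken from the repository: the names `REPORTED` and `f1_score` are hypothetical, and the values are copied from the table as printed (rounded to four decimals), so small differences in the last digit are expected.

```python
# Values copied from the "Model Performance" table in this diff.
# language: (precision, recall, reported F1), all rounded to 4 decimals.
REPORTED = {
    "English": (1.0000, 0.9994, 0.9997),
    "French": (1.0000, 0.9992, 0.9996),
    "German": (1.0000, 0.9998, 0.9999),
    "Arabic": (1.0000, 0.9997, 0.9999),
    "Russian": (1.0000, 1.0000, 1.0000),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for language, (precision, recall, reported_f1) in REPORTED.items():
        recomputed = f1_score(precision, recall)
        # The inputs are already rounded, so compare with a small tolerance.
        status = "ok" if abs(recomputed - reported_f1) <= 1e-4 else "mismatch"
        print(f"{language}: recomputed F1 = {recomputed:.6f}, "
              f"reported = {reported_f1:.4f} ({status})")
```

With a tolerance of 1e-4, every row is consistent with the harmonic-mean definition; the Arabic row differs only in the fourth decimal, which rounding of the recall value can explain.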