nahiar committed
Commit 0e8a150 · verified · 1 Parent(s): d471a57

Update README with proper attribution

Files changed (1)
  1. README.md +44 -8
README.md CHANGED
@@ -6,7 +6,32 @@ tags:
6
  - language-identification
7
  ---
8
 
9
- # fastText (Language Identification)
10
 
11
  fastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices. It was introduced in [this paper](https://arxiv.org/abs/1607.04606). The official website can be found [here](https://fasttext.cc/).
12
 
@@ -30,6 +55,10 @@ Here is how to use this model to detect the language of a given text:
30
  >>> import fasttext
31
  >>> from huggingface_hub import hf_hub_download
32
 
33
  >>> model_path = hf_hub_download(repo_id="facebook/fasttext-language-identification", filename="model.bin")
34
  >>> model = fasttext.load_model(model_path)
35
  >>> model.predict("Hello, world!")
@@ -38,13 +67,13 @@ Here is how to use this model to detect the language of a given text:
38
 
39
  >>> model.predict("Hello, world!", k=5)
40
 
41
- (('__label__eng_Latn', '__label__vie_Latn', '__label__nld_Latn', '__label__pol_Latn', '__label__deu_Latn'),
42
  array([0.61224753, 0.21323682, 0.09696738, 0.01359863, 0.01319415]))
43
  ```
44
 
45
  ### Limitations and bias
46
 
47
- Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions.
48
 
49
  Cosine similarity can be used to measure the similarity between two different word vectors. If two vectors are identical, the cosine similarity will be 1. For two completely unrelated vectors, the value will be 0. If two vectors have an opposite relationship, the value will be -1.
50
 
@@ -81,7 +110,7 @@ More information about the training of these models can be found in the article
81
 
82
  ### License
83
 
84
- The language identification model is distributed under the [*Creative Commons Attribution-NonCommercial 4.0 International Public License*](https://creativecommons.org/licenses/by-nc/4.0/).
85
 
86
  ### Evaluation datasets
87
 
@@ -91,7 +120,7 @@ The analogy evaluation datasets described in the paper are available here: [Fren
91
 
92
  Please cite [1] if using this code for learning word representations or [2] if using for text classification.
93
 
94
- [1] P. Bojanowski\*, E. Grave\*, A. Joulin, T. Mikolov, [*Enriching Word Vectors with Subword Information*](https://arxiv.org/abs/1607.04606)
95
 
96
  ```markup
97
  @article{bojanowski2016enriching,
@@ -102,7 +131,7 @@ Please cite [1] if using this code for learning word representations or [2] if u
102
  }
103
  ```
104
 
105
- [2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [*Bag of Tricks for Efficient Text Classification*](https://arxiv.org/abs/1607.01759)
106
 
107
  ```markup
108
  @article{joulin2016bag,
@@ -113,7 +142,7 @@ Please cite [1] if using this code for learning word representations or [2] if u
113
  }
114
  ```
115
 
116
- [3] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, [*FastText.zip: Compressing text classification models*](https://arxiv.org/abs/1612.03651)
117
 
118
  ```markup
119
  @article{joulin2016fasttext,
@@ -126,7 +155,7 @@ Please cite [1] if using this code for learning word representations or [2] if u
126
 
127
  If you use these word vectors, please cite the following paper:
128
 
129
- [4] E. Grave\*, P. Bojanowski\*, P. Gupta, A. Joulin, T. Mikolov, [*Learning Word Vectors for 157 Languages*](https://arxiv.org/abs/1802.06893)
130
 
131
  ```markup
132
  @inproceedings{grave2018learning,
@@ -139,3 +168,10 @@ If you use these word vectors, please cite the following paper:
139
 
140
  (\* These authors contributed equally.)
141
 
 
6
  - language-identification
7
  ---
8
 
9
+ # 🔗 FastText Language Identification - Mirror Repository
10
+
11
+ > **⚠️ IMPORTANT NOTICE**: This is a **mirror/fork** of the original Facebook FastText Language Identification model.
12
+ >
13
+ > **Original Repository**: [facebook/fasttext-language-identification](https://huggingface.co/facebook/fasttext-language-identification)
14
+ >
15
+ > **Original Authors**: Facebook Research Team
16
+ >
17
+ > **Purpose of this mirror**: An alternative access point for the model, a personal backup, and testing.
18
+
19
+ ---
20
+
21
+ ## 📌 Attribution & Credits
22
+
23
+ **ALL CREDITS GO TO THE ORIGINAL AUTHORS AT FACEBOOK RESEARCH**
24
+
25
+ This model was developed by Facebook Research as part of the NLLB (No Language Left Behind) project. I do not claim any ownership or authorship of this model. This repository serves only as a mirror/backup.
26
+
27
+ - **Original Model Card**: [facebook/fasttext-language-identification](https://huggingface.co/facebook/fasttext-language-identification)
28
+ - **Paper**: [Enriching Word Vectors with Subword Information](https://arxiv.org/abs/1607.04606)
29
+ - **Official Website**: [fasttext.cc](https://fasttext.cc/)
30
+ - **License**: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
31
+
32
+ ---
33
+
34
+ # Original Model Description (from Facebook)
35
 
36
  fastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices. It was introduced in [this paper](https://arxiv.org/abs/1607.04606). The official website can be found [here](https://fasttext.cc/).
37
 
 
55
  >>> import fasttext
56
  >>> from huggingface_hub import hf_hub_download
57
 
58
+ >>> # You can use either the original repo or this mirror
59
+ >>> # Original: repo_id="facebook/fasttext-language-identification"
60
+ >>> # Mirror: repo_id="nahiar/language-detection"
61
+ >>>
62
  >>> model_path = hf_hub_download(repo_id="facebook/fasttext-language-identification", filename="model.bin")
63
  >>> model = fasttext.load_model(model_path)
64
  >>> model.predict("Hello, world!")
 
67
 
68
  >>> model.predict("Hello, world!", k=5)
69
 
70
+ (('__label__eng_Latn', '__label__vie_Latn', '__label__nld_Latn', '__label__pol_Latn', '__label__deu_Latn'),
71
  array([0.61224753, 0.21323682, 0.09696738, 0.01359863, 0.01319415]))
72
  ```
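
The labels returned by `model.predict` in the example above carry a `__label__` prefix followed by what appears to be a language code and a script code joined by an underscore (for example `eng_Latn`). The sketch below shows one way to turn that output into structured records. It works on the literal values shown above, so it does not need the model file. The names `LanguageGuess` and `parse_predictions` are hypothetical helpers introduced for illustration, not part of fastText.

```python
from typing import List, NamedTuple, Sequence

LABEL_PREFIX = "__label__"  # prefix seen on every label in the example output


class LanguageGuess(NamedTuple):
    """One prediction: language code, script code, and its probability."""
    language: str
    script: str
    probability: float


def parse_predictions(
    labels: Sequence[str], probabilities: Sequence[float]
) -> List[LanguageGuess]:
    """Strip the label prefix and split each code into language and script.

    Assumes labels look like '__label__eng_Latn', as in the example output.
    """
    guesses = []
    for label, probability in zip(labels, probabilities):
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # Split on the first underscore only: 'eng_Latn' -> ('eng', 'Latn').
        language, _, script = code.partition("_")
        guesses.append(LanguageGuess(language, script, float(probability)))
    return guesses


# Values copied from the k=5 example output above.
labels = (
    "__label__eng_Latn",
    "__label__vie_Latn",
    "__label__nld_Latn",
    "__label__pol_Latn",
    "__label__deu_Latn",
)
probabilities = [0.61224753, 0.21323682, 0.09696738, 0.01359863, 0.01319415]

guesses = parse_predictions(labels, probabilities)
for guess in guesses:
    print(f"{guess.language} ({guess.script}): {guess.probability:.4f}")
```

With a loaded model, the two return values of `model.predict(text, k=5)` could be passed to `parse_predictions` directly.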
73
 
74
  ### Limitations and bias
75
 
76
+ Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions.
77
 
78
  Cosine similarity can be used to measure the similarity between two different word vectors. If two vectors are identical, the cosine similarity will be 1. For two completely unrelated vectors, the value will be 0. If two vectors have an opposite relationship, the value will be -1.
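
As a rough illustration of the three cases described above, the following minimal sketch computes cosine similarity with the Python standard library only. The function name `cosine_similarity` and the toy vectors are assumptions made for this example. Real word vectors would come from a fastText word-vector model, and "unrelated" is modelled here as two orthogonal vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for word embeddings (illustrative values only).
vector = [1.0, 2.0, 3.0]

identical = cosine_similarity(vector, vector)                # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # no shared direction
opposite = cosine_similarity(vector, [-x for x in vector])   # reversed direction

print(identical, orthogonal, opposite)
```

Because the result depends only on direction, scaling either vector by a positive constant leaves the similarity unchanged.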
79
 
 
110
 
111
  ### License
112
 
113
+ The language identification model is distributed under the [_Creative Commons Attribution-NonCommercial 4.0 International Public License_](https://creativecommons.org/licenses/by-nc/4.0/).
114
 
115
  ### Evaluation datasets
116
 
 
120
 
121
  Please cite [1] if using this code for learning word representations or [2] if using for text classification.
122
 
123
+ [1] P. Bojanowski\*, E. Grave\*, A. Joulin, T. Mikolov, [_Enriching Word Vectors with Subword Information_](https://arxiv.org/abs/1607.04606)
124
 
125
  ```markup
126
  @article{bojanowski2016enriching,
 
131
  }
132
  ```
133
 
134
+ [2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [_Bag of Tricks for Efficient Text Classification_](https://arxiv.org/abs/1607.01759)
135
 
136
  ```markup
137
  @article{joulin2016bag,
 
142
  }
143
  ```
144
 
145
+ [3] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, [_FastText.zip: Compressing text classification models_](https://arxiv.org/abs/1612.03651)
146
 
147
  ```markup
148
  @article{joulin2016fasttext,
 
155
 
156
  If you use these word vectors, please cite the following paper:
157
 
158
+ [4] E. Grave\*, P. Bojanowski\*, P. Gupta, A. Joulin, T. Mikolov, [_Learning Word Vectors for 157 Languages_](https://arxiv.org/abs/1802.06893)
159
 
160
  ```markup
161
  @inproceedings{grave2018learning,
 
168
 
169
  (\* These authors contributed equally.)
170
 
171
+ ---
172
+
173
+ ## 📝 Repository Maintainer Note
174
+
175
+ This repository is maintained by [@nahiar](https://huggingface.co/nahiar) for easier access and backup purposes only. For any issues with the model itself, please refer to the original repository or the Facebook Research team.
176
+
177
+ **If you are the original author and have any concerns about this mirror, please contact me and I will immediately take appropriate action.**