Commit 4cf0f1e
1 Parent(s): 392d071

Add model

- README.md +28 -0
- config.json +31 -0
- model.safetensors +3 -0
- vocab.json +0 -0
README.md ADDED
@@ -0,0 +1,28 @@
+---
+tags:
+- text-classification
+- language-identification
+inference: false
+license: cc-by-sa-3.0
+language: multilingual
+library_name: staticvectors
+base_model:
+- NeuML/language-id
+---
+
+# Language Detection with StaticVectors
+
+This model is an export of the [FastText Language Identification model](https://fasttext.cc/docs/en/language-identification.html) for [`staticvectors`](https://github.com/neuml/staticvectors). `staticvectors` runs inference in Python with NumPy while maintaining solid runtime performance.
+
+Language detection is an important task and identification with n-gram models is an efficient and highly accurate way to do it.
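To make the n-gram idea concrete, here is a minimal sketch of FastText-style character n-gram extraction and hashing. It assumes the `minn` (2), `maxn` (4) and `bucket` (2000000) values from this commit's config.json. The helper names `char_ngrams`, `fnv1a_hash` and `bucket_index` are hypothetical and not part of `staticvectors`. The real FastText code also offsets bucket ids by the vocabulary size and handles multi-byte characters, which this sketch skips.

```python
def char_ngrams(word, minn=2, maxn=4):
    """Return character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


def fnv1a_hash(text):
    """32-bit FNV-1a hash; bytes are sign-extended, as in FastText's dictionary hash."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        signed = byte - 256 if byte > 127 else byte  # emulate an int8 cast
        h ^= signed & 0xFFFFFFFF
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def bucket_index(ngram, bucket=2000000):
    """Map an n-gram to one of `bucket` hashed embedding rows (assumed layout)."""
    return fnv1a_hash(ngram) % bucket


# "das" becomes "<das>", which yields 4 bigrams, 3 trigrams and 2 four-grams
grams = char_ngrams("das")
rows = [bucket_index(g) for g in grams]
```

Because every n-gram lands in a fixed-size hash table, the model needs no explicit n-gram vocabulary, at the cost of occasional collisions between unrelated n-grams.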
+
+_This model is a quantized version of the [base language id model](https://hf.co/neuml/language-id). It uses 2x256 Product Quantization, like the original quantized model from FastText, which shrinks the model down to 4MB with only a minor hit to accuracy._
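As an illustration of what product quantization does, the following NumPy sketch splits 16-dimensional vectors into 2-dimensional sub-vectors and stores each one as a single byte pointing into a 256-entry codebook. It assumes that "2x256" means a sub-vector size of 2 (`dsub` in config.json) with 256 centroids per sub-space. The function names, the random training data and the plain k-means loop are all illustrative assumptions, not the code FastText or `staticvectors` actually runs.

```python
import numpy as np


def train_codebooks(vectors, dsub=2, ksub=256, iters=10, seed=0):
    """Fit one k-means codebook of `ksub` centroids per `dsub`-wide sub-space."""
    rng = np.random.default_rng(seed)
    n, dim = vectors.shape
    nsub = dim // dsub
    codebooks = np.empty((nsub, ksub, dsub), dtype=np.float32)
    for s in range(nsub):
        sub = vectors[:, s * dsub:(s + 1) * dsub]
        # Initialise centroids from randomly chosen training points
        centroids = sub[rng.choice(n, ksub, replace=False)].copy()
        for _ in range(iters):
            dists = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for k in range(ksub):
                members = sub[assign == k]
                if len(members):
                    centroids[k] = members.mean(0)
        codebooks[s] = centroids
    return codebooks


def encode(vectors, codebooks):
    """Replace each sub-vector with the index of its nearest centroid (one byte)."""
    nsub, _, dsub = codebooks.shape
    codes = np.empty((len(vectors), nsub), dtype=np.uint8)
    for s in range(nsub):
        sub = vectors[:, s * dsub:(s + 1) * dsub]
        dists = ((sub[:, None, :] - codebooks[s][None, :, :]) ** 2).sum(-1)
        codes[:, s] = dists.argmin(1)
    return codes


def decode(codes, codebooks):
    """Rebuild approximate vectors by looking up each code in its codebook."""
    nsub = codebooks.shape[0]
    return np.concatenate([codebooks[s][codes[:, s]] for s in range(nsub)], axis=1)


# Random stand-in data; a real model would quantize its trained embedding matrix
vectors = np.random.default_rng(0).normal(size=(2000, 16)).astype(np.float32)
codebooks = train_codebooks(vectors)
codes = encode(vectors, codebooks)   # 8 bytes per row instead of 64
approx = decode(codes, codebooks)
```

In this setup each row shrinks from 16 float32 values (64 bytes) to 8 one-byte codes, plus a small shared codebook, which is where the bulk of the size reduction comes from.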
+
+## Usage with StaticVectors
+
+```python
+from staticvectors import StaticVectors
+
+model = StaticVectors("NeuML/language-id-quantized")
+model.predict(["What language is this text?"])
+```
config.json ADDED
@@ -0,0 +1,31 @@
+{
+  "model_type": "staticvectors",
+  "format": "fasttext",
+  "source": "lid.176.bin",
+  "lr": 0.05,
+  "dim": 16,
+  "ws": 5,
+  "epoch": 10,
+  "min_count": 1000,
+  "min_count_label": 0,
+  "neg": 5,
+  "word_ngrams": 1,
+  "loss": "hs",
+  "model": "supervised",
+  "bucket": 2000000,
+  "minn": 2,
+  "maxn": 4,
+  "thread": 12,
+  "lr_update_rate": 100,
+  "t": 0.0001,
+  "label": "__label__",
+  "verbose": 2,
+  "pretrained_vectors": "",
+  "save_output": false,
+  "seed": 0,
+  "qout": false,
+  "retrain": false,
+  "qnorm": false,
+  "cutoff": 0,
+  "dsub": 2
+}
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7a96c90618fcb1e2e6f5364f4a620bf2cd87a3f0d437d685c8c49eada1dc151
+size 4107972
vocab.json ADDED
The diff for this file is too large to render.