---
license: apache-2.0
language:
- zh
- ja
- ar
- en
- hi
metrics:
- accuracy
library_name: allennlp
---
## Language Identification

This is a language identification model trained with AllenNLP on the [qgyd2021/language_identification](https://huggingface.co/datasets/qgyd2021/language_identification) dataset.



Per-language accuracy on the `valid` (validation) split:

| Language | Samples | Accuracy |
| :--- | :----: |  ------: |
|  af  |  6221  |  0.8666  |
|  ar  |  19808  |  0.9994  |
|  bg  |  19913  |  0.9958  |
|  bn  |  7396  |  0.9968  |
|  bs  |  1653  |  0.8232  |
|  cs  |  19122  |  0.9615  |
|  da  |  19500  |  0.9727  |
|  de  |  19702  |  0.996  |
|  el  |  19455  |  0.9761  |
|  en  |  39710  |  0.9942  |
|  eo  |  18542  |  0.9944  |
|  es  |  19924  |  0.9937  |
|  et  |  19482  |  0.9727  |
|  fi  |  19223  |  0.9554  |
|  fo  |  4612  |  0.9697  |
|  fr  |  19990  |  0.9957  |
|  ga  |  19949  |  0.9973  |
|  gl  |  508  |  0.822  |
|  hi  |  19984  |  0.9965  |
|  hi_en  |  1358  |  0.951  |
|  hr  |  18840  |  0.9789  |
|  hu  |  669  |  0.8873  |
|  hy  |  124  |  0.9688  |
|  id  |  4669  |  0.9968  |
|  is  |  19795  |  0.9876  |
|  it  |  19742  |  0.9941  |
|  ja  |  20130  |  0.9996  |
|  ko  |  20098  |  0.9998  |
|  lt  |  19280  |  0.9721  |
|  lv  |  19459  |  0.9931  |
|  mr  |  10300  |  0.9961  |
|  mt  |  19708  |  0.993  |
|  nl  |  18452  |  0.9258  |
|  no  |  19404  |  0.9714  |
|  pl  |  19920  |  0.9973  |
|  pt  |  19996  |  0.9946  |
|  ro  |  19804  |  0.9944  |
|  ru  |  20003  |  0.9954  |
|  sk  |  19804  |  0.9861  |
|  sl  |  19665  |  0.9926  |
|  sv  |  18941  |  0.95  |
|  sw  |  19768  |  0.9871  |
|  th  |  19917  |  0.9991  |
|  tl  |  19572  |  0.9991  |
|  tn  |  19883  |  0.9933  |
|  tr  |  19809  |  0.9939  |
|  ts  |  19752  |  0.9854  |
|  uk  |  17643  |  0.9994  |
|  ur  |  19895  |  0.992  |
|  vi  |  19836  |  0.9982  |
|  yo  |  1936  |  0.9827  |
|  zh  |  40108  |  0.9996  |
|  zu  |  5406  |  0.9905  |
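
Per-language figures like those in the table can be obtained by grouping validation examples by their gold label and counting correct predictions. The snippet below is a minimal sketch of that bookkeeping, not the evaluation script used for this model; the function name `per_language_accuracy` and the toy labels are hypothetical.

```python
from collections import Counter


def per_language_accuracy(gold_labels, predicted_labels):
    """Return {language: (sample_count, accuracy)}, grouped by gold label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted label lists must have the same length")

    totals = Counter()   # number of examples per gold language
    correct = Counter()  # number of correct predictions per gold language
    for gold, predicted in zip(gold_labels, predicted_labels):
        totals[gold] += 1
        if gold == predicted:
            correct[gold] += 1

    return {
        language: (count, round(correct[language] / count, 4))
        for language, count in sorted(totals.items())
    }


# Toy data for illustration only; these are not real model outputs.
gold = ["en", "en", "zh", "ja", "ja", "ja"]
pred = ["en", "nl", "zh", "ja", "ja", "ko"]
for language, (count, accuracy) in per_language_accuracy(gold, pred).items():
    print(f"|  {language}  |  {count}  |  {accuracy}  |")
```

Grouping by gold label means each row reports recall for that language, which is one common reading of "accuracy per language"; whether the table above was produced the same way is an assumption.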




Test code:
```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import argparse
import time

from allennlp.models.archival import load_archive
from allennlp.predictors.text_classifier import TextClassifierPredictor

# Project-local module; `project_path` is a path object pointing at the repository root.
from project_settings import project_path


def get_args():
    """
    Usage: python3 step_5_predict_by_archive.py

    :return: parsed command-line arguments.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--text",
        default="hello guy.",
        type=str
    )
    parser.add_argument(
        "--archive_file",
        default=(project_path / "trained_models/language_identification").as_posix(),
        type=str
    )
    args = parser.parse_args()
    return args


def main():
    args = get_args()

    # Load the trained model together with its dataset reader.
    archive = load_archive(archive_file=args.archive_file)

    predictor = TextClassifierPredictor(
        model=archive.model,
        dataset_reader=archive.dataset_reader,
    )

    json_dict = {
        "sentence": args.text
    }

    # Time only the prediction call, not model loading.
    begin_time = time.perf_counter()
    outputs = predictor.predict_json(
        json_dict
    )
    label = outputs["label"]
    # The highest class probability belongs to the predicted label.
    prob = round(max(outputs["probs"]), 4)
    print(label)
    print(prob)

    print('time cost: {}'.format(time.perf_counter() - begin_time))


if __name__ == '__main__':
    main()
```
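
The script above prints only the top label and its probability. To look at runner-up languages, for example when closely related languages are confused, the `probs` list can be paired with label names. The following is a minimal sketch: `top_k_languages` is a hypothetical helper, and the index-to-label mapping is hard-coded toy data. With AllenNLP, such a mapping can usually be read from `archive.model.vocab.get_index_to_token_vocabulary("labels")`, assuming the model stores its classes in the `labels` namespace.

```python
def top_k_languages(probs, index_to_label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    # Pair each probability with its class index, then sort by probability.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(index_to_label[index], round(prob, 4)) for index, prob in ranked[:k]]


# Toy values for illustration only; a real mapping would come from the model vocabulary.
index_to_label = {0: "en", 1: "nl", 2: "de", 3: "zh"}
probs = [0.91, 0.05, 0.03, 0.01]
print(top_k_languages(probs, index_to_label, k=2))
```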

requirements.txt
```text
allennlp==2.10.1
allennlp-models==2.10.1
torch==1.12.1
overrides==1.9.0
pytorch_pretrained_bert==0.6.2
```