---
language:
- en
- de
- es
- fr
- it
- nl
- pt
license: apache-2.0
tags:
- sentence-classification
- text-classification
- onnx
- multilingual
datasets:
- TigreGotico/sentence-types-multilingual
---

# sentence-types

Multilingual sentence-type classifiers (ONNX) trained on
[TigreGotico/sentence-types-multilingual](https://huggingface.co/datasets/TigreGotico/sentence-types-multilingual)
(9,900 balanced samples per language, 6 classes).

Used by [little_questions](https://github.com/OpenJarbas/little_questions).

## Classes

`command`, `exclamation`, `polar_question`, `request`, `statement`, `wh_question`

## Models

| File | Language |
|------|----------|
| `sentence_type_EN_0.8.0.onnx` | English |
| `sentence_type_DE_0.8.0.onnx` | German |
| `sentence_type_ES_0.8.0.onnx` | Spanish |
| `sentence_type_FR_0.8.0.onnx` | French |
| `sentence_type_IT_0.8.0.onnx` | Italian |
| `sentence_type_NL_0.8.0.onnx` | Dutch |
| `sentence_type_PT_0.8.0.onnx` | Portuguese |

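The file names in the table above follow the pattern `sentence_type_<LANG>_<version>.onnx`. A minimal sketch of picking the right file from a language code is shown here. The helper name `model_filename` is hypothetical, and accepting region-tagged codes such as `pt-PT` is an assumption, not something this card specifies.

```python
# Hypothetical helper, not part of this repository: it only rebuilds the
# file names listed in the Models table.
SUPPORTED_LANGS = ("EN", "DE", "ES", "FR", "IT", "NL", "PT")


def model_filename(lang: str, version: str = "0.8.0") -> str:
    """Return the ONNX file name for a language code such as 'en' or 'pt-PT'."""
    # Assumption: only the primary language subtag matters ('pt-PT' -> 'PT').
    code = lang.strip().replace("_", "-").split("-")[0].upper()
    if code not in SUPPORTED_LANGS:
        raise ValueError(f"no model listed for language {lang!r}")
    return f"sentence_type_{code}_{version}.onnx"


if __name__ == "__main__":
    print(model_filename("pt-PT"))
```

The resulting name can be passed directly to `onnxruntime.InferenceSession` once the file has been downloaded.
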
## Accuracy

| Language | Accuracy | Macro F1 |
|----------|----------|----------|
| EN | 99.2% | 99.2% |
| NL | 98.8% | 98.8% |
| FR | 97.1% | 97.1% |
| IT | 97.0% | 97.0% |
| PT | 95.4% | 95.4% |
| DE | 85.6% | 84.9% |
| ES | 74.6% | 72.7% |

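For readers comparing the two columns: macro F1 is usually defined as the unweighted mean of the per-class F1 scores, so it can fall below accuracy when some classes are predicted worse than others. The sketch below assumes that standard definition; this card does not say how its figures were computed, and the labels in the example are toy data, not results from these models.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores (standard definition, assumed)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["statement", "statement", "command", "command"]
    y_pred = ["statement", "statement", "statement", "command"]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy, macro_f1(y_true, y_pred, ["command", "statement"]))
```
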
## Inference

```python
import json

import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("sentence_type_EN_0.8.0.onnx")
# Class names are stored JSON-encoded in the model's custom metadata.
classes = json.loads(sess.get_modelmeta().custom_metadata_map["classes"])
# The model takes raw strings, passed as an object-dtype array.
inp = np.array(["Who invented the telephone?"], dtype=object)
label_idx, probs = sess.run(None, {"input": inp})
print(classes[int(label_idx[0])])  # wh_question
```

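The example above prints only the top label and ignores `probs`. A minimal sketch of ranking all classes by probability follows. The helper `top_k` is hypothetical, and the layout of `probs` is an assumption: this card does not document it, so the sketch accepts either a per-input dict keyed by class index or a per-input row of probabilities in `classes` order. The numbers in the demo are made up.

```python
import numpy as np


def top_k(probs_row, classes, k=3):
    """Return the k most probable (class_name, probability) pairs for one input."""
    if isinstance(probs_row, dict):
        # Assumed layout A: {class_index: probability}; sort by index to align with `classes`.
        scores = [float(v) for _, v in sorted(probs_row.items())]
    else:
        # Assumed layout B: 1-D sequence of probabilities in `classes` order.
        scores = [float(p) for p in np.asarray(probs_row).ravel()]
    ranked = sorted(zip(classes, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    classes = ["command", "exclamation", "polar_question",
               "request", "statement", "wh_question"]
    # Made-up probabilities for illustration, not real model output.
    row = np.array([0.01, 0.02, 0.10, 0.02, 0.05, 0.80])
    print(top_k(row, classes, k=2))
```

With a real session, this would be called as `top_k(probs[0], classes)` for the first input sentence, provided `probs` matches one of the two assumed layouts.
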