---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
base_model:
- cross-encoder/ms-marco-MiniLM-L-4-v2
pipeline_tag: text-classification
library_name: transformers
metrics:
- f1
- precision
- recall
datasets:
- Mozilla/autofill_dataset
---

## Cross-Encoder for MS MARCO with TinyBERT

This is a fine-tuned version of the [cross-encoder/ms-marco-MiniLM-L-4-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-4-v2) checkpoint.

It was fine-tuned on HTML tags and labels generated using [Fathom](https://mozilla.github.io/fathom/commands/label.html).

## How to use this model in `transformers`

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Mozilla/tinybert-uncased-autofill"
)

print(
    classifier('Card information input Card number cc-number <SEP> <SEP> input First name <SEP> <SEP>')
)
```

## Model Training Info

```python
HyperParameters = {
    'learning_rate': 2.3878733582558547e-05,
    'num_train_epochs': 21,
    'weight_decay': 0.0005288040458920454,
    'per_device_train_batch_size': 32
}
```

More information on how the model was trained can be found here: https://github.com/mozilla/smart_autofill

## Model Performance

```
Test Performance:
Precision: 0.913
Recall: 0.872
F1: 0.887

              precision    recall  f1-score   support

      cc-csc      0.943     0.950     0.946       139
      cc-exp      1.000     0.883     0.938        60
cc-exp-month      0.954     0.922     0.938        90
 cc-exp-year      0.904     0.934     0.919        91
     cc-name      0.835     0.989     0.905        92
   cc-number      0.953     0.970     0.961       167
     cc-type      0.920     0.940     0.930       183
       email      0.918     0.927     0.922       205
  given-name      0.727     0.421     0.533        19
   last-name      0.833     0.588     0.690        17
       other      0.994     0.994     0.994      8000
 postal-code      0.980     0.951     0.965       102

    accuracy                          0.985      9165
   macro avg      0.913     0.872     0.887      9165
weighted avg      0.986     0.985     0.985      9165
```
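
The headline Precision, Recall, and F1 are macro averages, i.e. unweighted means over the 12 per-class rows. A quick check against the report above:

```python
# Per-class scores from the report above: (precision, recall, f1)
per_class = {
    'cc-csc':       (0.943, 0.950, 0.946),
    'cc-exp':       (1.000, 0.883, 0.938),
    'cc-exp-month': (0.954, 0.922, 0.938),
    'cc-exp-year':  (0.904, 0.934, 0.919),
    'cc-name':      (0.835, 0.989, 0.905),
    'cc-number':    (0.953, 0.970, 0.961),
    'cc-type':      (0.920, 0.940, 0.930),
    'email':        (0.918, 0.927, 0.922),
    'given-name':   (0.727, 0.421, 0.533),
    'last-name':    (0.833, 0.588, 0.690),
    'other':        (0.994, 0.994, 0.994),
    'postal-code':  (0.980, 0.951, 0.965),
}

# Macro average = unweighted mean over classes
macro_p, macro_r, macro_f1 = (
    sum(scores[i] for scores in per_class.values()) / len(per_class)
    for i in range(3)
)

print(macro_p, macro_r, macro_f1)
```

The weighted averages sit much higher because the `other` class accounts for 8000 of the 9165 test examples; the macro averages give equal weight to rare classes such as `given-name` and `last-name`.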