RoBERTa-base trained with a linearly increasing alpha for alpha-entmax attention, from 1.0 (softmax) to 2.0 (sparsemax).

To load the model:
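A minimal sketch of what such a linear alpha schedule might look like (the function name and step-based interpolation are assumptions for illustration, not the repository's actual training code):

```python
def alpha_schedule(step, total_steps, alpha_start=1.0, alpha_end=2.0):
    """Linearly interpolate alpha from softmax (1.0) to sparsemax (2.0).

    Hypothetical helper: interpolates by fraction of training completed,
    clamped so alpha stays at alpha_end after total_steps.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return alpha_start + frac * (alpha_end - alpha_start)
```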
```python
from transformers import AutoTokenizer
from sparse_roberta import get_custom_model

# Load the tokenizer (the model uses the standard roberta-base vocabulary)
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# Load the model with alpha fixed at 2.0 (sparsemax attention)
model = get_custom_model(
    'mtreviso/sparsemax-roberta',
    initial_alpha=2.0,
    use_triton_entmax=False,
    from_scratch=False,
)
```
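At `alpha=2.0`, alpha-entmax reduces to sparsemax, which projects logits onto the probability simplex and can assign exactly zero weight to some tokens. A minimal pure-Python sketch of the sparsemax mapping (illustrative only; the model itself uses an optimized entmax/Triton implementation):

```python
def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of
    logits z onto the probability simplex. Unlike softmax, the result
    can contain exact zeros."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    k, k_sum = 1, z_sorted[0]
    for j, zj in enumerate(z_sorted, start=1):
        cumsum += zj
        # Find the largest support size k with 1 + k * z_(k) > sum of top-k
        if 1 + j * zj > cumsum:
            k, k_sum = j, cumsum
    tau = (k_sum - 1) / k  # threshold subtracted from every logit
    return [max(zi - tau, 0.0) for zi in z]
```

For well-separated logits the output is fully sparse (e.g. `sparsemax([2.0, 0.0])` puts all mass on the first entry), while ties split mass evenly, matching softmax's behavior in that case.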
To run GLUE tasks, use the `run_glue.py` script. For example:
```bash
python run_glue.py \
    --model_name_or_path mtreviso/sparsemax-roberta \
    --config_name roberta-base \
    --tokenizer_name roberta-base \
    --task_name rte \
    --output_dir output-rte \
    --do_train \
    --do_eval \
    --max_seq_length 512 \
    --per_device_train_batch_size 32 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --save_steps 1000 \
    --logging_steps 100 \
    --save_total_limit 1 \
    --overwrite_output_dir
```