RoBERTa-base trained with a linearly increasing alpha for alpha-entmax attention, from 1.0 (softmax) to 2.0 (sparsemax).

To load the model:
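A minimal sketch of what such a linear alpha schedule might look like (the function name and step-based interpolation are assumptions for illustration, not the repository's actual training code):

```python
def alpha_schedule(step, total_steps, alpha_start=1.0, alpha_end=2.0):
    """Linearly interpolate alpha from softmax (1.0) to sparsemax (2.0).

    Hypothetical helper: interpolates by fraction of training completed,
    clamped so alpha stays at alpha_end after total_steps.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return alpha_start + frac * (alpha_end - alpha_start)
```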
```python
from transformers import AutoTokenizer
from sparse_roberta import get_custom_model

# Load the tokenizer (the model uses the standard roberta-base vocabulary)
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# Load the model with alpha fixed at 2.0 (sparsemax attention)
model = get_custom_model(
    'mtreviso/sparsemax-roberta',
    initial_alpha=2.0,
    use_triton_entmax=False,
    from_scratch=False,
)
```
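At `alpha=2.0`, alpha-entmax reduces to sparsemax, which projects logits onto the probability simplex and can assign exactly zero weight to some tokens. A minimal pure-Python sketch of the sparsemax mapping (illustrative only; the model itself uses an optimized entmax/Triton implementation):

```python
def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of
    logits z onto the probability simplex. Unlike softmax, the result
    can contain exact zeros."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    k, k_sum = 1, z_sorted[0]
    for j, zj in enumerate(z_sorted, start=1):
        cumsum += zj
        # Find the largest support size k with 1 + k * z_(k) > sum of top-k
        if 1 + j * zj > cumsum:
            k, k_sum = j, cumsum
    tau = (k_sum - 1) / k  # threshold subtracted from every logit
    return [max(zi - tau, 0.0) for zi in z]
```

For well-separated logits the output is fully sparse (e.g. `sparsemax([2.0, 0.0])` puts all mass on the first entry), while ties split mass evenly, matching softmax's behavior in that case.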
To run GLUE tasks, use the `run_glue.py` script. For example:
```bash
python run_glue.py \
    --model_name_or_path mtreviso/sparsemax-roberta \
    --config_name roberta-base \
    --tokenizer_name roberta-base \
    --task_name rte \
    --output_dir output-rte \
    --do_train \
    --do_eval \
    --max_seq_length 512 \
    --per_device_train_batch_size 32 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --save_steps 1000 \
    --logging_steps 100 \
    --save_total_limit 1 \
    --overwrite_output_dir
```