Commit d98c6b5
Parent(s): d73f03f
Update README.md

README.md CHANGED
@@ -4,7 +4,20 @@ language:
 - en
 ---
 ## Model weights for Parallel Roberta-Large model ##
-
+
+We provide the [weights](https://huggingface.co/luffycodes/Parallel-Roberta-Large) for the parallel attention and feedforward design for Roberta-Large.
+
+
+
+## Evaluation results
+
+When fine-tuned on downstream tasks, this model achieves the following results:
+
+Glue test results:
+
+| Task | MNLI | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  |
+|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
+|      | 89.3 | 91.7 | 94.3 | 96.2  | 64.0 | 91.0  | 90.4 | 80.1 |
 
 If you use this work, please cite:
 Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design:
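The added model card points readers at the hosted weights but includes no usage snippet. A minimal loading sketch, assuming the checkpoint is compatible with the stock `transformers` API; if the parallel attention/FFN design ships as custom modeling code in the repo, both `from_pretrained` calls would additionally need `trust_remote_code=True`:

```python
# Minimal sketch: pull the published checkpoint from the Hugging Face Hub.
# Assumption: the weights load through the standard RoBERTa classes; if the
# parallel attention/FFN design requires custom code from the repo, add
# trust_remote_code=True to both from_pretrained calls.
from transformers import AutoModel, AutoTokenizer

repo = "luffycodes/Parallel-Roberta-Large"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

inputs = tokenizer("Parallel attention and feed-forward design.",
                   return_tensors="pt")
outputs = model(**inputs)
# RoBERTa-Large uses a hidden size of 1024, so this prints (1, seq_len, 1024).
print(outputs.last_hidden_state.shape)
```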
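The GLUE numbers in the table come from task-specific fine-tuning, and the commit does not record the recipe. A sketch of one plausible setup on a single task (MRPC); the hyperparameters and the use of `AutoModelForSequenceClassification` on this checkpoint are illustrative assumptions, not the configuration behind the reported scores:

```python
# Fine-tuning sketch for one GLUE task (MRPC). Hyperparameters are
# illustrative assumptions, not the recipe behind the reported numbers.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

repo = "luffycodes/Parallel-Roberta-Large"
raw = load_dataset("glue", "mrpc")
tok = AutoTokenizer.from_pretrained(repo)

def encode(batch):
    # MRPC is a sentence-pair task; tokenize both sentences together.
    return tok(batch["sentence1"], batch["sentence2"], truncation=True)

data = raw.map(encode, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(repo, num_labels=2)

args = TrainingArguments(output_dir="parallel-roberta-mrpc",
                         per_device_train_batch_size=16,
                         learning_rate=2e-5,
                         num_train_epochs=3)

# Passing the tokenizer lets Trainer pad each batch dynamically by default.
Trainer(model=model, args=args,
        train_dataset=data["train"], eval_dataset=data["validation"],
        tokenizer=tok).train()
```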