herrius
/

unit_test

Model card Files Files and versions

unit_test / data /other_results.md

herrius's picture

Upload 259 files

32b542e about 3 years ago

|

history blame contribute delete

2 kB

	## GLUE results
	We also evalute the language understanding performance of Uni-Perceiver on GLUE benchmarks.
	The results are listed as below.

	<table border="1" width="100%">
	<tr align="center">
	<th>Dataset</th>
	<th>MNLI</th>
	<th>QNLI</th> <th>QQP</th> <th>RTE</th> <th>SST-2</th> <th>MRPC</th> <th>CoLA</th>
	</tr>

	<tr align="center">
	<td>Metric</td><td>Acc</td><td>Acc</td><td>F1</td><td>Acc</td><td>Acc</td><td>F1</td><td>Acc</td>
	</tr>


	<tr align="center">
	<td>Uni-Perceiver<sub>BASE</sub> </td><td>79.7</td><td>87.3</td><td>86.7 </td><td>71.1 </td><td>89.3 </td><td>86.0 </td><td>43.1</td>
	</tr>
	<tr align="center">
	<td>Uni-Perceiver-MoE<sub>BASE</sub> </td><td>81.5</td><td>88.2</td><td>87.8 </td><td>75.8</td><td>90.9 </td><td>87.1 </td><td>52.2</td>
	</tr>
	<tr align="center">
	<td>Uni-Perceiver<sub>LARGE</sub> </td><td>82.5</td><td>89.2</td><td>87.7 </td><td>73.7</td><td>91.2 </td><td>90.2</td><td>52.0</td>
	</tr>
	<tr align="center">
	<td>Uni-Perceiver-MoE<sub>LARGE</sub> </td><td>85.7</td><td>91.9</td><td>89.5 </td><td>78.4</td><td>93.4 </td><td>91.2</td><td>57.4</td>
	</tr>
	</table>

	---

	* All fine-tuning experiments are performed on 1 GPU.

	* We use the hyper-parameters for GLUE tasks from [fair-seq](https://github.com/facebookresearch/fairseq/blob/main/examples/bart/README.glue.md)

	Model \| MNLI \| QNLI \| QQP \| RTE \| SST-2 \| MRPC \| CoLA \| STS-B
	---\|---\|---\|---\|---\|---\|---\|---\|---
	`--num-classes` \| 3 \| 2 \| 2 \| 2 \| 2 \| 2 \| 2 \| 1
	`--lr` \| 5e-6 \| 1e-5 \| 1e-5 \| 1e-5 \| 5e-6 \| 2e-5 \| 2e-5 \| 2e-5
	`bsz` \| 128 \| 32 \| 32 \| 32 \| 128 \| 64 \| 64 \| 32
	`--total-num-update` \| 30968 \| 33112 \| 113272 \| 1018 \| 5233 \| 1148 \| 1334 \| 1799
	`--warmup-updates` \| 1858 \| 1986 \| 6796 \| 61 \| 314 \| 68 \| 80 \| 107 \| 1334 \| 1799
	`--warmup-updates` \| 1858 \| 1986 \| 6796 \| 61 \| 314 \| 68 \| 80 \| 107

	* Following RoBerta, we finetune RTE, STS and MRPC starting from the
	MNLI single-task model, rather than the baseline
	pretrained model.