Include error counts in Popular Paper section
README.md
The output for the different `mode` and `error_format` values is:
```python
>>> faireval.compute(predictions=y_pred, references=y_true, mode='fair', error_format='count')
{"PER": {"precision": 1.0, "recall": 0.5, "f1": 0.6666,
         "trad_prec": 0.5, "trad_rec": 0.5, "trad_f1": 0.5,
         "TP": 1, "FP": 0.0, "FN": 1.0, "LE": 0.0, "BE": 0.0, "LBE": 0.0},
 "INT": {"precision": 0.0, "recall": 0.0, "f1": 0.0,
         "trad_prec": 0.0, "trad_rec": 0.0, "trad_f1": 0.0,
         "TP": 0, "FP": 0.0, "FN": 0.0, "LE": 0.0, "BE": 1.0, "LBE": 1.0},
 "OUT": {"precision": 0.6666, "recall": 0.6666, "f1": 0.6666,
         "trad_prec": 0.5, "trad_rec": 0.5, "trad_f1": 0.5,
         "TP": 1, "FP": 0.0, "FN": 0.0, "LE": 1.0, "BE": 0.0, "LBE": 0.0},
 "overall_precision": 0.5714,
 "overall_recall": 0.4444,
 "overall_f1": 0.5,
 "overall_trad_prec": 0.4,
 "overall_trad_rec": 0.3333,
 "overall_trad_f1": 0.3636,
 "TP": 2,
 "FP": 0.0,
 "FN": 1.0,
 "LE": 1.0,
 "BE": 1.0,
 "LBE": 1.0}
```
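
The fair scores in this output can be reproduced from the counts if each LE, BE and LBE is charged as half an error in both the precision and the recall denominator. That weighting is inferred from the example numbers above, not taken from the FairEval implementation, and `fair_prf` is a hypothetical helper:

```python
# Sketch: reproduce the fair-mode scores above from the error counts.
# ASSUMPTION: each LE, BE and LBE counts as half an error in both the
# precision and the recall denominator. This is inferred from the example
# output, not from the FairEval source. `fair_prf` is hypothetical.

def fair_prf(tp, fp, fn, le, be, lbe):
    """Return (precision, recall, f1) under the half-weight assumption."""
    partial = 0.5 * (le + be + lbe)  # combined penalty of the partial errors
    p_den = tp + fp + partial
    r_den = tp + fn + partial
    precision = tp / p_den if p_den else 0.0
    recall = tp / r_den if r_den else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Counts copied from the mode='fair', error_format='count' output above.
print(fair_prf(tp=1, fp=0, fn=1, le=0, be=0, lbe=0))  # PER, README: 1.0, 0.5, 0.6666
print(fair_prf(tp=1, fp=0, fn=0, le=1, be=0, lbe=0))  # OUT, README: 0.6666 each
print(fair_prf(tp=2, fp=0, fn=1, le=1, be=1, lbe=1))  # overall, README: 0.5714, 0.4444, 0.5
```

Under this assumption OUT scores 1 / (1 + 0.5) ≈ 0.6666 in fair mode rather than the traditional 0.5, because its single label error costs half an error instead of one FP plus one FN.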

```python
>>> faireval.compute(predictions=y_pred, references=y_true, mode='traditional', error_format='count')
{"PER": {"precision": 0.5, "recall": 0.5, "f1": 0.5,
         "TP": 1, "FP": 1.0, "FN": 1.0},
 "INT": {"precision": 0.0, "recall": 0.0, "f1": 0.0,
         "TP": 0, "FP": 1.0, "FN": 2.0},
 "OUT": {"precision": 0.5, "recall": 0.5, "f1": 0.5,
         "TP": 1, "FP": 1.0, "FN": 1.0},
 "overall_precision": 0.4,
 "overall_recall": 0.3333,
 "overall_f1": 0.3636,
 "TP": 2,
 "FP": 3.0,
 "FN": 4.0}
```
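
Traditional mode uses the usual exact-match definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). A short sketch that recomputes the overall scores of this output from its counts (`trad_prf` is a hypothetical helper, not part of the faireval API):

```python
# Sketch: standard (traditional) precision, recall and F1 from TP/FP/FN.
# `trad_prf` is a hypothetical helper, not part of the faireval API.

def trad_prf(tp, fp, fn):
    """Return (precision, recall, f1) from exact-match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Overall counts from the traditional/count output above: TP=2, FP=3, FN=4.
precision, recall, f1 = trad_prf(2, 3, 4)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # prints 0.4 0.3333 0.3636
```

The same values appear as `overall_trad_prec`, `overall_trad_rec` and `overall_trad_f1` in the fair-mode output.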

```python
>>> faireval.compute(predictions=y_pred, references=y_true, mode='traditional', error_format='error_ratio')
{"PER": {"precision": 0.5, "recall": 0.5, "f1": 0.5,
         "TP": 1, "FP": 0.1428, "FN": 0.1428},
 "INT": {"precision": 0.0, "recall": 0.0, "f1": 0.0,
         "TP": 0, "FP": 0.1428, "FN": 0.2857},
 "OUT": {"precision": 0.5, "recall": 0.5, "f1": 0.5,
         "TP": 1, "FP": 0.1428, "FN": 0.1428},
 "overall_precision": 0.4,
 "overall_recall": 0.3333,
 "overall_f1": 0.3636,
 "TP": 2,
 "FP": 0.4285,
 "FN": 0.5714}
```
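
In the `error_ratio` output, TP stays a raw count while each FP and FN count appears to be divided by the total number of errors over all classes (3 FP + 4 FN = 7, so 1/7 ≈ 0.1428 and 2/7 ≈ 0.2857). A sketch under that inferred assumption, with `to_error_ratio` as a hypothetical helper:

```python
# Sketch: turn traditional error counts into error ratios.
# ASSUMPTION (inferred from the example output above): each FP/FN count is
# divided by the total number of errors across all classes (FP + FN),
# while TP stays a raw count. `to_error_ratio` is a hypothetical helper.

def to_error_ratio(per_class):
    """per_class maps label -> {"TP": int, "FP": int, "FN": int}."""
    total_errors = sum(c["FP"] + c["FN"] for c in per_class.values())
    return {
        label: {"TP": c["TP"],
                "FP": c["FP"] / total_errors,
                "FN": c["FN"] / total_errors}
        for label, c in per_class.items()
    }


# Per-class counts from the traditional/count output above.
counts = {"PER": {"TP": 1, "FP": 1, "FN": 1},
          "INT": {"TP": 0, "FP": 1, "FN": 2},
          "OUT": {"TP": 1, "FP": 1, "FN": 1}}
for label, ratios in to_error_ratio(counts).items():
    print(label, ratios)
```

The README truncates these ratios to four decimals.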
#### Values from Popular Papers
| seqeval strict | 0.2222 | 0.3425 | 0.0413 | 0.3598 | 0.0 | 0.0408 | 0.0 |
| seqeval relaxed | 0.2803 | 0.4124 | 0.0412 | 0.4105 | 0.0 | 0.1985 | 0.0 |

In traditional mode, the evaluation parameter counts are:

| | Overall | Location | Group | Person | Creative Work | Corporation | Product |
|----|---------|----------|-------|--------|---------------|-------------|---------|
| TP | 211 | 53 | 4 | 140 | 0 | 14 | 0 |
| FP | 353 | 66 | 42 | 174 | 1 | 70 | 0 |
| FN | 730 | 85 | 144 | 228 | 116 | 43 | 114 |
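
As a consistency check, the Overall column can be turned back into traditional scores with the same exact-match definitions used in the example above. This is only a sketch; the counts are the sole input taken from the table:

```python
# Sketch: traditional precision/recall/F1 from the Overall column of the
# table above (TP=211, FP=353, FN=730), using the standard definitions.
tp, fp, fn = 211, 353, 730

precision = tp / (tp + fp)  # 211 / 564
recall = tp / (tp + fn)     # 211 / 941
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# prints precision=0.3741 recall=0.2242 f1=0.2804
```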

In fair mode, the evaluation parameter counts (`error_format='count'`) are:

| | Overall | Location | Group | Person | Creative Work | Corporation | Product |
|-----|---------|----------|-------|--------|---------------|-------------|---------|
| TP | 211 | 53 | 4 | 140 | 0 | 14 | 0 |
| FP | 125 | 9 | 21 | 62 | 1 | 32 | 0 |
| FN | 544 | 59 | 115 | 153 | 95 | 34 | 88 |
| BE | 105 | 11 | 4 | 87 | 0 | 3 | 0 |
| LE | 66 | 7 | 20 | 12 | 7 | 6 | 14 |
| LBE | 57 | 10 | 6 | 9 | 15 | 2 | 15 |

Thus, the ratio of each fair error type to the total number of errors (`error_format='error_ratio'`) is:

| | Overall | Location | Group | Person | Creative Work | Corporation | Product |
|-----|---------|----------|--------|--------|---------------|-------------|---------|
| FP | 13.94% | 1.00% | 2.34% | 6.91% | 0.11% | 3.57% | 0.00% |
| FN | 60.65% | 6.58% | 12.82% | 17.06% | 10.59% | 3.79% | 9.81% |
| BE | 11.71% | 1.23% | 0.45% | 9.70% | 0.00% | 0.33% | 0.00% |
| LE | 7.36% | 0.78% | 2.23% | 1.34% | 0.78% | 0.67% | 1.56% |
| LBE | 6.35% | 1.11% | 0.67% | 1.00% | 1.67% | 0.22% | 1.67% |
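
The Overall column of this table matches the fair counts divided by the total number of errors, FP + FN + BE + LE + LBE = 897. This denominator is inferred from the numbers rather than documented; a sketch:

```python
# Sketch: reproduce the Overall column of the error-ratio table from the
# fair counts. ASSUMPTION (inferred from the numbers): the denominator is
# the total number of errors, FP + FN + BE + LE + LBE.
overall = {"FP": 125, "FN": 544, "BE": 105, "LE": 66, "LBE": 57}

total_errors = sum(overall.values())  # 897
for name, count in overall.items():
    print(f"{name}: {100 * count / total_errors:.2f}%")
# prints FP: 13.94%, FN: 60.65%, BE: 11.71%, LE: 7.36%, LBE: 6.35% (one per line)
```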

Finally, the ratio of each fair parameter to the total number of entities (`error_format='entity_ratio'`) is:

| | Overall | Location | Group | Person | Creative Work | Corporation | Product |
|-----|---------|----------|--------|--------|---------------|-------------|---------|
| TP | 19.04% | 4.78% | 0.36% | 12.64% | 0.00% | 1.26% | 0.00% |
| FP | 11.28% | 0.81% | 1.90% | 5.60% | 0.09% | 2.89% | 0.00% |
| FN | 49.10% | 5.32% | 10.38% | 13.81% | 8.57% | 3.07% | 7.94% |
| BE | 9.48% | 0.99% | 0.36% | 7.85% | 0.00% | 0.27% | 0.00% |
| LE | 5.96% | 0.63% | 1.81% | 1.08% | 0.63% | 0.54% | 1.26% |
| LBE | 5.14% | 0.90% | 0.54% | 0.81% | 1.35% | 0.18% | 1.35% |
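
Here the Overall percentages match a denominator equal to the sum of all six Overall counts, TP + FP + FN + BE + LE + LBE = 1108, so the "total number of entities" appears to be that sum. This reading is inferred from the numbers, not documented; a sketch:

```python
# Sketch: reproduce the Overall column of the entity-ratio table.
# ASSUMPTION (inferred from the numbers): the denominator is the sum of
# all six fair counts, TP + FP + FN + BE + LE + LBE.
overall = {"TP": 211, "FP": 125, "FN": 544, "BE": 105, "LE": 66, "LBE": 57}

total = sum(overall.values())  # 1108
for name, count in overall.items():
    print(f"{name}: {100 * count / total:.2f}%")
# prints TP: 19.04%, FP: 11.28%, FN: 49.10%, BE: 9.48%, LE: 5.96%, LBE: 5.14%
```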
## Limitations and Bias
The metric is restricted to the input schemes supported by seqeval. For example, it does not support numerical label inputs (odd IDs for Beginning, even IDs for Inside, and zero for Outside).
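
One possible workaround is to convert numeric tags to the string IOB2 tags that seqeval accepts before calling the metric. The sketch below assumes that IDs 2k-1 and 2k belong to the k-th entity type and that `types` lists those types in order; `ids_to_iob` is a hypothetical helper, not part of faireval or seqeval:

```python
# Sketch: map numeric tags (0 = O, odd = B-, even = I-) to string IOB2 tags.
# ASSUMPTIONS: IDs 2k-1 and 2k belong to the k-th entity type, and `types`
# lists those types in order. `ids_to_iob` is a hypothetical helper.

def ids_to_iob(tag_ids, types):
    tags = []
    for i in tag_ids:
        if i == 0:
            tags.append("O")
        else:
            entity = types[(i - 1) // 2]  # 1, 2 -> types[0]; 3, 4 -> types[1]
            prefix = "B" if i % 2 == 1 else "I"
            tags.append(f"{prefix}-{entity}")
    return tags


print(ids_to_iob([1, 2, 0, 3, 0], ["PER", "LOC"]))
# ['B-PER', 'I-PER', 'O', 'B-LOC', 'O']
```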