Upload README.md

README.md CHANGED
@@ -5,7 +5,7 @@ license: mit
 
 # model-card-testing
 
-model-card-testing is a
+model-card-testing is a distilled language model that can be used for text generation. Users of this model card should also consider information about the design, training, and limitations of gpt2.
 
 ## Model Details
 
@@ -24,8 +24,6 @@ Use the code below to get started with the model.
 
 
 
-
-
 Here is how to use this model to get the features of a given text in Pytorch:
 
 NOTE: This will need customization/fixing.
@@ -78,6 +76,11 @@ Using the model in high-stakes settings is out of scope for this model. The mod
 Significant research has explored bias and fairness issues with models for language generation (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). This model also has persistent bias issues, as highlighted in these demonstrative examples below. Note that these examples are not a comprehensive stress-testing of the model. Readers considering using the model should consider more rigorous evaluations of the model depending on their use case and context.
 
 
+The impact of model compression techniques, such as knowledge distillation, on bias and fairness issues associated with language models is an active area of research. For example:
+- [Silva, Tambwekar and Gombolay (2021)](https://aclanthology.org/2021.naacl-main.189.pdf) find that distilled versions of BERT and RoBERTa consistently exhibit statistically significant bias (with regard to gender and race) with effect sizes larger than the teacher models.
+- [Xu and Hu (2022)](https://arxiv.org/pdf/2201.08542.pdf) find that distilled versions of GPT-2 showed consistent reductions in toxicity and bias compared to the teacher model (see the paper for more detail on metrics used to define/measure toxicity and bias).
+- [Gupta et al. (2022)](https://arxiv.org/pdf/2203.12574.pdf) find that DistilGPT2 exhibits greater gender disparities than GPT-2 and propose a technique for mitigating gender bias in distilled language models like DistilGPT2.
+
 
 
 
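For context: the second hunk references a "get started" snippet that is not shown in this diff. A minimal sketch of what such a snippet typically looks like for a distilled GPT-2-style model, assuming the checkpoint is served through the `transformers` library; `distilgpt2` is used as a stand-in id, since `model-card-testing` is a placeholder:

```python
# A minimal sketch, not taken from this repository: "distilgpt2" is a
# stand-in checkpoint id for the card's placeholder model.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="distilgpt2")
set_seed(42)  # make sampling reproducible

# Generate two continuations of the prompt.
print(generator("Hello, I'm a language model,", max_length=30, num_return_sequences=2))
```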
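Similarly, the "Here is how to use this model to get the features of a given text in Pytorch" line is flagged in the card itself as needing customization. A minimal sketch of a PyTorch feature-extraction snippet, under the same stand-in assumption about the checkpoint id:

```python
# A minimal sketch, assuming a GPT-2-style checkpoint; "distilgpt2" again
# stands in for the placeholder model id.
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2Model.from_pretrained("distilgpt2")

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")  # PyTorch tensors
output = model(**encoded_input)  # output.last_hidden_state holds the features
```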