Commit 0c85c74 · Parent: 6318c88 · updated readme

README.md CHANGED

@@ -285,7 +285,7 @@ pipeline_tag: zero-shot-classification

# Model Card for DeBERTa-v3-base-tasksource-nli

-This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 600 tasks
+This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 600 tasks.
This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:
- Zero-shot entailment-based classification pipeline (similar to bart-mnli), see [ZS].
- Natural language inference, and many other tasks with tasksource-adapters, see [TA]

@@ -299,40 +299,3 @@ classifier = pipeline("zero-shot-classification",model="Azma-AI/deberta-base-mul
text = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(text, candidate_labels)
-```
-
-## Evaluation
-
-This model ranked 1st among all models with the microsoft/deberta-v3-base architecture according to the IBM model recycling evaluation.
-https://ibm.github.io/model-recycling/
-
-### Software and training details
-
-The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 12 days on an Nvidia A30 24GB GPU.
-This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
-
-https://github.com/sileod/tasksource/ \
-https://github.com/sileod/tasknet/ \
-Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
-
-# Citation
-
-More details in this [article](https://arxiv.org/abs/2301.05948):
-```
-@article{sileo2023tasksource,
-  title={tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
-  author={Sileo, Damien},
-  url={https://arxiv.org/abs/2301.05948},
-  journal={arXiv preprint arXiv:2301.05948},
-  year={2023}
-}
-```
-
-# Model Card Contact
-
-damien.sileo@inria.fr
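The `classifier(text, candidate_labels)` call in the snippet above hides how an entailment-based zero-shot pipeline works: each candidate label is rendered into a hypothesis (the transformers default template is "This example is {}.") and the NLI model scores entailment of that hypothesis against the text, with the per-label scores normalized into a distribution. A minimal sketch of that logic, with the NLI model replaced by a toy word-overlap scorer for illustration (`zero_shot_classify` and `toy_score` are hypothetical names, not part of the transformers API):

```python
import math

def zero_shot_classify(text, candidate_labels, entailment_score,
                       hypothesis_template="This example is {}."):
    # One premise/hypothesis pair per candidate label, scored for entailment.
    scores = [entailment_score(text, hypothesis_template.format(label))
              for label in candidate_labels]
    # Softmax over the per-label entailment scores.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Return labels ranked by probability, pipeline-style.
    ranked = sorted(zip(candidate_labels, probs), key=lambda pair: -pair[1])
    return {"labels": [label for label, _ in ranked],
            "scores": [score for _, score in ranked]}

def toy_score(premise, hypothesis):
    # Stand-in for the NLI model: crude word-overlap "entailment" score.
    return len(set(premise.lower().split()) & set(hypothesis.lower().split()))

result = zero_shot_classify("one day I will see the world",
                            ["travel", "cooking", "dancing"], toy_score)
```

In the real pipeline the `entailment_score` role is played by the fine-tuned NLI head, which is why an MNLI-style checkpoint like this one can classify against labels it never saw during training.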
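The training-details section removed by this commit describes two mechanisms worth unpacking: each task gets its own CLS embedding, dropped 10% of the time so the model also works without task hints, and classification heads are shared between tasks whose label sets match. A hedged sketch of both ideas, not the actual training code (which lives in the linked Colab); `choose_cls_token`, `build_input_ids`, `get_head`, and `SHARED_CLS_ID` are hypothetical names introduced here for illustration:

```python
import random

SHARED_CLS_ID = 0  # assumption: id of the ordinary shared [CLS] token

def choose_cls_token(task_cls_id, p_drop=0.1, rng=random.random):
    # With probability p_drop, fall back to the shared CLS id, mirroring the
    # "dropped 10% of the time" scheme described in the model card.
    return SHARED_CLS_ID if rng() < p_drop else task_cls_id

def build_input_ids(task_cls_id, token_ids, p_drop=0.1, rng=random.random):
    # Prepend the (possibly dropped) task-specific CLS id to the sequence.
    return [choose_cls_token(task_cls_id, p_drop, rng)] + token_ids

# Classification heads shared between tasks whose label sets match,
# keyed by the tuple of label names.
heads = {}

def get_head(labels, make_head):
    key = tuple(labels)
    if key not in heads:
        heads[key] = make_head(len(labels))
    return heads[key]
```

Under this scheme two tasks with labels `("yes", "no")` would resolve to the same head object, which is one way to read "models shared weights if their labels matched".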