## Introduction
**Instruction-Tagger** is a DeBERTa-v2 sequence classifier that labels an instruction with one of 33 task tags. With these tags, users can easily inspect and adjust the proportion of each task in a dataset.
#### Example Input
>What are the main differences between Type 1 and Type 2 diabetes, and how do their treatment approaches differ?
#### Example Output
>Medicine
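
To illustrate how tags can be used to adjust task proportions, here is a minimal sketch that caps the number of examples per tag. It assumes each record is a dict whose `tag` field has already been filled in by the tagger; `cap_per_tag`, the record layout, and the cap value are hypothetical and not part of Instruction-Tagger.

```python
import random
from collections import Counter, defaultdict


def cap_per_tag(records, max_per_tag, seed=0):
    """Downsample so that no tag contributes more than max_per_tag records.

    Hypothetical helper: `records` is a list of dicts with a "tag" key.
    """
    rng = random.Random(seed)
    by_tag = defaultdict(list)
    for record in records:
        by_tag[record["tag"]].append(record)

    balanced = []
    for group in by_tag.values():
        if len(group) > max_per_tag:
            # Randomly keep max_per_tag records from an over-represented tag.
            group = rng.sample(group, max_per_tag)
        balanced.extend(group)
    return balanced


# Toy data standing in for a tagged instruction dataset.
dataset = (
    [{"instruction": f"code task {i}", "tag": "Code Generation"} for i in range(8)]
    + [{"instruction": f"med question {i}", "tag": "Medicine"} for i in range(2)]
)
balanced = cap_per_tag(dataset, max_per_tag=3)
tag_counts = Counter(record["tag"] for record in balanced)
```

Capping is only one possible policy; upsampling rare tags or sampling to a target ratio would follow the same grouping step.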
## Quick Start
The following code snippet shows how to load the tokenizer and model and how to predict the task tag for an instruction.
```python
import torch
from transformers import DebertaV2ForSequenceClassification, DebertaV2Tokenizer

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# 'deberta_cls' is the checkpoint path used in the original snippet;
# point it at the location of the Instruction-Tagger weights.
model = DebertaV2ForSequenceClassification.from_pretrained('deberta_cls', num_labels=33).to(device)
model.eval()
tokenizer = DebertaV2Tokenizer.from_pretrained('alibaba-pai/Instruction-Tagger')

# Mapping from class id to task tag.
labels = {14: 'Writing',
          0: 'Common-Sense',
          28: 'Ecology',
          22: 'Medicine',
          17: 'Grammar',
          3: 'Code Generation',
          31: 'Others',
          20: 'Paraphrase',
          19: 'Economy',
          6: 'Code Debug',
          21: 'Reasoning',
          18: 'Computer Science',
          4: 'Technology',
          13: 'Math',
          32: 'Literature',
          26: 'Chemistry',
          15: 'Complex Format',
          25: 'Ethics',
          27: 'Multilingual',
          29: 'Roleplay',
          30: 'Entertainment',
          23: 'Biology',
          16: 'Art',
          10: 'Academic Writing',
          24: 'Health',
          11: 'Philosophy',
          5: 'Sport',
          1: 'History',
          12: 'Music',
          7: 'Toxicity',
          2: 'Law',
          9: 'Physics',
          8: 'Counterfactual'}


def task_cls(pp):
    """Return the predicted task tag for a single instruction string."""
    inputs = tokenizer(pp, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Pick the highest-scoring class for the single input.
    predicted_class_id = logits.argmax(dim=-1).item()
    return labels[predicted_class_id]


instruct = """
What are the main differences between Type 1 and Type 2 diabetes, and how do their treatment approaches differ?
"""
tag = task_cls(instruct)
print(tag)  # the example output in the Introduction reports "Medicine"
```
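
`task_cls` returns only the top tag. When a confidence score or runner-up tags are useful, the logits can be passed through a softmax first. The sketch below works on a plain list of floats (for instance `logits[0].tolist()` inside `task_cls`); `top_k_tags` and the toy label map are illustrative names, not part of the released code.

```python
import math


def top_k_tags(logits, id_to_label, k=3):
    """Return the k most probable (tag, probability) pairs for one instruction."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Rank class ids from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id_to_label[i], probs[i]) for i in ranked[:k]]


# Toy logits over three classes; real logits would have 33 entries.
toy_labels = {0: "Common-Sense", 1: "History", 2: "Law"}
top = top_k_tags([0.5, 2.0, -1.0], toy_labels, k=2)
```

A low top probability could be used to route an instruction to manual review, though any threshold would need to be chosen empirically.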
## Evaluation
To assess the accuracy of task classification, we manually evaluated a sample of 100 entries that were not in the training set, which yielded a classification precision of 92%.
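
Because the sample contains 100 entries, the 92% figure carries some sampling uncertainty. As a rough illustration, the sketch below computes a 95% Wilson score interval, assuming the 92% corresponds to 92 correct tags out of 100 independently sampled entries. This interval is not reported by the authors.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumption: 92 of the 100 manually checked tags were judged correct.
low, high = wilson_interval(92, 100)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval spans roughly 85% to 96%, so a larger evaluation sample would be needed to pin the figure down more tightly.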
## Citation
If you find our work helpful, please cite it!
```
@misc{TAPIR,
      title={Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning},
      author={Yuanhao Yue and Chengyu Wang and Jun Huang and Peng Wang},
      year={2024},
      eprint={2405.13448},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.13448},
}
```