	## 📖 Introduction

	Instruction-Tagger is a DeBERTa-v2-based classifier that labels an instruction with one of 33 task tags, such as Medicine or Code Generation. With these tags, users can measure and adjust the proportion of each task in a dataset.

	#### Example Input

	>What are the main differences between Type 1 and Type 2 diabetes, and how do their treatment approaches differ?

	#### Example Output
	>Medicine
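
As an illustration of how the tags can be used to adjust task proportions, the sketch below counts the tags in a dataset and caps each task at a maximum number of examples. The `tag_proportions` and `cap_per_tag` helpers and the sample data are hypothetical, and the tags are assumed to have been predicted by the tagger beforehand.

```python
import random
from collections import Counter, defaultdict


def tag_proportions(tags):
    """Return the share of each tag in a list of tags."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: count / total for tag, count in counts.items()}


def cap_per_tag(examples, tags, max_per_tag, seed=0):
    """Downsample so that no tag keeps more than `max_per_tag` examples."""
    rng = random.Random(seed)
    by_tag = defaultdict(list)
    for example, tag in zip(examples, tags):
        by_tag[tag].append(example)
    kept = []
    for tag, group in by_tag.items():
        if len(group) > max_per_tag:
            group = rng.sample(group, max_per_tag)
        kept.extend((example, tag) for example in group)
    return kept


# Hypothetical dataset: instructions with tags already predicted by the tagger.
examples = [f"instruction {i}" for i in range(10)]
tags = ["Math"] * 6 + ["Medicine"] * 3 + ["Law"]

print(tag_proportions(tags))
balanced = cap_per_tag(examples, tags, max_per_tag=3)
print(tag_proportions([tag for _, tag in balanced]))
```

Capping is only one option; oversampling rare tags or sampling to a target distribution would follow the same pattern of grouping examples by tag first.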


	## 🚀 Quick Start

	The following code snippet shows how to load the tokenizer and model and how to predict the task tag for an instruction.

	```python
	import torch
	from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification

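	# 'deberta_cls' is assumed to be a local directory holding the classifier checkpoint; adjust the path to your copy.
	model = DebertaV2ForSequenceClassification.from_pretrained('deberta_cls', num_labels=33).cuda()
	tokenizer = DebertaV2Tokenizer.from_pretrained('alibaba-pai/Instruction-Tagger')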

	labels={14: 'Writing',
	0: 'Common-Sense',
	28: 'Ecology',
	22: 'Medicine',
	17: 'Grammar',
	3: 'Code Generation',
	31: 'Others',
	20: 'Paraphrase',
	19: 'Economy',
	6: 'Code Debug',
	21: 'Reasoning',
	18: 'Computer Science',
	4: 'Technology',
	13: 'Math',
	32: 'Literature',
	26: 'Chemistry',
	15: 'Complex Format',
	25: 'Ethics',
	27: 'Multilingual',
	29: 'Roleplay',
	30: 'Entertainment',
	23: 'Biology',
	16: 'Art',
	10: 'Academic Writing',
	24: 'Health',
	11: 'Philosophy',
	5: 'Sport',
	1: 'History',
	12: 'Music',
	7: 'Toxicity',
	2: 'Law',
	9: 'Physics',
	8: 'Counterfactual'}

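	def task_cls(pp):
	    # Tokenize the instruction and move the tensors to the GPU.
	    inputs = tokenizer(pp, return_tensors="pt", padding=True).to("cuda")

	    # Run inference without tracking gradients.
	    with torch.no_grad():
	        logits = model(**inputs).logits

	    # Pick the highest-scoring class for the single input instruction.
	    predicted_class_id = logits.argmax(dim=-1).item()

	    return labels[predicted_class_id]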

	instruct = "What are the main differences between Type 1 and Type 2 diabetes, and how do their treatment approaches differ?"

	tag = task_cls(instruct)
	print(tag)  # e.g. Medicine
	```
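
The snippet above returns only the single most likely tag. When a confidence score or a few alternative tags are useful, the logits can be turned into probabilities with a softmax. The following is a minimal sketch in plain Python; `top_k_tags`, the toy logits, and the three-entry label map are illustrative assumptions, not part of the released code. In practice the input would be something like `logits[0].tolist()` from the model call.

```python
import math


def top_k_tags(logits, labels, k=3):
    """Return the k most probable (tag, probability) pairs for one instruction."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class ids by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]


# Toy example with three classes; the real model has 33.
toy_labels = {0: "Common-Sense", 1: "History", 2: "Law"}
toy_logits = [0.2, 2.5, -1.0]

for tag, prob in top_k_tags(toy_logits, toy_labels, k=2):
    print(f"{tag}: {prob:.3f}")
```

A low top probability can be used as a signal to route an instruction to manual review or to the `Others` tag, depending on the application.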

	## 🔍 Evaluation

	To assess the accuracy of task classification, we manually evaluated a sample of 100 entries that are not in the training set and obtained a classification precision of 92%.
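
For reference, precision here is the fraction of predicted tags that were judged correct. The sketch below reproduces the arithmetic and adds a Wilson score interval to indicate how much uncertainty a 100-item sample carries. The count of 92 correct predictions is inferred from the reported 92% on 100 entries, and the interval is an added illustration, not a figure from the evaluation.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


correct, total = 92, 100  # inferred from the reported 92% on 100 entries
precision = correct / total
low, high = wilson_interval(correct, total)
print(f"precision = {precision:.2f}, 95% interval = [{low:.3f}, {high:.3f}]")
```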

	## 📜 Citation

	If you find our work helpful, please cite it!

	```
	@misc{TAPIR,
	title={Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning},
	author={Yuanhao Yue and Chengyu Wang and Jun Huang and Peng Wang},
	year={2024},
	eprint={2405.13448},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2405.13448},
	}
	```