---
license: apache-2.0
datasets:
- OFA-Sys/OccuQuest
language:
- en
---

This is the OccuLLaMA-7B model introduced in [OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models](https://arxiv.org/abs/2310.16517).

The dataset is available at [OccuQuest](https://huggingface.co/datasets/OFA-Sys/OccuQuest).

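A minimal sketch of fetching the dataset with the Hugging Face `datasets` library. The helper name `load_occuquest` is hypothetical, and the repository's split and configuration names are not stated in this card, so whether the call succeeds without a configuration name or a `data_files` argument is an assumption that has not been checked here.

```python
# Sketch only: requires the third-party `datasets` package (pip install datasets).
DATASET_ID = "OFA-Sys/OccuQuest"  # repository ID taken from the dataset link above


def load_occuquest(split=None):
    """Download OccuQuest from the Hugging Face Hub (hypothetical helper).

    With split=None, `load_dataset` returns every available split.
    Split names are not listed in this card, so inspect the returned
    object before relying on a specific one.
    """
    # Imported lazily so that defining this helper does not require `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_occuquest()` needs network access; printing the returned object shows which splits and columns the repository actually provides.
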
Abstract:
The emergence of large language models (LLMs) has revolutionized natural language processing tasks.
However, existing instruction-tuning datasets suffer from occupational bias: the majority of data relates to only a few occupations, which hampers the ability of instruction-tuned LLMs to generate helpful responses to professional queries from practitioners in specific fields.
To mitigate this issue and promote occupation-inclusive LLMs, we create an instruction-tuning dataset named OccuQuest, which contains 110,000+ prompt-completion pairs and 30,000+ dialogues covering over 1,000 occupations in 26 occupational categories.
We systematically query ChatGPT, organizing the queries hierarchically by Occupation, Responsibility, Topic, and Question, to ensure comprehensive coverage of occupational specialty inquiries.
Compared with three commonly used datasets (Dolly, ShareGPT, and WizardLM), OccuQuest exhibits a more balanced distribution across occupations.
Furthermore, we assemble three test sets for comprehensive evaluation: an occu-test set covering 25 occupational categories, an estate set focusing on real estate, and an occu-quora set containing real-world questions from Quora.
We then fine-tune LLaMA on OccuQuest to obtain OccuLLaMA, which significantly outperforms state-of-the-art LLaMA variants (Vicuna, Tulu, and WizardLM) on professional questions in GPT-4 and human evaluations.
Notably, on the occu-quora set, OccuLLaMA reaches a high win rate of 86.4% against WizardLM.
Furthermore, we demonstrate the potential of combining OccuQuest with other instruction-tuning datasets to enhance the overall performance of LLMs.
By fine-tuning LLaMA on a mixture of OccuQuest and Tulu datasets, we introduce ProLLaMA, which excels in addressing occupational questions and exhibits superior performance in comprehensive evaluations such as MMLU, GSM8K, BBH, and HumanEval.
Among the different LLaMA variants, the 7B and 13B ProLLaMA models achieve the highest performance on MMLU and GSM8K, with the 7B ProLLaMA model demonstrating an improvement of more than 4 points over the other 7B variants on GSM8K.
We openly release the dataset and models.

Please cite the paper if you use this model:
```
@misc{xue2023occuquest,
      title={OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models},
      author={Mingfeng Xue and Dayiheng Liu and Kexin Yang and Guanting Dong and Wenqiang Lei and Zheng Yuan and Chang Zhou and Jingren Zhou},
      year={2023},
      eprint={2310.16517},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```