| | --- |
| | license: apache-2.0 |
| | language: |
| | - en |
| | base_model: |
| | - Qwen/Qwen2.5-7B |
| | tags: |
| | - capability-tagging |
| | - cognition |
| | - qwen |
| | --- |
| | # Model Card for CDT-Cognition-Tagger |
| | This model is a component of the **Cognition-Domain-Task (CDT) framework**, a capability framework for Large Language Models presented in our paper "CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task". It is fine-tuned to classify a given instruction into one of the 18 cognitive abilities defined by the CDT framework. |
| | ## Model Details |
| |
|
| | ### Model Description |
| | The Cognition dimension of the CDT framework is inspired by the **Cattell-Horn-Carroll (CHC) theory** of cognitive abilities, adapted for the context of LLMs. This model analyzes an instruction and identifies the primary cognitive skills required to fulfill it. |
| |
|
| | - **Model type:** Qwen2ForCausalLM |
| | - **Language(s) (NLP):** English |
| | - **License:** Apache 2.0 |
| | - **Finetuned from model:** Qwen/Qwen2.5-7B |
| |
|
| | ### Model Sources |
| |
|
| | - **Repository:** https://github.com/Alessa-mo/CDT |
| | - **Paper Link:** https://arxiv.org/abs/2509.24422 |
| |
|
| | ### Basic Usage |
| | The annotation code is available at https://github.com/Alessa-mo/CDT. Run the following script to tag instructions with cognition labels: |
| | ```bash |
| | cd tag_annotate |
| | export CUDA_VISIBLE_DEVICES=0 |
| | python annotate.py \ |
| |     --data_path path/to/your/data \ |
| |     --output_dir path/to/output/dir \ |
| |     --model_path CDT-Cognition-Tagger \ |
| |     --prompt_file ./prompt/annotation_prompt.jsonl \ |
| |     --cognition_skill_file ./prompt/cognition.json \ |
| |     --domain_skill_file ./prompt/domain.json \ |
| |     --task_skill_file ./prompt/task.json \ |
| |     --tag_type "cognition" \ |
| |     --batch_size 32 |
| | ``` |
| | **Note**: The input data must be a JSON file with the following format: |
| | ```json |
| | [ |
| | { |
| | "messages": [ |
| | { |
| | "role": "user", |
| | "content": "xxxx" |
| | }, |
| | { |
| | "role": "assistant", |
| | "content": "xxxx" |
| | } |
| | ] |
| | } |
| | ] |
| | ``` |
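Before running the annotation script, it can help to check that a data file matches the format above. The following is a minimal sketch, not part of the CDT repository: the helper names `validate_records` and `validate_file` are hypothetical, and the checks assume only what the format note shows (a top-level array of objects, each with a non-empty `messages` list whose entries have string `role` and `content` fields).

```python
import json


def validate_records(records):
    """Return a list of problems found; an empty list means the data looks valid.

    Hypothetical helper: it checks only the structure shown in the format note.
    """
    if not isinstance(records, list):
        return ["top level must be a JSON array"]

    problems = []
    for i, record in enumerate(records):
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"record {i}: missing or empty 'messages' list")
            continue
        for j, msg in enumerate(messages):
            if not isinstance(msg, dict):
                problems.append(f"record {i}, message {j}: must be an object")
                continue
            role, content = msg.get("role"), msg.get("content")
            if not isinstance(role, str) or not isinstance(content, str):
                problems.append(
                    f"record {i}, message {j}: 'role' and 'content' must be strings"
                )
    return problems


def validate_file(path):
    """Load a JSON file and validate its contents (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return validate_records(json.load(f))
```

Calling `validate_file("path/to/your/data")` returns a list of messages describing any records that deviate from the expected shape. Because Python's `json` module is a strict parser, a file containing a trailing comma after the last record raises `json.JSONDecodeError` instead of returning a list.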
| | ## Citation |
| | If you find this model useful, please cite: |
| | ```bibtex |
| | @misc{mo2025cdtcomprehensivecapabilityframework, |
| | title={CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task}, |
| | author={Haosi Mo and Xinyu Ma and Xuebo Liu and Derek F. Wong and Yu Li and Jie Liu and Min Zhang}, |
| | year={2025}, |
| | eprint={2509.24422}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CL}, |
| | url={https://arxiv.org/abs/2509.24422}, |
| | } |
| | |
| | ``` |