---

datasets:
- ZhenghanYU/CFunSet
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-7B-Instruct
---

# CFunModel: A Comprehensive Language Model for Chinese Humor Understanding and Generation

CFunModel is a comprehensive language model designed for Chinese humor understanding, generation, and processing. Built on top of **Qwen2.5-7B-Instruct**, CFunModel is fine-tuned on **CFunSet**, a diverse multi-task dataset that aggregates various Chinese humor-related tasks. 

CFunModel outperforms several existing large language models on humor-related tasks such as joke generation, humor recognition, crosstalk response selection, and humor explanation.


### 🔥 Key Features
- 🎭 **Multi-Task Capability:** Supports joke generation, humor recognition, crosstalk continuation, humor explanation, and more.
- 📚 **Extensive Fine-Tuning:** Trained on more than **160,000** humor-related samples aggregated from Tieba-JokeBar, CrossDial, Chumor, HumorWB, and other datasets.
- 🎯 **High Performance:** Consistently achieves superior results in humor tasks compared to baseline models like GPT-4o, DeepSeek-V3, and Qwen2.5-7B-Instruct.
- 🧠 **Understanding and Generation:** Excels at generating humorous responses, explaining humor, and classifying humorous content.

## 🎯 Supported Tasks

CFunModel can handle a variety of humor-related tasks. Here are several examples:

### 1. **Joke Continuation**
- **Prompt:** 我将给你笑话的第一句话,请你生成整个笑话。笑话的第一句话如下:
- **Input:** 我对女儿说:再看10分钟的动画片就该睡觉了。
- **Output:** 她抗议说时间太短。“那就600秒,够长了吧?”我说。女儿说:“够了,够了。”

### 2. **Crosstalk Response Selection**
- **Prompt:** 给定一段相声台词,请从多个备选项中选择最合适的逗哏回复。
- **Input:** 已有台词:我是肖飞。|噢。|给大家说一段。|好。|吃五谷杂粮没有不得病的。|是。|我是大病上医院,小病上药店。|小病干嘛不找大夫呀?|挂号费够买药的了。|噢。
候选回复:早上天还没亮,三狗屁就来到了工程队。
反正很重,去买药,下台阶儿。
那天我脚气犯了,上药店买点儿药吧。
都快半夜了,上哪找药店去?
- **Output:** 那天我脚气犯了,上药店买点儿药吧。
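
The input for this task can be assembled programmatically. The sketch below mirrors the layout of the example above (turns joined with `|`, one candidate per line) and maps the model's answer back to a candidate index. The function names `build_selection_input` and `match_candidate` are hypothetical helpers, and the exact layout used in CFunSet is an assumption inferred from this single example.

```python
def build_selection_input(context_turns, candidates):
    """Format dialogue turns and candidate replies like the example above.

    Assumption: turns are joined with "|" and candidates are listed one per line.
    """
    context = "|".join(context_turns)
    options = "\n".join(candidates)
    return f"已有台词:{context}\n候选回复:{options}"


def match_candidate(model_output, candidates):
    """Return the index of the candidate the model chose, or None if none matches."""
    cleaned = model_output.strip()
    for index, candidate in enumerate(candidates):
        if cleaned == candidate.strip():
            return index
    return None


turns = ["我是肖飞。", "噢。", "给大家说一段。", "好。"]
candidates = [
    "早上天还没亮,三狗屁就来到了工程队。",
    "那天我脚气犯了,上药店买点儿药吧。",
]
selection_input = build_selection_input(turns, candidates)
# Surrounding whitespace in the model output is ignored when matching.
chosen = match_candidate(" 那天我脚气犯了,上药店买点儿药吧。\n", candidates)
print(selection_input)
print(chosen)
```

Exact string matching is deliberately strict: a reply that paraphrases a candidate yields `None`, which makes malformed outputs easy to count during evaluation.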

### 3. **Humor Recognition**
- **Prompt:** 以下是一段文本,请分析它是否具有幽默性。幽默性指该文本是否可能引起读者发笑,或通过语言技巧(如双关语、讽刺、夸张、荒诞或逻辑上的意外)营造幽默效果。只需要输出“幽默”或“不幽默”。
- **Input:** 问一下已经步入社会的大人们,假如有机会带着现在的记忆回到高一那年,你最想做什么?
- **Output:** 不幽默
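
When turning the model's answer into a boolean label, note that the negative label "不幽默" contains the positive label "幽默" as a substring, so the negative case has to be checked first. A minimal sketch, assuming the model replies with one of the two labels requested in the prompt (`parse_humor_label` is a hypothetical helper name):

```python
def parse_humor_label(model_output):
    """Map the model's text to True (humorous), False (not humorous), or None."""
    text = model_output.strip()
    # Check the negative label first: "不幽默" contains "幽默" as a substring.
    if "不幽默" in text:
        return False
    if "幽默" in text:
        return True
    # Neither label was found; let the caller decide how to handle it.
    return None


print(parse_humor_label("不幽默"))
print(parse_humor_label("幽默"))
```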

### 4. **Keyword-Based Joke Generation**
- **Prompt:** 生成一个关键词为“砍价”的笑话。
- **Input:** -
- **Output:** 老板:一块五一斤。|我:太贵了,五块钱三斤吧。|老板:不行不行。

### 5. **Crosstalk Continuation**
- **Prompt:** 以下是一段相声的前一半内容,请你根据给定的部分续写完成这段相声。
- **Input:** 0:“常言道,烟酒不分家。来,抽一根儿!”|1:“你小子真鬼,用你的孬烟换我的好酒!”|0:“吝啬鬼,你能喝什么好酒?”|1:“谁不知道你呀,犯了瘾十烟屁股吸。”|0:“你馋的不要脸,蹭人家的酒喝。”
- **Output:** 1:“你讨厌讨厌真讨厌,公共场合放毒烟;害人害己害环境,吞云吐雾造污染。浪费开支烧金钱,口干舌燥得咽炎;慢性自杀损寿命,戒烟戒烟快戒烟!”|0:“你混蛋混蛋真混蛋,一喝就高吐一滩;伤头伤胃伤心肝,醉生梦死早完蛋……”|1:“你完蛋!”|0:“你完蛋!”|1:“你是烟鬼,你滚蛋!”

---

### 📊 Model Performance

CFunModel consistently achieves high performance across multiple humor-related tasks. Below is a performance comparison of CFunModel with other state-of-the-art models:

| **Model**              | **Dougen Response (Acc)** | **Penggen Response (Acc)** | **HumorWB (Acc)** |
|------------------------|----------------------------|-----------------------------|-------------------|
| GPT-4o                 | 79.67                      | 73.88                       | 83.41             |
| GPT-4o mini            | 74.14                      | 67.45                       | 84.78             |
| DeepSeek-V3            | 83.66                      | 78.16                       | 85.15             |
| Qwen2.5-7B-Instruct    | 24.74                      | 20.87                       | 79.56             |
| ERNIE                  | 84.54                      | -                           | -                 |
| RoBERTa                | -                          | 76.19                       | -                 |
| **CFunModel (Ours)**    | **91.70**                  | **88.99**                   | **85.98**         |

✅ CFunModel achieves the highest accuracy among the listed models on all three benchmarks and improves substantially on its base model, Qwen2.5-7B-Instruct, with the largest gains on the two crosstalk response selection tasks.
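
The size of these gaps can be read directly from the table. The short script below copies the accuracy values from the table above and computes, for each task, CFunModel's gain over the base model and its margin over the best other listed model. The variable names are illustrative only, and `None` marks cells the table leaves empty.

```python
# Accuracy values copied from the table above; None marks a missing entry.
accuracy = {
    "GPT-4o": (79.67, 73.88, 83.41),
    "GPT-4o mini": (74.14, 67.45, 84.78),
    "DeepSeek-V3": (83.66, 78.16, 85.15),
    "Qwen2.5-7B-Instruct": (24.74, 20.87, 79.56),
    "ERNIE": (84.54, None, None),
    "RoBERTa": (None, 76.19, None),
    "CFunModel": (91.70, 88.99, 85.98),
}
tasks = ("Dougen Response", "Penggen Response", "HumorWB")
base, ours = "Qwen2.5-7B-Instruct", "CFunModel"

gain_over_base = {}
margin_over_best_other = {}
for i, task in enumerate(tasks):
    # Scores of every other model that reports a result for this task.
    others = [
        row[i] for name, row in accuracy.items()
        if name != ours and row[i] is not None
    ]
    gain_over_base[task] = round(accuracy[ours][i] - accuracy[base][i], 2)
    margin_over_best_other[task] = round(accuracy[ours][i] - max(others), 2)
    print(
        f"{task}: +{gain_over_base[task]:.2f} over base, "
        f"+{margin_over_best_other[task]:.2f} over best other model"
    )
```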

---
### Quickstart

The following code, which follows the structure of the Qwen2.5-7B-Instruct quickstart, shows how to use CFunModel to generate humor-related answers.

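```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# NOTE: this is the base model ID, as given in the original snippet. To run
# CFunModel itself, replace it with the CFunModel repository ID or a local path.
model_name = "Qwen/Qwen2.5-7B-Instruct"

# Load the weights, letting transformers choose the dtype and device placement.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prompt: "Generate a joke on the theme of household trivia."
prompt = "生成一个主题为家庭琐事的笑话。"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```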
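
The task examples earlier in this card list a **Prompt** and an **Input** separately, while the quickstart passes a single string as the user message. One way to combine the two is sketched below. Joining them with a newline is an assumption, not a documented CFunSet convention, and `build_messages` is a hypothetical helper. An input of `-` is treated as "no input", as in the keyword-based joke example.

```python
def build_messages(task_prompt, task_input=None,
                   system_prompt="You are a helpful assistant."):
    """Build chat messages from a task prompt and an optional task input.

    Assumption: the prompt and input are joined with a newline.
    """
    has_input = bool(task_input) and task_input.strip() != "-"
    user_content = f"{task_prompt}\n{task_input}" if has_input else task_prompt
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


continuation = build_messages(
    "我将给你笑话的第一句话,请你生成整个笑话。笑话的第一句话如下:",
    "我对女儿说:再看10分钟的动画片就该睡觉了。",
)
keyword_joke = build_messages("生成一个关键词为“砍价”的笑话。", "-")
print(continuation[1]["content"])
print(keyword_joke[1]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of `messages` in the quickstart.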

## 🤝 Citation

If you use CFunModel in your research or applications, please cite:
```
@misc{yu2025cfunmodelfunnylanguagemodel,
      title={CFunModel: A "Funny" Language Model Capable of Chinese Humor Generation and Processing},
      author={Zhenghan Yu and Xinyu Hu and Xiaojun Wan},
      year={2025},
      eprint={2503.20417},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.20417},
}
```


🎉 **Happy Experimenting with CFunModel!** 🎉