Commit 71dbd6d by MarvelCQ (verified, parent 2078c1a): Update README.md
---
license: mit
---
## DianJin-R1-Data

<div align="center">
<img alt="image" src="https://raw.githubusercontent.com/aliyun/qwen-dianjin/refs/heads/master/images/dianjin_logo.png">
<p align="center">
<a href="https://tongyi.aliyun.com/dianjin">Qwen DianJin Platform</a> |
<a href="https://huggingface.co/DianJin">HuggingFace</a> |
<a href="https://modelscope.cn/organization/tongyi_dianjin">ModelScope</a>
</p>
</div>
### Introduction

We propose DianJin-R1, a novel framework that enhances financial reasoning in LLMs through reasoning-augmented supervision and reinforcement learning. Central to our approach is DianJin-R1-Data, a high-quality dataset constructed from CFLUE, FinQA, and a proprietary compliance corpus (Chinese Compliance Check, CCC), combining diverse financial reasoning scenarios with verified annotations. We adopt a structured training paradigm in which models generate reasoning steps and final answers using supervised fine-tuning. To further improve reasoning quality, we use Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that incorporates dual reward signals for output structure and answer accuracy.

We open-source enhanced versions of the CFLUE and FinQA datasets. However, due to sensitivity concerns, the CCC scenario data will not be made publicly available.

<div align="center">
<img alt="image" src="https://github.com/aliyun/qwen-dianjin/blob/master/DianJin-R1/images/2-step-training.png?raw=true">
</div>
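To make the dual reward concrete, the sketch below shows one way a structure reward and an accuracy reward could be combined for GRPO-style training. The `<think>`/`<answer>` tag template, the exact patterns, and the equal weighting are illustrative assumptions, not the published DianJin-R1 reward definition.

```python
import re

def format_reward(completion: str) -> float:
    """Reward 1.0 if the output follows an assumed
    <think>...</think><answer>...</answer> template."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """Reward 1.0 if the answer extracted from the tags matches the gold answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    predicted = m.group(1).strip() if m else ""
    return 1.0 if predicted == gold.strip() else 0.0

def total_reward(completion: str, gold: str) -> float:
    # Equal weighting of the two signals is an assumption for illustration.
    return format_reward(completion) + accuracy_reward(completion, gold)
```

A completion that is both well-structured and correct scores 2.0; a correct answer without the expected structure is still penalized, which pushes the policy toward producing explicit reasoning traces.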
#### CFLUE

CFLUE is an open-source Chinese benchmark designed to assess the performance of LLMs on a variety of natural language processing (NLP) tasks within the financial domain. Our enhanced version of CFLUE includes two parts:

**Multiple-Choice Questions** ($CFLUE_{MCQ}$): We leverage DeepSeek-R1, a model known for its strong reasoning capabilities, to generate a chain-of-thought (CoT) along with a predicted answer. We then verify the predicted answers by comparing them with the gold answers and select the correct ones to construct this dataset.

**Open-ended Questions** ($CFLUE_{OE}$): We first use GPT-4o to convert each multiple-choice question in CFLUE into an open-ended format. We then leverage DeepSeek-R1 to generate a chain-of-thought (CoT) along with a predicted answer. Finally, we employ GPT-4o as a verifier to assess two key aspects of the generated output: (1) whether the predicted answer matches the gold answer, and (2) whether the generated reasoning is consistent with the reference explanation. If both conditions are satisfied, we retain the instance as a valid reasoning sample.
#### FinQA

FinQA is an open-source English benchmark containing 8,281 financial question-answer pairs that require numerical reasoning over financial reports.

Unlike the instances in CFLUE, the QA pairs in FinQA are already in an open-ended format. We leverage DeepSeek-R1 to generate a chain-of-thought (CoT) along with a predicted answer, then use GPT-4o to verify the answers and select the correct ones to construct this dataset.
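The GPT-4o verification step used for open-ended answers (in both $CFLUE_{OE}$ and FinQA) could be structured as in the sketch below. The prompt wording is an assumption, and the judge call is injected as a callable so the logic is independent of any particular API client.

```python
from typing import Callable

# Hypothetical grading prompt; the actual prompt used to build
# DianJin-R1-Data is not published in this README.
VERIFY_PROMPT = (
    "You are a strict grader.\n"
    "Question: {question}\n"
    "Gold answer: {gold}\n"
    "Predicted answer: {predicted}\n"
    'Does the predicted answer match the gold answer? Reply "yes" or "no".'
)

def is_answer_correct(question: str, gold: str, predicted: str,
                      judge: Callable[[str], str]) -> bool:
    """`judge` sends the prompt to the verifier model (e.g. GPT-4o)
    and returns its raw text reply; injecting it keeps this step testable."""
    prompt = VERIFY_PROMPT.format(question=question, gold=gold, predicted=predicted)
    return judge(prompt).strip().lower().startswith("yes")
```

In production, `judge` would wrap a real GPT-4o chat-completion call; instances for which it returns "no" are discarded rather than corrected.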
 
 