Improve language tag

#3
opened by lbourdois
Files changed (1)
  1. README.md +68 -56
README.md CHANGED
@@ -1,57 +1,69 @@
- ---
- license: apache-2.0
- datasets:
- - TIGER-Lab/WebInstruct-CFT
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-32B-Instruct
- tags:
- - cft
- - math
- - reasoning
- pipeline_tag: text-generation
- library_name: transformers
- ---
-
- # Qwen2.5-32B-Instruct-CFT
-
- <div style="display: flex; gap: 4px; align-items: center">
- <a target="_blank" href="https://github.com/TIGER-AI-Lab/CritiqueFinetuning">
- <img style="height:18pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github"/>
- </a>
- <a target="_blank" href="https://arxiv.org/abs/2501.17703">
- <img style="height:18pt" src="https://img.shields.io/badge/-Paper-green?style=flat&logo=arxiv"/>
- </a>
- <a target="_blank" href="https://tiger-ai-lab.github.io/CritiqueFineTuning">
- <img style="height:18pt" src="https://img.shields.io/badge/-📖%20Website-red?style=flat"/>
- </a>
- <a target="_blank" href="https://huggingface.co/datasets/TIGER-Lab/WebInstruct-CFT">
- <img style="height:18pt" src="https://img.shields.io/badge/-🤗%20Dataset-red?style=flat"/>
- </a>
- </div>
-
- ## Introduction
-
- Qwen2.5-32B-Instruct-CFT is a 32B parameter model fine-tuned using our novel Critique Fine-Tuning (CFT) approach. Built upon the Qwen2.5-32B-Instruct base model, this variant is trained to critique and analyze responses rather than simply imitate them, leading to enhanced reasoning capabilities.
-
- ## Key Features
-
- - Built on the powerful Qwen2.5-32B-Instruct foundation
- - Trained using Critique Fine-Tuning (CFT) methodology
- - Highly efficient training with minimal data requirements
- - Inherits the strong instruction-following capabilities of the base model
-
- ## Training Details
-
- ### Training Data
- - Dataset: [WebInstruct-CFT-4K](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-CFT-4K)
- - Training format: (input=[query; noisy response], output=critique)
- - Teacher model: GPT-4o for generating critiques
-
- ### Training Infrastructure
- - Framework: LLaMA-Factory
- - Hardware: 8x NVIDIA H100 GPUs
- - Training time: ~1.5 hours with DeepSpeed Zero-3
-
 
+ ---
+ license: apache-2.0
+ datasets:
+ - TIGER-Lab/WebInstruct-CFT
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-32B-Instruct
+ tags:
+ - cft
+ - math
+ - reasoning
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+
+ # Qwen2.5-32B-Instruct-CFT
+
+ <div style="display: flex; gap: 4px; align-items: center">
+ <a target="_blank" href="https://github.com/TIGER-AI-Lab/CritiqueFinetuning">
+ <img style="height:18pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github"/>
+ </a>
+ <a target="_blank" href="https://arxiv.org/abs/2501.17703">
+ <img style="height:18pt" src="https://img.shields.io/badge/-Paper-green?style=flat&logo=arxiv"/>
+ </a>
+ <a target="_blank" href="https://tiger-ai-lab.github.io/CritiqueFineTuning">
+ <img style="height:18pt" src="https://img.shields.io/badge/-📖%20Website-red?style=flat"/>
+ </a>
+ <a target="_blank" href="https://huggingface.co/datasets/TIGER-Lab/WebInstruct-CFT">
+ <img style="height:18pt" src="https://img.shields.io/badge/-🤗%20Dataset-red?style=flat"/>
+ </a>
+ </div>
+
+ ## Introduction
+
+ Qwen2.5-32B-Instruct-CFT is a 32B parameter model fine-tuned using our novel Critique Fine-Tuning (CFT) approach. Built upon the Qwen2.5-32B-Instruct base model, this variant is trained to critique and analyze responses rather than simply imitate them, leading to enhanced reasoning capabilities.
+
+ ## Key Features
+
+ - Built on the powerful Qwen2.5-32B-Instruct foundation
+ - Trained using Critique Fine-Tuning (CFT) methodology
+ - Highly efficient training with minimal data requirements
+ - Inherits the strong instruction-following capabilities of the base model
+
+ ## Training Details
+
+ ### Training Data
+ - Dataset: [WebInstruct-CFT-4K](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-CFT-4K)
+ - Training format: (input=[query; noisy response], output=critique)
+ - Teacher model: GPT-4o for generating critiques
+
+ ### Training Infrastructure
+ - Framework: LLaMA-Factory
+ - Hardware: 8x NVIDIA H100 GPUs
+ - Training time: ~1.5 hours with DeepSpeed Zero-3
+
  For more details about the model architecture, methodology, and comprehensive evaluation results, please visit our [project webpage](https://tiger-ai-lab.github.io/CritiqueFineTuning).
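The card describes the training format as (input=[query; noisy response], output=critique). A minimal sketch of what one such training pair might look like — field names and prompt wording here are illustrative assumptions, not the actual WebInstruct-CFT schema:

```python
def build_cft_example(query: str, noisy_response: str, critique: str) -> dict:
    """Pack one critique-finetuning pair: the model is shown the query plus a
    candidate (possibly wrong) response, and is trained to emit a critique."""
    # Prompt layout is a guess at the [query; noisy response] concatenation.
    prompt = (
        f"Question: {query}\n\n"
        f"Candidate solution: {noisy_response}\n\n"
        "Critique the candidate solution above. Point out any errors and "
        "state whether the final answer is correct."
    )
    return {"input": prompt, "output": critique}


example = build_cft_example(
    query="What is 17 * 24?",
    noisy_response="17 * 24 = 398",
    critique="Incorrect: 17 * 24 = 408, so the final answer 398 is wrong.",
)
print(example["input"])
```

The key design point of CFT, per the card, is that the target side is the critique itself (here from a GPT-4o teacher), not the corrected answer, so the model learns to analyze responses rather than imitate them.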