Update README.md
README.md
CHANGED
@@ -49,7 +49,7 @@ The model demonstrates that learning to critique is more effective than learning
 
 
 ### Training Data
-- Dataset: [WebInstruct-CFT-50K](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-CFT
+- Dataset: [WebInstruct-CFT-50K](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-CFT)
 - Training format: (input=[query; noisy response], output=critique)
 - Teacher model: GPT-4o for generating critiques
 
@@ -60,6 +60,11 @@ The model demonstrates that learning to critique is more effective than learning
 - Training time: ~1 hour with DeepSpeed Zero-3
 
 
+## Evaluation Results
+
+
+
+
 For more details about the model architecture, methodology, and comprehensive evaluation results, please visit our [project webpage](https://tiger-ai-lab.github.io/CritiqueFineTuning).
 
 
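The training format listed in the README, (input=[query; noisy response], output=critique), can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `CFTExample` class, the `build_cft_example` helper, the "Question:" / "Response:" prompt template, and the toy arithmetic sample do not come from the README or the dataset, and the actual WebInstruct-CFT schema may differ.

```python
from dataclasses import dataclass


@dataclass
class CFTExample:
    """One critique fine-tuning pair (hypothetical field names)."""
    input: str   # query concatenated with the noisy response
    output: str  # critique the model is trained to produce


def build_cft_example(query: str, noisy_response: str, critique: str) -> CFTExample:
    # Assumed template: the README only states that the input is
    # [query; noisy response]; the exact separator text is made up here.
    prompt = f"Question:\n{query.strip()}\n\nResponse:\n{noisy_response.strip()}"
    return CFTExample(input=prompt, output=critique.strip())


# Toy, made-up sample (not taken from the dataset).
example = build_cft_example(
    query="What is 7 * 8?",
    noisy_response="7 * 8 = 54",
    critique="The response is incorrect: 7 * 8 = 56, not 54.",
)
print(example.input)
print(example.output)
```

The point of the sketch is that the supervised target is the critique, not a corrected answer: the noisy response sits on the input side of the pair.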