wenhu commited on
Commit
6f413ca
·
1 Parent(s): 440f6e6

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ The models are fine-tuned with the MetricInstruct dataset using the original Lla
24
 
25
  TIGERScore significantly surpasses traditional metrics, i.e. BLUE, ROUGE, BARTScore, and BLEURT, and emerging LLM-based metrics as reference-free metrics. Though our dataset was originally sourced from ChatGPT, our distilled model actually outperforms ChatGPT itself, which proves the effectiveness of our filtering strategy. On the unseen task of story generation, TIGERScore also demonstrates reasonable generalization capability.
26
 
27
- | Tasks→ | Summarization | Translation | Data2Text | Long-form QA | MathQA | Instruction | Story-Gen | Average |
28
  |-------------------------------------------|----------------|----------------|----------------|-----------------|----------------|----------------|----------------|----------------|
29
  | GPT-3.5-turbo (few-shot) | **38.50** | 40.53 | 40.20 | 29.33 | **66.46** | 23.20 | 4.77 | 34.71 |
30
  | GPT-4 (zero-shot) | 36.46 | **43.87** | **44.04** | **48.95** | 51.71 | **58.53** | **32.48** | **45.15** |
 
24
 
25
  TIGERScore significantly surpasses traditional metrics, i.e. BLUE, ROUGE, BARTScore, and BLEURT, and emerging LLM-based metrics as reference-free metrics. Though our dataset was originally sourced from ChatGPT, our distilled model actually outperforms ChatGPT itself, which proves the effectiveness of our filtering strategy. On the unseen task of story generation, TIGERScore also demonstrates reasonable generalization capability.
26
 
27
+ | Tasks→ | Summarization | Translation | Data2Text | Long-form QA | MathQA | Instruction Following | Story-Gen | Average |
28
  |-------------------------------------------|----------------|----------------|----------------|-----------------|----------------|----------------|----------------|----------------|
29
  | GPT-3.5-turbo (few-shot) | **38.50** | 40.53 | 40.20 | 29.33 | **66.46** | 23.20 | 4.77 | 34.71 |
30
  | GPT-4 (zero-shot) | 36.46 | **43.87** | **44.04** | **48.95** | 51.71 | **58.53** | **32.48** | **45.15** |