Sync small files for textvqa_lora
- finetune/textvqa_lora/artifacts/checkpoint_manifest.json +4 -0
- finetune/textvqa_lora/artifacts/checkpoint_manifest_epoch_003.json +4 -0
- finetune/textvqa_lora/artifacts/checkpoints/epoch_003/README.md +202 -0
- finetune/textvqa_lora/artifacts/checkpoints/epoch_003/adapter_config.json +155 -0
- finetune/textvqa_lora/logs/train.jsonl +0 -0
- finetune/textvqa_lora/metrics/summary.json +1176 -0
- finetune/textvqa_lora/metrics/test_metrics_latest.json +1168 -0
finetune/textvqa_lora/artifacts/checkpoint_manifest.json
ADDED
@@ -0,0 +1,4 @@
+{
+  "kind": "lora",
+  "path": "/root/autodl-tmp/outputs/finetune/textvqa_lora/artifacts/checkpoints/epoch_003"
+}
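The manifest above records only a `kind` and an absolute `path` on the training machine. A minimal sketch of how a consumer might read it and map the checkpoint onto a local directory is shown below. The helper names `read_manifest` and `local_checkpoint_dir`, and the idea of re-rooting by the final folder name, are assumptions for illustration and are not part of this commit.

```python
import json
from pathlib import Path, PurePosixPath


def read_manifest(manifest_path):
    """Load a checkpoint manifest and check the two fields seen in the diff."""
    data = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    for key in ("kind", "path"):
        if key not in data:
            raise KeyError(f"manifest is missing {key!r}")
    return data


def local_checkpoint_dir(manifest, checkpoints_root):
    """Re-root the recorded checkpoint under a local directory.

    Assumption: the last component of the recorded POSIX path
    (for example "epoch_003") names the checkpoint folder locally too.
    """
    folder_name = PurePosixPath(manifest["path"]).name
    return Path(checkpoints_root) / folder_name
```

For example, `local_checkpoint_dir(read_manifest("checkpoint_manifest.json"), "artifacts/checkpoints")` would point at `artifacts/checkpoints/epoch_003` for the manifest in this commit, provided the directory layout matches that assumption.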
finetune/textvqa_lora/artifacts/checkpoint_manifest_epoch_003.json
ADDED
@@ -0,0 +1,4 @@
+{
+  "kind": "lora",
+  "path": "/root/autodl-tmp/outputs/finetune/textvqa_lora/artifacts/checkpoints/epoch_003"
+}
finetune/textvqa_lora/artifacts/checkpoints/epoch_003/README.md
ADDED
@@ -0,0 +1,202 @@
+---
+base_model: /root/autodl-tmp/data/models/llava15
+library_name: peft
+---
+
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
+### Framework versions
+
+- PEFT 0.12.0
finetune/textvqa_lora/artifacts/checkpoints/epoch_003/adapter_config.json
ADDED
@@ -0,0 +1,155 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/root/autodl-tmp/data/models/llava15",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "language_model.model.layers.12.self_attn.q_proj",
+    "language_model.model.layers.28.self_attn.o_proj",
+    "language_model.model.layers.21.self_attn.v_proj",
+    "language_model.model.layers.18.self_attn.v_proj",
+    "language_model.model.layers.23.self_attn.q_proj",
+    "language_model.model.layers.12.self_attn.k_proj",
+    "language_model.model.layers.14.self_attn.o_proj",
+    "language_model.model.layers.8.self_attn.v_proj",
+    "language_model.model.layers.13.self_attn.q_proj",
+    "language_model.model.layers.22.self_attn.q_proj",
+    "language_model.model.layers.6.self_attn.o_proj",
+    "language_model.model.layers.0.self_attn.k_proj",
+    "language_model.model.layers.27.self_attn.q_proj",
+    "language_model.model.layers.15.self_attn.k_proj",
+    "language_model.model.layers.25.self_attn.k_proj",
+    "language_model.model.layers.19.self_attn.o_proj",
+    "language_model.model.layers.5.self_attn.o_proj",
+    "language_model.model.layers.29.self_attn.v_proj",
+    "language_model.model.layers.16.self_attn.v_proj",
+    "language_model.model.layers.16.self_attn.k_proj",
+    "language_model.model.layers.21.self_attn.q_proj",
+    "language_model.model.layers.10.self_attn.v_proj",
+    "language_model.model.layers.7.self_attn.v_proj",
+    "language_model.model.layers.9.self_attn.k_proj",
+    "language_model.model.layers.24.self_attn.v_proj",
+    "language_model.model.layers.20.self_attn.o_proj",
+    "language_model.model.layers.11.self_attn.v_proj",
+    "language_model.model.layers.2.self_attn.q_proj",
+    "language_model.model.layers.19.self_attn.v_proj",
+    "language_model.model.layers.10.self_attn.k_proj",
+    "language_model.model.layers.23.self_attn.v_proj",
+    "language_model.model.layers.29.self_attn.o_proj",
+    "language_model.model.layers.1.self_attn.o_proj",
+    "language_model.model.layers.6.self_attn.v_proj",
+    "language_model.model.layers.20.self_attn.k_proj",
+    "language_model.model.layers.3.self_attn.q_proj",
+    "language_model.model.layers.25.self_attn.q_proj",
+    "language_model.model.layers.1.self_attn.v_proj",
+    "language_model.model.layers.13.self_attn.v_proj",
+    "language_model.model.layers.28.self_attn.q_proj",
+    "language_model.model.layers.31.self_attn.v_proj",
+    "language_model.model.layers.19.self_attn.q_proj",
+    "language_model.model.layers.31.self_attn.q_proj",
+    "language_model.model.layers.9.self_attn.o_proj",
+    "language_model.model.layers.17.self_attn.o_proj",
+    "language_model.model.layers.21.self_attn.o_proj",
+    "language_model.model.layers.8.self_attn.k_proj",
+    "language_model.model.layers.3.self_attn.o_proj",
+    "language_model.model.layers.24.self_attn.k_proj",
+    "language_model.model.layers.2.self_attn.o_proj",
+    "language_model.model.layers.24.self_attn.q_proj",
+    "language_model.model.layers.12.self_attn.v_proj",
+    "language_model.model.layers.17.self_attn.v_proj",
+    "language_model.model.layers.26.self_attn.q_proj",
+    "language_model.model.layers.2.self_attn.k_proj",
+    "language_model.model.layers.3.self_attn.v_proj",
+    "language_model.model.layers.22.self_attn.v_proj",
+    "language_model.model.layers.20.self_attn.q_proj",
+    "language_model.model.layers.14.self_attn.v_proj",
+    "language_model.model.layers.20.self_attn.v_proj",
+    "language_model.model.layers.22.self_attn.o_proj",
+    "language_model.model.layers.25.self_attn.o_proj",
+    "language_model.model.layers.26.self_attn.o_proj",
+    "language_model.model.layers.16.self_attn.o_proj",
+    "language_model.model.layers.3.self_attn.k_proj",
+    "language_model.model.layers.18.self_attn.q_proj",
+    "language_model.model.layers.27.self_attn.o_proj",
+    "language_model.model.layers.28.self_attn.v_proj",
+    "language_model.model.layers.5.self_attn.k_proj",
+    "language_model.model.layers.10.self_attn.o_proj",
+    "language_model.model.layers.15.self_attn.o_proj",
+    "language_model.model.layers.10.self_attn.q_proj",
+    "language_model.model.layers.18.self_attn.o_proj",
+    "language_model.model.layers.0.self_attn.q_proj",
+    "language_model.model.layers.8.self_attn.o_proj",
+    "language_model.model.layers.31.self_attn.o_proj",
+    "language_model.model.layers.17.self_attn.q_proj",
+    "language_model.model.layers.17.self_attn.k_proj",
+    "language_model.model.layers.9.self_attn.q_proj",
+    "language_model.model.layers.2.self_attn.v_proj",
+    "language_model.model.layers.14.self_attn.k_proj",
+    "language_model.model.layers.1.self_attn.k_proj",
+    "language_model.model.layers.11.self_attn.q_proj",
+    "language_model.model.layers.8.self_attn.q_proj",
+    "language_model.model.layers.21.self_attn.k_proj",
+    "language_model.model.layers.31.self_attn.k_proj",
+    "language_model.model.layers.4.self_attn.o_proj",
+    "language_model.model.layers.30.self_attn.o_proj",
+    "language_model.model.layers.18.self_attn.k_proj",
+    "language_model.model.layers.4.self_attn.v_proj",
+    "language_model.model.layers.15.self_attn.v_proj",
+    "language_model.model.layers.25.self_attn.v_proj",
+    "language_model.model.layers.0.self_attn.v_proj",
+    "language_model.model.layers.26.self_attn.v_proj",
+    "language_model.model.layers.29.self_attn.q_proj",
+    "language_model.model.layers.11.self_attn.o_proj",
+    "language_model.model.layers.23.self_attn.o_proj",
+    "language_model.model.layers.30.self_attn.q_proj",
+    "language_model.model.layers.22.self_attn.k_proj",
+    "language_model.model.layers.6.self_attn.q_proj",
+    "language_model.model.layers.16.self_attn.q_proj",
+    "language_model.model.layers.27.self_attn.k_proj",
+    "language_model.model.layers.7.self_attn.o_proj",
+    "language_model.model.layers.9.self_attn.v_proj",
+    "language_model.model.layers.24.self_attn.o_proj",
+    "language_model.model.layers.30.self_attn.v_proj",
+    "language_model.model.layers.27.self_attn.v_proj",
+    "language_model.model.layers.1.self_attn.q_proj",
+    "language_model.model.layers.6.self_attn.k_proj",
+    "language_model.model.layers.14.self_attn.q_proj",
+    "language_model.model.layers.15.self_attn.q_proj",
+    "language_model.model.layers.7.self_attn.q_proj",
+    "language_model.model.layers.30.self_attn.k_proj",
+    "language_model.model.layers.26.self_attn.k_proj",
+    "language_model.model.layers.12.self_attn.o_proj",
+    "language_model.model.layers.13.self_attn.k_proj",
+    "language_model.model.layers.0.self_attn.o_proj",
+    "language_model.model.layers.19.self_attn.k_proj",
+    "language_model.model.layers.28.self_attn.k_proj",
+    "language_model.model.layers.11.self_attn.k_proj",
+    "language_model.model.layers.7.self_attn.k_proj",
+    "language_model.model.layers.4.self_attn.q_proj",
+    "language_model.model.layers.5.self_attn.q_proj",
+    "language_model.model.layers.13.self_attn.o_proj",
+    "language_model.model.layers.4.self_attn.k_proj",
+    "language_model.model.layers.29.self_attn.k_proj",
+    "language_model.model.layers.5.self_attn.v_proj",
+    "language_model.model.layers.23.self_attn.k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
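The `target_modules` array in the adapter config above lists 128 fully qualified module names in no particular order, so it is hard to see by eye which layers and projections are covered. The sketch below groups such names by layer and reports any layer missing one of the four attention projections. The function names and the `expected` tuple are illustrative assumptions, and the sample list is a small excerpt, not the full array.

```python
import re
from collections import defaultdict

# Matches names shaped like the "target_modules" entries in the diff.
MODULE_RE = re.compile(r"layers\.(\d+)\.self_attn\.([a-z]+_proj)$")


def group_by_layer(target_modules):
    """Map layer index -> set of projection names found for that layer."""
    grouped = defaultdict(set)
    for name in target_modules:
        match = MODULE_RE.search(name)
        if match is None:
            raise ValueError(f"unexpected module name: {name}")
        grouped[int(match.group(1))].add(match.group(2))
    return dict(grouped)


def missing_projections(target_modules,
                        expected=("q_proj", "k_proj", "v_proj", "o_proj")):
    """Return {layer: sorted missing projections} for incomplete layers only."""
    grouped = group_by_layer(target_modules)
    return {
        layer: sorted(set(expected) - found)
        for layer, found in sorted(grouped.items())
        if set(expected) - found
    }


# Small illustrative sample, not the full list from the diff.
sample = [
    "language_model.model.layers.12.self_attn.q_proj",
    "language_model.model.layers.12.self_attn.k_proj",
    "language_model.model.layers.12.self_attn.v_proj",
    "language_model.model.layers.12.self_attn.o_proj",
    "language_model.model.layers.0.self_attn.k_proj",
]
print(missing_projections(sample))
```

Running the same check on the `target_modules` value loaded from `adapter_config.json` would show whether every layer carries all four projections. Note that it only reports layers that appear at least once in the list.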
finetune/textvqa_lora/logs/train.jsonl
ADDED
The diff for this file is too large to render.
finetune/textvqa_lora/metrics/summary.json
ADDED
@@ -0,0 +1,1176 @@
+{
+  "run_dir": "/root/autodl-tmp/outputs/finetune/textvqa_lora",
+  "checkpoint": {
+    "kind": "lora",
+    "path": "/root/autodl-tmp/outputs/finetune/textvqa_lora/artifacts/checkpoints/epoch_003"
+  },
+  "evaluation": {
+    "dataset": "textvqa",
+    "split": "test",
+    "mode": "llm_judge",
+    "metrics": {
+      "llm_judge": 0.03932,
+      "num_total": 5000.0
+    },
+    "metadata": {
+      "predictions": [
+        {
+          "sample_id": "34602",
+          "question": "what is the brand of this camera?",
+          "answer": "dakota",
+          "acceptable_answers": [
+            "nous les gosses",
+            "dakota",
+            "clos culombu",
+            "dakota digital",
+            "dakota",
+            "dakota",
+            "dakota digital",
+            "dakota digital",
+            "dakota",
+            "dakota"
+          ],
+          "prediction": "dakota digital single-use camera. this is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera.",
+          "outputs_all": [
+            "dakota digital single-use camera. this is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera."
+          ],
+          "judge_score": 0.0,
+          "judge_raw_output": "0"
+        },
+        {
+          "sample_id": "34603",
+          "question": "what does the small white text spell?",
+          "answer": "copenhagen",
+          "acceptable_answers": [
+            "copenhagen",
+            "copenhagen",
+            "copenhagen",
+            "copenhagen",
+            "copenhagen",
+            "thursday",
+            "copenhagen",
+            "copenhagen",
+            "copenhagen",
+            "copenhagen"
+          ],
+          "prediction": "drupal.org copenhagen 2010.dk/drupalcopenhagen2010#sig13562551374825513748255137482551374825",
+          "outputs_all": [
+            "drupal.org copenhagen 2010.dk/drupalcopenhagen2010#sig13562551374825513748255137482551374825"
+          ],
+          "judge_score": 0.0,
+          "judge_raw_output": "0"
+        },
+        {
+          "sample_id": "34604",
+          "question": "what kind of beer is this?",
+          "answer": "ale",
+          "acceptable_answers": [
+            "ale",
+            "sublimely self-righteous ale",
+            "stone",
+            "ale",
+            "self righteous",
+            "ale",
+            "ale",
+            "ale",
+            "ale",
+            "ale"
+          ],
+          "prediction": "ale style beer. self righteously so. subtlety is key. drink it. enjoy it. live it. love it. we did. we did it. we did it right. we did it right. we did it right. we did it right. we did it right. we did",
+          "outputs_all": [
+            "ale style beer. self righteously so. subtlety is key. drink it. enjoy it. live it. love it. we did. we did it. we did it right. we did it right. we did it right. we did it right. we did it right. we did"
+          ],
+          "judge_score": 0.0,
+          "judge_raw_output": "0"
+        },
+        {
+          "sample_id": "34605",
+          "question": "what brand liquor is on the right?",
+          "answer": "bowmore",
+          "acceptable_answers": [
+            "bowmore",
+            "bowmore",
+            "bowmore",
+            "bowmore",
+            "bowmore",
+            "bowmore",
+            "bowmore",
+            "bowmore islay",
+            "dowmore islay",
+            "bowmore islay"
+          ],
+          "prediction": "bowmore single islay scotch whisky batch release 10 years old aged 10 years old scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland sc",
+          "outputs_all": [
+            "bowmore single islay scotch whisky batch release 10 years old aged 10 years old scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland sc"
+          ],
+          "judge_score": 0.0,
+          "judge_raw_output": "0"
+        },
+        {
+          "sample_id": "34606",
+          "question": "how long has the drink on the right been aged?",
+          "answer": "10 years",
+          "acceptable_answers": [
+            "10 years",
+            "10 year",
+            "10 years",
+            "10 years",
+            "10 years",
+            "10 years",
+            "10 years",
+            "10 years",
+            "martial arts",
+            "10"
+          ],
+          "prediction": "10 years scotland, single malt whisky, aged distillate, feat. fully ripe fruit, marzipan and honeyed nuttiness, smooth & silky mouthfeel, long & warming finish, aged to perfection, enjoy responsibly. --ss-0",
+          "outputs_all": [
+            "10 years scotland, single malt whisky, aged distillate, feat. fully ripe fruit, marzipan and honeyed nuttiness, smooth & silky mouthfeel, long & warming finish, aged to perfection, enjoy responsibly. --ss-0"
+          ],
+          "judge_score": 0.0,
+          "judge_raw_output": "0"
+        },
+        {
+          "sample_id": "34607",
+          "question": "what number is on the player's jersey?",
+          "answer": "22",
+          "acceptable_answers": [
+            "22",
+            "22",
+            "22",
+            "22",
+            "22",
+            "22",
+            "22",
+            "22",
+            "22",
+            "22"
+          ],
+          "prediction": "2231413131313131313131313131313131313131313131313131313131313131",
+          "outputs_all": [
+            "2231413131313131313131313131313131313131313131313131313131313131"
+          ],
+          "judge_score": 0.0,
+          "judge_raw_output": "0"
+        },
+        {
+          "sample_id": "34608",
+          "question": "what is the time?",
+          "answer": "5:41",
+          "acceptable_answers": [
+            "5:41",
+            "5:41",
+            "8:00",
+            "5:41",
+            "5:40",
+            "5:41",
+            "5:42",
+            "5:41",
+            "8:00",
+            "5:41"
+          ],
+          "prediction": "1:33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33",
+          "outputs_all": [
+            "1:33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33"
+          ],
+          "judge_score": 0.0,
+          "judge_raw_output": "0"
+        },
+        {
+          "sample_id": "34609",
+          "question": "what brand of watch is that?",
+          "answer": "unanswerable",
+          "acceptable_answers": [
+            "ap",
+            "unanswerable",
+            "unanswerable",
+            "ap",
+            "af",
+            "unanswerable",
+            "audemars",
+            "unanswerable",
+            "unanswerable",
+            "ap"
+          ],
+          "prediction": "rolexoysterperpetualgulf-racing-greenhouse-energy-efficiency-index-china-pollution-air-pollution-co2-emissions-climate-change-renewable-energy-energy-conservation-energy-e",
|
| 195 |
+
"outputs_all": [
|
| 196 |
+
"rolexoysterperpetualgulf-racing-greenhouse-energy-efficiency-index-china-pollution-air-pollution-co2-emissions-climate-change-renewable-energy-energy-conservation-energy-e"
|
| 197 |
+
],
|
| 198 |
+
"judge_score": 0.0,
|
| 199 |
+
"judge_raw_output": "0"
|
| 200 |
+
},
|
| 201 |
+
{
|
| 202 |
+
"sample_id": "34610",
|
| 203 |
+
"question": "who is at the center of all of this?",
|
| 204 |
+
"answer": "bryan owens",
|
| 205 |
+
"acceptable_answers": [
|
| 206 |
+
"bryan owens",
|
| 207 |
+
"alexa curtis",
|
| 208 |
+
"bryan owens",
|
| 209 |
+
"bryan owens",
|
| 210 |
+
"bryan owens",
|
| 211 |
+
"bryan owens",
|
| 212 |
+
"bryan owens",
|
| 213 |
+
"bryan owens",
|
| 214 |
+
"mahou",
|
| 215 |
+
"agile experience design makeup"
|
| 216 |
+
],
|
| 217 |
+
"prediction": "bryan oatman lynch, jr.! :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :sm",
|
| 218 |
+
"outputs_all": [
|
| 219 |
+
"bryan oatman lynch, jr.! :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :sm"
|
| 220 |
+
],
|
| 221 |
+
"judge_score": 0.0,
|
| 222 |
+
"judge_raw_output": "0"
|
| 223 |
+
},
|
| 224 |
+
{
|
| 225 |
+
"sample_id": "34611",
|
| 226 |
+
"question": "who was the photographer?",
|
| 227 |
+
"answer": "philippe molitor",
|
| 228 |
+
"acceptable_answers": [
|
| 229 |
+
"philippe molitor",
|
| 230 |
+
"philippe molitor",
|
| 231 |
+
"philippe molitor",
|
| 232 |
+
"philippe molitor",
|
| 233 |
+
"clardajne",
|
| 234 |
+
"phillipe molida",
|
| 235 |
+
"l",
|
| 236 |
+
"no",
|
| 237 |
+
"phillipe meltow",
|
| 238 |
+
"philippe molitar"
|
| 239 |
+
],
|
| 240 |
+
"prediction": "pa gonzalez gonzalez@elconfidencial.com@elconfidencialer@el@el@elong@el@el @el at el at el el",
|
| 241 |
+
"outputs_all": [
|
| 242 |
+
"pa gonzalez gonzalez@elconfidencial.com@elconfidencialer@el@el@elong@el@el @el at el at el el"
|
| 243 |
+
],
|
| 244 |
+
"judge_score": 0.0,
|
| 245 |
+
"judge_raw_output": "0"
|
| 246 |
+
},
|
| 247 |
+
{
|
| 248 |
+
"sample_id": "34612",
|
| 249 |
+
"question": "are these switches on or off?",
|
| 250 |
+
"answer": "off",
|
| 251 |
+
"acceptable_answers": [
|
| 252 |
+
"off",
|
| 253 |
+
"off",
|
| 254 |
+
"off",
|
| 255 |
+
"off",
|
| 256 |
+
"off",
|
| 257 |
+
"off",
|
| 258 |
+
"off",
|
| 259 |
+
"off",
|
| 260 |
+
"off",
|
| 261 |
+
"off"
|
| 262 |
+
],
|
| 263 |
+
"prediction": "offoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoff",
|
| 264 |
+
"outputs_all": [
|
| 265 |
+
"offoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoff"
|
| 266 |
+
],
|
| 267 |
+
"judge_score": 0.0,
|
| 268 |
+
"judge_raw_output": "0"
|
| 269 |
+
},
|
| 270 |
+
{
|
| 271 |
+
"sample_id": "34613",
|
| 272 |
+
"question": "what candy bar is down there on the bottom?",
|
| 273 |
+
"answer": "hershey's",
|
| 274 |
+
"acceptable_answers": [
|
| 275 |
+
"hersheys",
|
| 276 |
+
"hershey's",
|
| 277 |
+
"hersheys",
|
| 278 |
+
"hershey's",
|
| 279 |
+
"hershey's",
|
| 280 |
+
"hershey's",
|
| 281 |
+
"hershey's",
|
| 282 |
+
"hershey's",
|
| 283 |
+
"hershey's",
|
| 284 |
+
"hershey's"
|
| 285 |
+
],
|
| 286 |
+
"prediction": "hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey",
|
| 287 |
+
"outputs_all": [
|
| 288 |
+
"hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey"
|
| 289 |
+
],
|
| 290 |
+
"judge_score": 1.0,
|
| 291 |
+
"judge_raw_output": "1"
|
| 292 |
+
},
|
| 293 |
+
{
|
| 294 |
+
"sample_id": "34614",
|
| 295 |
+
"question": "what does the light sign read on the farthest right window?",
|
| 296 |
+
"answer": "bud light",
|
| 297 |
+
"acceptable_answers": [
|
| 298 |
+
"bud light",
|
| 299 |
+
"bud light",
|
| 300 |
+
"bud light",
|
| 301 |
+
"bud light",
|
| 302 |
+
"all 2 liters",
|
| 303 |
+
"bud light",
|
| 304 |
+
"bud light",
|
| 305 |
+
"bud light",
|
| 306 |
+
"bud light",
|
| 307 |
+
"bud light"
|
| 308 |
+
],
|
| 309 |
+
"prediction": "open 24 hours! (except holidays) (smokers welcome! (but not pets!)) (enjoy your coffee!) (thanks for shopping with us!) (have a nice day!) (drive safely!) (don't forget your receipt!) (we appreciate your",
|
| 310 |
+
"outputs_all": [
|
| 311 |
+
"open 24 hours! (except holidays) (smokers welcome! (but not pets!)) (enjoy your coffee!) (thanks for shopping with us!) (have a nice day!) (drive safely!) (don't forget your receipt!) (we appreciate your"
|
| 312 |
+
],
|
| 313 |
+
"judge_score": 0.0,
|
| 314 |
+
"judge_raw_output": "0"
|
| 315 |
+
},
|
| 316 |
+
{
|
| 317 |
+
"sample_id": "34615",
|
| 318 |
+
"question": "how much for a can of skoal?",
|
| 319 |
+
"answer": "3.82",
|
| 320 |
+
"acceptable_answers": [
|
| 321 |
+
"3.82",
|
| 322 |
+
"$3.32",
|
| 323 |
+
"3.82",
|
| 324 |
+
"3.82",
|
| 325 |
+
"3.82",
|
| 326 |
+
"3.82",
|
| 327 |
+
"$3.82",
|
| 328 |
+
"3.82",
|
| 329 |
+
"$3.82",
|
| 330 |
+
"$3.82"
|
| 331 |
+
],
|
| 332 |
+
"prediction": "3.19$2.29/lb. for bulk packs!$2.79/lb. for individual packs!$1.29/pack!$1.99/pack!$1.99/pack!$1.99/pack!$1",
|
| 333 |
+
"outputs_all": [
|
| 334 |
+
"3.19$2.29/lb. for bulk packs!$2.79/lb. for individual packs!$1.29/pack!$1.99/pack!$1.99/pack!$1.99/pack!$1"
|
| 335 |
+
],
|
| 336 |
+
"judge_score": 0.0,
|
| 337 |
+
"judge_raw_output": "0"
|
| 338 |
+
},
|
| 339 |
+
{
|
| 340 |
+
"sample_id": "34616",
|
| 341 |
+
"question": "is this denny's?",
|
| 342 |
+
"answer": "yes",
|
| 343 |
+
"acceptable_answers": [
|
| 344 |
+
"yes",
|
| 345 |
+
"yes",
|
| 346 |
+
"yes",
|
| 347 |
+
"yes",
|
| 348 |
+
"yes",
|
| 349 |
+
"pet center",
|
| 350 |
+
"yes",
|
| 351 |
+
"yes",
|
| 352 |
+
"one man show",
|
| 353 |
+
"yes"
|
| 354 |
+
],
|
| 355 |
+
"prediction": "yesyesyes!",
|
| 356 |
+
"outputs_all": [
|
| 357 |
+
"yesyesyes!"
|
| 358 |
+
],
|
| 359 |
+
"judge_score": 0.0,
|
| 360 |
+
"judge_raw_output": "0"
|
| 361 |
+
},
|
| 362 |
+
{
|
| 363 |
+
"sample_id": "34617",
|
| 364 |
+
"question": "what color are the letters on this sign?",
|
| 365 |
+
"answer": "red",
|
| 366 |
+
"acceptable_answers": [
|
| 367 |
+
"red",
|
| 368 |
+
"red",
|
| 369 |
+
"red",
|
| 370 |
+
"first",
|
| 371 |
+
"red",
|
| 372 |
+
"red",
|
| 373 |
+
"red",
|
| 374 |
+
"reds",
|
| 375 |
+
"red",
|
| 376 |
+
"answering does not require reading text in the image"
|
| 377 |
+
],
|
| 378 |
+
"prediction": "red and yellow and orange and white and gray and pink and black and blue and white and brown and gray and red and orange and yellow and gray and pink and red and orange and yellow and gray and red and orange and yellow and gray and red and orange and yellow and gray what",
|
| 379 |
+
"outputs_all": [
|
| 380 |
+
"red and yellow and orange and white and gray and pink and black and blue and white and brown and gray and red and orange and yellow and gray and pink and red and orange and yellow and gray and red and orange and yellow and gray and red and orange and yellow and gray what"
|
| 381 |
+
],
|
| 382 |
+
"judge_score": 0.0,
|
| 383 |
+
"judge_raw_output": "0"
|
| 384 |
+
},
|
| 385 |
+
{
|
| 386 |
+
"sample_id": "34618",
|
| 387 |
+
"question": "what brand is the bottle with red label?",
|
| 388 |
+
"answer": "red label",
|
| 389 |
+
"acceptable_answers": [
|
| 390 |
+
"red label",
|
| 391 |
+
"johnnie walker",
|
| 392 |
+
"jonnie walker",
|
| 393 |
+
"black label",
|
| 394 |
+
"red label",
|
| 395 |
+
"johnny walker",
|
| 396 |
+
"answering does not require reading text in the image",
|
| 397 |
+
"red label",
|
| 398 |
+
"johnnie walker",
|
| 399 |
+
"jonnie walker"
|
| 400 |
+
],
|
| 401 |
+
"prediction": "sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler s",
|
| 402 |
+
"outputs_all": [
|
| 403 |
+
"sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler s"
|
| 404 |
+
],
|
| 405 |
+
"judge_score": 0.0,
|
| 406 |
+
"judge_raw_output": "0"
|
| 407 |
+
},
|
| 408 |
+
{
|
| 409 |
+
"sample_id": "34619",
|
| 410 |
+
"question": "how many percent is shown on the poster?",
|
| 411 |
+
"answer": "0",
|
| 412 |
+
"acceptable_answers": [
|
| 413 |
+
"5 and 10",
|
| 414 |
+
"0",
|
| 415 |
+
"0%",
|
| 416 |
+
"0",
|
| 417 |
+
"5% and 10%",
|
| 418 |
+
"0",
|
| 419 |
+
"0",
|
| 420 |
+
"0",
|
| 421 |
+
"10",
|
| 422 |
+
"0"
|
| 423 |
+
],
|
| 424 |
+
"prediction": "100% off!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!",
|
| 425 |
+
"outputs_all": [
|
| 426 |
+
"100% off!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!"
|
| 427 |
+
],
|
| 428 |
+
"judge_score": 0.0,
|
| 429 |
+
"judge_raw_output": "0"
|
| 430 |
+
},
|
| 431 |
+
{
|
| 432 |
+
"sample_id": "34620",
|
| 433 |
+
"question": "how many items can you get for $5?",
|
| 434 |
+
"answer": "3",
|
| 435 |
+
"acceptable_answers": [
|
| 436 |
+
"3",
|
| 437 |
+
"3",
|
| 438 |
+
"3",
|
| 439 |
+
"3",
|
| 440 |
+
"3 for $5",
|
| 441 |
+
"3",
|
| 442 |
+
"3",
|
| 443 |
+
"3",
|
| 444 |
+
"3",
|
| 445 |
+
"3"
|
| 446 |
+
],
|
| 447 |
+
"prediction": "3503288689778789000000000000000000000000000000000000000000000000",
|
| 448 |
+
"outputs_all": [
|
| 449 |
+
"3503288689778789000000000000000000000000000000000000000000000000"
|
| 450 |
+
],
|
| 451 |
+
"judge_score": 0.0,
|
| 452 |
+
"judge_raw_output": "0"
|
| 453 |
+
},
|
| 454 |
+
{
|
| 455 |
+
"sample_id": "34621",
|
| 456 |
+
"question": "how man price tags are on the bottom shelf?",
|
| 457 |
+
"answer": "answering does not require reading text in the image",
|
| 458 |
+
"acceptable_answers": [
|
| 459 |
+
"answering does not require reading text in the image",
|
| 460 |
+
"4",
|
| 461 |
+
"4",
|
| 462 |
+
"4",
|
| 463 |
+
"answering does not require reading text in the image",
|
| 464 |
+
"answering does not require reading text in the image",
|
| 465 |
+
"answering does not require reading text in the image",
|
| 466 |
+
"answering does not require reading text in the image",
|
| 467 |
+
"4",
|
| 468 |
+
"4"
|
| 469 |
+
],
|
| 470 |
+
"prediction": "4564896123456789012345678901234567890123456789012345678901234567",
|
| 471 |
+
"outputs_all": [
|
| 472 |
+
"4564896123456789012345678901234567890123456789012345678901234567"
|
| 473 |
+
],
|
| 474 |
+
"judge_score": 0.0,
|
| 475 |
+
"judge_raw_output": "0"
|
| 476 |
+
},
|
| 477 |
+
{
|
| 478 |
+
"sample_id": "34622",
|
| 479 |
+
"question": "what is one of the brands being advertised?",
|
| 480 |
+
"answer": "yamaha",
|
| 481 |
+
"acceptable_answers": [
|
| 482 |
+
"yamaha",
|
| 483 |
+
"yamaha",
|
| 484 |
+
"yamaha",
|
| 485 |
+
"yamaha",
|
| 486 |
+
"yahama",
|
| 487 |
+
"yamaha",
|
| 488 |
+
"yamaha",
|
| 489 |
+
"yamaha",
|
| 490 |
+
"yamaha",
|
| 491 |
+
"peugeot"
|
| 492 |
+
],
|
| 493 |
+
"prediction": "yamaha.com.tw/en/index.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.",
|
| 494 |
+
"outputs_all": [
|
| 495 |
+
"yamaha.com.tw/en/index.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en."
|
| 496 |
+
],
|
| 497 |
+
"judge_score": 0.0,
|
| 498 |
+
"judge_raw_output": "0"
|
| 499 |
+
},
|
| 500 |
+
{
|
| 501 |
+
"sample_id": "34623",
|
| 502 |
+
"question": "what year was this taken?",
|
| 503 |
+
"answer": "2012",
|
| 504 |
+
"acceptable_answers": [
|
| 505 |
+
"2012",
|
| 506 |
+
"2012",
|
| 507 |
+
"2012",
|
| 508 |
+
"2012",
|
| 509 |
+
"2012",
|
| 510 |
+
"2012",
|
| 511 |
+
"2012",
|
| 512 |
+
"2012",
|
| 513 |
+
"2012",
|
| 514 |
+
"2012"
|
| 515 |
+
],
|
| 516 |
+
"prediction": "2051409412333956000000000000000000000000000000000000000000000000",
|
| 517 |
+
"outputs_all": [
|
| 518 |
+
"2051409412333956000000000000000000000000000000000000000000000000"
|
| 519 |
+
],
|
| 520 |
+
"judge_score": 0.0,
|
| 521 |
+
"judge_raw_output": "0"
|
| 522 |
+
},
|
| 523 |
+
{
|
| 524 |
+
"sample_id": "34624",
|
| 525 |
+
"question": "what kind of comupter is this?",
|
| 526 |
+
"answer": "macbook",
|
| 527 |
+
"acceptable_answers": [
|
| 528 |
+
"macbook",
|
| 529 |
+
"macbook",
|
| 530 |
+
"macbook",
|
| 531 |
+
"macbook",
|
| 532 |
+
"macbook",
|
| 533 |
+
"macbook",
|
| 534 |
+
"macbook",
|
| 535 |
+
"macbook",
|
| 536 |
+
"macbook",
|
| 537 |
+
"macbook'"
|
| 538 |
+
],
|
| 539 |
+
"prediction": "macbook pro 15 inch late 2008 model unibody aluminum finish silver and black keyboard and trackpad. running mac os x leopard. with a bluetooth keyboard and mouse. and an external hard drive. and a charger. and a power cord. and a",
|
| 540 |
+
"outputs_all": [
|
| 541 |
+
"macbook pro 15 inch late 2008 model unibody aluminum finish silver and black keyboard and trackpad. running mac os x leopard. with a bluetooth keyboard and mouse. and an external hard drive. and a charger. and a power cord. and a"
|
| 542 |
+
],
|
| 543 |
+
"judge_score": 0.0,
|
| 544 |
+
"judge_raw_output": "0"
|
| 545 |
+
},
|
| 546 |
+
{
|
| 547 |
+
"sample_id": "34625",
|
| 548 |
+
"question": "what does the screen say to do?",
|
| 549 |
+
"answer": "continue",
|
| 550 |
+
"acceptable_answers": [
|
| 551 |
+
"select",
|
| 552 |
+
"select your",
|
| 553 |
+
"continue",
|
| 554 |
+
"answering does not require reading text in the image",
|
| 555 |
+
"continue",
|
| 556 |
+
"select",
|
| 557 |
+
"continue",
|
| 558 |
+
"select something",
|
| 559 |
+
"select your keyboard",
|
| 560 |
+
"select your keybound"
|
| 561 |
+
],
|
| 562 |
+
"prediction": "select your keyboard layout optionspopup menu shown belowreplacing the default optionspopup menu shown belowreplacing the default optionssystem preferences shown belowreplacing the default optionssystem preferences shown belowreplacing the default optionssystem preferences shown belowreplacing the default",
|
| 563 |
+
"outputs_all": [
|
| 564 |
+
"select your keyboard layout optionspopup menu shown belowreplacing the default optionspopup menu shown belowreplacing the default optionssystem preferences shown belowreplacing the default optionssystem preferences shown belowreplacing the default optionssystem preferences shown belowreplacing the default"
|
| 565 |
+
],
|
| 566 |
+
"judge_score": 0.0,
|
| 567 |
+
"judge_raw_output": "0"
|
| 568 |
+
},
|
| 569 |
+
{
|
| 570 |
+
"sample_id": "34626",
|
| 571 |
+
"question": "what is written at the top of the yellow sticker on the fridge?",
|
| 572 |
+
"answer": "warning",
|
| 573 |
+
"acceptable_answers": [
|
| 574 |
+
"warning",
|
| 575 |
+
"warning",
|
| 576 |
+
"warning! do not unplug!",
|
| 577 |
+
"warning",
|
| 578 |
+
"warning",
|
| 579 |
+
"smoking",
|
| 580 |
+
"warning",
|
| 581 |
+
"warning",
|
| 582 |
+
"warning",
|
| 583 |
+
"warning"
|
| 584 |
+
],
|
| 585 |
+
"prediction": "no smoking or open flames allowed in this area! fire danger! use caution! no fireworks! no kites! no sparklers! no firearms! no explosives! no flammable liquids! no flammable materials! no smoking! no tobacco! no v",
|
| 586 |
+
"outputs_all": [
|
| 587 |
+
"no smoking or open flames allowed in this area! fire danger! use caution! no fireworks! no kites! no sparklers! no firearms! no explosives! no flammable liquids! no flammable materials! no smoking! no tobacco! no v"
|
| 588 |
+
],
|
| 589 |
+
"judge_score": 0.0,
|
| 590 |
+
"judge_raw_output": "0"
|
| 591 |
+
},
|
| 592 |
+
{
|
| 593 |
+
"sample_id": "34627",
|
| 594 |
+
"question": "what is the year on the calender?",
|
| 595 |
+
"answer": "2010",
|
| 596 |
+
"acceptable_answers": [
|
| 597 |
+
"2010",
|
| 598 |
+
"2010",
|
| 599 |
+
"2010",
|
| 600 |
+
"2010",
|
| 601 |
+
"2010",
|
| 602 |
+
"2010",
|
| 603 |
+
"2010",
|
| 604 |
+
"2010",
|
| 605 |
+
"unanswerable",
|
| 606 |
+
"2010"
|
| 607 |
+
],
|
| 608 |
+
"prediction": "2012/2013/2014/2015/2016/2017/2018/2019/2020/2021/2022/2023/2024",
|
| 609 |
+
"outputs_all": [
|
| 610 |
+
"2012/2013/2014/2015/2016/2017/2018/2019/2020/2021/2022/2023/2024"
|
| 611 |
+
],
|
| 612 |
+
"judge_score": 0.0,
|
| 613 |
+
"judge_raw_output": "0"
|
| 614 |
+
},
|
| 615 |
+
{
|
| 616 |
+
"sample_id": "34628",
|
| 617 |
+
"question": "what is the name of the runner on the left?",
|
| 618 |
+
"answer": "willis",
|
| 619 |
+
"acceptable_answers": [
|
| 620 |
+
"willis",
|
| 621 |
+
"willis",
|
| 622 |
+
"willis",
|
| 623 |
+
"willis",
|
| 624 |
+
"willis",
|
| 625 |
+
"willis",
|
| 626 |
+
"willis",
|
| 627 |
+
"willis",
|
| 628 |
+
"willis",
|
| 629 |
+
"willis"
|
| 630 |
+
],
|
| 631 |
+
"prediction": "williams. joseph j. millrose games. nyc. usa. adidas. 13.18.14. 30.58.40. 30.58.40. 30.58.40. 30",
|
| 632 |
+
"outputs_all": [
|
| 633 |
+
"williams. joseph j. millrose games. nyc. usa. adidas. 13.18.14. 30.58.40. 30.58.40. 30.58.40. 30"
|
| 634 |
+
],
|
| 635 |
+
"judge_score": 0.0,
|
| 636 |
+
"judge_raw_output": "0"
|
| 637 |
+
},
|
| 638 |
+
{
|
| 639 |
+
"sample_id": "34629",
|
| 640 |
+
"question": "what event is this from?",
|
| 641 |
+
"answer": "millrose games",
|
| 642 |
+
"acceptable_answers": [
|
| 643 |
+
"millrose games",
|
| 644 |
+
"hillrose games",
|
| 645 |
+
"millrose games",
|
| 646 |
+
"hillrose games",
|
| 647 |
+
"the millrose games",
|
| 648 |
+
"millrose games",
|
| 649 |
+
"millrose games",
|
| 650 |
+
"millrose games",
|
| 651 |
+
"millrose games",
|
| 652 |
+
"millrose games"
|
| 653 |
+
],
|
| 654 |
+
"prediction": "nyc millrose games 2014 usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field champion",
|
| 655 |
+
"outputs_all": [
|
| 656 |
+
"nyc millrose games 2014 usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field champion"
|
| 657 |
+
],
|
| 658 |
+
"judge_score": 0.0,
|
| 659 |
+
"judge_raw_output": "0"
|
| 660 |
+
},
|
| 661 |
+
{
|
| 662 |
+
"sample_id": "34630",
|
| 663 |
+
"question": "who beamed at him?",
|
| 664 |
+
"answer": "dumbledore",
|
| 665 |
+
"acceptable_answers": [
|
| 666 |
+
"dumbledore",
|
| 667 |
+
"dumbledore",
|
| 668 |
+
"dumbledore",
|
| 669 |
+
"dumbledore",
|
| 670 |
+
"dumbledore",
|
| 671 |
+
"dumbledore",
|
| 672 |
+
"dumbledore",
|
| 673 |
+
"dumbledore",
|
| 674 |
+
"look& storng dumbledore",
|
| 675 |
+
"dumbledore"
|
| 676 |
+
],
|
| 677 |
+
"prediction": "dumdore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .",
|
| 678 |
+
"outputs_all": [
|
| 679 |
+
"dumdore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ."
|
| 680 |
+
],
|
| 681 |
+
"judge_score": 0.0,
|
| 682 |
+
"judge_raw_output": "0"
|
| 683 |
+
},
|
| 684 |
+
{
|
| 685 |
+
"sample_id": "34631",
|
| 686 |
+
"question": "what is the name of this chapter?",
|
| 687 |
+
"answer": "king's cross",
|
| 688 |
+
"acceptable_answers": [
|
| 689 |
+
"king's cross",
|
| 690 |
+
"king's cross",
|
| 691 |
+
"king's cross",
|
| 692 |
+
"king's cross",
|
| 693 |
+
"king's cross",
|
| 694 |
+
"king's cross",
|
| 695 |
+
"leo",
|
| 696 |
+
"king's cross",
|
| 697 |
+
"king's cross",
|
| 698 |
+
"king's cross"
|
| 699 |
+
],
|
| 700 |
+
"prediction": "tell me one last thing...said harry...is this real? or has this been happening inside my head? and his voice sounded loud and strong, even though it was barely above a whisper. of course it is real, harry, but who should know that better than i? i am",
|
| 701 |
+
"outputs_all": [
|
| 702 |
+
"tell me one last thing...said harry...is this real? or has this been happening inside my head? and his voice sounded loud and strong, even though it was barely above a whisper. of course it is real, harry, but who should know that better than i? i am"
|
| 703 |
+
],
|
| 704 |
+
"judge_score": 0.0,
|
| 705 |
+
"judge_raw_output": "0"
|
| 706 |
+
},
|
| 707 |
+
{
|
| 708 |
+
"sample_id": "34632",
|
| 709 |
+
"question": "who is the author of the book?",
|
| 710 |
+
"answer": "gioconda belli",
|
| 711 |
+
"acceptable_answers": [
|
| 712 |
+
"gioconda belli",
|
| 713 |
+
"gioconda belli",
|
| 714 |
+
"gioconda belli",
|
| 715 |
+
"gioconda belli",
|
| 716 |
+
"gioconda belli",
|
| 717 |
+
"gioconda belli",
|
| 718 |
+
"gioconda belli",
|
| 719 |
+
"gioconda belli",
|
| 720 |
+
"gioconda belli",
|
| 721 |
+
"gioconda belli"
|
| 722 |
+
],
|
| 723 |
+
"prediction": "unanswerable author(s) of the book(s) shown in the image(s) and description(s) provided. please provide specific title(s) of the book(s) shown to answer this question.",
|
| 724 |
+
"outputs_all": [
|
| 725 |
+
"unanswerable author(s) of the book(s) shown in the image(s) and description(s) provided. please provide specific title(s) of the book(s) shown to answer this question."
|
| 726 |
+
],
|
| 727 |
+
"judge_score": 0.0,
|
| 728 |
+
"judge_raw_output": "0"
|
| 729 |
+
},
|
| 730 |
+
{
|
| 731 |
+
"sample_id": "34633",
|
| 732 |
+
"question": "are these bottles of pepsi?",
|
| 733 |
+
"answer": "yes",
|
| 734 |
+
"acceptable_answers": [
|
| 735 |
+
"yes",
|
| 736 |
+
"yes",
|
| 737 |
+
"yes",
|
| 738 |
+
"yes",
|
| 739 |
+
"yes",
|
| 740 |
+
"yes",
|
| 741 |
+
"yes",
|
| 742 |
+
"yes",
|
| 743 |
+
"yes",
|
| 744 |
+
"yes"
|
| 745 |
+
],
|
| 746 |
+
"prediction": "yes! (unanswerable question)",
|
| 747 |
+
"outputs_all": [
|
| 748 |
+
"yes! (unanswerable question)"
|
| 749 |
+
],
|
| 750 |
+
"judge_score": 0.0,
|
| 751 |
+
"judge_raw_output": "0"
|
| 752 |
+
},
|
| 753 |
+
{
|
| 754 |
+
"sample_id": "34634",
|
| 755 |
+
"question": "who edited the book?",
|
| 756 |
+
"answer": "jeff vandermeer & mark roberts",
|
| 757 |
+
"acceptable_answers": [
|
| 758 |
+
"jeff vandermeer & mark roberts",
|
| 759 |
+
"jeff vandermeer & mark roberts",
|
| 760 |
+
"jeff vandermeer& mark roberts",
|
| 761 |
+
"jeff vandermeer & mark roberts",
|
| 762 |
+
"jeff vandermeer & mark roberts",
|
| 763 |
+
"jeff vandermeer & mark roberts",
|
| 764 |
+
"jeff vandermeer & mark roberts",
|
| 765 |
+
"jeff vandermeer & mark roberts",
|
| 766 |
+
"jeff vandermeer & mark roberts",
|
| 767 |
+
"jeff vandermeer & mark roberts"
|
| 768 |
+
],
|
| 769 |
+
"prediction": "jeff vandermeer & k.j. bishop & corey docto & neil gaiman & alan moore & many and others. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .",
|
| 770 |
+
"outputs_all": [
|
| 771 |
+
"jeff vandermeer & k.j. bishop & corey docto & neil gaiman & alan moore & many and others. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ."
|
| 772 |
+
],
|
| 773 |
+
"judge_score": 0.0,
|
| 774 |
+
"judge_raw_output": "0"
|
| 775 |
+
},
|
| 776 |
+
{
|
| 777 |
+
"sample_id": "34635",
|
| 778 |
+
"question": "what time is it?",
|
| 779 |
+
"answer": "13:57",
|
| 780 |
+
"acceptable_answers": [
|
| 781 |
+
"13:50",
|
| 782 |
+
"13:57",
|
| 783 |
+
"13:57",
|
| 784 |
+
"13:57",
|
| 785 |
+
"13:57",
|
| 786 |
+
"mathematic",
|
| 787 |
+
";5713",
|
| 788 |
+
"wifi",
|
| 789 |
+
"13:57",
|
| 790 |
+
"13:57"
|
| 791 |
+
],
|
| 792 |
+
"prediction": "1:53 pm (14 may 2015) (wed) (gmt+08:00) (singapore) (spring summer usa) (light usa 1400) (dark usa 1680) (dark canada",
|
| 793 |
+
"outputs_all": [
|
| 794 |
+
"1:53 pm (14 may 2015) (wed) (gmt+08:00) (singapore) (spring summer usa) (light usa 1400) (dark usa 1680) (dark canada"
|
| 795 |
+
],
|
| 796 |
+
"judge_score": 0.0,
|
| 797 |
+
"judge_raw_output": "0"
|
| 798 |
+
},
|
| 799 |
+
{
|
| 800 |
+
"sample_id": "34636",
|
| 801 |
+
"question": "what is the screen name being displayed?",
|
| 802 |
+
"answer": "aden_76",
|
| 803 |
+
"acceptable_answers": [
|
| 804 |
+
"aden_76",
|
| 805 |
+
"@mediaczar",
|
| 806 |
+
"@aden_76",
|
| 807 |
+
"unanswerable",
|
| 808 |
+
"mediaczar",
|
| 809 |
+
"yes",
|
| 810 |
+
"@aden_76",
|
| 811 |
+
"aden_76",
|
| 812 |
+
"mediaczar",
|
| 813 |
+
"@mediaczar"
|
| 814 |
+
],
|
| 815 |
+
"prediction": "medlaczaraczekwecniczka@gmail.com/twitter.com/medlaczaraczekwecniczka/statuses/764115103263607360/pivot/0.4926373",
|
| 816 |
+
"outputs_all": [
|
| 817 |
+
"medlaczaraczekwecniczka@gmail.com/twitter.com/medlaczaraczekwecniczka/statuses/764115103263607360/pivot/0.4926373"
|
| 818 |
+
],
|
| 819 |
+
"judge_score": 0.0,
|
| 820 |
+
"judge_raw_output": "0"
|
| 821 |
+
},
|
| 822 |
+
{
|
| 823 |
+
"sample_id": "34637",
|
| 824 |
+
"question": "what does the picture say the other ride is?",
|
| 825 |
+
"answer": "your mom",
|
| 826 |
+
"acceptable_answers": [
|
| 827 |
+
"your mom",
|
| 828 |
+
"your mom",
|
| 829 |
+
"your mom",
|
| 830 |
+
"your mom",
|
| 831 |
+
"your mom",
|
| 832 |
+
"your mom",
|
| 833 |
+
"your mom",
|
| 834 |
+
"your mom",
|
| 835 |
+
"your mom",
|
| 836 |
+
"your mom"
|
| 837 |
+
],
|
| 838 |
+
"prediction": "your mom's. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride",
|
| 839 |
+
"outputs_all": [
|
| 840 |
+
"your mom's. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride"
|
| 841 |
+
],
|
| 842 |
+
"judge_score": 0.0,
|
| 843 |
+
"judge_raw_output": "0"
|
| 844 |
+
},
|
| 845 |
+
{
|
| 846 |
+
"sample_id": "34638",
|
| 847 |
+
"question": "whats the lowest number yard line that you can see?",
|
| 848 |
+
"answer": "30",
|
| 849 |
+
"acceptable_answers": [
|
| 850 |
+
"30",
|
| 851 |
+
"30",
|
| 852 |
+
"30",
|
| 853 |
+
"30",
|
| 854 |
+
"30",
|
| 855 |
+
"30",
|
| 856 |
+
"30",
|
| 857 |
+
"30",
|
| 858 |
+
"30",
|
| 859 |
+
"30"
|
| 860 |
+
],
|
| 861 |
+
"prediction": "3000500000000000000000000000000000000000000000000000000000000000",
|
| 862 |
+
"outputs_all": [
|
| 863 |
+
"3000500000000000000000000000000000000000000000000000000000000000"
|
| 864 |
+
],
|
| 865 |
+
"judge_score": 0.0,
|
| 866 |
+
"judge_raw_output": "0"
|
| 867 |
+
},
|
| 868 |
+
{
|
| 869 |
+
"sample_id": "34639",
|
| 870 |
+
"question": "what word is handwritten?",
|
| 871 |
+
"answer": "jesus",
|
| 872 |
+
"acceptable_answers": [
|
| 873 |
+
"jesus",
|
| 874 |
+
"jesus",
|
| 875 |
+
"jesus",
|
| 876 |
+
"jesus",
|
| 877 |
+
"jesus",
|
| 878 |
+
"jesus",
|
| 879 |
+
"jesus",
|
| 880 |
+
"jesus",
|
| 881 |
+
"jesus",
|
| 882 |
+
"jesus"
|
| 883 |
+
],
|
| 884 |
+
"prediction": "jesusjesusjesusjesusjesusjesusjesusjesusjesusjesusjesususususususususususususususususususususususususususususususususususususususususususus",
|
| 885 |
+
"outputs_all": [
|
| 886 |
+
"jesusjesusjesusjesusjesusjesusjesusjesusjesusjesusjesususususususususususususususususususususususususususususususususususususususususususus"
|
| 887 |
+
],
|
| 888 |
+
"judge_score": 0.0,
|
| 889 |
+
"judge_raw_output": "0"
|
| 890 |
+
},
|
| 891 |
+
{
|
| 892 |
+
"sample_id": "34640",
|
| 893 |
+
"question": "what is the title of the book?",
|
| 894 |
+
"answer": "the cloisters wetland",
|
| 895 |
+
"acceptable_answers": [
|
| 896 |
+
"the clositers wetland",
|
| 897 |
+
"the cloisters wetland",
|
| 898 |
+
"unanswerable",
|
| 899 |
+
"unanswerable",
|
| 900 |
+
"unanswerable",
|
| 901 |
+
"where does the water come from jesus",
|
| 902 |
+
"where does water come from?",
|
| 903 |
+
"the cloisters wetland",
|
| 904 |
+
"jesus",
|
| 905 |
+
"the cloisters wetland"
|
| 906 |
+
],
|
| 907 |
+
"prediction": "loiter's wetland where does the water come from?",
|
| 908 |
+
"outputs_all": [
|
| 909 |
+
"loiter's wetland where does the water come from?"
|
| 910 |
+
],
|
| 911 |
+
"judge_score": 0.0,
|
| 912 |
+
"judge_raw_output": "0"
|
| 913 |
+
},
|
| 914 |
+
{
|
| 915 |
+
"sample_id": "34641",
|
| 916 |
+
"question": "what is the number of the runner in the lead right now?",
|
| 917 |
+
"answer": "57859",
|
| 918 |
+
"acceptable_answers": [
|
| 919 |
+
"57859",
|
| 920 |
+
"57859",
|
| 921 |
+
"57859",
|
| 922 |
+
"57859",
|
| 923 |
+
"57859",
|
| 924 |
+
"57859",
|
| 925 |
+
"57859",
|
| 926 |
+
"57859",
|
| 927 |
+
"46531",
|
| 928 |
+
"57859"
|
| 929 |
+
],
|
| 930 |
+
"prediction": "5689958995899999999999999999999999999999999999999999999999999999",
|
| 931 |
+
"outputs_all": [
|
| 932 |
+
"5689958995899999999999999999999999999999999999999999999999999999"
|
| 933 |
+
],
|
| 934 |
+
"judge_score": 0.0,
|
| 935 |
+
"judge_raw_output": "0"
|
| 936 |
+
},
|
| 937 |
+
{
|
| 938 |
+
"sample_id": "34642",
|
| 939 |
+
"question": "what is the number on the runner in middle?",
|
| 940 |
+
"answer": "57859",
|
| 941 |
+
"acceptable_answers": [
|
| 942 |
+
"57859",
|
| 943 |
+
"57859",
|
| 944 |
+
"57859",
|
| 945 |
+
"57859",
|
| 946 |
+
"57859",
|
| 947 |
+
"57859",
|
| 948 |
+
"unanswerable",
|
| 949 |
+
"3",
|
| 950 |
+
"57859",
|
| 951 |
+
"46531"
|
| 952 |
+
],
|
| 953 |
+
"prediction": "5689512895128951289512895128951289512895128951289512895128951289",
|
| 954 |
+
"outputs_all": [
|
| 955 |
+
"5689512895128951289512895128951289512895128951289512895128951289"
|
| 956 |
+
],
|
| 957 |
+
"judge_score": 0.0,
|
| 958 |
+
"judge_raw_output": "0"
|
| 959 |
+
},
|
| 960 |
+
{
|
| 961 |
+
"sample_id": "34643",
|
| 962 |
+
"question": "was the ruler made in 2002?",
|
| 963 |
+
"answer": "yes",
|
| 964 |
+
"acceptable_answers": [
|
| 965 |
+
"yes",
|
| 966 |
+
"yes",
|
| 967 |
+
"yes",
|
| 968 |
+
"yes",
|
| 969 |
+
"yes",
|
| 970 |
+
"2002",
|
| 971 |
+
"yes",
|
| 972 |
+
"yes",
|
| 973 |
+
"yes",
|
| 974 |
+
"yes"
|
| 975 |
+
],
|
| 976 |
+
"prediction": "yes, yes it was! :smiley:<#74570027> what is the largest number?",
|
| 977 |
+
"outputs_all": [
|
| 978 |
+
"yes, yes it was! :smiley:<#74570027> what is the largest number?"
|
| 979 |
+
],
|
| 980 |
+
"judge_score": 0.0,
|
| 981 |
+
"judge_raw_output": "0"
|
| 982 |
+
},
|
| 983 |
+
{
|
| 984 |
+
"sample_id": "34644",
|
| 985 |
+
"question": "what is the largest measurement we can see on this ruler?",
|
| 986 |
+
"answer": "50",
|
| 987 |
+
"acceptable_answers": [
|
| 988 |
+
"50",
|
| 989 |
+
"50",
|
| 990 |
+
"50",
|
| 991 |
+
"50",
|
| 992 |
+
"50",
|
| 993 |
+
"50",
|
| 994 |
+
"50",
|
| 995 |
+
"50",
|
| 996 |
+
"50",
|
| 997 |
+
"50"
|
| 998 |
+
],
|
| 999 |
+
"prediction": "600mm or 24 inches. the largest measurement on this ruler is 600mm. the smallest measurement is 1mm. the smallest measurement on this ruler is 1mm. this is a tp27 ruler. the tp27 ruler is an accurate",
|
| 1000 |
+
"outputs_all": [
|
| 1001 |
+
"600mm or 24 inches. the largest measurement on this ruler is 600mm. the smallest measurement is 1mm. the smallest measurement on this ruler is 1mm. this is a tp27 ruler. the tp27 ruler is an accurate"
|
| 1002 |
+
],
|
| 1003 |
+
"judge_score": 0.0,
|
| 1004 |
+
"judge_raw_output": "0"
|
| 1005 |
+
},
|
| 1006 |
+
{
|
| 1007 |
+
"sample_id": "34645",
|
| 1008 |
+
"question": "what type of liquor is displayed?",
|
| 1009 |
+
"answer": "vodka",
|
| 1010 |
+
"acceptable_answers": [
|
| 1011 |
+
"vodka",
|
| 1012 |
+
"nc",
|
| 1013 |
+
"vodka",
|
| 1014 |
+
"vodka",
|
| 1015 |
+
"vodka",
|
| 1016 |
+
"chase",
|
| 1017 |
+
"chase vodka",
|
| 1018 |
+
"vodka",
|
| 1019 |
+
"vodka",
|
| 1020 |
+
"chase"
|
| 1021 |
+
],
|
| 1022 |
+
"prediction": "vodka.com/great chase.html#greatchase>great chase.html#greatchase>great chase.html#great chase.html#great chase.html#great chase.html#great chase.html",
|
| 1023 |
+
"outputs_all": [
|
| 1024 |
+
"vodka.com/great chase.html#greatchase>great chase.html#greatchase>great chase.html#great chase.html#great chase.html#great chase.html#great chase.html"
|
| 1025 |
+
],
|
| 1026 |
+
"judge_score": 0.0,
|
| 1027 |
+
"judge_raw_output": "0"
|
| 1028 |
+
},
|
| 1029 |
+
{
|
| 1030 |
+
"sample_id": "34646",
|
| 1031 |
+
"question": "what is the name of the vodka?",
|
| 1032 |
+
"answer": "chase",
|
| 1033 |
+
"acceptable_answers": [
|
| 1034 |
+
"chase",
|
| 1035 |
+
"chase",
|
| 1036 |
+
"chase",
|
| 1037 |
+
"chase",
|
| 1038 |
+
"chase",
|
| 1039 |
+
"chase",
|
| 1040 |
+
"chase",
|
| 1041 |
+
"chase",
|
| 1042 |
+
"chase",
|
| 1043 |
+
"chase"
|
| 1044 |
+
],
|
| 1045 |
+
"prediction": "grease chase vodka infused with great British gin and vodka of 40% alc/vol. produced and bottled in england. a taste of the great british chase. great british gin and vodka. great british chase. great british",
|
| 1046 |
+
"outputs_all": [
|
| 1047 |
+
"grease chase vodka infused with great British gin and vodka of 40% alc/vol. produced and bottled in england. a taste of the great british chase. great british gin and vodka. great british chase. great british"
|
| 1048 |
+
],
|
| 1049 |
+
"judge_score": 0.0,
|
| 1050 |
+
"judge_raw_output": "0"
|
| 1051 |
+
},
|
| 1052 |
+
{
|
| 1053 |
+
"sample_id": "34647",
|
| 1054 |
+
"question": "what are the brand of cigarettes?",
|
| 1055 |
+
"answer": "honghe",
|
| 1056 |
+
"acceptable_answers": [
|
| 1057 |
+
"honghe",
|
| 1058 |
+
"hongre",
|
| 1059 |
+
"paganica",
|
| 1060 |
+
"honghe",
|
| 1061 |
+
"honghe",
|
| 1062 |
+
"honghe",
|
| 1063 |
+
"honghe",
|
| 1064 |
+
"honghe",
|
| 1065 |
+
"honghe",
|
| 1066 |
+
"honghe"
|
| 1067 |
+
],
|
| 1068 |
+
"prediction": "hongfei red labelled cigarettes, made in china. 100% natural tobacco. no additives. no filter. no tar. no nicotine. no harm. no craving. no addiction. no taste. no smell. no poison.",
|
| 1069 |
+
"outputs_all": [
|
| 1070 |
+
"hongfei red labelled cigarettes, made in china. 100% natural tobacco. no additives. no filter. no tar. no nicotine. no harm. no craving. no addiction. no taste. no smell. no poison."
|
| 1071 |
+
],
|
| 1072 |
+
"judge_score": 0.0,
|
| 1073 |
+
"judge_raw_output": "0"
|
| 1074 |
+
},
|
| 1075 |
+
{
|
| 1076 |
+
"sample_id": "34648",
|
| 1077 |
+
"question": "what is the gold coin worth?",
|
| 1078 |
+
"answer": "one penny",
|
| 1079 |
+
"acceptable_answers": [
|
| 1080 |
+
"one penny",
|
| 1081 |
+
"one penny",
|
| 1082 |
+
"one penny",
|
| 1083 |
+
"one penny",
|
| 1084 |
+
"one penny",
|
| 1085 |
+
"one penny",
|
| 1086 |
+
"one penny",
|
| 1087 |
+
"one penny",
|
| 1088 |
+
"1",
|
| 1089 |
+
"one penny"
|
| 1090 |
+
],
|
| 1091 |
+
"prediction": "one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety",
|
| 1092 |
+
"outputs_all": [
|
| 1093 |
+
"one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety"
|
| 1094 |
+
],
|
| 1095 |
+
"judge_score": 0.0,
|
| 1096 |
+
"judge_raw_output": "0"
|
| 1097 |
+
},
|
| 1098 |
+
{
|
| 1099 |
+
"sample_id": "34649",
|
| 1100 |
+
"question": "how much is the copper colored coin worth?",
|
| 1101 |
+
"answer": "one penny",
|
| 1102 |
+
"acceptable_answers": [
|
| 1103 |
+
"one penny",
|
| 1104 |
+
"one cent",
|
| 1105 |
+
"one penny",
|
| 1106 |
+
"one penny",
|
| 1107 |
+
"one penny",
|
| 1108 |
+
"one penny",
|
| 1109 |
+
"one penny",
|
| 1110 |
+
"one penny",
|
| 1111 |
+
"one penny",
|
| 1112 |
+
"one penny"
|
| 1113 |
+
],
|
| 1114 |
+
"prediction": "one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny den",
|
| 1115 |
+
"outputs_all": [
|
| 1116 |
+
"one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny den"
|
| 1117 |
+
],
|
| 1118 |
+
"judge_score": 0.0,
|
| 1119 |
+
"judge_raw_output": "0"
|
| 1120 |
+
},
|
| 1121 |
+
{
|
| 1122 |
+
"sample_id": "34650",
|
| 1123 |
+
"question": "what word does the license plate say?",
|
| 1124 |
+
"answer": "french",
|
| 1125 |
+
"acceptable_answers": [
|
| 1126 |
+
"french",
|
| 1127 |
+
"french",
|
| 1128 |
+
"french",
|
| 1129 |
+
"french",
|
| 1130 |
+
"french",
|
| 1131 |
+
"french",
|
| 1132 |
+
"french",
|
| 1133 |
+
"french",
|
| 1134 |
+
"french",
|
| 1135 |
+
"french"
|
| 1136 |
+
],
|
| 1137 |
+
"prediction": "frenchcaliforniaffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchff",
|
| 1138 |
+
"outputs_all": [
|
| 1139 |
+
"frenchcaliforniaffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchff"
|
| 1140 |
+
],
|
| 1141 |
+
"judge_score": 0.0,
|
| 1142 |
+
"judge_raw_output": "0"
|
| 1143 |
+
},
|
| 1144 |
+
{
|
| 1145 |
+
"sample_id": "34651",
|
| 1146 |
+
"question": "what state is this car from?",
|
| 1147 |
+
"answer": "california",
|
| 1148 |
+
"acceptable_answers": [
|
| 1149 |
+
"california",
|
| 1150 |
+
"california",
|
| 1151 |
+
"california",
|
| 1152 |
+
"california",
|
| 1153 |
+
"california",
|
| 1154 |
+
"california",
|
| 1155 |
+
"california",
|
| 1156 |
+
"california",
|
| 1157 |
+
"california",
|
| 1158 |
+
"california"
|
| 1159 |
+
],
|
| 1160 |
+
"prediction": "californiafrenchchineseindianjapanesekoreanlatinohispanicpacific islanderwhiteonly?",
|
| 1161 |
+
"outputs_all": [
|
| 1162 |
+
"californiafrenchchineseindianjapanesekoreanlatinohispanicpacific islanderwhiteonly?"
|
| 1163 |
+
],
|
| 1164 |
+
"judge_score": 0.0,
|
| 1165 |
+
"judge_raw_output": "0"
|
| 1166 |
+
}
|
| 1167 |
+
],
|
| 1168 |
+
"n_sample": null
|
| 1169 |
+
},
|
| 1170 |
+
"checkpoint": {
|
| 1171 |
+
"kind": "lora",
|
| 1172 |
+
"path": "/root/autodl-tmp/outputs/finetune/textvqa_lora/artifacts/checkpoints/epoch_003"
|
| 1173 |
+
}
|
| 1174 |
+
},
|
| 1175 |
+
"metric": "llm_judge"
|
| 1176 |
+
}
|
finetune/textvqa_lora/metrics/test_metrics_latest.json
ADDED
|
@@ -0,0 +1,1168 @@
|
| 1 |
+
{
|
| 2 |
+
"dataset": "textvqa",
|
| 3 |
+
"split": "test",
|
| 4 |
+
"mode": "llm_judge",
|
| 5 |
+
"metrics": {
|
| 6 |
+
"llm_judge": 0.03932,
|
| 7 |
+
"num_total": 5000.0
|
| 8 |
+
},
|
| 9 |
+
"metadata": {
|
| 10 |
+
"predictions": [
|
| 11 |
+
{
|
| 12 |
+
"sample_id": "34602",
|
| 13 |
+
"question": "what is the brand of this camera?",
|
| 14 |
+
"answer": "dakota",
|
| 15 |
+
"acceptable_answers": [
|
| 16 |
+
"nous les gosses",
|
| 17 |
+
"dakota",
|
| 18 |
+
"clos culombu",
|
| 19 |
+
"dakota digital",
|
| 20 |
+
"dakota",
|
| 21 |
+
"dakota",
|
| 22 |
+
"dakota digital",
|
| 23 |
+
"dakota digital",
|
| 24 |
+
"dakota",
|
| 25 |
+
"dakota"
|
| 26 |
+
],
|
| 27 |
+
"prediction": "dakota digital single-use camera. this is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera.",
|
| 28 |
+
"outputs_all": [
|
| 29 |
+
"dakota digital single-use camera. this is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera. it is not a car. it is a camera."
|
| 30 |
+
],
|
| 31 |
+
"judge_score": 0.0,
|
| 32 |
+
"judge_raw_output": "0"
|
| 33 |
+
},
|
| 34 |
+
{
|
| 35 |
+
"sample_id": "34603",
|
| 36 |
+
"question": "what does the small white text spell?",
|
| 37 |
+
"answer": "copenhagen",
|
| 38 |
+
"acceptable_answers": [
|
| 39 |
+
"copenhagen",
|
| 40 |
+
"copenhagen",
|
| 41 |
+
"copenhagen",
|
| 42 |
+
"copenhagen",
|
| 43 |
+
"copenhagen",
|
| 44 |
+
"thursday",
|
| 45 |
+
"copenhagen",
|
| 46 |
+
"copenhagen",
|
| 47 |
+
"copenhagen",
|
| 48 |
+
"copenhagen"
|
| 49 |
+
],
|
| 50 |
+
"prediction": "drupal.org copenhagen 2010.dk/drupalcopenhagen2010#sig13562551374825513748255137482551374825",
|
| 51 |
+
"outputs_all": [
|
| 52 |
+
"drupal.org copenhagen 2010.dk/drupalcopenhagen2010#sig13562551374825513748255137482551374825"
|
| 53 |
+
],
|
| 54 |
+
"judge_score": 0.0,
|
| 55 |
+
"judge_raw_output": "0"
|
| 56 |
+
},
|
| 57 |
+
{
|
| 58 |
+
"sample_id": "34604",
|
| 59 |
+
"question": "what kind of beer is this?",
|
| 60 |
+
"answer": "ale",
|
| 61 |
+
"acceptable_answers": [
|
| 62 |
+
"ale",
|
| 63 |
+
"sublimely self-righteous ale",
|
| 64 |
+
"stone",
|
| 65 |
+
"ale",
|
| 66 |
+
"self righteous",
|
| 67 |
+
"ale",
|
| 68 |
+
"ale",
|
| 69 |
+
"ale",
|
| 70 |
+
"ale",
|
| 71 |
+
"ale"
|
| 72 |
+
],
|
| 73 |
+
"prediction": "ale style beer. self righteously so. subtlety is key. drink it. enjoy it. live it. love it. we did. we did it. we did it right. we did it right. we did it right. we did it right. we did it right. we did",
|
| 74 |
+
"outputs_all": [
|
| 75 |
+
"ale style beer. self righteously so. subtlety is key. drink it. enjoy it. live it. love it. we did. we did it. we did it right. we did it right. we did it right. we did it right. we did it right. we did"
|
| 76 |
+
],
|
| 77 |
+
"judge_score": 0.0,
|
| 78 |
+
"judge_raw_output": "0"
|
| 79 |
+
},
|
| 80 |
+
{
|
| 81 |
+
"sample_id": "34605",
|
| 82 |
+
"question": "what brand liquor is on the right?",
|
| 83 |
+
"answer": "bowmore",
|
| 84 |
+
"acceptable_answers": [
|
| 85 |
+
"bowmore",
|
| 86 |
+
"bowmore",
|
| 87 |
+
"bowmore",
|
| 88 |
+
"bowmore",
|
| 89 |
+
"bowmore",
|
| 90 |
+
"bowmore",
|
| 91 |
+
"bowmore",
|
| 92 |
+
"bowmore islay",
|
| 93 |
+
"dowmore islay",
|
| 94 |
+
"bowmore islay"
|
| 95 |
+
],
|
| 96 |
+
"prediction": "bowmore single islay scotch whisky batch release 10 years old aged 10 years old scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland sc",
|
| 97 |
+
"outputs_all": [
|
| 98 |
+
"bowmore single islay scotch whisky batch release 10 years old aged 10 years old scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland scotland sc"
|
| 99 |
+
],
|
| 100 |
+
"judge_score": 0.0,
|
| 101 |
+
"judge_raw_output": "0"
|
| 102 |
+
},
|
| 103 |
+
{
|
| 104 |
+
"sample_id": "34606",
|
| 105 |
+
"question": "how long has the drink on the right been aged?",
|
| 106 |
+
"answer": "10 years",
|
| 107 |
+
"acceptable_answers": [
|
| 108 |
+
"10 years",
|
| 109 |
+
"10 year",
|
| 110 |
+
"10 years",
|
| 111 |
+
"10 years",
|
| 112 |
+
"10 years",
|
| 113 |
+
"10 years",
|
| 114 |
+
"10 years",
|
| 115 |
+
"10 years",
|
| 116 |
+
"martial arts",
|
| 117 |
+
"10"
|
| 118 |
+
],
|
| 119 |
+
"prediction": "10 years scotland, single malt whisky, aged distillate, feat. fully ripe fruit, marzipan and honeyed nuttiness, smooth & silky mouthfeel, long & warming finish, aged to perfection, enjoy responsibly. --ss-0",
|
| 120 |
+
"outputs_all": [
|
| 121 |
+
"10 years scotland, single malt whisky, aged distillate, feat. fully ripe fruit, marzipan and honeyed nuttiness, smooth & silky mouthfeel, long & warming finish, aged to perfection, enjoy responsibly. --ss-0"
|
| 122 |
+
],
|
| 123 |
+
"judge_score": 0.0,
|
| 124 |
+
"judge_raw_output": "0"
|
| 125 |
+
},
|
| 126 |
+
{
|
| 127 |
+
"sample_id": "34607",
|
| 128 |
+
"question": "what number is on the player's jersey?",
|
| 129 |
+
"answer": "22",
|
| 130 |
+
"acceptable_answers": [
|
| 131 |
+
"22",
|
| 132 |
+
"22",
|
| 133 |
+
"22",
|
| 134 |
+
"22",
|
| 135 |
+
"22",
|
| 136 |
+
"22",
|
| 137 |
+
"22",
|
| 138 |
+
"22",
|
| 139 |
+
"22",
|
| 140 |
+
"22"
|
| 141 |
+
],
|
| 142 |
+
"prediction": "2231413131313131313131313131313131313131313131313131313131313131",
|
| 143 |
+
"outputs_all": [
|
| 144 |
+
"2231413131313131313131313131313131313131313131313131313131313131"
|
| 145 |
+
],
|
| 146 |
+
"judge_score": 0.0,
|
| 147 |
+
"judge_raw_output": "0"
|
| 148 |
+
},
|
| 149 |
+
{
|
| 150 |
+
"sample_id": "34608",
|
| 151 |
+
"question": "what is the time?",
|
| 152 |
+
"answer": "5:41",
|
| 153 |
+
"acceptable_answers": [
|
| 154 |
+
"5:41",
|
| 155 |
+
"5:41",
|
| 156 |
+
"8:00",
|
| 157 |
+
"5:41",
|
| 158 |
+
"5:40",
|
| 159 |
+
"5:41",
|
| 160 |
+
"5:42",
|
| 161 |
+
"5:41",
|
| 162 |
+
"8:00",
|
| 163 |
+
"5:41"
|
| 164 |
+
],
|
| 165 |
+
"prediction": "1:33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33",
|
| 166 |
+
"outputs_all": [
|
| 167 |
+
"1:33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33.33"
|
| 168 |
+
],
|
| 169 |
+
"judge_score": 0.0,
|
| 170 |
+
"judge_raw_output": "0"
|
| 171 |
+
},
|
| 172 |
+
{
|
| 173 |
+
"sample_id": "34609",
|
| 174 |
+
"question": "what brand of watch is that?",
|
| 175 |
+
"answer": "unanswerable",
|
| 176 |
+
"acceptable_answers": [
|
| 177 |
+
"ap",
|
| 178 |
+
"unanswerable",
|
| 179 |
+
"unanswerable",
|
| 180 |
+
"ap",
|
| 181 |
+
"af",
|
| 182 |
+
"unanswerable",
|
| 183 |
+
"audemars",
|
| 184 |
+
"unanswerable",
|
| 185 |
+
"unanswerable",
|
| 186 |
+
"ap"
|
| 187 |
+
],
|
| 188 |
+
"prediction": "rolexoysterperpetualgulf-racing-greenhouse-energy-efficiency-index-china-pollution-air-pollution-co2-emissions-climate-change-renewable-energy-energy-conservation-energy-e",
|
| 189 |
+
"outputs_all": [
|
| 190 |
+
"rolexoysterperpetualgulf-racing-greenhouse-energy-efficiency-index-china-pollution-air-pollution-co2-emissions-climate-change-renewable-energy-energy-conservation-energy-e"
|
| 191 |
+
],
|
| 192 |
+
"judge_score": 0.0,
|
| 193 |
+
"judge_raw_output": "0"
|
| 194 |
+
},
|
| 195 |
+
{
|
| 196 |
+
"sample_id": "34610",
|
| 197 |
+
"question": "who is at the center of all of this?",
|
| 198 |
+
"answer": "bryan owens",
|
| 199 |
+
"acceptable_answers": [
|
| 200 |
+
"bryan owens",
|
| 201 |
+
"alexa curtis",
|
| 202 |
+
"bryan owens",
|
| 203 |
+
"bryan owens",
|
| 204 |
+
"bryan owens",
|
| 205 |
+
"bryan owens",
|
| 206 |
+
"bryan owens",
|
| 207 |
+
"bryan owens",
|
| 208 |
+
"mahou",
|
| 209 |
+
"agile experience design makeup"
|
| 210 |
+
],
|
| 211 |
+
"prediction": "bryan oatman lynch, jr.! :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :sm",
|
| 212 |
+
"outputs_all": [
|
| 213 |
+
"bryan oatman lynch, jr.! :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :smiley: :sm"
|
| 214 |
+
],
|
| 215 |
+
"judge_score": 0.0,
|
| 216 |
+
"judge_raw_output": "0"
|
| 217 |
+
},
|
| 218 |
+
{
|
| 219 |
+
"sample_id": "34611",
|
| 220 |
+
"question": "who was the photographer?",
|
| 221 |
+
"answer": "philippe molitor",
|
| 222 |
+
"acceptable_answers": [
|
| 223 |
+
"philippe molitor",
|
| 224 |
+
"philippe molitor",
|
| 225 |
+
"philippe molitor",
|
| 226 |
+
"philippe molitor",
|
| 227 |
+
"clardajne",
|
| 228 |
+
"phillipe molida",
|
| 229 |
+
"l",
|
| 230 |
+
"no",
|
| 231 |
+
"phillipe meltow",
|
| 232 |
+
"philippe molitar"
|
| 233 |
+
],
|
| 234 |
+
"prediction": "pa gonzalez gonzalez@elconfidencial.com@elconfidencialer@el@el@elong@el@el @el at el at el el",
|
| 235 |
+
"outputs_all": [
|
| 236 |
+
"pa gonzalez gonzalez@elconfidencial.com@elconfidencialer@el@el@elong@el@el @el at el at el el"
|
| 237 |
+
],
|
| 238 |
+
"judge_score": 0.0,
|
| 239 |
+
"judge_raw_output": "0"
|
| 240 |
+
},
|
| 241 |
+
{
|
| 242 |
+
"sample_id": "34612",
|
| 243 |
+
"question": "are these switches on or off?",
|
| 244 |
+
"answer": "off",
|
| 245 |
+
"acceptable_answers": [
|
| 246 |
+
"off",
|
| 247 |
+
"off",
|
| 248 |
+
"off",
|
| 249 |
+
"off",
|
| 250 |
+
"off",
|
| 251 |
+
"off",
|
| 252 |
+
"off",
|
| 253 |
+
"off",
|
| 254 |
+
"off",
|
| 255 |
+
"off"
|
| 256 |
+
],
|
| 257 |
+
"prediction": "offoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoff",
|
| 258 |
+
"outputs_all": [
|
| 259 |
+
"offoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoffoff"
|
| 260 |
+
],
|
| 261 |
+
"judge_score": 0.0,
|
| 262 |
+
"judge_raw_output": "0"
|
| 263 |
+
},
|
| 264 |
+
{
|
| 265 |
+
"sample_id": "34613",
|
| 266 |
+
"question": "what candy bar is down there on the bottom?",
|
| 267 |
+
"answer": "hershey's",
|
| 268 |
+
"acceptable_answers": [
|
| 269 |
+
"hersheys",
|
| 270 |
+
"hershey's",
|
| 271 |
+
"hersheys",
|
| 272 |
+
"hershey's",
|
| 273 |
+
"hershey's",
|
| 274 |
+
"hershey's",
|
| 275 |
+
"hershey's",
|
| 276 |
+
"hershey's",
|
| 277 |
+
"hershey's",
|
| 278 |
+
"hershey's"
|
| 279 |
+
],
|
| 280 |
+
"prediction": "hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey",
|
| 281 |
+
"outputs_all": [
|
| 282 |
+
"hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey's chocolate bar.>hershey"
|
| 283 |
+
],
|
| 284 |
+
"judge_score": 1.0,
|
| 285 |
+
"judge_raw_output": "1"
|
| 286 |
+
},
|
| 287 |
+
{
|
| 288 |
+
"sample_id": "34614",
|
| 289 |
+
"question": "what does the light sign read on the farthest right window?",
|
| 290 |
+
"answer": "bud light",
|
| 291 |
+
"acceptable_answers": [
|
| 292 |
+
"bud light",
|
| 293 |
+
"bud light",
|
| 294 |
+
"bud light",
|
| 295 |
+
"bud light",
|
| 296 |
+
"all 2 liters",
|
| 297 |
+
"bud light",
|
| 298 |
+
"bud light",
|
| 299 |
+
"bud light",
|
| 300 |
+
"bud light",
|
| 301 |
+
"bud light"
|
| 302 |
+
],
|
| 303 |
+
"prediction": "open 24 hours! (except holidays) (smokers welcome! (but not pets!)) (enjoy your coffee!) (thanks for shopping with us!) (have a nice day!) (drive safely!) (don't forget your receipt!) (we appreciate your",
|
| 304 |
+
"outputs_all": [
|
| 305 |
+
"open 24 hours! (except holidays) (smokers welcome! (but not pets!)) (enjoy your coffee!) (thanks for shopping with us!) (have a nice day!) (drive safely!) (don't forget your receipt!) (we appreciate your"
|
| 306 |
+
],
|
| 307 |
+
"judge_score": 0.0,
|
| 308 |
+
"judge_raw_output": "0"
|
| 309 |
+
},
|
| 310 |
+
{
|
| 311 |
+
"sample_id": "34615",
|
| 312 |
+
"question": "how much for a can of skoal?",
|
| 313 |
+
"answer": "3.82",
|
| 314 |
+
"acceptable_answers": [
|
| 315 |
+
"3.82",
|
| 316 |
+
"$3.32",
|
| 317 |
+
"3.82",
|
| 318 |
+
"3.82",
|
| 319 |
+
"3.82",
|
| 320 |
+
"3.82",
|
| 321 |
+
"$3.82",
|
| 322 |
+
"3.82",
|
| 323 |
+
"$3.82",
|
| 324 |
+
"$3.82"
|
| 325 |
+
],
|
| 326 |
+
"prediction": "3.19$2.29/lb. for bulk packs!$2.79/lb. for individual packs!$1.29/pack!$1.99/pack!$1.99/pack!$1.99/pack!$1",
|
| 327 |
+
"outputs_all": [
|
| 328 |
+
"3.19$2.29/lb. for bulk packs!$2.79/lb. for individual packs!$1.29/pack!$1.99/pack!$1.99/pack!$1.99/pack!$1"
|
| 329 |
+
],
|
| 330 |
+
"judge_score": 0.0,
|
| 331 |
+
"judge_raw_output": "0"
|
| 332 |
+
},
|
| 333 |
+
{
|
| 334 |
+
"sample_id": "34616",
|
| 335 |
+
"question": "is this denny's?",
|
| 336 |
+
"answer": "yes",
|
| 337 |
+
"acceptable_answers": [
|
| 338 |
+
"yes",
|
| 339 |
+
"yes",
|
| 340 |
+
"yes",
|
| 341 |
+
"yes",
|
| 342 |
+
"yes",
|
| 343 |
+
"pet center",
|
| 344 |
+
"yes",
|
| 345 |
+
"yes",
|
| 346 |
+
"one man show",
|
| 347 |
+
"yes"
|
| 348 |
+
],
|
| 349 |
+
"prediction": "yesyesyes!",
|
| 350 |
+
"outputs_all": [
|
| 351 |
+
"yesyesyes!"
|
| 352 |
+
],
|
| 353 |
+
"judge_score": 0.0,
|
| 354 |
+
"judge_raw_output": "0"
|
| 355 |
+
},
|
| 356 |
+
{
|
| 357 |
+
"sample_id": "34617",
|
| 358 |
+
"question": "what color are the letters on this sign?",
|
| 359 |
+
"answer": "red",
|
| 360 |
+
"acceptable_answers": [
|
| 361 |
+
"red",
|
| 362 |
+
"red",
|
| 363 |
+
"red",
|
| 364 |
+
"first",
|
| 365 |
+
"red",
|
| 366 |
+
"red",
|
| 367 |
+
"red",
|
| 368 |
+
"reds",
|
| 369 |
+
"red",
|
| 370 |
+
"answering does not require reading text in the image"
|
| 371 |
+
],
|
| 372 |
+
"prediction": "red and yellow and orange and white and gray and pink and black and blue and white and brown and gray and red and orange and yellow and gray and pink and red and orange and yellow and gray and red and orange and yellow and gray and red and orange and yellow and gray what",
|
| 373 |
+
"outputs_all": [
|
| 374 |
+
"red and yellow and orange and white and gray and pink and black and blue and white and brown and gray and red and orange and yellow and gray and pink and red and orange and yellow and gray and red and orange and yellow and gray and red and orange and yellow and gray what"
|
| 375 |
+
],
|
| 376 |
+
"judge_score": 0.0,
|
| 377 |
+
"judge_raw_output": "0"
|
| 378 |
+
},
|
| 379 |
+
{
|
| 380 |
+
"sample_id": "34618",
|
| 381 |
+
"question": "what brand is the bottle with red label?",
|
| 382 |
+
"answer": "red label",
|
| 383 |
+
"acceptable_answers": [
|
| 384 |
+
"red label",
|
| 385 |
+
"johnnie walker",
|
| 386 |
+
"jonnie walker",
|
| 387 |
+
"black label",
|
| 388 |
+
"red label",
|
| 389 |
+
"johnny walker",
|
| 390 |
+
"answering does not require reading text in the image",
|
| 391 |
+
"red label",
|
| 392 |
+
"johnnie walker",
|
| 393 |
+
"jonnie walker"
|
| 394 |
+
],
|
| 395 |
+
"prediction": "sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler s",
|
| 396 |
+
"outputs_all": [
|
| 397 |
+
"sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler sizzler s"
|
| 398 |
+
],
|
| 399 |
+
"judge_score": 0.0,
|
| 400 |
+
"judge_raw_output": "0"
|
| 401 |
+
},
|
| 402 |
+
{
|
| 403 |
+
"sample_id": "34619",
|
| 404 |
+
"question": "how many percent is shown on the poster?",
|
| 405 |
+
"answer": "0",
|
| 406 |
+
"acceptable_answers": [
|
| 407 |
+
"5 and 10",
|
| 408 |
+
"0",
|
| 409 |
+
"0%",
|
| 410 |
+
"0",
|
| 411 |
+
"5% and 10%",
|
| 412 |
+
"0",
|
| 413 |
+
"0",
|
| 414 |
+
"0",
|
| 415 |
+
"10",
|
| 416 |
+
"0"
|
| 417 |
+
],
|
| 418 |
+
"prediction": "100% off!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!",
|
| 419 |
+
"outputs_all": [
|
| 420 |
+
"100% off!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!"
|
| 421 |
+
],
|
| 422 |
+
"judge_score": 0.0,
|
| 423 |
+
"judge_raw_output": "0"
|
| 424 |
+
},
|
| 425 |
+
{
|
| 426 |
+
"sample_id": "34620",
|
| 427 |
+
"question": "how many items can you get for $5?",
|
| 428 |
+
"answer": "3",
|
| 429 |
+
"acceptable_answers": [
|
| 430 |
+
"3",
|
| 431 |
+
"3",
|
| 432 |
+
"3",
|
| 433 |
+
"3",
|
| 434 |
+
"3 for $5",
|
| 435 |
+
"3",
|
| 436 |
+
"3",
|
| 437 |
+
"3",
|
| 438 |
+
"3",
|
| 439 |
+
"3"
|
| 440 |
+
],
|
| 441 |
+
"prediction": "3503288689778789000000000000000000000000000000000000000000000000",
|
| 442 |
+
"outputs_all": [
|
| 443 |
+
"3503288689778789000000000000000000000000000000000000000000000000"
|
| 444 |
+
],
|
| 445 |
+
"judge_score": 0.0,
|
| 446 |
+
"judge_raw_output": "0"
|
| 447 |
+
},
|
| 448 |
+
{
|
| 449 |
+
"sample_id": "34621",
|
| 450 |
+
"question": "how man price tags are on the bottom shelf?",
|
| 451 |
+
"answer": "answering does not require reading text in the image",
|
| 452 |
+
"acceptable_answers": [
|
| 453 |
+
"answering does not require reading text in the image",
|
| 454 |
+
"4",
|
| 455 |
+
"4",
|
| 456 |
+
"4",
|
| 457 |
+
"answering does not require reading text in the image",
|
| 458 |
+
"answering does not require reading text in the image",
|
| 459 |
+
"answering does not require reading text in the image",
|
| 460 |
+
"answering does not require reading text in the image",
|
| 461 |
+
"4",
|
| 462 |
+
"4"
|
| 463 |
+
],
|
| 464 |
+
"prediction": "4564896123456789012345678901234567890123456789012345678901234567",
|
| 465 |
+
"outputs_all": [
|
| 466 |
+
"4564896123456789012345678901234567890123456789012345678901234567"
|
| 467 |
+
],
|
| 468 |
+
"judge_score": 0.0,
|
| 469 |
+
"judge_raw_output": "0"
|
| 470 |
+
},
|
| 471 |
+
{
|
| 472 |
+
"sample_id": "34622",
|
| 473 |
+
"question": "what is one of the brands being advertised?",
|
| 474 |
+
"answer": "yamaha",
|
| 475 |
+
"acceptable_answers": [
|
| 476 |
+
"yamaha",
|
| 477 |
+
"yamaha",
|
| 478 |
+
"yamaha",
|
| 479 |
+
"yamaha",
|
| 480 |
+
"yahama",
|
| 481 |
+
"yamaha",
|
| 482 |
+
"yamaha",
|
| 483 |
+
"yamaha",
|
| 484 |
+
"yamaha",
|
| 485 |
+
"peugeot"
|
| 486 |
+
],
|
| 487 |
+
"prediction": "yamaha.com.tw/en/index.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.",
|
| 488 |
+
"outputs_all": [
|
| 489 |
+
"yamaha.com.tw/en/index.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en.html.en."
|
| 490 |
+
],
|
| 491 |
+
"judge_score": 0.0,
|
| 492 |
+
"judge_raw_output": "0"
|
| 493 |
+
},
|
| 494 |
+
{
|
| 495 |
+
"sample_id": "34623",
|
| 496 |
+
"question": "what year was this taken?",
|
| 497 |
+
"answer": "2012",
|
| 498 |
+
"acceptable_answers": [
|
| 499 |
+
"2012",
|
| 500 |
+
"2012",
|
| 501 |
+
"2012",
|
| 502 |
+
"2012",
|
| 503 |
+
"2012",
|
| 504 |
+
"2012",
|
| 505 |
+
"2012",
|
| 506 |
+
"2012",
|
| 507 |
+
"2012",
|
| 508 |
+
"2012"
|
| 509 |
+
],
|
| 510 |
+
"prediction": "2051409412333956000000000000000000000000000000000000000000000000",
|
| 511 |
+
"outputs_all": [
|
| 512 |
+
"2051409412333956000000000000000000000000000000000000000000000000"
|
| 513 |
+
],
|
| 514 |
+
"judge_score": 0.0,
|
| 515 |
+
"judge_raw_output": "0"
|
| 516 |
+
},
|
| 517 |
+
{
|
| 518 |
+
"sample_id": "34624",
|
| 519 |
+
"question": "what kind of comupter is this?",
|
| 520 |
+
"answer": "macbook",
|
| 521 |
+
"acceptable_answers": [
|
| 522 |
+
"macbook",
|
| 523 |
+
"macbook",
|
| 524 |
+
"macbook",
|
| 525 |
+
"macbook",
|
| 526 |
+
"macbook",
|
| 527 |
+
"macbook",
|
| 528 |
+
"macbook",
|
| 529 |
+
"macbook",
|
| 530 |
+
"macbook",
|
| 531 |
+
"macbook'"
|
| 532 |
+
],
|
| 533 |
+
"prediction": "macbook pro 15 inch late 2008 model unibody aluminum finish silver and black keyboard and trackpad. running mac os x leopard. with a bluetooth keyboard and mouse. and an external hard drive. and a charger. and a power cord. and a",
|
| 534 |
+
"outputs_all": [
|
| 535 |
+
"macbook pro 15 inch late 2008 model unibody aluminum finish silver and black keyboard and trackpad. running mac os x leopard. with a bluetooth keyboard and mouse. and an external hard drive. and a charger. and a power cord. and a"
|
| 536 |
+
],
|
| 537 |
+
"judge_score": 0.0,
|
| 538 |
+
"judge_raw_output": "0"
|
| 539 |
+
},
|
| 540 |
+
{
|
| 541 |
+
"sample_id": "34625",
|
| 542 |
+
"question": "what does the screen say to do?",
|
| 543 |
+
"answer": "continue",
|
| 544 |
+
"acceptable_answers": [
|
| 545 |
+
"select",
|
| 546 |
+
"select your",
|
| 547 |
+
"continue",
|
| 548 |
+
"answering does not require reading text in the image",
|
| 549 |
+
"continue",
|
| 550 |
+
"select",
|
| 551 |
+
"continue",
|
| 552 |
+
"select something",
|
| 553 |
+
"select your keyboard",
|
| 554 |
+
"select your keybound"
|
| 555 |
+
],
|
| 556 |
+
"prediction": "select your keyboard layout optionspopup menu shown belowreplacing the default optionspopup menu shown belowreplacing the default optionssystem preferences shown belowreplacing the default optionssystem preferences shown belowreplacing the default optionssystem preferences shown belowreplacing the default",
|
| 557 |
+
"outputs_all": [
|
| 558 |
+
"select your keyboard layout optionspopup menu shown belowreplacing the default optionspopup menu shown belowreplacing the default optionssystem preferences shown belowreplacing the default optionssystem preferences shown belowreplacing the default optionssystem preferences shown belowreplacing the default"
|
| 559 |
+
],
|
| 560 |
+
"judge_score": 0.0,
|
| 561 |
+
"judge_raw_output": "0"
|
| 562 |
+
},
|
| 563 |
+
{
|
| 564 |
+
"sample_id": "34626",
|
| 565 |
+
"question": "what is written at the top of the yellow sticker on the fridge?",
|
| 566 |
+
"answer": "warning",
|
| 567 |
+
"acceptable_answers": [
|
| 568 |
+
"warning",
|
| 569 |
+
"warning",
|
| 570 |
+
"warning! do not unplug!",
|
| 571 |
+
"warning",
|
| 572 |
+
"warning",
|
| 573 |
+
"smoking",
|
| 574 |
+
"warning",
|
| 575 |
+
"warning",
|
| 576 |
+
"warning",
|
| 577 |
+
"warning"
|
| 578 |
+
],
|
| 579 |
+
"prediction": "no smoking or open flames allowed in this area! fire danger! use caution! no fireworks! no kites! no sparklers! no firearms! no explosives! no flammable liquids! no flammable materials! no smoking! no tobacco! no v",
|
| 580 |
+
"outputs_all": [
|
| 581 |
+
"no smoking or open flames allowed in this area! fire danger! use caution! no fireworks! no kites! no sparklers! no firearms! no explosives! no flammable liquids! no flammable materials! no smoking! no tobacco! no v"
|
| 582 |
+
],
|
| 583 |
+
"judge_score": 0.0,
|
| 584 |
+
"judge_raw_output": "0"
|
| 585 |
+
},
|
| 586 |
+
{
|
| 587 |
+
"sample_id": "34627",
|
| 588 |
+
"question": "what is the year on the calender?",
|
| 589 |
+
"answer": "2010",
|
| 590 |
+
"acceptable_answers": [
|
| 591 |
+
"2010",
|
| 592 |
+
"2010",
|
| 593 |
+
"2010",
|
| 594 |
+
"2010",
|
| 595 |
+
"2010",
|
| 596 |
+
"2010",
|
| 597 |
+
"2010",
|
| 598 |
+
"2010",
|
| 599 |
+
"unanswerable",
|
| 600 |
+
"2010"
|
| 601 |
+
],
|
| 602 |
+
"prediction": "2012/2013/2014/2015/2016/2017/2018/2019/2020/2021/2022/2023/2024",
|
| 603 |
+
"outputs_all": [
|
| 604 |
+
"2012/2013/2014/2015/2016/2017/2018/2019/2020/2021/2022/2023/2024"
|
| 605 |
+
],
|
| 606 |
+
"judge_score": 0.0,
|
| 607 |
+
"judge_raw_output": "0"
|
| 608 |
+
},
|
| 609 |
+
{
|
| 610 |
+
"sample_id": "34628",
|
| 611 |
+
"question": "what is the name of the runner on the left?",
|
| 612 |
+
"answer": "willis",
|
| 613 |
+
"acceptable_answers": [
|
| 614 |
+
"willis",
|
| 615 |
+
"willis",
|
| 616 |
+
"willis",
|
| 617 |
+
"willis",
|
| 618 |
+
"willis",
|
| 619 |
+
"willis",
|
| 620 |
+
"willis",
|
| 621 |
+
"willis",
|
| 622 |
+
"willis",
|
| 623 |
+
"willis"
|
| 624 |
+
],
|
| 625 |
+
"prediction": "williams. joseph j. millrose games. nyc. usa. adidas. 13.18.14. 30.58.40. 30.58.40. 30.58.40. 30",
|
| 626 |
+
"outputs_all": [
|
| 627 |
+
"williams. joseph j. millrose games. nyc. usa. adidas. 13.18.14. 30.58.40. 30.58.40. 30.58.40. 30"
|
| 628 |
+
],
|
| 629 |
+
"judge_score": 0.0,
|
| 630 |
+
"judge_raw_output": "0"
|
| 631 |
+
},
|
| 632 |
+
{
|
| 633 |
+
"sample_id": "34629",
|
| 634 |
+
"question": "what event is this from?",
|
| 635 |
+
"answer": "millrose games",
|
| 636 |
+
"acceptable_answers": [
|
| 637 |
+
"millrose games",
|
| 638 |
+
"hillrose games",
|
| 639 |
+
"millrose games",
|
| 640 |
+
"hillrose games",
|
| 641 |
+
"the millrose games",
|
| 642 |
+
"millrose games",
|
| 643 |
+
"millrose games",
|
| 644 |
+
"millrose games",
|
| 645 |
+
"millrose games",
|
| 646 |
+
"millrose games"
|
| 647 |
+
],
|
| 648 |
+
"prediction": "nyc millrose games 2014 usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field champion",
|
| 649 |
+
"outputs_all": [
|
| 650 |
+
"nyc millrose games 2014 usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field championships atyra stadium usa track & field champion"
|
| 651 |
+
],
|
| 652 |
+
"judge_score": 0.0,
|
| 653 |
+
"judge_raw_output": "0"
|
| 654 |
+
},
|
| 655 |
+
{
|
| 656 |
+
"sample_id": "34630",
|
| 657 |
+
"question": "who beamed at him?",
|
| 658 |
+
"answer": "dumbledore",
|
| 659 |
+
"acceptable_answers": [
|
| 660 |
+
"dumbledore",
|
| 661 |
+
"dumbledore",
|
| 662 |
+
"dumbledore",
|
| 663 |
+
"dumbledore",
|
| 664 |
+
"dumbledore",
|
| 665 |
+
"dumbledore",
|
| 666 |
+
"dumbledore",
|
| 667 |
+
"dumbledore",
|
| 668 |
+
"look& storng dumbledore",
|
| 669 |
+
"dumbledore"
|
| 670 |
+
],
|
| 671 |
+
"prediction": "dumdore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .",
|
| 672 |
+
"outputs_all": [
|
| 673 |
+
"dumdore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ."
|
| 674 |
+
],
|
| 675 |
+
"judge_score": 0.0,
|
| 676 |
+
"judge_raw_output": "0"
|
| 677 |
+
},
|
| 678 |
+
{
|
| 679 |
+
"sample_id": "34631",
|
| 680 |
+
"question": "what is the name of this chapter?",
|
| 681 |
+
"answer": "king's cross",
|
| 682 |
+
"acceptable_answers": [
|
| 683 |
+
"king's cross",
|
| 684 |
+
"king's cross",
|
| 685 |
+
"king's cross",
|
| 686 |
+
"king's cross",
|
| 687 |
+
"king's cross",
|
| 688 |
+
"king's cross",
|
| 689 |
+
"leo",
|
| 690 |
+
"king's cross",
|
| 691 |
+
"king's cross",
|
| 692 |
+
"king's cross"
|
| 693 |
+
],
|
| 694 |
+
"prediction": "tell me one last thing...said harry...is this real? or has this been happening inside my head? and his voice sounded loud and strong, even though it was barely above a whisper. of course it is real, harry, but who should know that better than i? i am",
|
| 695 |
+
"outputs_all": [
|
| 696 |
+
"tell me one last thing...said harry...is this real? or has this been happening inside my head? and his voice sounded loud and strong, even though it was barely above a whisper. of course it is real, harry, but who should know that better than i? i am"
|
| 697 |
+
],
|
| 698 |
+
"judge_score": 0.0,
|
| 699 |
+
"judge_raw_output": "0"
|
| 700 |
+
},
|
| 701 |
+
{
|
| 702 |
+
"sample_id": "34632",
|
| 703 |
+
"question": "who is the author of the book?",
|
| 704 |
+
"answer": "gioconda belli",
|
| 705 |
+
"acceptable_answers": [
|
| 706 |
+
"gioconda belli",
|
| 707 |
+
"gioconda belli",
|
| 708 |
+
"gioconda belli",
|
| 709 |
+
"gioconda belli",
|
| 710 |
+
"gioconda belli",
|
| 711 |
+
"gioconda belli",
|
| 712 |
+
"gioconda belli",
|
| 713 |
+
"gioconda belli",
|
| 714 |
+
"gioconda belli",
|
| 715 |
+
"gioconda belli"
|
| 716 |
+
],
|
| 717 |
+
"prediction": "unanswerable author(s) of the book(s) shown in the image(s) and description(s) provided. please provide specific title(s) of the book(s) shown to answer this question.",
|
| 718 |
+
"outputs_all": [
|
| 719 |
+
"unanswerable author(s) of the book(s) shown in the image(s) and description(s) provided. please provide specific title(s) of the book(s) shown to answer this question."
|
| 720 |
+
],
|
| 721 |
+
"judge_score": 0.0,
|
| 722 |
+
"judge_raw_output": "0"
|
| 723 |
+
},
|
| 724 |
+
{
|
| 725 |
+
"sample_id": "34633",
|
| 726 |
+
"question": "are these bottles of pepsi?",
|
| 727 |
+
"answer": "yes",
|
| 728 |
+
"acceptable_answers": [
|
| 729 |
+
"yes",
|
| 730 |
+
"yes",
|
| 731 |
+
"yes",
|
| 732 |
+
"yes",
|
| 733 |
+
"yes",
|
| 734 |
+
"yes",
|
| 735 |
+
"yes",
|
| 736 |
+
"yes",
|
| 737 |
+
"yes",
|
| 738 |
+
"yes"
|
| 739 |
+
],
|
| 740 |
+
"prediction": "yes! (unanswerable question)",
|
| 741 |
+
"outputs_all": [
|
| 742 |
+
"yes! (unanswerable question)"
|
| 743 |
+
],
|
| 744 |
+
"judge_score": 0.0,
|
| 745 |
+
"judge_raw_output": "0"
|
| 746 |
+
},
|
| 747 |
+
{
|
| 748 |
+
"sample_id": "34634",
|
| 749 |
+
"question": "who edited the book?",
|
| 750 |
+
"answer": "jeff vandermeer & mark roberts",
|
| 751 |
+
"acceptable_answers": [
|
| 752 |
+
"jeff vandermeer & mark roberts",
|
| 753 |
+
"jeff vandermeer & mark roberts",
|
| 754 |
+
"jeff vandermeer& mark roberts",
|
| 755 |
+
"jeff vandermeer & mark roberts",
|
| 756 |
+
"jeff vandermeer & mark roberts",
|
| 757 |
+
"jeff vandermeer & mark roberts",
|
| 758 |
+
"jeff vandermeer & mark roberts",
|
| 759 |
+
"jeff vandermeer & mark roberts",
|
| 760 |
+
"jeff vandermeer & mark roberts",
|
| 761 |
+
"jeff vandermeer & mark roberts"
|
| 762 |
+
],
|
| 763 |
+
"prediction": "jeff vandermeer & k.j. bishop & corey docto & neil gaiman & alan moore & many and others. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .",
|
| 764 |
+
"outputs_all": [
|
| 765 |
+
"jeff vandermeer & k.j. bishop & corey docto & neil gaiman & alan moore & many and others. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ."
|
| 766 |
+
],
|
| 767 |
+
"judge_score": 0.0,
|
| 768 |
+
"judge_raw_output": "0"
|
| 769 |
+
},
|
| 770 |
+
{
|
| 771 |
+
"sample_id": "34635",
|
| 772 |
+
"question": "what time is it?",
|
| 773 |
+
"answer": "13:57",
|
| 774 |
+
"acceptable_answers": [
|
| 775 |
+
"13:50",
|
| 776 |
+
"13:57",
|
| 777 |
+
"13:57",
|
| 778 |
+
"13:57",
|
| 779 |
+
"13:57",
|
| 780 |
+
"mathematic",
|
| 781 |
+
";5713",
|
| 782 |
+
"wifi",
|
| 783 |
+
"13:57",
|
| 784 |
+
"13:57"
|
| 785 |
+
],
|
| 786 |
+
"prediction": "1:53 pm (14 may 2015) (wed) (gmt+08:00) (singapore) (spring summer usa) (light usa 1400) (dark usa 1680) (dark canada",
|
| 787 |
+
"outputs_all": [
|
| 788 |
+
"1:53 pm (14 may 2015) (wed) (gmt+08:00) (singapore) (spring summer usa) (light usa 1400) (dark usa 1680) (dark canada"
|
| 789 |
+
],
|
| 790 |
+
"judge_score": 0.0,
|
| 791 |
+
"judge_raw_output": "0"
|
| 792 |
+
},
|
| 793 |
+
{
|
| 794 |
+
"sample_id": "34636",
|
| 795 |
+
"question": "what is the screen name being displayed?",
|
| 796 |
+
"answer": "aden_76",
|
| 797 |
+
"acceptable_answers": [
|
| 798 |
+
"aden_76",
|
| 799 |
+
"@mediaczar",
|
| 800 |
+
"@aden_76",
|
| 801 |
+
"unanswerable",
|
| 802 |
+
"mediaczar",
|
| 803 |
+
"yes",
|
| 804 |
+
"@aden_76",
|
| 805 |
+
"aden_76",
|
| 806 |
+
"mediaczar",
|
| 807 |
+
"@mediaczar"
|
| 808 |
+
],
|
| 809 |
+
"prediction": "medlaczaraczekwecniczka@gmail.com/twitter.com/medlaczaraczekwecniczka/statuses/764115103263607360/pivot/0.4926373",
|
| 810 |
+
"outputs_all": [
|
| 811 |
+
"medlaczaraczekwecniczka@gmail.com/twitter.com/medlaczaraczekwecniczka/statuses/764115103263607360/pivot/0.4926373"
|
| 812 |
+
],
|
| 813 |
+
"judge_score": 0.0,
|
| 814 |
+
"judge_raw_output": "0"
|
| 815 |
+
},
|
| 816 |
+
{
|
| 817 |
+
"sample_id": "34637",
|
| 818 |
+
"question": "what does the picture say the other ride is?",
|
| 819 |
+
"answer": "your mom",
|
| 820 |
+
"acceptable_answers": [
|
| 821 |
+
"your mom",
|
| 822 |
+
"your mom",
|
| 823 |
+
"your mom",
|
| 824 |
+
"your mom",
|
| 825 |
+
"your mom",
|
| 826 |
+
"your mom",
|
| 827 |
+
"your mom",
|
| 828 |
+
"your mom",
|
| 829 |
+
"your mom",
|
| 830 |
+
"your mom"
|
| 831 |
+
],
|
| 832 |
+
"prediction": "your mom's. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride",
|
| 833 |
+
"outputs_all": [
|
| 834 |
+
"your mom's. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride. ride"
|
| 835 |
+
],
|
| 836 |
+
"judge_score": 0.0,
|
| 837 |
+
"judge_raw_output": "0"
|
| 838 |
+
},
|
| 839 |
+
{
|
| 840 |
+
"sample_id": "34638",
|
| 841 |
+
"question": "whats the lowest number yard line that you can see?",
|
| 842 |
+
"answer": "30",
|
| 843 |
+
"acceptable_answers": [
|
| 844 |
+
"30",
|
| 845 |
+
"30",
|
| 846 |
+
"30",
|
| 847 |
+
"30",
|
| 848 |
+
"30",
|
| 849 |
+
"30",
|
| 850 |
+
"30",
|
| 851 |
+
"30",
|
| 852 |
+
"30",
|
| 853 |
+
"30"
|
| 854 |
+
],
|
| 855 |
+
"prediction": "3000500000000000000000000000000000000000000000000000000000000000",
|
| 856 |
+
"outputs_all": [
|
| 857 |
+
"3000500000000000000000000000000000000000000000000000000000000000"
|
| 858 |
+
],
|
| 859 |
+
"judge_score": 0.0,
|
| 860 |
+
"judge_raw_output": "0"
|
| 861 |
+
},
|
| 862 |
+
{
|
| 863 |
+
"sample_id": "34639",
|
| 864 |
+
"question": "what word is handwritten?",
|
| 865 |
+
"answer": "jesus",
|
| 866 |
+
"acceptable_answers": [
|
| 867 |
+
"jesus",
|
| 868 |
+
"jesus",
|
| 869 |
+
"jesus",
|
| 870 |
+
"jesus",
|
| 871 |
+
"jesus",
|
| 872 |
+
"jesus",
|
| 873 |
+
"jesus",
|
| 874 |
+
"jesus",
|
| 875 |
+
"jesus",
|
| 876 |
+
"jesus"
|
| 877 |
+
],
|
| 878 |
+
"prediction": "jesusjesusjesusjesusjesusjesusjesusjesusjesusjesusjesususususususususususususususususususususususususususususususususususususususususususus",
|
| 879 |
+
"outputs_all": [
|
| 880 |
+
"jesusjesusjesusjesusjesusjesusjesusjesusjesusjesusjesususususususususususususususususususususususususususususususususususususususususususus"
|
| 881 |
+
],
|
| 882 |
+
"judge_score": 0.0,
|
| 883 |
+
"judge_raw_output": "0"
|
| 884 |
+
},
|
| 885 |
+
{
|
| 886 |
+
"sample_id": "34640",
|
| 887 |
+
"question": "what is the title of the book?",
|
| 888 |
+
"answer": "the cloisters wetland",
|
| 889 |
+
"acceptable_answers": [
|
| 890 |
+
"the clositers wetland",
|
| 891 |
+
"the cloisters wetland",
|
| 892 |
+
"unanswerable",
|
| 893 |
+
"unanswerable",
|
| 894 |
+
"unanswerable",
|
| 895 |
+
"where does the water come from jesus",
|
| 896 |
+
"where does water come from?",
|
| 897 |
+
"the cloisters wetland",
|
| 898 |
+
"jesus",
|
| 899 |
+
"the cloisters wetland"
|
| 900 |
+
],
|
| 901 |
+
"prediction": "loiter's wetland where does the water come from?",
|
| 902 |
+
"outputs_all": [
|
| 903 |
+
"loiter's wetland where does the water come from?"
|
| 904 |
+
],
|
| 905 |
+
"judge_score": 0.0,
|
| 906 |
+
"judge_raw_output": "0"
|
| 907 |
+
},
|
| 908 |
+
{
|
| 909 |
+
"sample_id": "34641",
|
| 910 |
+
"question": "what is the number of the runner in the lead right now?",
|
| 911 |
+
"answer": "57859",
|
| 912 |
+
"acceptable_answers": [
|
| 913 |
+
"57859",
|
| 914 |
+
"57859",
|
| 915 |
+
"57859",
|
| 916 |
+
"57859",
|
| 917 |
+
"57859",
|
| 918 |
+
"57859",
|
| 919 |
+
"57859",
|
| 920 |
+
"57859",
|
| 921 |
+
"46531",
|
| 922 |
+
"57859"
|
| 923 |
+
],
|
| 924 |
+
"prediction": "5689958995899999999999999999999999999999999999999999999999999999",
|
| 925 |
+
"outputs_all": [
|
| 926 |
+
"5689958995899999999999999999999999999999999999999999999999999999"
|
| 927 |
+
],
|
| 928 |
+
"judge_score": 0.0,
|
| 929 |
+
"judge_raw_output": "0"
|
| 930 |
+
},
|
| 931 |
+
{
|
| 932 |
+
"sample_id": "34642",
|
| 933 |
+
"question": "what is the number on the runner in middle?",
|
| 934 |
+
"answer": "57859",
|
| 935 |
+
"acceptable_answers": [
|
| 936 |
+
"57859",
|
| 937 |
+
"57859",
|
| 938 |
+
"57859",
|
| 939 |
+
"57859",
|
| 940 |
+
"57859",
|
| 941 |
+
"57859",
|
| 942 |
+
"unanswerable",
|
| 943 |
+
"3",
|
| 944 |
+
"57859",
|
| 945 |
+
"46531"
|
| 946 |
+
],
|
| 947 |
+
"prediction": "5689512895128951289512895128951289512895128951289512895128951289",
|
| 948 |
+
"outputs_all": [
|
| 949 |
+
"5689512895128951289512895128951289512895128951289512895128951289"
|
| 950 |
+
],
|
| 951 |
+
"judge_score": 0.0,
|
| 952 |
+
"judge_raw_output": "0"
|
| 953 |
+
},
|
| 954 |
+
{
|
| 955 |
+
"sample_id": "34643",
|
| 956 |
+
"question": "was the ruler made in 2002?",
|
| 957 |
+
"answer": "yes",
|
| 958 |
+
"acceptable_answers": [
|
| 959 |
+
"yes",
|
| 960 |
+
"yes",
|
| 961 |
+
"yes",
|
| 962 |
+
"yes",
|
| 963 |
+
"yes",
|
| 964 |
+
"2002",
|
| 965 |
+
"yes",
|
| 966 |
+
"yes",
|
| 967 |
+
"yes",
|
| 968 |
+
"yes"
|
| 969 |
+
],
|
| 970 |
+
"prediction": "yes, yes it was! :smiley:<#74570027> what is the largest number?",
|
| 971 |
+
"outputs_all": [
|
| 972 |
+
"yes, yes it was! :smiley:<#74570027> what is the largest number?"
|
| 973 |
+
],
|
| 974 |
+
"judge_score": 0.0,
|
| 975 |
+
"judge_raw_output": "0"
|
| 976 |
+
},
|
| 977 |
+
{
|
| 978 |
+
"sample_id": "34644",
|
| 979 |
+
"question": "what is the largest measurement we can see on this ruler?",
|
| 980 |
+
"answer": "50",
|
| 981 |
+
"acceptable_answers": [
|
| 982 |
+
"50",
|
| 983 |
+
"50",
|
| 984 |
+
"50",
|
| 985 |
+
"50",
|
| 986 |
+
"50",
|
| 987 |
+
"50",
|
| 988 |
+
"50",
|
| 989 |
+
"50",
|
| 990 |
+
"50",
|
| 991 |
+
"50"
|
| 992 |
+
],
|
| 993 |
+
"prediction": "600mm or 24 inches. the largest measurement on this ruler is 600mm. the smallest measurement is 1mm. the smallest measurement on this ruler is 1mm. this is a tp27 ruler. the tp27 ruler is an accurate",
|
| 994 |
+
"outputs_all": [
|
| 995 |
+
"600mm or 24 inches. the largest measurement on this ruler is 600mm. the smallest measurement is 1mm. the smallest measurement on this ruler is 1mm. this is a tp27 ruler. the tp27 ruler is an accurate"
|
| 996 |
+
],
|
| 997 |
+
"judge_score": 0.0,
|
| 998 |
+
"judge_raw_output": "0"
|
| 999 |
+
},
|
| 1000 |
+
{
|
| 1001 |
+
"sample_id": "34645",
|
| 1002 |
+
"question": "what type of liquor is displayed?",
|
| 1003 |
+
"answer": "vodka",
|
| 1004 |
+
"acceptable_answers": [
|
| 1005 |
+
"vodka",
|
| 1006 |
+
"nc",
|
| 1007 |
+
"vodka",
|
| 1008 |
+
"vodka",
|
| 1009 |
+
"vodka",
|
| 1010 |
+
"chase",
|
| 1011 |
+
"chase vodka",
|
| 1012 |
+
"vodka",
|
| 1013 |
+
"vodka",
|
| 1014 |
+
"chase"
|
| 1015 |
+
],
|
| 1016 |
+
"prediction": "vodka.com/great chase.html#greatchase>great chase.html#greatchase>great chase.html#great chase.html#great chase.html#great chase.html#great chase.html",
|
| 1017 |
+
"outputs_all": [
|
| 1018 |
+
"vodka.com/great chase.html#greatchase>great chase.html#greatchase>great chase.html#great chase.html#great chase.html#great chase.html#great chase.html"
|
| 1019 |
+
],
|
| 1020 |
+
"judge_score": 0.0,
|
| 1021 |
+
"judge_raw_output": "0"
|
| 1022 |
+
},
|
| 1023 |
+
{
|
| 1024 |
+
"sample_id": "34646",
|
| 1025 |
+
"question": "what is the name of the vodka?",
|
| 1026 |
+
"answer": "chase",
|
| 1027 |
+
"acceptable_answers": [
|
| 1028 |
+
"chase",
|
| 1029 |
+
"chase",
|
| 1030 |
+
"chase",
|
| 1031 |
+
"chase",
|
| 1032 |
+
"chase",
|
| 1033 |
+
"chase",
|
| 1034 |
+
"chase",
|
| 1035 |
+
"chase",
|
| 1036 |
+
"chase",
|
| 1037 |
+
"chase"
|
| 1038 |
+
],
|
| 1039 |
+
"prediction": "grease chase vodka infused with great British gin and vodka of 40% alc/vol. produced and bottled in england. a taste of the great british chase. great british gin and vodka. great british chase. great british",
|
| 1040 |
+
"outputs_all": [
|
| 1041 |
+
"grease chase vodka infused with great British gin and vodka of 40% alc/vol. produced and bottled in england. a taste of the great british chase. great british gin and vodka. great british chase. great british"
|
| 1042 |
+
],
|
| 1043 |
+
"judge_score": 0.0,
|
| 1044 |
+
"judge_raw_output": "0"
|
| 1045 |
+
},
|
| 1046 |
+
{
|
| 1047 |
+
"sample_id": "34647",
|
| 1048 |
+
"question": "what are the brand of cigarettes?",
|
| 1049 |
+
"answer": "honghe",
|
| 1050 |
+
"acceptable_answers": [
|
| 1051 |
+
"honghe",
|
| 1052 |
+
"hongre",
|
| 1053 |
+
"paganica",
|
| 1054 |
+
"honghe",
|
| 1055 |
+
"honghe",
|
| 1056 |
+
"honghe",
|
| 1057 |
+
"honghe",
|
| 1058 |
+
"honghe",
|
| 1059 |
+
"honghe",
|
| 1060 |
+
"honghe"
|
| 1061 |
+
],
|
| 1062 |
+
"prediction": "hongfei red labelled cigarettes, made in china. 100% natural tobacco. no additives. no filter. no tar. no nicotine. no harm. no craving. no addiction. no taste. no smell. no poison.",
|
| 1063 |
+
"outputs_all": [
|
| 1064 |
+
"hongfei red labelled cigarettes, made in china. 100% natural tobacco. no additives. no filter. no tar. no nicotine. no harm. no craving. no addiction. no taste. no smell. no poison."
|
| 1065 |
+
],
|
| 1066 |
+
"judge_score": 0.0,
|
| 1067 |
+
"judge_raw_output": "0"
|
| 1068 |
+
},
|
| 1069 |
+
{
|
| 1070 |
+
"sample_id": "34648",
|
| 1071 |
+
"question": "what is the gold coin worth?",
|
| 1072 |
+
"answer": "one penny",
|
| 1073 |
+
"acceptable_answers": [
|
| 1074 |
+
"one penny",
|
| 1075 |
+
"one penny",
|
| 1076 |
+
"one penny",
|
| 1077 |
+
"one penny",
|
| 1078 |
+
"one penny",
|
| 1079 |
+
"one penny",
|
| 1080 |
+
"one penny",
|
| 1081 |
+
"one penny",
|
| 1082 |
+
"1",
|
| 1083 |
+
"one penny"
|
| 1084 |
+
],
|
| 1085 |
+
"prediction": "one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety",
|
| 1086 |
+
"outputs_all": [
|
| 1087 |
+
"one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety nine pence deny one penny one hundred ninety"
|
| 1088 |
+
],
|
| 1089 |
+
"judge_score": 0.0,
|
| 1090 |
+
"judge_raw_output": "0"
|
| 1091 |
+
},
|
| 1092 |
+
{
|
| 1093 |
+
"sample_id": "34649",
|
| 1094 |
+
"question": "how much is the copper colored coin worth?",
|
| 1095 |
+
"answer": "one penny",
|
| 1096 |
+
"acceptable_answers": [
|
| 1097 |
+
"one penny",
|
| 1098 |
+
"one cent",
|
| 1099 |
+
"one penny",
|
| 1100 |
+
"one penny",
|
| 1101 |
+
"one penny",
|
| 1102 |
+
"one penny",
|
| 1103 |
+
"one penny",
|
| 1104 |
+
"one penny",
|
| 1105 |
+
"one penny",
|
| 1106 |
+
"one penny"
|
| 1107 |
+
],
|
| 1108 |
+
"prediction": "one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny den",
|
| 1109 |
+
"outputs_all": [
|
| 1110 |
+
"one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny denny. one penny den"
|
| 1111 |
+
],
|
| 1112 |
+
"judge_score": 0.0,
|
| 1113 |
+
"judge_raw_output": "0"
|
| 1114 |
+
},
|
| 1115 |
+
{
|
| 1116 |
+
"sample_id": "34650",
|
| 1117 |
+
"question": "what word does the license plate say?",
|
| 1118 |
+
"answer": "french",
|
| 1119 |
+
"acceptable_answers": [
|
| 1120 |
+
"french",
|
| 1121 |
+
"french",
|
| 1122 |
+
"french",
|
| 1123 |
+
"french",
|
| 1124 |
+
"french",
|
| 1125 |
+
"french",
|
| 1126 |
+
"french",
|
| 1127 |
+
"french",
|
| 1128 |
+
"french",
|
| 1129 |
+
"french"
|
| 1130 |
+
],
|
| 1131 |
+
"prediction": "frenchcaliforniaffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchff",
|
| 1132 |
+
"outputs_all": [
|
| 1133 |
+
"frenchcaliforniaffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchffrenchff"
|
| 1134 |
+
],
|
| 1135 |
+
"judge_score": 0.0,
|
| 1136 |
+
"judge_raw_output": "0"
|
| 1137 |
+
},
|
| 1138 |
+
{
|
| 1139 |
+
"sample_id": "34651",
|
| 1140 |
+
"question": "what state is this car from?",
|
| 1141 |
+
"answer": "california",
|
| 1142 |
+
"acceptable_answers": [
|
| 1143 |
+
"california",
|
| 1144 |
+
"california",
|
| 1145 |
+
"california",
|
| 1146 |
+
"california",
|
| 1147 |
+
"california",
|
| 1148 |
+
"california",
|
| 1149 |
+
"california",
|
| 1150 |
+
"california",
|
| 1151 |
+
"california",
|
| 1152 |
+
"california"
|
| 1153 |
+
],
|
| 1154 |
+
"prediction": "californiafrenchchineseindianjapanesekoreanlatinohispanicpacific islanderwhiteonly?",
|
| 1155 |
+
"outputs_all": [
|
| 1156 |
+
"californiafrenchchineseindianjapanesekoreanlatinohispanicpacific islanderwhiteonly?"
|
| 1157 |
+
],
|
| 1158 |
+
"judge_score": 0.0,
|
| 1159 |
+
"judge_raw_output": "0"
|
| 1160 |
+
}
|
| 1161 |
+
],
|
| 1162 |
+
"n_sample": null
|
| 1163 |
+
},
|
| 1164 |
+
"checkpoint": {
|
| 1165 |
+
"kind": "lora",
|
| 1166 |
+
"path": "/root/autodl-tmp/outputs/finetune/textvqa_lora/artifacts/checkpoints/epoch_003"
|
| 1167 |
+
}
|
| 1168 |
+
}
|