File size: 48,800 Bytes
3358602
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
# Spider 2.0 ORIGINAL (no-wildcards) β€” review

**Total samples:** 233
**Samples with substitutions:** 83
**Total pattern substitutions:** 396
**Leftover `*` in graph (should be 0):** 0

## Substitution reasons

| reason | count |
|---|---|
| `q_year_fallback` | 250 |
| `hardcoded_default` | 101 |
| `sql_specific` | 13 |
| `table_suffix_range` | 10 |
| `sql_specific_year_alpha` | 7 |
| `sql_partial_wildcard` | 4 |
| `sql_8digit_literal` | 3 |
| `sql_specific_alphanumeric` | 3 |
| `declare` | 2 |
| `family_default` | 2 |
| `usa_1910_current_singleton` | 1 |

## Top 20 patterns substituted

| pattern | count |
|---|---|
| `ga_sessions_*` | 11 |
| `zip_codes_*` | 10 |
| `censustract_*` | 10 |
| `puma_*` | 10 |
| `schooldistrictunified_*` | 10 |
| `state_*` | 10 |
| `congressionaldistrict_*` | 10 |
| `place_*` | 10 |
| `blockgroup_*` | 10 |
| `cbsa_*` | 10 |
| `schooldistrictsecondary_*` | 10 |
| `county_*` | 10 |
| `schooldistrictelementary_*` | 10 |
| `zcta_*` | 10 |
| `events_*` | 9 |
| `zcta5_*` | 8 |
| `icoads_core_*` | 6 |
| `storms_*` | 6 |
| `tlc_fhv_trips_*` | 6 |
| `tlc_yellow_trips_*` | 6 |

## Gold-affecting substitutions (47 suspect samples)

These are the samples where the substitution changes `used_columns` (the gold).

### sid=0  iid=bq011  db=ga4

**Q:** How many distinct pseudo users had positive engagement time  in the 7-day period ending on January 7, 2021 at 23:59:59, but had no positive engagement time in the 2-day period ending on the same date (January 7, 2021 at 23:59:59) ?

**Old gold (2):** `['events_*.user_pseudo_id', 'events_*.event_timestamp']`

**New gold (2):** `['events_20210101.user_pseudo_id', 'events_20210101.event_timestamp']`

**Substitutions:**
  - `events_*` β†’ `events_20210101` (table_suffix_range:20210101 -> pad8)

---

### sid=1  iid=bq010  db=ga360

**Q:** Find the top-selling product among customers who bought 'Youtube Men’s Vintage Henley' in July 2017, excluding itself.

**Old gold (2):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.hits']`

**New gold (2):** `['ga_sessions_20170701.fullVisitorId', 'ga_sessions_20170701.hits']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170701` (sql_partial_wildcard:201707 -> pad8)

---

### sid=2  iid=bq009  db=ga360

**Q:** Which traffic source has the highest total transaction revenue for the year 2017, and what is the difference in millions (rounded to two decimal places) between the highest and lowest monthly total transaction revenue for that traffic source?

**Old gold (3):** `['ga_sessions_*.date', 'ga_sessions_*.trafficSource', 'ga_sessions_*.totals']`

**New gold (3):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.trafficSource', 'ga_sessions_20170101.totals']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170101` (sql_partial_wildcard:2017 -> pad8)

---

### sid=3  iid=bq001  db=ga360

**Q:** For each visitor who made at least one transaction in February 2017, how many days elapsed between the date of their first visit in February and the date of their first transaction in February, and on what type of device did they make that first transaction?

**Old gold (2):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.date']`

**New gold (2):** `['ga_sessions_20170201.fullVisitorId', 'ga_sessions_20170201.date']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170201` (declare:20170201 -> pad8)

---

### sid=4  iid=bq002  db=ga360

**Q:** During the first half of 2017,  focusing on hits product revenue, which traffic source generated the highest total product revenue, and what were the maximum daily, weekly, and monthly product revenues (in millions) for that top-performing source over this period?

**Old gold (2):** `['ga_sessions_*.date', 'ga_sessions_*.hits']`

**New gold (2):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.hits']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170101` (declare:20170101 -> pad8)

---

### sid=5  iid=bq003  db=ga360

**Q:** Between April 1 and July 31 of 2017, using the hits product revenue data along with the totals transactions to classify sessions as purchase (transactions β‰₯ 1 and productRevenue not null) or non-purchase (transactions null and productRevenue null), compare the average pageviews per visitor for each 

**Old gold (2):** `['ga_sessions_*.date', 'ga_sessions_*.fullVisitorId']`

**New gold (2):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.fullVisitorId']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170101` (sql_partial_wildcard:2017 -> pad8)

---

### sid=7  iid=bq008  db=ga360

**Q:** In January 2017, among visitors whose campaign name contains 'Data Share' and who accessed any page starting with '/home', which page did they most commonly visit next, and what is the maximum time (in seconds) they spent on the '/home' page before moving on?

**Old gold (7):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.visitId', 'ga_sessions_*.trafficSource', 'ga_sessions_*.hits', 'ga_sessions_*.hits', 'ga_sessions_*.visitStartTime', 'ga_sessions_*.hits']`

**New gold (7):** `['ga_sessions_20170101.fullVisitorId', 'ga_sessions_20170101.visitId', 'ga_sessions_20170101.trafficSource', 'ga_sessions_20170101.hits', 'ga_sessions_20170101.hits', 'ga_sessions_20170101.visitStartTime', 'ga_sessions_20170101.hits']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170101` (table_suffix_range:20170101 -> pad8)

---

### sid=8  iid=bq269  db=ga360

**Q:** Between June 1, 2017, and July 31, 2017, consider only sessions that have non-null pageviews. Classify each session as β€˜purchase’ if it has at least one transaction, or β€˜non_purchase’ otherwise. For each month, sum each visitor’s total pageviews under each classification, then compute the average pa

**Old gold (4):** `['ga_sessions_*.date', 'ga_sessions_*.fullVisitorId', 'ga_sessions_*.totals', 'ga_sessions_*.totals']`

**New gold (4):** `['ga_sessions_20170601.date', 'ga_sessions_20170601.fullVisitorId', 'ga_sessions_20170601.totals', 'ga_sessions_20170601.totals']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170601` (table_suffix_range:20170601 -> pad8)

---

### sid=9  iid=bq268  db=ga360

**Q:** Identify the longest number of days between the first visit and the last recorded event (either the last visit or the first transaction) for a user, where the last recorded event is associated with a mobile device. The last recorded event could either be the last visit or the first transaction, and 

**Old gold (4):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.date', 'ga_sessions_*.device', 'ga_sessions_*.hits']`

**New gold (4):** `['ga_sessions_20170701.fullVisitorId', 'ga_sessions_20170701.date', 'ga_sessions_20170701.device', 'ga_sessions_20170701.hits']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170701` (family_default:ga_sessions=20170701)

---

### sid=10  iid=bq270  db=ga360

**Q:** What were the monthly add-to-cart and purchase conversion rates, calculated as a percentage of pageviews on product details, from January to March 2017?

**Old gold (3):** `['ga_sessions_*.date', 'ga_sessions_*.hits', 'ga_sessions_*.hits']`

**New gold (3):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.hits', 'ga_sessions_20170101.hits']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170101` (sql_partial_wildcard:2017 -> pad8)

---

### sid=11  iid=bq275  db=ga360

**Q:** Which visitor IDs belong to users whose first transaction occurred on a device explicitly labeled as 'mobile' on a later date than their first visit?

**Old gold (4):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.date', 'ga_sessions_*.hits', 'ga_sessions_*.device']`

**New gold (4):** `['ga_sessions_20170701.fullVisitorId', 'ga_sessions_20170701.date', 'ga_sessions_20170701.hits', 'ga_sessions_20170701.device']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20170701` (family_default:ga_sessions=20170701)

---

### sid=12  iid=bq374  db=ga360

**Q:** Calculates the percentage of new users who, between August 1, 2016, and April 30, 2017, both stayed on the site for more than 5 minutes during their initial visit and made a purchase on a subsequent visit at any later time, relative to the total number of new users in the same period.

**Old gold (6):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.visitStartTime', 'ga_sessions_*.totals', 'ga_sessions_*.date', 'ga_sessions_*.totals', 'ga_sessions_*.totals']`

**New gold (6):** `['ga_sessions_20160801.fullVisitorId', 'ga_sessions_20160801.visitStartTime', 'ga_sessions_20160801.totals', 'ga_sessions_20160801.date', 'ga_sessions_20160801.totals', 'ga_sessions_20160801.totals']`

**Substitutions:**
  - `ga_sessions_*` β†’ `ga_sessions_20160801` (sql_8digit_literal:20160801)

---

### sid=61  iid=bq235  db=cms_data

**Q:** Can you tell me which healthcare provider incurs the highest combined average costs for both outpatient and inpatient services in 2014?

**Old gold (12):** `['outpatient_charges_*.provider_state', 'outpatient_charges_*.provider_city', 'outpatient_charges_*.provider_id', 'outpatient_charges_*.provider_name', 'outpatient_charges_*.outpatient_services', 'outpatient_charges_*.average_total_payments', 'inpatient_charges_*.provider_state', 'inpatient_charges_*.provider_city', 'inpatient_charges_*.provider_id', 'inpatient_charges_*.provider_name', 'inpatient_charges_*.total_discharges', 'inpatient_charges_*.average_medicare_payments']`

**New gold (12):** `['outpatient_charges_2014.provider_state', 'outpatient_charges_2014.provider_city', 'outpatient_charges_2014.provider_id', 'outpatient_charges_2014.provider_name', 'outpatient_charges_2014.outpatient_services', 'outpatient_charges_2014.average_total_payments', 'inpatient_charges_2014.provider_state', 'inpatient_charges_2014.provider_city', 'inpatient_charges_2014.provider_id', 'inpatient_charges_2014.provider_name', 'inpatient_charges_2014.total_discharges', 'inpatient_charges_2014.average_medicare_payments']`

**Substitutions:**
  - `outpatient_charges_*` β†’ `outpatient_charges_2014` (sql_specific:2014 -> pad4)
  - `inpatient_charges_*` β†’ `inpatient_charges_2014` (sql_specific:2014 -> pad4)

---

### sid=69  iid=bq419  db=noaa_data

**Q:** Which 5 states had the most storm events from 1980 to 1995, considering only the top 1000 states with the highest event counts each year? Please use state abbreviations.

**Old gold (2):** `['storms_*.state', 'storms_*.event_id']`

**New gold (2):** `['storms_1980.state', 'storms_1980.event_id']`

**Substitutions:**
  - `storms_*` β†’ `storms_1980` (sql_specific:1980 -> pad4)

---

### sid=71  iid=sf_bq236  db=NOAA_DATA_PLUS

**Q:** What are the top 5 zip codes of the areas in the United States that have experienced the most hail storm events in the past 10 years? Don't use data from hail reports table.

**Old gold (7):** `['STORMS_*.event_point', 'STORMS_*.event_type', 'STORMS_*.event_id', 'ZIP_CODES.zip_code', 'ZIP_CODES.city', 'ZIP_CODES.state_name', 'ZIP_CODES.zip_code_geom']`

**New gold (7):** `['STORMS_2014.event_point', 'STORMS_2014.event_type', 'STORMS_2014.event_id', 'ZIP_CODES.zip_code', 'ZIP_CODES.city', 'ZIP_CODES.state_name', 'ZIP_CODES.zip_code_geom']`

**Substitutions:**
  - `STORMS_*` β†’ `STORMS_2014` (sql_specific:2014 -> pad4)

---

### sid=75  iid=bq357  db=noaa_data

**Q:** What are the latitude and longitude coordinates and dates between 2005 and 2015 with the top 5 highest daily average wind speeds, excluding records with missing wind speed values? Using data from tables start with prefix "icoads_core".

**Old gold (6):** `['icoads_core_*.year', 'icoads_core_*.month', 'icoads_core_*.day', 'icoads_core_*.latitude', 'icoads_core_*.longitude', 'icoads_core_*.wind_speed']`

**New gold (6):** `['icoads_core_2005.year', 'icoads_core_2005.month', 'icoads_core_2005.day', 'icoads_core_2005.latitude', 'icoads_core_2005.longitude', 'icoads_core_2005.wind_speed']`

**Substitutions:**
  - `icoads_core_*` β†’ `icoads_core_2005` (table_suffix_range:2005 -> pad4)

---

### sid=93  iid=sf_bq429  db=CENSUS_BUREAU_ACS_2

**Q:** Which are the top five states with the greatest average difference in median income between 2015 and 2018 at the ZIP code level, and what is the corresponding average number of vulnerable employees across wholesale trade, natural resources and construction, arts and entertainment, information, and r

**Old gold (26):** `['ZCTA5_2017_5YR.employed_information', 'PLACE_*.geo_id', 'PLACE_*.employed_arts_entertainment_recreation_accommodation_food', 'PLACE_*.employed_wholesale_trade', 'ZIP_CODES_*.employed_information', 'ZCTA5_2018_5YR.geo_id', 'PLACE_*.occupation_natural_resources_construction_maintenance', 'ZCTA5_2015_5YR.geo_id', 'ZCTA5_2017_5YR.employed_arts_entertainment_recreation_accommodation_food', 'ZCTA5_2017_5YR.occupation_natural_resources_construction_maintenance', 'ZCTA5_2017_5YR.employed_retail_trade', 'PLACE_*.employed_information', 'ZCTA5_2018_5YR.median_income', 'ZIP_CODES_*.employed_wholesale_trade', 'ZIP_CODES_*.employed_retail_trade', 'ZIP_CODES_*.median_income', 'ZIP_CODES_*.geo_id', 'ZIP_CODES.state_name', 'PLACE_*.employed_retail_trade', 'PLACE_*.median_income', 'ZIP_CODES_*.occupation_natural_resources_construction_maintenance', 'ZIP_CODES_*.employed_arts_entertainment_recreation_accommodation_food', 'ZIP_CODES.zip_code', 'ZCTA5_2017_5YR.employed_wholesale_trade', 'ZCTA5_2015_5YR.median_income', 'ZCTA5_2017_5YR.geo_id']`

**New gold (26):** `['ZCTA5_2017_5YR.employed_information', 'PLACE_2015.geo_id', 'PLACE_2015.employed_arts_entertainment_recreation_accommodation_food', 'PLACE_2015.employed_wholesale_trade', 'ZIP_CODES_2018_5YR.employed_information', 'ZCTA5_2018_5YR.geo_id', 'PLACE_2015.occupation_natural_resources_construction_maintenance', 'ZCTA5_2015_5YR.geo_id', 'ZCTA5_2017_5YR.employed_arts_entertainment_recreation_accommodation_food', 'ZCTA5_2017_5YR.occupation_natural_resources_construction_maintenance', 'ZCTA5_2017_5YR.employed_retail_trade', 'PLACE_2015.employed_information', 'ZCTA5_2018_5YR.median_income', 'ZIP_CODES_2018_5YR.employed_wholesale_trade', 'ZIP_CODES_2018_5YR.employed_retail_trade', 'ZIP_CODES_2018_5YR.median_income', 'ZIP_CODES_2018_5YR.geo_id', 'ZIP_CODES.state_name', 'PLACE_2015.employed_retail_trade', 'PLACE_2015.median_income', 'ZIP_CODES_2018_5YR.occupation_natural_resources_construction_maintenance', 'ZIP_CODES_2018_5YR.employed_arts_entertainment_recreation_accommodation_food', 'ZIP_CODES.zip_code', 'ZCTA5_2017_5YR.employed_wholesale_trade', 'ZCTA5_2015_5YR.median_income', 'ZCTA5_2017_5YR.geo_id']`

**Substitutions:**
  - `ZIP_CODES_*` β†’ `ZIP_CODES_2018_5YR` (sql_specific_year_alpha:2018_5YR)
  - `PLACE_*` β†’ `PLACE_2015` (q_year_fallback:2015 -> pad4)

---

### sid=100  iid=sf_bq289  db=GEO_OPENSTREETMAP_CENSUS_PLACES

**Q:** Can you find the shortest distance between any two amenities (either a library, place of worship, or community center) located within Philadelphia, analyzed through pennsylvania table and planet features points?

**Old gold (5):** `['PLACES_*.place_name', 'PLACES_*.place_geom', 'PLANET_FEATURES_POINTS.all_tags', 'PLANET_FEATURES_POINTS.geometry', 'PLANET_FEATURES_POINTS.osm_id']`

**New gold (5):** `['PLACES_PENNSYLVANIA.place_name', 'PLACES_PENNSYLVANIA.place_geom', 'PLANET_FEATURES_POINTS.all_tags', 'PLANET_FEATURES_POINTS.geometry', 'PLANET_FEATURES_POINTS.osm_id']`

**Substitutions:**
  - `PLACES_*` β†’ `PLACES_PENNSYLVANIA` (sql_specific_alphanumeric:PENNSYLVANIA)

---

### sid=132  iid=bq021  db=new_york

**Q:** For the top 20 Citi Bike routes in 2016, which route is faster than yellow taxis and among those, which one has the longest average bike duration? Please provide the start station name of this route. The coordinates are rounded to three decimals.

**Old gold (14):** `['citibike_trips.start_station_name', 'citibike_trips.end_station_name', 'citibike_trips.start_station_latitude', 'citibike_trips.start_station_longitude', 'citibike_trips.end_station_latitude', 'citibike_trips.end_station_longitude', 'citibike_trips.tripduration', 'citibike_trips.starttime', 'tlc_yellow_trips_*.pickup_latitude', 'tlc_yellow_trips_*.pickup_longitude', 'tlc_yellow_trips_*.dropoff_latitude', 'tlc_yellow_trips_*.dropoff_longitude', 'tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.pickup_datetime']`

**New gold (14):** `['citibike_trips.start_station_name', 'citibike_trips.end_station_name', 'citibike_trips.start_station_latitude', 'citibike_trips.start_station_longitude', 'citibike_trips.end_station_latitude', 'citibike_trips.end_station_longitude', 'citibike_trips.tripduration', 'citibike_trips.starttime', 'tlc_yellow_trips_2016.pickup_latitude', 'tlc_yellow_trips_2016.pickup_longitude', 'tlc_yellow_trips_2016.dropoff_latitude', 'tlc_yellow_trips_2016.dropoff_longitude', 'tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.pickup_datetime']`

**Substitutions:**
  - `tlc_yellow_trips_*` β†’ `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)

---

### sid=134  iid=bq185  db=new_york_plus

**Q:** What is the average trip duration in minutes for all valid Yellow taxi trips that took place between February 1, 2016, and February 7, 2016 (inclusive), with a positive trip duration, more than three passengers, and a trip distance of at least ten miles, where both the pickup and dropoff locations a

**Old gold (8):** `['tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.pickup_datetime', 'tlc_yellow_trips_*.passenger_count', 'tlc_yellow_trips_*.trip_distance', 'tlc_yellow_trips_*.pickup_location_id', 'tlc_yellow_trips_*.dropoff_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`

**New gold (8):** `['tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.pickup_datetime', 'tlc_yellow_trips_2016.passenger_count', 'tlc_yellow_trips_2016.trip_distance', 'tlc_yellow_trips_2016.pickup_location_id', 'tlc_yellow_trips_2016.dropoff_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`

**Substitutions:**
  - `tlc_yellow_trips_*` β†’ `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)

---

### sid=136  iid=bq098  db=new_york_plus

**Q:** For NYC yellow taxi trips where both the pickup and dropoff occurred between January 1 and 7, 2016, inclusive, calculate the percentage of trips with no tip in each pickup borough, ensuring that only trips where the dropoff occurs after the pickup are included, the passenger count is greater than ze

**Old gold (12):** `['tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.pickup_datetime', 'tlc_yellow_trips_*.total_amount', 'tlc_yellow_trips_*.tip_amount', 'tlc_yellow_trips_*.passenger_count', 'tlc_yellow_trips_*.trip_distance', 'tlc_yellow_trips_*.tolls_amount', 'tlc_yellow_trips_*.mta_tax', 'tlc_yellow_trips_*.fare_amount', 'tlc_yellow_trips_*.pickup_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`

**New gold (12):** `['tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.pickup_datetime', 'tlc_yellow_trips_2016.total_amount', 'tlc_yellow_trips_2016.tip_amount', 'tlc_yellow_trips_2016.passenger_count', 'tlc_yellow_trips_2016.trip_distance', 'tlc_yellow_trips_2016.tolls_amount', 'tlc_yellow_trips_2016.mta_tax', 'tlc_yellow_trips_2016.fare_amount', 'tlc_yellow_trips_2016.pickup_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`

**Substitutions:**
  - `tlc_yellow_trips_*` β†’ `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)

---

### sid=137  iid=bq039  db=new_york_plus

**Q:** Find the top 10 taxi trips in New York City between July 1 and July 7, 2016 (ensuring both pickup and dropoff times fall within these dates) where the passenger count is greater than five, the trip distance is at least ten miles, and there are no negative fare-related amounts (including tip, tolls, 

**Old gold (13):** `['tlc_yellow_trips_*.pickup_datetime', 'tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.trip_distance', 'tlc_yellow_trips_*.tip_amount', 'tlc_yellow_trips_*.total_amount', 'tlc_yellow_trips_*.pickup_location_id', 'tlc_yellow_trips_*.dropoff_location_id', 'tlc_yellow_trips_*.passenger_count', 'tlc_yellow_trips_*.tolls_amount', 'tlc_yellow_trips_*.mta_tax', 'tlc_yellow_trips_*.fare_amount', 'taxi_zone_geom.zone_name', 'taxi_zone_geom.zone_id']`

**New gold (13):** `['tlc_yellow_trips_2016.pickup_datetime', 'tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.trip_distance', 'tlc_yellow_trips_2016.tip_amount', 'tlc_yellow_trips_2016.total_amount', 'tlc_yellow_trips_2016.pickup_location_id', 'tlc_yellow_trips_2016.dropoff_location_id', 'tlc_yellow_trips_2016.passenger_count', 'tlc_yellow_trips_2016.tolls_amount', 'tlc_yellow_trips_2016.mta_tax', 'tlc_yellow_trips_2016.fare_amount', 'taxi_zone_geom.zone_name', 'taxi_zone_geom.zone_id']`

**Substitutions:**
  - `tlc_yellow_trips_*` β†’ `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)

---

### sid=175  iid=bq087  db=covid19_symptom_search

**Q:** Please calculate the overall percentage change in the average weekly search frequency for the symptom 'Anosmia' across the five New York City countiesβ€”Bronx County, Queens County, Kings County, New York County, and Richmond Countyβ€”by comparing the combined data from January 1, 2019, through December

**Old gold (4):** `['symptom_search_*.symptom_anosmia', 'symptom_search_*.sub_region_1', 'symptom_search_*.sub_region_2', 'symptom_search_*.date']`

**New gold (4):** `['symptom_search_2019.symptom_anosmia', 'symptom_search_2019.sub_region_1', 'symptom_search_2019.sub_region_2', 'symptom_search_2019.date']`

**Substitutions:**
  - `symptom_search_*` β†’ `symptom_search_2019` (q_year_fallback:2019 -> pad4)

---

### sid=176  iid=bq088  db=covid19_symptom_search

**Q:** Please calculate the average levels of anxiety and depression symptoms from the weekly country data for the United States during the periods from January 1, 2019, to January 1, 2020, and from January 1, 2020, to January 1, 2021. Then, compute the percentage increase in these average symptom levels f

**Old gold (4):** `['symptom_search_*.symptom_anxiety', 'symptom_search_*.symptom_depression', 'symptom_search_*.country_region_code', 'symptom_search_*.date']`

**New gold (4):** `['symptom_search_2019.symptom_anxiety', 'symptom_search_2019.symptom_depression', 'symptom_search_2019.country_region_code', 'symptom_search_2019.date']`

**Substitutions:**
  - `symptom_search_*` β†’ `symptom_search_2019` (q_year_fallback:2019 -> pad4)

---

### sid=177  iid=bq089  db=covid19_usa

**Q:** Given the latest population estimates from the 2018 five-year American Community Survey, what is the number of vaccine sites per 1000 people for counties in California?

**Old gold (12):** `['state_*.geo_id', 'facility_boundary_us_all.facility_place_id', 'place_*.total_pop', 'facility_boundary_us_all.facility_sub_region_2_code', 'place_*.geo_id', 'censustract_*.total_pop', 'facility_boundary_us_all.facility_sub_region_2', 'congressionaldistrict_*.total_pop', 'state_*.total_pop', 'congressionaldistrict_*.geo_id', 'facility_boundary_us_all.facility_sub_region_1', 'censustract_*.geo_id']`

**New gold (12):** `['state_2018.geo_id', 'facility_boundary_us_all.facility_place_id', 'place_2018.total_pop', 'facility_boundary_us_all.facility_sub_region_2_code', 'place_2018.geo_id', 'censustract_2018.total_pop', 'facility_boundary_us_all.facility_sub_region_2', 'congressionaldistrict_2018.total_pop', 'state_2018.total_pop', 'congressionaldistrict_2018.geo_id', 'facility_boundary_us_all.facility_sub_region_1', 'censustract_2018.geo_id']`

**Substitutions:**
  - `censustract_*` β†’ `censustract_2018` (q_year_fallback:2018 -> pad4)
  - `state_*` β†’ `state_2018` (q_year_fallback:2018 -> pad4)
  - `congressionaldistrict_*` β†’ `congressionaldistrict_2018` (q_year_fallback:2018 -> pad4)
  - `place_*` β†’ `place_2018` (q_year_fallback:2018 -> pad4)

---

### sid=178  iid=bq407  db=covid19_usa

**Q:** Find the top three counties with populations over 50,000, using the 2020 5-year census data, that had the highest COVID-19 case fatality rates on August 27, 2020. For these counties, provide the name, state, median age, total population, number of confirmed COVID-19 cases per 100,000 people, number 

**Old gold (30):** `['state_*.geo_id', 'place_*.geo_id', 'county_*.geo_id', 'schooldistrictelementary_*.geo_id', 'congressionaldistrict_*.median_age', 'schooldistrictsecondary_*.median_age', 'state_*.total_pop', 'summary.state', 'place_*.total_pop', 'summary.county_name', 'summary.date', 'schooldistrictelementary_*.median_age', 'congressionaldistrict_*.total_pop', 'congressionaldistrict_*.geo_id', 'schooldistrictsecondary_*.total_pop', 'summary.deaths', 'puma_*.total_pop', 'puma_*.median_age', 'summary.confirmed_cases', 'state_*.median_age', 'schooldistrictsecondary_*.geo_id', 'summary.county_fips_code', 'county_*.geo_id', 'schooldistrictelementary_*.total_pop', 'county_*.total_pop', 'puma_*.geo_id', 'county_*.median_age', 'county_*.median_age', 'place_*.median_age', 'county_*.total_pop']`

**New gold (30):** `['state_2020.geo_id', 'place_2020.geo_id', 'county_2020_5yr.geo_id', 'schooldistrictelementary_2020.geo_id', 'congressionaldistrict_2020.median_age', 'schooldistrictsecondary_2020.median_age', 'state_2020.total_pop', 'summary.state', 'place_2020.total_pop', 'summary.county_name', 'summary.date', 'schooldistrictelementary_2020.median_age', 'congressionaldistrict_2020.total_pop', 'congressionaldistrict_2020.geo_id', 'schooldistrictsecondary_2020.total_pop', 'summary.deaths', 'puma_2020.total_pop', 'puma_2020.median_age', 'summary.confirmed_cases', 'state_2020.median_age', 'schooldistrictsecondary_2020.geo_id', 'summary.county_fips_code', 'county_2020_5yr.geo_id', 'schooldistrictelementary_2020.total_pop', 'county_2020_5yr.total_pop', 'puma_2020.geo_id', 'county_2020_5yr.median_age', 'county_2020_5yr.median_age', 'place_2020.median_age', 'county_2020_5yr.total_pop']`

**Substitutions:**
  - `puma_*` β†’ `puma_2020` (q_year_fallback:2020 -> pad4)
  - `state_*` β†’ `state_2020` (q_year_fallback:2020 -> pad4)
  - `congressionaldistrict_*` β†’ `congressionaldistrict_2020` (q_year_fallback:2020 -> pad4)
  - `place_*` β†’ `place_2020` (q_year_fallback:2020 -> pad4)
  - `schooldistrictsecondary_*` β†’ `schooldistrictsecondary_2020` (q_year_fallback:2020 -> pad4)
  - `county_*` β†’ `county_2020_5yr` (sql_specific_year_alpha:2020_5yr)
  - `schooldistrictelementary_*` β†’ `schooldistrictelementary_2020` (q_year_fallback:2020 -> pad4)

---

### sid=182  iid=bq061  db=census_bureau_acs_1

**Q:** Which census tract has witnessed the largest increase in median income between 2015 and 2018 in California? Tell me the tract code.

**Old gold (6):** `['state_*.geo_id', 'census_tracts_*.geo_id', 'censustract_*.median_income', 'censustract_*.geo_id', 'state_*.median_income', 'census_tracts_*.tract_ce']`

**New gold (6):** `['state_2015.geo_id', 'census_tracts_2015.geo_id', 'censustract_2018_5yr.median_income', 'censustract_2018_5yr.geo_id', 'state_2015.median_income', 'census_tracts_2015.tract_ce']`

**Substitutions:**
  - `censustract_*` β†’ `censustract_2018_5yr` (sql_specific_year_alpha:2018_5yr)
  - `state_*` β†’ `state_2015` (q_year_fallback:2015 -> pad4)
  - `census_tracts_*` β†’ `census_tracts_2015` (q_year_fallback:2015 -> pad4)

---

### sid=183  iid=bq064  db=census_bureau_acs_1

**Q:** Using the 2017 U.S. Census Tract data from the BigQuery public datasets, you need to proportionally allocate each tract's population and income to the zip codes based on the overlapping area between their geographic boundaries. Then, filter the results to include only those zip codes located within 

**Old gold (18):** `['state_*.geo_id', 'census_tracts_*.geo_id', 'us_census_tracts_national.functional_status', 'us_census_tracts_national.tract_ce', 'state_*.income_per_capita', 'us_census_tracts_national.geo_id', 'census_tracts_*.functional_status', 'censustract_*.total_pop', 'censustract_*.income_per_capita', 'state_*.total_pop', 'zip_codes.zip_code', 'zip_codes.zip_code_geom', 'zip_codes.functional_status', 'census_tracts_*.tract_geom', 'zip_codes.state_code', 'censustract_*.geo_id', 'us_census_tracts_national.tract_geom', 'census_tracts_*.tract_ce']`

**New gold (18):** `['state_2017.geo_id', 'census_tracts_2017.geo_id', 'us_census_tracts_national.functional_status', 'us_census_tracts_national.tract_ce', 'state_2017.income_per_capita', 'us_census_tracts_national.geo_id', 'census_tracts_2017.functional_status', 'censustract_2017_5yr.total_pop', 'censustract_2017_5yr.income_per_capita', 'state_2017.total_pop', 'zip_codes.zip_code', 'zip_codes.zip_code_geom', 'zip_codes.functional_status', 'census_tracts_2017.tract_geom', 'zip_codes.state_code', 'censustract_2017_5yr.geo_id', 'us_census_tracts_national.tract_geom', 'census_tracts_2017.tract_ce']`

**Substitutions:**
  - `censustract_*` β†’ `censustract_2017_5yr` (sql_specific_year_alpha:2017_5yr)
  - `state_*` β†’ `state_2017` (q_year_fallback:2017 -> pad4)
  - `census_tracts_*` β†’ `census_tracts_2017` (q_year_fallback:2017 -> pad4)

---

### sid=194  iid=bq406  db=google_dei

**Q:** Please calculate the growth rates for Asians, Black people, Latinx people, Native Americans, White people, US women, US men, global women, and global men from 2014 to 2024 concerning the overall workforce.

**Old gold (11):** `['dar_non_intersectional_*.race_hispanic_latinx', 'dar_intersectional_*.race_white', 'dar_region_non_intersectional_representation.race_asian', 'dar_region_non_intersectional_representation.report_year', 'dar_non_intersectional_*.gender_us_men', 'dar_non_intersectional_*.gender_global_men', 'dar_non_intersectional_*.gender_global_women', 'dar_intersectional_*.race_black', 'dar_non_intersectional_*.gender_us_women', 'dar_intersectional_*.race_native_american', 'dar_region_non_intersectional_representation.workforce']`

**New gold (11):** `['dar_non_intersectional_2014.race_hispanic_latinx', 'dar_intersectional_2014.race_white', 'dar_region_non_intersectional_representation.race_asian', 'dar_region_non_intersectional_representation.report_year', 'dar_non_intersectional_2014.gender_us_men', 'dar_non_intersectional_2014.gender_global_men', 'dar_non_intersectional_2014.gender_global_women', 'dar_intersectional_2014.race_black', 'dar_non_intersectional_2014.gender_us_women', 'dar_intersectional_2014.race_native_american', 'dar_region_non_intersectional_representation.workforce']`

**Substitutions:**
  - `dar_intersectional_*` β†’ `dar_intersectional_2014` (q_year_fallback:2014 -> pad4)
  - `dar_non_intersectional_*` β†’ `dar_non_intersectional_2014` (q_year_fallback:2014 -> pad4)

---

### sid=228  iid=bq105  db=nhtsa_traffic_fatalities_plus

**Q:** According to the 2015 and 2016 accident and driver distraction, and excluding cases where the driver’s distraction status is recorded as 'Not Distracted,' 'Unknown if Distracted,' or 'Not Reported,' how many traffic accidents per 100,000 people were caused by driver distraction in each U.S. state fo

**Old gold (12):** `[' accident_2015.consecutive_number', ' accident_2015.state_name', ' distract_2015.consecutive_number', ' distract_2015.driver_distracted_by_name', 'population_by_zip_*.population', 'population_by_zip_*.zipcode', 'zipcode_area.zipcode', 'zipcode_area.state_name', ' accident_2016.consecutive_number', ' accident_2016.state_name', ' distract_2016.consecutive_number', ' distract_2016.driver_distracted_by_name']`

**New gold (12):** `[' accident_2015.consecutive_number', ' accident_2015.state_name', ' distract_2015.consecutive_number', ' distract_2015.driver_distracted_by_name', 'population_by_zip_2010.population', 'population_by_zip_2010.zipcode', 'zipcode_area.zipcode', 'zipcode_area.state_name', ' accident_2016.consecutive_number', ' accident_2016.state_name', ' distract_2016.consecutive_number', ' distract_2016.driver_distracted_by_name']`

**Substitutions:**
  - `population_by_zip_*` β†’ `population_by_zip_2010` (sql_specific:2010 -> pad4)

---

### sid=237  iid=bq352  db=sdoh

**Q:** Please list the average number of prenatal weeks in 2018 for counties in Wisconsin where more than 5% of the employed population had commutes of 45-59 minutes in 2017.

**Old gold (25):** `['state_*.geo_id', 'place_*.geo_id', 'cbsa_*.geo_id', 'state_*.employed_pop', 'county_*.geo_id', 'county_*.commute_45_59_mins', 'schooldistrictelementary_*.geo_id', 'place_*.employed_pop', 'place_*.commute_45_59_mins', 'county_*.County_of_Residence', 'cbsa_*.employed_pop', 'county_*.employed_pop', 'censustract_*.commute_45_59_mins', 'state_*.commute_45_59_mins', 'schooldistrictelementary_*.commute_45_59_mins', 'censustract_*.geo_id', 'cbsa_*.commute_45_59_mins', 'schooldistrictelementary_*.employed_pop', 'county_*.County_of_Residence_FIPS', 'censustract_*.employed_pop', 'county_*.Year', 'puma_*.commute_45_59_mins', 'county_*.Ave_Number_of_Prenatal_Wks', 'puma_*.geo_id', 'puma_*.employed_pop']`

**New gold (25):** `['state_2017.geo_id', 'place_2017.geo_id', 'cbsa_2017.geo_id', 'state_2017.employed_pop', 'county_2017_5yr.geo_id', 'county_2017_5yr.commute_45_59_mins', 'schooldistrictelementary_2017.geo_id', 'place_2017.employed_pop', 'place_2017.commute_45_59_mins', 'county_2017_5yr.County_of_Residence', 'cbsa_2017.employed_pop', 'county_2017_5yr.employed_pop', 'censustract_2017.commute_45_59_mins', 'state_2017.commute_45_59_mins', 'schooldistrictelementary_2017.commute_45_59_mins', 'censustract_2017.geo_id', 'cbsa_2017.commute_45_59_mins', 'schooldistrictelementary_2017.employed_pop', 'county_2017_5yr.County_of_Residence_FIPS', 'censustract_2017.employed_pop', 'county_2017_5yr.Year', 'puma_2017.commute_45_59_mins', 'county_2017_5yr.Ave_Number_of_Prenatal_Wks', 'puma_2017.geo_id', 'puma_2017.employed_pop']`

**Substitutions:**
  - `puma_*` β†’ `puma_2017` (q_year_fallback:2017 -> pad4)
  - `censustract_*` β†’ `censustract_2017` (q_year_fallback:2017 -> pad4)
  - `state_*` β†’ `state_2017` (q_year_fallback:2017 -> pad4)
  - `place_*` β†’ `place_2017` (q_year_fallback:2017 -> pad4)
  - `cbsa_*` β†’ `cbsa_2017` (q_year_fallback:2017 -> pad4)
  - `county_*` β†’ `county_2017_5yr` (sql_specific_year_alpha:2017_5yr)
  - `schooldistrictelementary_*` β†’ `schooldistrictelementary_2017` (q_year_fallback:2017 -> pad4)

---

### sid=238  iid=bq074  db=sdoh

**Q:** Count the number of counties that experienced an increase in unemployment from 2015 to 2018, using 5-year ACS data, and a decrease in dual-eligible enrollee counts between December 1, 2015, and December 1, 2018.

**Old gold (16):** `['dual_eligible_enrollment_by_county_and_program.County_Name', 'state_*.geo_id', 'county_*.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Public_Total', 'place_*.geo_id', 'congressionaldistrict_*.unemployed_pop', 'schooldistrictsecondary_*.unemployed_pop', 'puma_*.geo_id', 'dual_eligible_enrollment_by_county_and_program.FIPS', 'county_*.geo_id', 'state_*.unemployed_pop', 'place_*.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Date', 'congressionaldistrict_*.geo_id', 'puma_*.unemployed_pop', 'schooldistrictsecondary_*.geo_id']`

**New gold (16):** `['dual_eligible_enrollment_by_county_and_program.County_Name', 'state_2015.geo_id', 'county_2018_5yr.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Public_Total', 'place_2015.geo_id', 'congressionaldistrict_2015.unemployed_pop', 'schooldistrictsecondary_2015.unemployed_pop', 'puma_2015.geo_id', 'dual_eligible_enrollment_by_county_and_program.FIPS', 'county_2018_5yr.geo_id', 'state_2015.unemployed_pop', 'place_2015.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Date', 'congressionaldistrict_2015.geo_id', 'puma_2015.unemployed_pop', 'schooldistrictsecondary_2015.geo_id']`

**Substitutions:**
  - `puma_*` β†’ `puma_2015` (q_year_fallback:2015 -> pad4)
  - `state_*` β†’ `state_2015` (q_year_fallback:2015 -> pad4)
  - `congressionaldistrict_*` β†’ `congressionaldistrict_2015` (q_year_fallback:2015 -> pad4)
  - `place_*` β†’ `place_2015` (q_year_fallback:2015 -> pad4)
  - `schooldistrictsecondary_*` β†’ `schooldistrictsecondary_2015` (q_year_fallback:2015 -> pad4)
  - `county_*` β†’ `county_2018_5yr` (sql_specific_year_alpha:2018_5yr)

---

### sid=239  iid=bq066  db=sdoh

**Q:** Could you assess the relationship between the poverty rates from the previous year's census data and the percentage of births without maternal morbidity for the years 2016 to 2018? Use only data for births where no maternal morbidity was reported and for each year, use the 5-year census data from th

**Old gold (28):** `['state_*.geo_id', 'county_*.geo_id', 'county_*.Maternal_Morbidity_YN', 'cbsa_*.geo_id', 'place_*.geo_id', 'county_*.geo_id', 'place_*.pop_determined_poverty_status', 'cbsa_*.poverty', 'county_*.pop_determined_poverty_status', 'cbsa_*.pop_determined_poverty_status', 'county_*.geo_id', 'county_*.poverty', 'county_*.pop_determined_poverty_status', 'county_*.geo_id', 'puma_*.poverty', 'county_*.pop_determined_poverty_status', 'state_*.poverty', 'puma_*.pop_determined_poverty_status', 'county_*.Births', 'county_*.County_of_Residence_FIPS', 'county_*.poverty', 'county_*.Year', 'county_*.poverty', 'place_*.poverty', 'puma_*.geo_id', 'state_*.pop_determined_poverty_status', 'county_*.pop_determined_poverty_status', 'county_*.poverty']`

**New gold (28):** `['state_2016.geo_id', 'county_2015_5yr.geo_id', 'county_2015_5yr.Maternal_Morbidity_YN', 'cbsa_2016.geo_id', 'place_2016.geo_id', 'county_2015_5yr.geo_id', 'place_2016.pop_determined_poverty_status', 'cbsa_2016.poverty', 'county_2015_5yr.pop_determined_poverty_status', 'cbsa_2016.pop_determined_poverty_status', 'county_2015_5yr.geo_id', 'county_2015_5yr.poverty', 'county_2015_5yr.pop_determined_poverty_status', 'county_2015_5yr.geo_id', 'puma_2016.poverty', 'county_2015_5yr.pop_determined_poverty_status', 'state_2016.poverty', 'puma_2016.pop_determined_poverty_status', 'county_2015_5yr.Births', 'county_2015_5yr.County_of_Residence_FIPS', 'county_2015_5yr.poverty', 'county_2015_5yr.Year', 'county_2015_5yr.poverty', 'place_2016.poverty', 'puma_2016.geo_id', 'state_2016.pop_determined_poverty_status', 'county_2015_5yr.pop_determined_poverty_status', 'county_2015_5yr.poverty']`

**Substitutions:**
  - `puma_*` β†’ `puma_2016` (q_year_fallback:2016 -> pad4)
  - `state_*` β†’ `state_2016` (q_year_fallback:2016 -> pad4)
  - `place_*` β†’ `place_2016` (q_year_fallback:2016 -> pad4)
  - `cbsa_*` β†’ `cbsa_2016` (q_year_fallback:2016 -> pad4)
  - `county_*` β†’ `county_2015_5yr` (sql_specific_year_alpha:2015_5yr)

---

### sid=267  iid=bq204  db=eclipse_megamovie

**Q:** Find the user with the highest total clicks across all records from all available photo collections.

**Old gold (1):** `['photos_v_0_*.user']`

**New gold (1):** `['photos_v_0_1.user']`

**Substitutions:**
  - `photos_v_0_*` β†’ `photos_v_0_1` (sql_specific:1 -> pad4)

---

### sid=279  iid=bq049  db=iowa_liquor_sales_plus

**Q:** Please show the monthly per capita Bourbon Whiskey sales during 2022 in Dubuque County for the zip code that ranks third in total Bourbon Whiskey sales, using only the population aged 21 and older.

**Old gold (8):** `['sales.category_name', 'sales.date', 'sales.zip_code', 'sales.sale_dollars', 'sales.county', 'population_by_zip_*.zipcode', 'population_by_zip_*.population', 'population_by_zip_*.minimum_age']`

**New gold (8):** `['sales.category_name', 'sales.date', 'sales.zip_code', 'sales.sale_dollars', 'sales.county', 'population_by_zip_2022.zipcode', 'population_by_zip_2022.population', 'population_by_zip_2022.minimum_age']`

**Substitutions:**
  - `population_by_zip_*` β†’ `population_by_zip_2022` (q_year_fallback:2022 -> pad4)

---

### sid=281  iid=bq286  db=usa_names

**Q:** Can you tell me the name of the most popular female baby in Wyoming for the year 2021, based on the proportion of female babies given that name compared to the total number of female babies given the same name across all states?

**Old gold (5):** `['usa_1910_*.name', 'usa_1910_*.gender', 'usa_1910_*.year', 'usa_1910_*.number', 'usa_1910_*.state']`

**New gold (5):** `['usa_1910_current.name', 'usa_1910_current.gender', 'usa_1910_current.year', 'usa_1910_current.number', 'usa_1910_current.state']`

**Substitutions:**
  - `usa_1910_*` β†’ `usa_1910_current` (usa_1910_current_singleton)

---

### sid=284  iid=bq143  db=CPTAC_PDC

**Q:** Use CPTAC proteomics and RNAseq data for Clear Cell Renal Cell Carcinoma to select 'Primary Tumor' and 'Solid Tissue Normal' samples. Join the datasets on sample submitter IDs and gene symbols. Calculate the correlation between protein abundance (log2 ratio) and gene expression levels (log-transform

**Old gold (11):** `['quant_proteome_*.case_id', 'quant_proteome_*.aliquot_id', 'quant_proteome_*.gene_symbol', 'quant_proteome_*.protein_abundance_log2ratio', 'aliquot_to_case_mapping_current.sample_submitter_id', 'aliquot_to_case_mapping_current.sample_type', 'aliquot_to_case_mapping_current.case_id', 'aliquot_to_case_mapping_current.aliquot_id', 'RNAseq_hg38_gdc_current.gene_name', 'RNAseq_hg38_gdc_current.fpkm_unstranded', 'RNAseq_hg38_gdc_current.sample_barcode']`

**New gold (11):** `['quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.case_id', 'quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.aliquot_id', 'quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.gene_symbol', 'quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.protein_abundance_log2ratio', 'aliquot_to_case_mapping_current.sample_submitter_id', 'aliquot_to_case_mapping_current.sample_type', 'aliquot_to_case_mapping_current.case_id', 'aliquot_to_case_mapping_current.aliquot_id', 'RNAseq_hg38_gdc_current.gene_name', 'RNAseq_hg38_gdc_current.fpkm_unstranded', 'RNAseq_hg38_gdc_current.sample_barcode']`

**Substitutions:**
  - `quant_proteome_*` β†’ `quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current` (sql_specific_alphanumeric:CPTAC_CCRCC_discovery_study_pdc_current)

---

### sid=318  iid=bq006  db=austin

**Q:** What is the date with the second highest Z-score for daily counts of 'PUBLIC INTOXICATION' incidents in Austin for the year 2016? List the date in the format of '2016-xx-xx'.

**Old gold (2):** `['incidents_*.date', 'incidents_*.descript']`

**New gold (2):** `['incidents_2016.date', 'incidents_2016.descript']`

**Substitutions:**
  - `incidents_*` β†’ `incidents_2016` (sql_specific:2016 -> pad4)

---

### sid=323  iid=bq430  db=ebi_chembl

**Q:** Find pairs of different molecules tested in the same assay and standard type, where both have 10–15 heavy atoms, fewer than 5 activities in that assay, fewer than 2 duplicate activities, non-null standard values, and pChEMBL values over 10. For each pair, report the maximum heavy atom count, the lat

**Old gold (13):** `['activities_*.assay_id', 'activities_*.standard_type', 'activities_*.activity_id', 'activities_*.standard_value', 'activities_*.standard_relation', 'activities_*.pchembl_value', 'activities_*.molregno', 'compound_structures_25.canonical_smiles', 'compound_properties_26.heavy_atoms', 'docs_*.doc_id', 'docs_*.year', 'docs_*.journal', 'docs_*.first_page']`

**New gold (13):** `['activities_29.assay_id', 'activities_29.standard_type', 'activities_29.activity_id', 'activities_29.standard_value', 'activities_29.standard_relation', 'activities_29.pchembl_value', 'activities_29.molregno', 'compound_structures_25.canonical_smiles', 'compound_properties_26.heavy_atoms', 'docs_29.doc_id', 'docs_29.year', 'docs_29.journal', 'docs_29.first_page']`

**Substitutions:**
  - `activities_*` β†’ `activities_29` (sql_specific:29 -> pad4)
  - `docs_*` β†’ `docs_29` (sql_specific:29 -> pad4)

---

### sid=369  iid=ga001  db=ga4

**Q:** I want to know the preferences of customers who purchased the Google Navy Speckled Tee in December 2020. What other product was purchased with the highest total quantity alongside this item?

**Old gold (5):** `['events_*.user_pseudo_id', 'events_*.event_name', 'events_*.items', 'events_*.items', 'events_*.items']`

**New gold (5):** `['events_20201201.user_pseudo_id', 'events_20201201.event_name', 'events_20201201.items', 'events_20201201.items', 'events_20201201.items']`

**Substitutions:**
  - `events_*` β†’ `events_20201201` (table_suffix_range:20201201 -> pad8)

---

### sid=370  iid=ga002  db=ga4

**Q:** Tell me the most purchased other products and their quantities by customers who bought the Google Red Speckled Tee each month for the three months starting from November 2020.

**Old gold (3):** `['events_*.user_pseudo_id', 'events_*.items', 'events_*.event_name']`

**New gold (3):** `['events_20201101.user_pseudo_id', 'events_20201101.items', 'events_20201101.event_name']`

**Substitutions:**
  - `events_*` β†’ `events_20201101` (sql_8digit_literal:20201101)

---

### sid=372  iid=ga004  db=ga4

**Q:** Can you figure out the average difference in pageviews between users who bought something and those who didn’t in December 2020? Just label anyone who was involved in purchase events as a purchaser.

**Old gold (2):** `['events_*.user_pseudo_id', 'events_*.event_name']`

**New gold (2):** `['events_20201201.user_pseudo_id', 'events_20201201.event_name']`

**Substitutions:**
  - `events_*` β†’ `events_20201201` (table_suffix_range:20201201 -> pad8)

---

### sid=373  iid=ga008  db=ga4

**Q:** Could you provide the total number of page views for each day in November 2020 as well as the average number of page views per user on those days, restricted to users who made at least one purchase in November 2020?

**Old gold (3):** `['events_*.user_pseudo_id', 'events_*.event_date', 'events_*.event_name']`

**New gold (3):** `['events_20201101.user_pseudo_id', 'events_20201101.event_date', 'events_20201101.event_name']`

**Substitutions:**
  - `events_*` β†’ `events_20201101` (table_suffix_range:20201101 -> pad8)

---

### sid=374  iid=ga017  db=ga4

**Q:** How many distinct users viewed the most frequently visited page during January 2021?

**Old gold (4):** `['events_*.event_params', 'events_*.event_name', 'events_*.event_timestamp', 'events_*.user_pseudo_id']`

**New gold (4):** `['events_20210101.event_params', 'events_20210101.event_name', 'events_20210101.event_timestamp', 'events_20210101.user_pseudo_id']`

**Substitutions:**
  - `events_*` β†’ `events_20210101` (table_suffix_range:20210101 -> pad8)

---

### sid=377  iid=ga018  db=ga4

**Q:** On January 2nd, 2021, I want to determine the percentage of times users transition from a product list page (PLP) view to a product detail page (PDP) view within the same session, using only page_view events. Could you calculate how many PLP views eventually led to a PDP view in the same session on 

**Old gold (10):** `['events_*.event_name', 'events_*.event_date', 'events_*.event_timestamp', 'events_*.user_pseudo_id', 'events_*.user_id', 'events_*.device', 'events_*.geo', 'events_*.traffic_source', 'events_*.event_params', 'events_*.user_properties']`

**New gold (10):** `['events_20210102.event_name', 'events_20210102.event_date', 'events_20210102.event_timestamp', 'events_20210102.user_pseudo_id', 'events_20210102.user_id', 'events_20210102.device', 'events_20210102.geo', 'events_20210102.traffic_source', 'events_20210102.event_params', 'events_20210102.user_properties']`

**Substitutions:**
  - `events_*` β†’ `events_20210102` (sql_8digit_literal:20210102)

---

### sid=382  iid=ga010  db=ga4

**Q:** Can you give me an overview of our website traffic for December 2020? I'm particularly interested in the channel with the fourth highest number of sessions.

**Old gold (3):** `['events_*.user_pseudo_id', 'events_*.event_params', 'events_*.event_timestamp']`

**New gold (3):** `['events_20201201.user_pseudo_id', 'events_20201201.event_params', 'events_20201201.event_timestamp']`

**Substitutions:**
  - `events_*` β†’ `events_20201201` (table_suffix_range:20201201 -> pad8)

---

### sid=385  iid=ga012  db=ga4

**Q:** On November 30, 2020, identify the item category with the highest tax rate by dividing tax value in usd by purchase revenue in usd for purchase events, and then retrieve the transaction IDs, total item quantities, and both purchase revenue in usd and purchase revenue for those purchase events in tha

**Old gold (7):** `['events_*.items', 'events_*.ecommerce', 'events_*.ecommerce', 'events_*.event_name', 'events_*.ecommerce', 'events_*.ecommerce', 'events_*.ecommerce']`

**New gold (7):** `['events_20200101.items', 'events_20200101.ecommerce', 'events_20200101.ecommerce', 'events_20200101.event_name', 'events_20200101.ecommerce', 'events_20200101.ecommerce', 'events_20200101.ecommerce']`

**Substitutions:**
  - `events_*` β†’ `events_20200101` (q_year_fallback:2020 -> pad8)

---