File size: 48,800 Bytes
3358602 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 | # Spider 2.0 ORIGINAL (no-wildcards) β review
**Total samples:** 233
**Samples with substitutions:** 83
**Total pattern substitutions:** 396
**Leftover `*` in graph (should be 0):** 0
## Substitution reasons
| reason | count |
|---|---|
| `q_year_fallback` | 250 |
| `hardcoded_default` | 101 |
| `sql_specific` | 13 |
| `table_suffix_range` | 10 |
| `sql_specific_year_alpha` | 7 |
| `sql_partial_wildcard` | 4 |
| `sql_8digit_literal` | 3 |
| `sql_specific_alphanumeric` | 3 |
| `declare` | 2 |
| `family_default` | 2 |
| `usa_1910_current_singleton` | 1 |
## Top 20 patterns substituted
| pattern | count |
|---|---|
| `ga_sessions_*` | 11 |
| `zip_codes_*` | 10 |
| `censustract_*` | 10 |
| `puma_*` | 10 |
| `schooldistrictunified_*` | 10 |
| `state_*` | 10 |
| `congressionaldistrict_*` | 10 |
| `place_*` | 10 |
| `blockgroup_*` | 10 |
| `cbsa_*` | 10 |
| `schooldistrictsecondary_*` | 10 |
| `county_*` | 10 |
| `schooldistrictelementary_*` | 10 |
| `zcta_*` | 10 |
| `events_*` | 9 |
| `zcta5_*` | 8 |
| `icoads_core_*` | 6 |
| `storms_*` | 6 |
| `tlc_fhv_trips_*` | 6 |
| `tlc_yellow_trips_*` | 6 |
## Gold-affecting substitutions (47 suspect samples)
These are the samples where the substitution changes `used_columns` (the gold).
### sid=0 iid=bq011 db=ga4
**Q:** How many distinct pseudo users had positive engagement time in the 7-day period ending on January 7, 2021 at 23:59:59, but had no positive engagement time in the 2-day period ending on the same date (January 7, 2021 at 23:59:59) ?
**Old gold (2):** `['events_*.user_pseudo_id', 'events_*.event_timestamp']`
**New gold (2):** `['events_20210101.user_pseudo_id', 'events_20210101.event_timestamp']`
**Substitutions:**
- `events_*` β `events_20210101` (table_suffix_range:20210101 -> pad8)
---
### sid=1 iid=bq010 db=ga360
**Q:** Find the top-selling product among customers who bought 'Youtube Menβs Vintage Henley' in July 2017, excluding itself.
**Old gold (2):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.hits']`
**New gold (2):** `['ga_sessions_20170701.fullVisitorId', 'ga_sessions_20170701.hits']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170701` (sql_partial_wildcard:201707 -> pad8)
---
### sid=2 iid=bq009 db=ga360
**Q:** Which traffic source has the highest total transaction revenue for the year 2017, and what is the difference in millions (rounded to two decimal places) between the highest and lowest monthly total transaction revenue for that traffic source?
**Old gold (3):** `['ga_sessions_*.date', 'ga_sessions_*.trafficSource', 'ga_sessions_*.totals']`
**New gold (3):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.trafficSource', 'ga_sessions_20170101.totals']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170101` (sql_partial_wildcard:2017 -> pad8)
---
### sid=3 iid=bq001 db=ga360
**Q:** For each visitor who made at least one transaction in February 2017, how many days elapsed between the date of their first visit in February and the date of their first transaction in February, and on what type of device did they make that first transaction?
**Old gold (2):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.date']`
**New gold (2):** `['ga_sessions_20170201.fullVisitorId', 'ga_sessions_20170201.date']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170201` (declare:20170201 -> pad8)
---
### sid=4 iid=bq002 db=ga360
**Q:** During the first half of 2017, focusing on hits product revenue, which traffic source generated the highest total product revenue, and what were the maximum daily, weekly, and monthly product revenues (in millions) for that top-performing source over this period?
**Old gold (2):** `['ga_sessions_*.date', 'ga_sessions_*.hits']`
**New gold (2):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.hits']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170101` (declare:20170101 -> pad8)
---
### sid=5 iid=bq003 db=ga360
**Q:** Between April 1 and July 31 of 2017, using the hits product revenue data along with the totals transactions to classify sessions as purchase (transactions β₯ 1 and productRevenue not null) or non-purchase (transactions null and productRevenue null), compare the average pageviews per visitor for each
**Old gold (2):** `['ga_sessions_*.date', 'ga_sessions_*.fullVisitorId']`
**New gold (2):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.fullVisitorId']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170101` (sql_partial_wildcard:2017 -> pad8)
---
### sid=7 iid=bq008 db=ga360
**Q:** In January 2017, among visitors whose campaign name contains 'Data Share' and who accessed any page starting with '/home', which page did they most commonly visit next, and what is the maximum time (in seconds) they spent on the '/home' page before moving on?
**Old gold (7):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.visitId', 'ga_sessions_*.trafficSource', 'ga_sessions_*.hits', 'ga_sessions_*.hits', 'ga_sessions_*.visitStartTime', 'ga_sessions_*.hits']`
**New gold (7):** `['ga_sessions_20170101.fullVisitorId', 'ga_sessions_20170101.visitId', 'ga_sessions_20170101.trafficSource', 'ga_sessions_20170101.hits', 'ga_sessions_20170101.hits', 'ga_sessions_20170101.visitStartTime', 'ga_sessions_20170101.hits']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170101` (table_suffix_range:20170101 -> pad8)
---
### sid=8 iid=bq269 db=ga360
**Q:** Between June 1, 2017, and July 31, 2017, consider only sessions that have non-null pageviews. Classify each session as βpurchaseβ if it has at least one transaction, or βnon_purchaseβ otherwise. For each month, sum each visitorβs total pageviews under each classification, then compute the average pa
**Old gold (4):** `['ga_sessions_*.date', 'ga_sessions_*.fullVisitorId', 'ga_sessions_*.totals', 'ga_sessions_*.totals']`
**New gold (4):** `['ga_sessions_20170601.date', 'ga_sessions_20170601.fullVisitorId', 'ga_sessions_20170601.totals', 'ga_sessions_20170601.totals']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170601` (table_suffix_range:20170601 -> pad8)
---
### sid=9 iid=bq268 db=ga360
**Q:** Identify the longest number of days between the first visit and the last recorded event (either the last visit or the first transaction) for a user, where the last recorded event is associated with a mobile device. The last recorded event could either be the last visit or the first transaction, and
**Old gold (4):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.date', 'ga_sessions_*.device', 'ga_sessions_*.hits']`
**New gold (4):** `['ga_sessions_20170701.fullVisitorId', 'ga_sessions_20170701.date', 'ga_sessions_20170701.device', 'ga_sessions_20170701.hits']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170701` (family_default:ga_sessions=20170701)
---
### sid=10 iid=bq270 db=ga360
**Q:** What were the monthly add-to-cart and purchase conversion rates, calculated as a percentage of pageviews on product details, from January to March 2017?
**Old gold (3):** `['ga_sessions_*.date', 'ga_sessions_*.hits', 'ga_sessions_*.hits']`
**New gold (3):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.hits', 'ga_sessions_20170101.hits']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170101` (sql_partial_wildcard:2017 -> pad8)
---
### sid=11 iid=bq275 db=ga360
**Q:** Which visitor IDs belong to users whose first transaction occurred on a device explicitly labeled as 'mobile' on a later date than their first visit?
**Old gold (4):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.date', 'ga_sessions_*.hits', 'ga_sessions_*.device']`
**New gold (4):** `['ga_sessions_20170701.fullVisitorId', 'ga_sessions_20170701.date', 'ga_sessions_20170701.hits', 'ga_sessions_20170701.device']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20170701` (family_default:ga_sessions=20170701)
---
### sid=12 iid=bq374 db=ga360
**Q:** Calculates the percentage of new users who, between August 1, 2016, and April 30, 2017, both stayed on the site for more than 5 minutes during their initial visit and made a purchase on a subsequent visit at any later time, relative to the total number of new users in the same period.
**Old gold (6):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.visitStartTime', 'ga_sessions_*.totals', 'ga_sessions_*.date', 'ga_sessions_*.totals', 'ga_sessions_*.totals']`
**New gold (6):** `['ga_sessions_20160801.fullVisitorId', 'ga_sessions_20160801.visitStartTime', 'ga_sessions_20160801.totals', 'ga_sessions_20160801.date', 'ga_sessions_20160801.totals', 'ga_sessions_20160801.totals']`
**Substitutions:**
- `ga_sessions_*` β `ga_sessions_20160801` (sql_8digit_literal:20160801)
---
### sid=61 iid=bq235 db=cms_data
**Q:** Can you tell me which healthcare provider incurs the highest combined average costs for both outpatient and inpatient services in 2014?
**Old gold (12):** `['outpatient_charges_*.provider_state', 'outpatient_charges_*.provider_city', 'outpatient_charges_*.provider_id', 'outpatient_charges_*.provider_name', 'outpatient_charges_*.outpatient_services', 'outpatient_charges_*.average_total_payments', 'inpatient_charges_*.provider_state', 'inpatient_charges_*.provider_city', 'inpatient_charges_*.provider_id', 'inpatient_charges_*.provider_name', 'inpatient_charges_*.total_discharges', 'inpatient_charges_*.average_medicare_payments']`
**New gold (12):** `['outpatient_charges_2014.provider_state', 'outpatient_charges_2014.provider_city', 'outpatient_charges_2014.provider_id', 'outpatient_charges_2014.provider_name', 'outpatient_charges_2014.outpatient_services', 'outpatient_charges_2014.average_total_payments', 'inpatient_charges_2014.provider_state', 'inpatient_charges_2014.provider_city', 'inpatient_charges_2014.provider_id', 'inpatient_charges_2014.provider_name', 'inpatient_charges_2014.total_discharges', 'inpatient_charges_2014.average_medicare_payments']`
**Substitutions:**
- `outpatient_charges_*` β `outpatient_charges_2014` (sql_specific:2014 -> pad4)
- `inpatient_charges_*` β `inpatient_charges_2014` (sql_specific:2014 -> pad4)
---
### sid=69 iid=bq419 db=noaa_data
**Q:** Which 5 states had the most storm events from 1980 to 1995, considering only the top 1000 states with the highest event counts each year? Please use state abbreviations.
**Old gold (2):** `['storms_*.state', 'storms_*.event_id']`
**New gold (2):** `['storms_1980.state', 'storms_1980.event_id']`
**Substitutions:**
- `storms_*` β `storms_1980` (sql_specific:1980 -> pad4)
---
### sid=71 iid=sf_bq236 db=NOAA_DATA_PLUS
**Q:** What are the top 5 zip codes of the areas in the United States that have experienced the most hail storm events in the past 10 years? Don't use data from hail reports table.
**Old gold (7):** `['STORMS_*.event_point', 'STORMS_*.event_type', 'STORMS_*.event_id', 'ZIP_CODES.zip_code', 'ZIP_CODES.city', 'ZIP_CODES.state_name', 'ZIP_CODES.zip_code_geom']`
**New gold (7):** `['STORMS_2014.event_point', 'STORMS_2014.event_type', 'STORMS_2014.event_id', 'ZIP_CODES.zip_code', 'ZIP_CODES.city', 'ZIP_CODES.state_name', 'ZIP_CODES.zip_code_geom']`
**Substitutions:**
- `STORMS_*` β `STORMS_2014` (sql_specific:2014 -> pad4)
---
### sid=75 iid=bq357 db=noaa_data
**Q:** What are the latitude and longitude coordinates and dates between 2005 and 2015 with the top 5 highest daily average wind speeds, excluding records with missing wind speed values? Using data from tables start with prefix "icoads_core".
**Old gold (6):** `['icoads_core_*.year', 'icoads_core_*.month', 'icoads_core_*.day', 'icoads_core_*.latitude', 'icoads_core_*.longitude', 'icoads_core_*.wind_speed']`
**New gold (6):** `['icoads_core_2005.year', 'icoads_core_2005.month', 'icoads_core_2005.day', 'icoads_core_2005.latitude', 'icoads_core_2005.longitude', 'icoads_core_2005.wind_speed']`
**Substitutions:**
- `icoads_core_*` β `icoads_core_2005` (table_suffix_range:2005 -> pad4)
---
### sid=93 iid=sf_bq429 db=CENSUS_BUREAU_ACS_2
**Q:** Which are the top five states with the greatest average difference in median income between 2015 and 2018 at the ZIP code level, and what is the corresponding average number of vulnerable employees across wholesale trade, natural resources and construction, arts and entertainment, information, and r
**Old gold (26):** `['ZCTA5_2017_5YR.employed_information', 'PLACE_*.geo_id', 'PLACE_*.employed_arts_entertainment_recreation_accommodation_food', 'PLACE_*.employed_wholesale_trade', 'ZIP_CODES_*.employed_information', 'ZCTA5_2018_5YR.geo_id', 'PLACE_*.occupation_natural_resources_construction_maintenance', 'ZCTA5_2015_5YR.geo_id', 'ZCTA5_2017_5YR.employed_arts_entertainment_recreation_accommodation_food', 'ZCTA5_2017_5YR.occupation_natural_resources_construction_maintenance', 'ZCTA5_2017_5YR.employed_retail_trade', 'PLACE_*.employed_information', 'ZCTA5_2018_5YR.median_income', 'ZIP_CODES_*.employed_wholesale_trade', 'ZIP_CODES_*.employed_retail_trade', 'ZIP_CODES_*.median_income', 'ZIP_CODES_*.geo_id', 'ZIP_CODES.state_name', 'PLACE_*.employed_retail_trade', 'PLACE_*.median_income', 'ZIP_CODES_*.occupation_natural_resources_construction_maintenance', 'ZIP_CODES_*.employed_arts_entertainment_recreation_accommodation_food', 'ZIP_CODES.zip_code', 'ZCTA5_2017_5YR.employed_wholesale_trade', 'ZCTA5_2015_5YR.median_income', 'ZCTA5_2017_5YR.geo_id']`
**New gold (26):** `['ZCTA5_2017_5YR.employed_information', 'PLACE_2015.geo_id', 'PLACE_2015.employed_arts_entertainment_recreation_accommodation_food', 'PLACE_2015.employed_wholesale_trade', 'ZIP_CODES_2018_5YR.employed_information', 'ZCTA5_2018_5YR.geo_id', 'PLACE_2015.occupation_natural_resources_construction_maintenance', 'ZCTA5_2015_5YR.geo_id', 'ZCTA5_2017_5YR.employed_arts_entertainment_recreation_accommodation_food', 'ZCTA5_2017_5YR.occupation_natural_resources_construction_maintenance', 'ZCTA5_2017_5YR.employed_retail_trade', 'PLACE_2015.employed_information', 'ZCTA5_2018_5YR.median_income', 'ZIP_CODES_2018_5YR.employed_wholesale_trade', 'ZIP_CODES_2018_5YR.employed_retail_trade', 'ZIP_CODES_2018_5YR.median_income', 'ZIP_CODES_2018_5YR.geo_id', 'ZIP_CODES.state_name', 'PLACE_2015.employed_retail_trade', 'PLACE_2015.median_income', 'ZIP_CODES_2018_5YR.occupation_natural_resources_construction_maintenance', 'ZIP_CODES_2018_5YR.employed_arts_entertainment_recreation_accommodation_food', 'ZIP_CODES.zip_code', 'ZCTA5_2017_5YR.employed_wholesale_trade', 'ZCTA5_2015_5YR.median_income', 'ZCTA5_2017_5YR.geo_id']`
**Substitutions:**
- `ZIP_CODES_*` β `ZIP_CODES_2018_5YR` (sql_specific_year_alpha:2018_5YR)
- `PLACE_*` β `PLACE_2015` (q_year_fallback:2015 -> pad4)
---
### sid=100 iid=sf_bq289 db=GEO_OPENSTREETMAP_CENSUS_PLACES
**Q:** Can you find the shortest distance between any two amenities (either a library, place of worship, or community center) located within Philadelphia, analyzed through pennsylvania table and planet features points?
**Old gold (5):** `['PLACES_*.place_name', 'PLACES_*.place_geom', 'PLANET_FEATURES_POINTS.all_tags', 'PLANET_FEATURES_POINTS.geometry', 'PLANET_FEATURES_POINTS.osm_id']`
**New gold (5):** `['PLACES_PENNSYLVANIA.place_name', 'PLACES_PENNSYLVANIA.place_geom', 'PLANET_FEATURES_POINTS.all_tags', 'PLANET_FEATURES_POINTS.geometry', 'PLANET_FEATURES_POINTS.osm_id']`
**Substitutions:**
- `PLACES_*` β `PLACES_PENNSYLVANIA` (sql_specific_alphanumeric:PENNSYLVANIA)
---
### sid=132 iid=bq021 db=new_york
**Q:** For the top 20 Citi Bike routes in 2016, which route is faster than yellow taxis and among those, which one has the longest average bike duration? Please provide the start station name of this route. The coordinates are rounded to three decimals.
**Old gold (14):** `['citibike_trips.start_station_name', 'citibike_trips.end_station_name', 'citibike_trips.start_station_latitude', 'citibike_trips.start_station_longitude', 'citibike_trips.end_station_latitude', 'citibike_trips.end_station_longitude', 'citibike_trips.tripduration', 'citibike_trips.starttime', 'tlc_yellow_trips_*.pickup_latitude', 'tlc_yellow_trips_*.pickup_longitude', 'tlc_yellow_trips_*.dropoff_latitude', 'tlc_yellow_trips_*.dropoff_longitude', 'tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.pickup_datetime']`
**New gold (14):** `['citibike_trips.start_station_name', 'citibike_trips.end_station_name', 'citibike_trips.start_station_latitude', 'citibike_trips.start_station_longitude', 'citibike_trips.end_station_latitude', 'citibike_trips.end_station_longitude', 'citibike_trips.tripduration', 'citibike_trips.starttime', 'tlc_yellow_trips_2016.pickup_latitude', 'tlc_yellow_trips_2016.pickup_longitude', 'tlc_yellow_trips_2016.dropoff_latitude', 'tlc_yellow_trips_2016.dropoff_longitude', 'tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.pickup_datetime']`
**Substitutions:**
- `tlc_yellow_trips_*` β `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)
---
### sid=134 iid=bq185 db=new_york_plus
**Q:** What is the average trip duration in minutes for all valid Yellow taxi trips that took place between February 1, 2016, and February 7, 2016 (inclusive), with a positive trip duration, more than three passengers, and a trip distance of at least ten miles, where both the pickup and dropoff locations a
**Old gold (8):** `['tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.pickup_datetime', 'tlc_yellow_trips_*.passenger_count', 'tlc_yellow_trips_*.trip_distance', 'tlc_yellow_trips_*.pickup_location_id', 'tlc_yellow_trips_*.dropoff_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`
**New gold (8):** `['tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.pickup_datetime', 'tlc_yellow_trips_2016.passenger_count', 'tlc_yellow_trips_2016.trip_distance', 'tlc_yellow_trips_2016.pickup_location_id', 'tlc_yellow_trips_2016.dropoff_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`
**Substitutions:**
- `tlc_yellow_trips_*` β `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)
---
### sid=136 iid=bq098 db=new_york_plus
**Q:** For NYC yellow taxi trips where both the pickup and dropoff occurred between January 1 and 7, 2016, inclusive, calculate the percentage of trips with no tip in each pickup borough, ensuring that only trips where the dropoff occurs after the pickup are included, the passenger count is greater than ze
**Old gold (12):** `['tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.pickup_datetime', 'tlc_yellow_trips_*.total_amount', 'tlc_yellow_trips_*.tip_amount', 'tlc_yellow_trips_*.passenger_count', 'tlc_yellow_trips_*.trip_distance', 'tlc_yellow_trips_*.tolls_amount', 'tlc_yellow_trips_*.mta_tax', 'tlc_yellow_trips_*.fare_amount', 'tlc_yellow_trips_*.pickup_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`
**New gold (12):** `['tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.pickup_datetime', 'tlc_yellow_trips_2016.total_amount', 'tlc_yellow_trips_2016.tip_amount', 'tlc_yellow_trips_2016.passenger_count', 'tlc_yellow_trips_2016.trip_distance', 'tlc_yellow_trips_2016.tolls_amount', 'tlc_yellow_trips_2016.mta_tax', 'tlc_yellow_trips_2016.fare_amount', 'tlc_yellow_trips_2016.pickup_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`
**Substitutions:**
- `tlc_yellow_trips_*` β `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)
---
### sid=137 iid=bq039 db=new_york_plus
**Q:** Find the top 10 taxi trips in New York City between July 1 and July 7, 2016 (ensuring both pickup and dropoff times fall within these dates) where the passenger count is greater than five, the trip distance is at least ten miles, and there are no negative fare-related amounts (including tip, tolls,
**Old gold (13):** `['tlc_yellow_trips_*.pickup_datetime', 'tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.trip_distance', 'tlc_yellow_trips_*.tip_amount', 'tlc_yellow_trips_*.total_amount', 'tlc_yellow_trips_*.pickup_location_id', 'tlc_yellow_trips_*.dropoff_location_id', 'tlc_yellow_trips_*.passenger_count', 'tlc_yellow_trips_*.tolls_amount', 'tlc_yellow_trips_*.mta_tax', 'tlc_yellow_trips_*.fare_amount', 'taxi_zone_geom.zone_name', 'taxi_zone_geom.zone_id']`
**New gold (13):** `['tlc_yellow_trips_2016.pickup_datetime', 'tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.trip_distance', 'tlc_yellow_trips_2016.tip_amount', 'tlc_yellow_trips_2016.total_amount', 'tlc_yellow_trips_2016.pickup_location_id', 'tlc_yellow_trips_2016.dropoff_location_id', 'tlc_yellow_trips_2016.passenger_count', 'tlc_yellow_trips_2016.tolls_amount', 'tlc_yellow_trips_2016.mta_tax', 'tlc_yellow_trips_2016.fare_amount', 'taxi_zone_geom.zone_name', 'taxi_zone_geom.zone_id']`
**Substitutions:**
- `tlc_yellow_trips_*` β `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)
---
### sid=175 iid=bq087 db=covid19_symptom_search
**Q:** Please calculate the overall percentage change in the average weekly search frequency for the symptom 'Anosmia' across the five New York City countiesβBronx County, Queens County, Kings County, New York County, and Richmond Countyβby comparing the combined data from January 1, 2019, through December
**Old gold (4):** `['symptom_search_*.symptom_anosmia', 'symptom_search_*.sub_region_1', 'symptom_search_*.sub_region_2', 'symptom_search_*.date']`
**New gold (4):** `['symptom_search_2019.symptom_anosmia', 'symptom_search_2019.sub_region_1', 'symptom_search_2019.sub_region_2', 'symptom_search_2019.date']`
**Substitutions:**
- `symptom_search_*` β `symptom_search_2019` (q_year_fallback:2019 -> pad4)
---
### sid=176 iid=bq088 db=covid19_symptom_search
**Q:** Please calculate the average levels of anxiety and depression symptoms from the weekly country data for the United States during the periods from January 1, 2019, to January 1, 2020, and from January 1, 2020, to January 1, 2021. Then, compute the percentage increase in these average symptom levels f
**Old gold (4):** `['symptom_search_*.symptom_anxiety', 'symptom_search_*.symptom_depression', 'symptom_search_*.country_region_code', 'symptom_search_*.date']`
**New gold (4):** `['symptom_search_2019.symptom_anxiety', 'symptom_search_2019.symptom_depression', 'symptom_search_2019.country_region_code', 'symptom_search_2019.date']`
**Substitutions:**
- `symptom_search_*` β `symptom_search_2019` (q_year_fallback:2019 -> pad4)
---
### sid=177 iid=bq089 db=covid19_usa
**Q:** Given the latest population estimates from the 2018 five-year American Community Survey, what is the number of vaccine sites per 1000 people for counties in California?
**Old gold (12):** `['state_*.geo_id', 'facility_boundary_us_all.facility_place_id', 'place_*.total_pop', 'facility_boundary_us_all.facility_sub_region_2_code', 'place_*.geo_id', 'censustract_*.total_pop', 'facility_boundary_us_all.facility_sub_region_2', 'congressionaldistrict_*.total_pop', 'state_*.total_pop', 'congressionaldistrict_*.geo_id', 'facility_boundary_us_all.facility_sub_region_1', 'censustract_*.geo_id']`
**New gold (12):** `['state_2018.geo_id', 'facility_boundary_us_all.facility_place_id', 'place_2018.total_pop', 'facility_boundary_us_all.facility_sub_region_2_code', 'place_2018.geo_id', 'censustract_2018.total_pop', 'facility_boundary_us_all.facility_sub_region_2', 'congressionaldistrict_2018.total_pop', 'state_2018.total_pop', 'congressionaldistrict_2018.geo_id', 'facility_boundary_us_all.facility_sub_region_1', 'censustract_2018.geo_id']`
**Substitutions:**
- `censustract_*` β `censustract_2018` (q_year_fallback:2018 -> pad4)
- `state_*` β `state_2018` (q_year_fallback:2018 -> pad4)
- `congressionaldistrict_*` β `congressionaldistrict_2018` (q_year_fallback:2018 -> pad4)
- `place_*` β `place_2018` (q_year_fallback:2018 -> pad4)
---
### sid=178 iid=bq407 db=covid19_usa
**Q:** Find the top three counties with populations over 50,000, using the 2020 5-year census data, that had the highest COVID-19 case fatality rates on August 27, 2020. For these counties, provide the name, state, median age, total population, number of confirmed COVID-19 cases per 100,000 people, number
**Old gold (30):** `['state_*.geo_id', 'place_*.geo_id', 'county_*.geo_id', 'schooldistrictelementary_*.geo_id', 'congressionaldistrict_*.median_age', 'schooldistrictsecondary_*.median_age', 'state_*.total_pop', 'summary.state', 'place_*.total_pop', 'summary.county_name', 'summary.date', 'schooldistrictelementary_*.median_age', 'congressionaldistrict_*.total_pop', 'congressionaldistrict_*.geo_id', 'schooldistrictsecondary_*.total_pop', 'summary.deaths', 'puma_*.total_pop', 'puma_*.median_age', 'summary.confirmed_cases', 'state_*.median_age', 'schooldistrictsecondary_*.geo_id', 'summary.county_fips_code', 'county_*.geo_id', 'schooldistrictelementary_*.total_pop', 'county_*.total_pop', 'puma_*.geo_id', 'county_*.median_age', 'county_*.median_age', 'place_*.median_age', 'county_*.total_pop']`
**New gold (30):** `['state_2020.geo_id', 'place_2020.geo_id', 'county_2020_5yr.geo_id', 'schooldistrictelementary_2020.geo_id', 'congressionaldistrict_2020.median_age', 'schooldistrictsecondary_2020.median_age', 'state_2020.total_pop', 'summary.state', 'place_2020.total_pop', 'summary.county_name', 'summary.date', 'schooldistrictelementary_2020.median_age', 'congressionaldistrict_2020.total_pop', 'congressionaldistrict_2020.geo_id', 'schooldistrictsecondary_2020.total_pop', 'summary.deaths', 'puma_2020.total_pop', 'puma_2020.median_age', 'summary.confirmed_cases', 'state_2020.median_age', 'schooldistrictsecondary_2020.geo_id', 'summary.county_fips_code', 'county_2020_5yr.geo_id', 'schooldistrictelementary_2020.total_pop', 'county_2020_5yr.total_pop', 'puma_2020.geo_id', 'county_2020_5yr.median_age', 'county_2020_5yr.median_age', 'place_2020.median_age', 'county_2020_5yr.total_pop']`
**Substitutions:**
- `puma_*` β `puma_2020` (q_year_fallback:2020 -> pad4)
- `state_*` β `state_2020` (q_year_fallback:2020 -> pad4)
- `congressionaldistrict_*` β `congressionaldistrict_2020` (q_year_fallback:2020 -> pad4)
- `place_*` β `place_2020` (q_year_fallback:2020 -> pad4)
- `schooldistrictsecondary_*` β `schooldistrictsecondary_2020` (q_year_fallback:2020 -> pad4)
- `county_*` β `county_2020_5yr` (sql_specific_year_alpha:2020_5yr)
- `schooldistrictelementary_*` β `schooldistrictelementary_2020` (q_year_fallback:2020 -> pad4)
---
### sid=182 iid=bq061 db=census_bureau_acs_1
**Q:** Which census tract has witnessed the largest increase in median income between 2015 and 2018 in California? Tell me the tract code.
**Old gold (6):** `['state_*.geo_id', 'census_tracts_*.geo_id', 'censustract_*.median_income', 'censustract_*.geo_id', 'state_*.median_income', 'census_tracts_*.tract_ce']`
**New gold (6):** `['state_2015.geo_id', 'census_tracts_2015.geo_id', 'censustract_2018_5yr.median_income', 'censustract_2018_5yr.geo_id', 'state_2015.median_income', 'census_tracts_2015.tract_ce']`
**Substitutions:**
- `censustract_*` β `censustract_2018_5yr` (sql_specific_year_alpha:2018_5yr)
- `state_*` β `state_2015` (q_year_fallback:2015 -> pad4)
- `census_tracts_*` β `census_tracts_2015` (q_year_fallback:2015 -> pad4)
---
### sid=183 iid=bq064 db=census_bureau_acs_1
**Q:** Using the 2017 U.S. Census Tract data from the BigQuery public datasets, you need to proportionally allocate each tract's population and income to the zip codes based on the overlapping area between their geographic boundaries. Then, filter the results to include only those zip codes located within
**Old gold (18):** `['state_*.geo_id', 'census_tracts_*.geo_id', 'us_census_tracts_national.functional_status', 'us_census_tracts_national.tract_ce', 'state_*.income_per_capita', 'us_census_tracts_national.geo_id', 'census_tracts_*.functional_status', 'censustract_*.total_pop', 'censustract_*.income_per_capita', 'state_*.total_pop', 'zip_codes.zip_code', 'zip_codes.zip_code_geom', 'zip_codes.functional_status', 'census_tracts_*.tract_geom', 'zip_codes.state_code', 'censustract_*.geo_id', 'us_census_tracts_national.tract_geom', 'census_tracts_*.tract_ce']`
**New gold (18):** `['state_2017.geo_id', 'census_tracts_2017.geo_id', 'us_census_tracts_national.functional_status', 'us_census_tracts_national.tract_ce', 'state_2017.income_per_capita', 'us_census_tracts_national.geo_id', 'census_tracts_2017.functional_status', 'censustract_2017_5yr.total_pop', 'censustract_2017_5yr.income_per_capita', 'state_2017.total_pop', 'zip_codes.zip_code', 'zip_codes.zip_code_geom', 'zip_codes.functional_status', 'census_tracts_2017.tract_geom', 'zip_codes.state_code', 'censustract_2017_5yr.geo_id', 'us_census_tracts_national.tract_geom', 'census_tracts_2017.tract_ce']`
**Substitutions:**
- `censustract_*` β `censustract_2017_5yr` (sql_specific_year_alpha:2017_5yr)
- `state_*` β `state_2017` (q_year_fallback:2017 -> pad4)
- `census_tracts_*` β `census_tracts_2017` (q_year_fallback:2017 -> pad4)
---
### sid=194 iid=bq406 db=google_dei
**Q:** Please calculate the growth rates for Asians, Black people, Latinx people, Native Americans, White people, US women, US men, global women, and global men from 2014 to 2024 concerning the overall workforce.
**Old gold (11):** `['dar_non_intersectional_*.race_hispanic_latinx', 'dar_intersectional_*.race_white', 'dar_region_non_intersectional_representation.race_asian', 'dar_region_non_intersectional_representation.report_year', 'dar_non_intersectional_*.gender_us_men', 'dar_non_intersectional_*.gender_global_men', 'dar_non_intersectional_*.gender_global_women', 'dar_intersectional_*.race_black', 'dar_non_intersectional_*.gender_us_women', 'dar_intersectional_*.race_native_american', 'dar_region_non_intersectional_representation.workforce']`
**New gold (11):** `['dar_non_intersectional_2014.race_hispanic_latinx', 'dar_intersectional_2014.race_white', 'dar_region_non_intersectional_representation.race_asian', 'dar_region_non_intersectional_representation.report_year', 'dar_non_intersectional_2014.gender_us_men', 'dar_non_intersectional_2014.gender_global_men', 'dar_non_intersectional_2014.gender_global_women', 'dar_intersectional_2014.race_black', 'dar_non_intersectional_2014.gender_us_women', 'dar_intersectional_2014.race_native_american', 'dar_region_non_intersectional_representation.workforce']`
**Substitutions:**
- `dar_intersectional_*` β `dar_intersectional_2014` (q_year_fallback:2014 -> pad4)
- `dar_non_intersectional_*` β `dar_non_intersectional_2014` (q_year_fallback:2014 -> pad4)
---
### sid=228 iid=bq105 db=nhtsa_traffic_fatalities_plus
**Q:** According to the 2015 and 2016 accident and driver distraction, and excluding cases where the driverβs distraction status is recorded as 'Not Distracted,' 'Unknown if Distracted,' or 'Not Reported,' how many traffic accidents per 100,000 people were caused by driver distraction in each U.S. state fo
**Old gold (12):** `[' accident_2015.consecutive_number', ' accident_2015.state_name', ' distract_2015.consecutive_number', ' distract_2015.driver_distracted_by_name', 'population_by_zip_*.population', 'population_by_zip_*.zipcode', 'zipcode_area.zipcode', 'zipcode_area.state_name', ' accident_2016.consecutive_number', ' accident_2016.state_name', ' distract_2016.consecutive_number', ' distract_2016.driver_distracted_by_name']`
**New gold (12):** `[' accident_2015.consecutive_number', ' accident_2015.state_name', ' distract_2015.consecutive_number', ' distract_2015.driver_distracted_by_name', 'population_by_zip_2010.population', 'population_by_zip_2010.zipcode', 'zipcode_area.zipcode', 'zipcode_area.state_name', ' accident_2016.consecutive_number', ' accident_2016.state_name', ' distract_2016.consecutive_number', ' distract_2016.driver_distracted_by_name']`
**Substitutions:**
- `population_by_zip_*` β `population_by_zip_2010` (sql_specific:2010 -> pad4)
---
### sid=237 iid=bq352 db=sdoh
**Q:** Please list the average number of prenatal weeks in 2018 for counties in Wisconsin where more than 5% of the employed population had commutes of 45-59 minutes in 2017.
**Old gold (25):** `['state_*.geo_id', 'place_*.geo_id', 'cbsa_*.geo_id', 'state_*.employed_pop', 'county_*.geo_id', 'county_*.commute_45_59_mins', 'schooldistrictelementary_*.geo_id', 'place_*.employed_pop', 'place_*.commute_45_59_mins', 'county_*.County_of_Residence', 'cbsa_*.employed_pop', 'county_*.employed_pop', 'censustract_*.commute_45_59_mins', 'state_*.commute_45_59_mins', 'schooldistrictelementary_*.commute_45_59_mins', 'censustract_*.geo_id', 'cbsa_*.commute_45_59_mins', 'schooldistrictelementary_*.employed_pop', 'county_*.County_of_Residence_FIPS', 'censustract_*.employed_pop', 'county_*.Year', 'puma_*.commute_45_59_mins', 'county_*.Ave_Number_of_Prenatal_Wks', 'puma_*.geo_id', 'puma_*.employed_pop']`
**New gold (25):** `['state_2017.geo_id', 'place_2017.geo_id', 'cbsa_2017.geo_id', 'state_2017.employed_pop', 'county_2017_5yr.geo_id', 'county_2017_5yr.commute_45_59_mins', 'schooldistrictelementary_2017.geo_id', 'place_2017.employed_pop', 'place_2017.commute_45_59_mins', 'county_2017_5yr.County_of_Residence', 'cbsa_2017.employed_pop', 'county_2017_5yr.employed_pop', 'censustract_2017.commute_45_59_mins', 'state_2017.commute_45_59_mins', 'schooldistrictelementary_2017.commute_45_59_mins', 'censustract_2017.geo_id', 'cbsa_2017.commute_45_59_mins', 'schooldistrictelementary_2017.employed_pop', 'county_2017_5yr.County_of_Residence_FIPS', 'censustract_2017.employed_pop', 'county_2017_5yr.Year', 'puma_2017.commute_45_59_mins', 'county_2017_5yr.Ave_Number_of_Prenatal_Wks', 'puma_2017.geo_id', 'puma_2017.employed_pop']`
**Substitutions:**
- `puma_*` β `puma_2017` (q_year_fallback:2017 -> pad4)
- `censustract_*` β `censustract_2017` (q_year_fallback:2017 -> pad4)
- `state_*` β `state_2017` (q_year_fallback:2017 -> pad4)
- `place_*` β `place_2017` (q_year_fallback:2017 -> pad4)
- `cbsa_*` β `cbsa_2017` (q_year_fallback:2017 -> pad4)
- `county_*` β `county_2017_5yr` (sql_specific_year_alpha:2017_5yr)
- `schooldistrictelementary_*` β `schooldistrictelementary_2017` (q_year_fallback:2017 -> pad4)
---
### sid=238 iid=bq074 db=sdoh
**Q:** Count the number of counties that experienced an increase in unemployment from 2015 to 2018, using 5-year ACS data, and a decrease in dual-eligible enrollee counts between December 1, 2015, and December 1, 2018.
**Old gold (16):** `['dual_eligible_enrollment_by_county_and_program.County_Name', 'state_*.geo_id', 'county_*.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Public_Total', 'place_*.geo_id', 'congressionaldistrict_*.unemployed_pop', 'schooldistrictsecondary_*.unemployed_pop', 'puma_*.geo_id', 'dual_eligible_enrollment_by_county_and_program.FIPS', 'county_*.geo_id', 'state_*.unemployed_pop', 'place_*.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Date', 'congressionaldistrict_*.geo_id', 'puma_*.unemployed_pop', 'schooldistrictsecondary_*.geo_id']`
**New gold (16):** `['dual_eligible_enrollment_by_county_and_program.County_Name', 'state_2015.geo_id', 'county_2018_5yr.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Public_Total', 'place_2015.geo_id', 'congressionaldistrict_2015.unemployed_pop', 'schooldistrictsecondary_2015.unemployed_pop', 'puma_2015.geo_id', 'dual_eligible_enrollment_by_county_and_program.FIPS', 'county_2018_5yr.geo_id', 'state_2015.unemployed_pop', 'place_2015.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Date', 'congressionaldistrict_2015.geo_id', 'puma_2015.unemployed_pop', 'schooldistrictsecondary_2015.geo_id']`
**Substitutions:**
- `puma_*` β `puma_2015` (q_year_fallback:2015 -> pad4)
- `state_*` β `state_2015` (q_year_fallback:2015 -> pad4)
- `congressionaldistrict_*` β `congressionaldistrict_2015` (q_year_fallback:2015 -> pad4)
- `place_*` β `place_2015` (q_year_fallback:2015 -> pad4)
- `schooldistrictsecondary_*` β `schooldistrictsecondary_2015` (q_year_fallback:2015 -> pad4)
- `county_*` β `county_2018_5yr` (sql_specific_year_alpha:2018_5yr)
---
### sid=239 iid=bq066 db=sdoh
**Q:** Could you assess the relationship between the poverty rates from the previous year's census data and the percentage of births without maternal morbidity for the years 2016 to 2018? Use only data for births where no maternal morbidity was reported and for each year, use the 5-year census data from th
**Old gold (28):** `['state_*.geo_id', 'county_*.geo_id', 'county_*.Maternal_Morbidity_YN', 'cbsa_*.geo_id', 'place_*.geo_id', 'county_*.geo_id', 'place_*.pop_determined_poverty_status', 'cbsa_*.poverty', 'county_*.pop_determined_poverty_status', 'cbsa_*.pop_determined_poverty_status', 'county_*.geo_id', 'county_*.poverty', 'county_*.pop_determined_poverty_status', 'county_*.geo_id', 'puma_*.poverty', 'county_*.pop_determined_poverty_status', 'state_*.poverty', 'puma_*.pop_determined_poverty_status', 'county_*.Births', 'county_*.County_of_Residence_FIPS', 'county_*.poverty', 'county_*.Year', 'county_*.poverty', 'place_*.poverty', 'puma_*.geo_id', 'state_*.pop_determined_poverty_status', 'county_*.pop_determined_poverty_status', 'county_*.poverty']`
**New gold (28):** `['state_2016.geo_id', 'county_2015_5yr.geo_id', 'county_2015_5yr.Maternal_Morbidity_YN', 'cbsa_2016.geo_id', 'place_2016.geo_id', 'county_2015_5yr.geo_id', 'place_2016.pop_determined_poverty_status', 'cbsa_2016.poverty', 'county_2015_5yr.pop_determined_poverty_status', 'cbsa_2016.pop_determined_poverty_status', 'county_2015_5yr.geo_id', 'county_2015_5yr.poverty', 'county_2015_5yr.pop_determined_poverty_status', 'county_2015_5yr.geo_id', 'puma_2016.poverty', 'county_2015_5yr.pop_determined_poverty_status', 'state_2016.poverty', 'puma_2016.pop_determined_poverty_status', 'county_2015_5yr.Births', 'county_2015_5yr.County_of_Residence_FIPS', 'county_2015_5yr.poverty', 'county_2015_5yr.Year', 'county_2015_5yr.poverty', 'place_2016.poverty', 'puma_2016.geo_id', 'state_2016.pop_determined_poverty_status', 'county_2015_5yr.pop_determined_poverty_status', 'county_2015_5yr.poverty']`
**Substitutions:**
- `puma_*` β `puma_2016` (q_year_fallback:2016 -> pad4)
- `state_*` β `state_2016` (q_year_fallback:2016 -> pad4)
- `place_*` β `place_2016` (q_year_fallback:2016 -> pad4)
- `cbsa_*` β `cbsa_2016` (q_year_fallback:2016 -> pad4)
- `county_*` β `county_2015_5yr` (sql_specific_year_alpha:2015_5yr)
---
### sid=267 iid=bq204 db=eclipse_megamovie
**Q:** Find the user with the highest total clicks across all records from all available photo collections.
**Old gold (1):** `['photos_v_0_*.user']`
**New gold (1):** `['photos_v_0_1.user']`
**Substitutions:**
- `photos_v_0_*` β `photos_v_0_1` (sql_specific:1 -> pad4)
---
### sid=279 iid=bq049 db=iowa_liquor_sales_plus
**Q:** Please show the monthly per capita Bourbon Whiskey sales during 2022 in Dubuque County for the zip code that ranks third in total Bourbon Whiskey sales, using only the population aged 21 and older.
**Old gold (8):** `['sales.category_name', 'sales.date', 'sales.zip_code', 'sales.sale_dollars', 'sales.county', 'population_by_zip_*.zipcode', 'population_by_zip_*.population', 'population_by_zip_*.minimum_age']`
**New gold (8):** `['sales.category_name', 'sales.date', 'sales.zip_code', 'sales.sale_dollars', 'sales.county', 'population_by_zip_2022.zipcode', 'population_by_zip_2022.population', 'population_by_zip_2022.minimum_age']`
**Substitutions:**
- `population_by_zip_*` β `population_by_zip_2022` (q_year_fallback:2022 -> pad4)
---
### sid=281 iid=bq286 db=usa_names
**Q:** Can you tell me the name of the most popular female baby in Wyoming for the year 2021, based on the proportion of female babies given that name compared to the total number of female babies given the same name across all states?
**Old gold (5):** `['usa_1910_*.name', 'usa_1910_*.gender', 'usa_1910_*.year', 'usa_1910_*.number', 'usa_1910_*.state']`
**New gold (5):** `['usa_1910_current.name', 'usa_1910_current.gender', 'usa_1910_current.year', 'usa_1910_current.number', 'usa_1910_current.state']`
**Substitutions:**
- `usa_1910_*` β `usa_1910_current` (usa_1910_current_singleton)
---
### sid=284 iid=bq143 db=CPTAC_PDC
**Q:** Use CPTAC proteomics and RNAseq data for Clear Cell Renal Cell Carcinoma to select 'Primary Tumor' and 'Solid Tissue Normal' samples. Join the datasets on sample submitter IDs and gene symbols. Calculate the correlation between protein abundance (log2 ratio) and gene expression levels (log-transform
**Old gold (11):** `['quant_proteome_*.case_id', 'quant_proteome_*.aliquot_id', 'quant_proteome_*.gene_symbol', 'quant_proteome_*.protein_abundance_log2ratio', 'aliquot_to_case_mapping_current.sample_submitter_id', 'aliquot_to_case_mapping_current.sample_type', 'aliquot_to_case_mapping_current.case_id', 'aliquot_to_case_mapping_current.aliquot_id', 'RNAseq_hg38_gdc_current.gene_name', 'RNAseq_hg38_gdc_current.fpkm_unstranded', 'RNAseq_hg38_gdc_current.sample_barcode']`
**New gold (11):** `['quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.case_id', 'quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.aliquot_id', 'quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.gene_symbol', 'quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.protein_abundance_log2ratio', 'aliquot_to_case_mapping_current.sample_submitter_id', 'aliquot_to_case_mapping_current.sample_type', 'aliquot_to_case_mapping_current.case_id', 'aliquot_to_case_mapping_current.aliquot_id', 'RNAseq_hg38_gdc_current.gene_name', 'RNAseq_hg38_gdc_current.fpkm_unstranded', 'RNAseq_hg38_gdc_current.sample_barcode']`
**Substitutions:**
- `quant_proteome_*` β `quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current` (sql_specific_alphanumeric:CPTAC_CCRCC_discovery_study_pdc_current)
---
### sid=318 iid=bq006 db=austin
**Q:** What is the date with the second highest Z-score for daily counts of 'PUBLIC INTOXICATION' incidents in Austin for the year 2016? List the date in the format of '2016-xx-xx'.
**Old gold (2):** `['incidents_*.date', 'incidents_*.descript']`
**New gold (2):** `['incidents_2016.date', 'incidents_2016.descript']`
**Substitutions:**
- `incidents_*` β `incidents_2016` (sql_specific:2016 -> pad4)
---
### sid=323 iid=bq430 db=ebi_chembl
**Q:** Find pairs of different molecules tested in the same assay and standard type, where both have 10β15 heavy atoms, fewer than 5 activities in that assay, fewer than 2 duplicate activities, non-null standard values, and pChEMBL values over 10. For each pair, report the maximum heavy atom count, the lat
**Old gold (13):** `['activities_*.assay_id', 'activities_*.standard_type', 'activities_*.activity_id', 'activities_*.standard_value', 'activities_*.standard_relation', 'activities_*.pchembl_value', 'activities_*.molregno', 'compound_structures_25.canonical_smiles', 'compound_properties_26.heavy_atoms', 'docs_*.doc_id', 'docs_*.year', 'docs_*.journal', 'docs_*.first_page']`
**New gold (13):** `['activities_29.assay_id', 'activities_29.standard_type', 'activities_29.activity_id', 'activities_29.standard_value', 'activities_29.standard_relation', 'activities_29.pchembl_value', 'activities_29.molregno', 'compound_structures_25.canonical_smiles', 'compound_properties_26.heavy_atoms', 'docs_29.doc_id', 'docs_29.year', 'docs_29.journal', 'docs_29.first_page']`
**Substitutions:**
- `activities_*` β `activities_29` (sql_specific:29 -> pad4)
- `docs_*` β `docs_29` (sql_specific:29 -> pad4)
---
### sid=369 iid=ga001 db=ga4
**Q:** I want to know the preferences of customers who purchased the Google Navy Speckled Tee in December 2020. What other product was purchased with the highest total quantity alongside this item?
**Old gold (5):** `['events_*.user_pseudo_id', 'events_*.event_name', 'events_*.items', 'events_*.items', 'events_*.items']`
**New gold (5):** `['events_20201201.user_pseudo_id', 'events_20201201.event_name', 'events_20201201.items', 'events_20201201.items', 'events_20201201.items']`
**Substitutions:**
- `events_*` β `events_20201201` (table_suffix_range:20201201 -> pad8)
---
### sid=370 iid=ga002 db=ga4
**Q:** Tell me the most purchased other products and their quantities by customers who bought the Google Red Speckled Tee each month for the three months starting from November 2020.
**Old gold (3):** `['events_*.user_pseudo_id', 'events_*.items', 'events_*.event_name']`
**New gold (3):** `['events_20201101.user_pseudo_id', 'events_20201101.items', 'events_20201101.event_name']`
**Substitutions:**
- `events_*` β `events_20201101` (sql_8digit_literal:20201101)
---
### sid=372 iid=ga004 db=ga4
**Q:** Can you figure out the average difference in pageviews between users who bought something and those who didnβt in December 2020? Just label anyone who was involved in purchase events as a purchaser.
**Old gold (2):** `['events_*.user_pseudo_id', 'events_*.event_name']`
**New gold (2):** `['events_20201201.user_pseudo_id', 'events_20201201.event_name']`
**Substitutions:**
- `events_*` β `events_20201201` (table_suffix_range:20201201 -> pad8)
---
### sid=373 iid=ga008 db=ga4
**Q:** Could you provide the total number of page views for each day in November 2020 as well as the average number of page views per user on those days, restricted to users who made at least one purchase in November 2020?
**Old gold (3):** `['events_*.user_pseudo_id', 'events_*.event_date', 'events_*.event_name']`
**New gold (3):** `['events_20201101.user_pseudo_id', 'events_20201101.event_date', 'events_20201101.event_name']`
**Substitutions:**
- `events_*` β `events_20201101` (table_suffix_range:20201101 -> pad8)
---
### sid=374 iid=ga017 db=ga4
**Q:** How many distinct users viewed the most frequently visited page during January 2021?
**Old gold (4):** `['events_*.event_params', 'events_*.event_name', 'events_*.event_timestamp', 'events_*.user_pseudo_id']`
**New gold (4):** `['events_20210101.event_params', 'events_20210101.event_name', 'events_20210101.event_timestamp', 'events_20210101.user_pseudo_id']`
**Substitutions:**
- `events_*` β `events_20210101` (table_suffix_range:20210101 -> pad8)
---
### sid=377 iid=ga018 db=ga4
**Q:** On January 2nd, 2021, I want to determine the percentage of times users transition from a product list page (PLP) view to a product detail page (PDP) view within the same session, using only page_view events. Could you calculate how many PLP views eventually led to a PDP view in the same session on
**Old gold (10):** `['events_*.event_name', 'events_*.event_date', 'events_*.event_timestamp', 'events_*.user_pseudo_id', 'events_*.user_id', 'events_*.device', 'events_*.geo', 'events_*.traffic_source', 'events_*.event_params', 'events_*.user_properties']`
**New gold (10):** `['events_20210102.event_name', 'events_20210102.event_date', 'events_20210102.event_timestamp', 'events_20210102.user_pseudo_id', 'events_20210102.user_id', 'events_20210102.device', 'events_20210102.geo', 'events_20210102.traffic_source', 'events_20210102.event_params', 'events_20210102.user_properties']`
**Substitutions:**
- `events_*` β `events_20210102` (sql_8digit_literal:20210102)
---
### sid=382 iid=ga010 db=ga4
**Q:** Can you give me an overview of our website traffic for December 2020? I'm particularly interested in the channel with the fourth highest number of sessions.
**Old gold (3):** `['events_*.user_pseudo_id', 'events_*.event_params', 'events_*.event_timestamp']`
**New gold (3):** `['events_20201201.user_pseudo_id', 'events_20201201.event_params', 'events_20201201.event_timestamp']`
**Substitutions:**
- `events_*` β `events_20201201` (table_suffix_range:20201201 -> pad8)
---
### sid=385 iid=ga012 db=ga4
**Q:** On November 30, 2020, identify the item category with the highest tax rate by dividing tax value in usd by purchase revenue in usd for purchase events, and then retrieve the transaction IDs, total item quantities, and both purchase revenue in usd and purchase revenue for those purchase events in tha
**Old gold (7):** `['events_*.items', 'events_*.ecommerce', 'events_*.ecommerce', 'events_*.event_name', 'events_*.ecommerce', 'events_*.ecommerce', 'events_*.ecommerce']`
**New gold (7):** `['events_20200101.items', 'events_20200101.ecommerce', 'events_20200101.ecommerce', 'events_20200101.event_name', 'events_20200101.ecommerce', 'events_20200101.ecommerce', 'events_20200101.ecommerce']`
**Substitutions:**
- `events_*` β `events_20200101` (q_year_fallback:2020 -> pad8)
---
|