predictivemodeler committed on
Commit a0281ea · verified · 1 Parent(s): 3dbdbe7

Upload 3 files

vg_/csvs/create_embeddings.txt CHANGED
@@ -1598,7 +1598,7 @@ MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|G47|Value|
1598
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|T78|Value|
1599
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|M79|Value|
1600
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|E07|Value|
1601
- MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ACTLIM31|smallint||Column|Indicates any limitation in work, housework, or school activities (Round 3/1). Values: 1 = Yes, 2 = No.
1602
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ADACLS42|smallint||Column|Frequency of feeling accomplished less due to physical problems in the past 4 weeks (VR-12).
1603
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ADCLIM42|smallint||Column|Health limits climbing stairs (VR-12 scale, past 4 weeks).
1604
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ADDAYA42|smallint||Column|Delay in medical care due to cost (1=Yes, 2=No, -8=Don't Know, -7=Refused, -1=Inapplicable).
@@ -1750,6 +1750,7 @@ MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WAGEP19X|int||Column|Numeric; person's
1750
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WLKDIF31|smallint||Column|Difficulty walking: -8 DK, -1 Inapplicable, 1 No difficulty, 2 Some difficulty, 3 A lot of difficulty, 4 Unable to do.
1751
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WLKLIM31|smallint||Column|2.0 numeric: -8=Don't know, -1=Inapplicable, 1=Yes, 2=No. Indicates child's health limitation status.
1752
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WRKLIM31|smallint||Column|Indicates work limitations (-8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No).
 
1753
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow||||Table|This is the main table, has enrollment, expenditure, and key demographic information. Can be linked to pharmacy and medical tables if needed using DUPERSID.
1754
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21||||Db_File|The Statistics of Income (SOI) Division's county data is tabulated using individual income tax returns (Forms 1040) filed with the Internal Revenue Service (IRS) during the 12-month period, January 1, 2022 to December 31, 2022.
1755
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21||||Table|All money amounts are in thousands of dollars (so 1 = 1,000). To calculate averages, divide by number of returns (N1)
@@ -1924,3 +1925,471 @@ MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|N11902|Num||Column|Number of returns wi
1924
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|A11902|Num||Column|Overpayments refunded amount [22] ; 1040:35a
1925
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|N12000|Num||Column|Number of returns with credit to next year's estimated tax; 1040:36
1926
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|A12000|Num||Column|Credited to next year's estimated tax amount; 1040:36
 
1598
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|T78|Value|
1599
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|M79|Value|
1600
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|E07|Value|
1601
+ MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ACTLIM31|smallint||Column|Indicates any limitation in work, housework, or school activities. Values: 1 = Yes, 2 = No.
1602
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ADACLS42|smallint||Column|Frequency of feeling accomplished less due to physical problems in the past 4 weeks (VR-12).
1603
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ADCLIM42|smallint||Column|Health limits climbing stairs (VR-12 scale, past 4 weeks).
1604
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ADDAYA42|smallint||Column|Delay in medical care due to cost (1=Yes, 2=No, -8=Don't Know, -7=Refused, -1=Inapplicable).
 
1750
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WLKDIF31|smallint||Column|Difficulty walking: -8 DK, -1 Inapplicable, 1 No difficulty, 2 Some difficulty, 3 A lot of difficulty, 4 Unable to do.
1751
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WLKLIM31|smallint||Column|2.0 numeric: -8=Don't know, -1=Inapplicable, 1=Yes, 2=No. Indicates child's health limitation status.
1752
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WRKLIM31|smallint||Column|Indicates work limitations (-8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No).
1753
+ MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|PERWT19F|float||Column|Person-level weight that allows us to extrapolate to the US population.
1754
  MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow||||Table|This is the main table, has enrollment, expenditure, and key demographic information. Can be linked to pharmacy and medical tables if needed using DUPERSID.
1755
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21||||Db_File|The Statistics of Income (SOI) Division's county data is tabulated using individual income tax returns (Forms 1040) filed with the Internal Revenue Service (IRS) during the 12-month period, January 1, 2022 to December 31, 2022.
1756
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21||||Table|All money amounts are in thousands of dollars (so 1 = 1,000). To calculate averages, divide by number of returns (N1)
 
1925
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|A11902|Num||Column|Overpayments refunded amount [22] ; 1040:35a
1926
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|N12000|Num||Column|Number of returns with credit to next year's estimated tax; 1040:36
1927
  MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|A12000|Num||Column|Credited to next year's estimated tax amount; 1040:36
1928
+ MySQL-001|freedb_vg_demo|Album|AlbumId|int||Column|
1929
+ MySQL-001|freedb_vg_demo|Album|ArtistId|int||Column|
1930
+ MySQL-001|freedb_vg_demo|Album|Title|nvarchar||Column|
1931
+ MySQL-001|freedb_vg_demo|Album||||Table|
1932
+ MySQL-001|freedb_vg_demo|Artist|ArtistId|int||Column|
1933
+ MySQL-001|freedb_vg_demo|Artist|Name|nvarchar||Column|
1934
+ MySQL-001|freedb_vg_demo|Artist||||Table|
1935
+ MySQL-001|freedb_vg_demo|Customer|Address|nvarchar||Column|
1936
+ MySQL-001|freedb_vg_demo|Customer|City|nvarchar||Column|
1937
+ MySQL-001|freedb_vg_demo|Customer|Company|nvarchar||Column|
1938
+ MySQL-001|freedb_vg_demo|Customer|Country|nvarchar||Column|
1939
+ MySQL-001|freedb_vg_demo|Customer|CustomerId|int||Column|
1940
+ MySQL-001|freedb_vg_demo|Customer|Email|nvarchar||Column|
1941
+ MySQL-001|freedb_vg_demo|Customer|Fax|nvarchar||Column|
1942
+ MySQL-001|freedb_vg_demo|Customer|FirstName|nvarchar||Column|
1943
+ MySQL-001|freedb_vg_demo|Customer|LastName|nvarchar||Column|
1944
+ MySQL-001|freedb_vg_demo|Customer|Phone|nvarchar||Column|
1945
+ MySQL-001|freedb_vg_demo|Customer|PostalCode|nvarchar||Column|
1946
+ MySQL-001|freedb_vg_demo|Customer|State|nvarchar||Column|
1947
+ MySQL-001|freedb_vg_demo|Customer|SupportRepId|int||Column|
1948
+ MySQL-001|freedb_vg_demo|Customer||||Table|
1949
+ MySQL-001|freedb_vg_demo|Employee|Address|nvarchar||Column|
1950
+ MySQL-001|freedb_vg_demo|Employee|BirthDate|datetime||Column|
1951
+ MySQL-001|freedb_vg_demo|Employee|City|nvarchar||Column|
1952
+ MySQL-001|freedb_vg_demo|Employee|Country|nvarchar||Column|
1953
+ MySQL-001|freedb_vg_demo|Employee|Email|nvarchar||Column|
1954
+ MySQL-001|freedb_vg_demo|Employee|EmployeeId|int||Column|
1955
+ MySQL-001|freedb_vg_demo|Employee|Fax|nvarchar||Column|
1956
+ MySQL-001|freedb_vg_demo|Employee|FirstName|nvarchar||Column|
1957
+ MySQL-001|freedb_vg_demo|Employee|HireDate|datetime||Column|
1958
+ MySQL-001|freedb_vg_demo|Employee|LastName|nvarchar||Column|
1959
+ MySQL-001|freedb_vg_demo|Employee|Phone|nvarchar||Column|
1960
+ MySQL-001|freedb_vg_demo|Employee|PostalCode|nvarchar||Column|
1961
+ MySQL-001|freedb_vg_demo|Employee|ReportsTo|int||Column|
1962
+ MySQL-001|freedb_vg_demo|Employee|State|nvarchar||Column|
1963
+ MySQL-001|freedb_vg_demo|Employee|Title|nvarchar||Column|
1964
+ MySQL-001|freedb_vg_demo|Employee||||Table|
1965
+ MySQL-001|freedb_vg_demo|Genre|GenreId|int||Column|
1966
+ MySQL-001|freedb_vg_demo|Genre|Name|nvarchar||Column|
1967
+ MySQL-001|freedb_vg_demo|Genre||||Table|
1968
+ MySQL-001|freedb_vg_demo|Invoice|BillingAddress|nvarchar||Column|
1969
+ MySQL-001|freedb_vg_demo|Invoice|BillingCity|nvarchar||Column|
1970
+ MySQL-001|freedb_vg_demo|Invoice|BillingCountry|nvarchar||Column|
1971
+ MySQL-001|freedb_vg_demo|Invoice|BillingPostalCode|nvarchar||Column|
1972
+ MySQL-001|freedb_vg_demo|Invoice|BillingState|nvarchar||Column|
1973
+ MySQL-001|freedb_vg_demo|Invoice|CustomerId|int||Column|
1974
+ MySQL-001|freedb_vg_demo|Invoice|InvoiceDate|datetime||Column|
1975
+ MySQL-001|freedb_vg_demo|Invoice|InvoiceId|int||Column|
1976
+ MySQL-001|freedb_vg_demo|Invoice|Total|numeric||Column|Total sale
1977
+ MySQL-001|freedb_vg_demo|Invoice||||Table|Invoices for years 2009 to 2013
1978
+ MySQL-001|freedb_vg_demo|InvoiceLine|InvoiceId|int||Column|
1979
+ MySQL-001|freedb_vg_demo|InvoiceLine|InvoiceLineId|int||Column|
1980
+ MySQL-001|freedb_vg_demo|InvoiceLine|Quantity|int||Column|
1981
+ MySQL-001|freedb_vg_demo|InvoiceLine|TrackId|int||Column|
1982
+ MySQL-001|freedb_vg_demo|InvoiceLine|UnitPrice|numeric||Column|
1983
+ MySQL-001|freedb_vg_demo|InvoiceLine||||Table|Detailed line items for Invoices
1984
+ MySQL-001|freedb_vg_demo|MediaType|MediaTypeId|int||Column|
1985
+ MySQL-001|freedb_vg_demo|MediaType|Name|nvarchar||Column|
1986
+ MySQL-001|freedb_vg_demo|MediaType||||Table|
1987
+ MySQL-001|freedb_vg_demo|Playlist|Name|nvarchar||Column|
1988
+ MySQL-001|freedb_vg_demo|Playlist|PlaylistId|int||Column|
1989
+ MySQL-001|freedb_vg_demo|Playlist||||Table|
1990
+ MySQL-001|freedb_vg_demo|PlaylistTrack|PlaylistId|int||Column|
1991
+ MySQL-001|freedb_vg_demo|PlaylistTrack|TrackId|int||Column|
1992
+ MySQL-001|freedb_vg_demo|PlaylistTrack||||Table|
1993
+ MySQL-001|freedb_vg_demo|Track|AlbumId|int||Column|
1994
+ MySQL-001|freedb_vg_demo|Track|Bytes|int||Column|
1995
+ MySQL-001|freedb_vg_demo|Track|Composer|nvarchar||Column|
1996
+ MySQL-001|freedb_vg_demo|Track|GenreId|int||Column|
1997
+ MySQL-001|freedb_vg_demo|Track|MediaTypeId|int||Column|
1998
+ MySQL-001|freedb_vg_demo|Track|Milliseconds|int||Column|
1999
+ MySQL-001|freedb_vg_demo|Track|Name|nvarchar||Column|
2000
+ MySQL-001|freedb_vg_demo|Track|TrackId|int||Column|
2001
+ MySQL-001|freedb_vg_demo|Track|UnitPrice|numeric||Column|
2002
+ MySQL-001|freedb_vg_demo|Track||||Table|
2003
+ MySQL-001|freedb_vg_demo|||||Db_File|The Chinook data model represents a digital media store, including tables for artists, albums, media tracks, invoices, and customers.
2004
+ MySQL-001|freedb_vg_demo|Track|Name|nvarchar||Value|
2005
+ MySQL-001|vg_meps01|||||Db_File|This is the Medical Expenditure Panel Survey Database. Has enrollment, medical and pharmacy tables.
2006
+ MySQL-001|vg_meps01|h213a|DIABEQUIP|tinyint||Column|Indicates presence of diabetic equipment/supplies (1) or other items (2).
2007
+ MySQL-001|vg_meps01|h213a|DRUGIDX|bigint||Column|Unique identifier combining person ID and a counter for drug records.
2008
+ MySQL-001|vg_meps01|h213a|DUID|int||Column|Panel number concatenated with encrypted household identifier.
2009
+ MySQL-001|vg_meps01|h213a|DUPERSID|bigint||Column|Unique identifier combining DUID and PID.
2010
+ MySQL-001|vg_meps01|h213a|IMPFLAG|tinyint||Column|Indicates the method of expenditure data creation.
2011
+ MySQL-001|vg_meps01|h213a|INPCFLG|int||Column|Indicates presence in 2019 Prescribed Medicines file: 0=Not present, 1=Present.
2012
+ MySQL-001|vg_meps01|h213a|LINKIDX|bigint||Column|Unique identifier for linking to condition or other event files.
2013
+ MySQL-001|vg_meps01|h213a|PANEL|tinyint||Column|Number identifying the survey panel (1 to 6) in the MEPS HC-213A 2019 dataset.
2014
+ MySQL-001|vg_meps01|h213a|PCIMPFLG|tinyint||Column|Type of Health Component (HC) to Pharmacy Component (PC) prescription match.
2015
+ MySQL-001|vg_meps01|h213a|PERWT19F|float||Column|Weighting factor for 2019 expenditure data; values range from 633.006358 to 77911.063903.
2016
+ MySQL-001|vg_meps01|h213a|PHARTP1|smallint||Column|Defines the type of pharmacy provider for the first prescription: mail-order, in-store, HMO/clinic/hospital, online, etc.
2017
+ MySQL-001|vg_meps01|h213a|PHARTP2|smallint||Column|Categorical variable defining type of pharmacy provider for the 2nd instance. Values: 1=MAIL-ORDER, 2=IN ANOTHER STORE, 3=IN HMO/CLINIC/HOSPITAL, 4=DRUG STORE, 5=ON-LINE.
2018
+ MySQL-001|vg_meps01|h213a|PHARTP3|smallint||Column|Type of pharmacy provision (3rd occurrence), coded as: 1=Mail-order, 2=Another store, 3=HMO/clinic/hospital, 4=Drug store, 5=Online.
2019
+ MySQL-001|vg_meps01|h213a|PHARTP4|smallint||Column|Identifies the 4th type of pharmacy provider (e.g., mail-order, drug store). Values include: 1=mail-order, 2=in another store, 3=in HMO/clinic/hospital, 4=drug store, 5=online
2020
+ MySQL-001|vg_meps01|h213a|PHARTP5|smallint||Column|Type of pharmacy provider: 5th instance; values include mail-order, other store, HMO/clinic/hospital, drug store.
2021
+ MySQL-001|vg_meps01|h213a|PHARTP6|smallint||Column|Type of pharmacy provider - 6th: Mail-order, in another store, HMO/clinic/hospital, drug store, or online.
2022
+ MySQL-001|vg_meps01|h213a|PHARTP7|smallint||Column|Type of pharmacy (7th): 1=Mail-order, 3=HMO/clinic/hospital, 4=Drug store.
2023
+ MySQL-001|vg_meps01|h213a|PID|int||Column|Unique identifier for a person within a household, ranging from 101 to 503.
2024
+ MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar||Column|Internal Server Error
2025
+ MySQL-001|vg_meps01|h213a|PURCHRD|tinyint||Column|Round in which the prescribed medicine was obtained or purchased.
2026
+ MySQL-001|vg_meps01|h213a|RXBEGMM|smallint||Column|Month the person started taking medication (format: 3.0 Num).
2027
+ MySQL-001|vg_meps01|h213a|RXBEGYRX|smallint||Column|Imputed year prescription began; weighted by PERWT19F.
2028
+ MySQL-001|vg_meps01|h213a|RXDAYSUP|smallint||Column|Imputed number of days supplied for prescribed medication.
2029
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar||Column|Multum medication name, imputed, 60-character alphanumeric string.
2030
+ MySQL-001|vg_meps01|h213a|RXFLG|tinyint||Column|NDC imputation source from PC donor record; indicates data origin.
2031
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar||Column|Char (50.0): Dosage form of medication, imputed, weighted by PERWT19F.
2032
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar||Column|Unit of medication quantity, imputed; values include alphanumeric characters, "DK", or "CANNOT BE COMPUTED".
2033
+ MySQL-001|vg_meps01|h213a|RXMD19X|float||Column|Amount paid by Medicaid for prescribed medications (imputed).
2034
+ MySQL-001|vg_meps01|h213a|RXMR19X|float||Column|Amount paid by Medicare (imputed) in dollars, ranges from $0.01 to $23,136.88
2035
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar||Column|60.0-character imputed field capturing medicine names in A-Z format; values may be missing (indicated by "-15").
2036
+ MySQL-001|vg_meps01|h213a|RXNDC|bigint||Column|Unique National Drug Code identifier for prescribed medications.
2037
+ MySQL-001|vg_meps01|h213a|RXOF19X|float||Column|Amount paid by other federal sources for prescribed medicines, imputed.
2038
+ MySQL-001|vg_meps01|h213a|RXOT19X|float||Column|Amount paid by other insurance, imputed; numeric format (7.2); values range from $0.20 to $4,229.30.
2039
+ MySQL-001|vg_meps01|h213a|RXPV19X|float||Column|Amount paid by private insurance for prescribed medicines, in dollars (imputed).
2040
+ MySQL-001|vg_meps01|h213a|RXQUANTY|float||Column|Imputed numeric field (7.2) representing prescribed medication quantity; range 0.05 to 6399.
2041
+ MySQL-001|vg_meps01|h213a|RXRECIDX|bigint||Column|Unique identifier for prescribed medications, 19-character alphanumeric with imputed values.
2042
+ MySQL-001|vg_meps01|h213a|RXSF19X|float||Column|Amount paid by self or family for prescribed medicines (imputed).
2043
+ MySQL-001|vg_meps01|h213a|RXSL19X|float||Column|Amount paid by state and local government for prescribed medicines (imputed), 8.2 numeric format.
2044
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar||Column|Imputed strength of medication, 50-character format, with values ranging from 'A-Z' or undefined ('-15 CANNOT BE COMPUTED').
2045
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar||Column|Unit of medication strength, imputed, with values ranging from A-Z and -15 for "CANNOT BE COMPUTED".
2046
+ MySQL-001|vg_meps01|h213a|RXTR19X|float||Column|Amount paid by TRICARE, imputed, in ranges from $0.03 to $20,484.48.
2047
+ MySQL-001|vg_meps01|h213a|RXVA19X|float||Column|Amount paid by Veterans Administration/CHAMPVA, imputed (numeric, 7.2).
2048
+ MySQL-001|vg_meps01|h213a|RXWC19X|float||Column|Amount paid by Workers Compensation (imputed), numeric value.
2049
+ MySQL-001|vg_meps01|h213a|RXXP19X|float||Column|Sum of payments RXSF19X through RXOT19X (imputed).
2050
+ MySQL-001|vg_meps01|h213a|TC1|smallint||Column|Therapeutic class category for multiple medications.
2051
+ MySQL-001|vg_meps01|h213a|TC1S1|smallint||Column|Therapeutic subclass #1 for therapeutic class #1, defining specific drug categories.
2052
+ MySQL-001|vg_meps01|h213a|TC1S1_1|smallint||Column|Multum therapeutic sub-sub-class for therapeutic sub-class TC1S1.
2053
+ MySQL-001|vg_meps01|h213a|TC1S1_2|smallint||Column|Multi-therapeutic sub-sub-class for therapeutic sub-class #1.
2054
+ MySQL-001|vg_meps01|h213a|TC1S2|smallint||Column|Therapeutic subclass #2 under therapeutic class C1.
2055
+ MySQL-001|vg_meps01|h213a|TC1S2_1|smallint||Column|Multum therapeutic sub-sub-class for therapeutic sub-class TC1S2.
2056
+ MySQL-001|vg_meps01|h213a|TC1S3|smallint||Column|Therapeutic subclass #3 for therapeutic class C1, values: -1 (inapplicable), 67 (anxiolytics, sedatives, hypnotics).
2057
+ MySQL-001|vg_meps01|h213a|TC1S3_1|smallint||Column|Multum therapeutic sub-sub-class for therapeutic sub-class #3, with defined values.
2058
+ MySQL-001|vg_meps01|h213a|TC2|smallint||Column|Multum therapeutic class #2.
2059
+ MySQL-001|vg_meps01|h213a|TC2S1|smallint||Column|Therapeutic Subclass #1 for Class #2 in Multum Classification System.
2060
+ MySQL-001|vg_meps01|h213a|TC2S1_1|smallint||Column|Multum therapeutic sub-sub-class for therapeutic sub-class #1 of class #2.
2061
+ MySQL-001|vg_meps01|h213a|TC2S1_2|smallint||Column|Sub-subclass for therapeutic class #2, subclass #1.
2062
+ MySQL-001|vg_meps01|h213a|TC2S2|smallint||Column|Therapeutic subclass #2 for class #2; values include INAPPLICABLE and ANTIPSORIATICS.
2063
+ MySQL-001|vg_meps01|h213a|TC3|smallint||Column|Therapeutic class #3, primarily includes nutritional and immunologic agents.
2064
+ MySQL-001|vg_meps01|h213a|TC3S1|smallint||Column|Therapeutic subclass #1 for therapeutic class #3 (immunosuppressive agents, minerals, and electrolytes).
2065
+ MySQL-001|vg_meps01|h213a|TC3S1_1|smallint||Column|Sub-subclass for Multum therapeutic classification #1 under class #3.
2066
+ MySQL-001|vg_meps01|h213a|VARPSU|int||Column|Represents the primary sampling unit (PSU) identifier for survey data analysis.
2067
+ MySQL-001|vg_meps01|h213a|VARSTR|smallint||Column|A 4-digit numerical code representing variance estimation strata for 2019.
2068
+ MySQL-001|vg_meps01|h213a||||Table|This is the pharmacy utilization table. Use DUPERSID for unique person identifier. This table can have multiple rows per identifier.
2069
+ MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|N|Value|
2070
+ MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|C|Value|
2071
+ MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|B|Value|
2072
+ MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|D|Value|
2073
+ MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|X|Value|
2074
+ MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|-15|Value|
2075
+ MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|A|Value|
2076
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|ATORVASTATIN|Value|
2077
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|LEVOTHYROXINE|Value|
2078
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|LISINOPRIL|Value|
2079
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|METFORMIN|Value|
2080
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|AMLODIPINE|Value|
2081
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|METOPROLOL|Value|
2082
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|-15|Value|
2083
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|ALBUTEROL|Value|
2084
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|OMEPRAZOLE|Value|
2085
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|LOSARTAN|Value|
2086
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|GABAPENTIN|Value|
2087
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|HYDROCHLOROTHIAZIDE|Value|
2088
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|SIMVASTATIN|Value|
2089
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|SERTRALINE|Value|
2090
+ MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|ACETAMINOPHEN-HYDROCODONE|Value|
2091
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TABS|Value|
2092
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|CAPS|Value|
2093
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|-15|Value|
2094
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TB24|Value|
2095
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|SOLN|Value|
2096
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TAB|Value|
2097
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|CPDR|Value|
2098
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|AERS|Value|
2099
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TBEC|Value|
2100
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|SOPN|Value|
2101
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|CP24|Value|
2102
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|SUSP|Value|
2103
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TBCR|Value|
2104
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|STRP|Value|
2105
+ MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|MISC|Value|
2106
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|EA|Value|
2107
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|-15|Value|
2108
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|ML|Value|
2109
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|GM|Value|
2110
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|G|Value|
2111
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|-8|Value|
2112
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|test strips|Value|
2113
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|Pen needles|Value|
2114
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|lancets|Value|
2115
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|OTHER|Value|
2116
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|SRN|Value|
2117
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|INH|Value|
2118
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|GELC|Value|
2119
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|Device|Value|
2120
+ MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|NDL|Value|
2121
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|ATORVASTATIN|Value|
2122
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|LISINOPRIL|Value|
2123
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|METFORMIN|Value|
2124
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|LEVOTHYROXIN|Value|
2125
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|AMLODIPINE|Value|
2126
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|OMEPRAZOLE|Value|
2127
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|LOSARTAN POT|Value|
2128
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|GABAPENTIN|Value|
2129
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|HYDROCHLOROT|Value|
2130
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|SIMVASTATIN|Value|
2131
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|ALBUTEROL|Value|
2132
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|SERTRALINE|Value|
2133
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|METOPROL TAR|Value|
2134
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|METOPROL SUC|Value|
2135
+ MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|FUROSEMIDE|Value|
2136
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|-15|Value|
2137
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|10|Value|
2138
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|20|Value|
2139
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|100|Value|
2140
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|40|Value|
2141
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|50|Value|
2142
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|5|Value|
2143
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|25|Value|
2144
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|500|Value|
2145
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|1|Value|
2146
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|300|Value|
2147
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|75|Value|
2148
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|108|Value|
2149
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|150|Value|
2150
+ MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|200|Value|
2151
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG|Value|
2152
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|-15|Value|
2153
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MCG|Value|
2154
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|%|Value|
2155
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MCG/ACT|Value|
2156
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG/ML|Value|
2157
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|UNIT/ML|Value|
2158
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|UNIT|Value|
2159
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MEQ|Value|
2160
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|GM|Value|
2161
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG/MG|Value|
2162
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG/24HR|Value|
2163
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG/GM|Value|
2164
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MCG/ML|Value|
2165
+ MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MCG/HR|Value|
2166
+ MySQL-001|vg_meps01|h214|ACCDNWRK|smallint||Column|Indicates if an accident occurred at work: 1=Yes, 2=No, 3=Does not work.
2167
+ MySQL-001|vg_meps01|h214|AGEDIAG|smallint||Column|Age at which the condition was diagnosed, ranging from 0 to 85 years.
2168
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar||Column|Clinical classification refined code 1, edited (e.g., BLD000, CIR000).
2169
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar||Column|Secondary edited clinical classification refined code for medical conditions.
2170
+ MySQL-001|vg_meps01|h214|CCSR3X|nvarchar||Column|Refined clinical classification code, third edited version, with defined values (e.g., INF000 - INF011).
2171
+ MySQL-001|vg_meps01|h214|CONDIDX|bigint||Column|Unique identifier for a specific medical condition in the dataset.
2172
+ MySQL-001|vg_meps01|h214|CONDN|int||Column|Unique identifier for a medical condition, ranging from 1 to 54.
2173
+ MySQL-001|vg_meps01|h214|CONDRN|int||Column|Round number in which the medical condition was first reported (values: 1 to 5).
2174
+ MySQL-001|vg_meps01|h214|CRND1|smallint||Column|Indicates presence of condition information in Round 1 with values: -1 (inapplicable), 0 (no), 1 (yes).
2175
+ MySQL-001|vg_meps01|h214|CRND2|smallint||Column|Indicates presence of condition information in Round 2 (-1: inapplicable, 0: no, 1: yes)
2176
+ MySQL-001|vg_meps01|h214|CRND3|int||Column|Values: 0 (NO), 1 (YES) - indicates condition information presence in Round 3.
2177
+ MySQL-001|vg_meps01|h214|CRND4|int||Column|Indicates presence of condition information in Round 4: -1=Inapplicable, 0=No, 1=Yes.
2178
+ MySQL-001|vg_meps01|h214|CRND5|int||Column|Indicates whether condition information is available in Round 5 (1=Yes, 0=No).
2179
+ MySQL-001|vg_meps01|h214|DUID|int||Column|Internal Server Error
2180
+ MySQL-001|vg_meps01|h214|DUPERSID|bigint||Column|Unique identifier combining DUID and PID for individuals in the MEPS dataset.
2181
+ MySQL-001|vg_meps01|h214|ERNUM|int||Column|Number of emergency room events associated with a condition.
2182
+ MySQL-001|vg_meps01|h214|HHNUM|int||Column|Number of home health events associated with a condition, ranging from 0 to 60.
2183
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar||Column|Represents the ICD-10-CM code for a medical condition, formatted as 3-character string.
2184
+ MySQL-001|vg_meps01|h214|INJURY|int||Column|Binary indicator (1=yes, 2=no) if condition resulted from accident/injury.
2185
+ MySQL-001|vg_meps01|h214|IPNUM|int||Column|Number of inpatient events associated with the condition, ranging from 0 to 6.
2186
+ MySQL-001|vg_meps01|h214|OBNUM|int||Column|Number of office-based medical events associated with a condition, range 0-499.
2187
+ MySQL-001|vg_meps01|h214|OPNUM|int||Column|Number of outpatient events associated with a condition (0 to 141).
2188
+ MySQL-001|vg_meps01|h214|PANEL|int||Column|Identifies the survey panel number (e.g., 23 or 24) for data collection.
2189
+ MySQL-001|vg_meps01|h214|PERWT19F|float||Column|Weight for 2019 expenditure file, used for survey weighting.
2190
+ MySQL-001|vg_meps01|h214|PID|int||Column|Unique identifier for an individual within a panel.
2191
+ MySQL-001|vg_meps01|h214|RXNUM|int||Column|Number of prescribed medicines associated with a condition.
2192
+ MySQL-001|vg_meps01|h214|VARPSU|int||Column|Variance estimation primary sampling unit (PSU), specific to 2019 data.
2193
+ MySQL-001|vg_meps01|h214|VARSTR|smallint||Column|A stratum identifier used for variance estimation in the 2019 dataset.
2194
+ MySQL-001|vg_meps01|h214||||Table|Use DUPERSID for unique person identifier
2195
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|CIR007|Value|
2196
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|END010|Value|
2197
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MUS010|Value|
2198
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|END002|Value|
2199
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MBD005|Value|
2200
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|END001|Value|
2201
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MBD002|Value|
2202
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MUS006|Value|
2203
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|RSP009|Value|
2204
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|DIG004|Value|
2205
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|NVS016|Value|
2206
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|SYM016|Value|
2207
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MUS011|Value|
2208
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|INJ067|Value|
2209
+ MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|RSP006|Value|
2210
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|-1|Value|
2211
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|END005|Value|
2212
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|MUS038|Value|
2213
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|RSP006|Value|
2214
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|RSP007|Value|
2215
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|SKN007|Value|
2216
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|SYM010|Value|
2217
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|INF008|Value|
2218
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|NVS016|Value|
2219
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|SKN005|Value|
2220
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|FAC009|Value|
2221
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|SKN002|Value|
2222
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|INF003|Value|
2223
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|INF009|Value|
2224
+ MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|INF000|Value|
2225
+ MySQL-001|vg_meps01|h214|CCSR3X|nvarchar|-1|Value|
2226
+ MySQL-001|vg_meps01|h214|CCSR3X|nvarchar|INF004|Value|
2227
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|I10|Value|
2228
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|-15|Value|
2229
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|E78|Value|
2230
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|E11|Value|
2231
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|F41|Value|
2232
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|M25|Value|
2233
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|F32|Value|
2234
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|J45|Value|
2235
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|M19|Value|
2236
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|M54|Value|
2237
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|K21|Value|
2238
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|G47|Value|
2239
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|T78|Value|
2240
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|M79|Value|
2241
+ MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|E07|Value|
2242
+ MySQL-001|vg_meps01|h216_narrow|ACTLIM31|smallint||Column|Indicates any limitation in work, housework, or school activities. Values: 1 = Yes, 2 = No.
2243
+ MySQL-001|vg_meps01|h216_narrow|ADACLS42|smallint||Column|Frequency of feeling accomplished less due to physical problems in the past 4 weeks (VR -12).
2244
+ MySQL-001|vg_meps01|h216_narrow|ADCLIM42|smallint||Column|Health limits climbing stairs (VR-12 scale, past 4 weeks).
2245
+ MySQL-001|vg_meps01|h216_narrow|ADDAYA42|smallint||Column|Delay in medical care due to cost (1=Yes, 2=No, -8=Don't Know, -7=Refused, -1=Inapplicable).
2246
+ MySQL-001|vg_meps01|h216_narrow|ADDPRS42|smallint||Column|Indicates if the respondent was a proxy in Round 4/2 (1 = RU member, 2 = proxy).
2247
+ MySQL-001|vg_meps01|h216_narrow|ADEFRT42|smallint||Column|Frequency of feeling everything required effort in the past 30 days (1=All, 6=None).
2248
+ MySQL-001|vg_meps01|h216_narrow|ADEMLS42|smallint||Column|Accomplished less in past 4 weeks due to mental health problems (VR -12 scale).
2249
+ MySQL-001|vg_meps01|h216_narrow|ADENGY42|smallint||Column|Value indicating dental care coverage by Indian Health Service during reference period R4/2 (2 = No, 1 = Yes).
2250
+ MySQL-001|vg_meps01|h216_narrow|ADGENH42|smallint||Column|An error occurred: Expecting value: line 5 column 1 (char 4)
2251
+ MySQL-001|vg_meps01|h216_narrow|ADHDADDX|smallint||Column|Diagnosis of ADHD or ADD (1 = Yes, 0 = No).
2252
+ MySQL-001|vg_meps01|h216_narrow|ADHDAGED|smallint||Column|Age at diagnosis of ADHD, ranging from 0 to 17 years.
2253
+ MySQL-001|vg_meps01|h216_narrow|ADHOPE42|smallint||Column|Frequency of feeling hopeless in the past 30 days (self-reported).
2254
+ MySQL-001|vg_meps01|h216_narrow|ADINSA42|smallint||Column|Variance estimation stratum identifier for 2019, numeric values ranging from 2001 to 2117.
2255
+ MySQL-001|vg_meps01|h216_narrow|ADINSB42|smallint||Column|2.0 Num; -8=Don't Know, -7=Refused, -1=Inapplicable, 1-5=Excellent to Poor health status evaluation.
2256
+ MySQL-001|vg_meps01|h216_narrow|ADINTR42|smallint||Column|Perceived health status (Round 4/2), values: 1=Excellent, 2=Very Good, 3=Good, 4=Fair, 5=Poor, -8=DK, -7=Refused, -1=Inapplicable.
2257
+ MySQL-001|vg_meps01|h216_narrow|ADLHLP31|smallint||Column|ADL (Activities of Daily Living) screener indicating assistance needed for daily tasks (Round 3/1).
2258
+ MySQL-001|vg_meps01|h216_narrow|ADMWCF42|smallint||Column|Frequency of feeling worthless in the past 30 days (self-reported).
2259
+ MySQL-001|vg_meps01|h216_narrow|ADNERV42|smallint||Column|Frequency of feeling nervous in the past 30 days, measured on a scale.
2260
+ MySQL-001|vg_meps01|h216_narrow|ADOVER42|smallint||Column|Belief in overcoming illnesses without medical assistance (self-assessment).
2261
+ MySQL-001|vg_meps01|h216_narrow|ADPAIN42|smallint||Column|Frequency of pain limiting normal work in the past 4 weeks (VR -12 scale).
2262
+ MySQL-001|vg_meps01|h216_narrow|ADPCFL42|smallint||Column|Frequency of feeling calm/peaceful in the past 4 weeks (values: -12 = missing, 1-5 = frequency scale).
2263
+ MySQL-001|vg_meps01|h216_narrow|ADPROX42|smallint||Column|Relationship of proxy respondent to adult (e.g., parent, spouse, other).
2264
+ MySQL-001|vg_meps01|h216_narrow|ADPRST42|smallint||Column|Frequency of feeling downhearted/depressed in the past 4 weeks (VR -12 scale).
2265
+ MySQL-001|vg_meps01|h216_narrow|ADREST42|smallint||Column|Frequency of feeling restless in the past 30 days.
2266
+ MySQL-001|vg_meps01|h216_narrow|ADRISK42|smallint||Column|Likelihood of taking risks, measured on a scale (values not explicitly defined in context).
2267
+ MySQL-001|vg_meps01|h216_narrow|ADSAD42|smallint||Column|Frequency of feeling sad in the past 30 days (self-reported).
2268
+ MySQL-001|vg_meps01|h216_narrow|ADSOCA42|smallint||Column|Health stopped social activities in past 4 weeks (VR -12 scale).
2269
+ MySQL-001|vg_meps01|h216_narrow|ADUPRO42|smallint||Column|Problem getting along with adults (ages 5-17), Round 4/2.
2270
+ MySQL-001|vg_meps01|h216_narrow|ADWKLM42|smallint||Column|Work limitations in past 4 weeks due to physical problems (VR-12 scale).
2271
+ MySQL-001|vg_meps01|h216_narrow|ADWRTH42|smallint||Column|Frequency of feeling worthless in the past 30 days (self-reported).
2272
+ MySQL-001|vg_meps01|h216_narrow|AGELAST|int||Column|Person's age at last eligibility period, categorized into ranges: 0-4, 5-17, 18-24, 25-44, 45-64, 65-85.
2273
+ MySQL-001|vg_meps01|h216_narrow|AIDHLP31|smallint||Column|ADL (Activities of Daily Living) assistance indicator for Round 3/1 (1 = Yes, 0 = No).
2274
+ MySQL-001|vg_meps01|h216_narrow|ANYLMI19|smallint||Column|Coverage by any private insurance in 2019. Values: 1 = Covered, 2 = Not covered.
2275
+ MySQL-001|vg_meps01|h216_narrow|ARTHAGED|smallint||Column|Age group in Round 3/1 (edited/imputed), weighted by PERWT19F, with values representing specific age ranges (e.g., 0-4, 5-17, etc.).
2276
+ MySQL-001|vg_meps01|h216_narrow|ARTHDX|smallint||Column|Arthritis diagnosis (>17); values: 1 = Yes, 2 = No, -8 = DK, -7 = Refused, -1 = Inapplicable.
2277
+ MySQL-001|vg_meps01|h216_narrow|ARTHTYPE|smallint||Column|Classification of person's arthritis type; values include specific diagnostic types or -1 for "Inapplicable".
2278
+ MySQL-001|vg_meps01|h216_narrow|ASACUT31|smallint||Column|Indicates military full-time active duty status in Round 3/1; values: -1 (inapplicable), 1 (active duty), 0 (not active duty).
2279
+ MySQL-001|vg_meps01|h216_narrow|ASATAK31|smallint||Column|Numeric, 3.0 format; -15: Cannot compute, -8: Don't know, -7: Refused, -1: Inapplicable, 1: Yes, 2: No.
2280
+ MySQL-001|vg_meps01|h216_narrow|ASDALY31|smallint||Column|Indicates if respondent currently takes preventive daily asthma medications. Values: -8=DK, -1=Inapplicable, 1=Yes, 2=No.
2281
+ MySQL-001|vg_meps01|h216_narrow|ASEVFL31|smallint||Column|Numeric variable indicating values: "-15: Cannot be computed," "-1: Inapplicable," "1-4: Specific categorical responses.
2282
+ MySQL-001|vg_meps01|h216_narrow|ASMRCN31|smallint||Column|12 months, asked non-VA provider to recommend other doctors: -15 Cannot compute, -1 Inapplicable, 1 Yes, 2 No.
2283
+ MySQL-001|vg_meps01|h216_narrow|ASPKFL31|smallint||Column|Variable describing whether the respondent asked for prescription refills in round 3/1 (): -1: Inapplicable; 1: Yes; 2: No.
2284
+ MySQL-001|vg_meps01|h216_narrow|ASPREV31|smallint||Column|Awareness of VA services by a VA provider (Values: -1 Inapplicable; 1 Yes; 2 No).
2285
+ MySQL-001|vg_meps01|h216_narrow|ASSTIL31|smallint||Column|Indicates if assistive technology for daily activities was received; values: -15, -1, 1-Yes.
2286
+ MySQL-001|vg_meps01|h216_narrow|ASTHAGED|smallint||Column|Age at asthma diagnosis; values: -15 Cannot Compute, -8 DK, -7 Refused, -1 Inapplicable, 0-85 years.
2287
+ MySQL-001|vg_meps01|h216_narrow|ASTHDX|smallint||Column|'': Indicates asthma diagnosis status; values: -8 DK, -7 REFUSED, -1 INAPPLICABLE, 1 YES, 2 NO.
2288
+ MySQL-001|vg_meps01|h216_narrow|ASTHEP31|smallint||Column|Numeric; -15 Cannot compute, 1-3 perception of provider's help improving/maintaining health; -1 inapplicable.
2289
+ MySQL-001|vg_meps01|h216_narrow|ASWNFL31|smallint||Column|3.0 Num; Values: -15=Cannot compute, -1=Inapplicable, 1-4 different sources/no caregiving service received; Total=28,512.
2290
+ MySQL-001|vg_meps01|h216_narrow|BENDIF31|smallint||Column|Organizational type of business: -8=DON'T KNOW, -7=REFUSED, -2=PREVIOUSLY DETERMINED, -1=NOT APPLICABLE, 1=INCORPORATED, 2=PROPRIETORSHIP, 3=PARTNERSHIP.
2291
+ MySQL-001|vg_meps01|h216_narrow|CABLADDR|smallint||Column|Residential address for cable services; used to identify service location. Numeric format. Missing values: -1 Inapplicable, -8 DK.
2292
+ MySQL-001|vg_meps01|h216_narrow|CABREAST|smallint||Column|Value definitions: -1 Inapplicable, 1 Breastfed exclusively, 2 Mixed feeding (breast+formula), 3 Formula-fed exclusively.
2293
+ MySQL-001|vg_meps01|h216_narrow|CACERVIX|smallint||Column|Diagnosed cervical cancer status for individuals older than 17; values: -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
2294
+ MySQL-001|vg_meps01|h216_narrow|CACOLON|smallint||Column|Cancer diagnosis of colon cancer in individuals aged 17 and older.
2295
+ MySQL-001|vg_meps01|h216_narrow|CALUNG|smallint||Column|Diagnosed lung cancer, ages 18+; valid values: 1 (yes), 2 (no). Format: Numeric.
2296
+ MySQL-001|vg_meps01|h216_narrow|CALYMPH|smallint||Column|Cancer diagnosis for non-Hodgkin's lymphoma in individuals >17 years old; values: -8=DK, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
2297
+ MySQL-001|vg_meps01|h216_narrow|CAMELANO|smallint||Column|Cancer diagnosed as melanoma of the skin for individuals older than 17.
2298
+ MySQL-001|vg_meps01|h216_narrow|CANCERDX|smallint||Column|Indicates cancer diagnosis (>17); values: -15 Cannot Compute, -8 DK, -7 Refused, -1 Inapplicable, 1 Yes, 2 No.
2299
+ MySQL-001|vg_meps01|h216_narrow|CAOTHER|smallint||Column|Indicates if criteria categorize respondent as "other," based on activity, location, and household/institutionalization status.
2300
+ MySQL-001|vg_meps01|h216_narrow|CAPROSTA|smallint||Column|Prostate cancer diagnosis status (>17); -1 INAPPLICABLE, -7 REFUSED, -8 DK, 1 YES, 2 NO.
2301
+ MySQL-001|vg_meps01|h216_narrow|CASKINDK|smallint||Column|'': Diagnosis of skin cancer of unknown type for individuals over 17 years old.
2302
+ MySQL-001|vg_meps01|h216_narrow|CASKINNM|smallint||Column|Indicates diagnosis of non-melanoma skin cancer for individuals over 17 years old.
2303
+ MySQL-001|vg_meps01|h216_narrow|CAUTERUS|smallint||Column|Total annual expenditures (facility and doctor) for outpatient visits in U.S. dollars.
2304
+ MySQL-001|vg_meps01|h216_narrow|CHAPPT42|smallint||Column|Child's appointment availability within last 12 months (0-17 years), during round 4/2 interview timeframe.
2305
+ MySQL-001|vg_meps01|h216_narrow|CHCOUN42|smallint||Column|Numeric; response indicates frequency child's doctor explained things clearly in past 12 months (0-17): -1 (Inapplicable), -8 (Don't Know), 1 (Never), 2 (Sometimes), 3 (Usually), 4 (Always).
2306
+ MySQL-001|vg_meps01|h216_narrow|CHEMPB42|smallint||Column|Child required prescription medication for health/behavioral condition in past 12 months (0-17).
2307
+ MySQL-001|vg_meps01|h216_narrow|CHEXPL42|smallint||Column|CAHPS: Past 12 months, child's doctor explained things clearly for children aged 0-17 in round 4/2.
2308
+ MySQL-001|vg_meps01|h216_narrow|CHEYRE42_M18|smallint||Column|CAHPS: Past 12 months appointment for child specialist (ages 0-17); linked to Round 4/2.
2309
+ MySQL-001|vg_meps01|h216_narrow|CHHECR42|smallint||Column|Rating of child healthcare quality (0-10): -1 Inapplicable, -8 Don't know, 0 Worst, 10 Best, others intermediate.
2310
+ MySQL-001|vg_meps01|h216_narrow|CHILCR42|smallint||Column|Indicates if the person has children; -8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
2311
+ MySQL-001|vg_meps01|h216_narrow|CHILWW42|smallint||Column|Child received needed care for illness/injury when wanted in past 12 months; values: -8=DK, -1=Inapplicable, 1=Never, 2=Sometimes, 3=Usually, 4=Always.
2312
+ MySQL-001|vg_meps01|h216_narrow|CHLICO42|smallint||Column|CSHCN: Ongoing condition limits child's usual activities for 12+ months (0-17 years).
2313
+ MySQL-001|vg_meps01|h216_narrow|CHLIHB42|smallint||Column|Binary variable indicating if a child's health/behavior condition limited usual activities (-1=INAPPLICABLE, 1=YES, 2=NO).
2314
+ MySQL-001|vg_meps01|h216_narrow|CHLIMI42|smallint||Column|CSHCN: Indicates if the child (0-17) is limited in any way due to health/behavioral conditions in round 4/2.
2315
+ MySQL-001|vg_meps01|h216_narrow|CHLIST42|smallint||Column|CAHPS: Child's doctor listened in past 12 months (0-17 years). Values: 1 = Yes, 2 = No, -1 = Inapplicable.
2316
+ MySQL-001|vg_meps01|h216_narrow|CHPMCN42|smallint||Column|Indicator if a child had a medical condition requiring prescription medication for 12+ months (ages 0-17).
2317
+ MySQL-001|vg_meps01|h216_narrow|CHPMED42|smallint||Column|CSHCN: Indicates if child needs prescribed medication; values: 1=Yes, 2=No, -1=Inapplicable, -7=Refused, -8=DK.
2318
+ MySQL-001|vg_meps01|h216_narrow|CHPMHB42|smallint||Column|Prescribed medication for child's health/behavior condition in last 12 months. Values: -1 Inapplicable, 1 Yes, 2 No.
2319
+ MySQL-001|vg_meps01|h216_narrow|CHPRTM42|smallint||Column|Time spent by child's doctor during last 12 months (0-17 years), as perceived in Round 4/2.
2320
+ MySQL-001|vg_meps01|h216_narrow|CHRESP42|smallint||Column|Indicates how often the child's doctor showed responsiveness (0=Never, 4=Always, -8=Don't know, -1=Inapplicable).
2321
+ MySQL-001|vg_meps01|h216_narrow|CHRTCR42|smallint||Column|CAHPS:12-month measure assessing whether a routine care appointment for children (0-17) could be scheduled ("Yes/No").
2322
+ MySQL-001|vg_meps01|h216_narrow|CHRTWW42|smallint||Column|Child's ability to get a routine appointment when wanted within the last 12 months: -8=Don't know, -7=Refused, -1=Inapplicable, 1=Never, 2=Sometimes, 3=Usually, 4=Always.
2323
+ MySQL-001|vg_meps01|h216_narrow|CHSERV42|smallint||Column|Indicates if child needed medical/other services (0-17) in the last 12 months during round 4/2.
2324
+ MySQL-001|vg_meps01|h216_narrow|CHSPEC42_M18|smallint||Column|Appointment scheduling with a specialist for children (ages 0-17) in the last 12 months; CAHPS survey.
2325
+ MySQL-001|vg_meps01|h216_narrow|CHSRCN42|smallint||Column|CSHCN: Child's service needs for condition lasting 12+ months (0-17 years), specific to Round 4/2.
2326
+ MySQL-001|vg_meps01|h216_narrow|CHSRHB42|smallint||Column|CSHCN:SERV FOR HL TH/BEHV COND(0-17)-R4/2, 2.0 Num. Values: -8=Don't Know, -1=Inapplicable, 1=Yes, 2=No.
2327
+ MySQL-001|vg_meps01|h216_narrow|CHTHCO42|smallint||Column|CSHCN:THER COND LA ST 12+ MOS (0-17)-R4/2; Numeric: -8=DK, -1=Inapplicable, 1=Yes, 2=No
2328
+ MySQL-001|vg_meps01|h216_narrow|CHTHER42|smallint||Column|CSHCN: Child needs specific therapy (0-17), values: -15(Cannot Compute), -8(Don't Know), -7(Refused), -1(Inapplicable), 1(Yes), 2(No).
2329
+ MySQL-001|vg_meps01|h216_narrow|CHTHHB42|smallint||Column|CSHCN:SPEC THER FOR HL TH+COND(0-17)-R4/2; Values: -15 Cannot compute, -8 DK, -7 Refused, -1 Inapplicable, 1 Yes.
2330
+ MySQL-001|vg_meps01|h216_narrow|COGLIM31|smallint||Column|Coverage by cognitive limitations, any time in R3/1; values: -1=Inapplicable, 1=Yes, 2=No.
2331
+ MySQL-001|vg_meps01|h216_narrow|CSHCN42|smallint||Column|Child with special health care needs (CSHCN), aged 0-17 in Round 4/2 interview.
2332
+ MySQL-001|vg_meps01|h216_narrow|DADPRO42|smallint||Column|Problems getting along with father (ages 5-17); values: -15 Cannot compute, -8 DK, -7 Refused, 0 No problem, 1-4 Severity.
2333
+ MySQL-001|vg_meps01|h216_narrow|DFCOG42|smallint||Column|Cognitive difficulties in reasoning, learning, or remembering; values: -8 (Don't Know), -7 (Refused), -1 (Inapplicable), 1 (Yes), 2 (No).
2334
+ MySQL-001|vg_meps01|h216_narrow|DFDRSB42|smallint||Column|Indicator for difficulty dressing/bathing: -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
2335
+ MySQL-001|vg_meps01|h216_narrow|DFERND42|smallint||Column|Difficulty doing errands alone (round 4/2). Values: -8: Don't know, -7: Refused, -1: Inapplicable, 1: Yes, 2: No.
2336
+ MySQL-001|vg_meps01|h216_narrow|DFHEAR42|smallint||Column|Indicates serious difficulty hearing: -1 Inapplicable, 1 Yes, 2 No, -7 Refused, -8 Don't know.
2337
+ MySQL-001|vg_meps01|h216_narrow|DFSEE42|smallint||Column|Serious difficulty seeing even with glasses during reporting round 4/2; numeric format, coded -1, 1, or 2.
2338
+ MySQL-001|vg_meps01|h216_narrow|DFWLKC42|smallint||Column|Numeric code indicating "Person who joined RU" during a specific period; weighted by PERWT19F for population estimates.
2339
+ MySQL-001|vg_meps01|h216_narrow|DIABAGED|smallint||Column|Age at diabetes diagnosis; values: -8 (Don't know), -7 (Refused), -1 (Inapplicable), 0-85 (Age in years).
2340
+ MySQL-001|vg_meps01|h216_narrow|DIABDX_M18|smallint||Column|Indicator of diabetes diagnosis status, coded as: 1 (Diagnosed), 2 (Not Diagnosed), -1 (Inapplicable). Num format, length 2.
2341
+ MySQL-001|vg_meps01|h216_narrow|DOBMM|int||Column|Month of birth, numeric format ranging from 01 (January) to 12 (December)
2342
+ MySQL-001|vg_meps01|h216_narrow|DOBYY|smallint||Column|Year of birth; values: -15 (cannot compute), -8 (don't know), -7 (refused), -1 (inapplicable), 1-12 (month mapping).
2343
+ MySQL-001|vg_meps01|h216_narrow|DUID|bigint||Column|Unique household or person-level identification number for survey data linkage within the MEPS dataset.
2344
+ MySQL-001|vg_meps01|h216_narrow|DUPERSID|bigint||Column|Unique identifier assigned to each person-level record in the dataset.
2345
+ MySQL-001|vg_meps01|h216_narrow|ELGRND19|int||Column|Indicator for electric coverage in 2019; values: -1=Inapplicable, 1=Yes, 2=No. Numeric, starts at position 2065.
2346
+ MySQL-001|vg_meps01|h216_narrow|FAMINC19|int||Column|Total family income in 2019; numeric monetary value representing cumulative income of all family members.
2347
+ MySQL-001|vg_meps01|h216_narrow|FNGRDF31|smallint||Column|Perceived general health status, coded as: -8 DK, -7 Refused, -1 Inapplicable, 1 Excellent, 2 Very Good, 3 Good, 4 Fair, 5 Poor.
2348
+ MySQL-001|vg_meps01|h216_narrow|GETTRB42|smallint||Column|Indicator for receiving treatment from provider (R4/2). Values: -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
2349
+ MySQL-001|vg_meps01|h216_narrow|HAVFUN42|smallint||Column|Indicator for enjoying activities in past week; values: -8 (DK), -7 (Refused), -1 (Inapplicable), 1 (Yes), 2 (No).
2350
+ MySQL-001|vg_meps01|h216_narrow|HISPANX|int||Column|Numeric, 1=Hispanic, 2=Non-Hispanic, -1=Inapplicable; demographic identifier weighted by PERWT19F.
2351
+ MySQL-001|vg_meps01|h216_narrow|HOMEBH42|smallint||Column|Reports problems with behavior at home for individuals aged 5-17 during round 4/2 interview period.
2352
+ MySQL-001|vg_meps01|h216_narrow|HSELIM31|smallint||Column|Health condition impacts ability to perform daily living tasks/household activities - Round 3/1
2353
+ MySQL-001|vg_meps01|h216_narrow|IADLHP31|smallint||Column|Identifies if the respondent requires help with Instrumental Activities of Daily Living (IADL) in round 3/1.
2354
+ MySQL-001|vg_meps01|h216_narrow|INSCOV19|int||Column|HEALTH INSURANCE COVERAGE INDICATOR 2019 (value of 1: Any Private, 2: Public Only, 3: Uninsured)
2355
+ MySQL-001|vg_meps01|h216_narrow|INSURC19|int||Column|FULL YEAR INSURANCE COVERAGE STATUS 2019 (value of 1: <65 Any Private, 2: <65 Public Only, 3: <65 Uninsured, 4-6: 65+ Medicare, 7-8: 65+ Uninsured)
2356
+ MySQL-001|vg_meps01|h216_narrow|JTPAIN31_M18|smallint||Column|JOINT PAIN LAST 12 MONTHS (>17) (value of 1: Yes, 2: No)
2357
+ MySQL-001|vg_meps01|h216_narrow|K6SUM42|smallint||Column|Summary measure of K6 psychological distress scale; range 0-24, with higher scores indicating greater psychological distress.
2358
+ MySQL-001|vg_meps01|h216_narrow|KIDPRO42|smallint||Column|Numeric (2.0); Response regarding child's provider explanation (-8=Don't know; -7=Refused; -1=Inapplicable; 1=Yes; 2=No).
2359
+ MySQL-001|vg_meps01|h216_narrow|LFTDIF31|smallint||Column|Difficulty lifting 10 pounds - RD 3/1; -8: DK, -7: Refused, -1: Inapplicable, 1: No difficulty, 2: Some, 3: A lot, 4: Unable.
2360
+ MySQL-001|vg_meps01|h216_narrow|LSTETH53|smallint||Column|Last month/year the individual served in active-duty military, value -1=Inapplicable, 1-12=Month.
2361
+ MySQL-001|vg_meps01|h216_narrow|MILDIF31|smallint||Column|Numeric (2.0). Start 408, End 409. Values: -8=DK, -1=Inapplicable, 1=Yes, 2=No. Weighted by PERWT19F.
2362
+ MySQL-001|vg_meps01|h216_narrow|MOMPRO42|smallint||Column|Indicator of whether provider explained options: -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
2363
+ MySQL-001|vg_meps01|h216_narrow|NERVAF42|smallint||Column|Numeric, response values: -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
2364
+ MySQL-001|vg_meps01|h216_narrow|OFTSMK53|smallint||Column|Frequency of smoking in round 5/3; values: -1 INAPPLICABLE, 1 DAILY, 2 SOME DAYS, 3 NOT AT ALL.
2365
+ MySQL-001|vg_meps01|h216_narrow|PANEL|int||Column|Indicator for specific survey panel, numeric, values correspond to panel IDs (e.g., 1 for panel one, etc.).
2366
+ MySQL-001|vg_meps01|h216_narrow|PHQ242|smallint||Column|Overall rating of feelings over the past two weeks, assessed through a structured self-assessment question.
2367
+ MySQL-001|vg_meps01|h216_narrow|PHYEXE53|smallint||Column|Physical exercise participation; values: -1 INAPPLICABLE, 1 YES, 2 NO.
2368
+ MySQL-001|vg_meps01|h216_narrow|PID|int||Column|Person Number
2369
+ MySQL-001|vg_meps01|h216_narrow|POVCAT19|int||Column|FAMILY INC AS % OF POVERTY LINE (value 1: Poor/Negative, 2: Near Poor, 3: Low Income, 4: Middle, 5: High Income)
2370
+ MySQL-001|vg_meps01|h216_narrow|POVLEV19|float||Column|FAMILY INC AS % OF POVERTY LINE - CONTINUOUS Variable
2371
+ MySQL-001|vg_meps01|h216_narrow|RACEV2X|int||Column|Race (value of 1: White, 2: Black, 3: Amer Indian, 4: Asian Indian, 5: Chinese, 6: Filipino, 10: Other Asian, 12: Multiple races)
2372
+ MySQL-001|vg_meps01|h216_narrow|RCHDIF31|smallint||Column|Indicator for difficulty receiving needed medical care: -8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
2373
+ MySQL-001|vg_meps01|h216_narrow|RXEXP19|int||Column|Total prescription expenses for 2019 in dollars, ranging from $0 to $234,970.
2374
+ MySQL-001|vg_meps01|h216_narrow|SAQELIG|tinyint||Column|Indicator of eligibility for the Self-Administered Questionnaire (SAQ); binary variable denoting participation status.
2375
+ MySQL-001|vg_meps01|h216_narrow|SCHLBH42|smallint||Column|Indicates if school attendance was hindered in round 4/2: -1 Inapplicable, 1 Yes, 2 No, etc.
2376
+ MySQL-001|vg_meps01|h216_narrow|SCHLIM31|smallint||Column|SCHOOL LIMITATION - RD 3/1: 3.0 Num; -1 INAPPLICABLE, 1 YES, 2 NO; Start 549, End 551.
2377
+ MySQL-001|vg_meps01|h216_narrow|SCHPRO42|smallint||Column|CAHPS school-related variable (0-17), values: -8=DK, -1=Inapplicable, 1=Never, 2=Sometimes, 3=Usually, 4=Always.
2378
+ MySQL-001|vg_meps01|h216_narrow|SEX|int||Column|Biological sex: 1 = Male, 2 = Female
2379
+ MySQL-001|vg_meps01|h216_narrow|SIBPRO42|smallint||Column|Problem getting along with siblings (ages 5-17); coded values: -15, -8, -7, -1, 0, 1-3, 4, 99.
2380
+ MySQL-001|vg_meps01|h216_narrow|SOCLIM31|smallint||Column|Household size on the reference date (Round 3/1); values represent exact family member counts or inapplicable data (-1).
2381
+ MySQL-001|vg_meps01|h216_narrow|SPRPRO42|smallint||Column|Provider spent enough time with person during Round 4/2 interview: -8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
2382
+ MySQL-001|vg_meps01|h216_narrow|STNDIF31|smallint||Column|Indicator if deceased in round 3/1: -1 Not applicable, 31 Yes (deceased); numeric, two-digit code.
2383
+ MySQL-001|vg_meps01|h216_narrow|STPDIF31|smallint||Column|Indicator of difficulty climbing steps (range: none to severe) during reference period (Round 3/1).
2384
+ MySQL-001|vg_meps01|h216_narrow|TOTEXP19|int||Column|Total health care expenditures in 2019; includes all sources of payment. Format: Numeric. Values vary.
2385
+ MySQL-001|vg_meps01|h216_narrow|TOTSLF19|int||Column|Indicates self-coverage eligibility status in 2019 (-1: Not Applicable, 1: Self-covered, 2: Not Self-covered).
2386
+ MySQL-001|vg_meps01|h216_narrow|TOTTCH19|int||Column|Final total number of eligible persons for survey analysis in 2019 weighted by PERWT19F.
2387
+ MySQL-001|vg_meps01|h216_narrow|TTLP19X|int||Column|Total pre-tax income for 2019 in dollars (numeric, unweighted values range from -72406 to 72406).
2388
+ MySQL-001|vg_meps01|h216_narrow|UNABLE31|smallint||Column|Measures difficulty with activities in the past 31 days: -15 Cannot Compute, -1 Inapplicable, 1-3 Problem severity, 4 Very Big Problem.
2389
+ MySQL-001|vg_meps01|h216_narrow|UNHAP42|smallint||Column|Indicator for whether a person is unhappy: Values: -8 (Don't know), -7 (Refused), -1 (Inapplicable), 1 (Yes), 2 (No).
2390
+ MySQL-001|vg_meps01|h216_narrow|WAGEP19X|int||Column|Numeric; person's wage income in 2019; $0, $9-$303,114, weighted by PERWT19F; unweighted total: 28,512 observations.
2391
+ MySQL-001|vg_meps01|h216_narrow|WLKDIF31|smallint||Column|Difficulty walking: -8 DK, -1 Inapplicable, 1 No difficulty, 2 Some difficulty, 3 A lot of difficulty, 4 Unable to do.
2392
+ MySQL-001|vg_meps01|h216_narrow|WLKLIM31|smallint||Column|2.0 numeric: -8=Don't know, -1=Inapplicable, 1=Yes, 2=No. Indicates child's health limitation status.
2393
+ MySQL-001|vg_meps01|h216_narrow|WRKLIM31|smallint||Column|Indicates work limitations (-8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No).
2394
+ MySQL-001|vg_meps01|h216_narrow|PERWT19F|float||Column|Person-level weight that allows us to extrapolate to the US population.
2395
+ MySQL-001|vg_meps01|h216_narrow||||Table|This is the main table, has enrollment, expenditure, and key demographic information. Can be linked to pharmacy and medical tables if needed using DUPERSID.
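Each record added to create_embeddings.txt above appears to carry eight pipe-separated fields. A minimal sketch of reading one record follows. The field names (server, database, table, column, data_type, value, kind, description) are assumptions inferred from the rows in this diff, not names defined by the repository, and `parse_row` is a hypothetical helper. It expects lines from the file itself, without the leading "+ " that the diff view adds.

```python
from typing import NamedTuple


class MetadataRow(NamedTuple):
    # Field names are assumed from the visible rows; the file has no header.
    server: str
    database: str
    table: str
    column: str
    data_type: str
    value: str
    kind: str          # "Value", "Column", "Table", or "Db_File" in this diff
    description: str


def parse_row(line: str) -> MetadataRow:
    """Split one pipe-delimited record into its eight fields."""
    # maxsplit=7 keeps any later '|' characters inside the description.
    parts = line.rstrip("\n").split("|", 7)
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}: {line!r}")
    return MetadataRow(*(p.strip() for p in parts))


value_row = parse_row("MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|CIR007|Value|")
table_row = parse_row(
    "MySQL-001|vg_meps01|h214||||Table|Use DUPERSID for unique person identifier"
)
print(value_row.column, value_row.value)
print(table_row.kind, table_row.description)
```

Table-level rows leave the column, data type, and value fields empty, so the empty strings are kept rather than dropped.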
vg_/csvs/flagged_examples.json ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "database": "capgpbsh_vg_chinook",
4
+ "user_messages": "Give me a bar plot of sales, by album, of U2\n",
5
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\n# Query to find the ArtistId of U2\nquery_artist_id = \"SELECT ArtistId FROM Artist WHERE Name = 'U2'\"\nartist_id = pd.read_sql(query_artist_id, conn).iloc[0, 0]\n\n# Query to find the sales by album for U2\nquery_album_sales = \"\"\"\n SELECT a.Title, SUM(il.Quantity) AS TotalSales\n FROM Album a\n JOIN Artist ar ON a.ArtistId = ar.ArtistId\n JOIN Track t ON a.AlbumId = t.AlbumId\n JOIN InvoiceLine il ON t.TrackId = il.TrackId\n WHERE ar.ArtistId = %s\n GROUP BY a.Title\n ORDER BY SUM(il.Quantity) DESC\n\"\"\" % artist_id\n\n# Use pd.read_sql to execute the query and get the sales by album for U2\nalbum_sales = pd.read_sql(query_album_sales, conn)\n\n# Create a bar plot of sales by album using Matplotlib\nplt.figure(figsize=(10, 6))\nplt.bar(album_sales['Title'], album_sales['TotalSales'])\nplt.xlabel('Album Title')\nplt.ylabel('Total Sales')\nplt.title('Album Sales by U2')\nplt.xticks(rotation=90)\nplt.tight_layout()\nplt.show()"
6
+ }
7
+ ]
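The example_code in flagged_examples.json above looks up the ArtistId first and then interpolates it into the second query with the % operator. A commonly used alternative is a single query with a bound parameter. The sketch below illustrates that with the standard-library sqlite3 module and a tiny in-memory stand-in for the Chinook tables. The rows are made up for illustration, and the placeholder style (`?` here) depends on the database driver, so this is not a drop-in replacement for the code in the JSON file.

```python
import sqlite3

# In-memory stand-in for the Chinook tables used above; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER);
    CREATE TABLE InvoiceLine (InvoiceLineId INTEGER PRIMARY KEY,
                              TrackId INTEGER, Quantity INTEGER);
    INSERT INTO Artist VALUES (1, 'U2'), (2, 'Other');
    INSERT INTO Album VALUES (10, 'Album A', 1), (11, 'Album B', 1), (12, 'Album C', 2);
    INSERT INTO Track VALUES (100, 10), (101, 10), (102, 11), (103, 12);
    INSERT INTO InvoiceLine VALUES (1, 100, 2), (2, 101, 1), (3, 102, 1), (4, 103, 5);
""")

# One query: filter on the artist name through a bound parameter instead of
# formatting a previously fetched ArtistId into the SQL string.
query = """
    SELECT a.Title, SUM(il.Quantity) AS TotalSales
    FROM Album a
    JOIN Artist ar ON a.ArtistId = ar.ArtistId
    JOIN Track t ON a.AlbumId = t.AlbumId
    JOIN InvoiceLine il ON t.TrackId = il.TrackId
    WHERE ar.Name = ?
    GROUP BY a.Title
    ORDER BY TotalSales DESC
"""
album_sales = conn.execute(query, ("U2",)).fetchall()
print(album_sales)
```

The resulting (Title, TotalSales) pairs can be passed to plt.bar in the same way as the DataFrame columns in the original example.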
vg_/csvs/training_examples.json CHANGED
@@ -5,78 +5,168 @@
5
  "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data\ndiabetes_query = \"\"\"\n SELECT DUPERSID, DIABDX_M18, AGELAST\n FROM h216_narrow\n\"\"\"\ndiabetes_df = pd.read_sql(diabetes_query, conn)\n\n# Filter the data\ndiabetes_df = diabetes_df[(diabetes_df['DIABDX_M18'] == '1') | (diabetes_df['DIABDX_M18'] == '2')]\ndiabetes_df = diabetes_df[diabetes_df['AGELAST'].notna()]\ndiabetes_df['DIABDX_M18'] = diabetes_df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndiabetes_df['UNDER_50'] = diabetes_df['AGELAST'].apply(lambda x: True if int(x) < 50 else False)\n\n# Create a venn diagram\ndiabetes_count = diabetes_df['DIABDX_M18'].sum()\nunder_50_count = diabetes_df['UNDER_50'].sum()\nboth_count = ((diabetes_df['DIABDX_M18']) & (diabetes_df['UNDER_50'])).sum()\n\n# Plot the venn diagram\nplt.figure(figsize=(8, 8))\nv = venn2(subsets=(diabetes_count - both_count, under_50_count - both_count, both_count), set_labels=('Diabetes', 'Under 50'))\n\n# Show the plot\nplt.show()"
6
  },
7
  {
8
- "database": "Db_Test_01",
9
- "user_messages": "give me a histogram for rings",
10
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Db_Test_01.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Db_Test_01;Trusted_Connection=yes;.\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport numpy as np\n\n# Read the 'Rings' column from the 'd_Abalone' table\nrings = pd.read_sql(\"SELECT Rings FROM d_Abalone\", conn)\n\n# Create a histogram with a black edge around the bars\nplt.hist(rings['Rings'], edgecolor='black')\nplt.xlabel('Number of Rings')\nplt.ylabel('Frequency')\nplt.title('Histogram of Abalone Rings')\nplt.show()"
11
  },
12
  {
13
- "database": "Db_Test_01",
14
- "user_messages": "give me a histogram for rings; color it red",
15
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Db_Test_01.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Db_Test_01;Trusted_Connection=yes;.\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\n\n# Read the 'Rings' column from the 'd_Abalone' table\nrings = pd.read_sql(\"SELECT Rings FROM d_Abalone\", conn)\n\n# Create a histogram with a black edge around the bars and color the bars red\nplt.hist(rings['Rings'], edgecolor='black', color='red')\nplt.xlabel('Number of Rings')\nplt.ylabel('Frequency')\nplt.title('Histogram of Abalone Rings')\nplt.show()"
16
  },
17
  {
18
- "database": "Chinook",
19
- "user_messages": "how many artists in the data?",
20
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Chinook.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Chinook;Trusted_Connection=yes;.\n\nimport pandas as pd\n\n# Read the Artist table into a Pandas dataframe\nartist_df = pd.read_sql(\"SELECT * FROM Artist\", conn)\n\n# Count the number of artists\nnum_artists = len(artist_df)\n\nprint(f\"There are {num_artists} artists in the data.\")"
21
  },
22
  {
23
  "database": "MEPS",
24
- "user_messages": "what is the relationship between income and diabetes, from meps;",
25
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\n# Load the h216_narrow table into a pandas dataframe\ndf = pd.read_sql('SELECT * FROM h216_narrow', conn)\n\n# Filter the dataframe to include only rows where DIABDX_M18 is '1' (Diagnosed) or '2' (Not Diagnosed)\ndf = df[df['DIABDX_M18'].isin(['1', '2'])]\n\n# Convert DIABDX_M18 to a categorical variable\ndf['DIABDX_M18'] = df['DIABDX_M18'].astype('category')\ndf['DIABDX_M18'] = df['DIABDX_M18'].cat.rename_categories({'1': 'Diagnosed', '2': 'Not Diagnosed'})\n\n# Plot the distribution of total family income (FAMINC19) by diabetes diagnosis status\nplt.figure(figsize=(10,6))\nsns.boxplot(data=df, x='DIABDX_M18', y='FAMINC19')\nplt.title('Distribution of Total Family Income by Diabetes Diagnosis Status')\nplt.xlabel('Diabetes Diagnosis Status')\nplt.ylabel('Total Family Income')\nplt.show()\n\n# Calculate the mean and standard deviation of total family income by diabetes diagnosis status\ngrouped_df = df.groupby('DIABDX_M18')['FAMINC19'].agg(['mean', 'std'])\nprint(grouped_df)\n\n# Perform a t-test to compare the mean total family income between diagnosed and not diagnosed groups\nfrom scipy.stats import ttest_ind\ndiagnosed_group = df[df['DIABDX_M18'] == 'Diagnosed']['FAMINC19']\nnot_diagnosed_group = df[df['DIABDX_M18'] == 'Not Diagnosed']['FAMINC19']\nt_stat, p_val = ttest_ind(diagnosed_group, not_diagnosed_group)\nprint(f'T-statistic: {t_stat}, p-value: {p_val}')"
26
  },
27
  {
28
  "database": "MEPS",
29
- "user_messages": "give me a venn diagram for people with arthritis and with diabetes",
30
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Read data from the database\nquery = \"\"\"\n SELECT DUPERSID, ARTHDX, DIABDX_M18\n FROM h216_narrow\n WHERE ARTHDX = '1' OR DIABDX_M18 = '1'\n\"\"\"\ndata = pd.read_sql(query, conn)\n\n# Create a boolean mask for people with arthritis and people with diabetes\nhas_arthritis = data['ARTHDX'] == '1'\nhas_diabetes = data['DIABDX_M18'] == '1'\n\n# Calculate the overlap between the two groups\narthritis_only = len(data[has_arthritis & ~has_diabetes])\ndiabetes_only = len(data[~has_arthritis & has_diabetes])\nboth = len(data[has_arthritis & has_diabetes])\ntotal = len(data)\n\n# Create the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(arthritis_only, diabetes_only, both), set_labels=('Arthritis', 'Diabetes'))\n\n# Show the total number of patients in the title\nplt.title(f'Venn Diagram of Arthritis and Diabetes (Total: {total})')\n\n# Display the plot\nplt.show()"
31
  },
32
  {
33
- "database": "Db_Test_01",
34
- "user_messages": "give me a model to predict rings",
35
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Db_Test_01.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Db_Test_01;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.ensemble import RandomForestRegressor\nfrom sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score\nfrom sklearn.preprocessing import OneHotEncoder\n\n# Load the data from the MS SQL database\ndf = pd.read_sql('SELECT * FROM d_Abalone', conn)\n\n# One-hot-encode the Sex column\ndf = pd.get_dummies(df, columns=['Sex'])\n\n# Define features (X) and target (y)\nX = df.drop(['Rings'], axis=1)\ny = df['Rings']\n\n# Split the data into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Create and train a random forest regressor model\nmodel = RandomForestRegressor(n_estimators=100, random_state=42)\nmodel.fit(X_train, y_train)\n\n# Make predictions on the test set\ny_pred = model.predict(X_test)\n\n# Calculate performance metrics\nmse = mean_squared_error(y_test, y_pred)\nmae = mean_absolute_error(y_test, y_pred)\nr2 = r2_score(y_test, y_pred)\n\n# Print performance metrics\nprint(f\"Mean Squared Error: {mse:.2f}\")\nprint(f\"Mean Absolute Error: {mae:.2f}\")\nprint(f\"R2 Score: {r2:.2f}\")"
36
  },
37
  {
38
- "database": "BenchmarkingDb_Smol",
39
- "user_messages": "give me a model to predict iris",
40
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: BenchmarkingDb_Smol.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=BenchmarkingDb_Smol;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n\n# Load the iris dataset from the database\niris_data = pd.read_sql(\"SELECT * FROM iris\", conn)\n\n# Define the features (X) and the target variable (y)\nX = iris_data[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]\ny = iris_data['target']\n\n# Split the data into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Standardize the features\nscaler = StandardScaler()\nX_train = scaler.fit_transform(X_train)\nX_test = scaler.transform(X_test)\n\n# Train a random forest classifier on the training data\nrfc = RandomForestClassifier(n_estimators=100, random_state=42)\nrfc.fit(X_train, y_train)\n\n# Make predictions on the test data\ny_pred = rfc.predict(X_test)\n\n# Evaluate the model's performance\nprint(\"Accuracy:\", accuracy_score(y_test, y_pred))\nprint(\"Classification Report:\")\nprint(classification_report(y_test, y_pred))\nprint(\"Confusion Matrix:\")\nprint(confusion_matrix(y_test, y_pred))"
41
  },
42
  {
43
  "database": "MEPS",
44
  "user_messages": "how many diabetics in meps?",
45
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to count the number of diabetics\nquery = \"\"\"\n SELECT COUNT(DUPERSID) \n FROM h216_narrow \n WHERE DIABDX_M18 = '1' \n\"\"\"\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the number of diabetics in the database\nprint(\"Number of diabetics in MEPS:\", df.iloc[0, 0])"
46
  },
47
  {
48
  "database": "MEPS",
49
- "user_messages": "give me the pmpm total healthcare cost from meps for diabetics",
50
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to get the total healthcare cost for diabetics\nquery = \"\"\"\n SELECT SUM(TOTEXP19) as total_cost, COUNT(DUPERSID) as total_diabetics \n FROM h216_narrow \n WHERE DIABDX_M18 = '1' \n\"\"\"\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Calculate the PMPM total healthcare cost for diabetics\ntotal_cost = df['total_cost'].iloc[0]\ntotal_diabetics = df['total_diabetics'].iloc[0]\npmpm_cost = (total_cost / total_diabetics) / 12 # Assuming 12 months in a year\n\n# Print the PMPM total healthcare cost for diabetics\nprint(\"PMPM total healthcare cost for diabetics in MEPS: $\", round(pmpm_cost, 2))"
51
  },
52
  {
53
- "database": "Chinook",
54
- "user_messages": "from the chinook db, can you plot the number of customers by country on a map, and color code it by number of customers (white for few, and green for more)?",
55
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Chinook.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Chinook;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport plotly.graph_objects as go\n\n# Read the Customer table into a Pandas dataframe\ncustomer_df = pd.read_sql(\"SELECT Country FROM Customer\", conn)\n\n# Group the customers by country and count the number of customers\ncountry_counts = customer_df['Country'].value_counts().reset_index()\ncountry_counts.columns = ['Country', 'Count']\n\n# Use plotly to create a choropleth map\nfig = go.Figure(data=go.Choropleth(\n locations=country_counts['Country'], # Spatial coordinates\n z = country_counts['Count'], # Data to display\n locationmode = 'country names',\n colorscale=[[0, \"rgba(255, 255, 255, 1)\"], [1, \"rgba(0, 128, 0, 1)\"]],\n colorbar_title = \"Number of customers\"\n ))\n\nfig.update_layout(\n title_text='Number of Customers by Country',\n geo_scope='world',\n )\n\nfig.show()"
56
  },
57
  {
58
- "database": "Db_Test_01",
59
- "user_messages": "For the Abalone dataset, give me the count of Rings > 10.",
60
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Db_Test_01.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Db_Test_01;Trusted_Connection=yes;.\n\nimport pandas as pd\n\n# Read the 'Rings' column from the 'd_Abalone' table where Rings > 10\nrings_over_10 = pd.read_sql(\"SELECT count(*) as count FROM d_Abalone WHERE Rings > 10\", conn)\n\n# Print the result\nprint(rings_over_10['count'].iloc[0])"
61
  },
62
  {
63
  "database": "MEPS",
64
- "user_messages": "give me the top 10 drugs (in terms of cost) taken by those with diabetes, from meps",
65
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to get top 10 drugs taken by diabetics\nquery = \"\"\"\n SELECT TOP 10 RXDRGNAM, SUM(RXEXP19) as Total_Cost\n FROM h213a\n INNER JOIN h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n WHERE h216_narrow.DIABDX_M18 = '1'\n GROUP BY RXDRGNAM\n ORDER BY Total_Cost DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs taken by diabetics\nprint(\"Top 10 drugs taken by diabetics in MEPS:\")\ndf"
66
  },
67
  {
68
  "database": "MEPS",
69
- "user_messages": "give me the top 10 drugs (in terms of cost) taken by those with diabetes, from meps; express in a pie chart",
70
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\n# Query the database to get top 10 drugs taken by diabetics\nquery = \"\"\"\n SELECT TOP 10 RXDRGNAM, SUM(RXEXP19) as Total_Cost\n FROM h213a\n INNER JOIN h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n WHERE h216_narrow.DIABDX_M18 = '1'\n GROUP BY RXDRGNAM\n ORDER BY Total_Cost DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs taken by diabetics\nprint(\"Top 10 drugs taken by diabetics in MEPS:\")\ndf\n\n# Plot a pie chart to visualize the top 10 drugs taken by diabetics\nplt.figure(figsize=(10,8))\nplt.pie(df['Total_Cost'], labels = df['RXDRGNAM'], autopct='%1.1f%%', textprops={'fontsize': 10})\nplt.title('Top 10 Drugs Taken by Diabetics in MEPS')\nplt.show()"
71
  },
72
  {
73
  "database": "MEPS",
74
- "user_messages": "what are the top 10 drugs, by cost, for people that have arthritis. Use meps; Re-trying...; for cost, use sum of RXSF19X, RXOF19X, RXOT19X and RXPV19X",
75
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to get top 10 drugs taken by people with arthritis\nquery = \"\"\"\n SELECT TOP 10 h213a.RXDRGNAM, SUM(h213a.RXSF19X + h213a.RXOF19X + h213a.RXOT19X + h213a.RXPV19X) as Total_Cost\n FROM h213a\n INNER JOIN h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n WHERE h216_narrow.ARTHDX = '1'\n GROUP BY h213a.RXDRGNAM\n ORDER BY Total_Cost DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs taken by people with arthritis\nprint(\"Top 10 drugs taken by people with arthritis in MEPS:\")\ndf"
76
  },
77
  {
78
  "database": "MEPS",
79
- "user_messages": "give me the top 10 drugs, in terms of amount paid by insurance, for individuals with diabetes.",
80
- "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to get top 10 drugs taken by diabetics\nquery = \"\"\"\n SELECT TOP 10 h213a.RXDRGNAM, SUM(h213a.RXPV19X+RXOF19X+RXOT19X+RXMR19X+RXSL19X) as Total_Paid\n FROM h213a\n INNER JOIN h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n WHERE h216_narrow.DIABDX_M18 = '1'\n GROUP BY h213a.RXDRGNAM\n ORDER BY Total_Paid DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs taken by diabetics\nprint(\"Top 10 drugs taken by diabetics in MEPS, in terms of amount paid by insurance:\")\ndf"
81
  }
82
  ]
 
5
  "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data\ndiabetes_query = \"\"\"\n SELECT DUPERSID, DIABDX_M18, AGELAST\n FROM h216_narrow\n\"\"\"\ndiabetes_df = pd.read_sql(diabetes_query, conn)\n\n# Filter the data\ndiabetes_df = diabetes_df[(diabetes_df['DIABDX_M18'] == '1') | (diabetes_df['DIABDX_M18'] == '2')]\ndiabetes_df = diabetes_df[diabetes_df['AGELAST'].notna()]\ndiabetes_df['DIABDX_M18'] = diabetes_df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndiabetes_df['UNDER_50'] = diabetes_df['AGELAST'].apply(lambda x: True if int(x) < 50 else False)\n\n# Create a venn diagram\ndiabetes_count = diabetes_df['DIABDX_M18'].sum()\nunder_50_count = diabetes_df['UNDER_50'].sum()\nboth_count = ((diabetes_df['DIABDX_M18']) & (diabetes_df['UNDER_50'])).sum()\n\n# Plot the venn diagram\nplt.figure(figsize=(8, 8))\nv = venn2(subsets=(diabetes_count - both_count, under_50_count - both_count, both_count), set_labels=('Diabetes', 'Under 50'))\n\n# Show the plot\nplt.show()"
6
  },
7
  {
8
+ "database": "MEPS",
9
+ "user_messages": "give me the average pharmacy claim cost for people with and without diabetes",
10
+ "example_code": "\nimport pandas as pd\n\n# Query the database to get the pharmacy claim cost for people with and without diabetes\nquery = \"\"\"\n SELECT \n CASE \n WHEN DIABAGED > 0 THEN 'With Diabetes'\n ELSE 'Without Diabetes'\n END AS diabetes_status,\n AVG(RXEXP19) AS average_pharmacy_claim_cost\n FROM \n h216_narrow\n GROUP BY \n CASE \n WHEN DIABAGED > 0 THEN 'With Diabetes'\n ELSE 'Without Diabetes'\n END\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the average pharmacy claim cost for people with and without diabetes\nfor index, row in df.iterrows():\n print(f\"Average pharmacy claim cost for people {row['diabetes_status'].lower()}: ${round(row['average_pharmacy_claim_cost'], 2)}\")\n\ndf"
11
  },
12
  {
13
+ "database": "Chinook",
14
+ "user_messages": "how many artists are there?",
15
+ "example_code": "\nimport pandas as pd\nimport pyodbc\n\n# SQL query to count the number of artists\nquery = \"SELECT COUNT(*) AS NumberOfArtists FROM Artist\"\n\n# Execute the query and read the result into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the number of artists\nprint(\"Number of artists:\", df['NumberOfArtists'][0])"
16
  },
17
  {
18
+ "database": "MEPS",
19
+ "user_messages": "give me a venn diagram for people with diabetes and those with asthma",
20
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for diabetes and asthma\nquery = \"\"\"\n SELECT DUPERSID, DIABDX_M18, ASTHDX\n FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter the data to include only diagnosed cases\ndf['DIABDX_M18'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndf['ASTHDX'] = df['ASTHDX'].apply(lambda x: True if x == '1' else False)\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['DIABDX_M18'].sum()\nasthma_count = df['ASTHDX'].sum()\nboth_count = (df['DIABDX_M18'] & df['ASTHDX']).sum()\n\n# Plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(diabetes_count - both_count, asthma_count - both_count, both_count),\n set_labels=('Diabetes', 'Asthma'))\n\n# Show the plot\nplt.title(\"Venn Diagram of People with Diabetes and Asthma\")\nplt.show()"
21
  },
22
  {
23
  "database": "MEPS",
24
+ "user_messages": "visualize a venn diagram for people that have diabetes and arthritis, using meps",
25
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for diabetes and arthritis\nquery = \"\"\"\n SELECT DUPERSID, DIABDX_M18, ARTHDX\n FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter and process the data\ndf = df[(df['DIABDX_M18'].isin(['1', '2'])) & (df['ARTHDX'].isin(['1', '2']))] # Filter valid responses\ndf['Diabetes'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False) # Convert to boolean\ndf['Arthritis'] = df['ARTHDX'].apply(lambda x: True if x == '1' else False) # Convert to boolean\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['Diabetes'].sum()\narthritis_count = df['Arthritis'].sum()\nboth_count = (df['Diabetes'] & df['Arthritis']).sum()\n\n# Create and plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(diabetes_count - both_count, arthritis_count - both_count, both_count),\n set_labels=('Diabetes', 'Arthritis'))\nplt.title('Venn Diagram of People with Diabetes and Arthritis')\nplt.show()"
26
  },
27
  {
28
  "database": "MEPS",
29
+ "user_messages": "visualize a venn diagram for people that have diabetes and asthma, from meps",
30
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for diabetes and asthma\nquery = \"\"\"\n SELECT DUPERSID, DIABDX_M18, ASTHDX\n FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter the data to include only diagnosed cases\ndf['DIABDX_M18'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndf['ASTHDX'] = df['ASTHDX'].apply(lambda x: True if x == '1' else False)\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['DIABDX_M18'].sum()\nasthma_count = df['ASTHDX'].sum()\nboth_count = (df['DIABDX_M18'] & df['ASTHDX']).sum()\n\n# Plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(diabetes_count - both_count, asthma_count - both_count, both_count),\n set_labels=('Diabetes', 'Asthma'))\n\n# Show the plot\nplt.title(\"Venn Diagram of People with Diabetes and Asthma\")\nplt.show()"
31
  },
32
  {
33
+ "database": "MEPS",
34
+ "user_messages": "\"give me the average pharmacy claim cost for people with and without diabetes\n\"\n",
35
+ "example_code": "\nimport pandas as pd\n\n# Query to calculate average pharmacy claim cost by diabetes status\nquery = \"\"\"\nSELECT\n CASE\n WHEN CAST(DIABDX_M18 AS INT) = 1 THEN 'With Diabetes'\n ELSE 'Without Diabetes'\n END AS diabetes_status,\n AVG(RXEXP19) AS avg_pharmacy_claim_cost\nFROM \n h216_narrow\nGROUP BY \n CASE\n WHEN CAST(DIABDX_M18 AS INT) = 1 THEN 'With Diabetes'\n ELSE 'Without Diabetes'\n END\n\"\"\"\n\ndf = pd.read_sql(query, conn)\n\n# Format and print results\nprint(\"Average Pharmacy Claim Costs (2019)\")\nprint(\"-----------------------------------\")\nfor _, row in df.iterrows():\n status = row['diabetes_status'].lower()\n cost = round(row['avg_pharmacy_claim_cost'], 2)\n print(f\"{status}: ${cost:,.2f}\")"
36
  },
37
  {
38
+ "database": "MEPS",
39
+ "user_messages": "what is the relationship between income and diabetes",
40
+ "example_code": "\nimport pandas as pd\n\n# Query to calculate correlation between income and diabetes\nquery = \"\"\"\nSELECT \n FAMINC19 AS income,\n CASE\n WHEN DIABDX_M18 = '1' THEN 1\n ELSE 0\n END AS diabetes\nFROM \n h216_narrow\nWHERE \n FAMINC19 IS NOT NULL AND DIABDX_M18 IS NOT NULL\n\"\"\"\n\ndf = pd.read_sql(query, conn)\n\n# Calculate correlation between income and diabetes\ncorrelation = df['income'].corr(df['diabetes'])\nprint(f\"Correlation between income and diabetes: {correlation:.4f}\")\n\n# Calculate average income for people with and without diabetes\navg_income_with_diabetes = df[df['diabetes'] == 1]['income'].mean()\navg_income_without_diabetes = df[df['diabetes'] == 0]['income'].mean()\n\n# Print average income for people with and without diabetes\nprint(f\"Average income for people with diabetes: ${avg_income_with_diabetes:,.2f}\")\nprint(f\"Average income for people without diabetes: ${avg_income_without_diabetes:,.2f}\")\n\n# Calculate percentage of people with diabetes at each income level\nincome_bins = [0, 25000, 50000, 75000, 100000, float('inf')]\ndf['income_bin'] = pd.cut(df['income'], bins=income_bins)\ndiabetes_rates = df.groupby('income_bin')['diabetes'].mean()\n\n# Print percentage of people with diabetes at each income level\nprint(\"\\nPercentage of people with diabetes at each income level:\")\nprint(diabetes_rates)"
41
+ },
42
+ {
43
+ "database": "MEPS",
44
+ "user_messages": "what is the relationship between income and diabetes",
45
+ "example_code": "\nimport pandas as pd\n\n# Query to calculate correlation between income and diabetes\nquery = \"\"\"\nSELECT \n FAMINC19 AS income,\n CASE\n WHEN DIABDX_M18 = '1' THEN 1\n ELSE 0\n END AS diabetes\nFROM \n h216_narrow\nWHERE \n FAMINC19 IS NOT NULL AND DIABDX_M18 IS NOT NULL\n\"\"\"\n\ndf = pd.read_sql(query, conn)\n\n# Calculate correlation between income and diabetes\ncorrelation = df['income'].corr(df['diabetes'])\nprint(f\"Correlation between income and diabetes: {correlation:.4f}\")\n\n# Calculate average income for people with and without diabetes\navg_income_with_diabetes = df[df['diabetes'] == 1]['income'].mean()\navg_income_without_diabetes = df[df['diabetes'] == 0]['income'].mean()\n\n# Print average income for people with and without diabetes\nprint(f\"Average income for people with diabetes: ${avg_income_with_diabetes:,.2f}\")\nprint(f\"Average income for people without diabetes: ${avg_income_without_diabetes:,.2f}\")\n\n# Create bins for income levels\nincome_bins = [0, 25000, 50000, 75000, 100000, float('inf')]\nincome_labels = ['<25k', '25k-50k', '50k-75k', '75k-100k', '100k+']\ndf['income_bin'] = pd.cut(df['income'], bins=income_bins, labels=income_labels)\n\n# Calculate percentage of people with diabetes at each income level\ndiabetes_rates = df.groupby('income_bin')['diabetes'].mean()\n\n# Print percentage of people with diabetes at each income level\nprint(\"\\nPercentage of people with diabetes at each income level:\")\nprint(diabetes_rates * 100) # Convert to percentage"
46
  },
47
  {
48
  "database": "MEPS",
49
  "user_messages": "how many diabetics in meps?",
50
+ "example_code": "\nimport pandas as pd\n\n# Query the data\ndiabetes_query = \"\"\"\n SELECT DIABDX_M18\n FROM h216_narrow\n\"\"\"\ndiabetes_df = pd.read_sql(diabetes_query, conn)\n\n# Filter the data\ndiabetes_df = diabetes_df[diabetes_df['DIABDX_M18'].notna()]\ndiabetes_df['DIABDX_M18'] = diabetes_df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\n\n# Get the count of diabetics\ndiabetic_count = diabetes_df['DIABDX_M18'].sum()\n\n# Calculate the percentage of diabetics\ntotal_count = len(diabetes_df)\ndiabetic_percentage = (diabetic_count / total_count) * 100\n\nprint(f\"The total count of diabetics in MEPS is: {diabetic_count}\")\nprint(f\"The percentage of diabetics in MEPS is: {diabetic_percentage:.2f}%\")"
51
  },
52
  {
53
  "database": "MEPS",
54
+ "user_messages": "give me a venn diagram for people with arthritis and those with diabetes, from meps",
55
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for diabetes and arthritis\nquery = \"\"\"\n SELECT DUPERSID, DIABDX_M18, ARTHDX\n FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter and process the data\ndf = df[(df['DIABDX_M18'].isin(['1', '2'])) & (df['ARTHDX'].isin(['1', '2']))] # Filter valid responses\ndf['Diabetes'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False) # Convert to boolean\ndf['Arthritis'] = df['ARTHDX'].apply(lambda x: True if x == '1' else False) # Convert to boolean\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['Diabetes'].sum()\narthritis_count = df['Arthritis'].sum()\nboth_count = (df['Diabetes'] & df['Arthritis']).sum()\n\n# Create and plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(diabetes_count - both_count, arthritis_count - both_count, both_count),\n set_labels=('Diabetes', 'Arthritis'))\nplt.title('Venn Diagram of People with Diabetes and Arthritis')\nplt.show()"
56
  },
57
  {
58
+ "database": "MEPS",
59
+ "user_messages": "give me a venn diagram for people with asthma and those with arthritis, from meps",
60
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for asthma and arthritis\nquery = \"\"\"\n SELECT DUPERSID, ASTHDX, ARTHDX\n FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter and process the data\ndf = df[(df['ASTHDX'].isin(['1', '2'])) & (df['ARTHDX'].isin(['1', '2']))] # Filter valid responses\ndf['Asthma'] = df['ASTHDX'].apply(lambda x: True if x == '1' else False) # Convert to boolean\ndf['Arthritis'] = df['ARTHDX'].apply(lambda x: True if x == '1' else False) # Convert to boolean\n\n# Calculate counts for the Venn diagram\nasthma_count = df['Asthma'].sum()\narthritis_count = df['Arthritis'].sum()\nboth_count = (df['Asthma'] & df['Arthritis']).sum()\n\n# Create and plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(asthma_count - both_count, arthritis_count - both_count, both_count),\n set_labels=('Asthma', 'Arthritis'))\nplt.title('Venn Diagram of People with Asthma and Arthritis')\nplt.show()"
61
  },
62
  {
63
+ "database": "MEPS",
64
+ "user_messages": "give me the top 10 drugs, in terms of amount paid by self, for individuals with diabetes. ",
65
+ "example_code": "\nimport pandas as pd\n\n# Query the database to get the top 10 drugs paid by self for individuals with diabetes\nquery = \"\"\"\n SELECT TOP 10 \n RXDRGNAM,\n SUM(RXSF19X) AS total_self_paid\n FROM \n h213a\n INNER JOIN \n h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n WHERE \n h216_narrow.DIABAGED > 0\n GROUP BY \n RXDRGNAM\n ORDER BY \n total_self_paid DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs paid by self for individuals with diabetes\ndf"
66
  },
67
  {
68
  "database": "MEPS",
69
+ "user_messages": "from meps, give me the average total healthcare cost for people with diabetes and asthma, with just diabetes, and with neither\n\n",
70
+ "example_code": "\nimport pandas as pd\n\n# Query the database to get the average total healthcare cost for people with diabetes and asthma, with just diabetes, and with neither\nquery = \"\"\"\n SELECT \n CASE \n WHEN DIABAGED > 0 AND ASTHAGED > 0 THEN 'Diabetes and Asthma'\n WHEN DIABAGED > 0 THEN 'Diabetes Only'\n ELSE 'Neither'\n END AS condition_status,\n AVG(TOTEXP19) AS average_healthcare_cost\n FROM \n h216_narrow\n WHERE \n DIABAGED IS NOT NULL AND ASTHAGED IS NOT NULL\n GROUP BY \n CASE \n WHEN DIABAGED > 0 AND ASTHAGED > 0 THEN 'Diabetes and Asthma'\n WHEN DIABAGED > 0 THEN 'Diabetes Only'\n ELSE 'Neither'\n END\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the average healthcare cost for people with diabetes and asthma, with just diabetes, and with neither\nfor index, row in df.iterrows():\n print(f\"Average total healthcare cost for people with {row['condition_status'].lower()}: ${round(row['average_healthcare_cost'], 2)}\")"
71
  },
72
  {
73
  "database": "MEPS",
74
+ "user_messages": "from meps, explore the relationship of income with total healthcare cost",
75
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom sklearn.linear_model import LinearRegression\nimport numpy as np\n\n# Query the database to get the income and total healthcare cost\nquery = \"\"\"\n SELECT \n FAMINC19,\n TOTEXP19\n FROM \n h216_narrow\n WHERE \n FAMINC19 IS NOT NULL AND TOTEXP19 IS NOT NULL\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Calculate the correlation coefficient between income and total healthcare cost\ncorrelation_coefficient = df['FAMINC19'].corr(df['TOTEXP19'])\nprint(f\"Correlation coefficient between income and total healthcare cost: {round(correlation_coefficient, 2)}\")\n\n# Create a scatter plot to visualize the relationship\nplt.figure(figsize=(10, 6))\nplt.scatter(df['FAMINC19'], df['TOTEXP19'])\nplt.xlabel('Family Income')\nplt.ylabel('Total Healthcare Cost')\nplt.title('Relationship between Income and Total Healthcare Cost')\nplt.show()\n\n# Perform linear regression to model the relationship\nX = df['FAMINC19'].values.reshape(-1, 1)\ny = df['TOTEXP19'].values\nmodel = LinearRegression()\nmodel.fit(X, y)\nprint(f\"Coefficient of determination (R^2): {round(model.score(X, y), 2)}\")\n\n# Predict total healthcare cost for different income levels\nincome_levels = np.array([20000, 50000, 100000]).reshape(-1, 1)\npredicted_costs = model.predict(income_levels)\nfor i, income in enumerate(income_levels):\n print(f\"Predicted total healthcare cost for income ${income[0]}: ${round(predicted_costs[i], 2)}\")\n\n# Calculate the average total healthcare cost for different income quartiles\ndf['income_quartile'] = pd.qcut(df['FAMINC19'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])\naverage_costs = df.groupby('income_quartile')['TOTEXP19'].mean()\nprint(\"Average total healthcare cost for different income quartiles:\")\nfor index, row in average_costs.items():\n print(f\"{index}: ${round(row, 2)}\")"
76
  },
77
  {
78
  "database": "MEPS",
79
+ "user_messages": "give me a venn diagram for those with diabetes, asthma, and arthritis. Use DUPERSID to get counts of those that have both diabetes and asthma, etc.",
80
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn3\n\n# Query the data for diabetes, asthma, and arthritis\nquery = \"\"\"\n SELECT DUPERSID, DIABDX_M18, ASTHDX, ARTHDX\n FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter the data to include only diagnosed cases\ndf['DIABDX_M18'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndf['ASTHDX'] = df['ASTHDX'].apply(lambda x: True if x == '1' else False)\ndf['ARTHDX'] = df['ARTHDX'].apply(lambda x: True if x == '1' else False)\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['DIABDX_M18'].sum()\nasthma_count = df['ASTHDX'].sum()\narthritis_count = df['ARTHDX'].sum()\n\nboth_diabetes_asthma = (df['DIABDX_M18'] & df['ASTHDX']).sum()\nboth_diabetes_arthritis = (df['DIABDX_M18'] & df['ARTHDX']).sum()\nboth_asthma_arthritis = (df['ASTHDX'] & df['ARTHDX']).sum()\n\nall_three = (df['DIABDX_M18'] & df['ASTHDX'] & df['ARTHDX']).sum()\n\n# Calculate the subsets\ndiabetes_only = diabetes_count - both_diabetes_asthma - both_diabetes_arthritis + all_three\nasthma_only = asthma_count - both_diabetes_asthma - both_asthma_arthritis + all_three\narthritis_only = arthritis_count - both_diabetes_arthritis - both_asthma_arthritis + all_three\n\n# Plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn3(subsets=(diabetes_only, asthma_only, both_diabetes_asthma - all_three, \n arthritis_only, both_diabetes_arthritis - all_three, both_asthma_arthritis - all_three, all_three),\n set_labels=('Diabetes', 'Asthma', 'Arthritis'))\n\n# Show the plot\nplt.title(\"Venn Diagram of People with Diabetes, Asthma, and Arthritis\")\nplt.show()"
81
  },
82
  {
83
  "database": "MEPS",
84
+ "user_messages": "can i get the portion of cost paid by self by insurance coverage type (private, public, uninsured)",
85
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# Query the database to get the required data\nquery = \"\"\"\n SELECT \n CASE\n WHEN INSCOV19 = 1 THEN 'Private'\n WHEN INSCOV19 = 2 THEN 'Public'\n WHEN INSCOV19 = 3 THEN 'Uninsured'\n ELSE 'Other'\n END AS InsuranceCoverage,\n CASE\n WHEN TOTEXP19 = 0 THEN NULL -- avoid divide by zero\n ELSE TOTSLF19 / TOTEXP19\n END AS PortionOfCostPaidBySelf\n FROM \n h216_narrow\n WHERE \n TOTEXP19 IS NOT NULL AND TOTSLF19 IS NOT NULL\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\ndf = df.dropna() # drop any rows with missing values (i.e., TOTEXP19 = 0)\n\n# Calculate the average portion of cost paid by self by insurance coverage type\naverage_portion_paid = df.groupby('InsuranceCoverage')['PortionOfCostPaidBySelf'].mean()\nprint(\"Average portion of cost paid by self by insurance coverage type:\")\nfor index, row in average_portion_paid.items():\n print(f\"{index}: {round(row, 2)}\")\n\n# Create a bar chart to visualize the average portion of cost paid by self by insurance coverage type\nplt.figure(figsize=(10, 6))\nplt.bar(average_portion_paid.index, average_portion_paid.values)\nplt.xlabel('Insurance Coverage')\nplt.ylabel('Average Portion of Cost Paid by Self')\nplt.title('Average Portion of Cost Paid by Self by Insurance Coverage Type')\nplt.show()\n\n# Calculate the median portion of cost paid by self by insurance coverage type\nmedian_portion_paid = df.groupby('InsuranceCoverage')['PortionOfCostPaidBySelf'].median()\nprint(\"Median portion of cost paid by self by insurance coverage type:\")\nfor index, row in median_portion_paid.items():\n print(f\"{index}: {round(row, 2)}\")\n\n# Calculate the standard deviation of portion of cost paid by self by insurance coverage type\nstd_dev_portion_paid = df.groupby('InsuranceCoverage')['PortionOfCostPaidBySelf'].std()\nprint(\"Standard deviation of portion of cost paid by self by insurance coverage type:\")\nfor index, row in std_dev_portion_paid.items():\n print(f\"{index}: {round(row, 2)}\")"
86
+ },
87
+ {
88
+ "database": "capgpbsh_vg_chinook",
89
+ "user_messages": "how many artists are there?",
90
+ "example_code": "\nimport pandas as pd\n\n# Query to count the number of artists\nquery = \"SELECT COUNT(ArtistId) FROM Artist\"\n\n# Use pd.read_sql to execute the query and get the result\nresult = pd.read_sql(query, conn)\n\n# Print the result\nprint(\"Number of artists:\", result.iloc[0, 0])"
91
+ },
92
+ {
93
+ "database": "capgpbsh_vg_chinook",
94
+ "user_messages": "how many artists are there?",
95
+ "example_code": "\nimport pandas as pd\n\n# Query to count the number of artists\nquery = \"SELECT COUNT(ArtistId) AS ArtistCount FROM Artist\"\n\n# Use pd.read_sql to execute the query and get the result\nresult = pd.read_sql(query, conn)\n\n# Print the result\nprint(\"Number of artists:\", result.iloc[0, 0])"
96
+ },
97
+ {
98
+ "database": "capgpbsh_vg_chinook",
99
+ "user_messages": "Rank the top selling albums by that famous irish band.",
100
+ "example_code": "\n# Import necessary libraries\nimport pandas as pd\n\n# Query to find the ArtistId of U2\nquery_artist_id = \"SELECT ArtistId FROM Artist WHERE Name = 'U2'\"\n\n# Use pd.read_sql to execute the query and get the ArtistId of U2\nartist_id = pd.read_sql(query_artist_id, conn).iloc[0, 0]\n\n# Query to find the top selling albums of U2\nquery_top_albums = \"\"\"\n SELECT a.Title, SUM(il.Quantity) AS TotalSales\n FROM InvoiceLine il\n JOIN Track t ON il.TrackId = t.TrackId\n JOIN Album a ON t.AlbumId = a.AlbumId\n WHERE a.ArtistId = %s\n GROUP BY a.Title\n ORDER BY SUM(il.Quantity) DESC\n LIMIT 10\n\"\"\" % artist_id\n\n# Use pd.read_sql to execute the query and get the top selling albums\ntop_albums = pd.read_sql(query_top_albums, conn)\n\n# Print the result\nprint(\"Top selling albums by U2:\")\nprint(top_albums)"
101
+ },
102
+ {
103
+ "database": "capgpbsh_vg_chinook",
104
+ "user_messages": "Give me a bar plot of sales, by album, of U2; use plotly",
105
+ "example_code": "\nimport pandas as pd\nimport plotly.express as px\n\n# Query to find the ArtistId of U2\nquery_artist_id = \"SELECT ArtistId FROM Artist WHERE Name = 'U2'\"\nartist_id = pd.read_sql(query_artist_id, conn).iloc[0, 0]\n\n# Query to find the sales by album for U2\nquery_album_sales = \"\"\"\n SELECT a.Title, SUM(il.Quantity) AS TotalSales\n FROM Album a\n JOIN Artist ar ON a.ArtistId = ar.ArtistId\n JOIN Track t ON a.AlbumId = t.AlbumId\n JOIN InvoiceLine il ON t.TrackId = il.TrackId\n WHERE ar.ArtistId = %s\n GROUP BY a.Title\n ORDER BY SUM(il.Quantity) DESC\n\"\"\" % artist_id\n\n# Use pd.read_sql to execute the query and get the sales by album for U2\nalbum_sales = pd.read_sql(query_album_sales, conn)\n\n# Create a bar plot of sales by album using Plotly\nfig = px.bar(album_sales, x='Title', y='TotalSales', title=\"Album Sales by U2\")\nfig.update_layout(xaxis_title=\"Album Title\", yaxis_title=\"Total Sales\")\nfig.show()"
106
+ },
107
+ {
108
+ "database": "capgpbsh_vg_chinook",
109
+ "user_messages": "from the chinook db, can you plot the number of customers by country on a map, and color code it by number of customers (white for few, and green for more)?",
110
+ "example_code": "\nimport pandas as pd\nimport plotly.graph_objects as go\n\n# Query to find the number of customers by country\nquery_customer_country = \"\"\"\n SELECT Country, COUNT(CustomerId) AS NumberOfCustomers\n FROM Customer\n GROUP BY Country\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the number of customers by country\ncustomer_country = pd.read_sql(query_customer_country, conn)\n\n# Create a world map plot of customers by country using Plotly\nfig = go.Figure(data=go.Choropleth(\n locations=customer_country['Country'], # Spatial coordinates\n z = customer_country['NumberOfCustomers'].astype(float), # Data to color by\n locationmode = 'country names', # Set of locations match entries in `locations`\n autocolorscale=False,\n colorscale='Greens',\n colorbar_title = \"Number of Customers\"\n))\n\nfig.update_layout(\n title_text='Number of Customers by Country',\n geo_scope='world',\n)\n\nfig.show()"
111
+ },
112
+ {
113
+ "database": "capgpbsh_vg_chinook",
114
+ "user_messages": "from the chinook db, can you plot the number of customers by country on a map, and color code it by number of customers (white for few, and green for more)?",
115
+ "example_code": "\n# Import necessary libraries\nimport pandas as pd\nimport plotly.graph_objects as go\n\n# Query to find the number of customers by country\nquery_customer_country = \"\"\"\n SELECT Country, COUNT(CustomerId) AS NumberOfCustomers\n FROM Customer\n GROUP BY Country\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the number of customers by country\ncustomer_country = pd.read_sql(query_customer_country, conn)\n\n# Create a world map plot of customers by country using Plotly\nfig = go.Figure(data=go.Choropleth(\n locations=customer_country['Country'], # Spatial coordinates\n z = customer_country['NumberOfCustomers'].astype(float), # Data to color by\n locationmode = 'country names', # Set of locations match entries in `locations`\n autocolorscale=False,\n colorscale='Greens', # Set colorscale to 'Greens' to color code by number of customers\n reversescale=False, # Set reversescale to False to map low values to white and high values to green\n marker_line_color='darkgray', # Set marker line color to darkgray for better visibility\n marker_line_width=0.5, # Set marker line width to 0.5 for better visibility\n colorbar_title = \"Number of Customers\"\n))\n\nfig.update_layout(\n title_text='Number of Customers by Country',\n geo_scope='world',\n)\n\n# Show the plot\nfig.show()"
116
+ },
117
+ {
118
+ "database": "capgpbsh_vg_chinook",
119
+ "user_messages": "Give me a word cloud for genre\n",
120
+ "example_code": "\nimport pandas as pd\nfrom wordcloud import WordCloud\nimport matplotlib.pyplot as plt\n\n# Query to get all genres\nquery = \"SELECT Name FROM Genre\"\n\n# Use pd.read_sql to execute the query and get the result\ngenres = pd.read_sql(query, conn)\n\n# Join all genres into a single string\ngenre_string = ' '.join(genres['Name'].tolist())\n\n# Create a word cloud\nwordcloud = WordCloud(width=800, height=400).generate(genre_string)\n\n# Display the word cloud\nplt.figure(figsize=(10, 5))\nplt.imshow(wordcloud, interpolation='bilinear')\nplt.axis('off')\nplt.show()"
121
+ },
122
+ {
123
+ "database": "capgpbsh_vg_chinook",
124
+ "user_messages": "what are the top 3 selling genre in chinook data?\n",
125
+ "example_code": "\n# Import necessary libraries\nimport pandas as pd\n\n# Query to find the top 3 selling genres\nquery_top_genres = \"\"\"\n SELECT g.Name, SUM(il.Quantity) AS TotalSales\n FROM Genre g\n JOIN Track t ON g.GenreId = t.GenreId\n JOIN InvoiceLine il ON t.TrackId = il.TrackId\n GROUP BY g.Name\n ORDER BY SUM(il.Quantity) DESC\n LIMIT 3\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the top selling genres\ntop_genres = pd.read_sql(query_top_genres, conn)\n\n# Print the result\nprint(\"Top 3 selling genres:\")\nprint(top_genres)"
126
+ },
127
+ {
128
+ "database": "capgpbsh_vg_chinook",
129
+ "user_messages": "Give me the top 10 selling artists.",
130
+ "example_code": "\nimport pandas as pd\n\n# Query to get the top 10 selling artists\nquery = \"\"\"\n SELECT a.Name, SUM(il.Quantity) as TotalSold\n FROM Artist a\n JOIN Album al ON a.ArtistId = al.ArtistId\n JOIN Track t ON al.AlbumId = t.AlbumId\n JOIN InvoiceLine il ON t.TrackId = il.TrackId\n GROUP BY a.Name\n ORDER BY TotalSold DESC\n LIMIT 10\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the result\nresult = pd.read_sql(query, conn)\n\n# Print the result\nprint(result)"
131
+ },
132
+ {
133
+ "database": "capgpbsh_vg_chinook",
134
+ "user_messages": "rank the employees making the most sales",
135
+ "example_code": "\n# Import necessary libraries\nimport pandas as pd\n\n# Query to find the top selling employees\nquery_top_employees = \"\"\"\n SELECT e.FirstName, e.LastName, SUM(i.Total) AS TotalSales\n FROM Invoice i\n JOIN Customer c ON i.CustomerId = c.CustomerId\n JOIN Employee e ON c.SupportRepId = e.EmployeeId\n GROUP BY e.FirstName, e.LastName\n ORDER BY SUM(i.Total) DESC\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the top selling employees\ntop_employees = pd.read_sql(query_top_employees, conn)\n\n# Print the result\nprint(\"Top selling employees:\")\nprint(top_employees)"
136
+ },
137
+ {
138
+ "database": "MEPS",
139
+ "user_messages": "give me a visual to understand the total healthcare spend, and the medical and pharmacy as proportions of total\n",
140
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\n# Query the database to get the total healthcare spend and pharmacy spend\nquery = \"\"\"\n SELECT \n TOTEXP19,\n RXEXP19\n FROM \n h216_narrow\n WHERE \n TOTEXP19 IS NOT NULL AND RXEXP19 IS NOT NULL\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Calculate the total healthcare spend and pharmacy spend\ntotal_healthcare_spend = df['TOTEXP19'].sum()\npharmacy_spend = df['RXEXP19'].sum()\n\n# Calculate the proportions of total healthcare spend\npharmacy_proportion = (pharmacy_spend / total_healthcare_spend) * 100\n\n# Calculate the medical spend as the difference between total healthcare spend and pharmacy spend\nmedical_spend = total_healthcare_spend - pharmacy_spend\nmedical_proportion = (medical_spend / total_healthcare_spend) * 100\n\n# Create a pie chart to visualize the proportions\nlabels = ['Medical', 'Pharmacy']\nsizes = [medical_proportion, pharmacy_proportion]\nplt.pie(sizes, labels=labels, autopct='%1.1f%%')\nplt.title('Proportion of Total Healthcare Spend')\nplt.show()\n\nprint(f\"Total healthcare spend: ${total_healthcare_spend:,.2f}\")\nprint(f\"Pharmacy spend as proportion of total healthcare spend: {pharmacy_proportion:.2f}%\")\nprint(f\"Medical spend as proportion of total healthcare spend: {medical_proportion:.2f}%\")"
141
+ },
142
+ {
143
+ "database": "MEPS",
144
+ "user_messages": "give me a visual to understand the total healthcare spend, and the medical and pharmacy as proportions of total\n",
145
+ "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\nquery = \"\"\"\n SELECT \n SUM(TOTEXP19) AS total_healthcare_spend,\n SUM(RXEXP19) AS pharmacy_spend\n FROM \n h216_narrow\n WHERE \n TOTEXP19 IS NOT NULL AND RXEXP19 IS NOT NULL\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Since the DataFrame has only one row, we can access the values directly\ntotal_healthcare_spend = df['total_healthcare_spend'].iloc[0]\npharmacy_spend = df['pharmacy_spend'].iloc[0]\n\n# Calculate the proportions of total healthcare spend\npharmacy_proportion = (pharmacy_spend / total_healthcare_spend) * 100\n\n# Calculate the medical spend as the difference between total healthcare spend and pharmacy spend\nmedical_spend = total_healthcare_spend - pharmacy_spend\nmedical_proportion = (medical_spend / total_healthcare_spend) * 100\n\n# Create a pie chart to visualize the proportions\nlabels = ['Medical', 'Pharmacy']\nsizes = [medical_proportion, pharmacy_proportion]\nplt.pie(sizes, labels=labels, autopct='%1.1f%%')\nplt.title('Proportion of Total Healthcare Spend')\nplt.show()\n\nprint(f\"Total healthcare spend: ${total_healthcare_spend:,.2f}\")\nprint(f\"Pharmacy spend as proportion of total healthcare spend: {pharmacy_proportion:.2f}%\")\nprint(f\"Medical spend as proportion of total healthcare spend: {medical_proportion:.2f}%\")"
146
+ },
147
+ {
148
+ "database": "freedb_vg_demo",
149
+ "user_messages": "how many artists in data?",
150
+ "example_code": "\nimport pandas as pd\n\n# Query the Artist table to get the number of artists\nartist_count_df = pd.read_sql_query(\"SELECT COUNT(*) FROM Artist\", conn)\n\n# Get the count from the DataFrame\nartist_count = artist_count_df.iloc[0, 0]\n\nprint(\"Number of artists in the data:\", artist_count)"
151
+ },
152
+ {
153
+ "database": "freedb_vg_demo",
154
+ "user_messages": "what are the top 3 selling genre in chinook data?\n",
155
+ "example_code": "\nimport pandas as pd\n\n# Define the query to get the top 3 selling genres\nquery = \"\"\"\n SELECT T.GenreId, G.Name, SUM(IL.Quantity) AS TotalSales\n FROM Track T\n JOIN Genre G ON T.GenreId = G.GenreId\n JOIN InvoiceLine IL ON T.TrackId = IL.TrackId\n GROUP BY T.GenreId, G.Name\n ORDER BY TotalSales DESC\n LIMIT 3\n\"\"\"\n\n# Use pd.read_sql_query to execute the query\ntop_selling_genres_df = pd.read_sql_query(query, conn)\n\n# Print the top 3 selling genres\nprint(top_selling_genres_df)"
156
+ },
157
+ {
158
+ "database": "freedb_vg_demo",
159
+ "user_messages": "Give me the top 10 selling artists.\n",
160
+ "example_code": "\nimport pandas as pd\n\n# Query the tables to get the top 10 selling artists\ntop_artists_df = pd.read_sql_query(\"\"\"\n SELECT a.Name, SUM(il.UnitPrice * il.Quantity) as TotalSales\n FROM InvoiceLine il\n JOIN Track t ON il.TrackId = t.TrackId\n JOIN Album alb ON t.AlbumId = alb.AlbumId\n JOIN Artist a ON alb.ArtistId = a.ArtistId\n GROUP BY a.Name\n ORDER BY TotalSales DESC\n LIMIT 10\n\"\"\", conn)\n\n# Print the top 10 selling artists\nprint(top_artists_df)"
161
+ },
162
+ {
163
+ "database": "freedb_vg_demo",
164
+ "user_messages": "Give me a bar plot of sales, by album, of U2\n",
165
+ "example_code": "\nimport pandas as pd\nimport plotly.express as px\n\n# Query the tables to get the sales by album for U2\nu2_album_sales_df = pd.read_sql_query(\"\"\"\n SELECT alb.Title, SUM(il.UnitPrice * il.Quantity) as TotalSales\n FROM InvoiceLine il\n JOIN Track t ON il.TrackId = t.TrackId\n JOIN Album alb ON t.AlbumId = alb.AlbumId\n JOIN Artist a ON alb.ArtistId = a.ArtistId\n WHERE a.Name = 'U2'\n GROUP BY alb.Title\n ORDER BY TotalSales DESC\n\"\"\", conn)\n\n# Create a bar plot of sales by album\nfig = px.bar(u2_album_sales_df, x='Title', y='TotalSales', title='Sales by Album for U2')\nfig.show()"
166
+ },
167
+ {
168
+ "database": "freedb_vg_demo",
169
+ "user_messages": "from the chinook db, can you plot the number of customers by country on a map, and color code it by number of customers (white for few, and green for more)?\n",
170
+ "example_code": "\nimport pandas as pd\nimport plotly.express as px\n\n# Define the query to get the number of customers by country\nquery = \"\"\"\n SELECT Country, COUNT(*) AS NumberOfCustomers\n FROM Customer\n GROUP BY Country\n\"\"\"\n\n# Use pd.read_sql_query to execute the query\ncustomers_by_country_df = pd.read_sql_query(query, conn)\n\n# Plot the number of customers by country using Plotly Express\nfig = px.choropleth(customers_by_country_df, locations=\"Country\", \n color=\"NumberOfCustomers\", \n color_continuous_scale='Greens', \n locationmode='country names')\n\nfig.update_layout(title='Number of Customers by Country')\nfig.show()"
171
  }
172
  ]