VerbaGPT_Demo001

predictivemodeler committed on Mar 15, 2025

Commit a0281ea (verified)

1 Parent(s): 3dbdbe7

Upload 3 files

Files changed (3)

vg_/csvs/create_embeddings.txt +470 -1
vg_/csvs/flagged_examples.json +7 -0
vg_/csvs/training_examples.json +126 -36
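
Each line of create_embeddings.txt appears to carry eight pipe-separated fields: server, database, table, column, data type, a sample value, the record kind (Db_File, Table, Column, or Value), and a free-text description. The sketch below parses one line under that assumption. The field names and the `parse_record` helper are editorial labels, not names taken from the VerbaGPT code, and the optional stripping of a leading diff marker exists only so lines copied from this commit view can be fed in directly.

```python
from dataclasses import dataclass
from typing import Optional

# Editorial labels for the eight pipe-separated fields; not names from VerbaGPT.
FIELDS = ("server", "database", "table", "column", "data_type",
          "value", "kind", "description")


@dataclass(frozen=True)
class MetadataRecord:
    server: str
    database: str
    table: str        # empty on database-level Db_File rows in MySQL-001
    column: str       # empty on Table and Db_File rows
    data_type: str
    value: str        # sample value, used on Value rows (e.g. "I10", "TABS")
    kind: str         # "Db_File", "Table", "Column" or "Value"
    description: str  # may be empty


def parse_record(line: str) -> Optional[MetadataRecord]:
    """Parse one metadata line; return None for anything that is not one."""
    text = line.rstrip("\r\n")
    if text.startswith("@@"):
        return None                   # diff hunk header, not a record
    if text[:1] in ("+", "-", " "):
        text = text[1:]               # drop a unified-diff marker, if any
    # maxsplit keeps any "|" inside the description intact.
    parts = text.split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return MetadataRecord(*parts)


if __name__ == "__main__":
    rec = parse_record("+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|I10|Value|")
    print(rec.table, rec.column, rec.value, rec.kind)  # h214 ICD10CDX I10 Value
```

Grouping the parsed records by (database, table) before building embedding text is one natural next step, though how VerbaGPT itself consumes this file is not shown in this commit.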

vg_/csvs/create_embeddings.txt CHANGED

@@ -1598,7 +1598,7 @@ MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|G47|Value|
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|T78|Value|
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|M79|Value|
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h214|ICD10CDX|nvarchar|E07|Value|
-MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ACTLIM31|smallint||Column|Indicates any limitation in work, housework, or school activities (Round 3/1). Values: 1 = Yes, 2 = No.
+MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ACTLIM31|smallint||Column|Indicates any limitation in work, housework, or school activities. Values: 1 = Yes, 2 = No.
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ADACLS42|smallint||Column|Frequency of feeling accomplished less due to physical problems in the past 4 weeks (VR-12).
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ADCLIM42|smallint||Column|Health limits climbing stairs (VR-12 scale, past 4 weeks).
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|ADDAYA42|smallint||Column|Delay in medical care due to cost (1=Yes, 2=No, -8=Don't Know, -7=Refused, -1=Inapplicable).
@@ -1750,6 +1750,7 @@ MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WAGEP19X|int||Column|Numeric; person's
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WLKDIF31|smallint||Column|Difficulty walking: -8 DK, -1 Inapplicable, 1 No difficulty, 2 Some difficulty, 3 A lot of difficulty, 4 Unable to do.
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WLKLIM31|smallint||Column|2.0 numeric: -8=Don't know, -1=Inapplicable, 1=Yes, 2=No. Indicates child's health limitation status.
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|WRKLIM31|smallint||Column|Indicates work limitations (-8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No).
+MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow|PERWT19F|float||Column|Person-level weight that allows us to extrapolate to the US population.
 MUZAYAN_GRAM\SQLEXPRESS|MEPS|h216_narrow||||Table|This is the main table, has enrollment, expenditure, and key demographic information. Can be linked to pharmacy and medical tables if needed using DUPERSID.
 MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21||||Db_File|The Statistics of Income (SOI) Division's county data is tabulated using individual income tax returns (Forms 1040) filed with the Internal Revenue Service (IRS) during the 12-month period, January 1, 2022 to December 31, 2022.
 MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21||||Table|All money amounts are in thousands of dollars (so 1 = 1,000). To calculate averages, divide by number of returns (N1)
@@ -1924,3 +1925,471 @@ MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|N11902|Num||Column|Number of returns wi
 MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|A11902|Num||Column|Overpayments refunded amount [22] ; 1040:35a
 MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|N12000|Num||Column|Number of returns with credit to next year's estimated tax; 1040:36
 MUZAYAN_GRAM\SQLEXPRESS|IRS|incyallagi21|A12000|Num||Column|Credited to next year's estimated tax amount; 1040:36
+MySQL-001|freedb_vg_demo|Album|AlbumId|int||Column|
+MySQL-001|freedb_vg_demo|Album|ArtistId|int||Column|
+MySQL-001|freedb_vg_demo|Album|Title|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Album||||Table|
+MySQL-001|freedb_vg_demo|Artist|ArtistId|int||Column|
+MySQL-001|freedb_vg_demo|Artist|Name|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Artist||||Table|
+MySQL-001|freedb_vg_demo|Customer|Address|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|City|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|Company|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|Country|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|CustomerId|int||Column|
+MySQL-001|freedb_vg_demo|Customer|Email|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|Fax|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|FirstName|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|LastName|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|Phone|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|PostalCode|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|State|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Customer|SupportRepId|int||Column|
+MySQL-001|freedb_vg_demo|Customer||||Table|
+MySQL-001|freedb_vg_demo|Employee|Address|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|BirthDate|datetime||Column|
+MySQL-001|freedb_vg_demo|Employee|City|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|Country|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|Email|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|EmployeeId|int||Column|
+MySQL-001|freedb_vg_demo|Employee|Fax|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|FirstName|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|HireDate|datetime||Column|
+MySQL-001|freedb_vg_demo|Employee|LastName|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|Phone|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|PostalCode|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|ReportsTo|int||Column|
+MySQL-001|freedb_vg_demo|Employee|State|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee|Title|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Employee||||Table|
+MySQL-001|freedb_vg_demo|Genre|GenreId|int||Column|
+MySQL-001|freedb_vg_demo|Genre|Name|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Genre||||Table|
+MySQL-001|freedb_vg_demo|Invoice|BillingAddress|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Invoice|BillingCity|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Invoice|BillingCountry|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Invoice|BillingPostalCode|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Invoice|BillingState|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Invoice|CustomerId|int||Column|
+MySQL-001|freedb_vg_demo|Invoice|InvoiceDate|datetime||Column|
+MySQL-001|freedb_vg_demo|Invoice|InvoiceId|int||Column|
+MySQL-001|freedb_vg_demo|Invoice|Total|numeric||Column|Total sale
+MySQL-001|freedb_vg_demo|Invoice||||Table|Invoices for years 2009 to 2013
+MySQL-001|freedb_vg_demo|InvoiceLine|InvoiceId|int||Column|
+MySQL-001|freedb_vg_demo|InvoiceLine|InvoiceLineId|int||Column|
+MySQL-001|freedb_vg_demo|InvoiceLine|Quantity|int||Column|
+MySQL-001|freedb_vg_demo|InvoiceLine|TrackId|int||Column|
+MySQL-001|freedb_vg_demo|InvoiceLine|UnitPrice|numeric||Column|
+MySQL-001|freedb_vg_demo|InvoiceLine||||Table|Detailed line items for Invoices
+MySQL-001|freedb_vg_demo|MediaType|MediaTypeId|int||Column|
+MySQL-001|freedb_vg_demo|MediaType|Name|nvarchar||Column|
+MySQL-001|freedb_vg_demo|MediaType||||Table|
+MySQL-001|freedb_vg_demo|Playlist|Name|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Playlist|PlaylistId|int||Column|
+MySQL-001|freedb_vg_demo|Playlist||||Table|
+MySQL-001|freedb_vg_demo|PlaylistTrack|PlaylistId|int||Column|
+MySQL-001|freedb_vg_demo|PlaylistTrack|TrackId|int||Column|
+MySQL-001|freedb_vg_demo|PlaylistTrack||||Table|
+MySQL-001|freedb_vg_demo|Track|AlbumId|int||Column|
+MySQL-001|freedb_vg_demo|Track|Bytes|int||Column|
+MySQL-001|freedb_vg_demo|Track|Composer|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Track|GenreId|int||Column|
+MySQL-001|freedb_vg_demo|Track|MediaTypeId|int||Column|
+MySQL-001|freedb_vg_demo|Track|Milliseconds|int||Column|
+MySQL-001|freedb_vg_demo|Track|Name|nvarchar||Column|
+MySQL-001|freedb_vg_demo|Track|TrackId|int||Column|
+MySQL-001|freedb_vg_demo|Track|UnitPrice|numeric||Column|
+MySQL-001|freedb_vg_demo|Track||||Table|
+MySQL-001|freedb_vg_demo|||||Db_File|The Chinook data model represents a digital media store, including tables for artists, albums, media tracks, invoices, and customers.
+MySQL-001|freedb_vg_demo|Track|Name|nvarchar||Value|
+MySQL-001|vg_meps01|||||Db_File|This is the Medical Expenditure Panel Survey (MEPS) database. It has enrollment, medical, and pharmacy tables.
+MySQL-001|vg_meps01|h213a|DIABEQUIP|tinyint||Column|Indicates presence of diabetic equipment/supplies (1) or other items (2).
+MySQL-001|vg_meps01|h213a|DRUGIDX|bigint||Column|Unique identifier combining person ID and a counter for drug records.
+MySQL-001|vg_meps01|h213a|DUID|int||Column|Panel number concatenated with encrypted household identifier.
+MySQL-001|vg_meps01|h213a|DUPERSID|bigint||Column|Unique identifier combining DUID and PID.
+MySQL-001|vg_meps01|h213a|IMPFLAG|tinyint||Column|Indicates the method of expenditure data creation.
+MySQL-001|vg_meps01|h213a|INPCFLG|int||Column|Indicates presence in 2019 Prescribed Medicines file: 0=Not present, 1=Present.
+MySQL-001|vg_meps01|h213a|LINKIDX|bigint||Column|Unique identifier for linking to condition or other event files.
+MySQL-001|vg_meps01|h213a|PANEL|tinyint||Column|Number identifying the survey panel (1 to 6) in the MEPS HC-213A 2019 dataset.
+MySQL-001|vg_meps01|h213a|PCIMPFLG|tinyint||Column|Type of Health Component (HC) to Pharmacy Component (PC) prescription match.
+MySQL-001|vg_meps01|h213a|PERWT19F|float||Column|Weighting factor for 2019 expenditure data; values range from 633.006358 to 77911.063903.
+MySQL-001|vg_meps01|h213a|PHARTP1|smallint||Column|Defines the type of pharmacy provider for the first prescription: mail-order, in-store, HMO/clinic/hospital, online, etc.
+MySQL-001|vg_meps01|h213a|PHARTP2|smallint||Column|Categorical variable defining type of pharmacy provider for the 2nd instance. Values: 1=MAIL-ORDER, 2=IN ANOTHER STORE, 3=IN HMO/CLINIC/HOSPITAL, 4=DRUG STORE, 5=ON-LINE.
+MySQL-001|vg_meps01|h213a|PHARTP3|smallint||Column|Type of pharmacy provision (3rd occurrence), coded as: 1=Mail-order, 2=Another store, 3=HMO/clinic/hospital, 4=Drug store, 5=Online.
+MySQL-001|vg_meps01|h213a|PHARTP4|smallint||Column|Identifies the 4th type of pharmacy provider (e.g., mail-order, drug store). Values include: 1=mail-order, 2=in another store, 3=in HMO/clinic/hospital, 4=drug store, 5=online
+MySQL-001|vg_meps01|h213a|PHARTP5|smallint||Column|Type of pharmacy provider: 5th instance; values include mail-order, other store, HMO/clinic/hospital, drug store.
+MySQL-001|vg_meps01|h213a|PHARTP6|smallint||Column|Type of pharmacy provider - 6th: Mail-order, in another store, HMO/clinic/hospital, drug store, or online.
+MySQL-001|vg_meps01|h213a|PHARTP7|smallint||Column|Type of pharmacy (7th): 1=Mail-order, 3=HMO/clinic/hospital, 4=Drug store.
+MySQL-001|vg_meps01|h213a|PID|int||Column|Unique identifier for a person within a household, ranging from 101 to 503.
+MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar||Column|Pregnancy category of the medication; values include A, B, C, D, X, N, and -15 (cannot be computed).
+MySQL-001|vg_meps01|h213a|PURCHRD|tinyint||Column|Round in which the prescribed medicine was obtained or purchased.
+MySQL-001|vg_meps01|h213a|RXBEGMM|smallint||Column|Month the person started taking medication (format: 3.0 Num).
+MySQL-001|vg_meps01|h213a|RXBEGYRX|smallint||Column|Imputed year prescription began; weighted by PERWT19F.
+MySQL-001|vg_meps01|h213a|RXDAYSUP|smallint||Column|Imputed number of days supplied for prescribed medication.
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar||Column|Multum medication name, imputed, 60-character alphanumeric string.
+MySQL-001|vg_meps01|h213a|RXFLG|tinyint||Column|NDC imputation source from PC donor record; indicates data origin.
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar||Column|Char (50.0): Dosage form of medication, imputed, weighted by PERWT19F.
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar||Column|Unit of medication quantity, imputed; values include alphanumeric characters, "DK", or "CANNOT BE COMPUTED".
+MySQL-001|vg_meps01|h213a|RXMD19X|float||Column|Amount paid by Medicaid for prescribed medications (imputed).
+MySQL-001|vg_meps01|h213a|RXMR19X|float||Column|Amount paid by Medicare (imputed) in dollars, ranges from $0.01 to $23,136.88
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar||Column|60.0-character imputed field capturing medicine names in A-Z format; values may be missing (indicated by "-15").
+MySQL-001|vg_meps01|h213a|RXNDC|bigint||Column|Unique National Drug Code identifier for prescribed medications.
+MySQL-001|vg_meps01|h213a|RXOF19X|float||Column|Amount paid by other federal sources for prescribed medicines, imputed.
+MySQL-001|vg_meps01|h213a|RXOT19X|float||Column|Amount paid by other insurance, imputed; numeric format (7.2); values range from $0.20 to $4,229.30.
+MySQL-001|vg_meps01|h213a|RXPV19X|float||Column|Amount paid by private insurance for prescribed medicines, in dollars (imputed).
+MySQL-001|vg_meps01|h213a|RXQUANTY|float||Column|Imputed numeric field (7.2) representing prescribed medication quantity; range 0.05 to 6399.
+MySQL-001|vg_meps01|h213a|RXRECIDX|bigint||Column|Unique identifier for prescribed medications, 19-character alphanumeric with imputed values.
+MySQL-001|vg_meps01|h213a|RXSF19X|float||Column|Amount paid by self or family for prescribed medicines (imputed).
+MySQL-001|vg_meps01|h213a|RXSL19X|float||Column|Amount paid by state and local government for prescribed medicines (imputed), 8.2 numeric format.
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar||Column|Imputed strength of medication, 50-character format, with values ranging from 'A-Z' or undefined ('-15 CANNOT BE COMPUTED').
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar||Column|Unit of medication strength, imputed, with values ranging from A-Z and -15 for "CANNOT BE COMPUTED".
+MySQL-001|vg_meps01|h213a|RXTR19X|float||Column|Amount paid by TRICARE, imputed, ranging from $0.03 to $20,484.48.
+MySQL-001|vg_meps01|h213a|RXVA19X|float||Column|Amount paid by Veterans/CHAMPVA, imputed (numeric, 7.2).
+MySQL-001|vg_meps01|h213a|RXWC19X|float||Column|Amount paid by Workers Compensation (imputed), numeric value.
+MySQL-001|vg_meps01|h213a|RXXP19X|float||Column|Sum of payments from all sources (RXSF19X through RXOT19X) for the purchase (imputed).
+MySQL-001|vg_meps01|h213a|TC1|smallint||Column|Multum therapeutic class #1.
+MySQL-001|vg_meps01|h213a|TC1S1|smallint||Column|Therapeutic subclass #1 for therapeutic class #1, defining specific drug categories.
+MySQL-001|vg_meps01|h213a|TC1S1_1|smallint||Column|Multum therapeutic sub-sub-class for TC1S1.
+MySQL-001|vg_meps01|h213a|TC1S1_2|smallint||Column|Multum therapeutic sub-sub-class for therapeutic subclass #1 (TC1S1).
+MySQL-001|vg_meps01|h213a|TC1S2|smallint||Column|Therapeutic subclass #2 under therapeutic class C1.
+MySQL-001|vg_meps01|h213a|TC1S2_1|smallint||Column|Multum therapeutic sub-sub-class for therapeutic subclass #2 (TC1S2).
+MySQL-001|vg_meps01|h213a|TC1S3|smallint||Column|Therapeutic subclass #3 for therapeutic class C1, values: -1 (inapplicable), 67 (anxiolytics, sedatives, hypnotics).
+MySQL-001|vg_meps01|h213a|TC1S3_1|smallint||Column|Multum therapeutic sub-sub-class for therapeutic sub-class #3, with defined values.
+MySQL-001|vg_meps01|h213a|TC2|smallint||Column|Multum therapeutic class #2.
+MySQL-001|vg_meps01|h213a|TC2S1|smallint||Column|Therapeutic Subclass #1 for Class #2 in Multum Classification System.
+MySQL-001|vg_meps01|h213a|TC2S1_1|smallint||Column|Multum therapeutic sub-sub-class for therapeutic sub-class #1 of class #2.
+MySQL-001|vg_meps01|h213a|TC2S1_2|smallint||Column|Sub-subclass for therapeutic class #2, subclass #1.
+MySQL-001|vg_meps01|h213a|TC2S2|smallint||Column|Therapeutic subclass #2 for class #2; values include INAPPLICABLE and ANTIPSORIATICS.
+MySQL-001|vg_meps01|h213a|TC3|smallint||Column|Therapeutic class #3, primarily includes nutritional and immunologic agents.
+MySQL-001|vg_meps01|h213a|TC3S1|smallint||Column|Therapeutic subclass #1 for therapeutic class #3 (immunosuppressive agents, minerals, and electrolytes).
+MySQL-001|vg_meps01|h213a|TC3S1_1|smallint||Column|Sub-subclass for Multum therapeutic classification #1 under class #3.
+MySQL-001|vg_meps01|h213a|VARPSU|int||Column|Primary sampling unit (PSU) identifier for survey data analysis.
+MySQL-001|vg_meps01|h213a|VARSTR|smallint||Column|A 4-digit numerical code representing variance estimation strata for 2019.
+MySQL-001|vg_meps01|h213a||||Table|This is the pharmacy utilization table. Use DUPERSID for unique person identifier. This table can have multiple rows per identifier.
+MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|N|Value|
+MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|C|Value|
+MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|B|Value|
+MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|D|Value|
+MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|X|Value|
+MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|-15|Value|
+MySQL-001|vg_meps01|h213a|PREGCAT|nvarchar|A|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|ATORVASTATIN|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|LEVOTHYROXINE|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|LISINOPRIL|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|METFORMIN|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|AMLODIPINE|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|METOPROLOL|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|-15|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|ALBUTEROL|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|OMEPRAZOLE|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|LOSARTAN|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|GABAPENTIN|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|HYDROCHLOROTHIAZIDE|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|SIMVASTATIN|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|SERTRALINE|Value|
+MySQL-001|vg_meps01|h213a|RXDRGNAM|nvarchar|ACETAMINOPHEN-HYDROCODONE|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TABS|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|CAPS|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|-15|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TB24|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|SOLN|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TAB|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|CPDR|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|AERS|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TBEC|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|SOPN|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|CP24|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|SUSP|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|TBCR|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|STRP|Value|
+MySQL-001|vg_meps01|h213a|RXFORM|nvarchar|MISC|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|EA|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|-15|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|ML|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|GM|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|G|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|-8|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|test strips|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|Pen needles|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|lancets|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|OTHER|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|SRN|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|INH|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|GELC|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|Device|Value|
+MySQL-001|vg_meps01|h213a|RXFRMUNT|nvarchar|NDL|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|ATORVASTATIN|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|LISINOPRIL|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|METFORMIN|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|LEVOTHYROXIN|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|AMLODIPINE|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|OMEPRAZOLE|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|LOSARTAN POT|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|GABAPENTIN|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|HYDROCHLOROT|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|SIMVASTATIN|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|ALBUTEROL|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|SERTRALINE|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|METOPROL TAR|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|METOPROL SUC|Value|
+MySQL-001|vg_meps01|h213a|RXNAME|nvarchar|FUROSEMIDE|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|-15|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|10|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|20|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|100|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|40|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|50|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|5|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|25|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|500|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|1|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|300|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|75|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|108|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|150|Value|
+MySQL-001|vg_meps01|h213a|RXSTRENG|nvarchar|200|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|-15|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MCG|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|%|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MCG/ACT|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG/ML|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|UNIT/ML|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|UNIT|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MEQ|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|GM|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG/MG|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG/24HR|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MG/GM|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MCG/ML|Value|
+MySQL-001|vg_meps01|h213a|RXSTRUNT|nvarchar|MCG/HR|Value|
+MySQL-001|vg_meps01|h214|ACCDNWRK|smallint||Column|Indicates if an accident occurred at work: 1=Yes, 2=No, 3=Does not work.
+MySQL-001|vg_meps01|h214|AGEDIAG|smallint||Column|Age at which the condition was diagnosed, ranging from 0 to 85 years.
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar||Column|Clinical classification refined code 1, edited (e.g., BLD000, CIR000).
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar||Column|Secondary edited clinical classification refined code for medical conditions.
+MySQL-001|vg_meps01|h214|CCSR3X|nvarchar||Column|Refined clinical classification code, third edited version, with defined values (e.g., INF000 - INF011).
+MySQL-001|vg_meps01|h214|CONDIDX|bigint||Column|Unique identifier for a specific medical condition in the dataset.
+MySQL-001|vg_meps01|h214|CONDN|int||Column|Unique identifier for a medical condition, ranging from 1 to 54.
+MySQL-001|vg_meps01|h214|CONDRN|int||Column|Round number in which the medical condition was first reported (values: 1 to 5).
+MySQL-001|vg_meps01|h214|CRND1|smallint||Column|Indicates presence of condition information in Round 1 with values: -1 (inapplicable), 0 (no), 1 (yes).
+MySQL-001|vg_meps01|h214|CRND2|smallint||Column|Indicates presence of condition information in Round 2 (-1: inapplicable, 0: no, 1: yes)
+MySQL-001|vg_meps01|h214|CRND3|int||Column|Values: 0 (NO), 1 (YES) - indicates condition information presence in Round 3.
+MySQL-001|vg_meps01|h214|CRND4|int||Column|Indicates presence of condition information in Round 4: -1=Inapplicable, 0=No, 1=Yes.
+MySQL-001|vg_meps01|h214|CRND5|int||Column|Indicates whether condition information is available in Round 5 (1=Yes, 0=No).
+MySQL-001|vg_meps01|h214|DUID|int||Column|Panel number concatenated with encrypted household identifier.
+MySQL-001|vg_meps01|h214|DUPERSID|bigint||Column|Unique identifier combining DUID and PID for individuals in the MEPS dataset.
+MySQL-001|vg_meps01|h214|ERNUM|int||Column|Number of emergency room events associated with a condition.
+MySQL-001|vg_meps01|h214|HHNUM|int||Column|Number of home health events associated with a condition, ranging from 0 to 60.
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar||Column|Represents the ICD-10-CM code for a medical condition, formatted as 3-character string.
+MySQL-001|vg_meps01|h214|INJURY|int||Column|Binary indicator (1=yes, 2=no) if condition resulted from accident/injury.
+MySQL-001|vg_meps01|h214|IPNUM|int||Column|Number of inpatient events associated with the condition, ranging from 0 to 6.
+MySQL-001|vg_meps01|h214|OBNUM|int||Column|Number of office-based medical events associated with a condition, range 0-499.
+MySQL-001|vg_meps01|h214|OPNUM|int||Column|Number of outpatient events associated with a condition (0 to 141).
+MySQL-001|vg_meps01|h214|PANEL|int||Column|Identifies the survey panel number (e.g., 23 or 24) for data collection.
+MySQL-001|vg_meps01|h214|PERWT19F|float||Column|Weight for 2019 expenditure file, used for survey weighting.
+MySQL-001|vg_meps01|h214|PID|int||Column|Unique identifier for an individual within a panel.
+MySQL-001|vg_meps01|h214|RXNUM|int||Column|Number of prescribed medicines associated with a condition.
+MySQL-001|vg_meps01|h214|VARPSU|int||Column|Variance estimation primary sampling unit (PSU), specific to 2019 data.
+MySQL-001|vg_meps01|h214|VARSTR|smallint||Column|A stratum identifier used for variance estimation in the 2019 dataset.
+MySQL-001|vg_meps01|h214||||Table|Use DUPERSID for unique person identifier
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|CIR007|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|END010|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MUS010|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|END002|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MBD005|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|END001|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MBD002|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MUS006|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|RSP009|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|DIG004|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|NVS016|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|SYM016|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|MUS011|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|INJ067|Value|
+MySQL-001|vg_meps01|h214|CCSR1X|nvarchar|RSP006|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|-1|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|END005|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|MUS038|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|RSP006|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|RSP007|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|SKN007|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|SYM010|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|INF008|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|NVS016|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|SKN005|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|FAC009|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|SKN002|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|INF003|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|INF009|Value|
+MySQL-001|vg_meps01|h214|CCSR2X|nvarchar|INF000|Value|
+MySQL-001|vg_meps01|h214|CCSR3X|nvarchar|-1|Value|
+MySQL-001|vg_meps01|h214|CCSR3X|nvarchar|INF004|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|I10|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|-15|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|E78|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|E11|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|F41|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|M25|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|F32|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|J45|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|M19|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|M54|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|K21|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|G47|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|T78|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|M79|Value|
+MySQL-001|vg_meps01|h214|ICD10CDX|nvarchar|E07|Value|
+MySQL-001|vg_meps01|h216_narrow|ACTLIM31|smallint||Column|Indicates any limitation in work, housework, or school activities. Values: 1 = Yes, 2 = No.
+MySQL-001|vg_meps01|h216_narrow|ADACLS42|smallint||Column|Frequency of feeling accomplished less due to physical problems in the past 4 weeks (VR-12).
+MySQL-001|vg_meps01|h216_narrow|ADCLIM42|smallint||Column|Health limits climbing stairs (VR-12 scale, past 4 weeks).
+MySQL-001|vg_meps01|h216_narrow|ADDAYA42|smallint||Column|Delay in medical care due to cost (1=Yes, 2=No, -8=Don't Know, -7=Refused, -1=Inapplicable).
+MySQL-001|vg_meps01|h216_narrow|ADDPRS42|smallint||Column|Indicates if the respondent was a proxy in Round 4/2 (1 = RU member, 2 = proxy).
+MySQL-001|vg_meps01|h216_narrow|ADEFRT42|smallint||Column|Frequency of feeling everything required effort in the past 30 days (1=All, 6=None).
+MySQL-001|vg_meps01|h216_narrow|ADEMLS42|smallint||Column|Accomplished less in past 4 weeks due to mental health problems (VR-12 scale).
+MySQL-001|vg_meps01|h216_narrow|ADENGY42|smallint||Column|Value indicating dental care coverage by Indian Health Service during reference period R4/2 (2 = No, 1 = Yes).
+MySQL-001|vg_meps01|h216_narrow|ADGENH42|smallint||Column|Self-rated general health.
+MySQL-001|vg_meps01|h216_narrow|ADHDADDX|smallint||Column|Diagnosis of ADHD or ADD (1 = Yes, 0 = No).
+MySQL-001|vg_meps01|h216_narrow|ADHDAGED|smallint||Column|Age at diagnosis of ADHD, ranging from 0 to 17 years.
+MySQL-001|vg_meps01|h216_narrow|ADHOPE42|smallint||Column|Frequency of feeling hopeless in the past 30 days (self-reported).
+MySQL-001|vg_meps01|h216_narrow|ADINSA42|smallint||Column|Variance estimation stratum identifier for 2019, numeric values ranging from 2001 to 2117.
+MySQL-001|vg_meps01|h216_narrow|ADINSB42|smallint||Column|2.0 Num; -8=Don't Know, -7=Refused, -1=Inapplicable, 1-5=Excellent to Poor health status evaluation.
+MySQL-001|vg_meps01|h216_narrow|ADINTR42|smallint||Column|Frequency of having little interest or pleasure in doing things in the past 2 weeks (PHQ-2 item, Round 4/2).
+MySQL-001|vg_meps01|h216_narrow|ADLHLP31|smallint||Column|ADL (Activities of Daily Living) screener indicating assistance needed for daily tasks (Round 3/1).
+MySQL-001|vg_meps01|h216_narrow|ADMWCF42|smallint||Column|Frequency of feeling worthless in the past 30 days (self-reported).
+MySQL-001|vg_meps01|h216_narrow|ADNERV42|smallint||Column|Frequency of feeling nervous in the past 30 days, measured on a scale.
+MySQL-001|vg_meps01|h216_narrow|ADOVER42|smallint||Column|Belief in overcoming illnesses without medical assistance (self-assessment).
+MySQL-001|vg_meps01|h216_narrow|ADPAIN42|smallint||Column|Frequency of pain limiting normal work in the past 4 weeks (VR-12 scale).
+MySQL-001|vg_meps01|h216_narrow|ADPCFL42|smallint||Column|Frequency of feeling calm/peaceful in the past 4 weeks (VR-12 scale; values 1-5 = frequency scale).
+MySQL-001|vg_meps01|h216_narrow|ADPROX42|smallint||Column|Relationship of proxy respondent to adult (e.g., parent, spouse, other).
+MySQL-001|vg_meps01|h216_narrow|ADPRST42|smallint||Column|Frequency of feeling downhearted/depressed in the past 4 weeks (VR-12 scale).
+MySQL-001|vg_meps01|h216_narrow|ADREST42|smallint||Column|Frequency of feeling restless in the past 30 days.
+MySQL-001|vg_meps01|h216_narrow|ADRISK42|smallint||Column|Likelihood of taking risks, self-assessed on a scale.
+MySQL-001|vg_meps01|h216_narrow|ADSAD42|smallint||Column|Frequency of feeling sad in the past 30 days (self-reported).
+MySQL-001|vg_meps01|h216_narrow|ADSOCA42|smallint||Column|Health stopped social activities in past 4 weeks (VR-12 scale).
+MySQL-001|vg_meps01|h216_narrow|ADUPRO42|smallint||Column|Problem getting along with adults (ages 5-17), Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|ADWKLM42|smallint||Column|Work limitations in past 4 weeks due to physical problems (VR-12 scale).
+MySQL-001|vg_meps01|h216_narrow|ADWRTH42|smallint||Column|Frequency of feeling worthless in the past 30 days (self-reported).
+MySQL-001|vg_meps01|h216_narrow|AGELAST|int||Column|Person's age in years as of the last date of eligibility in 2019 (top-coded at 85).
+MySQL-001|vg_meps01|h216_narrow|AIDHLP31|smallint||Column|ADL (Activities of Daily Living) assistance indicator for Round 3/1 (1 = Yes, 0 = No).
+MySQL-001|vg_meps01|h216_narrow|ANYLMI19|smallint||Column|Any limitation at any time during 2019. Values: 1 = Yes, 2 = No.
+MySQL-001|vg_meps01|h216_narrow|ARTHAGED|smallint||Column|Age at arthritis diagnosis (in years, 0-85); negative values are reserved codes (-8 = DK, -7 = Refused, -1 = Inapplicable).
+MySQL-001|vg_meps01|h216_narrow|ARTHDX|smallint||Column|Arthritis diagnosis (>17); values: 1 = Yes, 2 = No, -8 = DK, -7 = Refused, -1 = Inapplicable.
+MySQL-001|vg_meps01|h216_narrow|ARTHTYPE|smallint||Column|Classification of person's arthritis type; values include specific diagnostic types or -1 for "Inapplicable".
+MySQL-001|vg_meps01|h216_narrow|ASACUT31|smallint||Column|Indicates military full-time active duty status in Round 3/1; values: -1 (inapplicable), 1 (active duty), 0 (not active duty).
+MySQL-001|vg_meps01|h216_narrow|ASATAK31|smallint||Column|Asthma attack in the past 12 months (Round 3/1); values: -15 = Cannot compute, -8 = Don't know, -7 = Refused, -1 = Inapplicable, 1 = Yes, 2 = No.
+MySQL-001|vg_meps01|h216_narrow|ASDALY31|smallint||Column|Indicates if respondent currently takes preventive daily asthma medications. Values: -8=DK, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|ASEVFL31|smallint||Column|Numeric variable; values: -15 = Cannot be computed, -1 = Inapplicable, 1-4 = specific categorical responses.
+MySQL-001|vg_meps01|h216_narrow|ASMRCN31|smallint||Column|12 months, asked non-VA provider to recommend other doctors: -15 Cannot compute, -1 Inapplicable, 1 Yes, 2 No.
+MySQL-001|vg_meps01|h216_narrow|ASPKFL31|smallint||Column|Variable describing whether the respondent asked for prescription refills in Round 3/1: -1 = Inapplicable; 1 = Yes; 2 = No.
+MySQL-001|vg_meps01|h216_narrow|ASPREV31|smallint||Column|Awareness of VA services by a VA provider (Values: -1 Inapplicable; 1 Yes; 2 No).
+MySQL-001|vg_meps01|h216_narrow|ASSTIL31|smallint||Column|Indicates if assistive technology for daily activities was received; values: -15, -1, 1-Yes.
+MySQL-001|vg_meps01|h216_narrow|ASTHAGED|smallint||Column|Age at asthma diagnosis; values: -15 Cannot Compute, -8 DK, -7 Refused, -1 Inapplicable, 0-85 years.
+MySQL-001|vg_meps01|h216_narrow|ASTHDX|smallint||Column|Indicates asthma diagnosis status; values: -8 DK, -7 REFUSED, -1 INAPPLICABLE, 1 YES, 2 NO.
+MySQL-001|vg_meps01|h216_narrow|ASTHEP31|smallint||Column|Numeric; -15 Cannot compute, 1-3 perception of provider's help improving/maintaining health; -1 inapplicable.
+MySQL-001|vg_meps01|h216_narrow|ASWNFL31|smallint||Column|3.0 Num; Values: -15=Cannot compute, -1=Inapplicable, 1-4 different sources/no caregiving service received; Total=28,512.
+MySQL-001|vg_meps01|h216_narrow|BENDIF31|smallint||Column|Difficulty bending or stooping (Round 3/1); -8: DK, -7: Refused, -1: Inapplicable, 1: No difficulty, 2: Some, 3: A lot, 4: Unable.
+MySQL-001|vg_meps01|h216_narrow|CABLADDR|smallint||Column|Bladder cancer diagnosis for individuals older than 17; values: -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|CABREAST|smallint||Column|Breast cancer diagnosis for individuals older than 17; values: -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|CACERVIX|smallint||Column|Diagnosed cervical cancer status for individuals older than 17; values: -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|CACOLON|smallint||Column|Colon cancer diagnosis for individuals older than 17.
+MySQL-001|vg_meps01|h216_narrow|CALUNG|smallint||Column|Diagnosed lung cancer, ages 18+; valid values: 1 (yes), 2 (no). Format: Numeric.
+MySQL-001|vg_meps01|h216_narrow|CALYMPH|smallint||Column|Cancer diagnosis for non-Hodgkin's lymphoma in individuals >17 years old; values: -8=DK, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|CAMELANO|smallint||Column|Cancer diagnosed as melanoma of the skin for individuals older than 17.
+MySQL-001|vg_meps01|h216_narrow|CANCERDX|smallint||Column|Indicates cancer diagnosis (>17); values: -15 Cannot Compute, -8 DK, -7 Refused, -1 Inapplicable, 1 Yes, 2 No.
+MySQL-001|vg_meps01|h216_narrow|CAOTHER|smallint||Column|Diagnosis of another (unlisted) type of cancer for individuals older than 17.
+MySQL-001|vg_meps01|h216_narrow|CAPROSTA|smallint||Column|Prostate cancer diagnosis status (>17); -1 INAPPLICABLE, -7 REFUSED, -8 DK, 1 YES, 2 NO.
+MySQL-001|vg_meps01|h216_narrow|CASKINDK|smallint||Column|Diagnosis of skin cancer of unknown type for individuals over 17 years old.
+MySQL-001|vg_meps01|h216_narrow|CASKINNM|smallint||Column|Indicates diagnosis of non-melanoma skin cancer for individuals over 17 years old.
+MySQL-001|vg_meps01|h216_narrow|CAUTERUS|smallint||Column|Uterine cancer diagnosis for individuals older than 17.
+MySQL-001|vg_meps01|h216_narrow|CHAPPT42|smallint||Column|Child's appointment availability within last 12 months (0-17 years), during round 4/2 interview timeframe.
+MySQL-001|vg_meps01|h216_narrow|CHCOUN42|smallint||Column|Numeric; response indicates frequency child's doctor explained things clearly in past 12 months (0-17): -1 (Inapplicable), -8 (Don't Know), 1 (Never), 2 (Sometimes), 3 (Usually), 4 (Always).
+MySQL-001|vg_meps01|h216_narrow|CHEMPB42|smallint||Column|Child required prescription medication for health/behavioral condition in past 12 months (0-17).
+MySQL-001|vg_meps01|h216_narrow|CHEXPL42|smallint||Column|CAHPS: Past 12 months, child's doctor explained things clearly for children aged 0-17 in Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|CHEYRE42_M18|smallint||Column|CAHPS: Past 12 months appointment for child specialist (ages 0-17); linked to Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|CHHECR42|smallint||Column|Rating of child healthcare quality (0-10): -1 Inapplicable, -8 Don't know, 0 Worst, 10 Best, others intermediate.
+MySQL-001|vg_meps01|h216_narrow|CHILCR42|smallint||Column|Indicates if the person has children; -8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|CHILWW42|smallint||Column|Child received needed care for illness/injury when wanted in past 12 months; values: -8=DK, -1=Inapplicable, 1=Never, 2=Sometimes, 3=Usually, 4=Always.
+MySQL-001|vg_meps01|h216_narrow|CHLICO42|smallint||Column|CSHCN: Ongoing condition limits child's usual activities for 12+ months (0-17 years).
+MySQL-001|vg_meps01|h216_narrow|CHLIHB42|smallint||Column|Binary variable indicating if a child's health/behavior condition limited usual activities (-1=INAPPLICABLE, 1=YES, 2=NO).
+MySQL-001|vg_meps01|h216_narrow|CHLIMI42|smallint||Column|CSHCN: Indicates if the child (0-17) is limited in any way due to health/behavioral conditions in round 4/2.
+MySQL-001|vg_meps01|h216_narrow|CHLIST42|smallint||Column|CAHPS: Child's doctor listened in past 12 months (0-17 years). Values: 1 = Yes, 2 = No, -1 = Inapplicable.
+MySQL-001|vg_meps01|h216_narrow|CHPMCN42|smallint||Column|Indicator if a child had a medical condition requiring prescription medication for 12+ months (ages 0-17).
+MySQL-001|vg_meps01|h216_narrow|CHPMED42|smallint||Column|CSHCN: Indicates if child needs prescribed medication; values: 1=Yes, 2=No, -1=Inapplicable, -7=Refused, -8=DK.
+MySQL-001|vg_meps01|h216_narrow|CHPMHB42|smallint||Column|Prescribed medication for child's health/behavior condition in last 12 months. Values: -1 Inapplicable, 1 Yes, 2 No.
+MySQL-001|vg_meps01|h216_narrow|CHPRTM42|smallint||Column|Time spent by child's doctor during last 12 months (0-17 years), as perceived in Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|CHRESP42|smallint||Column|Indicates how often the child's doctor showed responsiveness (0=Never, 4=Always, -8=Don't know, -1=Inapplicable).
+MySQL-001|vg_meps01|h216_narrow|CHRTCR42|smallint||Column|CAHPS: 12-month measure assessing whether a routine care appointment for children (0-17) could be scheduled ("Yes/No").
+MySQL-001|vg_meps01|h216_narrow|CHRTWW42|smallint||Column|Child's ability to get a routine appointment when wanted within the last 12 months: -8=Don't know, -7=Refused, -1=Inapplicable, 1=Never, 2=Sometimes, 3=Usually, 4=Always.
+MySQL-001|vg_meps01|h216_narrow|CHSERV42|smallint||Column|Indicates if child needed medical/other services (0-17) in the last 12 months during round 4/2.
+MySQL-001|vg_meps01|h216_narrow|CHSPEC42_M18|smallint||Column|Appointment scheduling with a specialist for children (ages 0-17) in the last 12 months; CAHPS survey.
+MySQL-001|vg_meps01|h216_narrow|CHSRCN42|smallint||Column|CSHCN: Child's service needs for condition lasting 12+ months (0-17 years), specific to Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|CHSRHB42|smallint||Column|CSHCN: services needed for a health/behavioral condition (0-17), Round 4/2; 2.0 Num. Values: -8=Don't Know, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|CHTHCO42|smallint||Column|CSHCN: condition needing therapy lasts 12+ months (0-17), Round 4/2; Numeric: -8=DK, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|CHTHER42|smallint||Column|CSHCN: Child needs specific therapy (0-17), values: -15(Cannot Compute), -8(Don't Know), -7(Refused), -1(Inapplicable), 1(Yes), 2(No).
+MySQL-001|vg_meps01|h216_narrow|CHTHHB42|smallint||Column|CSHCN: special therapy for a health condition (0-17), Round 4/2; values: -15 Cannot compute, -8 DK, -7 Refused, -1 Inapplicable, 1 Yes.
+MySQL-001|vg_meps01|h216_narrow|COGLIM31|smallint||Column|Cognitive limitations at any time in Round 3/1; values: -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|CSHCN42|smallint||Column|Child with special health care needs (CSHCN), aged 0-17 in Round 4/2 interview.
+MySQL-001|vg_meps01|h216_narrow|DADPRO42|smallint||Column|Problems getting along with father (ages 5-17); values: -15 Cannot compute, -8 DK, -7 Refused, 0 No problem, 1-4 Severity.
+MySQL-001|vg_meps01|h216_narrow|DFCOG42|smallint||Column|Cognitive difficulties in reasoning, learning, or remembering; values: -8 (Don't Know), -7 (Refused), -1 (Inapplicable), 1 (Yes), 2 (No).
+MySQL-001|vg_meps01|h216_narrow|DFDRSB42|smallint||Column|Indicator for difficulty dressing/bathing: -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|DFERND42|smallint||Column|Difficulty doing errands alone (round 4/2). Values: -8: Don't know, -7: Refused, -1: Inapplicable, 1: Yes, 2: No.
+MySQL-001|vg_meps01|h216_narrow|DFHEAR42|smallint||Column|Indicates serious difficulty hearing: -1 Inapplicable, 1 Yes, 2 No, -7 Refused, -8 Don't know.
+MySQL-001|vg_meps01|h216_narrow|DFSEE42|smallint||Column|Serious difficulty seeing even with glasses during reporting round 4/2; numeric format, coded -1, 1, or 2.
+MySQL-001|vg_meps01|h216_narrow|DFWLKC42|smallint||Column|Serious difficulty walking or climbing stairs (Round 4/2): -8=Don't Know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|DIABAGED|smallint||Column|Age at diabetes diagnosis; values: -8 (Don't know), -7 (Refused), -1 (Inapplicable), 0-85 (Age in years).
+MySQL-001|vg_meps01|h216_narrow|DIABDX_M18|smallint||Column|Indicator of diabetes diagnosis status, coded as: 1 (Diagnosed), 2 (Not Diagnosed), -1 (Inapplicable). Num format, length 2.
+MySQL-001|vg_meps01|h216_narrow|DOBMM|int||Column|Month of birth, numeric format ranging from 01 (January) to 12 (December)
+MySQL-001|vg_meps01|h216_narrow|DOBYY|smallint||Column|Year of birth (four-digit year); negative values are reserved codes: -15 (cannot compute), -8 (don't know), -7 (refused), -1 (inapplicable).
+MySQL-001|vg_meps01|h216_narrow|DUID|bigint||Column|Dwelling unit (household) identifier; combined with PID (person number) it forms DUPERSID.
+MySQL-001|vg_meps01|h216_narrow|DUPERSID|bigint||Column|Unique identifier assigned to each person-level record in the dataset.
+MySQL-001|vg_meps01|h216_narrow|ELGRND19|int||Column|Indicator for electric coverage in 2019; values: -1=Inapplicable, 1=Yes, 2=No. Numeric, starts at position 2065.
+MySQL-001|vg_meps01|h216_narrow|FAMINC19|int||Column|Total family income in 2019; numeric monetary value representing cumulative income of all family members.
+MySQL-001|vg_meps01|h216_narrow|FNGRDF31|smallint||Column|Difficulty using fingers to grasp (Round 3/1); -8: DK, -7: Refused, -1: Inapplicable, 1: No difficulty, 2: Some, 3: A lot, 4: Unable.
+MySQL-001|vg_meps01|h216_narrow|GETTRB42|smallint||Column|Problem with getting into trouble (ages 5-17), Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|HAVFUN42|smallint||Column|Problem having fun (ages 5-17), Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|HISPANX|int||Column|Numeric, 1=Hispanic, 2=Non-Hispanic, -1=Inapplicable; demographic identifier weighted by PERWT19F.
+MySQL-001|vg_meps01|h216_narrow|HOMEBH42|smallint||Column|Reports problems with behavior at home for individuals aged 5-17 during round 4/2 interview period.
+MySQL-001|vg_meps01|h216_narrow|HSELIM31|smallint||Column|Health condition impacts ability to perform daily living tasks/household activities - Round 3/1
+MySQL-001|vg_meps01|h216_narrow|IADLHP31|smallint||Column|Identifies if the respondent requires help with Instrumental Activities of Daily Living (IADL) in round 3/1.
+MySQL-001|vg_meps01|h216_narrow|INSCOV19|int||Column|HEALTH INSURANCE COVERAGE INDICATOR 2019 (value of 1: Any Private, 2: Public Only, 3: Uninsured)
+MySQL-001|vg_meps01|h216_narrow|INSURC19|int||Column|FULL YEAR INSURANCE COVERAGE STATUS 2019 (value of 1: <65 Any Private, 2: <65 Public Only, 3: <65 Uninsured, 4-6: 65+ Medicare, 7-8: 65+ Uninsured)
+MySQL-001|vg_meps01|h216_narrow|JTPAIN31_M18|smallint||Column|JOINT PAIN LAST 12 MONTHS (>17) (value of 1: Yes, 2: No)
+MySQL-001|vg_meps01|h216_narrow|K6SUM42|smallint||Column|Summary measure of K6 psychological distress scale; range 0-24, with higher scores indicating greater psychological distress.
+MySQL-001|vg_meps01|h216_narrow|KIDPRO42|smallint||Column|Problem getting along with other kids (ages 5-17), Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|LFTDIF31|smallint||Column|Difficulty lifting 10 pounds - RD 3/1; -8: DK, -7: Refused, -1: Inapplicable, 1: No difficulty, 2: Some, 3: A lot, 4: Unable.
+MySQL-001|vg_meps01|h216_narrow|LSTETH53|smallint||Column|Last month/year the individual served in active-duty military, value -1=Inapplicable, 1-12=Month.
+MySQL-001|vg_meps01|h216_narrow|MILDIF31|smallint||Column|Difficulty walking a mile (Round 3/1). Values: -8=DK, -1=Inapplicable, 1=Yes, 2=No. Weighted by PERWT19F.
+MySQL-001|vg_meps01|h216_narrow|MOMPRO42|smallint||Column|Problem getting along with mother (ages 5-17), Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|NERVAF42|smallint||Column|Problem feeling nervous or afraid (ages 5-17), Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|OFTSMK53|smallint||Column|Frequency of smoking in round 5/3; values: -1 INAPPLICABLE, 1 DAILY, 2 SOME DAYS, 3 NOT AT ALL.
+MySQL-001|vg_meps01|h216_narrow|PANEL|int||Column|MEPS panel number; the 2019 full-year file combines Panel 23 and Panel 24.
+MySQL-001|vg_meps01|h216_narrow|PHQ242|smallint||Column|PHQ-2 depression screening score (past 2 weeks), the sum of the little-interest and feeling-down items; range 0-6.
+MySQL-001|vg_meps01|h216_narrow|PHYEXE53|smallint||Column|Physical exercise participation; values: -1 INAPPLICABLE, 1 YES, 2 NO.
+MySQL-001|vg_meps01|h216_narrow|PID|int||Column|Person Number
+MySQL-001|vg_meps01|h216_narrow|POVCAT19|int||Column|FAMILY INC AS % OF POVERTY LINE (value 1: Poor/Negative, 2: Near Poor, 3: Low Income, 4: Middle, 5: High Income)
+MySQL-001|vg_meps01|h216_narrow|POVLEV19|float||Column|FAMILY INC AS % OF POVERTY LINE - CONTINUOUS Variable
+MySQL-001|vg_meps01|h216_narrow|RACEV2X|int||Column|Race (value of 1: White, 2: Black, 3: Amer Indian, 4: Asian Indian, 5: Chinese, 6: Filipino, 10: Other Asian, 12: Multiple races)
+MySQL-001|vg_meps01|h216_narrow|RCHDIF31|smallint||Column|Difficulty reaching overhead (Round 3/1); -8: DK, -7: Refused, -1: Inapplicable, 1: No difficulty, 2: Some, 3: A lot, 4: Unable.
+MySQL-001|vg_meps01|h216_narrow|RXEXP19|int||Column|Total prescription expenses for 2019 in dollars, ranging from $0 to $234,970.
+MySQL-001|vg_meps01|h216_narrow|SAQELIG|tinyint||Column|Indicator of eligibility for the Self-Administered Questionnaire (SAQ); binary variable denoting participation status.
+MySQL-001|vg_meps01|h216_narrow|SCHLBH42|smallint||Column|Problem with behavior at school (ages 5-17), Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|SCHLIM31|smallint||Column|School limitation (Round 3/1); values: -1 = Inapplicable, 1 = Yes, 2 = No.
+MySQL-001|vg_meps01|h216_narrow|SCHPRO42|smallint||Column|Problem with schoolwork (ages 5-17), Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|SEX|int||Column|Biological sex: 1 = Male, 2 = Female
+MySQL-001|vg_meps01|h216_narrow|SIBPRO42|smallint||Column|Problem getting along with siblings (ages 5-17); coded values: -15, -8, -7, -1, 0, 1-3, 4, 99.
+MySQL-001|vg_meps01|h216_narrow|SOCLIM31|smallint||Column|Social limitations at any time in Round 3/1; values: -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|SPRPRO42|smallint||Column|Provider spent enough time with person during Round 4/2 interview: -8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No.
+MySQL-001|vg_meps01|h216_narrow|STNDIF31|smallint||Column|Difficulty standing for 20 minutes (Round 3/1); -8: DK, -7: Refused, -1: Inapplicable, 1: No difficulty, 2: Some, 3: A lot, 4: Unable.
+MySQL-001|vg_meps01|h216_narrow|STPDIF31|smallint||Column|Indicator of difficulty climbing steps (range: none to severe) during reference period (Round 3/1).
+MySQL-001|vg_meps01|h216_narrow|TOTEXP19|int||Column|Total health care expenditures in 2019; includes all sources of payment. Format: Numeric.
+MySQL-001|vg_meps01|h216_narrow|TOTSLF19|int||Column|Total amount paid out of pocket by the person or family for health care in 2019.
+MySQL-001|vg_meps01|h216_narrow|TOTTCH19|int||Column|Final total number of eligible persons for survey analysis in 2019 weighted by PERWT19F.
+MySQL-001|vg_meps01|h216_narrow|TTLP19X|int||Column|Total pre-tax income for 2019 in dollars (numeric, unweighted values range from -72406 to 72406).
+MySQL-001|vg_meps01|h216_narrow|UNABLE31|smallint||Column|Measures difficulty with activities in the past 31 days: -15 Cannot Compute, -1 Inapplicable, 1-3 Problem severity, 4 Very Big Problem.
+MySQL-001|vg_meps01|h216_narrow|UNHAP42|smallint||Column|Problem feeling unhappy or sad (ages 5-17), Round 4/2.
+MySQL-001|vg_meps01|h216_narrow|WAGEP19X|int||Column|Numeric; person's wage income in 2019; $0, $9-$303,114, weighted by PERWT19F; unweighted total: 28,512 observations.
+MySQL-001|vg_meps01|h216_narrow|WLKDIF31|smallint||Column|Difficulty walking: -8 DK, -1 Inapplicable, 1 No difficulty, 2 Some difficulty, 3 A lot of difficulty, 4 Unable to do.
+MySQL-001|vg_meps01|h216_narrow|WLKLIM31|smallint||Column|2.0 numeric: -8=Don't know, -1=Inapplicable, 1=Yes, 2=No. Indicates limitation in physical functioning (Round 3/1).
+MySQL-001|vg_meps01|h216_narrow|WRKLIM31|smallint||Column|Indicates work limitations (-8=Don't know, -7=Refused, -1=Inapplicable, 1=Yes, 2=No).
+MySQL-001|vg_meps01|h216_narrow|PERWT19F|float||Column|Person-level weight used to produce estimates for the U.S. civilian noninstitutionalized population.
+MySQL-001|vg_meps01|h216_narrow||||Table|This is the main table, has enrollment, expenditure, and key demographic information. Can be linked to pharmacy and medical tables if needed using DUPERSID.
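The table row above says `h216_narrow` links to pharmacy and medical tables through `DUPERSID` and that `PERWT19F` is the person weight; many columns also mix real answers with negative reserved codes (-1, -7, -8, -15). A minimal pandas sketch of those three ideas follows. All rows are made up for illustration, and the event table `fills` and its `RXEXP` column are hypothetical stand-ins for a pharmacy table, not real MEPS names. It produces point estimates only; proper standard errors would also need the survey design variables, which this sketch ignores.

```python
import pandas as pd

# Made-up rows shaped like h216_narrow (not real MEPS data).
persons = pd.DataFrame({
    "DUPERSID": [101, 102, 103, 104],
    "DIABDX_M18": [1, 2, -1, 1],  # 1 = Yes, 2 = No, negative = reserved code
    "PERWT19F": [1000.0, 3000.0, 500.0, 1000.0],
})

# Hypothetical event-level table keyed by DUPERSID (e.g., prescription fills).
fills = pd.DataFrame({
    "DUPERSID": [101, 101, 102, 104],
    "RXEXP": [20.0, 30.0, 5.0, 50.0],  # hypothetical column name
})


def weighted_prevalence(df: pd.DataFrame, flag: str, weight: str = "PERWT19F") -> float:
    """Weighted share of rows coded 1, after dropping negative reserved codes."""
    valid = df[df[flag] > 0]
    yes_weight = valid.loc[valid[flag] == 1, weight].sum()
    return float(yes_weight / valid[weight].sum())


# Roll events up to one row per person, then attach them to the person table.
per_person = fills.groupby("DUPERSID", as_index=False)["RXEXP"].sum()
linked = persons.merge(per_person, on="DUPERSID", how="left").fillna({"RXEXP": 0.0})

print(weighted_prevalence(persons, "DIABDX_M18"))  # 0.4
print(linked)
```

Filtering with `> 0` treats every non-positive value as missing, which matches the reserved-code convention in these rows; a column where zero is a legitimate answer (an age of 0, for example) would need an explicit list of reserved codes instead.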

vg_/csvs/flagged_examples.json ADDED

@@ -0,0 +1,7 @@

+[
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "Give me a bar plot of sales, by album, of U2\n",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\n# Query to find the ArtistId of U2\nquery_artist_id = \"SELECT ArtistId FROM Artist WHERE Name = 'U2'\"\nartist_id = pd.read_sql(query_artist_id, conn).iloc[0, 0]\n\n# Query to find the sales by album for U2\nquery_album_sales = \"\"\"\n    SELECT a.Title, SUM(il.Quantity) AS TotalSales\n    FROM Album a\n    JOIN Artist ar ON a.ArtistId = ar.ArtistId\n    JOIN Track t ON a.AlbumId = t.AlbumId\n    JOIN InvoiceLine il ON t.TrackId = il.TrackId\n    WHERE ar.ArtistId = %s\n    GROUP BY a.Title\n    ORDER BY SUM(il.Quantity) DESC\n\"\"\" % artist_id\n\n# Use pd.read_sql to execute the query and get the sales by album for U2\nalbum_sales = pd.read_sql(query_album_sales, conn)\n\n# Create a bar plot of sales by album using Matplotlib\nplt.figure(figsize=(10, 6))\nplt.bar(album_sales['Title'], album_sales['TotalSales'])\nplt.xlabel('Album Title')\nplt.ylabel('Total Sales')\nplt.title('Album Sales by U2')\nplt.xticks(rotation=90)\nplt.tight_layout()\nplt.show()"
+    }
+]
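The flagged example above builds its second query with `%` string formatting, splicing `artist_id` directly into the SQL text, and spends an extra round trip looking up the ID first. A sketch of the same aggregation with the artist name bound as a query parameter follows. It uses Python's built-in `sqlite3` with a few made-up rows standing in for the Chinook tables; the placeholder syntax depends on the driver's DB-API `paramstyle` (`sqlite3` uses `?`), so treat it as an assumption to adjust for the real connection.

```python
import sqlite3

import pandas as pd

# Tiny in-memory stand-in for the Chinook tables (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER);
CREATE TABLE InvoiceLine (InvoiceLineId INTEGER PRIMARY KEY, TrackId INTEGER, Quantity INTEGER);
INSERT INTO Artist VALUES (1, 'U2'), (2, 'Other');
INSERT INTO Album VALUES (10, 'Achtung Baby', 1), (11, 'War', 1), (12, 'Elsewhere', 2);
INSERT INTO Track VALUES (100, 10), (101, 10), (102, 11), (103, 12);
INSERT INTO InvoiceLine VALUES (1, 100, 1), (2, 101, 2), (3, 102, 1), (4, 103, 5);
""")

# One query; the artist name is a bound parameter instead of %-formatted text.
query = """
    SELECT a.Title, SUM(il.Quantity) AS TotalSales
    FROM Album a
    JOIN Artist ar ON a.ArtistId = ar.ArtistId
    JOIN Track t ON a.AlbumId = t.AlbumId
    JOIN InvoiceLine il ON t.TrackId = il.TrackId
    WHERE ar.Name = ?
    GROUP BY a.Title
    ORDER BY TotalSales DESC
"""
album_sales = pd.read_sql(query, conn, params=("U2",))
print(album_sales)
```

Filtering on `ar.Name` inside the join removes the separate ID lookup, and binding the value lets the driver handle quoting, so names containing apostrophes cannot break or alter the statement.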

vg_/csvs/training_examples.json CHANGED

@@ -5,78 +5,168 @@
         "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data\ndiabetes_query = \"\"\"\n    SELECT DUPERSID, DIABDX_M18, AGELAST\n    FROM h216_narrow\n\"\"\"\ndiabetes_df = pd.read_sql(diabetes_query, conn)\n\n# Filter the data\ndiabetes_df = diabetes_df[(diabetes_df['DIABDX_M18'] == '1') | (diabetes_df['DIABDX_M18'] == '2')]\ndiabetes_df = diabetes_df[diabetes_df['AGELAST'].notna()]\ndiabetes_df['DIABDX_M18'] = diabetes_df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndiabetes_df['UNDER_50'] = diabetes_df['AGELAST'].apply(lambda x: True if int(x) < 50 else False)\n\n# Create a venn diagram\ndiabetes_count = diabetes_df['DIABDX_M18'].sum()\nunder_50_count = diabetes_df['UNDER_50'].sum()\nboth_count = ((diabetes_df['DIABDX_M18']) & (diabetes_df['UNDER_50'])).sum()\n\n# Plot the venn diagram\nplt.figure(figsize=(8, 8))\nv = venn2(subsets=(diabetes_count - both_count, under_50_count - both_count, both_count), set_labels=('Diabetes', 'Under 50'))\n\n# Show the plot\nplt.show()"
     },
     {
-        "database": "Db_Test_01",
-        "user_messages": "give me a histogram for rings",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Db_Test_01.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Db_Test_01;Trusted_Connection=yes;.\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport numpy as np\n\n# Read the 'Rings' column from the 'd_Abalone' table\nrings = pd.read_sql(\"SELECT Rings FROM d_Abalone\", conn)\n\n# Create a histogram with a black edge around the bars\nplt.hist(rings['Rings'], edgecolor='black')\nplt.xlabel('Number of Rings')\nplt.ylabel('Frequency')\nplt.title('Histogram of Abalone Rings')\nplt.show()"
     },
     {
-        "database": "Db_Test_01",
-        "user_messages": "give me a histogram for rings; color it red",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Db_Test_01.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Db_Test_01;Trusted_Connection=yes;.\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\n\n# Read the 'Rings' column from the 'd_Abalone' table\nrings = pd.read_sql(\"SELECT Rings FROM d_Abalone\", conn)\n\n# Create a histogram with a black edge around the bars and color the bars red\nplt.hist(rings['Rings'], edgecolor='black', color='red')\nplt.xlabel('Number of Rings')\nplt.ylabel('Frequency')\nplt.title('Histogram of Abalone Rings')\nplt.show()"
     },
     {
-        "database": "Chinook",
-        "user_messages": "how many artists in the data?",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Chinook.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Chinook;Trusted_Connection=yes;.\n\nimport pandas as pd\n\n# Read the Artist table into a Pandas dataframe\nartist_df = pd.read_sql(\"SELECT * FROM Artist\", conn)\n\n# Count the number of artists\nnum_artists = len(artist_df)\n\nprint(f\"There are {num_artists} artists in the data.\")"
     },
     {
         "database": "MEPS",
-        "user_messages": "what is the relationship between income and diabetes, from meps;",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\n# Load the h216_narrow table into a pandas dataframe\ndf = pd.read_sql('SELECT * FROM h216_narrow', conn)\n\n# Filter the dataframe to include only rows where DIABDX_M18 is '1' (Diagnosed) or '2' (Not Diagnosed)\ndf = df[df['DIABDX_M18'].isin(['1', '2'])]\n\n# Convert DIABDX_M18 to a categorical variable\ndf['DIABDX_M18'] = df['DIABDX_M18'].astype('category')\ndf['DIABDX_M18'] = df['DIABDX_M18'].cat.rename_categories({'1': 'Diagnosed', '2': 'Not Diagnosed'})\n\n# Plot the distribution of total family income (FAMINC19) by diabetes diagnosis status\nplt.figure(figsize=(10,6))\nsns.boxplot(data=df, x='DIABDX_M18', y='FAMINC19')\nplt.title('Distribution of Total Family Income by Diabetes Diagnosis Status')\nplt.xlabel('Diabetes Diagnosis Status')\nplt.ylabel('Total Family Income')\nplt.show()\n\n# Calculate the mean and standard deviation of total family income by diabetes diagnosis status\ngrouped_df = df.groupby('DIABDX_M18')['FAMINC19'].agg(['mean', 'std'])\nprint(grouped_df)\n\n# Perform a t-test to compare the mean total family income between diagnosed and not diagnosed groups\nfrom scipy.stats import ttest_ind\ndiagnosed_group = df[df['DIABDX_M18'] == 'Diagnosed']['FAMINC19']\nnot_diagnosed_group = df[df['DIABDX_M18'] == 'Not Diagnosed']['FAMINC19']\nt_stat, p_val = ttest_ind(diagnosed_group, not_diagnosed_group)\nprint(f'T-statistic: {t_stat}, p-value: {p_val}')"
     },
     {
         "database": "MEPS",
-        "user_messages": "give me a venn diagram for people with arthritis and with diabetes",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Read data from the database\nquery = \"\"\"\n    SELECT DUPERSID, ARTHDX, DIABDX_M18\n    FROM h216_narrow\n    WHERE ARTHDX = '1' OR DIABDX_M18 = '1'\n\"\"\"\ndata = pd.read_sql(query, conn)\n\n# Create a boolean mask for people with arthritis and people with diabetes\nhas_arthritis = data['ARTHDX'] == '1'\nhas_diabetes = data['DIABDX_M18'] == '1'\n\n# Calculate the overlap between the two groups\narthritis_only = len(data[has_arthritis & ~has_diabetes])\ndiabetes_only = len(data[~has_arthritis & has_diabetes])\nboth = len(data[has_arthritis & has_diabetes])\ntotal = len(data)\n\n# Create the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(arthritis_only, diabetes_only, both), set_labels=('Arthritis', 'Diabetes'))\n\n# Show the total number of patients in the title\nplt.title(f'Venn Diagram of Arthritis and Diabetes (Total: {total})')\n\n# Display the plot\nplt.show()"
     },
     {
-        "database": "Db_Test_01",
-        "user_messages": "give me a model to predict rings",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Db_Test_01.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Db_Test_01;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.ensemble import RandomForestRegressor\nfrom sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score\nfrom sklearn.preprocessing import OneHotEncoder\n\n# Load the data from the MS SQL database\ndf = pd.read_sql('SELECT * FROM d_Abalone', conn)\n\n# One-hot-encode the Sex column\ndf = pd.get_dummies(df, columns=['Sex'])\n\n# Define features (X) and target (y)\nX = df.drop(['Rings'], axis=1)\ny = df['Rings']\n\n# Split the data into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Create and train a random forest regressor model\nmodel = RandomForestRegressor(n_estimators=100, random_state=42)\nmodel.fit(X_train, y_train)\n\n# Make predictions on the test set\ny_pred = model.predict(X_test)\n\n# Calculate performance metrics\nmse = mean_squared_error(y_test, y_pred)\nmae = mean_absolute_error(y_test, y_pred)\nr2 = r2_score(y_test, y_pred)\n\n# Print performance metrics\nprint(f\"Mean Squared Error: {mse:.2f}\")\nprint(f\"Mean Absolute Error: {mae:.2f}\")\nprint(f\"R2 Score: {r2:.2f}\")"
     },
     {
-        "database": "BenchmarkingDb_Smol",
-        "user_messages": "give me a model to predict iris",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: BenchmarkingDb_Smol.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=BenchmarkingDb_Smol;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n\n# Load the iris dataset from the database\niris_data = pd.read_sql(\"SELECT * FROM iris\", conn)\n\n# Define the features (X) and the target variable (y)\nX = iris_data[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]\ny = iris_data['target']\n\n# Split the data into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Standardize the features\nscaler = StandardScaler()\nX_train = scaler.fit_transform(X_train)\nX_test = scaler.transform(X_test)\n\n# Train a random forest classifier on the training data\nrfc = RandomForestClassifier(n_estimators=100, random_state=42)\nrfc.fit(X_train, y_train)\n\n# Make predictions on the test data\ny_pred = rfc.predict(X_test)\n\n# Evaluate the model's performance\nprint(\"Accuracy:\", accuracy_score(y_test, y_pred))\nprint(\"Classification Report:\")\nprint(classification_report(y_test, y_pred))\nprint(\"Confusion Matrix:\")\nprint(confusion_matrix(y_test, y_pred))"
     },
     {
         "database": "MEPS",
         "user_messages": "how many diabetics in meps?",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to count the number of diabetics\nquery = \"\"\"\n    SELECT COUNT(DUPERSID) \n    FROM h216_narrow \n    WHERE DIABDX_M18 = '1' \n\"\"\"\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the number of diabetics in the database\nprint(\"Number of diabetics in MEPS:\", df.iloc[0, 0])"
     },
     {
         "database": "MEPS",
-        "user_messages": "give me the pmpm total healthcare cost from meps for diabetics",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to get the total healthcare cost for diabetics\nquery = \"\"\"\n    SELECT SUM(TOTEXP19) as total_cost, COUNT(DUPERSID) as total_diabetics \n    FROM h216_narrow \n    WHERE DIABDX_M18 = '1' \n\"\"\"\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Calculate the PMPM total healthcare cost for diabetics\ntotal_cost = df['total_cost'].iloc[0]\ntotal_diabetics = df['total_diabetics'].iloc[0]\npmpm_cost = (total_cost / total_diabetics) / 12  # Assuming 12 months in a year\n\n# Print the PMPM total healthcare cost for diabetics\nprint(\"PMPM total healthcare cost for diabetics in MEPS: $\", round(pmpm_cost, 2))"
     },
     {
-        "database": "Chinook",
-        "user_messages": "from the chinook db, can you plot the number of customers by country on a map, and color code it by number of customers (white for few, and green for more)?",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Chinook.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Chinook;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport plotly.graph_objects as go\n\n# Read the Customer table into a Pandas dataframe\ncustomer_df = pd.read_sql(\"SELECT Country FROM Customer\", conn)\n\n# Group the customers by country and count the number of customers\ncountry_counts = customer_df['Country'].value_counts().reset_index()\ncountry_counts.columns = ['Country', 'Count']\n\n# Use plotly to create a choropleth map\nfig = go.Figure(data=go.Choropleth(\n    locations=country_counts['Country'], # Spatial coordinates\n    z = country_counts['Count'], # Data to display\n    locationmode = 'country names',\n    colorscale=[[0, \"rgba(255, 255, 255, 1)\"], [1, \"rgba(0, 128, 0, 1)\"]],\n    colorbar_title = \"Number of customers\"\n    ))\n\nfig.update_layout(\n        title_text='Number of Customers by Country',\n        geo_scope='world',\n    )\n\nfig.show()"
     },
     {
-        "database": "Db_Test_01",
-        "user_messages": "For the Abalone dataset, give me the count of Rings > 10.",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: Db_Test_01.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=Db_Test_01;Trusted_Connection=yes;.\n\nimport pandas as pd\n\n# Read the 'Rings' column from the 'd_Abalone' table where Rings > 10\nrings_over_10 = pd.read_sql(\"SELECT count(*) as count FROM d_Abalone WHERE Rings > 10\", conn)\n\n# Print the result\nprint(rings_over_10['count'].iloc[0])"
     },
     {
         "database": "MEPS",
-        "user_messages": "give me the top 10 drugs (in terms of cost) taken by those with diabetes, from meps",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to get top 10 drugs taken by diabetics\nquery = \"\"\"\n    SELECT TOP 10 RXDRGNAM, SUM(RXEXP19) as Total_Cost\n    FROM h213a\n    INNER JOIN h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n    WHERE h216_narrow.DIABDX_M18 = '1'\n    GROUP BY RXDRGNAM\n    ORDER BY Total_Cost DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs taken by diabetics\nprint(\"Top 10 drugs taken by diabetics in MEPS:\")\ndf"
     },
     {
         "database": "MEPS",
-        "user_messages": "give me the top 10 drugs (in terms of cost) taken by those with diabetes, from meps; express in a pie chart",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\n# Query the database to get top 10 drugs taken by diabetics\nquery = \"\"\"\n    SELECT TOP 10 RXDRGNAM, SUM(RXEXP19) as Total_Cost\n    FROM h213a\n    INNER JOIN h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n    WHERE h216_narrow.DIABDX_M18 = '1'\n    GROUP BY RXDRGNAM\n    ORDER BY Total_Cost DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs taken by diabetics\nprint(\"Top 10 drugs taken by diabetics in MEPS:\")\ndf\n\n# Plot a pie chart to visualize the top 10 drugs taken by diabetics\nplt.figure(figsize=(10,8))\nplt.pie(df['Total_Cost'], labels = df['RXDRGNAM'], autopct='%1.1f%%', textprops={'fontsize': 10})\nplt.title('Top 10 Drugs Taken by Diabetics in MEPS')\nplt.show()"
     },
     {
         "database": "MEPS",
-        "user_messages": "what are the top 10 drugs, by cost, for people that have arthritis. Use meps; Re-trying...; for cost, use sum of RXSF19X, RXOF19X, RXOT19X and RXPV19X",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to get top 10 drugs taken by people with arthritis\nquery = \"\"\"\n    SELECT TOP 10 h213a.RXDRGNAM, SUM(h213a.RXSF19X + h213a.RXOF19X + h213a.RXOT19X + h213a.RXPV19X) as Total_Cost\n    FROM h213a\n    INNER JOIN h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n    WHERE h216_narrow.ARTHDX = '1'\n    GROUP BY h213a.RXDRGNAM\n    ORDER BY Total_Cost DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs taken by people with arthritis\nprint(\"Top 10 drugs taken by people with arthritis in MEPS:\")\ndf"
     },
     {
         "database": "MEPS",
-        "user_messages": "give me the top 10 drugs, in terms of amount paid by insurance, for individuals with diabetes.",
-        "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\n# Import necessary libraries\nimport pandas as pd\n\n# Query the database to get top 10 drugs taken by diabetics\nquery = \"\"\"\n    SELECT TOP 10 h213a.RXDRGNAM, SUM(h213a.RXPV19X+RXOF19X+RXOT19X+RXMR19X+RXSL19X) as Total_Paid\n    FROM h213a\n    INNER JOIN h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n    WHERE h216_narrow.DIABDX_M18 = '1'\n    GROUP BY h213a.RXDRGNAM\n    ORDER BY Total_Paid DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs taken by diabetics\nprint(\"Top 10 drugs taken by diabetics in MEPS, in terms of amount paid by insurance:\")\ndf"
     }
 ]

         "example_code": "# Using Server: MUZAYAN_GRAM\\SQLEXPRESS, and database: MEPS.\n#conn = Driver={SQL Server};Server=MUZAYAN_GRAM\\SQLEXPRESS;Database=MEPS;Trusted_Connection=yes;.\n\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data\ndiabetes_query = \"\"\"\n    SELECT DUPERSID, DIABDX_M18, AGELAST\n    FROM h216_narrow\n\"\"\"\ndiabetes_df = pd.read_sql(diabetes_query, conn)\n\n# Filter the data\ndiabetes_df = diabetes_df[(diabetes_df['DIABDX_M18'] == '1') | (diabetes_df['DIABDX_M18'] == '2')]\ndiabetes_df = diabetes_df[diabetes_df['AGELAST'].notna()]\ndiabetes_df['DIABDX_M18'] = diabetes_df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndiabetes_df['UNDER_50'] = diabetes_df['AGELAST'].apply(lambda x: True if int(x) < 50 else False)\n\n# Create a venn diagram\ndiabetes_count = diabetes_df['DIABDX_M18'].sum()\nunder_50_count = diabetes_df['UNDER_50'].sum()\nboth_count = ((diabetes_df['DIABDX_M18']) & (diabetes_df['UNDER_50'])).sum()\n\n# Plot the venn diagram\nplt.figure(figsize=(8, 8))\nv = venn2(subsets=(diabetes_count - both_count, under_50_count - both_count, both_count), set_labels=('Diabetes', 'Under 50'))\n\n# Show the plot\nplt.show()"
     },
     {
+        "database": "MEPS",
+        "user_messages": "give me the average pharmacy claim cost for people with and without diabetes",
+        "example_code": "\nimport pandas as pd\n\n# Query the database to get the pharmacy claim cost for people with and without diabetes\nquery = \"\"\"\n    SELECT \n        CASE \n            WHEN DIABAGED > 0 THEN 'With Diabetes'\n            ELSE 'Without Diabetes'\n        END AS diabetes_status,\n        AVG(RXEXP19) AS average_pharmacy_claim_cost\n    FROM \n        h216_narrow\n    GROUP BY \n        CASE \n            WHEN DIABAGED > 0 THEN 'With Diabetes'\n            ELSE 'Without Diabetes'\n        END\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the average pharmacy claim cost for people with and without diabetes\nfor index, row in df.iterrows():\n    print(f\"Average pharmacy claim cost for people {row['diabetes_status'].lower()}: ${round(row['average_pharmacy_claim_cost'], 2)}\")\n\ndf"
     },
     {
+        "database": "Chinook",
+        "user_messages": "how many artists are there?",
+        "example_code": "\nimport pandas as pd\nimport pyodbc\n\n# SQL query to count the number of artists\nquery = \"SELECT COUNT(*) AS NumberOfArtists FROM Artist\"\n\n# Execute the query and read the result into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the number of artists\nprint(\"Number of artists:\", df['NumberOfArtists'][0])"
     },
     {
+        "database": "MEPS",
+        "user_messages": "give me a venn diagram for people with diabetes and those with asthma",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for diabetes and asthma\nquery = \"\"\"\n    SELECT DUPERSID, DIABDX_M18, ASTHDX\n    FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter the data to include only diagnosed cases\ndf['DIABDX_M18'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndf['ASTHDX'] = df['ASTHDX'].apply(lambda x: True if x == '1' else False)\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['DIABDX_M18'].sum()\nasthma_count = df['ASTHDX'].sum()\nboth_count = (df['DIABDX_M18'] & df['ASTHDX']).sum()\n\n# Plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(diabetes_count - both_count, asthma_count - both_count, both_count),\n      set_labels=('Diabetes', 'Asthma'))\n\n# Show the plot\nplt.title(\"Venn Diagram of People with Diabetes and Asthma\")\nplt.show()"
     },
     {
         "database": "MEPS",
+        "user_messages": "visualize a venn diagram for people that have diabetes and arthritis, using meps",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for diabetes and arthritis\nquery = \"\"\"\n    SELECT DUPERSID, DIABDX_M18, ARTHDX\n    FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter and process the data\ndf = df[(df['DIABDX_M18'].isin(['1', '2'])) & (df['ARTHDX'].isin(['1', '2']))]  # Filter valid responses\ndf['Diabetes'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)  # Convert to boolean\ndf['Arthritis'] = df['ARTHDX'].apply(lambda x: True if x == '1' else False)     # Convert to boolean\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['Diabetes'].sum()\narthritis_count = df['Arthritis'].sum()\nboth_count = (df['Diabetes'] & df['Arthritis']).sum()\n\n# Create and plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(diabetes_count - both_count, arthritis_count - both_count, both_count),\n      set_labels=('Diabetes', 'Arthritis'))\nplt.title('Venn Diagram of People with Diabetes and Arthritis')\nplt.show()"
     },
     {
         "database": "MEPS",
+        "user_messages": "visualize a venn diagram for people that have diabetes and asthma, from meps",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for diabetes and asthma\nquery = \"\"\"\n    SELECT DUPERSID, DIABDX_M18, ASTHDX\n    FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter the data to include only diagnosed cases\ndf['DIABDX_M18'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndf['ASTHDX'] = df['ASTHDX'].apply(lambda x: True if x == '1' else False)\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['DIABDX_M18'].sum()\nasthma_count = df['ASTHDX'].sum()\nboth_count = (df['DIABDX_M18'] & df['ASTHDX']).sum()\n\n# Plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(diabetes_count - both_count, asthma_count - both_count, both_count),\n      set_labels=('Diabetes', 'Asthma'))\n\n# Show the plot\nplt.title(\"Venn Diagram of People with Diabetes and Asthma\")\nplt.show()"
     },
     {
+        "database": "MEPS",
+        "user_messages": "\"give me the average pharmacy claim cost for people with and without diabetes\n\"\n",
+        "example_code": "\nimport pandas as pd\n\n# Query to calculate average pharmacy claim cost by diabetes status\nquery = \"\"\"\nSELECT\n    CASE\n        WHEN CAST(DIABDX_M18 AS INT) = 1 THEN 'With Diabetes'\n        ELSE 'Without Diabetes'\n    END AS diabetes_status,\n    AVG(RXEXP19) AS avg_pharmacy_claim_cost\nFROM \n    h216_narrow\nGROUP BY \n    CASE\n        WHEN CAST(DIABDX_M18 AS INT) = 1 THEN 'With Diabetes'\n        ELSE 'Without Diabetes'\n    END\n\"\"\"\n\ndf = pd.read_sql(query, conn)\n\n# Format and print results\nprint(\"Average Pharmacy Claim Costs (2019)\")\nprint(\"-----------------------------------\")\nfor _, row in df.iterrows():\n    status = row['diabetes_status'].lower()\n    cost = round(row['avg_pharmacy_claim_cost'], 2)\n    print(f\"{status}: ${cost:,.2f}\")"
     },
     {
+        "database": "MEPS",
+        "user_messages": "what is the relationship between income and diabetes",
+        "example_code": "\nimport pandas as pd\n\n# Query to calculate correlation between income and diabetes\nquery = \"\"\"\nSELECT \n    FAMINC19 AS income,\n    CASE\n        WHEN DIABDX_M18 = '1' THEN 1\n        ELSE 0\n    END AS diabetes\nFROM \n    h216_narrow\nWHERE \n    FAMINC19 IS NOT NULL AND DIABDX_M18 IS NOT NULL\n\"\"\"\n\ndf = pd.read_sql(query, conn)\n\n# Calculate correlation between income and diabetes\ncorrelation = df['income'].corr(df['diabetes'])\nprint(f\"Correlation between income and diabetes: {correlation:.4f}\")\n\n# Calculate average income for people with and without diabetes\navg_income_with_diabetes = df[df['diabetes'] == 1]['income'].mean()\navg_income_without_diabetes = df[df['diabetes'] == 0]['income'].mean()\n\n# Print average income for people with and without diabetes\nprint(f\"Average income for people with diabetes: ${avg_income_with_diabetes:,.2f}\")\nprint(f\"Average income for people without diabetes: ${avg_income_without_diabetes:,.2f}\")\n\n# Calculate percentage of people with diabetes at each income level\nincome_bins = [0, 25000, 50000, 75000, 100000, float('inf')]\ndf['income_bin'] = pd.cut(df['income'], bins=income_bins)\ndiabetes_rates = df.groupby('income_bin')['diabetes'].mean()\n\n# Print percentage of people with diabetes at each income level\nprint(\"\\nPercentage of people with diabetes at each income level:\")\nprint(diabetes_rates)"
+    },
+    {
+        "database": "MEPS",
+        "user_messages": "what is the relationship between income and diabetes",
+        "example_code": "\nimport pandas as pd\n\n# Query to calculate correlation between income and diabetes\nquery = \"\"\"\nSELECT \n    FAMINC19 AS income,\n    CASE\n        WHEN DIABDX_M18 = '1' THEN 1\n        ELSE 0\n    END AS diabetes\nFROM \n    h216_narrow\nWHERE \n    FAMINC19 IS NOT NULL AND DIABDX_M18 IS NOT NULL\n\"\"\"\n\ndf = pd.read_sql(query, conn)\n\n# Calculate correlation between income and diabetes\ncorrelation = df['income'].corr(df['diabetes'])\nprint(f\"Correlation between income and diabetes: {correlation:.4f}\")\n\n# Calculate average income for people with and without diabetes\navg_income_with_diabetes = df[df['diabetes'] == 1]['income'].mean()\navg_income_without_diabetes = df[df['diabetes'] == 0]['income'].mean()\n\n# Print average income for people with and without diabetes\nprint(f\"Average income for people with diabetes: ${avg_income_with_diabetes:,.2f}\")\nprint(f\"Average income for people without diabetes: ${avg_income_without_diabetes:,.2f}\")\n\n# Create bins for income levels\nincome_bins = [0, 25000, 50000, 75000, 100000, float('inf')]\nincome_labels = ['<25k', '25k-50k', '50k-75k', '75k-100k', '100k+']\ndf['income_bin'] = pd.cut(df['income'], bins=income_bins, labels=income_labels)\n\n# Calculate percentage of people with diabetes at each income level\ndiabetes_rates = df.groupby('income_bin')['diabetes'].mean()\n\n# Print percentage of people with diabetes at each income level\nprint(\"\\nPercentage of people with diabetes at each income level:\")\nprint(diabetes_rates * 100)  # Convert to percentage"
     },
     {
         "database": "MEPS",
         "user_messages": "how many diabetics in meps?",
+        "example_code": "\nimport pandas as pd\n\n# Query the data\ndiabetes_query = \"\"\"\n    SELECT DIABDX_M18\n    FROM h216_narrow\n\"\"\"\ndiabetes_df = pd.read_sql(diabetes_query, conn)\n\n# Filter the data\ndiabetes_df = diabetes_df[diabetes_df['DIABDX_M18'].notna()]\ndiabetes_df['DIABDX_M18'] = diabetes_df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\n\n# Get the count of diabetics\ndiabetic_count = diabetes_df['DIABDX_M18'].sum()\n\n# Calculate the percentage of diabetics\ntotal_count = len(diabetes_df)\ndiabetic_percentage = (diabetic_count / total_count) * 100\n\nprint(f\"The total count of diabetics in MEPS is: {diabetic_count}\")\nprint(f\"The percentage of diabetics in MEPS is: {diabetic_percentage:.2f}%\")"
     },
     {
         "database": "MEPS",
+        "user_messages": "give me a venn diagram for people with arthritis and those with diabetes, from meps",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for diabetes and arthritis\nquery = \"\"\"\n    SELECT DUPERSID, DIABDX_M18, ARTHDX\n    FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter and process the data\ndf = df[(df['DIABDX_M18'].isin(['1', '2'])) & (df['ARTHDX'].isin(['1', '2']))]  # Filter valid responses\ndf['Diabetes'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)  # Convert to boolean\ndf['Arthritis'] = df['ARTHDX'].apply(lambda x: True if x == '1' else False)     # Convert to boolean\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['Diabetes'].sum()\narthritis_count = df['Arthritis'].sum()\nboth_count = (df['Diabetes'] & df['Arthritis']).sum()\n\n# Create and plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(diabetes_count - both_count, arthritis_count - both_count, both_count),\n      set_labels=('Diabetes', 'Arthritis'))\nplt.title('Venn Diagram of People with Diabetes and Arthritis')\nplt.show()"
     },
     {
+        "database": "MEPS",
+        "user_messages": "give me a venn diagram for people with asthma and those with arthritis, from meps",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn2\n\n# Query the data for asthma and arthritis\nquery = \"\"\"\n    SELECT DUPERSID, ASTHDX, ARTHDX\n    FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter and process the data\ndf = df[(df['ASTHDX'].isin(['1', '2'])) & (df['ARTHDX'].isin(['1', '2']))]  # Filter valid responses\ndf['Asthma'] = df['ASTHDX'].apply(lambda x: True if x == '1' else False)    # Convert to boolean\ndf['Arthritis'] = df['ARTHDX'].apply(lambda x: True if x == '1' else False) # Convert to boolean\n\n# Calculate counts for the Venn diagram\nasthma_count = df['Asthma'].sum()\narthritis_count = df['Arthritis'].sum()\nboth_count = (df['Asthma'] & df['Arthritis']).sum()\n\n# Create and plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn2(subsets=(asthma_count - both_count, arthritis_count - both_count, both_count),\n      set_labels=('Asthma', 'Arthritis'))\nplt.title('Venn Diagram of People with Asthma and Arthritis')\nplt.show()"
     },
     {
+        "database": "MEPS",
+        "user_messages": "give me the top 10 drugs, in terms of amount paid by self, for individuals with diabetes. ",
+        "example_code": "\nimport pandas as pd\n\n# Query the database to get the top 10 drugs paid by self for individuals with diabetes\nquery = \"\"\"\n    SELECT TOP 10 \n        RXDRGNAM,\n        SUM(RXSF19X) AS total_self_paid\n    FROM \n        h213a\n    INNER JOIN \n        h216_narrow ON h213a.DUPERSID = h216_narrow.DUPERSID\n    WHERE \n        h216_narrow.DIABAGED > 0\n    GROUP BY \n        RXDRGNAM\n    ORDER BY \n        total_self_paid DESC\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the top 10 drugs paid by self for individuals with diabetes\ndf"
     },
     {
         "database": "MEPS",
+        "user_messages": "from meps, give me the average total healthcare cost for people with diabetes and asthma, with just diabetes, and with neither\n\n",
+        "example_code": "\nimport pandas as pd\n\n# Query the database to get the average total healthcare cost for people with diabetes and asthma, with just diabetes, and with neither (asthma-only cases are reported separately so they are not counted as 'Neither')\nquery = \"\"\"\n    SELECT \n        CASE \n            WHEN DIABAGED > 0 AND ASTHAGED > 0 THEN 'Diabetes and Asthma'\n            WHEN DIABAGED > 0 THEN 'Diabetes Only'\n            WHEN ASTHAGED > 0 THEN 'Asthma Only'\n            ELSE 'Neither'\n        END AS condition_status,\n        AVG(TOTEXP19) AS average_healthcare_cost\n    FROM \n        h216_narrow\n    WHERE \n        DIABAGED IS NOT NULL AND ASTHAGED IS NOT NULL\n    GROUP BY \n        CASE \n            WHEN DIABAGED > 0 AND ASTHAGED > 0 THEN 'Diabetes and Asthma'\n            WHEN DIABAGED > 0 THEN 'Diabetes Only'\n            WHEN ASTHAGED > 0 THEN 'Asthma Only'\n            ELSE 'Neither'\n        END\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Print the average healthcare cost for each condition group\nfor index, row in df.iterrows():\n    print(f\"Average total healthcare cost for people with {row['condition_status'].lower()}: ${round(row['average_healthcare_cost'], 2)}\")"
     },
     {
         "database": "MEPS",
+        "user_messages": "from meps, explore the relationship of income with total healthcare cost",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom sklearn.linear_model import LinearRegression\nimport numpy as np\n\n# Query the database to get the income and total healthcare cost\nquery = \"\"\"\n    SELECT \n        FAMINC19,\n        TOTEXP19\n    FROM \n        h216_narrow\n    WHERE \n        FAMINC19 IS NOT NULL AND TOTEXP19 IS NOT NULL\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Calculate the correlation coefficient between income and total healthcare cost\ncorrelation_coefficient = df['FAMINC19'].corr(df['TOTEXP19'])\nprint(f\"Correlation coefficient between income and total healthcare cost: {round(correlation_coefficient, 2)}\")\n\n# Create a scatter plot to visualize the relationship\nplt.figure(figsize=(10, 6))\nplt.scatter(df['FAMINC19'], df['TOTEXP19'])\nplt.xlabel('Family Income')\nplt.ylabel('Total Healthcare Cost')\nplt.title('Relationship between Income and Total Healthcare Cost')\nplt.show()\n\n# Perform linear regression to model the relationship\nX = df['FAMINC19'].values.reshape(-1, 1)\ny = df['TOTEXP19'].values\nmodel = LinearRegression()\nmodel.fit(X, y)\nprint(f\"Coefficient of determination (R^2): {round(model.score(X, y), 2)}\")\n\n# Predict total healthcare cost for different income levels\nincome_levels = np.array([20000, 50000, 100000]).reshape(-1, 1)\npredicted_costs = model.predict(income_levels)\nfor i, income in enumerate(income_levels):\n    print(f\"Predicted total healthcare cost for income ${income[0]}: ${round(predicted_costs[i], 2)}\")\n\n# Calculate the average total healthcare cost for different income quartiles\ndf['income_quartile'] = pd.qcut(df['FAMINC19'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])\naverage_costs = df.groupby('income_quartile')['TOTEXP19'].mean()\nprint(\"Average total healthcare cost for different income quartiles:\")\nfor index, row in average_costs.items():\n    print(f\"{index}: ${round(row, 2)}\")"
     },
     {
         "database": "MEPS",
+        "user_messages": "give me a venn diagram for those with diabetes, asthma, and arthritis. Use DUPERSID to get counts of those that have both diabetes and asthma, etc.",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom matplotlib_venn import venn3\n\n# Query the data for diabetes, asthma, and arthritis\nquery = \"\"\"\n    SELECT DUPERSID, DIABDX_M18, ASTHDX, ARTHDX\n    FROM h216_narrow\n\"\"\"\ndf = pd.read_sql(query, conn)\n\n# Filter the data to include only diagnosed cases\ndf['DIABDX_M18'] = df['DIABDX_M18'].apply(lambda x: True if x == '1' else False)\ndf['ASTHDX'] = df['ASTHDX'].apply(lambda x: True if x == '1' else False)\ndf['ARTHDX'] = df['ARTHDX'].apply(lambda x: True if x == '1' else False)\n\n# Calculate counts for the Venn diagram\ndiabetes_count = df['DIABDX_M18'].sum()\nasthma_count = df['ASTHDX'].sum()\narthritis_count = df['ARTHDX'].sum()\n\nboth_diabetes_asthma = (df['DIABDX_M18'] & df['ASTHDX']).sum()\nboth_diabetes_arthritis = (df['DIABDX_M18'] & df['ARTHDX']).sum()\nboth_asthma_arthritis = (df['ASTHDX'] & df['ARTHDX']).sum()\n\nall_three = (df['DIABDX_M18'] & df['ASTHDX'] & df['ARTHDX']).sum()\n\n# Calculate the subsets\ndiabetes_only = diabetes_count - both_diabetes_asthma - both_diabetes_arthritis + all_three\nasthma_only = asthma_count - both_diabetes_asthma - both_asthma_arthritis + all_three\narthritis_only = arthritis_count - both_diabetes_arthritis - both_asthma_arthritis + all_three\n\n# Plot the Venn diagram\nplt.figure(figsize=(8, 8))\nvenn3(subsets=(diabetes_only, asthma_only, both_diabetes_asthma - all_three, \n               arthritis_only, both_diabetes_arthritis - all_three, both_asthma_arthritis - all_three, all_three),\n      set_labels=('Diabetes', 'Asthma', 'Arthritis'))\n\n# Show the plot\nplt.title(\"Venn Diagram of People with Diabetes, Asthma, and Arthritis\")\nplt.show()"
     },
     {
         "database": "MEPS",
+        "user_messages": "can i get the portion of cost paid by self by insurance coverage type (private, public, uninsured)",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# Query the database to get the required data\nquery = \"\"\"\n    SELECT \n        CASE\n            WHEN INSCOV19 = 1 THEN 'Private'\n            WHEN INSCOV19 = 2 THEN 'Public'\n            WHEN INSCOV19 = 3 THEN 'Uninsured'\n            ELSE 'Other'\n        END AS InsuranceCoverage,\n        CASE\n            WHEN TOTEXP19 = 0 THEN NULL  -- avoid divide by zero\n            ELSE CAST(TOTSLF19 AS FLOAT) / TOTEXP19  -- cast to avoid integer division\n        END AS PortionOfCostPaidBySelf\n    FROM \n        h216_narrow\n    WHERE \n        TOTEXP19 IS NOT NULL AND TOTSLF19 IS NOT NULL\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\ndf = df.dropna()  # drop any rows with missing values (i.e., TOTEXP19 = 0)\n\n# Calculate the average portion of cost paid by self by insurance coverage type\naverage_portion_paid = df.groupby('InsuranceCoverage')['PortionOfCostPaidBySelf'].mean()\nprint(\"Average portion of cost paid by self by insurance coverage type:\")\nfor index, row in average_portion_paid.items():\n    print(f\"{index}: {round(row, 2)}\")\n\n# Create a bar chart to visualize the average portion of cost paid by self by insurance coverage type\nplt.figure(figsize=(10, 6))\nplt.bar(average_portion_paid.index, average_portion_paid.values)\nplt.xlabel('Insurance Coverage')\nplt.ylabel('Average Portion of Cost Paid by Self')\nplt.title('Average Portion of Cost Paid by Self by Insurance Coverage Type')\nplt.show()\n\n# Calculate the median portion of cost paid by self by insurance coverage type\nmedian_portion_paid = df.groupby('InsuranceCoverage')['PortionOfCostPaidBySelf'].median()\nprint(\"Median portion of cost paid by self by insurance coverage type:\")\nfor index, row in median_portion_paid.items():\n    print(f\"{index}: {round(row, 2)}\")\n\n# Calculate the standard deviation of portion of cost paid by self by insurance coverage type\nstd_dev_portion_paid = df.groupby('InsuranceCoverage')['PortionOfCostPaidBySelf'].std()\nprint(\"Standard deviation of portion of cost paid by self by insurance coverage type:\")\nfor index, row in std_dev_portion_paid.items():\n    print(f\"{index}: {round(row, 2)}\")"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "how many artists are there?",
+        "example_code": "\nimport pandas as pd\n\n# Query to count the number of artists\nquery = \"SELECT COUNT(ArtistId) FROM Artist\"\n\n# Use pd.read_sql to execute the query and get the result\nresult = pd.read_sql(query, conn)\n\n# Print the result\nprint(\"Number of artists:\", result.iloc[0, 0])"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "how many artists are there?",
+        "example_code": "\nimport pandas as pd\n\n# Query to count the number of artists\nquery = \"SELECT COUNT(ArtistId) AS ArtistCount FROM Artist\"\n\n# Use pd.read_sql to execute the query and get the result\nresult = pd.read_sql(query, conn)\n\n# Print the result\nprint(\"Number of artists:\", result.iloc[0, 0])"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "Rank the top selling albums by that famous irish band.",
+        "example_code": "\n# Import necessary libraries\nimport pandas as pd\n\n# Query to find the ArtistId of U2\nquery_artist_id = \"SELECT ArtistId FROM Artist WHERE Name = 'U2'\"\n\n# Use pd.read_sql to execute the query and get the ArtistId of U2\nartist_id = pd.read_sql(query_artist_id, conn).iloc[0, 0]\n\n# Query to find the top selling albums of U2\nquery_top_albums = \"\"\"\n    SELECT TOP 10 a.Title, SUM(il.Quantity) AS TotalSales\n    FROM InvoiceLine il\n    JOIN Track t ON il.TrackId = t.TrackId\n    JOIN Album a ON t.AlbumId = a.AlbumId\n    WHERE a.ArtistId = %s\n    GROUP BY a.Title\n    ORDER BY SUM(il.Quantity) DESC\n\"\"\" % artist_id\n\n# Use pd.read_sql to execute the query and get the top selling albums\ntop_albums = pd.read_sql(query_top_albums, conn)\n\n# Print the result\nprint(\"Top selling albums by U2:\")\ntop_albums"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "Give me a bar plot of sales, by album, of U2; use plotly",
+        "example_code": "\nimport pandas as pd\nimport plotly.express as px\n\n# Query to find the ArtistId of U2\nquery_artist_id = \"SELECT ArtistId FROM Artist WHERE Name = 'U2'\"\nartist_id = pd.read_sql(query_artist_id, conn).iloc[0, 0]\n\n# Query to find the sales by album for U2\nquery_album_sales = \"\"\"\n    SELECT a.Title, SUM(il.Quantity) AS TotalSales\n    FROM Album a\n    JOIN Artist ar ON a.ArtistId = ar.ArtistId\n    JOIN Track t ON a.AlbumId = t.AlbumId\n    JOIN InvoiceLine il ON t.TrackId = il.TrackId\n    WHERE ar.ArtistId = %s\n    GROUP BY a.Title\n    ORDER BY SUM(il.Quantity) DESC\n\"\"\" % artist_id\n\n# Use pd.read_sql to execute the query and get the sales by album for U2\nalbum_sales = pd.read_sql(query_album_sales, conn)\n\n# Create a bar plot of sales by album using Plotly\nfig = px.bar(album_sales, x='Title', y='TotalSales', title=\"Album Sales by U2\")\nfig.update_layout(xaxis_title=\"Album Title\", yaxis_title=\"Total Sales\")\nfig.show()"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "from the chinook db, can you plot the number of customers by country on a map, and color code it by number of customers (white for few, and green for more)?",
+        "example_code": "\nimport pandas as pd\nimport plotly.graph_objects as go\n\n# Query to find the number of customers by country\nquery_customer_country = \"\"\"\n    SELECT Country, COUNT(CustomerId) AS NumberOfCustomers\n    FROM Customer\n    GROUP BY Country\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the number of customers by country\ncustomer_country = pd.read_sql(query_customer_country, conn)\n\n# Create a world map plot of customers by country using Plotly\nfig = go.Figure(data=go.Choropleth(\n    locations=customer_country['Country'], # Spatial coordinates\n    z = customer_country['NumberOfCustomers'].astype(float), # Data to color by\n    locationmode = 'country names', # Set of locations match entries in `locations`\n    autocolorscale=False,\n    colorscale='Greens',\n    colorbar_title = \"Number of Customers\"\n))\n\nfig.update_layout(\n    title_text='Number of Customers by Country',\n    geo_scope='world',\n)\n\nfig.show()"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "from the chinook db, can you plot the number of customers by country on a map, and color code it by number of customers (white for few, and green for more)?",
+        "example_code": "\n# Import necessary libraries\nimport pandas as pd\nimport plotly.graph_objects as go\n\n# Query to find the number of customers by country\nquery_customer_country = \"\"\"\n    SELECT Country, COUNT(CustomerId) AS NumberOfCustomers\n    FROM Customer\n    GROUP BY Country\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the number of customers by country\ncustomer_country = pd.read_sql(query_customer_country, conn)\n\n# Create a world map plot of customers by country using Plotly\nfig = go.Figure(data=go.Choropleth(\n    locations=customer_country['Country'], # Spatial coordinates\n    z = customer_country['NumberOfCustomers'].astype(float), # Data to color by\n    locationmode = 'country names', # Set of locations match entries in `locations`\n    autocolorscale=False,\n    colorscale='Greens', # Set colorscale to 'Greens' to color code by number of customers\n    reversescale=False, # Set reversescale to False to map low values to white and high values to green\n    marker_line_color='darkgray', # Set marker line color to darkgray for better visibility\n    marker_line_width=0.5, # Set marker line width to 0.5 for better visibility\n    colorbar_title = \"Number of Customers\"\n))\n\nfig.update_layout(\n    title_text='Number of Customers by Country',\n    geo_scope='world',\n)\n\n# Show the plot\nfig.show()"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "Give me a word cloud for genre\n",
+        "example_code": "\nimport pandas as pd\nfrom wordcloud import WordCloud\nimport matplotlib.pyplot as plt\n\n# Query to get all genres\nquery = \"SELECT Name FROM Genre\"\n\n# Use pd.read_sql to execute the query and get the result\ngenres = pd.read_sql(query, conn)\n\n# Join all genres into a single string\ngenre_string = ' '.join(genres['Name'].tolist())\n\n# Create a word cloud\nwordcloud = WordCloud(width=800, height=400).generate(genre_string)\n\n# Display the word cloud\nplt.figure(figsize=(10, 5))\nplt.imshow(wordcloud, interpolation='bilinear')\nplt.axis('off')\nplt.show()"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "what are the top 3 selling genre in chinook data?\n",
+        "example_code": "\n# Import necessary libraries\nimport pandas as pd\n\n# Query to find the top 3 selling genres\nquery_top_genres = \"\"\"\n    SELECT g.Name, SUM(il.Quantity) AS TotalSales\n    FROM Genre g\n    JOIN Track t ON g.GenreId = t.GenreId\n    JOIN InvoiceLine il ON t.TrackId = il.TrackId\n    GROUP BY g.Name\n    ORDER BY SUM(il.Quantity) DESC\n    LIMIT 3\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the top selling genres\ntop_genres = pd.read_sql(query_top_genres, conn)\n\n# Print the result\nprint(\"Top 3 selling genres:\")\nprint(top_genres)"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "Give me the top 10 selling artists.",
+        "example_code": "\nimport pandas as pd\n\n# Query to get the top 10 selling artists\nquery = \"\"\"\n    SELECT a.Name, SUM(il.Quantity) as TotalSold\n    FROM Artist a\n    JOIN Album al ON a.ArtistId = al.ArtistId\n    JOIN Track t ON al.AlbumId = t.AlbumId\n    JOIN InvoiceLine il ON t.TrackId = il.TrackId\n    GROUP BY a.Name\n    ORDER BY TotalSold DESC\n    LIMIT 10\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the result\nresult = pd.read_sql(query, conn)\n\n# Print the result\nprint(result)"
+    },
+    {
+        "database": "capgpbsh_vg_chinook",
+        "user_messages": "rank the employees making the most sales",
+        "example_code": "\n# Import necessary libraries\nimport pandas as pd\n\n# Query to find the top selling employees\nquery_top_employees = \"\"\"\n    SELECT e.FirstName, e.LastName, SUM(i.Total) AS TotalSales\n    FROM Invoice i\n    JOIN Customer c ON i.CustomerId = c.CustomerId\n    JOIN Employee e ON c.SupportRepId = e.EmployeeId\n    GROUP BY e.FirstName, e.LastName\n    ORDER BY SUM(i.Total) DESC\n\"\"\"\n\n# Use pd.read_sql to execute the query and get the top selling employees\ntop_employees = pd.read_sql(query_top_employees, conn)\n\n# Print the result\nprint(\"Top selling employees:\")\nprint(top_employees)"
+    },
+    {
+        "database": "MEPS",
+        "user_messages": "give me a visual to understand the total healthcare spend, and the medical and pharmacy as proportions of total\n",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\n# Query the database to get the total healthcare spend and pharmacy spend\nquery = \"\"\"\n    SELECT \n        TOTEXP19,\n        RXEXP19\n    FROM \n        h216_narrow\n    WHERE \n        TOTEXP19 IS NOT NULL AND RXEXP19 IS NOT NULL\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Calculate the total healthcare spend and pharmacy spend\ntotal_healthcare_spend = df['TOTEXP19'].sum()\npharmacy_spend = df['RXEXP19'].sum()\n\n# Calculate the proportions of total healthcare spend\npharmacy_proportion = (pharmacy_spend / total_healthcare_spend) * 100\n\n# Calculate the medical spend as the difference between total healthcare spend and pharmacy spend\nmedical_spend = total_healthcare_spend - pharmacy_spend\nmedical_proportion = (medical_spend / total_healthcare_spend) * 100\n\n# Create a pie chart to visualize the proportions\nlabels = ['Medical', 'Pharmacy']\nsizes = [medical_proportion, pharmacy_proportion]\nplt.pie(sizes, labels=labels, autopct='%1.1f%%')\nplt.title('Proportion of Total Healthcare Spend')\nplt.show()\n\nprint(f\"Total healthcare spend: ${total_healthcare_spend:,.2f}\")\nprint(f\"Pharmacy spend as proportion of total healthcare spend: {pharmacy_proportion:.2f}%\")\nprint(f\"Medical spend as proportion of total healthcare spend: {medical_proportion:.2f}%\")"
+    },
+    {
+        "database": "MEPS",
+        "user_messages": "give me a visual to understand the total healthcare spend, and the medical and pharmacy as proportions of total\n",
+        "example_code": "\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\nquery = \"\"\"\n    SELECT \n        SUM(TOTEXP19) AS total_healthcare_spend,\n        SUM(RXEXP19) AS pharmacy_spend\n    FROM \n        h216_narrow\n    WHERE \n        TOTEXP19 IS NOT NULL AND RXEXP19 IS NOT NULL\n\"\"\"\n\n# Read the query results into a pandas DataFrame\ndf = pd.read_sql(query, conn)\n\n# Since the DataFrame has only one row, we can access the values directly\ntotal_healthcare_spend = df['total_healthcare_spend'].iloc[0]\npharmacy_spend = df['pharmacy_spend'].iloc[0]\n\n# Calculate the proportions of total healthcare spend\npharmacy_proportion = (pharmacy_spend / total_healthcare_spend) * 100\n\n# Calculate the medical spend as the difference between total healthcare spend and pharmacy spend\nmedical_spend = total_healthcare_spend - pharmacy_spend\nmedical_proportion = (medical_spend / total_healthcare_spend) * 100\n\n# Create a pie chart to visualize the proportions\nlabels = ['Medical', 'Pharmacy']\nsizes = [medical_proportion, pharmacy_proportion]\nplt.pie(sizes, labels=labels, autopct='%1.1f%%')\nplt.title('Proportion of Total Healthcare Spend')\nplt.show()\n\nprint(f\"Total healthcare spend: ${total_healthcare_spend:,.2f}\")\nprint(f\"Pharmacy spend as proportion of total healthcare spend: {pharmacy_proportion:.2f}%\")\nprint(f\"Medical spend as proportion of total healthcare spend: {medical_proportion:.2f}%\")"
+    },
+    {
+        "database": "freedb_vg_demo",
+        "user_messages": "how many artists in data?",
+        "example_code": "\nimport pandas as pd\n\n# Query the Artist table to get the number of artists\nartist_count_df = pd.read_sql_query(\"SELECT COUNT(*) FROM Artist\", conn)\n\n# Get the count from the DataFrame\nartist_count = artist_count_df.iloc[0, 0]\n\nprint(\"Number of artists in the data:\", artist_count)"
+    },
+    {
+        "database": "freedb_vg_demo",
+        "user_messages": "what are the top 3 selling genre in chinook data?\n",
+        "example_code": "\nimport pandas as pd\n\n# Define the query to get the top 3 selling genres\nquery = \"\"\"\n    SELECT T.GenreId, G.Name, SUM(IL.Quantity) AS TotalSales\n    FROM Track T\n    JOIN Genre G ON T.GenreId = G.GenreId\n    JOIN InvoiceLine IL ON T.TrackId = IL.TrackId\n    GROUP BY T.GenreId, G.Name\n    ORDER BY TotalSales DESC\n    LIMIT 3\n\"\"\"\n\n# Use pd.read_sql_query to execute the query\ntop_selling_genres_df = pd.read_sql_query(query, conn)\n\n# Print the top 3 selling genres\nprint(top_selling_genres_df)"
+    },
+    {
+        "database": "freedb_vg_demo",
+        "user_messages": "Give me the top 10 selling artists.\n",
+        "example_code": "\nimport pandas as pd\n\n# Query the tables to get the top 10 selling artists\ntop_artists_df = pd.read_sql_query(\"\"\"\n    SELECT a.Name, SUM(il.UnitPrice * il.Quantity) as TotalSales\n    FROM InvoiceLine il\n    JOIN Track t ON il.TrackId = t.TrackId\n    JOIN Album alb ON t.AlbumId = alb.AlbumId\n    JOIN Artist a ON alb.ArtistId = a.ArtistId\n    GROUP BY a.Name\n    ORDER BY TotalSales DESC\n    LIMIT 10\n\"\"\", conn)\n\n# Print the top 10 selling artists\nprint(top_artists_df)"
+    },
+    {
+        "database": "freedb_vg_demo",
+        "user_messages": "Give me a bar plot of sales, by album, of U2\n",
+        "example_code": "\nimport pandas as pd\nimport plotly.express as px\n\n# Query the tables to get the sales by album for U2\nu2_album_sales_df = pd.read_sql_query(\"\"\"\n    SELECT alb.Title, SUM(il.UnitPrice * il.Quantity) as TotalSales\n    FROM InvoiceLine il\n    JOIN Track t ON il.TrackId = t.TrackId\n    JOIN Album alb ON t.AlbumId = alb.AlbumId\n    JOIN Artist a ON alb.ArtistId = a.ArtistId\n    WHERE a.Name = 'U2'\n    GROUP BY alb.Title\n    ORDER BY TotalSales DESC\n\"\"\", conn)\n\n# Create a bar plot of sales by album\nfig = px.bar(u2_album_sales_df, x='Title', y='TotalSales', title='Sales by Album for U2')\nfig.show()"
+    },
+    {
+        "database": "freedb_vg_demo",
+        "user_messages": "from the chinook db, can you plot the number of customers by country on a map, and color code it by number of customers (white for few, and green for more)?\n",
+        "example_code": "\nimport pandas as pd\nimport plotly.express as px\n\n# Define the query to get the number of customers by country\nquery = \"\"\"\n    SELECT Country, COUNT(*) AS NumberOfCustomers\n    FROM Customer\n    GROUP BY Country\n\"\"\"\n\n# Use pd.read_sql_query to execute the query\ncustomers_by_country_df = pd.read_sql_query(query, conn)\n\n# Plot the number of customers by country using Plotly Express\nfig = px.choropleth(customers_by_country_df, locations=\"Country\", \n                    color=\"NumberOfCustomers\", \n                    color_continuous_scale='Greens', \n                    locationmode='country names')\n\nfig.update_layout(title='Number of Customers by Country')\nfig.show()"
     }
 ]
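The U2 examples above look up the ArtistId in one query, then format it into a second query's SQL with `%`. A minimal sketch shows the same "top selling albums by artist" query written as a single statement with bound parameters. It assumes Python's built-in `sqlite3` module and a hypothetical, made-up miniature of the Chinook tables. The helper names `top_albums` and `build_demo_db` are illustrative only. Placeholder style depends on the driver: `?` for `sqlite3` and pyodbc, `%s` for PyMySQL. When a DataFrame is wanted, `pd.read_sql(query, conn, params=...)` accepts the same parameters.

```python
import sqlite3

# Hypothetical miniature of the Chinook schema: only the columns the
# "top selling albums by artist" query touches. All rows below are made up.
SCHEMA = """
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER);
CREATE TABLE InvoiceLine (InvoiceLineId INTEGER PRIMARY KEY, TrackId INTEGER, Quantity INTEGER);
"""

# One statement instead of two round trips: filter on the artist name directly
# and bind both the name and the row limit as parameters (sqlite3 uses "?").
TOP_ALBUMS_SQL = """
SELECT a.Title, SUM(il.Quantity) AS TotalSales
FROM InvoiceLine il
JOIN Track t ON il.TrackId = t.TrackId
JOIN Album a ON t.AlbumId = a.AlbumId
JOIN Artist ar ON a.ArtistId = ar.ArtistId
WHERE ar.Name = ?
GROUP BY a.Title
ORDER BY TotalSales DESC
LIMIT ?
"""


def top_albums(conn, artist_name, limit=10):
    """Return (title, total_sales) tuples for one artist, best sellers first."""
    return conn.execute(TOP_ALBUMS_SQL, (artist_name, limit)).fetchall()


def build_demo_db():
    """Create an in-memory database filled with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Artist VALUES (?, ?)", [(1, "U2"), (2, "Other")])
    conn.executemany(
        "INSERT INTO Album VALUES (?, ?, ?)",
        [(10, "Album A", 1), (11, "Album B", 1), (12, "Album C", 2)],
    )
    conn.executemany(
        "INSERT INTO Track VALUES (?, ?)", [(100, 10), (101, 11), (102, 12)]
    )
    conn.executemany(
        "INSERT INTO InvoiceLine VALUES (?, ?, ?)",
        [(1, 100, 1), (2, 100, 1), (3, 101, 1), (4, 102, 5)],
    )
    return conn


if __name__ == "__main__":
    demo_conn = build_demo_db()
    for title, total in top_albums(demo_conn, "U2"):
        print(title, total)
```

Binding the artist name means the database driver handles quoting, so names containing apostrophes cannot break the SQL. It also removes the separate ArtistId lookup, which fails with an `IndexError` if the artist is missing.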