fmegahed committed on
Commit 1ce42e5 · verified · 1 Parent(s): 66a7359

uploading the data file and custom instructions

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ data/oews.rds filter=lfs diff=lfs merge=lfs -text
data/data_desc.md ADDED
@@ -0,0 +1,79 @@
1
+ # May 2024 OEWS Estimates — Data Dictionary / Description
2
+
3
+ **Dataset:** Occupational Employment and Wage Statistics (OEWS) Survey (May 2024 Estimates)
4
+ **Publisher:** U.S. Bureau of Labor Statistics (BLS), Department of Labor
5
+ **Website:** https://www.bls.gov/oes/
6
+ **Contact:** oewsinfo@bls.gov
7
+
8
+ > Not all fields are available for every type of estimate.
9
+
10
+ ---
11
+
12
+ ## Overview
13
+
14
+ This dataset provides employment and wage estimates for occupations (SOC) by geography (U.S., state, territory, metropolitan and nonmetropolitan areas) and, for some releases, by industry (NAICS) and ownership. Estimates include employment counts, measures of sampling error (PRSE), and wage levels (mean and percentiles) on hourly and/or annual bases.
15
+
16
+ ---
17
+
18
+ ## Field Definitions
19
+
20
+ | Field | Description |
21
+ |---|---|
22
+ | `area` | Geographic identifier: U.S. (99), state FIPS code, Metropolitan Statistical Area (MSA) code, or OEWS-specific nonmetropolitan area code. |
23
+ | `area_title` | Area name. |
24
+ | `area_type` | Area type: `1`=U.S.; `2`=State; `3`=U.S. Territory; `4`=Metropolitan Statistical Area (MSA); `6`=Nonmetropolitan Area. |
25
+ | `prim_state` | Primary state for the given area. `"US"` is used for national estimates. |
26
+ | `naics` | North American Industry Classification System (NAICS) code for the given industry. |
27
+ | `naics_title` | NAICS title for the given industry. |
28
+ | `i_group` | Industry level indicator: cross-industry or NAICS sector, 3-digit, 4-digit, 5-digit, or 6-digit industry. For industries no longer published at the 4-digit NAICS level, “4-digit” indicates the most detailed breakdown available (either a standard NAICS 3-digit industry or an OEWS-specific combination of 4-digit industries). Some industries aggregated to the 3-digit level (e.g., `327000`) may appear twice (once as “3-digit” and once as “4-digit”). |
29
+ | `own_code` | Ownership type: `1`=Federal Government; `2`=State Government; `3`=Local Government; `123`=Federal, State, and Local Government; `235`=Private, State, and Local Government; `35`=Private and Local Government; `5`=Private; `57`=Private, Local Government Gambling Establishments (Sector 71), and Local Government Casino Hotels (Sector 72); `58`=Private plus State and Local Government Hospitals; `59`=Private and Postal Service; `1235`=Federal, State, and Local Government and Private Sector. |
30
+ | `occ_code` | 6-digit Standard Occupational Classification (SOC) code or OEWS-specific occupation code. |
31
+ | `occ_title` | SOC title or OEWS-specific occupation title. |
32
+ | `o_group` | SOC occupation level (major/minor/broad/detailed) and all-occupations totals. For occupations no longer published at the SOC detailed level, “detailed” indicates the most detailed data available (either a standard SOC broad occupation or an OEWS-specific combination of detailed occupations). Some occupations aggregated to the SOC broad level may appear twice (once as “broad” and once as “detailed”). |
33
+ | `tot_emp` | Estimated total employment, rounded to the nearest 10 (excludes self-employed). |
34
+ | `emp_prse` | Percent relative standard error (PRSE) for the employment estimate (measure of sampling error as a percent of the estimate). Lower PRSE generally indicates higher precision. |
35
+ | `jobs_1000` | Jobs per 1,000 in the given area for the occupation. **Only available for state and MSA estimates; otherwise blank.** |
36
+ | `loc_quotient` | Location quotient: ratio of an occupation’s share of area employment to its share of U.S. employment. Example: 10% locally vs 2% nationally → LQ = 5. **Only available for state, metropolitan, and nonmetropolitan area estimates; otherwise blank.** |
37
+ | `pct_total` | Percent of industry employment in the given occupation. Percents may not sum to 100 due to suppressed/unpublished occupations. **Only available for national industry estimates; otherwise blank.** |
38
+ | `pct_rpt` | Percent of establishments reporting the occupation for the cell. **Only available for national industry estimates; otherwise blank.** |
39
+ | `h_mean` | Mean hourly wage. |
40
+ | `a_mean` | Mean annual wage. |
41
+ | `mean_prse` | Percent relative standard error (PRSE) for the mean wage estimate (sampling error measure). Lower PRSE generally indicates higher precision. |
42
+ | `h_pct10` | Hourly 10th percentile wage. |
43
+ | `h_pct25` | Hourly 25th percentile wage. |
44
+ | `h_median` | Hourly median wage (50th percentile). |
45
+ | `h_pct75` | Hourly 75th percentile wage. |
46
+ | `h_pct90` | Hourly 90th percentile wage. |
47
+ | `a_pct10` | Annual 10th percentile wage. |
48
+ | `a_pct25` | Annual 25th percentile wage. |
49
+ | `a_median` | Annual median wage (50th percentile). |
50
+ | `a_pct75` | Annual 75th percentile wage. |
51
+ | `a_pct90` | Annual 90th percentile wage. |
52
+ | `annual` | Contains `"TRUE"` if **only annual** wages are released (e.g., occupations typically working fewer than 2,080 hours/year but paid annually such as teachers, pilots, athletes). |
53
+ | `hourly` | Contains `"TRUE"` if **only hourly** wages are released (e.g., occupations typically working fewer than 2,080 hours/year and paid hourly such as actors, dancers, musicians, singers). |
54
+
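The `loc_quotient` definition in the table can be checked with a small worked example. This is a minimal sketch: the function name and the employment counts are hypothetical, chosen only to reproduce the 10% versus 2% example from the table.

```python
def location_quotient(area_occ_emp, area_total_emp, us_occ_emp, us_total_emp):
    """Occupation's share of area employment divided by its share of U.S. employment."""
    if area_total_emp <= 0 or us_total_emp <= 0 or us_occ_emp <= 0:
        raise ValueError("totals and U.S. occupation employment must be positive")
    area_share = area_occ_emp / area_total_emp
    us_share = us_occ_emp / us_total_emp
    return area_share / us_share


# Hypothetical counts: 10% of local jobs versus 2% of national jobs.
lq = location_quotient(
    area_occ_emp=1_000, area_total_emp=10_000,
    us_occ_emp=20_000, us_total_emp=1_000_000,
)
print(round(lq, 2))  # 5.0
```

An LQ above 1 means the occupation is more concentrated in the area than in the country as a whole; below 1 means less concentrated.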
55
+ ---
56
+
57
+ ## Notes on Missing/Suppressed Values
58
+
59
+ The dataset uses special symbols in some fields:
60
+
61
+ - `*` = a wage estimate is not available
62
+ - `**` = an employment estimate is not available
63
+ - `#` = wage is ≥ **$115.00/hour** or ≥ **$239,200/year**
64
+ - `~` = percent of establishments reporting the occupation is **< 0.5%**
65
+
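One way to handle these symbols before doing arithmetic is to split each raw cell into a number and a flag. The sketch below assumes cells arrive as text (the storage type in the actual file may differ); the function name and flag labels are hypothetical.

```python
# Symbol meanings follow the list above; the flag labels are made up for this sketch.
SYMBOL_FLAGS = {
    "*": "wage_not_available",
    "**": "employment_not_available",
    "#": "wage_top_coded",
    "~": "pct_rpt_below_0.5",
}


def parse_oews_value(raw):
    """Return (number_or_None, flag_or_None) for one raw cell."""
    if raw is None:
        return None, None
    text = str(raw).strip()
    if text == "":
        return None, None  # blank: field not published for this estimate type
    if text in SYMBOL_FLAGS:
        return None, SYMBOL_FLAGS[text]
    # Tolerate thousands separators such as "1,230".
    return float(text.replace(",", "")), None


for cell in ["1,230", "**", "#", "~", ""]:
    print(repr(cell), parse_oews_value(cell))
```

Keeping the flag alongside the missing number lets later steps decide whether to drop the row, treat it as `NULL`, or substitute the published threshold for `#`.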
66
+ ---
67
+
68
+ ## Practical Interpretation Tips
69
+
70
+ - **Employment (`tot_emp`) is rounded** to the nearest 10; small differences may be rounding artifacts.
71
+ - **PRSE fields (`emp_prse`, `mean_prse`)** indicate sampling uncertainty; use them to judge reliability.
72
+ - **Hourly vs annual fields:** Some occupations will only have one wage basis published (see `annual` and `hourly` flags).
73
+
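As an illustration of using PRSE to judge reliability, a PRSE can be converted back into a standard error and a rough interval. This is a sketch under stated assumptions: the 1.645 multiplier (a 90% normal-approximation interval) is an assumption of this example, not something the dictionary specifies, and the input values are invented.

```python
def prse_interval(estimate, prse, z=1.645):
    """Convert a PRSE (in percent) into a standard error and an approximate interval.

    z=1.645 is an assumed multiplier for a 90% normal-approximation interval.
    """
    standard_error = estimate * prse / 100.0
    margin = z * standard_error
    return standard_error, (estimate - margin, estimate + margin)


# Hypothetical row: tot_emp = 12,340 with emp_prse = 2.5
se, (low, high) = prse_interval(12_340, 2.5)
print(f"standard error ~ {se:.1f}; interval ~ ({low:.0f}, {high:.0f})")
```

Because `tot_emp` is already rounded to the nearest 10, differences smaller than the interval width should not be read as meaningful.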
74
+ ---
75
+
76
+ ## Source Citation
77
+
78
+ Bureau of Labor Statistics, U.S. Department of Labor. *Occupational Employment and Wage Statistics (OEWS), May 2024 Estimates.*
79
+ https://www.bls.gov/oes/
data/extra_instructions.md ADDED
@@ -0,0 +1,70 @@
1
+ # Extra Instructions
2
+
3
+ ## Geographic filter (exact)
4
+ - Use `area = '99'` for national rows.
5
+ - For states, use either:
6
+   - `prim_state = 'CA'` (two-letter state code), **or**
7
+   - `area = 'xxxxx'` (the state's FIPS code as stored in `area`).
8
+ - Always specify which state identifier you mean (`prim_state` vs `area`).
9
+
10
+ ## Occupation filter (exact)
11
+ - Use `occ_title = 'All Occupations'` or `occ_code = '123456'`.
12
+ - If you want the all-occupations row, **prefer** `occ_title = 'All Occupations'`.
13
+
14
+ ## Industry filter (exact, if needed)
15
+ - Use `naics_title = 'Cross-industry'` or `naics = 'XXXXX'`.
16
+ - Text matches are **case- and punctuation-sensitive**.
17
+
18
+ ## Aggregation level (to prevent duplicate rows)
19
+ - Specify the desired level using:
20
+   - `o_group = 'total'` (or another `o_group` value), and/or
20
+   - `i_group = 'cross-industry'`, `'3-digit'`, or `'sector'`.
22
+ - Use these intentionally to avoid duplicated or overlapping rows.
23
+
24
+ ## Ownership filter (optional)
25
+ - Use `own_code = '5'` for Private.
26
+ - Use `own_code = '123'` for all governments.
27
+ - Include `own_code` only when you want to restrict results by ownership.
28
+
29
+ ## Aggregation intent (very important)
30
+ State whether you want a single row, the raw matching rows, or an aggregate. You can copy/paste:
31
+
32
+ - **Calculate total:** `SUM(tot_emp)` across rows matching the filters
33
+ - **Show rows:** return the matching row(s) without summing
34
+
35
+ ## Suppressions and special symbols
36
+ Specify how to handle `*`, `**`, `#`, and `~`. Example options:
37
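+ - Exclude rows where `tot_emp` is `'**'` (employment not available) or a wage field is `'*'` (wage not available)
38
+ - Treat `'#'` as top-coded at the published threshold ($115.00/hour or $239,200/year) and include it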
39
+ - Treat suppressed values as `NULL`
40
+
41
+ ## Rounding and deduplication
42
+ - `tot_emp` is rounded to the nearest 10.
43
+ - If you want to avoid double-counting across industry or occupation groupings, specify a dedupe approach (for example: “use `o_group = 'total'` to avoid double-counting”).
44
+
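The double-counting risk described above can be seen on a few toy rows. This is a minimal sketch: the rows and numbers are made up, only the column names come from the data dictionary, and `sum_tot_emp` is a hypothetical helper rather than part of any tool.

```python
# Toy rows with invented numbers; only the column names come from the dictionary.
rows = [
    {"area": "99", "naics_title": "Cross-industry", "occ_title": "All Occupations",
     "o_group": "total", "tot_emp": "1000"},
    {"area": "99", "naics_title": "Cross-industry", "occ_title": "Example Major Group",
     "o_group": "major", "tot_emp": "600"},
    {"area": "99", "naics_title": "Cross-industry", "occ_title": "Example Detailed Occupation",
     "o_group": "detailed", "tot_emp": "**"},
]


def sum_tot_emp(rows, **filters):
    """Sum tot_emp over rows matching every filter exactly, skipping '**'.

    Returns (total, number_of_rows_summed).
    """
    total = 0
    counted = 0
    for row in rows:
        if any(row.get(col) != val for col, val in filters.items()):
            continue
        if row["tot_emp"] == "**":  # suppressed employment estimate
            continue
        total += int(row["tot_emp"].replace(",", ""))
        counted += 1
    return total, counted


# Without an o_group filter, the total row and one of its components are both summed.
print(sum_tot_emp(rows, area="99", naics_title="Cross-industry"))  # (1600, 2)
# Pinning o_group = 'total' keeps a single, non-overlapping row.
print(sum_tot_emp(rows, area="99", naics_title="Cross-industry", o_group="total"))  # (1000, 1)
```

Returning the row count next to the sum makes it easy to spot an aggregate that silently combined overlapping levels.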
45
+ ## Verification step (recommended)
46
+ If any filter is ambiguous, request a preview first:
47
+ - “Show matching rows so I can confirm `naics_title`, `o_group`, and `occ_title` before aggregating.”
48
+
49
+ ## Action wording (so I take the right next step)
50
+ Start your request with:
51
+ - **Calculate** / **Compute** → run a query and return computed results
52
+ - **Show** / **Filter to** → update the dashboard view and return all columns
53
+
54
+ ---
55
+
56
+ # Paste-ready prompt templates
57
+
58
+ - **Calculate total U.S. employment**
59
+ `area = '99', occ_title = 'All Occupations', naics_title = 'Cross-industry', Calculate: SUM(tot_emp), exclude suppressed tot_emp rows`
60
+
61
+ - **Show rows to confirm labels**
62
+ `Show: area = '99' AND occ_title = 'All Occupations'`
63
+
64
+ - **Compare totals (cross-industry vs summed NAICS), with dedupe**
65
+ `Compute: area = '99', occ_title = 'All Occupations', SUM(tot_emp) for naics_title = 'Cross-industry' AND SUM(tot_emp) across all naics; use o_group = 'total' to deduplicate`
66
+
67
+ ---
68
+
69
+ # Default behavior
70
+ If you include the relevant filters, I will apply these rules automatically. If a request is ambiguous, I will first show the matching rows and ask you to confirm the labels before aggregating.
data/oews.rds ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3695c3b722fce3c92eb2258880082e56a0e51a6f9a1d30062e0c6026b9469b98
3
+ size 13752016