Update data/data_desc.md

#3
Files changed (1)
  1. data/data_desc.md +47 -79
data/data_desc.md CHANGED
@@ -1,79 +1,47 @@
1
- # May 2024 OEWS Estimates — Data Dictionary / Description
2
-
3
- **Dataset:** Occupational Employment and Wage Statistics (OEWS) Survey (May 2024 Estimates)
4
- **Publisher:** U.S. Bureau of Labor Statistics (BLS), Department of Labor
5
- **Website:** https://www.bls.gov/oes/
6
- **Contact:** oewsinfo@bls.gov
7
-
8
- > Not all fields are available for every type of estimate.
9
-
10
- ---
11
-
12
- ## Overview
13
-
14
- This dataset provides employment and wage estimates for occupations (SOC) by geography (U.S., state, territory, metropolitan and nonmetropolitan areas) and, for some releases, by industry (NAICS) and ownership. Estimates include employment counts, measures of sampling error (PRSE), and wage levels (mean and percentiles) on hourly and/or annual bases.
15
-
16
- ---
17
-
18
- ## Field Definitions
19
-
20
- | Field | Description |
21
- |---|---|
22
- | `area` | Geographic identifier: U.S. (99), state FIPS code, Metropolitan Statistical Area (MSA) code, or OEWS-specific nonmetropolitan area code. |
23
- | `area_title` | Area name. |
24
- | `area_type` | Area type: `1`=U.S.; `2`=State; `3`=U.S. Territory; `4`=Metropolitan Statistical Area (MSA); `6`=Nonmetropolitan Area. |
25
- | `prim_state` | Primary state for the given area. `"US"` is used for national estimates. |
26
- | `naics` | North American Industry Classification System (NAICS) code for the given industry. |
27
- | `naics_title` | NAICS title for the given industry. |
28
- | `i_group` | Industry level indicator: cross-industry or NAICS sector, 3-digit, 4-digit, 5-digit, or 6-digit industry. For industries no longer published at the 4-digit NAICS level, “4-digit” indicates the most detailed breakdown available (either a standard NAICS 3-digit industry or an OEWS-specific combination of 4-digit industries). Some industries aggregated to the 3-digit level (e.g., `327000`) may appear twice (once as “3-digit” and once as “4-digit”). |
29
- | `own_code` | Ownership type: `1`=Federal Government; `2`=State Government; `3`=Local Government; `123`=Federal, State, and Local Government; `235`=Private, State, and Local Government; `35`=Private and Local Government; `5`=Private; `57`=Private, Local Government Gambling Establishments (Sector 71), and Local Government Casino Hotels (Sector 72); `58`=Private plus State and Local Government Hospitals; `59`=Private and Postal Service; `1235`=Federal, State, and Local Government and Private Sector. |
30
- | `occ_code` | 6-digit Standard Occupational Classification (SOC) code or OEWS-specific occupation code. |
31
- | `occ_title` | SOC title or OEWS-specific occupation title. |
32
- | `o_group` | SOC occupation level (major/minor/broad/detailed) and all-occupations totals. For occupations no longer published at the SOC detailed level, “detailed” indicates the most detailed data available (either a standard SOC broad occupation or an OEWS-specific combination of detailed occupations). Some occupations aggregated to the SOC broad level may appear twice (once as “broad” and once as “detailed”). |
33
- | `tot_emp` | Estimated total employment, rounded to the nearest 10 (excludes self-employed). |
34
- | `emp_prse` | Percent relative standard error (PRSE) for the employment estimate (measure of sampling error as a percent of the estimate). Lower PRSE generally indicates higher precision. |
35
- | `jobs_1000` | Jobs per 1,000 in the given area for the occupation. **Only available for state and MSA estimates; otherwise blank.** |
36
- | `loc_quotient` | Location quotient: ratio of an occupation’s share of area employment to its share of U.S. employment. Example: 10% locally vs 2% nationally → LQ = 5. **Only available for state, metropolitan, and nonmetropolitan area estimates; otherwise blank.** |
37
- | `pct_total` | Percent of industry employment in the given occupation. Percents may not sum to 100 due to suppressed/unpublished occupations. **Only available for national industry estimates; otherwise blank.** |
38
- | `pct_rpt` | Percent of establishments reporting the occupation for the cell. **Only available for national industry estimates; otherwise blank.** |
39
- | `h_mean` | Mean hourly wage. |
40
- | `a_mean` | Mean annual wage. |
41
- | `mean_prse` | Percent relative standard error (PRSE) for the mean wage estimate (sampling error measure). Lower PRSE generally indicates higher precision. |
42
- | `h_pct10` | Hourly 10th percentile wage. |
43
- | `h_pct25` | Hourly 25th percentile wage. |
44
- | `h_median` | Hourly median wage (50th percentile). |
45
- | `h_pct75` | Hourly 75th percentile wage. |
46
- | `h_pct90` | Hourly 90th percentile wage. |
47
- | `a_pct10` | Annual 10th percentile wage. |
48
- | `a_pct25` | Annual 25th percentile wage. |
49
- | `a_median` | Annual median wage (50th percentile). |
50
- | `a_pct75` | Annual 75th percentile wage. |
51
- | `a_pct90` | Annual 90th percentile wage. |
52
- | `annual` | Contains `"TRUE"` if **only annual** wages are released (e.g., occupations typically working fewer than 2,080 hours/year but paid annually such as teachers, pilots, athletes). |
53
- | `hourly` | Contains `"TRUE"` if **only hourly** wages are released (e.g., occupations typically working fewer than 2,080 hours/year and paid hourly such as actors, dancers, musicians, singers). |
54
-
55
- ---
56
-
57
- ## Notes on Missing/Suppressed Values
58
-
59
- The dataset uses special symbols in some fields:
60
-
61
- - `*` = a wage estimate is not available
62
- - `**` = an employment estimate is not available
63
- - `#` = wage is ≥ **$115.00/hour** or ≥ **$239,200/year**
64
- - `~` = percent of establishments reporting the occupation is **< 0.5%**
65
-
66
- ---
67
-
68
- ## Practical Interpretation Tips
69
-
70
- - **Employment (`tot_emp`) is rounded** to the nearest 10; small differences may be rounding artifacts.
71
- - **PRSE fields (`emp_prse`, `mean_prse`)** indicate sampling uncertainty; use them to judge reliability.
72
- - **Hourly vs annual fields:** Some occupations will only have one wage basis published (see `annual` and `hourly` flags).
73
-
74
- ---
75
-
76
- ## Source Citation
77
-
78
- Bureau of Labor Statistics, U.S. Department of Labor. *Occupational Employment and Wage Statistics (OEWS), May 2024 Estimates.*
79
- https://www.bls.gov/oes/
 
1
+ # Airbnb Listings (Columbus): Data Dictionary / Description
2
+
3
+ **Dataset:** AirBNB Hugging Face
4
+ **Publisher:** Airbnb
5
+ **Website:** https://insideairbnb.com/get-the-data/
6
+
7
+
8
+ > Not all fields are available for every type of estimate.
9
+
10
+ ---
11
+
12
+ ## Overview
13
+
14
+ This dataset provides Airbnb listing records from Inside Airbnb, one row per listing. Fields cover listing and host identifiers, neighbourhood and coordinates, room type, nightly price, minimum stay, review counts and dates, host listing counts, availability, and license information.
15
+
16
+ ---
17
+
18
+ ## Field Definitions
19
+
20
+ | Field | Description | Example |
21
+ |---|---|---|
22
+ | id | Unique Airbnb listing ID | 36451 |
23
+ | name | Listing title | "Modern Downtown Apartment" |
24
+ | host_id | Unique host ID | 812345 |
25
+ | host_name | Host’s first name | "Sarah" |
26
+ | neighbourhood_group | Neighbourhood group (may be empty for Columbus) | NULL |
27
+ | neighbourhood | Neighbourhood name | "Short North" |
28
+ | latitude | Listing coordinates (latitude) | 39.9833 |
29
+ | longitude | Listing coordinates (longitude) | -82.9988 |
30
+ | room_type | Entire home/apt, Private room, Shared room, or Hotel room | "Entire home/apt" |
31
+ | price | Nightly price in USD (integer) | 150 |
32
+ | minimum_nights | Minimum nights required per booking | 2 |
33
+ | number_of_reviews | Total review count | 47 |
34
+ | last_review | Date of most recent review (YYYY-MM-DD; blank if no reviews) | 2025-10-12 |
35
+ | reviews_per_month | Average reviews per month (blank if no reviews) | 0.56 |
36
+ | calculated_host_listings_count | Number of listings the host has in Columbus | 4 |
37
+ | availability_365 | Days available in the next year (0 = fully booked or delisted) | 120 |
38
+ | number_of_reviews_ltm | Reviews in the last 12 months | 8 |
39
+ | license | License/registration number (if applicable) | "2024-STR-00123" |
40
+
41
+
42
+ ---
43
+
44
+
45
+ ## Source Citation
46
+
47
+ https://insideairbnb.com/get-the-data/
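
The new field table notes that `last_review` and `reviews_per_month` are blank when a listing has no reviews, and that `neighbourhood_group` may be empty. A loader would likely need to map those blanks to missing values before casting types. The following is a minimal sketch, assuming a CSV export whose columns match the field names in the table. The sample rows and the helpers `optional`, `parse_listing`, and `load_listings` are hypothetical and not part of this change; treating `price` as possibly blank is also an assumption.

```python
import csv
import io
from datetime import date

# Hypothetical sample: the first row reuses the example values from the field
# table; the second row is invented to show a listing with no reviews.
SAMPLE_CSV = (
    "id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,"
    "longitude,room_type,price,minimum_nights,number_of_reviews,last_review,"
    "reviews_per_month,calculated_host_listings_count,availability_365,"
    "number_of_reviews_ltm,license\n"
    "36451,Modern Downtown Apartment,812345,Sarah,,Short North,39.9833,"
    "-82.9988,Entire home/apt,150,2,47,2025-10-12,0.56,4,120,8,2024-STR-00123\n"
    "36452,Quiet Room,812346,Alex,,Short North,39.98,-82.999,Private room,"
    "85,1,0,,,1,0,0,\n"
)


def optional(value, cast):
    """Return cast(value), or None when the CSV cell is blank."""
    value = value.strip()
    return cast(value) if value else None


def parse_listing(row):
    """Convert one CSV row (a dict of strings) into typed Python values."""
    return {
        "id": int(row["id"]),
        "name": row["name"],
        "host_id": int(row["host_id"]),
        "host_name": row["host_name"],
        "neighbourhood_group": optional(row["neighbourhood_group"], str),
        "neighbourhood": row["neighbourhood"],
        "latitude": float(row["latitude"]),
        "longitude": float(row["longitude"]),
        "room_type": row["room_type"],
        # Assumption: price may be blank in some exports, so it is optional.
        "price": optional(row["price"], int),
        "minimum_nights": int(row["minimum_nights"]),
        "number_of_reviews": int(row["number_of_reviews"]),
        # Blank when the listing has no reviews, per the field table.
        "last_review": optional(row["last_review"], date.fromisoformat),
        "reviews_per_month": optional(row["reviews_per_month"], float),
        "calculated_host_listings_count": int(
            row["calculated_host_listings_count"]
        ),
        "availability_365": int(row["availability_365"]),
        "number_of_reviews_ltm": int(row["number_of_reviews_ltm"]),
        "license": optional(row["license"], str),
    }


def load_listings(handle):
    """Read every listing from an open text file or file-like object."""
    return [parse_listing(row) for row in csv.DictReader(handle)]


listings = load_listings(io.StringIO(SAMPLE_CSV))
for listing in listings:
    print(listing["id"], listing["price"], listing["last_review"])
```

To read a real file instead of the inline sample, the same `load_listings` call should work with a handle opened via `open(path, newline="", encoding="utf-8")`, where `path` points at the downloaded listings CSV.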