Update data/extra_instructions.md

#4
Files changed (1) hide show
  1. data/extra_instructions.md +72 -70
data/extra_instructions.md CHANGED
@@ -1,70 +1,72 @@
1
- # Extra Instructions
2
-
3
- ## Geographic filter (exact)
4
- - Use `area = '99'` for national rows.
5
- - For states, use either:
6
- - `prim_state = 'CA'` (two-letter state code), **or**
7
- - `area = 'xxxxx'` (OEWS area code).
8
- - Always specify which state identifier you mean (`prim_state` vs `area`).
9
-
10
- ## Occupation filter (exact)
11
- - Use `occ_title = 'All Occupations'` or `occ_code = '123456'`.
12
- - If you want the all-occupations row, **prefer** `occ_title = 'All Occupations'`.
13
-
14
- ## Industry filter (exact, if needed)
15
- - Use `naics_title = 'Cross-industry'` or `naics = 'XXXXX'`.
16
- - Text matches are **case- and punctuation-sensitive**.
17
-
18
- ## Aggregation level (to prevent duplicate rows)
19
- - Specify the desired level using:
20
- - `o_group = 'total'` (or another `o_group` value), and/or
21
- - `i_group = 'cross-industry'`, `'3-digit'`, or `'sector'`.
22
- - Use these intentionally to avoid duplicated or overlapping rows.
23
-
24
- ## Ownership filter (optional)
25
- - Use `own_code = '5'` for Private.
26
- - Use `own_code = '123'` for all governments.
27
- - Include `own_code` only when you want to restrict results by ownership.
28
-
29
- ## Aggregation intent (very important)
30
- State whether you want a single row, the raw matching rows, or an aggregate. You can copy/paste:
31
-
32
- - **Calculate total:** `SUM(tot_emp)` across rows matching the filters
33
- - **Show rows:** return the matching row(s) without summing
34
-
35
- ## Suppressions and special symbols
36
- Specify how to handle `*`, `**`, `#`, and `~`. Example options:
37
- - Exclude rows where `tot_emp` is `'*'` or `'**'`
38
- - Treat `'#'` as a numeric ceiling and include it
39
- - Treat suppressed values as `NULL`
40
-
41
- ## Rounding and deduplication
42
- - `tot_emp` is rounded to the nearest 10.
43
- - If you want to avoid double-counting across industry or occupation groupings, specify a dedupe approach (for example: “use `o_group = 'total'` to avoid double-counting”).
44
-
45
- ## Verification step (recommended)
46
- If any filter is ambiguous, request a preview first:
47
- - “Show matching rows so I can confirm `naics_title`, `o_group`, and `occ_title` before aggregating.”
48
-
49
- ## Action wording (so I take the right next step)
50
- Start your request with:
51
- - **Calculate** / **Compute** → run a query and return computed results
52
- - **Show** / **Filter to** → update the dashboard view and return all columns
53
-
54
- ---
55
-
56
- # Paste-ready prompt templates
57
-
58
- - **Calculate total U.S. employment**
59
- `area = '99', occ_title = 'All Occupations', naics_title = 'Cross-industry', Calculate: SUM(tot_emp), exclude suppressed tot_emp rows`
60
-
61
- - **Show rows to confirm labels**
62
- `Show: area = '99' AND occ_title = 'All Occupations'`
63
-
64
- - **Compare totals (cross-industry vs summed NAICS), with dedupe**
65
- `Compute: area = '99', occ_title = 'All Occupations', SUM(tot_emp) for naics_title = 'Cross-industry' AND SUM(tot_emp) across all naics; use o_group = 'total' to deduplicate`
66
-
67
- ---
68
-
69
- # Default behavior
70
- If you include the relevant filters, I will apply these rules automatically. If a request is ambiguous, I will first show the matching rows and ask you to confirm the labels before aggregating.
 
 
 
1
+ # Extra Instructions
2
+
3
+ ## Room type filter (exact)
4
+ - Use `room_type = 'Entire home/apt'`
5
+ - Use `room_type = 'Private room'`
6
+ - Use `room_type = 'Shared room'`
7
+ - Use `room_type = 'Hotel room'`
8
+ - Text matches are **case- and punctuation-sensitive**.
9
+ - Always use the exact string as shown in the dataset.
10
+
11
+ ## Neighbourhood filter (exact)
12
+ - Use `neighbourhood = 'Short North'` (example).
13
+ - Neighbourhood names are **case-sensitive**.
14
+ - Always match the spelling exactly as stored in the dataset.
15
+ - Do not assume automatic capitalization or fuzzy matching.
16
+
17
+ ## Price queries (nightly rate)
18
+ - `price` represents the nightly rental rate in USD.
19
+ - Price is stored as a whole-dollar integer.
20
+ - If asking for:
21
+ - “Under $X” use `price < X`
22
+ - “At most $X” use `price <= X`
23
+ - Weekly or monthly calculations must be explicitly requested.
24
+ - Weekly = `price * 7`
25
+ - Monthly (approximate) = `price * 30`
26
+
27
+ ## Handling listings with zero reviews
28
+ - If `number_of_reviews = 0`:
29
+ - `last_review` is blank.
30
+ - `reviews_per_month` is NULL.
31
+ - Do not assume NULL `reviews_per_month` equals 0 unless explicitly stated.
32
+ - When calculating averages of `reviews_per_month`, exclude NULL values unless instructed otherwise.
33
+
34
+ ## Availability interpretation
35
+ - `availability_365` represents the number of available days in the next 365 days.
36
+ - `availability_365 = 0` may indicate:
37
+ - Fully booked
38
+ - Host-blocked calendar
39
+ - Inactive listing
40
+ - Do not automatically interpret 0 as permanently delisted unless specified.
41
+
42
+ ## Host analysis (multi-listing hosts)
43
+ - Use `calculated_host_listings_count` to identify hosts with multiple listings.
44
+ - Multi-listing host condition:
45
+ - `calculated_host_listings_count > 1`
46
+ - When ranking hosts by inventory, sort by `calculated_host_listings_count` (descending).
47
+
48
+ ## Aggregation intent (very important)
49
+ State whether you want raw rows or an aggregate.
50
+
51
+ You can copy/paste:
52
+
53
+ - **Calculate total listings:** `COUNT(id)`
54
+ - **Calculate average price:** `AVG(price)`
55
+ - **Show rows:** return matching listings without aggregation
56
+
57
+ ## Rounding and numeric handling
58
+ - `price` is already stored as whole dollars.
59
+ - When computing averages, results may return decimals.
60
+ - Do not round unless explicitly requested.
61
+
62
+ ## Verification step (recommended)
63
+ If a filter may match multiple neighbourhood spellings or room types:
64
+ - First show matching rows to confirm exact labels before aggregating.
65
+
66
+ Example:
67
+ - “Show listings where neighbourhood = 'Short North' so I can confirm label before calculating.”
68
+
69
+ ## Action wording (so I take the right next step)
70
+ Start your request with:
71
+ - **Calculate** / **Compute** → run aggregation and return computed results
72
+ - **Show** / **Filter to** → return matching listing rows