srmsoumya committed on
Commit
e98676b
·
1 Parent(s): 2486fea

chore: Run eval on v3 dataset

IMPROVEMENTS.md DELETED
@@ -1,79 +0,0 @@
- # Gazet Improvement Notes
-
- Issues identified during testing. Each item is a candidate for the next training/template pass.
-
- ---
-
- ## 1. Missing "buffer-only" template
-
- **Query**: "10 km buffer around Odisha"
- **Expected**: Return the buffered geometry polygon itself.
- **Actual**: Model picks `buffer_01`, which finds all features intersecting the buffer (200 rows).
-
- **Root cause**: All buffer templates (`buffer_01` through `buffer_05`) perform an intersection join to find neighboring features. No template simply returns `ST_AsGeoJSON(ST_Buffer(...))`.
-
- **Fix**: Add a `buffer_06` template that returns the buffer polygon directly:
-
- ```sql
- SELECT ST_AsGeoJSON(ST_Buffer(geometry, {buffer_km} * 1000.0 / 111320.0)) AS geometry
- FROM read_parquet('divisions_area')
- WHERE id = '{anchor_id}'
- ```
-
- With hints like "10 km buffer around {anchor_name}", "draw a {buffer_km} km buffer around {anchor_name}". Consider a NE variant too.
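Editorial sketch (not part of the deleted file): the proposed `buffer_06` SQL converts kilometres to degrees with the constant 111320, which is roughly the length in metres of one degree at the equator. The snippet below reproduces that arithmetic and shows how the east-west extent shrinks away from the equator. The function names `km_to_degrees` and `east_west_km` are hypothetical and do not come from the repository.

```python
import math

# Same constant as in the proposed SQL: metres per degree at the equator (approx.).
METERS_PER_DEGREE = 111_320.0


def km_to_degrees(buffer_km: float) -> float:
    """Mirror of `{buffer_km} * 1000.0 / 111320.0` from the template."""
    return buffer_km * 1000.0 / METERS_PER_DEGREE


def east_west_km(buffer_deg: float, latitude_deg: float) -> float:
    """Approximate ground distance spanned by `buffer_deg` of longitude at a latitude."""
    metres = buffer_deg * METERS_PER_DEGREE * math.cos(math.radians(latitude_deg))
    return metres / 1000.0


if __name__ == "__main__":
    deg = km_to_degrees(10)
    print(f"10 km is about {deg:.5f} degrees")
    # 20 degrees north is used here only as an illustrative latitude.
    print(f"east-west reach at 20N: {east_west_km(deg, 20.0):.2f} km")
```

Because the buffer is computed in degrees, the resulting polygon is only approximately 10 km wide in the east-west direction at higher latitudes; whether that matters depends on how precise the buffer needs to be.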
-
- ---
-
- ## 2. Place extractor misses NE physical features (mixed-source queries)
-
- **Query**: "The part of Ecuador that is in the Amazon Basin"
- **Expected**: Place extractor returns both "Ecuador" and "Amazon Basin"; candidate search finds correct IDs for both.
- **Actual**: Only "Ecuador" extracted. SQL model uses memorized wrong NE ID (`ne_1159120655` = Cuando River) instead of the correct one (`ne_1159104325` = AMAZON BASIN).
-
- **Root cause**: The GGUF place extraction model was not trained to extract physical features. The runtime prompt (`_PLACES_SYSTEM_PROMPT`) has been updated but the finetuned model may ignore prompt changes. A re-finetune with NE feature examples is the definitive fix.
-
- **Affected templates**: `partial_05`, `diff_02` (mixed-source), and all NE-anchored templates (`intersect_03`, `contain_03/04`, `buffer_03/04/05`, `lookup_02`).
-
- ---
-
- ## 3. Missing NE-anchor to county intersection template
-
- **Query**: "Indravati River flows through which districts"
- **Expected**: `ST_Intersects` with `target_subtype='county'`
- **Actual**: Model sometimes uses `ST_Within` (wrong predicate) because `intersect_03` only targets `region`, not `county`.
-
- **Fix**: Add `intersect_05` (NE anchor -> county, `ST_Intersects`) with district-oriented question hints.
-
- ---
-
- ## 4. Model hallucinates NE subtype values
-
- **Query**: "which mountain ranges cross Odisha"
- **Expected**: `n.subtype IN ('range/mtn', 'peninsula', 'depression')` (from `adj_05`)
- **Actual**: Model generates `'Terrain area'` which does not exist in the data.
-
- **Fix**: More training examples for `adj_05`. Consider adding common hallucinated values to `_NE_SUBTYPE_FIXES` in `sql.py` as a runtime safety net.
-
- ---
-
- ## 5. NE subtype casing inconsistency between model output and data
-
- **Example**: Model generates `'River'`, `'Basin'`, `'Ocean'` but data has `'river'`, `'basin'`, `'ocean'`.
-
- **Current workaround**: `_normalize_ne_subtypes()` in `sql.py` does string replacement of known title-cased literals at query time (`_NE_SUBTYPE_FIXES` dict). This is brittle and only covers a hardcoded list.
-
- **Root cause**: The original Natural Earth data had title-cased `featurecla` values (e.g. `River`, `Basin`, `Ocean`). Training data was generated before the lowercase fix to `convert_natural_earth.py`, so the model learned to emit title-cased subtypes. The data is now lowercased but the model still outputs the old casing.
-
- **Fix**: Regenerate training data with the lowercased NE parquet so all subtype literals in SQL examples are lowercase. After re-finetune, the model will natively emit lowercase subtypes and the `_normalize_ne_subtypes` hack can be removed.
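Editorial sketch (not part of the deleted file): until a re-finetune lands, one alternative to a hardcoded title-case replacement table is to lowercase any quoted literal whose lowercase form is a known subtype. This is a minimal illustration, not the `_normalize_ne_subtypes()` implementation in `sql.py`, which is not shown in this commit. The function name and the `KNOWN_SUBTYPES` set are assumptions; the set lists only the values mentioned in the notes, and a real version would presumably load the distinct subtypes from the NE parquet.

```python
import re

# Assumption: only the subtype values mentioned in the notes; the real set
# would come from the data.
KNOWN_SUBTYPES = frozenset(
    {"river", "basin", "ocean", "range/mtn", "peninsula", "depression"}
)

# Matches a single-quoted SQL string literal with no embedded quotes.
_LITERAL = re.compile(r"'([^']*)'")


def lowercase_known_subtypes(sql: str, known=KNOWN_SUBTYPES) -> str:
    """Lowercase quoted literals that match a known subtype, ignoring case."""

    def fix(match: re.Match) -> str:
        lowered = match.group(1).lower()
        # Leave every other literal (names, ids) exactly as written.
        return f"'{lowered}'" if lowered in known else match.group(0)

    return _LITERAL.sub(fix, sql)


if __name__ == "__main__":
    query = "SELECT * FROM ne n WHERE n.subtype IN ('River', 'Basin') AND n.name = 'Amazon Basin'"
    print(lowercase_known_subtypes(query))
```

The sketch rewrites every matching literal in the statement, so a name that happens to equal a subtype (for example `name = 'River'`) would also be lowercased. It also does nothing for invented values such as `'Terrain area'`, which still need a mapping or a training fix.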
-
- ---
-
- ## 6. "Largest/smallest" queries always return at least 3 results
-
- **Query**: "the largest region in India", "smallest county in France"
- **Expected**: Return 1 result (the single largest/smallest).
- **Actual**: Model generates `LIMIT 3` by default, returning top 3 instead of 1.
-
- **Root cause**: The aggregation templates (`agg_01`, `agg_02`) use `LIMIT 3` as the default. The model learns this as a fixed pattern and applies it even when the query clearly asks for a single result ("the largest", "the smallest").
-
- **Fix**: During data generation, vary the LIMIT value based on the question hint phrasing. Use `LIMIT 1` for singular hints ("the largest X", "the smallest X") and `LIMIT 3` or `LIMIT 5` for plural hints ("the 3 largest", "top 5 smallest"). This teaches the model to infer the correct LIMIT from the query.
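Editorial sketch (not part of the deleted file): the fix above amounts to deriving the LIMIT value from the hint text during data generation. A minimal version, assuming hints follow the phrasings quoted in the notes ("the 3 largest", "top 5 smallest", "the largest X"), could look like this. `limit_for_hint` is a hypothetical helper, not a function from the repository.

```python
import re

# An explicit count directly after "the" or "top", e.g. "the 3 largest", "top 5 smallest".
_COUNT = re.compile(r"\b(?:the|top)\s+(\d+)\b", re.IGNORECASE)


def limit_for_hint(hint: str, default: int = 1) -> int:
    """Return the LIMIT implied by a question hint; singular hints get `default`."""
    match = _COUNT.search(hint)
    return int(match.group(1)) if match else default


if __name__ == "__main__":
    for hint in (
        "the largest region in India",
        "the 3 largest regions in India",
        "top 5 smallest counties in France",
    ):
        print(f"{hint!r} -> LIMIT {limit_for_hint(hint)}")
```

Counts written as words ("three largest") are not handled here and would fall back to the default.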
finetune/eval_cli.py CHANGED
@@ -34,7 +34,7 @@ SERVER_URL = "http://localhost:9000"
  MAX_TOKENS = 2048
  TEMPERATURE = 0.6

- DEFAULT_RUN_DIR = Path("dataset/output/runs/v1")
+ DEFAULT_RUN_DIR = Path("dataset/output/runs/v3")


  def postprocess_sql(text: str) -> str:
finetune/train_modal_qwen35.py CHANGED
@@ -101,9 +101,9 @@ class Qwen35Config:
  # Logging / saving
  logging_steps: int = 10
  save_strategy: str = "steps"
- save_steps: int = 1000
+ save_steps: int = 2000
  eval_strategy: str = "steps"
- eval_steps: int = 200
+ eval_steps: int = 500
  report_to: str = "trackio"
  trackio_space_id: Optional[str] = "srmsoumya/gazet-trackio"
  project: str = "gazet-nlg-qwen35"