Commit 2171c22 by rifatSDAS · 1 parent: ae175a2

Initial commit: Geospatial AI Query System

Files changed (11)
  1. .gitignore +66 -0
  2. LICENSE.md +11 -0
  3. README.md +138 -1
  4. USER_GUIDE.md +399 -0
  5. app.py +574 -0
  6. config.py +179 -0
  7. data_utils.py +209 -0
  8. requirements.txt +11 -0
  9. setup.bat +89 -0
  10. setup.sh +87 -0
  11. test_app.py +173 -0
.gitignore ADDED
@@ -0,0 +1,66 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual Environment
+ venv/
+ .venv/
+ virtualenv/
+ .env/
+ env/
+ ENV/
+
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Environment variables
+ .env
+ .env.local
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # Data files
+ #*.csv
+ #*.geojson
+ #*.shp
+ #*.shx
+ #*.dbf
+ #*.prj
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Gradio
+ flagged/
+
+ # Documents
+ DEPLOYMENT.md
+ LOCAL_TESTING_GUIDE.md
+ PROJECT_SUMMARY.md
+ QUICKSTART.md
+ TESTING_CHECKLIST.md
LICENSE.md ADDED
@@ -0,0 +1,11 @@
+ # The MIT License (MIT)
+
+ ---
+
+ Copyright (c) 2026 rifatSDAS
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
README.md CHANGED
@@ -1,14 +1,151 @@
  ---
  title: Geospatial Ai Query
- emoji: 📊
+ emoji: 🌍📊
  colorFrom: purple
  colorTo: gray
  sdk: gradio
  sdk_version: 6.3.0
  app_file: app.py
  pinned: false
+ version: 1.0.0
  license: mit
  short_description: "Query geospatial data with natural language\_interface"
  ---
 
+ # 🌍 Geospatial AI Query System
+
+ An intelligent natural language interface for querying and visualizing global geographic data, including socioeconomic and environmental indicators, at multiple scales: countries, continents, and specific regions.
+
+ ## Features
+
+ ### 🤖 Natural Language Processing
+ - Ask questions in plain English about countries, regions, and global indicators
+ - LLM-powered query parsing using Mistral-7B-Instruct-v0.3
+ - Automatic extraction of locations, indicators, and visualization preferences
+
+ ### 📊 Multi-Modal Visualization
+ - **Interactive Maps**: Choropleth maps with country-level data
+ - **Dynamic Charts**: Bar charts, scatter plots, and trend visualizations using Plotly
+ - **Data Tables**: Formatted tables with key socioeconomic indicators
+
+ ### 🌐 Comprehensive Data Coverage
+ - **Demographic**: Population, population density, urban/rural distribution
+ - **Economic**: GDP, GDP per capita, trade indicators
+ - **Geographic**: Country boundaries, areas, continents, regional groups
+ - **Environmental**: CO2 emissions, renewable energy usage, forest area (sample data)
+ - **Derived Metrics**: Population density, GDP growth rates
+ - **Social**: Development indices, education, health metrics
+
+ ## Example Queries
+
+ ```
+ "Show me population of Asian countries"
+ "Compare GDP of European nations"
+ "What's the population density in Africa?"
+ "Display economic indicators for South American countries"
+ "Show me top 10 countries by GDP"
+ "Compare population vs GDP for BRICS nations"
+ ```
+
+ ## How It Works
+
+ 1. **Query Input**: The user enters a natural language query
+ 2. **LLM Parsing**: Mistral-7B-Instruct-v0.3 extracts structured information (locations, indicators, visualization type)
+ 3. **Data Fetching**: GeoPandas retrieves and processes geospatial data
+ 4. **Visualization**: Results are rendered as interactive maps, charts, or tables
+ 5. **Multi-format Output**: View results in your preferred format
+
+ ## Technology Stack
+
+ - **Frontend**: Gradio for web interface
+ - **LLM**: Hugging Face Inference API (Mistral-7B-Instruct-v0.3)
+ - **Geospatial**: GeoPandas, Folium
+ - **Visualization**: Plotly Express
+ - **Data**: Natural Earth, World Bank Open Data
+
+ ## Data Sources
+
+ - **Natural Earth**: Country boundaries and geographic data
+ - **World Bank**: Economic and demographic indicators
+ - **Derived Metrics**: Population density, GDP per capita
+
+ ## Local Development
+
+ ```bash
+ # Clone repository
+ git clone https://huggingface.co/spaces/rifatSDAS/geospatial-ai-query
+ cd geospatial-ai-query
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Set Hugging Face token (optional, for LLM features)
+ export HF_TOKEN=your_token_here
+
+ # Run application
+ python app.py
+ ```
+
+ ## Deployment on Hugging Face Spaces
+
+ 1. Create a new Space on Hugging Face
+ 2. Select the Gradio SDK
+ 3. Upload `app.py`, `requirements.txt`, and the `data/ne_110m_admin_0_countries/` shapefile folder that `app.py` reads at startup
+ 4. Add `HF_TOKEN` in the Space settings (Settings > Repository secrets)
+ 5. The Space will build and deploy automatically
+
+ ## Configuration
+
+ ### Environment Variables
+ - `HF_TOKEN`: Hugging Face API token for LLM inference (optional)
+
+ ## Use Cases
+
+ ### Education
+ - Interactive geography lessons covering demography, economy, and socioeconomic topics
+ - Data visualization for research projects
+ - Understanding global trends and patterns
+
+ ### Business Intelligence
+ - Market analysis by region
+ - Demographic research for expansion planning
+ - Competitive landscape and geographic analysis
+
+ ### Research
+ - Exploration of geographic, demographic, economic, and socioeconomic data
+ - Regional to global scale analysis
+ - Trend identification, data visualization, and data extraction
+
+ ## About the Developer
+
+ Built by Dr. Kazi Rifat Ahmed, a **Full Stack Geospatial AI Engineer** specializing in:
+ - AI/ML/DL for geospatial applications
+ - Cloud-native geospatial software engineering & architecture
+ - Large-scale satellite/Earth Observation data analysis, processing, analytics, and visualization
+ - Blockchain and quantum computing for geospatial applications
+ - Research in advanced geospatial science, technology, and applications
+ - Co-founder and Technical Lead of satellite data services businesses in the space sector: QuentuED (https://quentued.de) and Sensor Aktor (https://sensor-aktor.de)
+
+ ### Tech Stack Proficiency
+ Python | Java | JavaScript | TypeScript | C/C++ | Bash | Cloud-Native Architecture (Kubernetes) | DevOps | AI/ML/DL | MLOps | LLM Integration | Blockchain | Remote Sensing Science & Technology | Geospatial Data Science & Engineering
+
+ ### Research Interests
+ Geospatial AI | Satellite Data Engineering | Drone Sensors | Geospatial Big Data Analytics | Earth Observation Systems & Sensors | Advanced Remote Sensing Techniques | Space Technology | Quantum Computing | Blockchain | Satellite Data Services | Planetary Science & Exploration
+
+ ## License
+
+ This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.
+
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit issues and pull requests.
+
+ ## Contact
+
+ For collaboration opportunities in satellite data services & applications, large-scale satellite data analytics, geospatial AI, blockchain & quantum computing for geospatial applications, or advanced geospatial science, technology & applications, feel free to reach out!
+
+ ---
+
+ **Tags**: #geospatial #geospatial-ai #AI #ML #DL #LLM #satellite-data #earth-observation #blockchain #quantum-computing #data-visualization #natural-language #gradio #huggingface
+
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
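Step 2 of "How It Works" depends on Mistral-7B-Instruct-v0.3 returning bare JSON, and `parse_query_with_llm` in `app.py` passes the reply directly to `json.loads`. If the model wraps its answer in a Markdown code fence or adds a sentence of prose, that call fails and the query silently falls back to the default dictionary. The sketch below shows one way to tolerate such replies. It is an illustration under assumptions, not code from this commit: `extract_json_object` and `FALLBACK_QUERY` are hypothetical names, and the fallback values are simply copied from the `except` branch of `parse_query_with_llm`.

```python
import copy
import json
import re

# Hypothetical fallback, mirroring the dict returned by parse_query_with_llm
# in app.py when parsing fails.
FALLBACK_QUERY = {
    "locations": [],
    "indicators": ["population", "gdp_md_est"],
    "visualization": "table",
    "query_type": "global",
}

# Matches a Markdown code fence (three backticks), optionally tagged "json"
_FENCE_RE = re.compile(r"`{3}(?:json)?", re.IGNORECASE)


def extract_json_object(reply: str) -> dict:
    """Return the first JSON object embedded in an LLM reply.

    Hypothetical helper: strips Markdown fences, then tries to decode a
    JSON object starting at each opening brace until one succeeds.
    Returns a fresh copy of FALLBACK_QUERY if nothing can be decoded.
    """
    text = _FENCE_RE.sub("", reply)
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode stops after one complete value and ignores trailing text
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not a valid object here; try the next opening brace
        start = text.find("{", start + 1)
    return copy.deepcopy(FALLBACK_QUERY)
```

With a helper like this, the `json.loads(...)` line in `parse_query_with_llm` could become `parsed = extract_json_object(response.choices[0].message.content)`, leaving the existing `except` branch to handle network or API errors. Because `raw_decode` stops at the end of the first complete object, any explanation the model appends after the JSON is ignored.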
USER_GUIDE.md ADDED
@@ -0,0 +1,399 @@
+ # User Guide: Geospatial AI Query System
+
+ ## Table of Contents
+ 1. [Getting Started](#getting-started)
+ 2. [Query Examples](#query-examples)
+ 3. [Understanding Results](#understanding-results)
+ 4. [Advanced Features](#advanced-features)
+ 5. [Tips & Best Practices](#tips--best-practices)
+ 6. [Troubleshooting](#troubleshooting)
+
+ ## Getting Started
+
+ ### What Can You Query?
+
+ This system allows you to explore global data about:
+
+ **Geographic Coverage:**
+ - Individual countries (e.g., "United States", "China", "Germany")
+ - Continents (Asia, Europe, Africa, North America, South America, Oceania)
+ - Country groups (BRICS, G7, ASEAN, EU, GCC)
+
+ **Data Categories:**
+ - **Demographics**: Population, population density, urban/rural distribution
+ - **Economy**: GDP, GDP per capita, growth rates, unemployment
+ - **Environment**: CO2 emissions, renewable energy, forest coverage
+ - **Geography**: Land area, borders, geographic features
+
+ ### How to Use
+
+ 1. **Enter Your Query**: Type your question in natural language
+ 2. **Select Output Format**: Choose All, Map, Chart, or Table
+ 3. **Select Advanced Visualization Options**:
+    - For Maps: Choose the base map style and the color scheme (which also sets the legend colors)
+    - For Charts: Choose the chart type (bar, scatter, line, bubble, etc.) and the color theme
+ 4. **Click Analyze**: The AI will process your query
+ 5. **Explore Results**: View interactive visualizations and data tables, and download them if needed
+
+ ## Query Examples
+
+ ### Basic Queries
+
+ #### Population Queries
+ ```
+ "Show me population of Asian countries"
+ "What is the population of Brazil?"
+ "Compare population density in Europe vs Africa"
+ "Which countries have the largest population?"
+ ```
+
+ #### Economic Queries
+ ```
+ "Show me GDP of top 10 economies"
+ "Compare GDP per capita in Scandinavian countries"
+ "What's the economic situation in South America?"
+ "Display GDP growth rates for G7 countries"
+ ```
+
+ #### Environmental Queries
+ ```
+ "Show CO2 emissions in major economies"
+ "Which countries have the most renewable energy?"
+ "Compare forest coverage in tropical countries"
+ "Environmental indicators for BRICS nations"
+ ```
+
+ ### Advanced Queries
+
+ #### Multi-Indicator Comparisons
+ ```
+ "Compare GDP and population for European countries"
+ "Show relationship between GDP and CO2 emissions"
+ "Analyze economic and environmental indicators for Asia"
+ ```
+
+ #### Regional Analysis
+ ```
+ "Show all indicators for Middle Eastern countries"
+ "Compare ASEAN nations across all metrics"
+ "Regional breakdown of population density"
+ ```
+
+ #### Specific Country Groups
+ ```
+ "Display data for BRICS countries"
+ "Compare G7 with emerging markets"
+ "Show EU member states economic indicators"
+ "Gulf Cooperation Council statistics"
+ ```
+
+ ### Query Patterns
+
+ | Pattern | Example | Result |
+ | ----------------------------------- | -------------------------------- | ----------------- |
+ | "Show me [indicator] of [location]" | "Show me GDP of Asian countries" | Bar chart + table |
+ | "Compare [locations]" | "Compare Europe vs Asia" | Comparison chart |
+ | "What is [indicator] in [location]" | "What is population in Africa?" | Specific value |
+ | "Display data for [group]" | "Display data for BRICS" | All indicators |
+ | "Top N countries by [indicator]" | "Top 10 countries by GDP" | Ranked list |
+
+ ## Understanding Results
+
+ ### Map View 🗺️
+
+ **Features:**
+ - Color-coded choropleth maps
+ - Interactive zoom and pan
+ - Click a country's marker for details
+ - Layer controls for different indicators
+ - Download as an interactive HTML map that opens in any browser
+
+ **How to Read:**
+ - Darker colors = Higher values
+ - Gray areas = No data available
+ - Blue markers = Country centers
+ - Popup windows = Detailed statistics
+
+ ### Chart View 📊
+
+ **Chart Types:**
+
+ 1. **Bar Charts** (Vertical): Best for comparing values across countries
+    - Sorted by value (highest to lowest)
+    - Color-coded by continent
+    - Shows top 20 countries
+
+ 2. **Horizontal Bar Charts**: Best for long country names
+    - Easier to read labels
+    - Sorted by value
+    - Color-coded by continent
+
+ 3. **Scatter Plots**: Best for analyzing relationships
+    - Each point = one country
+    - Size = population
+    - Color = continent
+    - Reveals correlations between indicators
+
+ 4. **Pie Charts**: Best for showing proportions
+    - Shows distribution of values
+    - Displays top countries
+    - Percentage of total shown
+
+ 5. **Treemap**: Best for hierarchical data visualization
+    - Rectangle size = indicator value
+    - Grouped by continent
+    - Color intensity shows magnitude
+
+ 6. **Bubble Charts**: Best for multi-dimensional analysis
+    - X-axis = country position
+    - Bubble size = indicator value
+    - Color = continent
+    - Great for spotting outliers
+
+ **Interactive Features:**
+ - Hover for details
+ - Zoom and pan
+ - Download as image
+
+ ### Table View 📋
+
+ **Columns:**
+ - Country name
+ - Continent
+ - Population
+ - GDP
+ - Population density
+ - GDP per capita
+
+ **Features:**
+ - Sortable columns
+ - Formatted numbers
+ - Up to 50 countries displayed
+ - Export as CSV
+
+ ## Advanced Features
+
+ ### Country Groups
+
+ The system recognizes these special groups:
+
+ **BRICS**: Brazil, Russia, India, China, South Africa
+ ```
+ "Show me BRICS economic indicators"
+ ```
+
+ **G7**: USA, Japan, Germany, UK, France, Italy, Canada
+ ```
+ "Compare G7 countries"
+ ```
+
+ **ASEAN**: 10 Southeast Asian nations
+ ```
+ "ASEAN population statistics"
+ ```
+
+ **EU**: 27 European Union member states
+ ```
+ "EU environmental data"
+ ```
+
+ **GCC**: 6 Gulf Cooperation Council countries
+ ```
+ "GCC GDP comparison"
+ ```
+
+ ### Multi-Modal Analysis
+
+ **Simultaneous Views:**
+ Select "All" to see:
+ - Map (geographic distribution)
+ - Chart (comparative visualization)
+ - Table (detailed numbers)
+
+ **Use Cases:**
+ - Comprehensive analysis
+ - Interactive data visualizations
+ - Research, news reporting, and educational purposes
+
+ ### Data Enrichment
+
+ The system automatically calculates:
+ - **Population Density**: Population per km²
+ - **GDP per Capita**: GDP divided by population
+ - **Regional Aggregates**: Continental totals and averages
+
+ ## Tips & Best Practices
+
+ ### Writing Effective Queries
+
+ **✅ DO:**
+ - Be specific: "Show GDP of European countries"
+ - Use natural language: "What's the population of China?"
+ - Specify what you want: "Compare Africa and China GDP"
+ - Use recognized names: "BRICS", "G7", "Asian countries"
+
+ **❌ DON'T:**
+ - Be too vague: "Show me data"
+ - Use ambiguous terms: "Show stuff about countries"
+ - Expect real-time data (figures may be a year or two old)
+ - Query indicators the system does not provide
+
+ ### Choosing Output Format
+
+ | Format | Best For | When to Use |
+ | --------- | ---------------------- | --------------------------------------- |
+ | **All** | Comprehensive analysis | First-time queries, presentations |
+ | **Map** | Geographic patterns | Spatial distribution, regional analysis |
+ | **Chart** | Comparisons | Rankings, trends, relationships |
+ | **Table** | Specific numbers | Detailed data, exports, reports |
+
+ ### Interpreting Results
+
+ **For Rankings:**
+ - Use bar charts
+ - Sort by indicator value
+ - Focus on top/bottom performers
+
+ **For Comparisons:**
+ - Use scatter plots
+ - Look for clusters and outliers
+ - Analyze relationships
+
+ **For Geographic Patterns:**
+ - Use maps
+ - Observe regional groupings
+ - Identify spatial trends
+
+ ### Query Optimization
+
+ **Fast Queries:**
+ ```
+ "GDP of G7"             # Specific group
+ "Population of Africa"  # Single continent
+ ```
+
+ **Slower Queries:**
+ ```
+ "All data for all countries"                 # Too broad
+ "Compare 50 countries across 20 indicators"  # Too complex
+ ```
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ #### "No data found"
+
+ **Possible Causes:**
+ - Misspelled country name
+ - Unrecognized location
+ - No data available for the indicator
+
+ **Solutions:**
+ - Check spelling (use common English names)
+ - Try a different phrasing, e.g., "Asian countries" instead of "countries in Asia"
+ - Use recognized groups: BRICS, G7, EU
+
+ #### "Error processing query"
+
+ **Possible Causes:**
+ - Query too complex
+ - Server timeout
+ - Invalid syntax
+
+ **Solutions:**
+ - Simplify the query
+ - Break it into smaller queries
+ - Use example queries as templates
+
+ #### Unexpected Results
+
+ **Possible Causes:**
+ - Query interpreted differently
+ - Multiple countries with similar names
+ - Ambiguous indicator names
+
+ **Solutions:**
+ - Be more specific
+ - Use full country names
+ - Specify exact indicators
+
+ ### Data Limitations
+
+ **What's Available:**
+ - Country-level data (not city/region level)
+ - Recent years (data may lag 1-2 years)
+ - Major indicators (population, GDP, environment)
+ - 177 countries from the Natural Earth 1:110m dataset
+
+ **What's NOT Available:**
+ - Real-time/live data
+ - Sub-national data (cities, provinces)
+ - Historical time series (full implementation pending)
+ - Highly specific indicators
+
+ ### Performance Tips
+
+ **For Faster Results:**
+ 1. Start with specific queries
+ 2. Use recognized country groups
+ 3. Limit to 1-2 indicators
+ 4. Choose a single output format
+
+ **For Better Quality:**
+ 1. Use precise country names
+ 2. Specify exact indicators
+ 3. Be explicit about comparisons
+ 4. Include context in your query
+
+ ## Example Workflows
+
+ ### Research Workflow
+
+ 1. **Explore**: "Show me data for ASEAN countries"
+ 2. **Analyze**: "Compare GDP growth in ASEAN"
+ 3. **Deep Dive**: "What's the GDP per capita in Vietnam?"
+ 4. **Compare**: "Compare Vietnam with Thailand"
+
+ ### Presentation Workflow
+
+ 1. **Overview**: Select "All" format
+ 2. **Geographic**: Focus on map view
+ 3. **Rankings**: Use chart view
+ 4. **Details**: Reference table view
+
+ ### Educational Workflow
+
+ 1. **Context**: "Show me African countries"
+ 2. **Compare**: "Compare population in Africa vs Europe"
+ 3. **Analyze**: "Why does Africa have lower GDP per capita?"
+ 4. **Discuss**: Use visualizations to support discussion
+
+ ## Getting Help
+
+ ### Resources
+
+ - **Examples**: Click example queries in the app
+ - **Feedback**: Use the 👍 / 👎 buttons to rate results
+ - **Issues**: Report bugs via GitHub issues
+ - **Discussions**: Join the Hugging Face Space discussions
+
+ ### Support Channels
+
+ - **Community Forum**: Ask questions in the Space discussions
+ - **Documentation**: Check README.md (deployment steps are in its "Deployment on Hugging Face Spaces" section)
+ - **Updates**: Follow the Space for new features
+
+ ### Feature Requests
+
+ Have ideas? I'd love to hear them!
+ - Add a comment in the Space discussions
+ - Open a feature request on GitHub
+ - Share your use case
+
+ ---
+
+ **Last Updated**: January 2026
+ **Version**: 1.0.0
+
+ **Happy Exploring Geospatial Data! 🌍**
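The guide's "Country Groups" and "Data Enrichment" sections describe two small transformations: expanding a group name into its member countries, and deriving population density and GDP per capita. The group lookup is not part of the `app.py` excerpt in this commit view, so the following is a hypothetical sketch rather than the project's implementation. `COUNTRY_GROUPS`, `resolve_locations`, and `derived_metrics` are illustrative names; only BRICS and G7 are filled in (using the memberships listed in this guide, spelled as in Natural Earth's `NAME` column); and `gdp_md_est` is assumed to be in millions of US dollars, as the "GDP (Million $)" label in `app.py` suggests.

```python
import math

# Hypothetical lookup table (BRICS and G7 only, per this guide's lists),
# spelled like Natural Earth's NAME column so it can match world['name'].
COUNTRY_GROUPS = {
    "BRICS": ["Brazil", "Russia", "India", "China", "South Africa"],
    "G7": [
        "United States of America", "Japan", "Germany",
        "United Kingdom", "France", "Italy", "Canada",
    ],
}


def resolve_locations(locations):
    """Expand group names (case-insensitive) into member country names.

    Any other entry, such as a continent or a single country, passes
    through unchanged. Duplicates are dropped, keeping first-seen order.
    """
    resolved = []
    for loc in locations:
        name = loc.strip()
        # Group names map to their members; everything else maps to itself
        for member in COUNTRY_GROUPS.get(name.upper(), [name]):
            if member not in resolved:
                resolved.append(member)
    return resolved


def derived_metrics(pop_est, gdp_md_est, area_km2):
    """Return (people per km², GDP per capita in US dollars).

    gdp_md_est is assumed to be in millions of US dollars, so it is
    multiplied by 1,000,000 before dividing by population. A metric whose
    denominator is zero comes back as NaN instead of raising an error.
    """
    density = pop_est / area_km2 if area_km2 else math.nan
    gdp_per_capita = gdp_md_est * 1_000_000 / pop_est if pop_est else math.nan
    return density, gdp_per_capita
```

The list returned by `resolve_locations` could be passed to the `world['name'].isin(...)` filter in `fetch_geospatial_data`. `derived_metrics` expects land area already in km²; calling `.area` directly on longitude/latitude geometries yields square degrees, so the area would need to come from an equal-area projection first.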
app.py ADDED
@@ -0,0 +1,574 @@
+ import gradio as gr
+ import pandas as pd
+ import geopandas as gpd
+ import folium
+ from folium import plugins
+ import plotly.express as px
+ import plotly.graph_objects as go
+ from huggingface_hub import InferenceClient
+ import json
+ import os
+ import tempfile
+ import io
+ from datetime import datetime
+ import numpy as np
+ from pathlib import Path
+ import warnings
+
+ # Suppress GeoPandas CRS warnings (area/centroid calculations are approximate for demo purposes)
+ warnings.filterwarnings('ignore', message='.*Geometry is in a geographic CRS.*')
+
+ import branca.colormap as cm
+
+
+ def format_number(num):
+     """Format large numbers with K/M/B/T suffixes for better readability."""
+     if num is None or (isinstance(num, float) and np.isnan(num)):
+         return 'N/A'
+     abs_num = abs(num)
+     if abs_num >= 1e12:
+         return f'{num/1e12:.1f}T'
+     elif abs_num >= 1e9:
+         return f'{num/1e9:.1f}B'
+     elif abs_num >= 1e6:
+         return f'{num/1e6:.1f}M'
+     elif abs_num >= 1e3:
+         return f'{num/1e3:.1f}K'
+     else:
+         return f'{num:.1f}'
+
+
+ # Path to local Natural Earth data (geopandas.datasets was removed in GeoPandas 1.0)
+ DATA_DIR = Path(__file__).parent / "data" / "ne_110m_admin_0_countries"
+ NATURAL_EARTH_SHP = DATA_DIR / "ne_110m_admin_0_countries.shp"
+
+ # Initialize HF Inference Client
+ client = InferenceClient(token=os.environ.get("HF_TOKEN"))
+
+ # ===== UI/UX Enhancement Constants =====
+ MAP_STYLES = {
+     "Light": "CartoDB positron",
+     "Dark": "CartoDB dark_matter",
+     "Street": "OpenStreetMap",
+     "Satellite": "Esri.WorldImagery"
+ }
+
+ COLOR_SCHEMES = {
+     "Default": px.colors.qualitative.Plotly,
+     "Vivid": px.colors.qualitative.Vivid,
+     "Pastel": px.colors.qualitative.Pastel,
+     "Bold": px.colors.qualitative.Bold,
+     "Earth": px.colors.qualitative.Safe
+ }
+
+ CHOROPLETH_COLORS = {
+     "Yellow-Orange-Red": "YlOrRd",
+     "Yellow-Green-Blue": "YlGnBu",
+     "Purple-Red": "PuRd",
+     "Blue-Purple": "BuPu",
+     "Greens": "Greens",
+     "Blues": "Blues",
+     "Oranges": "OrRd",
+     "Spectral": "Spectral"
+ }
+
+ INDICATORS = {
+     "Population": "pop_est",
+     "GDP (Million $)": "gdp_md_est",
+     "Population Density": "pop_density",
+     "GDP per Capita": "gdp_per_capita"
+ }
+
+ # Global cache for world data
+ _world_data_cache = None
+
+ def load_world_data():
+     """Load world countries geospatial data"""
+     global _world_data_cache
+     if _world_data_cache is None:
+         raw = gpd.read_file(NATURAL_EARTH_SHP)
+         # Select only the columns we need (using original uppercase names)
+         # and rename them to match expected lowercase names
+         _world_data_cache = raw[['NAME', 'CONTINENT', 'POP_EST', 'GDP_MD', 'geometry']].copy()
+         _world_data_cache.columns = ['name', 'continent', 'pop_est', 'gdp_md_est', 'geometry']
+     return _world_data_cache
+
+ def parse_query_with_llm(user_query):
+     """
+     Use LLM to parse natural language query into structured format
+     """
+     system_prompt = """You are a geospatial and geographic data query parser. Extract structured information from user queries.
+
+ Response format (JSON only):
+ {
+ "locations": ["country/region names"],
+ "indicators": ["GDP", "population", "CO2 emissions", etc.],
+ "time_range": {"start": "YYYY", "end": "YYYY"},
+ "visualization": "map/chart/table",
+ "aggregation": "sum/average/comparison",
+ "query_type": "single_country/multi_country/regional/global"
+ }
+
+ Examples:
+ - "Show me GDP of Asian countries" → locations: Asia, indicators: GDP, visualization: chart
+ - "Compare population density in Europe vs Africa" → locations: [Europe, Africa], indicators: population density
+ - "Environmental data for Brazil over last decade" → locations: [Brazil], indicators: environmental
+
+ Return ONLY valid JSON, no explanations."""
+
+     messages = [
+         {"role": "system", "content": system_prompt},
+         {"role": "user", "content": f"Parse this query: {user_query}"}
+     ]
+
+     try:
+         response = client.chat_completion(
+             messages=messages,
+             model="mistralai/Mistral-7B-Instruct-v0.3",
+             max_tokens=500,
+             temperature=0.1
+         )
+
+         parsed = json.loads(response.choices[0].message.content)
+         return parsed
+     except Exception as e:
+         print(f"LLM parsing error: {e}")
+         return {
+             "locations": [],
+             "indicators": ["population", "gdp_md_est"],
+             "visualization": "table",
+             "query_type": "global"
+         }
+
+ def fetch_geospatial_data(parsed_query):
+     """
+     Fetch and process geospatial data based on parsed query
+     """
+     world = load_world_data()
+
+     # Filter by locations
+     locations = parsed_query.get("locations", [])
+     if locations and locations[0].lower() != "global":
+         # Filter by continent or country; copy so the cached GeoDataFrame is never mutated
+         mask = world['continent'].isin(locations) | world['name'].isin(locations)
+         filtered_data = world[mask].copy()
+     else:
+         filtered_data = world.copy()
+
+     # Add computed indicators (area in km² from an equal-area projection, EPSG:6933, not square degrees)
+     filtered_data['pop_density'] = filtered_data['pop_est'] / (filtered_data.geometry.to_crs(epsg=6933).area / 1e6)
+     filtered_data['gdp_per_capita'] = filtered_data['gdp_md_est'] / filtered_data['pop_est'] * 1000000
+
+     return filtered_data
+
+ def create_interactive_map(gdf, indicator='pop_est', map_style='Light', color_scale='Yellow-Orange-Red'):
+     """
+     Create an interactive Folium map with customizable style and colors
+     """
+     # Calculate center
+     center_lat = gdf.geometry.centroid.y.mean()
+     center_lon = gdf.geometry.centroid.x.mean()
+
+     # Get tile style
+     tiles = MAP_STYLES.get(map_style, 'CartoDB positron')
+
+     # Create map
+     m = folium.Map(
+         location=[center_lat, center_lon],
+         zoom_start=2,
+         tiles=tiles
+     )
+
+     # Get color scheme
+     fill_color = CHOROPLETH_COLORS.get(color_scale, 'YlOrRd')
+
+     # Get min/max for the indicator, excluding NaN values
+     valid_values = gdf[indicator].dropna()
+     if valid_values.empty:
+         vmin, vmax = 0, 1  # Default range if no valid data
+     else:
+         vmin = float(valid_values.min())
+         vmax = float(valid_values.max())
+
+     # Ensure vmin and vmax are valid numbers
+     if np.isnan(vmin) or np.isnan(vmax) or np.isinf(vmin) or np.isinf(vmax):
+         vmin, vmax = 0, 1
+
+     # Ensure vmax > vmin to avoid division issues
+     if vmax <= vmin:
+         vmax = vmin + 1
+
+     # Map color scheme names to branca colormap
+     color_map_dict = {
+         'YlOrRd': cm.linear.YlOrRd_09,
+         'YlGnBu': cm.linear.YlGnBu_09,
+         'Blues': cm.linear.Blues_09,
+         'Greens': cm.linear.Greens_09,
+         'Reds': cm.linear.Reds_09,
+         'PuRd': cm.linear.PuRd_09,
+         'OrRd': cm.linear.OrRd_09
+     }
+
+     # Create colormap with properly formatted tick labels
+     base = color_map_dict.get(fill_color, cm.linear.YlOrRd_09)
+
+     # Generate tick positions
+     n_ticks = 6
+     tick_values = list(np.linspace(vmin, vmax, n_ticks))
+
+     # Sample colors from base colormap at tick positions
+     if vmax > vmin:
+         hex_colors = [base.rgb_hex_str((v - vmin) / (vmax - vmin)) for v in tick_values]
+     else:
+         hex_colors = [base.rgb_hex_str(0.5)] * n_ticks
+
+     # Create colormap (tick_labels expects floats, so we use index values directly)
+     colormap = cm.LinearColormap(
+         colors=hex_colors,
+         index=tick_values,
+         vmin=vmin,
+         vmax=vmax,
+         caption=f"{indicator.replace('_', ' ').title()} ({format_number(vmin)} - {format_number(vmax)})"
+     )
+
+     # Add choropleth without its auto-legend
+     choropleth = folium.Choropleth(
+         geo_data=gdf,
+         data=gdf,
+         columns=['name', indicator],
+         key_on='feature.properties.name',
+         fill_color=fill_color,
+         fill_opacity=0.7,
+         line_opacity=0.2,
+         legend_name=None,  # Disable auto legend
+         nan_fill_color='lightgray'
+     )
+     choropleth.add_to(m)
+
+     # Remove any auto-generated colormap from choropleth
+     for key in list(choropleth._children.keys()):
+         if key.startswith('color_map'):
+             del choropleth._children[key]
+             break
+
+     # Add our custom colormap with formatted labels (default Folium style)
+     colormap.add_to(m)
+
+     # Add a marker with an info popup at each country's centroid
+     for idx, row in gdf.iterrows():
+         folium.Marker(
+             location=[row.geometry.centroid.y, row.geometry.centroid.x],
+             popup=f"""
+             <b>{row['name']}</b><br>
+             Population: {row['pop_est']:,.0f}<br>
+             GDP: ${row['gdp_md_est']:,.0f}M<br>
+             Continent: {row['continent']}
+             """,
+             icon=folium.Icon(icon='info-sign', color='blue')
+         ).add_to(m)
+
+     # Add layer control
+     folium.LayerControl().add_to(m)
+
+     return m
+
+ def create_chart(df, indicators, chart_type='bar', color_scheme='Default', top_n=20):
+     """
+     Create interactive Plotly charts with customizable options
+     """
+     # Get color sequence
+     colors = COLOR_SCHEMES.get(color_scheme, px.colors.qualitative.Plotly)
+
+     # Sort and limit data
+     sorted_df = df.sort_values(indicators[0], ascending=False).head(top_n)
+
+     if chart_type == 'bar':
+         fig = px.bar(
+             sorted_df,
+             x='name',
+             y=indicators[0],
+             color='continent',
+             title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+             labels={'name': 'Country', indicators[0]: indicators[0].replace('_', ' ').title()},
+             color_discrete_sequence=colors,
+             height=500
+         )
+     elif chart_type == 'horizontal_bar':
+         fig = px.bar(
+             sorted_df,
+             y='name',
+             x=indicators[0],
+             color='continent',
+             title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+             labels={'name': 'Country', indicators[0]: indicators[0].replace('_', ' ').title()},
+             color_discrete_sequence=colors,
+             orientation='h',
+             height=600
+         )
+         fig.update_layout(yaxis={'categoryorder': 'total ascending'})
+     elif chart_type == 'scatter':
+         # Resolve axis columns first: indexing indicators[1] directly raises IndexError
+         # when only one indicator is passed (process_query always passes exactly one)
+         x_col = indicators[0] if len(indicators) > 0 else 'gdp_md_est'
+         y_col = indicators[1] if len(indicators) > 1 else 'pop_est'
+         x_label = x_col.replace('_', ' ').title() if len(indicators) > 0 else 'GDP'
+         y_label = y_col.replace('_', ' ').title() if len(indicators) > 1 else 'Population'
+         fig = px.scatter(
+             df,
+             x=x_col,
+             y=y_col,
+             size='pop_est',
+             color='continent',
+             hover_name='name',
+             title='Country Comparison',
+             labels={x_col: x_label, y_col: y_label},
+             color_discrete_sequence=colors,
+             height=500
+         )
+     elif chart_type == 'pie':
+         fig = px.pie(
+             sorted_df,
+             values=indicators[0],
+             names='name',
+             title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+             color_discrete_sequence=colors,
+             height=500
+         )
+         fig.update_traces(textposition='inside', textinfo='percent+label')
+     elif chart_type == 'treemap':
+         fig = px.treemap(
+             sorted_df,
+             path=['continent', 'name'],
+             values=indicators[0],
+             title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+             color='continent',
+             color_discrete_sequence=colors,
+             height=600
+         )
+     elif chart_type == 'bubble':
+         fig = px.scatter(
+             df,
+             x='gdp_md_est',
+             y='pop_est',
+             size=indicators[0],
+             color='continent',
+             hover_name='name',
+             title=f'Bubble Chart: Size = {indicators[0].replace("_", " ").title()}',
+             labels={'gdp_md_est': 'GDP (Million $)', 'pop_est': 'Population'},
+             color_discrete_sequence=colors,
+             size_max=60,
+             height=500
+         )
+     else:  # default bar
+         fig = px.bar(
+             sorted_df,
+             x='name',
+             y=indicators[0],
+             color='continent',
+             title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+             color_discrete_sequence=colors,
+             height=500
+         )
+
+     fig.update_layout(
+         xaxis_tickangle=-45,
+         template='plotly_white'
+     )
+
+     return fig
+
+ def create_data_table(df):
+     """
+     Create formatted data table (top 50 countries by population)
+     """
+     # Select relevant columns
+     display_cols = ['name', 'continent', 'pop_est', 'gdp_md_est', 'pop_density', 'gdp_per_capita']
+     # Sort and truncate on the raw numbers first: after formatting, the columns hold
+     # strings and sort_values would order them lexicographically ("9,000" > "10,000")
+     table_df = df[display_cols].sort_values('pop_est', ascending=False).head(50).copy()
+
+     # Rename columns
+     table_df.columns = ['Country', 'Continent', 'Population', 'GDP (Million $)',
+                         'Pop. Density (per km²)', 'GDP per Capita ($)']
+
+     # Format numbers
+     table_df['Population'] = table_df['Population'].apply(lambda x: f'{x:,.0f}')
+     table_df['GDP (Million $)'] = table_df['GDP (Million $)'].apply(lambda x: f'${x:,.0f}')
+     table_df['Pop. Density (per km²)'] = table_df['Pop. Density (per km²)'].apply(lambda x: f'{x:.2f}')
+     table_df['GDP per Capita ($)'] = table_df['GDP per Capita ($)'].apply(lambda x: f'${x:,.2f}')
+
+     return table_df
+
+ def process_query(user_query, output_format, chart_type, map_style, color_scheme, choropleth_color, top_n, indicator):
+     """
+     Main processing function with advanced options
+     """
+     try:
+         # Parse query with LLM
+         parsed = parse_query_with_llm(user_query)
+
+         # Fetch data
+         gdf = fetch_geospatial_data(parsed)
+
+         if gdf.empty:
+             return None, None, None, "No data found for your query. Try different locations or indicators.", None, None
+
+         # The indicator chosen in the UI dropdown always overrides the LLM-parsed indicators
+         selected_indicator = INDICATORS.get(indicator, 'pop_est')
+         mapped_indicators = [selected_indicator]
+
+         # Generate outputs based on format
+         map_html = None
+         chart_fig = None
+         table_df = None
+         map_file = None
+         csv_file = None
+
+         summary = f"🔍 **Query:** {user_query}\n\n"
+         # Fall back to 'Global' when the parser returns no locations or an empty list
+         summary += f"📍 **Locations:** {', '.join(parsed.get('locations') or ['Global'])}\n"
+         summary += f"📊 **Indicator:** {indicator}\n"
+         summary += f"🌍 **Countries found:** {len(gdf)}\n\n"
+         summary += f"⚙️ **Options:** Chart: {chart_type} | Map: {map_style} | Top N: {top_n}"
+
+         if output_format in ['All', 'Map']:
+             m = create_interactive_map(gdf, mapped_indicators[0], map_style, choropleth_color)
+             map_html = m._repr_html_()
+             # Reserve a temp file for the download and close the handle before folium writes to the path
+             with tempfile.NamedTemporaryFile(delete=False, suffix='.html') as tmp:
+                 map_file = tmp.name
+             m.save(map_file)
+
+         if output_format in ['All', 'Chart']:
+             chart_fig = create_chart(gdf, mapped_indicators, chart_type, color_scheme, int(top_n))
+
+         if output_format in ['All', 'Table']:
+             table_df = create_data_table(gdf)
+             # Save table to a temp CSV file for download (close the handle first, then write)
+             with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp:
+                 csv_file = tmp.name
+             table_df.to_csv(csv_file, index=False, encoding='utf-8')
+
+         return map_html, chart_fig, table_df, summary, map_file, csv_file
+
+     except Exception as e:
+         error_msg = f"Error processing query: {str(e)}\n\nPlease try rephrasing your query."
+         return None, None, None, error_msg, None, None
+
+ # Gradio Interface
+ def create_interface():
+     with gr.Blocks(title="Geospatial AI Query System") as demo:
+         gr.Markdown("""
+ # 🌍 Geospatial AI Query System
+ ### Natural Language Interface for Geographic Data
+
+ Ask questions about countries, regions, and global indicators using natural language!
+
+ **Example Queries:**
+ - "Show me population of Asian countries"
+ - "Compare GDP of European nations"
+ - "What's the population density in Africa?"
+ - "Display economic indicators for South American countries"
+ """)
+
+         with gr.Row():
+             with gr.Column(scale=3):
+                 query_input = gr.Textbox(
+                     label="Your Query",
+                     placeholder="E.g., Show me GDP and population of BRICS countries",
+                     lines=2
+                 )
+             with gr.Column(scale=1):
+                 output_format = gr.Radio(
+                     choices=['All', 'Map', 'Chart', 'Table'],
+                     value='All',
+                     label="Output Format"
+                 )
+
+         # Advanced Options in Accordion
+         with gr.Accordion("⚙️ Advanced Options", open=False):
+             with gr.Row():
+                 chart_type = gr.Dropdown(
+                     choices=['bar', 'horizontal_bar', 'scatter', 'pie', 'treemap', 'bubble'],
+                     value='bar',
+                     label="📊 Chart Type"
+                 )
+                 map_style = gr.Dropdown(
+                     choices=list(MAP_STYLES.keys()),
+                     value='Light',
+                     label="🗺️ Map Style"
+                 )
+             with gr.Row():
+                 color_scheme = gr.Dropdown(
+                     choices=list(COLOR_SCHEMES.keys()),
+                     value='Default',
+                     label="🎨 Chart Colors"
+                 )
+                 choropleth_color = gr.Dropdown(
+                     choices=list(CHOROPLETH_COLORS.keys()),
+                     value='Yellow-Orange-Red',
+                     label="🌈 Map Colors"
+                 )
+             with gr.Row():
+                 top_n = gr.Slider(
+                     minimum=5,
+                     maximum=50,
+                     value=20,
+                     step=5,
+                     label="🔢 Top N Countries"
+                 )
+                 indicator = gr.Dropdown(
+                     choices=list(INDICATORS.keys()),
+                     value="Population",
+                     label="📈 Indicator"
+                 )
+
+         submit_btn = gr.Button("🔍 Analyze", variant="primary", size="lg")
+
+         gr.Markdown("### Results")
+
+         summary_output = gr.Textbox(label="Query Summary", lines=4)
+
+         with gr.Tabs():
+             with gr.Tab("📊 Chart"):
+                 chart_output = gr.Plot(label="Interactive Chart")
+
+             with gr.Tab("🗺️ Map"):
+                 map_output = gr.HTML(label="Interactive Map")
+                 map_download = gr.File(label="📥 Download Map (HTML)", visible=True)
+
+             with gr.Tab("📋 Table"):
+                 table_output = gr.Dataframe(label="Data Table")
+                 csv_download = gr.File(label="📥 Download Table (CSV)", visible=True)
+
+         # Examples
+         gr.Examples(
+             examples=[
+                 ["Show me population of Asian countries", "All"],
+                 ["Compare GDP of top 10 economies", "Chart"],
+                 ["What's the population density in European countries?", "Map"],
+                 ["Display data for African nations", "Table"],
+                 ["Show me South American countries' economic indicators", "All"]
+             ],
+             inputs=[query_input, output_format]
+         )
+
+         # Event handler
+         submit_btn.click(
+             fn=process_query,
+             inputs=[query_input, output_format, chart_type, map_style, color_scheme, choropleth_color, top_n, indicator],
+             outputs=[map_output, chart_output, table_output, summary_output, map_download, csv_download]
+         )
+
+         gr.Markdown("""
+ ---
+ **About:** This app uses LLMs to parse natural language queries and visualize global geospatial data.
+
+ **Data Sources:** Natural Earth, World Bank Open Data
+
+ **Built by:** [rifatSDAS](https://github.com/rifatSDAS)
+ """)
+
+     return demo
+
+ if __name__ == "__main__":
+     demo = create_interface()
+     # Enable queue for better concurrency handling on HF Spaces
+     demo.queue(default_concurrency_limit=10)
+     demo.launch(theme=gr.themes.Soft())
+     # To enable Progressive Web App (PWA) features, uncomment the line below
+     # demo.launch(theme=gr.themes.Soft(), pwa=True)
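The legend logic in `create_interactive_map` first guards the value range (drop NaN, fall back to 0 to 1, force `vmax > vmin`) and then samples the base colormap at evenly spaced ticks, each rescaled to the colormap's 0 to 1 domain. The arithmetic can be sketched without folium or NumPy as follows; `legend_ticks` is a hypothetical helper, and a list comprehension stands in for `np.linspace`.

```python
import math
from typing import List, Optional, Tuple


def legend_ticks(values: List[Optional[float]], n_ticks: int = 6) -> Tuple[List[float], List[float]]:
    """Return (tick_values, normalized_positions) for a choropleth legend.

    Hypothetical helper mirroring the guards in create_interactive_map.
    Requires n_ticks >= 2.
    """
    # Drop missing values, as Series.dropna() does in the app
    valid = [v for v in values if v is not None and not math.isnan(v)]
    vmin, vmax = (float(min(valid)), float(max(valid))) if valid else (0.0, 1.0)

    # Infinite bounds fall back to the default range
    if math.isinf(vmin) or math.isinf(vmax):
        vmin, vmax = 0.0, 1.0

    # A single distinct value would give a zero-width range
    if vmax <= vmin:
        vmax = vmin + 1.0

    # Evenly spaced ticks from vmin to vmax inclusive
    step = (vmax - vmin) / (n_ticks - 1)
    ticks = [vmin + i * step for i in range(n_ticks)]
    ticks[-1] = vmax  # pin the last tick exactly, as np.linspace does

    # Position of each tick on the base colormap's 0..1 scale
    positions = [(t - vmin) / (vmax - vmin) for t in ticks]
    return ticks, positions


if __name__ == "__main__":
    ticks, positions = legend_ticks([10.0, float("nan"), 60.0, None])
    print(ticks)
    print(positions)
```

For the sample input the ticks are 10, 20, ..., 60 and the positions run from 0.0 to 1.0 in steps of 0.2 (up to floating-point rounding); those positions are the arguments the app passes to `base.rgb_hex_str(...)`.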
config.py ADDED
@@ -0,0 +1,179 @@
+ # Configuration file for Geospatial AI Query System
+
+ # LLM Configuration
+ LLM_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"
+ LLM_MAX_TOKENS = 500
+ LLM_TEMPERATURE = 0.1
+
+ # Gradio Configuration
+ APP_TITLE = "🌍 Geospatial AI Query System"
+ APP_DESCRIPTION = """
+ ### Natural Language Interface for Global Socioeconomic Data
+
+ Ask questions about countries, regions, and global indicators using natural language!
+ """
+
+ # Server Configuration
+ SERVER_NAME = "0.0.0.0"  # Listen on all interfaces
+ SERVER_PORT = 7860
+ ENABLE_QUEUE = True
+ QUEUE_CONCURRENCY = 5
+
+ # Visualization Configuration
+ MAP_DEFAULT_ZOOM = 2
+ MAP_TILE_STYLE = "CartoDB dark_matter"  # Options: OpenStreetMap, CartoDB positron, CartoDB dark_matter
+ CHART_HEIGHT = 500
+ CHART_THEME = "plotly_dark"  # Options: plotly, plotly_white, plotly_dark, ggplot2, seaborn
+ MAX_COUNTRIES_IN_TABLE = 50
+ MAX_COUNTRIES_IN_CHART = 20
+
+ # Data Configuration
+ USE_CACHE = True
+ CACHE_SIZE = 128  # Number of queries to cache
+ DEFAULT_INDICATOR = "pop_est"
+
+ # Feature Flags
+ ENABLE_ADVANCED_STATS = True
+ ENABLE_DATA_EXPORT = False  # Future feature
+ ENABLE_TIME_SERIES = False  # Future feature
+
+ # Example Queries (shown in UI)
+ EXAMPLE_QUERIES = [
+     ("Show me population of Asian countries", "All"),
+     ("Compare GDP of top 10 economies", "Chart"),
+     ("What's the population density in European countries?", "Map"),
+     ("Display data for African nations", "Table"),
+     ("Show me South American countries' economic indicators", "All")
+ ]
+
+ # Country Group Definitions
+ COUNTRY_GROUPS = {
+     'brics': ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
+     'g7': ['United States of America', 'Japan', 'Germany', 'United Kingdom',
+            'France', 'Italy', 'Canada'],
+     'g20': ['Argentina', 'Australia', 'Brazil', 'Canada', 'China', 'France',
+             'Germany', 'India', 'Indonesia', 'Italy', 'Japan', 'South Korea',
+             'Mexico', 'Russia', 'Saudi Arabia', 'South Africa', 'Turkey',
+             'United Kingdom', 'United States of America'],
+     'asean': ['Indonesia', 'Thailand', 'Philippines', 'Vietnam', 'Myanmar',
+               'Malaysia', 'Singapore', 'Cambodia', 'Laos', 'Brunei'],
+     'gcc': ['Saudi Arabia', 'United Arab Emirates', 'Kuwait', 'Qatar',
+             'Bahrain', 'Oman'],
+     'eu': ['Germany', 'France', 'Italy', 'Spain', 'Poland', 'Romania',
+            'Netherlands', 'Belgium', 'Greece', 'Portugal', 'Czech Republic',
+            'Hungary', 'Sweden', 'Austria', 'Bulgaria', 'Denmark', 'Finland',
+            'Slovakia', 'Ireland', 'Croatia', 'Lithuania', 'Slovenia', 'Latvia',
+            'Estonia', 'Cyprus', 'Luxembourg', 'Malta']
+ }
+
+ # Indicator Mappings
+ INDICATOR_ALIASES = {
+     'population': 'pop_est',
+     'people': 'pop_est',
+     'inhabitants': 'pop_est',
+     'gdp': 'gdp_md_est',
+     'economy': 'gdp_md_est',
+     'economic output': 'gdp_md_est',
+     'density': 'pop_density',
+     'population density': 'pop_density',
+     'per capita': 'gdp_per_capita',
+     'gdp per capita': 'gdp_per_capita',
+     'wealth per person': 'gdp_per_capita'
+ }
+
+ # Display Names for Indicators
+ INDICATOR_DISPLAY_NAMES = {
+     'pop_est': 'Population',
+     'gdp_md_est': 'GDP (Million USD)',
+     'pop_density': 'Population Density (per km²)',
+     'gdp_per_capita': 'GDP per Capita (USD)',
+     'co2_per_capita': 'CO2 per Capita (tons)',
+     'renewable_energy': 'Renewable Energy (%)',
+     'forest_coverage': 'Forest Coverage (%)',
+     'gdp_growth': 'GDP Growth Rate (%)',
+     'unemployment': 'Unemployment Rate (%)',
+     'inflation': 'Inflation Rate (%)'
+ }
+
+ # Color Schemes for Visualizations
+ COLOR_SCHEMES = {
+     'choropleth': 'YlOrRd',  # For maps
+     'bar_chart': 'continent',  # Color by continent
+     'scatter_plot': 'continent'
+ }
+
+ # Error Messages
+ ERROR_MESSAGES = {
+     'no_data': "No data found for your query. Try different locations or indicators.",
+     'parsing_error': "Error parsing your query. Please try rephrasing.",
+     'processing_error': "Error processing query: {error}. Please try again.",
+     'llm_error': "Error connecting to LLM service. Using fallback query parsing."
+ }
+
+ # API Rate Limiting (requests per minute)
+ RATE_LIMIT_RPM = 20
+ RATE_LIMIT_ENABLED = True
+
+ # Logging Configuration
+ LOG_LEVEL = "INFO"  # Options: DEBUG, INFO, WARNING, ERROR
+ LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+
+ # Advanced Features (Future)
+ ENABLE_SATELLITE_INTEGRATION = False
+ ENABLE_REAL_TIME_DATA = False
+ ENABLE_CUSTOM_DATASETS = False
+
+ # Footer Information
+ FOOTER_TEXT = """
+ ---
+ **About:** This app uses LLMs to parse natural language queries and visualize global geospatial data.
+
+ **Data Sources:** Natural Earth, World Bank Open Data (sample data)
+
+ **Built by:** Full Stack Geospatial AI Engineer | Satellite Data Specialist
+
+ **GitHub:** [Your GitHub URL]
+ **LinkedIn:** [Your LinkedIn URL]
+ **Website:** [Your Website URL]
+ """
+
+ # Custom CSS (optional)
+ CUSTOM_CSS = """
+ .gradio-container {
+     font-family: 'Arial', sans-serif;
+ }
+ """
+
+ # Performance Tuning
+ OPTIMIZE_MEMORY = True
+ LAZY_LOADING = True
+ BATCH_PROCESSING = False
+
+ # Debug Mode
+ DEBUG_MODE = False
+ VERBOSE_LOGGING = False
+
+ # Analytics (optional, for future implementation)
+ ENABLE_ANALYTICS = False
+ ANALYTICS_PROVIDER = None  # Options: google, mixpanel, custom
+
+ # Internationalization (future)
+ DEFAULT_LANGUAGE = "en"
+ SUPPORTED_LANGUAGES = ["en"]  # English only for now
+
+ # Data Sources (for future expansion)
+ DATA_SOURCES = {
+     'naturalearth': {
+         'enabled': True,
+         'priority': 1
+     },
+     'worldbank': {
+         'enabled': False,  # Not yet integrated (the World Bank Indicators API itself does not require a key)
+         'priority': 2,
+         'api_key': None
+     },
+     'un': {
+         'enabled': False,  # Requires API setup
+         'priority': 3
+     }
+ }
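`INDICATOR_ALIASES` maps free-text phrases to dataset columns, and several aliases overlap: 'population' is a substring of 'population density', and 'gdp' of 'gdp per capita'. One way to resolve a phrase, sketched here under the assumption that the longest matching alias should win, follows; `resolve_indicator` is a hypothetical helper and the two dictionaries are trimmed copies of the config values above.

```python
from typing import Dict

# Trimmed copies of two dictionaries from config.py
INDICATOR_ALIASES: Dict[str, str] = {
    'population': 'pop_est',
    'gdp': 'gdp_md_est',
    'density': 'pop_density',
    'population density': 'pop_density',
    'per capita': 'gdp_per_capita',
    'gdp per capita': 'gdp_per_capita',
}
INDICATOR_DISPLAY_NAMES: Dict[str, str] = {
    'pop_est': 'Population',
    'gdp_md_est': 'GDP (Million USD)',
    'pop_density': 'Population Density (per km²)',
    'gdp_per_capita': 'GDP per Capita (USD)',
}


def resolve_indicator(phrase: str, default: str = 'pop_est') -> str:
    """Hypothetical helper: map free text to a column name, longest alias first."""
    text = phrase.lower()
    # Longer aliases are tried first, so 'gdp per capita' wins over its substring 'gdp'
    for alias in sorted(INDICATOR_ALIASES, key=len, reverse=True):
        if alias in text:
            return INDICATOR_ALIASES[alias]
    return default


if __name__ == "__main__":
    for query in ["Population density in Africa", "GDP per capita of the G7", "GDP of Europe"]:
        column = resolve_indicator(query)
        print(f"{query!r} -> {column} ({INDICATOR_DISPLAY_NAMES[column]})")
```

Sorting by length is the simplest rule that keeps the specific phrases ahead of the generic ones; a phrase with no alias at all falls back to `DEFAULT_INDICATOR`'s value, `pop_est`.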
data_utils.py ADDED
@@ -0,0 +1,209 @@
+ """
+ Enhanced data handlers for multiple geospatial data sources
+ """
+ import pandas as pd
+ import requests
+ from typing import Dict, List, Optional
+ import json
+
+ class DataEnhancer:
+     """
+     Additional data sources and enrichment for geospatial queries
+     """
+
+     @staticmethod
+     def get_sample_economic_data():
+         """
+         Sample economic indicators (in production, connect to World Bank API)
+         """
+         return {
+             'United States': {'gdp_growth': 2.1, 'unemployment': 3.7, 'inflation': 3.2},
+             'China': {'gdp_growth': 5.2, 'unemployment': 5.0, 'inflation': 0.2},
+             'Germany': {'gdp_growth': 0.1, 'unemployment': 3.0, 'inflation': 6.1},
+             'India': {'gdp_growth': 7.2, 'unemployment': 8.0, 'inflation': 5.4},
+             'Brazil': {'gdp_growth': 2.9, 'unemployment': 8.5, 'inflation': 4.6},
+             'United Kingdom': {'gdp_growth': 0.5, 'unemployment': 3.9, 'inflation': 4.0},
+             'France': {'gdp_growth': 0.9, 'unemployment': 7.2, 'inflation': 5.2},
+             'Japan': {'gdp_growth': 1.9, 'unemployment': 2.6, 'inflation': 3.2},
+             'South Korea': {'gdp_growth': 1.4, 'unemployment': 2.7, 'inflation': 3.6},
+             'Canada': {'gdp_growth': 1.1, 'unemployment': 5.4, 'inflation': 3.9}
+         }
+
+     @staticmethod
+     def get_sample_environmental_data():
+         """
+         Sample environmental indicators
+         """
+         return {
+             'United States': {'co2_per_capita': 15.5, 'renewable_energy': 12.6, 'forest_coverage': 33.9},
+             'China': {'co2_per_capita': 7.4, 'renewable_energy': 12.4, 'forest_coverage': 23.0},
+             'Germany': {'co2_per_capita': 8.4, 'renewable_energy': 19.3, 'forest_coverage': 32.7},
+             'India': {'co2_per_capita': 1.9, 'renewable_energy': 17.5, 'forest_coverage': 24.4},
+             'Brazil': {'co2_per_capita': 2.2, 'renewable_energy': 46.1, 'forest_coverage': 59.4},
+             'Russia': {'co2_per_capita': 11.4, 'renewable_energy': 5.1, 'forest_coverage': 49.8},
+             'Japan': {'co2_per_capita': 8.7, 'renewable_energy': 10.2, 'forest_coverage': 68.5},
+             'Australia': {'co2_per_capita': 16.8, 'renewable_energy': 11.9, 'forest_coverage': 17.4}
+         }
+
+     @staticmethod
+     def enrich_dataframe(df: pd.DataFrame, data_type: str = 'economic') -> pd.DataFrame:
+         """
+         Enrich existing dataframe with additional indicators
+         """
+         enriched_df = df.copy()
+
+         if data_type == 'economic':
+             extra_data = DataEnhancer.get_sample_economic_data()
+             indicators = ['gdp_growth', 'unemployment', 'inflation']
+         elif data_type == 'environmental':
+             extra_data = DataEnhancer.get_sample_environmental_data()
+             indicators = ['co2_per_capita', 'renewable_energy', 'forest_coverage']
+         else:
+             return enriched_df
+
+         # Add only this data type's columns, so enriching with one type does not
+         # overwrite columns previously added by the other type with None
+         for indicator in indicators:
+             enriched_df[indicator] = enriched_df['name'].map(
+                 lambda x: extra_data.get(x, {}).get(indicator, None)
+             )
+
+         return enriched_df
+
+     @staticmethod
+     def get_regional_aggregates(df: pd.DataFrame) -> pd.DataFrame:
+         """
+         Calculate regional aggregates
+         """
+         regional_stats = df.groupby('continent').agg({
+             'pop_est': 'sum',
+             'gdp_md_est': 'sum',
+             'name': 'count'
+         }).reset_index()
+
+         regional_stats.columns = ['continent', 'total_population', 'total_gdp', 'country_count']
+         # gdp_md_est is in millions of USD, so scale by 1,000,000 to get USD per person
+         regional_stats['avg_gdp_per_capita'] = (
+             regional_stats['total_gdp'] / regional_stats['total_population'] * 1000000
+         )
+
+         return regional_stats
+
+ class QueryEnhancer:
+     """
+     Enhance and validate queries
+     """
+
+     CONTINENT_MAP = {
+         'asia': 'Asia',
+         'europe': 'Europe',
+         'africa': 'Africa',
+         'north america': 'North America',
+         'south america': 'South America',
+         'oceania': 'Oceania',
+         'antarctica': 'Antarctica'
+     }
+
+     COUNTRY_GROUPS = {
+         'brics': ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
+         'g7': ['United States of America', 'Japan', 'Germany', 'United Kingdom',
+                'France', 'Italy', 'Canada'],
+         'asean': ['Indonesia', 'Thailand', 'Philippines', 'Vietnam', 'Myanmar',
+                   'Malaysia', 'Singapore', 'Cambodia', 'Laos', 'Brunei'],
+         'gcc': ['Saudi Arabia', 'United Arab Emirates', 'Kuwait', 'Qatar', 'Bahrain', 'Oman'],
+         'eu': ['Germany', 'France', 'Italy', 'Spain', 'Poland', 'Romania', 'Netherlands',
+                'Belgium', 'Greece', 'Portugal', 'Czech Republic', 'Hungary', 'Sweden',
+                'Austria', 'Bulgaria', 'Denmark', 'Finland', 'Slovakia', 'Ireland',
+                'Croatia', 'Lithuania', 'Slovenia', 'Latvia', 'Estonia', 'Cyprus',
+                'Luxembourg', 'Malta']
+     }
+
+     @classmethod
+     def expand_location(cls, location: str) -> List[str]:
+         """
+         Expand location strings to actual country/region names
+         """
+         location_lower = location.lower()
+
+         # Check if it's a continent
+         if location_lower in cls.CONTINENT_MAP:
+             return [cls.CONTINENT_MAP[location_lower]]
+
+         # Check if it's a country group
+         if location_lower in cls.COUNTRY_GROUPS:
+             return cls.COUNTRY_GROUPS[location_lower]
+
+         # Return as-is
+         return [location]
+
+
+     @classmethod
+     def validate_indicators(cls, indicators: List[str]) -> List[str]:
+         """
+         Validate and normalize indicator names
+         """
+         valid_indicators = []
+         # Order matters: the first matching key wins, so specific qualifiers
+         # ('co2', 'density', 'per capita', 'growth') are checked before the generic
+         # 'population' and 'gdp' they usually appear with
+         indicator_mapping = {
+             'co2': 'co2_per_capita',
+             'density': 'pop_density',
+             'per capita': 'gdp_per_capita',
+             'renewable': 'renewable_energy',
+             'forest': 'forest_coverage',
+             'unemployment': 'unemployment',
+             'inflation': 'inflation',
+             'population': 'pop_est',
+             'growth': 'gdp_growth',
+             'gdp': 'gdp_md_est'
+         }
+
+         for indicator in indicators:
+             indicator_lower = indicator.lower()
+             for key, value in indicator_mapping.items():
+                 if key in indicator_lower:
+                     valid_indicators.append(value)
+                     break
+             else:
+                 valid_indicators.append('pop_est')  # default when no key matches
+
+         # Remove duplicates while keeping first-seen order (set() would scramble it)
+         return list(dict.fromkeys(valid_indicators))
+
+ # Statistical analysis utilities
+ class GeoStats:
+     """
+     Statistical analysis for geospatial data
+     """
+
+     @staticmethod
+     def calculate_correlation(df: pd.DataFrame, col1: str, col2: str) -> float:
+         """
+         Calculate correlation between two indicators
+         """
+         try:
+             return df[[col1, col2]].corr().iloc[0, 1]
+         except Exception:  # e.g. missing or non-numeric columns; avoids swallowing KeyboardInterrupt
+             return 0.0
+
+     @staticmethod
+     def get_outliers(df: pd.DataFrame, column: str) -> pd.DataFrame:
+         """
+         Identify outliers using IQR method
+         """
+         Q1 = df[column].quantile(0.25)
+         Q3 = df[column].quantile(0.75)
+         IQR = Q3 - Q1
+
+         lower_bound = Q1 - 1.5 * IQR
+         upper_bound = Q3 + 1.5 * IQR
+
+         outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]
+         return outliers
+
+     @staticmethod
+     def generate_summary_stats(df: pd.DataFrame, column: str) -> Dict:
+         """
+         Generate summary statistics for a column
+         """
+         return {
+             'mean': df[column].mean(),
+             'median': df[column].median(),
+             'std': df[column].std(),
+             'min': df[column].min(),
+             'max': df[column].max(),
+             'count': df[column].count()
+         }
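`GeoStats.get_outliers` applies Tukey's fences: values below `Q1 - 1.5 * IQR` or above `Q3 + 1.5 * IQR` are flagged. A dependency-free sketch of the same rule follows. `quantile_linear` reproduces linear interpolation between sorted values, which is the default `interpolation` of pandas `Series.quantile`; both helper names are hypothetical, and unlike pandas the sketch assumes a non-empty input with no missing values.

```python
import math
from typing import List, Sequence, Tuple


def quantile_linear(values: Sequence[float], q: float) -> float:
    """q-th quantile (0 <= q <= 1) with linear interpolation between sorted values."""
    data = sorted(values)
    pos = q * (len(data) - 1)  # fractional index into the sorted data
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> Tuple[float, float, List[float]]:
    """Return (lower_bound, upper_bound, outliers) using the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = quantile_linear(values, 0.25)
    q3 = quantile_linear(values, 0.75)
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    # Strict comparisons, as in get_outliers: values exactly on a fence are kept
    outliers = [v for v in values if v < lower_bound or v > upper_bound]
    return lower_bound, upper_bound, outliers


if __name__ == "__main__":
    print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

With nine values, Q1 and Q3 land exactly on the 3rd and 7th sorted values (3 and 7), so the IQR is 4, the fences are -3 and 13, and only 100 is flagged.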
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ gradio
+ pandas
+ geopandas
+ folium
+ plotly
+ huggingface-hub
+ shapely
+ pyproj
+ numpy
+ requests
+ pytest
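`requirements.txt` pins no versions, so two installs made at different times can resolve to different releases. `pip freeze` records the whole environment; the sketch below, a hypothetical helper built on the standard library's `importlib.metadata`, reports only the packages listed above so they can be copied into a pinned file.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names exactly as listed in requirements.txt
REQUIREMENTS = (
    "gradio", "pandas", "geopandas", "folium", "plotly", "huggingface-hub",
    "shapely", "pyproj", "numpy", "requests", "pytest",
)


def installed_versions(names: Iterable[str] = REQUIREMENTS) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each distribution name to its installed version (None if absent)."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print in requirements.txt syntax so the output can be saved as a pinned file
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"# {name} is not installed")
```

If a package is reported as missing although `pip show` finds it, compare the exact distribution name that pip reports with the name used here.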
setup.bat ADDED
@@ -0,0 +1,89 @@
+ @echo off
+ REM Geospatial AI Query System - Quick Start Script (Windows)
+ REM This script helps you set up and run the application locally
+
+ echo ================================
+ echo Geospatial AI Query System - Setup
+ echo ================================
+ echo.
+
+ REM Check Python version
+ echo Checking Python version...
+ python --version >nul 2>&1
+ if errorlevel 1 (
+     echo [ERROR] Python 3 is not installed. Please install Python 3.8 or higher.
+     pause
+     exit /b 1
+ )
+ python --version
+ echo [OK] Python found
+ echo.
+
+ REM Create virtual environment
+ echo Creating virtual environment...
+ if not exist "venv" (
+     python -m venv venv
+     echo [OK] Virtual environment created
+ ) else (
+     echo [OK] Virtual environment already exists
+ )
+ echo.
+
+ REM Activate virtual environment
+ echo Activating virtual environment...
+ call venv\Scripts\activate.bat
+ echo [OK] Virtual environment activated
+ echo.
+
+ REM Upgrade pip
+ echo Upgrading pip...
+ python -m pip install --upgrade pip >nul 2>&1
+ echo [OK] Pip upgraded
+ echo.
+
+ REM Install requirements
+ echo Installing dependencies...
+ echo This may take a few minutes...
+ pip install -r requirements.txt
+ if errorlevel 1 (
+     echo [ERROR] Failed to install dependencies
+     pause
+     exit /b 1
+ )
+ echo [OK] All dependencies installed successfully
+ echo.
+
+ REM Check for HF_TOKEN
+ REM Note: inside an if-block, literal parentheses in echo text must be escaped as ^( and ^)
+ echo Checking for Hugging Face token...
+ if "%HF_TOKEN%"=="" (
+     echo [WARNING] HF_TOKEN not set ^(optional for testing^)
+     echo To enable LLM features, get a token from:
+     echo https://huggingface.co/settings/tokens
+     echo Then run: set HF_TOKEN=your_token_here
+ ) else (
+     echo [OK] HF_TOKEN found
+ )
+ echo.
+
+ REM Run tests
+ echo Running tests...
+ pytest test_app.py -v
+ if errorlevel 1 (
+     echo [WARNING] Some tests failed ^(app may still work^)
+ ) else (
+     echo [OK] All tests passed
+ )
+ echo.
+
+ REM Start application
+ echo ================================
+ echo Starting application...
+ echo ================================
+ echo.
+ echo The app will be available at:
+ echo http://localhost:7860
+ echo.
+ echo Press Ctrl+C to stop the application
+ echo.
+
+ python app.py
setup.sh ADDED
@@ -0,0 +1,87 @@
+ #!/bin/bash
+
+ # Geospatial AI Query System - Quick Start Script
+ # This script helps you set up and run the application locally
+
+ echo "🌍 Geospatial AI Query System - Setup"
+ echo "======================================"
+ echo ""
+
+ # Check Python version
+ echo "Checking Python version..."
+ python_version=$(python3 --version 2>&1)
+ if [[ $? -ne 0 ]]; then
+     echo "❌ Python 3 is not installed. Please install Python 3.8 or higher."
+     exit 1
+ fi
+ echo "✅ Found: $python_version"
+ echo ""
+
+ # Create virtual environment
+ echo "Creating virtual environment..."
+ if [ ! -d "venv" ]; then
+     python3 -m venv venv
+     echo "✅ Virtual environment created"
+ else
+     echo "✅ Virtual environment already exists"
+ fi
+ echo ""
+
+ # Activate virtual environment
+ echo "Activating virtual environment..."
+ source venv/bin/activate
+ echo "✅ Virtual environment activated"
+ echo ""
+
+ # Upgrade pip
+ echo "Upgrading pip..."
+ pip install --upgrade pip > /dev/null 2>&1
+ echo "✅ Pip upgraded"
+ echo ""
+
+ # Install requirements
+ echo "Installing dependencies..."
+ echo "This may take a few minutes..."
+ pip install -r requirements.txt
+ if [[ $? -eq 0 ]]; then
+     echo "✅ All dependencies installed successfully"
+ else
+     echo "❌ Failed to install dependencies"
+     exit 1
+ fi
+ echo ""
+
+ # Check for HF_TOKEN
+ echo "Checking for Hugging Face token..."
+ if [ -z "$HF_TOKEN" ]; then
+     echo "⚠️ HF_TOKEN not set (optional for testing)"
+     echo "   To enable LLM features, get a token from:"
+     echo "   https://huggingface.co/settings/tokens"
+     echo "   Then run: export HF_TOKEN=your_token_here"
+ else
+     echo "✅ HF_TOKEN found"
+ fi
+ echo ""
+
+ # Run tests
+ echo "Running tests..."
+ pytest test_app.py -v
+ if [[ $? -eq 0 ]]; then
+     echo "✅ All tests passed"
+ else
+     echo "⚠️ Some tests failed (app may still work)"
+ fi
+ echo ""
+
+ # Start application
+ echo "======================================"
+ echo "🚀 Starting application..."
+ echo "======================================"
+ echo ""
+ echo "The app will be available at:"
+ echo "http://localhost:7860"
+ echo ""
+ echo "Press Ctrl+C to stop the application"
+ echo ""
+
+ python app.py
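Both setup scripts tell the user that Python 3.8 or higher is required, but they only check that an interpreter runs at all; a missing `HF_TOKEN` is reported as a warning rather than an error. A minimal Python sketch of those preflight rules with an explicit version comparison follows; `preflight`, `exit_code`, and the message strings are hypothetical and not part of the repository.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 8)  # minimum named in setup.sh / setup.bat


def preflight(version_info: Tuple[int, ...] = tuple(sys.version_info[:3]),
              env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: return problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems: List[str] = []
    # Compare (major, minor) tuples instead of only checking that Python runs
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:3])
        problems.append(f"ERROR: Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    # A missing or empty token is only a warning, as in the scripts
    if not env.get("HF_TOKEN"):
        problems.append("WARNING: HF_TOKEN not set (optional; needed for LLM features)")
    return problems


def exit_code(problems: List[str]) -> int:
    """Non-zero only when a hard error was found; warnings alone still allow startup."""
    return 1 if any(p.startswith("ERROR") for p in problems) else 0


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(issue)
    print("exit status:", exit_code(issues))
```

Tuple comparison keeps the check correct for two-digit minor versions such as 3.10, which a plain string comparison of `"3.10"` against `"3.8"` would get wrong.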
test_app.py ADDED
@@ -0,0 +1,173 @@
+ """
+ Test suite for Geospatial AI Query System
+ Run: pytest test_app.py
+ """
+
+ import pytest
+ import pandas as pd
+ import geopandas as gpd
+ from pathlib import Path
+ from data_utils import DataEnhancer, QueryEnhancer, GeoStats
+
+ # Path to local Natural Earth data (the geopandas.datasets module was removed in GeoPandas 1.0)
+ DATA_DIR = Path(__file__).parent / "data" / "ne_110m_admin_0_countries"
+ NATURAL_EARTH_SHP = DATA_DIR / "ne_110m_admin_0_countries.shp"
+
+ class TestDataEnhancer:
+     """Test data enhancement utilities"""
+
+     def test_economic_data_structure(self):
+         """Test economic data has correct structure"""
+         data = DataEnhancer.get_sample_economic_data()
+         assert isinstance(data, dict)
+         assert 'United States' in data
+         assert 'gdp_growth' in data['United States']
+         assert 'unemployment' in data['United States']
+         assert 'inflation' in data['United States']
+
+     def test_environmental_data_structure(self):
+         """Test environmental data has correct structure"""
+         data = DataEnhancer.get_sample_environmental_data()
+         assert isinstance(data, dict)
+         assert 'China' in data
+         assert 'co2_per_capita' in data['China']
+         assert 'renewable_energy' in data['China']
+
+     def test_enrich_dataframe(self):
+         """Test dataframe enrichment"""
+         # Create sample dataframe
+         df = pd.DataFrame({
+             'name': ['United States', 'China', 'Germany'],
+             'pop_est': [331000000, 1440000000, 83000000],
+             'gdp_md_est': [21000000, 14000000, 3800000]
+         })
+
+         enriched = DataEnhancer.enrich_dataframe(df, 'economic')
+         assert 'gdp_growth' in enriched.columns
+         assert enriched.loc[enriched['name'] == 'United States', 'gdp_growth'].iloc[0] == 2.1
+
+ class TestQueryEnhancer:
+     """Test query enhancement utilities"""
+
+     def test_expand_continent(self):
+         """Test continent expansion"""
+         result = QueryEnhancer.expand_location('asia')
+         assert result == ['Asia']
+
+     def test_expand_country_group_brics(self):
+         """Test BRICS expansion"""
+         result = QueryEnhancer.expand_location('brics')
+         assert 'Brazil' in result
+         assert 'India' in result
+         assert 'China' in result
+         assert len(result) == 5
+
+     def test_expand_country_group_g7(self):
+         """Test G7 expansion"""
+         result = QueryEnhancer.expand_location('g7')
+         assert 'United States of America' in result
+         assert 'Japan' in result
+         assert len(result) == 7
+
+     def test_validate_indicators(self):
+         """Test indicator validation"""
+         indicators = ['GDP', 'Population', 'CO2 emissions']
+         result = QueryEnhancer.validate_indicators(indicators)
+         assert 'gdp_md_est' in result
+         assert 'pop_est' in result
+
+ class TestGeoStats:
+     """Test statistical utilities"""
+
+     def test_calculate_correlation(self):
+         """Test correlation calculation"""
+         df = pd.DataFrame({
+             'col1': [1, 2, 3, 4, 5],
+             'col2': [2, 4, 6, 8, 10]
+         })
+         corr = GeoStats.calculate_correlation(df, 'col1', 'col2')
+         assert corr == pytest.approx(1.0)  # Perfect positive correlation, tolerant of float rounding
+
+     def test_summary_stats(self):
+         """Test summary statistics"""
+         df = pd.DataFrame({
+             'values': [10, 20, 30, 40, 50]
+         })
+         stats = GeoStats.generate_summary_stats(df, 'values')
+         assert stats['mean'] == 30.0
+         assert stats['median'] == 30.0
+         assert stats['min'] == 10
+         assert stats['max'] == 50
+
+ class TestIntegration:
+     """Integration tests"""
+
+     def test_world_data_loading(self):
+         """Test loading world data"""
+         world = gpd.read_file(NATURAL_EARTH_SHP)
+         assert not world.empty
+         assert 'NAME' in world.columns or 'name' in world.columns
+         assert 'CONTINENT' in world.columns or 'continent' in world.columns
+         assert 'POP_EST' in world.columns or 'pop_est' in world.columns
+
+     def test_query_to_data_pipeline(self):
+         """Test complete query to data pipeline"""
+         # Load data
+         world = gpd.read_file(NATURAL_EARTH_SHP)
+
+         # Normalize column names to lowercase
+         world.columns = world.columns.str.lower()
+
+         # Expand query
+         locations = QueryEnhancer.expand_location('brics')
+
+         # Filter data (use 'name', falling back to 'admin' if 'name' is absent)
+         name_col = 'name' if 'name' in world.columns else 'admin'
+         filtered = world[world[name_col].isin(locations)]
+
+         assert not filtered.empty
+         assert len(filtered) > 0
+
+     def test_data_enrichment_pipeline(self):
+         """Test data enrichment pipeline"""
+         # Load data
+         world = gpd.read_file(NATURAL_EARTH_SHP)
+
+         # Normalize column names to lowercase
+         world.columns = world.columns.str.lower()
+
+         # Take sample
+         sample = world.head(10)
+
+         # Enrich
+         enriched = DataEnhancer.enrich_dataframe(sample, 'economic')
+
+         assert 'gdp_growth' in enriched.columns
+         assert 'unemployment' in enriched.columns
+
+ # Sample query test cases
+ SAMPLE_QUERIES = [
+     "Show me population of Asian countries",
+     "Compare GDP of European nations",
+     "What's the population density in Africa?",
+     "Display economic indicators for South American countries",
+     "Show me top 10 countries by GDP",
+     "Compare BRICS nations",
+     "Environmental data for G7 countries"
+ ]
+
+ class TestQueryParsing:
+     """Sanity-check sample queries (no LLM calls are made)"""
+
+     def test_query_keywords(self):
+         """Test that queries contain expected keywords"""
+         for query in SAMPLE_QUERIES:
+             assert len(query) > 0
+             assert any(location in query.lower() for location in
+                        ['asian', 'european', 'africa', 'south american', 'brics', 'g7']) or \
+                 any(indicator in query.lower() for indicator in
+                     ['population', 'gdp', 'economic', 'environmental'])
+
+ if __name__ == "__main__":
+     # Run tests
+     pytest.main([__file__, '-v'])
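`test_query_to_data_pipeline` chains three steps: expand a query term into place names, lowercase the column names, and filter rows with `isin`. The sketch below reproduces that chain on a tiny in-memory `DataFrame`, so it runs without the Natural Earth shapefile. The lookup tables, `expand_location`, and `filter_by_location` are hypothetical stand-ins written to satisfy the same expectations as the tests above (`'asia'` gives `['Asia']`, BRICS has five names, G7 has seven including `'United States of America'`); they are not the actual contents of `data_utils.QueryEnhancer`. One assumption worth flagging: continents are matched against the `continent` column, since a continent name never appears in `name`.

```python
from typing import List

import pandas as pd

# Hypothetical lookup tables (NOT data_utils.QueryEnhancer's real data).
# Country spellings are assumed to follow Natural Earth's NAME column.
COUNTRY_GROUPS = {
    "brics": ["Brazil", "Russia", "India", "China", "South Africa"],
    "g7": ["Canada", "France", "Germany", "Italy", "Japan",
           "United Kingdom", "United States of America"],
}
CONTINENTS = {
    "africa": "Africa", "asia": "Asia", "europe": "Europe",
    "north america": "North America", "oceania": "Oceania",
    "south america": "South America",
}


def expand_location(term: str) -> List[str]:
    """Expand a group or continent keyword; unknown terms pass through unchanged."""
    key = term.strip().lower()
    if key in COUNTRY_GROUPS:
        return list(COUNTRY_GROUPS[key])
    if key in CONTINENTS:
        return [CONTINENTS[key]]
    return [term.strip()]


def filter_by_location(df: pd.DataFrame, term: str) -> pd.DataFrame:
    """Lowercase column names, expand the term, then filter rows with isin()."""
    df = df.rename(columns=str.lower)  # same effect as df.columns.str.lower()
    places = expand_location(term)
    if term.strip().lower() in CONTINENTS:
        return df[df["continent"].isin(places)]
    # Prefer 'name'; fall back to 'admin' as the integration test does.
    name_col = "name" if "name" in df.columns else "admin"
    return df[df[name_col].isin(places)]


# Toy frame with Natural Earth-style uppercase column names.
world = pd.DataFrame({
    "NAME": ["Brazil", "India", "Japan", "Germany"],
    "CONTINENT": ["South America", "Asia", "Asia", "Europe"],
})
```

With this toy frame, `filter_by_location(world, "brics")["name"].tolist()` returns `['Brazil', 'India']` and `filter_by_location(world, "asia")["name"].tolist()` returns `['India', 'Japan']`. Returning a list even for a single continent keeps the `isin` call uniform, which matches the `['Asia']` shape the continent test expects.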