Spaces: rifatSDAS/geospatial-ai-query

rifatSDAS committed on Jan 20

Commit

2171c22

1 Parent(s): ae175a2

Initial commit: Geospatial AI Query System

Files changed (11)

.gitignore +66 -0
LICENSE.md +11 -0
README.md +138 -1
USER_GUIDE.md +399 -0
app.py +574 -0
config.py +179 -0
data_utils.py +209 -0
requirements.txt +11 -0
setup.bat +89 -0
setup.sh +87 -0
test_app.py +173 -0

.gitignore ADDED

	@@ -0,0 +1,66 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual Environment
+venv/
+.venv/
+virtualenv/
+.env/
+env/
+ENV/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+# Environment variables
+.env
+.env.local
+# Jupyter Notebook
+.ipynb_checkpoints
+# Data files
+#*.csv
+#*.geojson
+#*.shp
+#*.shx
+#*.dbf
+#*.prj
+# OS
+.DS_Store
+Thumbs.db
+# Gradio
+flagged/
+# Documents
+DEPLOYMENT.md
+LOCAL_TESTING_GUIDE.md
+PROJECT_SUMMARY.md
+QUICKSTART.md
+TESTING_CHECKLIST.md

LICENSE.md ADDED

	@@ -0,0 +1,11 @@

+# The MIT License (MIT)
+---
+Copyright (c) 2026 rifatSDAS
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md CHANGED

@@ -1,14 +1,151 @@
 ---
 title: Geospatial Ai Query
-emoji: 📊
+emoji: 🌍📊
 colorFrom: purple
 colorTo: gray
 sdk: gradio
 sdk_version: 6.3.0
 app_file: app.py
 pinned: false
+version: 1.0.0
 license: mit
 short_description: "Query geospatial data with natural language\_interface"
 ---
+# 🌍 Geospatial AI Query System
+An intelligent natural language interface for querying and visualizing global geographic data, including socioeconomic and environmental information, at multiple scales: countries, continents, and specific regions.
+## Features
+### 🤖 Natural Language Processing
+- Ask questions in plain English about countries, regions, and global indicators
+- LLM-powered query parsing using Mistral-7B-Instruct-v0.3
+- Automatic extraction of locations, indicators, and visualization preferences
+### 📊 Multi-Modal Visualization
+- **Interactive Maps**: Choropleth maps with country-level data
+- **Dynamic Charts**: Bar charts, scatter plots, and trend visualizations using Plotly
+- **Data Tables**: Formatted tables with key socioeconomic indicators
+### 🌐 Comprehensive Data Coverage
+- **Demographic**: Population, population density, urban/rural distribution
+- **Economic**: GDP, GDP per capita, trade indicators
+- **Geographic**: Country boundaries, areas, continents, regional groups
+- **Environmental**: CO2 emissions, renewable energy usage, forest area (sample data)
+- **Derived Metrics**: Population density, GDP growth rates
+- **Social**: Development indices, education, health metrics
+## Example Queries
+```
+"Show me population of Asian countries"
+"Compare GDP of European nations"
+"What's the population density in Africa?"
+"Display economic indicators for South American countries"
+"Show me top 10 countries by GDP"
+"Compare population vs GDP for BRICS nations"
+```
+## How It Works
+1. **Query Input**: User enters natural language query
+2. **LLM Parsing**: Mistral-7B-Instruct-v0.3 extracts structured information (locations, indicators, visualization type)
+3. **Data Fetching**: GeoPandas retrieves and processes geospatial data
+4. **Visualization**: Results rendered as interactive maps, charts, or tables
+5. **Multi-format Output**: View results in your preferred format
+## Technology Stack
+- **Frontend**: Gradio for web interface
+- **LLM**: Hugging Face Inference API (Mistral-7B-Instruct-v0.3)
+- **Geospatial**: GeoPandas, Folium
+- **Visualization**: Plotly Express
+- **Data**: Natural Earth, World Bank Open Data
+## Data Sources
+- **Natural Earth**: Country boundaries and geographic data
+- **World Bank**: Economic and demographic indicators
+- **Derived Metrics**: Population density, GDP per capita
+## Local Development
+```bash
+# Clone repository
+git clone https://huggingface.co/spaces/rifatSDAS/geospatial-ai-query
+cd geospatial-ai-query
+# Install dependencies
+pip install -r requirements.txt
+# Set HuggingFace token (optional, for LLM features)
+export HF_TOKEN=your_token_here
+# Run application
+python app.py
+```
+## Deployment on Hugging Face Spaces
+1. Create new Space on Hugging Face
+2. Select Gradio SDK
+3. Upload `app.py` and `requirements.txt`
+4. Add `HF_TOKEN` in Space settings (Settings > Repository secrets)
+5. Space will automatically build and deploy
+## Configuration
+### Environment Variables
+- `HF_TOKEN`: Hugging Face API token for LLM inference (optional)
+## Use Cases
+### Education
+- Interactive geography, demography, economy, and socioeconomics lessons
+- Data visualization for research projects
+- Understanding global trends and patterns
+### Business Intelligence
+- Market analysis by region
+- Demographic research for expansion planning
+- Competitive geographic and landscape analysis
+### Research
+- Geographic, demographic, economic, and socioeconomic data exploration
+- Regional to global scale analysis
+- Trend identification, data visualization, and data extraction
+## About the Developer
+Built by Dr. Kazi Rifat Ahmed, a **Full Stack Geospatial AI Engineer** specializing in:
+- AI/ML-DL for geospatial applications
+- Cloud-native geospatial software engineering & architecture
+- Large-scale Satellite/Earth Observation data analysis, processing, analytics, and visualization
+- Blockchain and Quantum Computing for geospatial applications
+- Research in advanced geospatial science, technology, and applications
+- Co-founder and Technical Lead of satellite data services businesses in the space sector: QuentuED (https://quentued.de) and Sensor Aktor (https://sensor-aktor.de)
+### Tech Stack Proficiency
+Python | Java | JavaScript | TypeScript | C/C++ | Bash | Cloud-Native Architecture (Kubernetes) | DevOps | AI/ML/DL | MLOps | LLM Integration | Blockchain | Remote Sensing Science & Technology | Geospatial Data Science & Engineering
+### Research Interests
+Geospatial AI | Satellite Data Engineering | Drone Sensors | Geospatial Big Data Analytics | Earth Observation Systems & Sensors | Advanced Remote Sensing Techniques | Space Technology | Quantum Computing | Blockchain | Satellite Data Services | Planetary Science & Exploration
+## License
+This project is licensed under the MIT License. See the [LICENSE.md](LICENSE.md) file for details.
+## Contributing
+Contributions welcome! Please feel free to submit issues and pull requests.
+## Contact
+For collaboration opportunities in satellite data services & applications, large-scale satellite data analytics, geospatial AI, blockchain & quantum computing for geospatial applications, or advanced geospatial science, technology & applications, feel free to reach out!
+---
+**Tags**: #geospatial #geospatial-ai #AI #ML #DL #LLM #satellite-data #earth-observation #blockchain #quantum-computing #data-visualization #natural-language #gradio #huggingface
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

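Review note on README.md, "How It Works" step 2: the pipeline depends on the model replying with bare JSON, but instruction-tuned models often wrap JSON in a Markdown fence or add a sentence of prose. The sketch below is one way to tolerate that before calling `json.loads`. `extract_json_object` is a hypothetical helper, not part of this commit, and the sample reply is invented for illustration.

```python
import json
import re

# Built indirectly so the example does not contain a literal fence marker
FENCE = "`" * 3
FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def extract_json_object(text: str) -> dict:
    """Return the first JSON object found in an LLM reply (hypothetical helper)."""
    # Prefer the contents of a Markdown code fence if the model added one
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        text = fenced.group(1)
    # Keep only the outermost {...} span to skip leading or trailing prose
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in model output")
    parsed = json.loads(text[start:end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed


# Invented reply showing the fenced form some models produce
reply = "Sure!\n" + FENCE + 'json\n{"locations": ["Asia"], "indicators": ["GDP"]}\n' + FENCE
print(extract_json_object(reply))
```

A helper along these lines could sit between the chat completion call and the existing fallback dictionary, so the fallback only triggers when no JSON object can be recovered at all.
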
USER_GUIDE.md ADDED

	@@ -0,0 +1,399 @@

+# User Guide: Geospatial AI Query System
+## Table of Contents
+1. [Getting Started](#getting-started)
+2. [Query Examples](#query-examples)
+3. [Understanding Results](#understanding-results)
+4. [Advanced Features](#advanced-features)
+5. [Tips & Best Practices](#tips--best-practices)
+6. [Troubleshooting](#troubleshooting)
+## Getting Started
+### What Can You Query?
+This system allows you to explore global data about:
+**Geographic Coverage:**
+- Individual countries (e.g., "United States", "China", "Germany")
+- Continents (Asia, Europe, Africa, North America, South America, Oceania)
+- Country groups (BRICS, G7, ASEAN, EU, GCC)
+**Data Categories:**
+- **Demographics**: Population, population density, urban/rural distribution
+- **Economy**: GDP, GDP per capita, growth rates, unemployment
+- **Environment**: CO2 emissions, renewable energy, forest coverage
+- **Geography**: Land area, borders, geographic features
+### How to Use
+1. **Enter Your Query**: Type your question in natural language
+2. **Select Output Format**: Choose All, Map, Chart, or Table
+3. **Select Advanced Visualization Options**:
+   - For Maps: Choose base map style, color scheme, and legend color
+   - For Charts: Select chart type (bar, scatter, line, bubble, etc.), color theme
+4. **Click Analyze**: The AI will process your query
+5. **Explore Results**: View interactive visualizations and data tables, download if needed
+## Query Examples
+### Basic Queries
+#### Population Queries
+```
+"Show me population of Asian countries"
+"What is the population of Brazil?"
+"Compare population density in Europe vs Africa"
+"Which countries have the largest population?"
+```
+#### Economic Queries
+```
+"Show me GDP of top 10 economies"
+"Compare GDP per capita in Scandinavian countries"
+"What's the economic situation in South America?"
+"Display GDP growth rates for G7 countries"
+```
+#### Environmental Queries
+```
+"Show CO2 emissions in major economies"
+"Which countries have the most renewable energy?"
+"Compare forest coverage in tropical countries"
+"Environmental indicators for BRICS nations"
+```
+### Advanced Queries
+#### Multi-Indicator Comparisons
+```
+"Compare GDP and population for European countries"
+"Show relationship between GDP and CO2 emissions"
+"Analyze economic and environmental indicators for Asia"
+```
+#### Regional Analysis
+```
+"Show all indicators for Middle Eastern countries"
+"Compare ASEAN nations across all metrics"
+"Regional breakdown of population density"
+```
+#### Specific Country Groups
+```
+"Display data for BRICS countries"
+"Compare G7 with emerging markets"
+"Show EU member states economic indicators"
+"Gulf Cooperation Council statistics"
+```
+### Query Patterns
+| Pattern                             | Example                          | Result            |
+| ----------------------------------- | -------------------------------- | ----------------- |
+| "Show me [indicator] of [location]" | "Show me GDP of Asian countries" | Bar chart + table |
+| "Compare [locations]"               | "Compare Europe vs Asia"         | Comparison chart  |
+| "What is [indicator] in [location]" | "What is population in Africa?"  | Specific value    |
+| "Display data for [group]"          | "Display data for BRICS"         | All indicators    |
+| "Top N countries by [indicator]"    | "Top 10 countries by GDP"        | Ranked list       |
+## Understanding Results
+### Map View 🗺️
+**Features:**
+- Color-coded choropleth maps
+- Interactive zoom and pan
+- Click on countries for details
+- Layer controls for different indicators
+- Download the interactive map as an HTML file that opens in any browser
+**How to Read:**
+- Darker colors = Higher values
+- Gray areas = No data available
+- Blue markers = Country centers
+- Popup windows = Detailed statistics
+### Chart View 📊
+**Chart Types:**
+1. **Bar Charts** (Vertical): Best for comparing values across countries
+   - Sorted by value (highest to lowest)
+   - Color-coded by continent
+   - Shows top 20 countries
+2. **Horizontal Bar Charts**: Best for long country names
+   - Easier to read labels
+   - Sorted by value
+   - Color-coded by continent
+3. **Scatter Plots**: Best for analyzing relationships
+   - Each point = one country
+   - Size = population
+   - Color = continent
+   - Reveals correlations between indicators
+4. **Pie Charts**: Best for showing proportions
+   - Shows distribution of values
+   - Displays top countries
+   - Percentage of total shown
+5. **Treemap**: Best for hierarchical data visualization
+   - Rectangle size = indicator value
+   - Grouped by continent
+   - Color intensity shows magnitude
+6. **Bubble Charts**: Best for multi-dimensional analysis
+   - X-axis = country position
+   - Bubble size = indicator value
+   - Color = continent
+   - Great for spotting outliers
+**Interactive Features:**
+- Hover for details
+- Zoom and pan
+- Download as image
+### Table View 📋
+**Columns:**
+- Country name
+- Continent
+- Population
+- GDP
+- Population density
+- GDP per capita
+**Features:**
+- Sortable columns
+- Formatted numbers
+- Up to 50 countries displayed
+- Export as CSV
+## Advanced Features
+### Country Groups
+The system recognizes these special groups:
+**BRICS**: Brazil, Russia, India, China, South Africa
+```
+"Show me BRICS economic indicators"
+```
+**G7**: USA, Japan, Germany, UK, France, Italy, Canada
+```
+"Compare G7 countries"
+```
+**ASEAN**: 10 Southeast Asian nations
+```
+"ASEAN population statistics"
+```
+**EU**: 27 European Union member states
+```
+"EU environmental data"
+```
+**GCC**: 6 Gulf Cooperation Council countries
+```
+"GCC GDP comparison"
+```
+### Multi-Modal Analysis
+**Simultaneous Views:**
+Select "All" to see:
+- Map (geographic distribution)
+- Chart (comparative visualization)
+- Table (detailed numbers)
+**Use Cases:**
+- Comprehensive analysis
+- Interactive data visualizations
+- Research, news reporting, and educational purposes
+### Data Enrichment
+The system automatically calculates:
+- **Population Density**: Population per km²
+- **GDP per Capita**: GDP divided by population
+- **Regional Aggregates**: Continental totals and averages
+## Tips & Best Practices
+### Writing Effective Queries
+**✅ DO:**
+- Be specific: "Show GDP of European countries"
+- Use natural language: "What's the population of China?"
+- Specify what you want: "Compare Africa and China GDP"
+- Use recognized names: "BRICS", "G7", "Asian countries"
+**❌ DON'T:**
+- Be too vague: "Show me data"
+- Use ambiguous terms: "Show stuff about countries"
+- Expect real-time data (data may be from recent years)
+- Query non-existent indicators
+### Choosing Output Format
+| Format    | Best For               | When to Use                             |
+| --------- | ---------------------- | --------------------------------------- |
+| **All**   | Comprehensive analysis | First-time queries, presentations       |
+| **Map**   | Geographic patterns    | Spatial distribution, regional analysis |
+| **Chart** | Comparisons            | Rankings, trends, relationships         |
+| **Table** | Specific numbers       | Detailed data, exports, reports         |
+### Interpreting Results
+**For Rankings:**
+- Use bar charts
+- Sort by indicator value
+- Focus on top/bottom performers
+**For Comparisons:**
+- Use scatter plots
+- Look for clusters and outliers
+- Analyze relationships
+**For Geographic Patterns:**
+- Use maps
+- Observe regional groupings
+- Identify spatial trends
+### Query Optimization
+**Fast Queries:**
+```
+"GDP of G7"           # Specific group
+"Population of Africa" # Single continent
+```
+**Slower Queries:**
+```
+"All data for all countries"        # Too broad
+"Compare 50 countries across 20 indicators" # Too complex
+```
+## Troubleshooting
+### Common Issues
+#### "No data found"
+**Possible Causes:**
+- Misspelled country name
+- Unrecognized location
+- No data available for indicator
+**Solutions:**
+- Check spelling (use common English names)
+- Try continent instead: "Asian countries" vs "countries in Asia"
+- Use recognized groups: BRICS, G7, EU
+#### "Error processing query"
+**Possible Causes:**
+- Query too complex
+- Server timeout
+- Invalid syntax
+**Solutions:**
+- Simplify query
+- Break into smaller queries
+- Use example queries as templates
+#### Unexpected Results
+**Possible Causes:**
+- Query interpreted differently
+- Multiple countries with similar names
+- Ambiguous indicator names
+**Solutions:**
+- Be more specific
+- Use full country names
+- Specify exact indicators
+### Data Limitations
+**What's Available:**
+- Country-level data (not city/region level)
+- Recent years (data may lag 1-2 years)
+- Major indicators (population, GDP, environment)
+- 177 countries from Natural Earth dataset
+**What's NOT Available:**
+- Real-time/live data
+- Sub-national data (cities, provinces)
+- Historical time series (full implementation pending)
+- Highly specific indicators
+### Performance Tips
+**For Faster Results:**
+1. Start with specific queries
+2. Use recognized country groups
+3. Limit to 1-2 indicators
+4. Choose single output format
+**For Better Quality:**
+1. Use precise country names
+2. Specify exact indicators
+3. Be explicit about comparisons
+4. Include context in query
+## Example Workflows
+### Research Workflow
+1. **Explore**: "Show me data for ASEAN countries"
+2. **Analyze**: "Compare GDP growth in ASEAN"
+3. **Deep Dive**: "What's the GDP per capita in Vietnam?"
+4. **Compare**: "Compare Vietnam with Thailand"
+### Presentation Workflow
+1. **Overview**: Select "All" format
+2. **Geographic**: Focus on map view
+3. **Rankings**: Use chart view
+4. **Details**: Reference table view
+### Educational Workflow
+1. **Context**: "Show me African countries"
+2. **Compare**: "Compare population in Africa vs Europe"
+3. **Analyze**: "Why does Africa have lower GDP per capita?"
+4. **Discuss**: Use visualizations to support discussion
+## Getting Help
+### Resources
+- **Examples**: Click example queries in the app
+- **Feedback**: Use 👍 👎 buttons to rate results
+- **Issues**: Report bugs via GitHub issues
+- **Discussions**: Join Hugging Face Space discussions
+### Support Channels
+- **Community Forum**: Ask questions in Space discussions
+- **Documentation**: Check README.md and DEPLOYMENT.md
+- **Updates**: Follow Space for new features
+### Feature Requests
+Have ideas? I'd love to hear them!
+- Add comment in Space discussions
+- Open feature request on GitHub
+- Share your use case
+---
+**Last Updated**: January 2026
+**Version**: 1.0.0
+**Happy Exploring Geospatial Data! 🌍**

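Review note on USER_GUIDE.md, "Country Groups": in this commit `fetch_geospatial_data` matches locations only against the `continent` and `name` columns, so a location such as "BRICS" selects no rows unless it is expanded first (config.py and data_utils.py are not shown here and may already handle this). A minimal sketch of such an expansion follows. `COUNTRY_GROUPS` and `expand_locations` are hypothetical names, and the member spellings are assumed to match Natural Earth's `NAME` column.

```python
# Hypothetical group table. Membership follows the USER_GUIDE lists; the
# spellings are an assumption about Natural Earth's NAME column.
COUNTRY_GROUPS = {
    "BRICS": ["Brazil", "Russia", "India", "China", "South Africa"],
    "G7": ["United States of America", "Japan", "Germany", "United Kingdom",
           "France", "Italy", "Canada"],
}


def expand_locations(locations):
    """Replace group names with member countries, keeping order and dropping duplicates."""
    expanded = []
    for loc in locations:
        # Group lookup is case-insensitive; unknown names pass through unchanged
        members = COUNTRY_GROUPS.get(loc.strip().upper(), [loc])
        for name in members:
            if name not in expanded:
                expanded.append(name)
    return expanded


print(expand_locations(["brics", "Japan", "G7"]))
```

Continent names and plain country names pass through untouched, so the result can be handed directly to the existing `isin` filter.
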
app.py ADDED

	@@ -0,0 +1,574 @@

+import gradio as gr
+import pandas as pd
+import geopandas as gpd
+import folium
+from folium import plugins
+import plotly.express as px
+import plotly.graph_objects as go
+from huggingface_hub import InferenceClient
+import json
+import os
+import tempfile
+import io
+from datetime import datetime
+import numpy as np
+from pathlib import Path
+import warnings
+import branca.colormap as cm
+# Suppress GeoPandas CRS warnings (area/centroid calculations are approximate for demo purposes)
+warnings.filterwarnings('ignore', message='.*Geometry is in a geographic CRS.*')
+def format_number(num):
+    """Format large numbers with K/M/B/T suffixes for better readability."""
+    if num is None or (isinstance(num, float) and np.isnan(num)):
+        return 'N/A'
+    abs_num = abs(num)
+    if abs_num >= 1e12:
+        return f'{num/1e12:.1f}T'
+    elif abs_num >= 1e9:
+        return f'{num/1e9:.1f}B'
+    elif abs_num >= 1e6:
+        return f'{num/1e6:.1f}M'
+    elif abs_num >= 1e3:
+        return f'{num/1e3:.1f}K'
+    else:
+        return f'{num:.1f}'
+# Path to local Natural Earth data (geopandas.datasets was removed in GeoPandas 1.0)
+DATA_DIR = Path(__file__).parent / "data" / "ne_110m_admin_0_countries"
+NATURAL_EARTH_SHP = DATA_DIR / "ne_110m_admin_0_countries.shp"
+# Initialize HF Inference Client
+client = InferenceClient(token=os.environ.get("HF_TOKEN"))
+# ===== UI/UX Enhancement Constants =====
+MAP_STYLES = {
+    "Light": "CartoDB positron",
+    "Dark": "CartoDB dark_matter",
+    "Street": "OpenStreetMap",
+    "Satellite": "Esri.WorldImagery"
+}
+COLOR_SCHEMES = {
+    "Default": px.colors.qualitative.Plotly,
+    "Vivid": px.colors.qualitative.Vivid,
+    "Pastel": px.colors.qualitative.Pastel,
+    "Bold": px.colors.qualitative.Bold,
+    "Earth": px.colors.qualitative.Safe
+}
+CHOROPLETH_COLORS = {
+    "Yellow-Orange-Red": "YlOrRd",
+    "Yellow-Green-Blue": "YlGnBu",
+    "Purple-Red": "PuRd",
+    "Blue-Purple": "BuPu",
+    "Greens": "Greens",
+    "Blues": "Blues",
+    "Oranges": "OrRd",
+    "Spectral": "Spectral"
+}
+INDICATORS = {
+    "Population": "pop_est",
+    "GDP (Million $)": "gdp_md_est",
+    "Population Density": "pop_density",
+    "GDP per Capita": "gdp_per_capita"
+}
+# Global cache for world data
+_world_data_cache = None
+def load_world_data():
+    """Load world countries geospatial data"""
+    global _world_data_cache
+    if _world_data_cache is None:
+        raw = gpd.read_file(NATURAL_EARTH_SHP)
+        # Select only the columns we need (using original uppercase names)
+        # and rename them to match expected lowercase names
+        _world_data_cache = raw[['NAME', 'CONTINENT', 'POP_EST', 'GDP_MD', 'geometry']].copy()
+        _world_data_cache.columns = ['name', 'continent', 'pop_est', 'gdp_md_est', 'geometry']
+    return _world_data_cache
+def parse_query_with_llm(user_query):
+    """
+    Use LLM to parse natural language query into structured format
+    """
+    system_prompt = """You are a geospatial and geographic data query parser. Extract structured information from user queries.
+    Response format (JSON only):
+    {
+        "locations": ["country/region names"],
+        "indicators": ["GDP", "population", "CO2 emissions", etc.],
+        "time_range": {"start": "YYYY", "end": "YYYY"},
+        "visualization": "map/chart/table",
+        "aggregation": "sum/average/comparison",
+        "query_type": "single_country/multi_country/regional/global"
+    }
+    Examples:
+    - "Show me GDP of Asian countries" → locations: Asia, indicators: GDP, visualization: chart
+    - "Compare population density in Europe vs Africa" → locations: [Europe, Africa], indicators: population density
+    - "Environmental data for Brazil over last decade" → locations: [Brazil], indicators: environmental
+    Return ONLY valid JSON, no explanations."""
+    messages = [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": f"Parse this query: {user_query}"}
+    ]
+    try:
+        response = client.chat_completion(
+            messages=messages,
+            model="mistralai/Mistral-7B-Instruct-v0.3",
+            max_tokens=500,
+            temperature=0.1
+        )
+        parsed = json.loads(response.choices[0].message.content)
+        return parsed
+    except Exception as e:
+        print(f"LLM parsing error: {e}")
+        return {
+            "locations": [],
+            "indicators": ["population", "gdp_md_est"],
+            "visualization": "table",
+            "query_type": "global"
+        }
+def fetch_geospatial_data(parsed_query):
+    """
+    Fetch and process geospatial data based on parsed query
+    """
+    world = load_world_data()
+    # Filter by locations
+    locations = parsed_query.get("locations", [])
+    if locations and locations[0].lower() != "global":
+        # Filter by continent or country
+        mask = world['continent'].isin(locations) | world['name'].isin(locations)
+        filtered_data = world[mask]
+    else:
+        filtered_data = world
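+    # Work on a copy so the cached GeoDataFrame is never mutated
+    filtered_data = filtered_data.copy()
+    # Add computed indicators. Areas are measured in an equal-area projection
+    # (EPSG:6933, metres) and converted to km²; areas in EPSG:4326 would be square degrees.
+    area_km2 = filtered_data.geometry.to_crs(epsg=6933).area / 1e6
+    filtered_data['pop_density'] = filtered_data['pop_est'] / area_km2
+    filtered_data['gdp_per_capita'] = filtered_data['gdp_md_est'] / filtered_data['pop_est'] * 1000000
+    return filtered_data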
+def create_interactive_map(gdf, indicator='pop_est', map_style='Light', color_scale='Yellow-Orange-Red'):
+    """
+    Create an interactive Folium map with customizable style and colors
+    """
+    # Calculate center
+    center_lat = gdf.geometry.centroid.y.mean()
+    center_lon = gdf.geometry.centroid.x.mean()
+    # Get tile style
+    tiles = MAP_STYLES.get(map_style, 'CartoDB positron')
+    # Create map
+    m = folium.Map(
+        location=[center_lat, center_lon],
+        zoom_start=2,
+        tiles=tiles
+    )
+    # Get color scheme
+    fill_color = CHOROPLETH_COLORS.get(color_scale, 'YlOrRd')
+    # Get min/max for the indicator, excluding NaN values
+    valid_values = gdf[indicator].dropna()
+    if valid_values.empty:
+        vmin, vmax = 0, 1  # Default range if no valid data
+    else:
+        vmin = float(valid_values.min())
+        vmax = float(valid_values.max())
+    # Ensure vmin and vmax are valid numbers
+    if np.isnan(vmin) or np.isnan(vmax) or np.isinf(vmin) or np.isinf(vmax):
+        vmin, vmax = 0, 1
+    # Ensure vmax > vmin to avoid division issues
+    if vmax <= vmin:
+        vmax = vmin + 1
+    # Map color scheme names to branca colormap
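+    color_map_dict = {
+        'YlOrRd': cm.linear.YlOrRd_09,
+        'YlGnBu': cm.linear.YlGnBu_09,
+        'Blues': cm.linear.Blues_09,
+        'Greens': cm.linear.Greens_09,
+        'Reds': cm.linear.Reds_09,
+        'PuRd': cm.linear.PuRd_09,
+        'OrRd': cm.linear.OrRd_09,
+        # Also offered in CHOROPLETH_COLORS; without these the legend falls back to YlOrRd
+        'BuPu': cm.linear.BuPu_09,
+        'Spectral': cm.linear.Spectral_09
+    }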
+    # Create colormap with properly formatted tick labels
+    base = color_map_dict.get(fill_color, cm.linear.YlOrRd_09)
+    # Generate tick positions
+    n_ticks = 6
+    tick_values = list(np.linspace(vmin, vmax, n_ticks))
+    # Sample colors from base colormap at tick positions
+    if vmax > vmin:
+        hex_colors = [base.rgb_hex_str((v - vmin) / (vmax - vmin)) for v in tick_values]
+    else:
+        hex_colors = [base.rgb_hex_str(0.5)] * n_ticks
+    # Create colormap (tick_labels expects floats, so we use index values directly)
+    colormap = cm.LinearColormap(
+        colors=hex_colors,
+        index=tick_values,
+        vmin=vmin,
+        vmax=vmax,
+        caption=f"{indicator.replace('_', ' ').title()} ({format_number(vmin)} - {format_number(vmax)})"
+    )
+    # Add choropleth without its auto-legend
+    choropleth = folium.Choropleth(
+        geo_data=gdf,
+        data=gdf,
+        columns=['name', indicator],
+        key_on='feature.properties.name',
+        fill_color=fill_color,
+        fill_opacity=0.7,
+        line_opacity=0.2,
+        legend_name=None,  # Disable auto legend
+        nan_fill_color='lightgray'
+    )
+    choropleth.add_to(m)
+    # Remove any auto-generated colormap from choropleth
+    for key in list(choropleth._children.keys()):
+        if key.startswith('color_map'):
+            del choropleth._children[key]
+            break
+    # Add our custom colormap with formatted labels (default Folium style)
+    colormap.add_to(m)
+    # Add a marker with a popup at each country's centroid
+    for idx, row in gdf.iterrows():
+        folium.Marker(
+            location=[row.geometry.centroid.y, row.geometry.centroid.x],
+            popup=f"""
+            <b>{row['name']}</b><br>
+            Population: {row['pop_est']:,.0f}<br>
+            GDP: ${row['gdp_md_est']:,.0f}M<br>
+            Continent: {row['continent']}
+            """,
+            icon=folium.Icon(icon='info-sign', color='blue')
+        ).add_to(m)
+    # Add layer control
+    folium.LayerControl().add_to(m)
+    return m
+def create_chart(df, indicators, chart_type='bar', color_scheme='Default', top_n=20):
+    """
+    Create interactive Plotly charts with customizable options
+    """
+    # Get color sequence
+    colors = COLOR_SCHEMES.get(color_scheme, px.colors.qualitative.Plotly)
+    # Sort and limit data
+    sorted_df = df.sort_values(indicators[0], ascending=False).head(top_n)
+    if chart_type == 'bar':
+        fig = px.bar(
+            sorted_df,
+            x='name',
+            y=indicators[0],
+            color='continent',
+            title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+            labels={'name': 'Country', indicators[0]: indicators[0].replace('_', ' ').title()},
+            color_discrete_sequence=colors,
+            height=500
+        )
+    elif chart_type == 'horizontal_bar':
+        fig = px.bar(
+            sorted_df,
+            y='name',
+            x=indicators[0],
+            color='continent',
+            title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+            labels={'name': 'Country', indicators[0]: indicators[0].replace('_', ' ').title()},
+            color_discrete_sequence=colors,
+            orientation='h',
+            height=600
+        )
+        fig.update_layout(yaxis={'categoryorder': 'total ascending'})
+    elif chart_type == 'scatter':
+        # Resolve both axes first; indexing indicators[1] directly raises IndexError
+        # when only one indicator is supplied
+        x_col = indicators[0] if len(indicators) > 0 else 'gdp_md_est'
+        y_col = indicators[1] if len(indicators) > 1 else 'pop_est'
+        fig = px.scatter(
+            df,
+            x=x_col,
+            y=y_col,
+            size='pop_est',
+            color='continent',
+            hover_name='name',
+            title='Country Comparison',
+            labels={
+                x_col: x_col.replace('_', ' ').title(),
+                y_col: y_col.replace('_', ' ').title()
+            },
+            color_discrete_sequence=colors,
+            height=500
+        )
+    elif chart_type == 'pie':
+        fig = px.pie(
+            sorted_df,
+            values=indicators[0],
+            names='name',
+            title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+            color_discrete_sequence=colors,
+            height=500
+        )
+        fig.update_traces(textposition='inside', textinfo='percent+label')
+    elif chart_type == 'treemap':
+        fig = px.treemap(
+            sorted_df,
+            path=['continent', 'name'],
+            values=indicators[0],
+            title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+            color='continent',
+            color_discrete_sequence=colors,
+            height=600
+        )
+    elif chart_type == 'bubble':
+        fig = px.scatter(
+            df,
+            x='gdp_md_est',
+            y='pop_est',
+            size=indicators[0],
+            color='continent',
+            hover_name='name',
+            title=f'Bubble Chart: Size = {indicators[0].replace("_", " ").title()}',
+            labels={'gdp_md_est': 'GDP (Million $)', 'pop_est': 'Population'},
+            color_discrete_sequence=colors,
+            size_max=60,
+            height=500
+        )
+    else:  # default bar
+        fig = px.bar(
+            sorted_df,
+            x='name',
+            y=indicators[0],
+            color='continent',
+            title=f'Top {top_n} Countries by {indicators[0].replace("_", " ").title()}',
+            color_discrete_sequence=colors,
+            height=500
+        )
+    fig.update_layout(
+        xaxis_tickangle=-45,
+        template='plotly_white'
+    )
+    return fig
+def create_data_table(df):
+    """
+    Create formatted data table
+    """
+    # Select relevant columns
+    display_cols = ['name', 'continent', 'pop_est', 'gdp_md_est', 'pop_density', 'gdp_per_capita']
+    # Sort numerically and truncate before formatting; once the values are
+    # strings, sort_values would order them lexicographically
+    table_df = df[display_cols].sort_values('pop_est', ascending=False).head(50).copy()
+    # Rename columns
+    table_df.columns = ['Country', 'Continent', 'Population', 'GDP (Million $)',
+                        'Pop. Density (per km²)', 'GDP per Capita ($)']
+    # Format numbers
+    table_df['Population'] = table_df['Population'].apply(lambda x: f'{x:,.0f}')
+    table_df['GDP (Million $)'] = table_df['GDP (Million $)'].apply(lambda x: f'${x:,.0f}')
+    table_df['Pop. Density (per km²)'] = table_df['Pop. Density (per km²)'].apply(lambda x: f'{x:.2f}')
+    table_df['GDP per Capita ($)'] = table_df['GDP per Capita ($)'].apply(lambda x: f'${x:,.2f}')
+    return table_df
+def process_query(user_query, output_format, chart_type, map_style, color_scheme, choropleth_color, top_n, indicator):
+    """
+    Main processing function with advanced options
+    """
+    try:
+        # Parse query with LLM
+        parsed = parse_query_with_llm(user_query)
+        # Fetch data
+        gdf = fetch_geospatial_data(parsed)
+        if gdf.empty:
+            return None, None, None, "No data found for your query. Try different locations or indicators.", None, None
+        # Use selected indicator (override LLM parsing if user selected one)
+        selected_indicator = INDICATORS.get(indicator, 'pop_est')
+        mapped_indicators = [selected_indicator]
+        # Generate outputs based on format
+        map_html = None
+        chart_fig = None
+        table_df = None
+        map_file = None
+        csv_file = None
+        summary = f"🔍 **Query:** {user_query}\n\n"
+        summary += f"📍 **Locations:** {', '.join(parsed.get('locations', ['Global']))}\n"
+        summary += f"📊 **Indicator:** {indicator}\n"
+        summary += f"🌍 **Countries found:** {len(gdf)}\n\n"
+        summary += f"⚙️ **Options:** Chart: {chart_type} | Map: {map_style} | Top N: {top_n}"
+        if output_format in ['All', 'Map']:
+            m = create_interactive_map(gdf, mapped_indicators[0], map_style, choropleth_color)
+            map_html = m._repr_html_()
+            # Reserve a temp file path for download; close the handle before
+            # m.save() reopens the path (an open handle blocks this on Windows)
+            with tempfile.NamedTemporaryFile(delete=False, suffix='.html') as tmp:
+                map_file = tmp.name
+            m.save(map_file)
+        if output_format in ['All', 'Chart']:
+            chart_fig = create_chart(gdf, mapped_indicators, chart_type, color_scheme, int(top_n))
+        if output_format in ['All', 'Table']:
+            table_df = create_data_table(gdf)
+            # Reserve a temp CSV path for download and close the handle before writing
+            with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp:
+                csv_file = tmp.name
+            table_df.to_csv(csv_file, index=False, encoding='utf-8')
+        return map_html, chart_fig, table_df, summary, map_file, csv_file
+    except Exception as e:
+        error_msg = f"Error processing query: {str(e)}\n\nPlease try rephrasing your query."
+        return None, None, None, error_msg, None, None
+# Gradio Interface
+def create_interface():
+    with gr.Blocks(title="Geospatial AI Query System") as demo:
+        gr.Markdown("""
+        # 🌍 Geospatial AI Query System
+        ### Natural Language Interface for Geographic Data
+        Ask questions about countries, regions, and global indicators using natural language!
+        **Example Queries:**
+        - "Show me population of Asian countries"
+        - "Compare GDP of European nations"
+        - "What's the population density in Africa?"
+        - "Display economic indicators for South American countries"
+        """)
+        with gr.Row():
+            with gr.Column(scale=3):
+                query_input = gr.Textbox(
+                    label="Your Query",
+                    placeholder="E.g., Show me GDP and population of BRICS countries",
+                    lines=2
+                )
+            with gr.Column(scale=1):
+                output_format = gr.Radio(
+                    choices=['All', 'Map', 'Chart', 'Table'],
+                    value='All',
+                    label="Output Format"
+                )
+        # Advanced Options in Accordion
+        with gr.Accordion("⚙️ Advanced Options", open=False):
+            with gr.Row():
+                chart_type = gr.Dropdown(
+                    choices=['bar', 'horizontal_bar', 'scatter', 'pie', 'treemap', 'bubble'],
+                    value='bar',
+                    label="📊 Chart Type"
+                )
+                map_style = gr.Dropdown(
+                    choices=list(MAP_STYLES.keys()),
+                    value='Light',
+                    label="🗺️ Map Style"
+                )
+            with gr.Row():
+                color_scheme = gr.Dropdown(
+                    choices=list(COLOR_SCHEMES.keys()),
+                    value='Default',
+                    label="🎨 Chart Colors"
+                )
+                choropleth_color = gr.Dropdown(
+                    choices=list(CHOROPLETH_COLORS.keys()),
+                    value='Yellow-Orange-Red',
+                    label="🌈 Map Colors"
+                )
+            with gr.Row():
+                top_n = gr.Slider(
+                    minimum=5,
+                    maximum=50,
+                    value=20,
+                    step=5,
+                    label="🔢 Top N Countries"
+                )
+                indicator = gr.Dropdown(
+                    choices=list(INDICATORS.keys()),
+                    value="Population",
+                    label="📈 Indicator"
+                )
+        submit_btn = gr.Button("🔍 Analyze", variant="primary", size="lg")
+        gr.Markdown("### Results")
+        summary_output = gr.Textbox(label="Query Summary", lines=4)
+        with gr.Tabs():
+            with gr.Tab("📊 Chart"):
+                chart_output = gr.Plot(label="Interactive Chart")
+            with gr.Tab("🗺️ Map"):
+                map_output = gr.HTML(label="Interactive Map")
+                map_download = gr.File(label="📥 Download Map (HTML)", visible=True)
+            with gr.Tab("📋 Table"):
+                table_output = gr.Dataframe(label="Data Table")
+                csv_download = gr.File(label="📥 Download Table (CSV)", visible=True)
+        # Examples
+        gr.Examples(
+            examples=[
+                ["Show me population of Asian countries", "All"],
+                ["Compare GDP of top 10 economies", "Chart"],
+                ["What's the population density in European countries?", "Map"],
+                ["Display data for African nations", "Table"],
+                ["Show me South American countries' economic indicators", "All"]
+            ],
+            inputs=[query_input, output_format]
+        )
+        # Event handler
+        submit_btn.click(
+            fn=process_query,
+            inputs=[query_input, output_format, chart_type, map_style, color_scheme, choropleth_color, top_n, indicator],
+            outputs=[map_output, chart_output, table_output, summary_output, map_download, csv_download]
+        )
+        gr.Markdown("""
+        ---
+        **About:** This app uses LLMs to parse natural language queries and visualize global geospatial data.
+        **Data Sources:** Natural Earth, World Bank Open Data
+        **Built by:** [rifatSDAS](https://github.com/rifatSDAS)
+        """)
+    return demo
+if __name__ == "__main__":
+    demo = create_interface()
+    # Enable queue for better concurrency handling on HF Spaces
+    demo.queue(default_concurrency_limit=10)
+    demo.launch(theme=gr.themes.Soft())
+    # To enable Progressive Web App (PWA) features, uncomment the line below
+    # demo.launch(theme=gr.themes.Soft(), pwa=True)
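
A note on ordering in `create_data_table`: once numbers are rendered as strings such as `'83,000,000'`, a sort on that column compares characters rather than magnitudes. The sketch below uses plain Python and made-up figures to show the difference between sorting after and before formatting. The names `populations` and `fmt` are illustrative assumptions, not part of app.py.

```python
# Hypothetical population figures, chosen only to expose the ordering difference
populations = {"A": 9_000_000, "B": 1_400_000_000, "C": 83_000_000}


def fmt(value: float) -> str:
    """Format with thousands separators, in the same style as the table code."""
    return f"{value:,.0f}"


# Format first, then sort: the strings are compared character by character
by_string = sorted(populations, key=lambda name: fmt(populations[name]), reverse=True)

# Sort on the numeric value, then format only for display
by_number = sorted(populations, key=populations.get, reverse=True)
display_rows = [(name, fmt(populations[name])) for name in by_number]

print(by_string)     # ['A', 'C', 'B']
print(display_rows)  # [('B', '1,400,000,000'), ('C', '83,000,000'), ('A', '9,000,000')]
```

The same reasoning applies to a DataFrame column of formatted strings, which is why a numeric sort should happen before the display formatting.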

config.py ADDED Viewed

	@@ -0,0 +1,179 @@

+# Configuration file for Geospatial AI Query System
+# LLM Configuration
+LLM_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"
+LLM_MAX_TOKENS = 500
+LLM_TEMPERATURE = 0.1
+# Gradio Configuration
+APP_TITLE = "🌍 Geospatial AI Query System"
+APP_DESCRIPTION = """
+### Natural Language Interface for Global Socioeconomic Data
+Ask questions about countries, regions, and global indicators using natural language!
+"""
+# Server Configuration
+SERVER_NAME = "0.0.0.0"  # Listen on all interfaces
+SERVER_PORT = 7860
+ENABLE_QUEUE = True
+QUEUE_CONCURRENCY = 5
+# Visualization Configuration
+MAP_DEFAULT_ZOOM = 2
+MAP_TILE_STYLE = "CartoDB dark_matter"  # Options: OpenStreetMap, CartoDB positron, CartoDB dark_matter
+CHART_HEIGHT = 500
+CHART_THEME = "plotly_dark"  # Options: plotly, plotly_white, plotly_dark, ggplot2, seaborn
+MAX_COUNTRIES_IN_TABLE = 50
+MAX_COUNTRIES_IN_CHART = 20
+# Data Configuration
+USE_CACHE = True
+CACHE_SIZE = 128  # Number of queries to cache
+DEFAULT_INDICATOR = "pop_est"
+# Feature Flags
+ENABLE_ADVANCED_STATS = True
+ENABLE_DATA_EXPORT = False  # Future feature
+ENABLE_TIME_SERIES = False  # Future feature
+# Example Queries (shown in UI)
+EXAMPLE_QUERIES = [
+    ("Show me population of Asian countries", "All"),
+    ("Compare GDP of top 10 economies", "Chart"),
+    ("What's the population density in European countries?", "Map"),
+    ("Display data for African nations", "Table"),
+    ("Show me South American countries' economic indicators", "All")
+]
+# Country Group Definitions
+COUNTRY_GROUPS = {
+    'brics': ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
+    'g7': ['United States of America', 'Japan', 'Germany', 'United Kingdom',
+           'France', 'Italy', 'Canada'],
+    'g20': ['Argentina', 'Australia', 'Brazil', 'Canada', 'China', 'France',
+            'Germany', 'India', 'Indonesia', 'Italy', 'Japan', 'South Korea',
+            'Mexico', 'Russia', 'Saudi Arabia', 'South Africa', 'Turkey',
+            'United Kingdom', 'United States of America'],
+    'asean': ['Indonesia', 'Thailand', 'Philippines', 'Vietnam', 'Myanmar',
+              'Malaysia', 'Singapore', 'Cambodia', 'Laos', 'Brunei'],
+    'gcc': ['Saudi Arabia', 'United Arab Emirates', 'Kuwait', 'Qatar',
+            'Bahrain', 'Oman'],
+    'eu': ['Germany', 'France', 'Italy', 'Spain', 'Poland', 'Romania',
+           'Netherlands', 'Belgium', 'Greece', 'Portugal', 'Czech Republic',
+           'Hungary', 'Sweden', 'Austria', 'Bulgaria', 'Denmark', 'Finland',
+           'Slovakia', 'Ireland', 'Croatia', 'Lithuania', 'Slovenia', 'Latvia',
+           'Estonia', 'Cyprus', 'Luxembourg', 'Malta']
+}
+# Indicator Mappings
+INDICATOR_ALIASES = {
+    'population': 'pop_est',
+    'people': 'pop_est',
+    'inhabitants': 'pop_est',
+    'gdp': 'gdp_md_est',
+    'economy': 'gdp_md_est',
+    'economic output': 'gdp_md_est',
+    'density': 'pop_density',
+    'population density': 'pop_density',
+    'per capita': 'gdp_per_capita',
+    'gdp per capita': 'gdp_per_capita',
+    'wealth per person': 'gdp_per_capita'
+}
+# Display Names for Indicators
+INDICATOR_DISPLAY_NAMES = {
+    'pop_est': 'Population',
+    'gdp_md_est': 'GDP (Million USD)',
+    'pop_density': 'Population Density (per km²)',
+    'gdp_per_capita': 'GDP per Capita (USD)',
+    'co2_per_capita': 'CO2 per Capita (tons)',
+    'renewable_energy': 'Renewable Energy (%)',
+    'forest_coverage': 'Forest Coverage (%)',
+    'gdp_growth': 'GDP Growth Rate (%)',
+    'unemployment': 'Unemployment Rate (%)',
+    'inflation': 'Inflation Rate (%)'
+}
+# Color Schemes for Visualizations
+COLOR_SCHEMES = {
+    'choropleth': 'YlOrRd',  # For maps
+    'bar_chart': 'continent',  # Color by continent
+    'scatter_plot': 'continent'
+}
+# Error Messages
+ERROR_MESSAGES = {
+    'no_data': "No data found for your query. Try different locations or indicators.",
+    'parsing_error': "Error parsing your query. Please try rephrasing.",
+    'processing_error': "Error processing query: {error}. Please try again.",
+    'llm_error': "Error connecting to LLM service. Using fallback query parsing."
+}
+# API Rate Limiting (requests per minute)
+RATE_LIMIT_RPM = 20
+RATE_LIMIT_ENABLED = True
+# Logging Configuration
+LOG_LEVEL = "INFO"  # Options: DEBUG, INFO, WARNING, ERROR
+LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+# Advanced Features (Future)
+ENABLE_SATELLITE_INTEGRATION = False
+ENABLE_REAL_TIME_DATA = False
+ENABLE_CUSTOM_DATASETS = False
+# Footer Information
+FOOTER_TEXT = """
+---
+**About:** This app uses LLMs to parse natural language queries and visualize global geospatial data.
+**Data Sources:** Natural Earth, World Bank Open Data (sample data)
+**Built by:** Full Stack Geospatial AI Engineer | Satellite Data Specialist
+**GitHub:** [Your GitHub URL]
+**LinkedIn:** [Your LinkedIn URL]
+**Website:** [Your Website URL]
+"""
+# Custom CSS (optional)
+CUSTOM_CSS = """
+.gradio-container {
+    font-family: 'Arial', sans-serif;
+}
+"""
+# Performance Tuning
+OPTIMIZE_MEMORY = True
+LAZY_LOADING = True
+BATCH_PROCESSING = False
+# Debug Mode
+DEBUG_MODE = False
+VERBOSE_LOGGING = False
+# Analytics (optional, for future implementation)
+ENABLE_ANALYTICS = False
+ANALYTICS_PROVIDER = None  # Options: google, mixpanel, custom
+# Internationalization (future)
+DEFAULT_LANGUAGE = "en"
+SUPPORTED_LANGUAGES = ["en"]  # English only for now
+# Data Sources (for future expansion)
+DATA_SOURCES = {
+    'naturalearth': {
+        'enabled': True,
+        'priority': 1
+    },
+    'worldbank': {
+        'enabled': False,  # Requires API key
+        'priority': 2,
+        'api_key': None
+    },
+    'un': {
+        'enabled': False,  # Requires API setup
+        'priority': 3
+    }
+}
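
The `INDICATOR_ALIASES` table contains overlapping phrases ('gdp' and 'gdp per capita', 'population' and 'population density'), so the order in which aliases are tried matters. The following is a minimal sketch of one way to resolve a free-text query against it, trying the longest alias first. `resolve_indicator` is a hypothetical helper introduced here for illustration; how app.py actually consumes these settings is not shown in this part of the diff.

```python
# Copied from config.py so the sketch is self-contained
INDICATOR_ALIASES = {
    'population': 'pop_est',
    'people': 'pop_est',
    'inhabitants': 'pop_est',
    'gdp': 'gdp_md_est',
    'economy': 'gdp_md_est',
    'economic output': 'gdp_md_est',
    'density': 'pop_density',
    'population density': 'pop_density',
    'per capita': 'gdp_per_capita',
    'gdp per capita': 'gdp_per_capita',
    'wealth per person': 'gdp_per_capita',
}
DEFAULT_INDICATOR = "pop_est"


def resolve_indicator(query: str) -> str:
    """Hypothetical helper: map a free-text query to an indicator column."""
    text = query.lower()
    # Longest alias first, so 'population density' wins over 'population'
    for alias in sorted(INDICATOR_ALIASES, key=len, reverse=True):
        if alias in text:
            return INDICATOR_ALIASES[alias]
    return DEFAULT_INDICATOR


print(resolve_indicator("What's the population density in Africa?"))  # pop_density
print(resolve_indicator("Compare GDP of European nations"))           # gdp_md_est
print(resolve_indicator("Show me rivers"))                            # pop_est
```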

data_utils.py ADDED Viewed

	@@ -0,0 +1,209 @@

+"""
+Enhanced data handlers for multiple geospatial data sources
+"""
+import pandas as pd
+import requests
+from typing import Dict, List, Optional
+import json
+class DataEnhancer:
+    """
+    Additional data sources and enrichment for geospatial queries
+    """
+    @staticmethod
+    def get_sample_economic_data():
+        """
+        Sample economic indicators (in production, connect to World Bank API)
+        """
+        return {
+            'United States': {'gdp_growth': 2.1, 'unemployment': 3.7, 'inflation': 3.2},
+            'China': {'gdp_growth': 5.2, 'unemployment': 5.0, 'inflation': 0.2},
+            'Germany': {'gdp_growth': 0.1, 'unemployment': 3.0, 'inflation': 6.1},
+            'India': {'gdp_growth': 7.2, 'unemployment': 8.0, 'inflation': 5.4},
+            'Brazil': {'gdp_growth': 2.9, 'unemployment': 8.5, 'inflation': 4.6},
+            'United Kingdom': {'gdp_growth': 0.5, 'unemployment': 3.9, 'inflation': 4.0},
+            'France': {'gdp_growth': 0.9, 'unemployment': 7.2, 'inflation': 5.2},
+            'Japan': {'gdp_growth': 1.9, 'unemployment': 2.6, 'inflation': 3.2},
+            'South Korea': {'gdp_growth': 1.4, 'unemployment': 2.7, 'inflation': 3.6},
+            'Canada': {'gdp_growth': 1.1, 'unemployment': 5.4, 'inflation': 3.9}
+        }
+    @staticmethod
+    def get_sample_environmental_data():
+        """
+        Sample environmental indicators
+        """
+        return {
+            'United States': {'co2_per_capita': 15.5, 'renewable_energy': 12.6, 'forest_coverage': 33.9},
+            'China': {'co2_per_capita': 7.4, 'renewable_energy': 12.4, 'forest_coverage': 23.0},
+            'Germany': {'co2_per_capita': 8.4, 'renewable_energy': 19.3, 'forest_coverage': 32.7},
+            'India': {'co2_per_capita': 1.9, 'renewable_energy': 17.5, 'forest_coverage': 24.4},
+            'Brazil': {'co2_per_capita': 2.2, 'renewable_energy': 46.1, 'forest_coverage': 59.4},
+            'Russia': {'co2_per_capita': 11.4, 'renewable_energy': 5.1, 'forest_coverage': 49.8},
+            'Japan': {'co2_per_capita': 8.7, 'renewable_energy': 10.2, 'forest_coverage': 68.5},
+            'Australia': {'co2_per_capita': 16.8, 'renewable_energy': 11.9, 'forest_coverage': 17.4}
+        }
+    @staticmethod
+    def enrich_dataframe(df: pd.DataFrame, data_type: str = 'economic') -> pd.DataFrame:
+        """
+        Enrich existing dataframe with additional indicators
+        """
+        enriched_df = df.copy()
+        if data_type == 'economic':
+            extra_data = DataEnhancer.get_sample_economic_data()
+        elif data_type == 'environmental':
+            extra_data = DataEnhancer.get_sample_environmental_data()
+        else:
+            return enriched_df
+        # Add only the indicators this data source provides, so enriching with a
+        # second data_type does not overwrite earlier columns with None
+        indicator_names = sorted({key for values in extra_data.values() for key in values})
+        for indicator in indicator_names:
+            enriched_df[indicator] = enriched_df['name'].map(
+                lambda x, ind=indicator: extra_data.get(x, {}).get(ind)
+            )
+        return enriched_df
+    @staticmethod
+    def get_regional_aggregates(df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Calculate regional aggregates
+        """
+        regional_stats = df.groupby('continent').agg({
+            'pop_est': 'sum',
+            'gdp_md_est': 'sum',
+            'name': 'count'
+        }).reset_index()
+        regional_stats.columns = ['continent', 'total_population', 'total_gdp', 'country_count']
+        regional_stats['avg_gdp_per_capita'] = (
+            regional_stats['total_gdp'] / regional_stats['total_population'] * 1000000
+        )
+        return regional_stats
+class QueryEnhancer:
+    """
+    Enhance and validate queries
+    """
+    CONTINENT_MAP = {
+        'asia': 'Asia',
+        'europe': 'Europe',
+        'africa': 'Africa',
+        'north america': 'North America',
+        'south america': 'South America',
+        'oceania': 'Oceania',
+        'antarctica': 'Antarctica'
+    }
+    COUNTRY_GROUPS = {
+        'brics': ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
+        'g7': ['United States of America', 'Japan', 'Germany', 'United Kingdom',
+                'France', 'Italy', 'Canada'],
+        'asean': ['Indonesia', 'Thailand', 'Philippines', 'Vietnam', 'Myanmar',
+                    'Malaysia', 'Singapore', 'Cambodia', 'Laos', 'Brunei'],
+        'gcc': ['Saudi Arabia', 'United Arab Emirates', 'Kuwait', 'Qatar', 'Bahrain', 'Oman'],
+        'eu': ['Germany', 'France', 'Italy', 'Spain', 'Poland', 'Romania', 'Netherlands',
+                'Belgium', 'Greece', 'Portugal', 'Czech Republic', 'Hungary', 'Sweden',
+                'Austria', 'Bulgaria', 'Denmark', 'Finland', 'Slovakia', 'Ireland',
+                'Croatia', 'Lithuania', 'Slovenia', 'Latvia', 'Estonia', 'Cyprus',
+                'Luxembourg', 'Malta']
+    }
+    @classmethod
+    def expand_location(cls, location: str) -> List[str]:
+        """
+        Expand location strings to actual country/region names
+        """
+        location_lower = location.lower()
+        # Check if it's a continent
+        if location_lower in cls.CONTINENT_MAP:
+            return [cls.CONTINENT_MAP[location_lower]]
+        # Check if it's a country group
+        if location_lower in cls.COUNTRY_GROUPS:
+            return cls.COUNTRY_GROUPS[location_lower]
+        # Return as-is
+        return [location]
+    @classmethod
+    def validate_indicators(cls, indicators: List[str]) -> List[str]:
+        """
+        Validate and normalize indicator names
+        """
+        valid_indicators = []
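+        # Specific keys come first: matching is by substring, so 'gdp per capita'
+        # must hit 'per capita' before 'gdp', and 'population density' must hit
+        # 'density' before 'population'
+        indicator_mapping = {
+            'co2': 'co2_per_capita',
+            'renewable': 'renewable_energy',
+            'forest': 'forest_coverage',
+            'unemployment': 'unemployment',
+            'inflation': 'inflation',
+            'growth': 'gdp_growth',
+            'density': 'pop_density',
+            'per capita': 'gdp_per_capita',
+            'population': 'pop_est',
+            'gdp': 'gdp_md_est'
+        }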
+        for indicator in indicators:
+            indicator_lower = indicator.lower()
+            for key, value in indicator_mapping.items():
+                if key in indicator_lower:
+                    valid_indicators.append(value)
+                    break
+            else:
+                valid_indicators.append('pop_est')  # default
+        return list(dict.fromkeys(valid_indicators))  # Remove duplicates, keep first-seen order
+# Statistical analysis utilities
+class GeoStats:
+    """
+    Statistical analysis for geospatial data
+    """
+    @staticmethod
+    def calculate_correlation(df: pd.DataFrame, col1: str, col2: str) -> float:
+        """
+        Calculate correlation between two indicators
+        """
+        try:
+            return df[[col1, col2]].corr().iloc[0, 1]
+        except (KeyError, ValueError, TypeError):  # missing or non-numeric columns
+            return 0.0
+    @staticmethod
+    def get_outliers(df: pd.DataFrame, column: str) -> pd.DataFrame:
+        """
+        Identify outliers using IQR method
+        """
+        Q1 = df[column].quantile(0.25)
+        Q3 = df[column].quantile(0.75)
+        IQR = Q3 - Q1
+        lower_bound = Q1 - 1.5 * IQR
+        upper_bound = Q3 + 1.5 * IQR
+        outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]
+        return outliers
+    @staticmethod
+    def generate_summary_stats(df: pd.DataFrame, column: str) -> Dict:
+        """
+        Generate summary statistics for a column
+        """
+        return {
+            'mean': df[column].mean(),
+            'median': df[column].median(),
+            'std': df[column].std(),
+            'min': df[column].min(),
+            'max': df[column].max(),
+            'count': df[column].count()
+        }
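
As a worked example of the IQR rule used in `GeoStats.get_outliers`, here is a minimal standard-library sketch on made-up numbers. `iqr_outliers` and `sample` are illustrative names, and the choice of `method="inclusive"` is an assumption made so that the quartiles are linearly interpolated between data points, which should correspond to the default behaviour of pandas `quantile`.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return the values lying outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # n=4 yields the three quartile cut points; the median is not needed here
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical data: Q1 = 12.5, Q3 = 17.0, IQR = 4.5, bounds = [5.75, 23.75]
sample = [10, 12, 13, 15, 16, 18, 100]
print(iqr_outliers(sample))  # [100]
```

Skewed indicators such as population or GDP tend to produce many high-side outliers under this rule, so the result is best read as a list of candidates rather than errors.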

requirements.txt ADDED Viewed

	@@ -0,0 +1,11 @@

+gradio
+pandas
+geopandas
+folium
+plotly
+huggingface-hub
+shapely
+pyproj
+numpy
+requests
+pytest

setup.bat ADDED Viewed

	@@ -0,0 +1,89 @@

+@echo off
+REM Geospatial AI Query System - Quick Start Script (Windows)
+REM This script helps you set up and run the application locally
+echo ================================
+echo Geospatial AI Query System - Setup
+echo ================================
+echo.
+REM Check Python version
+echo Checking Python version...
+python --version >nul 2>&1
+if errorlevel 1 (
+    echo [ERROR] Python 3 is not installed. Please install Python 3.8 or higher.
+    pause
+    exit /b 1
+)
+python --version
+echo [OK] Python found
+echo.
+REM Create virtual environment
+echo Creating virtual environment...
+if not exist "venv" (
+    python -m venv venv
+    echo [OK] Virtual environment created
+) else (
+    echo [OK] Virtual environment already exists
+)
+echo.
+REM Activate virtual environment
+echo Activating virtual environment...
+call venv\Scripts\activate.bat
+echo [OK] Virtual environment activated
+echo.
+REM Upgrade pip
+echo Upgrading pip...
+python -m pip install --upgrade pip >nul 2>&1
+echo [OK] Pip upgraded
+echo.
+REM Install requirements
+echo Installing dependencies...
+echo This may take a few minutes...
+pip install -r requirements.txt
+if errorlevel 1 (
+    echo [ERROR] Failed to install dependencies
+    pause
+    exit /b 1
+)
+echo [OK] All dependencies installed successfully
+echo.
+REM Check for HF_TOKEN
+echo Checking for Hugging Face token...
+if "%HF_TOKEN%"=="" (
+    echo [WARNING] HF_TOKEN not set ^(optional for testing^)
+    echo    To enable LLM features, get a token from:
+    echo    https://huggingface.co/settings/tokens
+    echo    Then run: set HF_TOKEN=your_token_here
+) else (
+    echo [OK] HF_TOKEN found
+)
+echo.
+REM Run tests
+echo Running tests...
+pytest test_app.py -v
+if errorlevel 1 (
+    echo [WARNING] Some tests failed ^(app may still work^)
+) else (
+    echo [OK] All tests passed
+)
+echo.
+REM Start application
+echo ================================
+echo Starting application...
+echo ================================
+echo.
+echo The app will be available at:
+echo http://localhost:7860
+echo.
+echo Press Ctrl+C to stop the application
+echo.
+python app.py

setup.sh ADDED Viewed

	@@ -0,0 +1,87 @@

+#!/bin/bash
+# Geospatial AI Query System - Quick Start Script
+# This script helps you set up and run the application locally
+echo "🌍 Geospatial AI Query System - Setup"
+echo "======================================"
+echo ""
+# Check Python version
+echo "Checking Python version..."
+python_version=$(python3 --version 2>&1)
+if [[ $? -ne 0 ]]; then
+    echo "❌ Python 3 is not installed. Please install Python 3.8 or higher."
+    exit 1
+fi
+echo "✅ Found: $python_version"
+echo ""
+# Create virtual environment
+echo "Creating virtual environment..."
+if [ ! -d "venv" ]; then
+    python3 -m venv venv
+    echo "✅ Virtual environment created"
+else
+    echo "✅ Virtual environment already exists"
+fi
+echo ""
+# Activate virtual environment
+echo "Activating virtual environment..."
+source venv/bin/activate
+echo "✅ Virtual environment activated"
+echo ""
+# Upgrade pip
+echo "Upgrading pip..."
+pip install --upgrade pip > /dev/null 2>&1
+echo "✅ Pip upgraded"
+echo ""
+# Install requirements
+echo "Installing dependencies..."
+echo "This may take a few minutes..."
+pip install -r requirements.txt
+if [[ $? -eq 0 ]]; then
+    echo "✅ All dependencies installed successfully"
+else
+    echo "❌ Failed to install dependencies"
+    exit 1
+fi
+echo ""
+# Check for HF_TOKEN
+echo "Checking for Hugging Face token..."
+if [ -z "$HF_TOKEN" ]; then
+    echo "⚠️  HF_TOKEN not set (optional for testing)"
+    echo "   To enable LLM features, get a token from:"
+    echo "   https://huggingface.co/settings/tokens"
+    echo "   Then run: export HF_TOKEN=your_token_here"
+else
+    echo "✅ HF_TOKEN found"
+fi
+echo ""
+# Run tests
+echo "Running tests..."
+pytest test_app.py -v
+if [[ $? -eq 0 ]]; then
+    echo "✅ All tests passed"
+else
+    echo "⚠️  Some tests failed (app may still work)"
+fi
+echo ""
+# Start application
+echo "======================================"
+echo "🚀 Starting application..."
+echo "======================================"
+echo ""
+echo "The app will be available at:"
+echo "http://localhost:7860"
+echo ""
+echo "Press Ctrl+C to stop the application"
+echo ""
+python app.py

test_app.py ADDED Viewed

	@@ -0,0 +1,173 @@

+"""
+Test suite for Geospatial AI Query System
+Run: pytest test_app.py
+"""
+import pytest
+import pandas as pd
+import geopandas as gpd
+from pathlib import Path
+from data_utils import DataEnhancer, QueryEnhancer, GeoStats
+# Path to local Natural Earth data (geopandas.datasets was deprecated in GeoPandas 0.14 and removed in 1.0)
+DATA_DIR = Path(__file__).parent / "data" / "ne_110m_admin_0_countries"
+NATURAL_EARTH_SHP = DATA_DIR / "ne_110m_admin_0_countries.shp"
+class TestDataEnhancer:
+    """Test data enhancement utilities"""
+    def test_economic_data_structure(self):
+        """Test economic data has correct structure"""
+        data = DataEnhancer.get_sample_economic_data()
+        assert isinstance(data, dict)
+        assert 'United States' in data
+        assert 'gdp_growth' in data['United States']
+        assert 'unemployment' in data['United States']
+        assert 'inflation' in data['United States']
+    def test_environmental_data_structure(self):
+        """Test environmental data has correct structure"""
+        data = DataEnhancer.get_sample_environmental_data()
+        assert isinstance(data, dict)
+        assert 'China' in data
+        assert 'co2_per_capita' in data['China']
+        assert 'renewable_energy' in data['China']
+    def test_enrich_dataframe(self):
+        """Test dataframe enrichment"""
+        # Create sample dataframe
+        df = pd.DataFrame({
+            'name': ['United States', 'China', 'Germany'],
+            'pop_est': [331000000, 1440000000, 83000000],
+            'gdp_md_est': [21000000, 14000000, 3800000]
+        })
+        enriched = DataEnhancer.enrich_dataframe(df, 'economic')
+        assert 'gdp_growth' in enriched.columns
+        assert enriched.loc[enriched['name'] == 'United States', 'gdp_growth'].iloc[0] == 2.1
+class TestQueryEnhancer:
+    """Test query enhancement utilities"""
+    def test_expand_continent(self):
+        """Test continent expansion"""
+        result = QueryEnhancer.expand_location('asia')
+        assert result == ['Asia']
+    def test_expand_country_group_brics(self):
+        """Test BRICS expansion"""
+        result = QueryEnhancer.expand_location('brics')
+        assert 'Brazil' in result
+        assert 'India' in result
+        assert 'China' in result
+        assert len(result) == 5
+    def test_expand_country_group_g7(self):
+        """Test G7 expansion"""
+        result = QueryEnhancer.expand_location('g7')
+        assert 'United States of America' in result
+        assert 'Japan' in result
+        assert len(result) == 7
+    def test_validate_indicators(self):
+        """Test indicator validation"""
+        indicators = ['GDP', 'Population', 'CO2 emissions']
+        result = QueryEnhancer.validate_indicators(indicators)
+        assert 'gdp_md_est' in result
+        assert 'pop_est' in result
+class TestGeoStats:
+    """Test statistical utilities"""
+    def test_calculate_correlation(self):
+        """Test correlation calculation"""
+        df = pd.DataFrame({
+            'col1': [1, 2, 3, 4, 5],
+            'col2': [2, 4, 6, 8, 10]
+        })
+        corr = GeoStats.calculate_correlation(df, 'col1', 'col2')
+        assert corr == pytest.approx(1.0)  # Perfect positive correlation, compared with a float tolerance
+    def test_summary_stats(self):
+        """Test summary statistics"""
+        df = pd.DataFrame({
+            'values': [10, 20, 30, 40, 50]
+        })
+        stats = GeoStats.generate_summary_stats(df, 'values')
+        assert stats['mean'] == 30.0
+        assert stats['median'] == 30.0
+        assert stats['min'] == 10
+        assert stats['max'] == 50
+class TestIntegration:
+    """Integration tests"""
+    def test_world_data_loading(self):
+        """Test loading world data"""
+        world = gpd.read_file(NATURAL_EARTH_SHP)
+        assert not world.empty
+        assert 'NAME' in world.columns or 'name' in world.columns
+        assert 'CONTINENT' in world.columns or 'continent' in world.columns
+        assert 'POP_EST' in world.columns or 'pop_est' in world.columns
+    def test_query_to_data_pipeline(self):
+        """Test complete query to data pipeline"""
+        # Load data
+        world = gpd.read_file(NATURAL_EARTH_SHP)
+        # Normalize column names to lowercase
+        world.columns = world.columns.str.lower()
+        # Expand query
+        locations = QueryEnhancer.expand_location('brics')
+        # Filter data (try both 'name' and 'admin' columns)
+        name_col = 'name' if 'name' in world.columns else 'admin'
+        filtered = world[world[name_col].isin(locations)]
+        assert not filtered.empty
+        # Every matched row must belong to the requested group
+        assert set(filtered[name_col]) <= set(locations)
+    def test_data_enrichment_pipeline(self):
+        """Test data enrichment pipeline"""
+        # Load data
+        world = gpd.read_file(NATURAL_EARTH_SHP)
+        # Normalize column names to lowercase
+        world.columns = world.columns.str.lower()
+        # Take sample
+        sample = world.head(10)
+        # Enrich
+        enriched = DataEnhancer.enrich_dataframe(sample, 'economic')
+        assert 'gdp_growth' in enriched.columns
+        assert 'unemployment' in enriched.columns
+# Sample query test cases
+SAMPLE_QUERIES = [
+    "Show me population of Asian countries",
+    "Compare GDP of European nations",
+    "What's the population density in Africa?",
+    "Display economic indicators for South American countries",
+    "Show me top 10 countries by GDP",
+    "Compare BRICS nations",
+    "Environmental data for G7 countries"
+]
+class TestQueryParsing:
+    """Test query parsing (mock LLM responses)"""
+    def test_query_keywords(self):
+        """Test that queries contain expected keywords"""
+        for query in SAMPLE_QUERIES:
+            assert len(query) > 0
+            assert any(continent in query.lower() for continent in
+                        ['asian', 'european', 'africa', 'south american', 'brics', 'g7']) or \
+                   any(indicator in query.lower() for indicator in
+                        ['population', 'gdp', 'economic', 'environmental'])
+if __name__ == "__main__":
+    # Run tests
+    pytest.main([__file__, '-v'])
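
Correlation values are floating-point results, so assertions on them are more robust with a tolerance than with `==`. The sketch below is a plain-Python illustration of that idea; `pearson` is a hypothetical stand-in written for this example and is not the pandas-based `GeoStats.calculate_correlation`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Perfectly linear data; rounding in the arithmetic may leave r a hair off 1.0
r = pearson([0.1, 0.2, 0.3], [0.3, 0.6, 0.9])

# Compare with a tolerance instead of asserting r == 1.0
print(math.isclose(r, 1.0, rel_tol=1e-9))  # True
```

In a pytest suite the same check is usually written as `assert r == pytest.approx(1.0)`.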