title: Telugu Dialect Map
emoji: 🗺️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit

Telugu Dialect Map

Interactive web-based visualization of Telugu dialect words across Telangana and Andhra Pradesh districts.

🎯 Features

  • 48 Districts: 33 Telangana + 15 Andhra Pradesh districts
  • Dynamic Data Loading: Automatically loads data from JSON sources
  • Interactive Map: Click districts to explore local vocabulary, meanings, and sources
  • Rich Content: 3000+ verified dialect terms from crowdsourced and JSONL data
  • Zero Build Required: Pure static site with automatic data loading
  • Google Sheets Integration: Automated synchronization with Google Sheets

🚀 Deployment Options

Option 1: Hugging Face Spaces (Recommended for Public Access)

Deploy to Hugging Face Spaces with continuous automation:

  1. Create a Space on Hugging Face
  2. Configure secrets for config.json and credentials.json
  3. Push your code to the Space repository
  4. Access your app at https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE

📖 Complete HF Spaces Setup Guide →

Option 2: Local Development

🚀 How to Run

Complete Setup (First Time)

# 1. Navigate to the project directory (adjust the path to your checkout)
cd /path/to/indian-dialects-maps

# 2. Create the virtual environment if it does not exist yet, then activate it
python3 -m venv venv
source venv/bin/activate

# 3. Install Python dependencies
pip install -r requirements.txt

# 4. Start automation (syncs Google Sheets every 5 minutes)
python scripts/automation_runner.py &

# 5. Start web server (in a new terminal or in the background)
python -m http.server 8080 &

# 6. Open in browser
# Navigate to: http://localhost:8080/

Quick Run (After First Time)

# Activate virtual environment
source venv/bin/activate

# Start both services
python scripts/automation_runner.py &
python -m http.server 8080 &

# Open: http://localhost:8080/

Stop Services

# Press Ctrl+C in the terminals, or:
pkill -f automation_runner
pkill -f "http.server 8080"

Important: Open the site at http://localhost:8080/, not through a file:// URL. Browser security blocks the page from fetching the JSON data files when it is loaded from file://.


🎯 What You'll See

  1. Automation Console: Shows sync status every 5 minutes
  2. Web Interface: Interactive map with 48 districts (33 Telangana + 15 Andhra Pradesh)
  3. Auto-Updates: Edit Google Sheets → Changes appear within 5 minutes!

🤖 Automated Data Updates

The project includes automated synchronization from Google Sheets:

  • File Watcher: Automatically converts CSV → JSON when files change
  • Google Sheets Sync: Downloads sheet data every 5 minutes
  • Zero Manual Work: Update your Google Sheet and changes appear automatically!
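
The File Watcher idea can be approximated with simple polling of file modification times. The sketch below is only an illustration using the standard library; the function names `snapshot` and `changed_files` are hypothetical, and the project's own `scripts/file_watcher.py` may work differently (for example, with an event-based library).

```python
from pathlib import Path


def snapshot(directory: str) -> dict[str, float]:
    """Map each CSV file name in `directory` to its last-modified time."""
    return {p.name: p.stat().st_mtime for p in Path(directory).glob("*.csv")}


def changed_files(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Return the names of CSV files that are new or whose mtime differs."""
    return sorted(
        name for name, mtime in after.items() if before.get(name) != mtime
    )
```

A watcher loop would take a snapshot of `sheets_output/`, sleep briefly, take another, and run the CSV-to-JSON conversion for every name returned by `changed_files`.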

Configuration

Your automation is already configured for:

  • Processed Dialects Sheet: 901 rows
  • Digiwords Sheet: 178 rows
  • Sync Interval: Every 5 minutes

To modify settings, edit config.json:

{
  "google_sheets": {
    "enabled": true,
    "sync_interval_minutes": 5,
    "spreadsheets": [...]
  }
}
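
As a rough illustration of how a script might consume these settings, the sketch below reads the two keys shown above and converts the interval to seconds. The function names and the fallback default of 5 minutes are assumptions for this example, not necessarily what `automation_runner.py` does.

```python
import json
from typing import Optional

DEFAULT_INTERVAL_MINUTES = 5  # assumed fallback, matching the interval documented above


def load_config(path: str = "config.json") -> dict:
    """Read the configuration file as a dictionary."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def sync_interval_seconds(config: dict) -> Optional[int]:
    """Return the sync interval in seconds, or None if syncing is disabled."""
    sheets = config.get("google_sheets", {})
    if not sheets.get("enabled", False):
        return None
    minutes = sheets.get("sync_interval_minutes", DEFAULT_INTERVAL_MINUTES)
    if not isinstance(minutes, (int, float)) or minutes <= 0:
        raise ValueError(f"sync_interval_minutes must be positive, got {minutes!r}")
    return int(minutes * 60)
```

With the configuration shown above, `sync_interval_seconds(load_config())` would yield 300.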

📖 Full Automation Setup Guide →

📂 Project Structure

indian-dialects-maps/
├── index.html                           # Main visualization (open this via http server)
├── data/
│   └── processed/
│       ├── processed_dialects.json      # JSONL-processed dialect data
│       └── digiwords_grouped.json       # Crowdsourced dialect data
├── sheets_output/                       # CSV files (auto-converted to JSON)
│   ├── processed_dialects.csv
│   └── digiwords_grouped.csv
├── scripts/                             # Automation scripts (NEW!)
│   ├── csv_to_json.py                   # CSV → JSON converter
│   ├── sheets_sync.py                   # Google Sheets downloader
│   ├── file_watcher.py                  # Auto-conversion on file changes
│   └── automation_runner.py             # Main automation orchestrator
├── config.json                          # Automation configuration
├── requirements.txt                     # Python dependencies
├── AUTOMATION_SETUP.md                  # Detailed setup guide
└── README.md

🔄 How It Works

Manual Mode (Original)

  1. Load index.html: Contains hardcoded data for 33 Telangana districts
  2. Fetch processed_dialects.json: Enhances/adds districts from JSONL data
  3. Fetch digiwords_grouped.json:
    • Merges additional words into Telangana districts
    • Automatically adds 15 Andhra Pradesh districts with coordinates
  4. Render: All districts appear on the map with merged data

Automated Mode (NEW!)

  1. Google Sheets: Update your dialect data in Google Sheets
  2. Auto-Sync: Script downloads sheets as CSV (every 5 min)
  3. File Watcher: Detects CSV changes
  4. Auto-Convert: CSV files β†’ JSON format
  5. Browser: Refresh to see updates on the map!

Flow:

Google Sheet → CSV (sheets_output/) → JSON (data/processed/) → Browser (index.html)
     ↓              ↓                      ↓
  Manual Edit   File Watcher         Auto-Refresh
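
The CSV → JSON step of this flow could look roughly like the sketch below. It assumes, purely for illustration, that the CSV has the columns `state`, `district`, `t`, `m`, and `s`; the real sheet layout and the logic in `scripts/csv_to_json.py` may differ. The output follows the grouped layout documented for `digiwords_grouped.json` under Adding New Data.

```python
import csv
import io
from collections import defaultdict


def rows_to_grouped(csv_text: str) -> dict:
    """Group CSV rows into {state: {district: [{"t", "m", "s"}, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        state = (row.get("state") or "").strip()
        district = (row.get("district") or "").strip()
        term = (row.get("t") or "").strip()
        if not (state and district and term):
            continue  # skip incomplete rows rather than emit empty entries
        grouped[state][district].append({
            "t": term,
            "m": (row.get("m") or "").strip(),
            "s": (row.get("s") or "").strip(),
        })
    # Convert nested defaultdicts to plain dicts so the result serializes cleanly.
    return {state: dict(districts) for state, districts in grouped.items()}
```

When writing the result with `json.dump`, passing `ensure_ascii=False` keeps the Telugu text readable in the output file instead of escaping it.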

Smart Merging:

  • Existing districts → Appends new words
  • New AP districts → Creates markers automatically
  • Graceful fallback if JSON files are missing
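
The merging rules can be sketched as follows. The actual merge runs in the browser inside index.html; this Python version only illustrates the logic. The function name `merge_districts` is hypothetical, and skipping words whose term `t` is already present is an assumption of this example rather than documented behavior.

```python
import copy


def merge_districts(base: dict, extra) -> dict:
    """Merge {district: [word, ...]} maps without mutating the inputs.

    Existing districts get new words appended; unknown districts are added
    as new entries. Passing None for `extra` (for example, when a JSON file
    is missing) returns a copy of `base` unchanged.
    """
    merged = copy.deepcopy(base)
    for district, words in (extra or {}).items():
        existing = merged.setdefault(district, [])
        seen = {w["t"] for w in existing}
        for word in words:
            if word["t"] not in seen:  # assumption: de-duplicate by term
                existing.append(dict(word))
                seen.add(word["t"])
    return merged
```

Treating a missing source as an empty mapping is what keeps the fallback graceful: the map still renders with whatever data did load.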

📊 Current Data Coverage

| State | Districts | Words | Sources |
| --- | --- | --- | --- |
| Telangana | 33 | 2000+ | Hardcoded + JSONL + Digiwords |
| Andhra Pradesh | 15 | 1000+ | Digiwords (crowdsourced) |
| Total | 48 | 3000+ | Multiple sources |

Andhra Pradesh Districts:

Anantapur, Annamayya, Chittoor, East Godavari, Eluru, Kadapa, Kurnool, Nandyal, Ongole, Tirupati, Srikakulam, Visakhapatnam, Vizianagaram, West Godavari, Rayalaseema

🎨 Adding New Data

Update Existing JSON Files

Edit data/processed/digiwords_grouped.json:

{
  "Telangana": {
    "YourDistrict": [
      {"t": "తెలుగుపదం", "m": "meaning", "s": "Crowd"}
    ]
  },
  "Andhra Pradesh": {
    "YourDistrict": [
      {"t": "తెలుగుపదం", "m": "meaning", "s": "Crowd"}
    ]
  }
}

Edit data/processed/processed_dialects.json:

[
  {
    "name": "YourDistrict",
    "region": "Region Name",
    "words": [
      {"t": "తెలుగుపదం", "m": "meaning", "s": "Source"}
    ]
  }
]

Then refresh the browser!
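
Before refreshing, it can help to sanity-check a hand-edited file. The sketch below validates the grouped layout shown for digiwords_grouped.json; the function name `validate_grouped` is hypothetical, and treating all three keys (`t`, `m`, `s`) as required strings is an assumption based on the examples above.

```python
def validate_grouped(data) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(data, dict):
        return ["top level must be an object keyed by state name"]
    problems = []
    for state, districts in data.items():
        if not isinstance(districts, dict):
            problems.append(f"{state}: expected an object keyed by district")
            continue
        for district, words in districts.items():
            if not isinstance(words, list):
                problems.append(f"{state}/{district}: expected a list of words")
                continue
            for i, word in enumerate(words):
                if not isinstance(word, dict):
                    problems.append(f"{state}/{district}[{i}]: expected an object")
                    continue
                # Assumption: every entry needs string values for t, m, and s.
                missing = [k for k in ("t", "m", "s") if not isinstance(word.get(k), str)]
                if missing:
                    problems.append(
                        f"{state}/{district}[{i}]: missing or non-string keys {missing}"
                    )
    return problems
```

Load the file with `json.load` and pass the result in; printing each returned problem points at the exact state, district, and entry index to fix.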

Add New AP District Coordinates

If adding a new Andhra Pradesh district, update index.html:

const AP_COORDINATES = {
    "YourDistrict": { 
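        lat: 0.0000,  // placeholder: replace with the district's latitude
        lng: 0.0000,  // placeholder: replace with the district's longitude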
        region: "Region Name", 
        history: "Historical context..." 
    },
    // ...
};

🛠️ Technical Stack

  • Frontend: HTML5, Vanilla JavaScript, Leaflet.js
  • Data Format: JSON (pre-processed)
  • Map Library: Leaflet with CartoDB basemap
  • Server: Any HTTP server (Python, Node, etc.)
  • No Build Step: Pure static site

🐛 Troubleshooting

Data Not Loading?

✅ Check #1: Are you using http://localhost:8080/?

  • ❌ Don't use file:// URLs
  • ✅ Use an HTTP server

✅ Check #2: Is the server running?

python3 -m http.server 8080

✅ Check #3: Hard refresh the page

  • Press Ctrl + Shift + R (Windows/Linux)
  • Press Cmd + Shift + R (Mac)

✅ Check #4: Check browser console (F12)

  • Look for fetch errors
  • You should see loading messages

Server Won't Start?

# Kill the process on port 8080
lsof -ti:8080 | xargs kill -9

# Start fresh
python3 -m http.server 8080

Still Not Working?

  1. Open browser console (F12)
  2. Look for error messages
  3. Check that JSON files exist in data/processed/
  4. Verify the URL: use http://localhost:8080/ (prefer localhost over 127.0.0.1 if you see unexpected redirects)
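
Step 3 can be automated with a small check like the sketch below, which confirms that both data files exist and parse as JSON. The function name `check_data_files` is hypothetical; the file names come from the project structure described in this README.

```python
import json
from pathlib import Path

DATA_FILES = ("processed_dialects.json", "digiwords_grouped.json")


def check_data_files(base_dir: str = ".") -> list[str]:
    """Return a list of problems found under <base_dir>/data/processed/."""
    problems = []
    for name in DATA_FILES:
        path = Path(base_dir) / "data" / "processed" / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON in {name}: {exc.msg}")
    return problems
```

Run it from the project root; an empty list means both files are present and well-formed, so any remaining problem is more likely on the server or browser side.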

📝 Technical Notes

  • Telugu text uses web fonts (Poppins, Ramabhadra)
  • Console shows detailed merge logs for debugging
  • Map centered at (16.5°N, 79.8°E) to show both states
  • Zoom level: 6.5 (fits both Telangana and AP)
  • Data loads asynchronously with async/await

🤝 Contributing

To add more dialect data:

  1. Edit the JSON files in data/processed/
  2. Refresh http://localhost:8080/
  3. That's it!

Made with ❤️ for preserving Telugu linguistic heritage across Telangana and Andhra Pradesh