---
title: Telugu Dialect Map
emoji: 🗺️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---
# Telugu Dialect Map
**Interactive web-based visualization of Telugu dialect words across Telangana and Andhra Pradesh districts.**
## 🎯 Features
- **48 Districts**: 33 Telangana + 15 Andhra Pradesh districts
- **Dynamic Data Loading**: Automatically loads data from JSON sources
- **Interactive Map**: Click districts to explore local vocabulary, meanings, and sources
- **Rich Content**: 3000+ verified dialect terms from crowdsourced and JSONL data
- **Zero Build Required**: Pure static site with automatic data loading
- **Google Sheets Integration**: Automated synchronization with Google Sheets
## 🚀 Deployment Options
### Option 1: Hugging Face Spaces (Recommended for Public Access)
Deploy to Hugging Face Spaces with continuous automation:
1. **Create a Space** on [Hugging Face](https://huggingface.co/spaces)
2. **Configure secrets** for `config.json` and `credentials.json`
3. **Push your code** to the Space repository
4. **Access your app** at `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE`
📖 **[Complete HF Spaces Setup Guide →](SECRETS_SETUP.md)**
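### Option 2: Local Development
Run the automation script and a static web server on your own machine, following the steps in the next section.
## 🚀 How to Run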
### **Complete Setup (First Time)**
```bash
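# 1. Navigate to the project directory (adjust this path to your own checkout)
cd /home/kashikuldeep/Desktop/swechaworkspace/dilacet-map/indian-dialects-maps
# 2. Create the virtual environment (first time only), activate it, install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt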
# 3. Start automation (syncs Google Sheets every 5 minutes)
python scripts/automation_runner.py &
# 4. Start web server (in new terminal or background)
python -m http.server 8080 &
# 5. Open in browser
# Navigate to: http://localhost:8080/
```
### **Quick Run (After First Time)**
```bash
# Activate virtual environment
source venv/bin/activate
# Start both services
python scripts/automation_runner.py &
python -m http.server 8080 &
# Open: http://localhost:8080/
```
### **Stop Services**
```bash
# Press Ctrl+C in the terminals, or:
pkill -f automation_runner
pkill -f "http.server 8080"
```
**Important:** Open the site at `http://localhost:8080/`, not through a `file://` URL. Browsers block `fetch` requests for the JSON data files when the page is loaded from `file://`.
---
## 🎯 What You'll See
1. **Automation Console**: Shows sync status every 5 minutes
2. **Web Interface**: Interactive map with 48 districts (33 Telangana + 15 Andhra Pradesh)
3. **Auto-Updates**: Edit Google Sheets → Changes appear within 5 minutes!
## 🤖 Automated Data Updates
The project includes **automated synchronization** from Google Sheets:
- **File Watcher**: Automatically converts CSV → JSON when files change
- **Google Sheets Sync**: Downloads sheet data every 5 minutes
- **Zero Manual Work**: Update your Google Sheet and changes appear automatically!
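As a rough illustration of the file-watcher idea, the sketch below polls file modification times and reports which files changed. It is not the code in `scripts/file_watcher.py`: the function names (`snapshot`, `changed_files`, `watch`) and the polling approach are assumptions made for this example.

```python
import os
import time
from typing import Callable, Dict, Iterable, List, Optional


def snapshot(paths: Iterable[str]) -> Dict[str, float]:
    """Map each existing path to its last-modified timestamp."""
    return {p: os.path.getmtime(p) for p in paths if os.path.exists(p)}


def changed_files(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Return paths that are new or whose modification time differs.

    Deleted files are not reported in this sketch.
    """
    return sorted(p for p, mtime in after.items() if before.get(p) != mtime)


def watch(
    paths: Iterable[str],
    on_change: Callable[[str], None],
    poll_seconds: float = 1.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll `paths` and call `on_change(path)` for every changed file.

    With max_polls=None the loop runs until interrupted; a number bounds it.
    """
    paths = list(paths)
    seen = snapshot(paths)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_seconds)
        current = snapshot(paths)
        for path in changed_files(seen, current):
            on_change(path)
        seen = current
        polls += 1
```

For example, `watch(["sheets_output/processed_dialects.csv"], print)` would print the path each time that file changes. In a real pipeline the callback would trigger the CSV to JSON conversion instead of printing.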
### Configuration
Your automation is already configured for:
- **Processed Dialects Sheet**: 901 rows
- **Digiwords Sheet**: 178 rows
- **Sync Interval**: Every 5 minutes
To modify settings, edit `config.json`:
```json
{
  "google_sheets": {
    "enabled": true,
    "sync_interval_minutes": 5,
    "spreadsheets": [...]
  }
}
```
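A script could read these settings roughly as follows. This is only a sketch: `load_config` and `sync_interval_seconds` are hypothetical helpers, not functions taken from the project's scripts, and the 5-minute fallback simply mirrors the interval documented above.

```python
import json
from typing import Optional

DEFAULT_INTERVAL_MINUTES = 5  # assumption: mirrors the documented default


def load_config(path: str = "config.json") -> dict:
    """Read the automation configuration from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def sync_interval_seconds(config: dict) -> Optional[int]:
    """Return the sync interval in seconds, or None if Sheets sync is disabled."""
    sheets = config.get("google_sheets", {})
    if not sheets.get("enabled", False):
        return None
    minutes = sheets.get("sync_interval_minutes", DEFAULT_INTERVAL_MINUTES)
    # bool is a subclass of int, so reject it explicitly
    if isinstance(minutes, bool) or not isinstance(minutes, (int, float)) or minutes <= 0:
        raise ValueError("sync_interval_minutes must be a positive number")
    return int(minutes * 60)
```

With the configuration shown above, `sync_interval_seconds` returns `300`; setting `"enabled": false` makes it return `None`, so a runner can skip the sync loop entirely.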
📖 **[Full Automation Setup Guide →](AUTOMATION_SETUP.md)**
## 📂 Project Structure
```
indian-dialects-maps/
├── index.html                      # Main visualization (open this via http server)
├── data/
│   └── processed/
│       ├── processed_dialects.json # JSONL-processed dialect data
│       └── digiwords_grouped.json  # Crowdsourced dialect data
├── sheets_output/                  # CSV files (auto-converted to JSON)
│   ├── processed_dialects.csv
│   └── digiwords_grouped.csv
├── scripts/                        # Automation scripts (NEW!)
│   ├── csv_to_json.py              # CSV → JSON converter
│   ├── sheets_sync.py              # Google Sheets downloader
│   ├── file_watcher.py             # Auto-conversion on file changes
│   └── automation_runner.py        # Main automation orchestrator
├── config.json                     # Automation configuration
├── requirements.txt                # Python dependencies
├── AUTOMATION_SETUP.md             # Detailed setup guide
└── README.md
```
## 🔄 How It Works
### Manual Mode (Original)
1. **Load `index.html`**: Contains hardcoded data for 33 Telangana districts
2. **Fetch `processed_dialects.json`**: Enhances/adds districts from JSONL data
3. **Fetch `digiwords_grouped.json`**:
   - Merges additional words into Telangana districts
   - Automatically adds 15 Andhra Pradesh districts with coordinates
4. **Render**: All districts appear on the map with merged data
### Automated Mode (NEW!)
1. **Google Sheets**: Update your dialect data in Google Sheets
2. **Auto-Sync**: Script downloads sheets as CSV (every 5 min)
3. **File Watcher**: Detects CSV changes
4. **Auto-Convert**: CSV files → JSON format
5. **Browser**: Refresh to see updates on the map!
**Flow:**
```
Google Sheet → CSV (sheets_output/) → JSON (data/processed/) → Browser (index.html)
     ↓                ↓                        ↓
Manual Edit      File Watcher            Auto-Refresh
```
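The CSV → JSON step of this flow can be sketched as below. The column names (`state`, `district`, `t`, `m`, `s`) and the helper names `rows_to_grouped` and `convert_file` are assumptions for illustration; the actual sheet layout and the logic in `scripts/csv_to_json.py` may differ. The output follows the grouped shape used by `digiwords_grouped.json`.

```python
import csv
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def rows_to_grouped(rows: Iterable[dict]) -> Dict[str, Dict[str, List[dict]]]:
    """Group flat rows into {state: {district: [{"t", "m", "s"}, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        state = (row.get("state") or "").strip()
        district = (row.get("district") or "").strip()
        term = (row.get("t") or "").strip()
        if not (state and district and term):
            continue  # skip incomplete rows rather than emit empty keys
        grouped[state][district].append({
            "t": term,
            "m": (row.get("m") or "").strip(),
            "s": (row.get("s") or "").strip(),
        })
    # Convert nested defaultdicts to plain dicts for clean JSON output
    return {state: dict(districts) for state, districts in grouped.items()}


def convert_file(csv_path: str, json_path: str) -> None:
    """Read a CSV export and write the grouped JSON next to the map data."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        grouped = rows_to_grouped(csv.DictReader(src))
    with open(json_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False keeps Telugu text readable in the output file
        json.dump(grouped, dst, ensure_ascii=False, indent=2)
```

A call such as `convert_file("sheets_output/digiwords_grouped.csv", "data/processed/digiwords_grouped.json")` would then connect the two directories shown in the flow.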
**Smart Merging:**
- Existing districts → Appends new words
- New AP districts → Creates markers automatically
- Graceful fallback if JSON files are missing
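The merging rules above can be modeled in a few lines. The real merge runs in the browser inside `index.html`; the Python version below only illustrates the logic, and the names `load_json_or_default` and `merge_grouped`, as well as the `{"state", "words"}` shape of each district entry, are assumptions.

```python
import json
import os
from typing import Any, Dict, List


def load_json_or_default(path: str, default: Any) -> Any:
    """Graceful fallback: return `default` if the file is missing or invalid."""
    if not os.path.exists(path):
        return default
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        return default


def merge_grouped(
    districts: Dict[str, dict],
    grouped: Dict[str, Dict[str, List[dict]]],
) -> Dict[str, dict]:
    """Merge {state: {district: [words]}} into a district lookup (in place).

    Existing districts get the new words appended; unknown districts
    are created with an empty word list first.
    """
    for state, by_district in grouped.items():
        for name, words in by_district.items():
            entry = districts.setdefault(name, {"state": state, "words": []})
            entry["words"].extend(words)
    return districts
```

Because a missing file falls back to an empty default, `merge_grouped(districts, load_json_or_default(path, {}))` leaves the existing districts untouched when the JSON file is absent.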
## 📊 Current Data Coverage
| State | Districts | Words | Sources |
|-------|-----------|-------|---------|
| **Telangana** | 33 | 2000+ | Hardcoded + JSONL + Digiwords |
| **Andhra Pradesh** | 15 | 1000+ | Digiwords (crowdsourced) |
| **Total** | **48** | **3000+** | Multiple sources |
### Andhra Pradesh Districts:
Anantapur, Annamayya, Chittoor, East Godavari, Eluru, Kadapa, Kurnool, Nandyal, Ongole, Tirupati, Srikakulam, Visakhapatnam, Vizianagaram, West Godavari, Rayalaseema
## 🎨 Adding New Data
### Update Existing JSON Files
**Edit `data/processed/digiwords_grouped.json`:**
```json
{
  "Telangana": {
    "YourDistrict": [
      {"t": "తెలుగుపదం", "m": "meaning", "s": "Crowd"}
    ]
  },
  "Andhra Pradesh": {
    "YourDistrict": [
      {"t": "తెలుగుపదం", "m": "meaning", "s": "Crowd"}
    ]
  }
}
```
**Edit `data/processed/processed_dialects.json`:**
```json
[
  {
    "name": "YourDistrict",
    "region": "Region Name",
    "words": [
      {"t": "తెలుగుపదం", "m": "meaning", "s": "Source"}
    ]
  }
]
```
Then refresh the browser!
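Before refreshing, it can help to check that hand-edited files still match the two shapes shown above. The sketch below is a hypothetical validator, not part of the project; it assumes that every word entry needs non-empty `t`, `m`, and `s` fields (presumably term, meaning, and source).

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("t", "m", "s")  # assumption: term, meaning, source


def word_errors(word: Any, where: str) -> List[str]:
    """Describe what is wrong with one word entry, if anything."""
    if not isinstance(word, dict):
        return [f"{where}: expected an object, got {type(word).__name__}"]
    return [
        f"{where}: missing or empty '{key}'"
        for key in REQUIRED_KEYS
        if not str(word.get(key) or "").strip()
    ]


def validate_grouped(data: Dict[str, Dict[str, list]]) -> List[str]:
    """Validate the {state: {district: [words]}} shape of digiwords_grouped.json."""
    errors: List[str] = []
    for state, districts in data.items():
        for district, words in districts.items():
            for i, word in enumerate(words):
                errors += word_errors(word, f"{state}/{district}[{i}]")
    return errors


def validate_processed(data: List[dict]) -> List[str]:
    """Validate the list-of-districts shape of processed_dialects.json."""
    errors: List[str] = []
    for i, entry in enumerate(data):
        name = entry.get("name")
        if not name:
            errors.append(f"entry[{i}]: missing 'name'")
        for j, word in enumerate(entry.get("words", [])):
            errors += word_errors(word, f"{name or i}[{j}]")
    return errors
```

Both functions return a list of messages, so an empty list means the file passed these checks.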
### Add New AP District Coordinates
If adding a new Andhra Pradesh district, update `index.html`:
```javascript
const AP_COORDINATES = {
  "YourDistrict": {
    lat: 0.0, // placeholder: replace with the district's latitude
    lng: 0.0, // placeholder: replace with the district's longitude
    region: "Region Name",
    history: "Historical context..."
  },
  // ...
};
```
## 🛠️ Technical Stack
- **Frontend**: HTML5, Vanilla JavaScript, Leaflet.js
- **Data Format**: JSON (pre-processed)
- **Map Library**: Leaflet with CartoDB basemap
- **Server**: Any HTTP server (Python, Node, etc.)
- **No Build Step**: Pure static site
## 🐛 Troubleshooting
### Data Not Loading?
✅ **Check #1:** Are you using `http://localhost:8080/`?
- ❌ Don't use `file://` URLs
- ✅ Use an HTTP server
✅ **Check #2:** Is the server running?
```bash
python3 -m http.server 8080
```
✅ **Check #3:** Hard refresh the page
- Press `Ctrl + Shift + R` (Windows/Linux)
- Press `Cmd + Shift + R` (Mac)
✅ **Check #4:** Check browser console (F12)
- Look for fetch errors
- Should see loading messages
### Server Won't Start?
```bash
# Kill the process listening on port 8080
lsof -ti:8080 | xargs kill -9
# Start fresh
python3 -m http.server 8080
```
### Still Not Working?
1. Open browser console (F12)
2. Look for error messages
3. Check that JSON files exist in `data/processed/`
4. Verify the URL: use `http://localhost:8080/`, and if `127.0.0.1` behaves unexpectedly (for example, odd redirects), switch to `localhost`
## 📝 Technical Notes
- Telugu text uses web fonts (Poppins, Ramabhadra)
- Console shows detailed merge logs for debugging
- Map centered at (16.5°N, 79.8°E) to show both states
- Zoom level: 6.5 (fits both Telangana and AP)
- Data loads asynchronously with `async/await`
## 🤝 Contributing
To add more dialect data:
1. Edit the JSON files in `data/processed/`
2. Refresh `http://localhost:8080/`
3. That's it!
---
**Made with ❤️ for preserving Telugu linguistic heritage across Telangana and Andhra Pradesh**