---
title: Telugu Dialect Map
emoji: 🗺️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# Telugu Dialect Map

Interactive web-based visualization of Telugu dialect words across the districts of Telangana and Andhra Pradesh.

## 🎯 Features

- 48 Districts: 33 Telangana + 15 Andhra Pradesh districts
- Dynamic Data Loading: Loads dialect data from the JSON files in `data/processed/` when the page opens
- Interactive Map: Click a district to explore its local vocabulary, meanings, and sources
- Rich Content: 3000+ verified dialect terms from crowdsourced and JSONL data
- Zero Build Required: Pure static site with no bundling or compilation step
- Google Sheets Integration: Automatically syncs the data from Google Sheets every 5 minutes

## 🚀 Deployment Options

### Option 1: Hugging Face Spaces (Recommended for Public Access)

Deploy to Hugging Face Spaces, where the automation runs continuously:

1. Create a Space on [Hugging Face](https://huggingface.co/spaces)
2. Configure Space secrets for `config.json` and `credentials.json`
3. Push your code to the Space repository
4. Open your app at `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE`

📖 [Complete HF Spaces Setup Guide →](SECRETS_SETUP.md)

### Option 2: Local Development

#### Complete Setup (First Time)

```bash
# 1. Navigate to the project directory (the path to your local clone)
cd indian-dialects-maps

# 2. Create and activate a virtual environment, then install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 3. Start the automation (syncs Google Sheets every 5 minutes)
python scripts/automation_runner.py &

# 4. Start the web server (in a new terminal, or in the background)
python -m http.server 8080 &

# 5. Open in a browser
# Navigate to: http://localhost:8080/
```

#### Quick Run (After First Time)

```bash
# Activate the virtual environment
source venv/bin/activate

# Start both services
python scripts/automation_runner.py &
python -m http.server 8080 &

# Open: http://localhost:8080/
```

#### Stop Services

```bash
# Press Ctrl+C in the terminals, or:
pkill -f automation_runner
pkill -f "http.server 8080"
```

Important: Use `http://localhost:8080/`, not a `file://` URL. Browser security blocks loading the JSON files from a `file://` page, so the map would show no data.

---

## 🎯 What You'll See

1. Automation Console: Shows the sync status every 5 minutes
2. Web Interface: Interactive map with 48 districts (33 Telangana + 15 Andhra Pradesh)
3. Auto-Updates: Edit the Google Sheet, and the changes appear on the map within about 5 minutes (after a browser refresh)

## 🤖 Automated Data Updates

The project includes automated synchronization from Google Sheets:

- Google Sheets Sync: Downloads the sheet data as CSV every 5 minutes
- File Watcher: Converts CSV → JSON automatically whenever a CSV file changes
- Zero Manual Work: Update your Google Sheet and the JSON data is regenerated automatically

### Configuration

The included configuration syncs:
- Processed Dialects sheet: 901 rows
- Digiwords sheet: 178 rows
- Sync interval: every 5 minutes

To modify these settings, edit `config.json`:
```json
{
  "google_sheets": {
    "enabled": true,
    "sync_interval_minutes": 5,
    "spreadsheets": [...]
  }
}
```
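
The automation that reads these settings is written in Python (`scripts/automation_runner.py`). As a rough illustration only, the JavaScript (Node.js) sketch below shows one way the `google_sheets` block could be interpreted. The helper name `readSheetsConfig` and the defaults it applies when a key is missing are assumptions, not the project's actual behavior.

```javascript
// Hypothetical helper: interpret the "google_sheets" block of config.json.
// The defaults applied here (5 minutes, disabled, no sheets) are assumptions.
function readSheetsConfig(configText) {
  const config = JSON.parse(configText);
  const sheets = config.google_sheets ?? {};
  const minutes = Number(sheets.sync_interval_minutes ?? 5);
  if (!Number.isFinite(minutes) || minutes <= 0) {
    throw new Error(`Invalid sync_interval_minutes: ${sheets.sync_interval_minutes}`);
  }
  return {
    enabled: sheets.enabled === true,
    intervalMs: minutes * 60 * 1000, // JavaScript timers take milliseconds
    spreadsheets: Array.isArray(sheets.spreadsheets) ? sheets.spreadsheets : [],
  };
}

// Example: a 5-minute interval becomes 300000 ms.
const cfg = readSheetsConfig(
  '{"google_sheets": {"enabled": true, "sync_interval_minutes": 5, "spreadsheets": []}}'
);
console.log(cfg.enabled, cfg.intervalMs);
```

Converting to milliseconds in one place avoids mixing units later, since timers such as `setInterval` expect milliseconds.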

📖 [Full Automation Setup Guide →](AUTOMATION_SETUP.md)

## 📂 Project Structure

```
indian-dialects-maps/
├── index.html                        # Main visualization (open this via an HTTP server)
├── data/
│   └── processed/
│       ├── processed_dialects.json   # JSONL-processed dialect data
│       └── digiwords_grouped.json    # Crowdsourced dialect data
├── sheets_output/                    # CSV files (auto-converted to JSON)
│   ├── processed_dialects.csv
│   └── digiwords_grouped.csv
├── scripts/                          # Automation scripts (NEW!)
│   ├── csv_to_json.py                # CSV → JSON converter
│   ├── sheets_sync.py                # Google Sheets downloader
│   ├── file_watcher.py               # Auto-conversion on file changes
│   └── automation_runner.py          # Main automation orchestrator
├── config.json                       # Automation configuration
├── requirements.txt                  # Python dependencies
├── AUTOMATION_SETUP.md               # Detailed setup guide
└── README.md
```

## 🔄 How It Works

### Manual Mode (Original)
1. Load `index.html`: Contains hardcoded data for the 33 Telangana districts
2. Fetch `processed_dialects.json`: Enhances existing districts or adds new ones from the JSONL data
3. Fetch `digiwords_grouped.json`:
   - Merges additional words into the Telangana districts
   - Adds the 15 Andhra Pradesh districts, using the coordinates defined in `index.html`
4. Render: All districts appear on the map with their merged data

### Automated Mode (NEW!)
1. Google Sheets: Update your dialect data in Google Sheets
2. Auto-Sync: A script downloads the sheets as CSV (every 5 minutes)
3. File Watcher: Detects changes to the CSV files
4. Auto-Convert: Converts the CSV files to JSON
5. Browser: Refresh the page to see the updates on the map

Flow:
```
Google Sheet              ← edited manually
    │  downloaded every 5 min (sheets_sync.py)
    ▼
CSV  (sheets_output/)
    │  detected and converted (file_watcher.py, csv_to_json.py)
    ▼
JSON (data/processed/)
    │  fetched by index.html on page load (refresh to pick up changes)
    ▼
Browser (index.html)
```
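
The CSV → JSON step is handled by `scripts/csv_to_json.py`. The JavaScript sketch below illustrates the idea for the `digiwords_grouped.json` shape (state → district → list of `{t, m, s}` words). The CSV column names `state`, `district`, `telugu`, `meaning`, and `source` are hypothetical; the real sheet headers may differ, so treat this as an outline rather than a drop-in replacement.

```javascript
// Minimal CSV parser: handles quoted fields, doubled quotes ("") and commas or
// newlines inside quotes. It is a sketch, not a complete RFC 4180 implementation.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else field += ch;
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field); field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one line break
      row.push(field); rows.push(row); row = []; field = "";
    } else {
      field += ch;
    }
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Group CSV rows into { State: { District: [{ t, m, s }] } }.
// The column names are hypothetical; adjust them to the real sheet headers.
const REQUIRED_COLUMNS = ["state", "district", "telugu", "meaning", "source"];

function csvToGrouped(text) {
  const [rawHeader = [], ...records] = parseCsv(text.replace(/^\uFEFF/, "")); // drop a UTF-8 BOM
  const header = rawHeader.map((h) => h.trim());
  for (const name of REQUIRED_COLUMNS) {
    if (!header.includes(name)) throw new Error(`Missing CSV column: ${name}`);
  }
  const get = (row, name) => (row[header.indexOf(name)] ?? "").trim();

  const grouped = {};
  for (const row of records) {
    const state = get(row, "state");
    const district = get(row, "district");
    if (!state || !district) continue; // skip blank or incomplete rows
    grouped[state] ??= {};
    grouped[state][district] ??= [];
    grouped[state][district].push({ t: get(row, "telugu"), m: get(row, "meaning"), s: get(row, "source") });
  }
  return grouped;
}

const sample = 'state,district,telugu,meaning,source\nTelangana,Warangal,పదం,"a word, for example",Crowd\n';
console.log(JSON.stringify(csvToGrouped(sample), null, 2));
```

Quoted-field handling matters here because meanings are free text and can contain commas.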

Smart Merging:
- Existing districts → New words are appended
- New AP districts → Markers are created automatically
- Missing JSON files → Graceful fallback; the map still renders with the data already loaded
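
The merge rules above can be sketched as follows. This is a hypothetical JavaScript outline, not the code in `index.html`: it assumes each district is a `{ name, lat, lng, words }` object, that `AP_COORDINATES` has the shape shown under Add New AP District Coordinates, and that "new words" means words whose Telugu term (`t`) is not already listed for that district.

```javascript
// Hypothetical merge mirroring the rules above. Districts are assumed to be
// { name, lat, lng, words } objects; words are { t, m, s } entries.
function mergeDistricts(baseDistricts, digiwords, apCoordinates) {
  // Copy so the hardcoded base data is never mutated.
  const byName = new Map(baseDistricts.map((d) => [d.name, { ...d, words: [...d.words] }]));
  if (!digiwords) return [...byName.values()]; // JSON missing: keep base data

  for (const [state, districts] of Object.entries(digiwords)) {
    for (const [name, words] of Object.entries(districts)) {
      const existing = byName.get(name);
      if (existing) {
        // Existing district: append only words whose Telugu term is not already present.
        const seen = new Set(existing.words.map((w) => w.t));
        existing.words.push(...words.filter((w) => !seen.has(w.t)));
      } else if (state === "Andhra Pradesh" && apCoordinates[name]) {
        // New AP district: create an entry (and later a marker) from its coordinates.
        const { lat, lng, region, history } = apCoordinates[name];
        byName.set(name, { name, lat, lng, region, history, words: [...words] });
      } else {
        console.warn(`Skipping ${state} / ${name}: no coordinates available`);
      }
    }
  }
  return [...byName.values()];
}

const merged = mergeDistricts(
  [{ name: "Warangal", lat: 17.97, lng: 79.59, words: [{ t: "పదం", m: "word", s: "Hardcoded" }] }],
  {
    Telangana: { Warangal: [{ t: "పదం", m: "word", s: "Crowd" }] },
    "Andhra Pradesh": { Kurnool: [{ t: "మాట", m: "word, speech", s: "Crowd" }] },
  },
  { Kurnool: { lat: 15.83, lng: 78.04, region: "Rayalaseema", history: "..." } }
);
console.log(merged.map((d) => `${d.name}: ${d.words.length} word(s)`));
```

Returning `null` from a failed load and passing it straight in keeps the fallback path in one place: the function simply returns the base districts unchanged.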

## 📊 Current Data Coverage

| State | Districts | Words | Sources |
|-------|-----------|-------|---------|
| Telangana | 33 | 2000+ | Hardcoded + JSONL + Digiwords |
| Andhra Pradesh | 15 | 1000+ | Digiwords (crowdsourced) |
| Total | 48 | 3000+ | Multiple sources |
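
These figures can be cross-checked against the data. The hypothetical helper below counts districts and words per state in a `digiwords_grouped.json`-shaped object. It sees only the object passed to it, not the hardcoded words in `index.html` or `processed_dialects.json`, so its totals will not match the table exactly.

```javascript
// Count districts and words per state in a { State: { District: [words] } } object,
// plus a "Total" row. Only covers the data passed in, not the hardcoded districts.
function coverage(grouped) {
  const rows = Object.entries(grouped).map(([state, districts]) => ({
    state,
    districts: Object.keys(districts).length,
    words: Object.values(districts).reduce((n, words) => n + words.length, 0),
  }));
  const total = {
    state: "Total",
    districts: rows.reduce((n, r) => n + r.districts, 0),
    words: rows.reduce((n, r) => n + r.words, 0),
  };
  return [...rows, total];
}

// Tiny inline sample; in practice, pass the parsed digiwords_grouped.json.
const sample = {
  Telangana: { Warangal: [{ t: "పదం", m: "word", s: "Crowd" }] },
  "Andhra Pradesh": { Kurnool: [], Nandyal: [{ t: "మాట", m: "word, speech", s: "Crowd" }] },
};
console.table(coverage(sample));
```

To run it against the real file in Node.js, pass `JSON.parse(require("node:fs").readFileSync("data/processed/digiwords_grouped.json", "utf8"))` instead of `sample`.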

### Andhra Pradesh Districts
Anantapur, Annamayya, Chittoor, East Godavari, Eluru, Kadapa, Kurnool, Nandyal, Ongole, Tirupati, Srikakulam, Visakhapatnam, Vizianagaram, West Godavari, Rayalaseema

## 🎨 Adding New Data

### Update Existing JSON Files

Each word entry has three fields: `t` (the Telugu word), `m` (its meaning), and `s` (its source).

Edit `data/processed/digiwords_grouped.json` (words grouped by state, then by district):
```json
{
  "Telangana": {
    "YourDistrict": [
      {"t": "తెలుగుపదం", "m": "meaning", "s": "Crowd"}
    ]
  },
  "Andhra Pradesh": {
    "YourDistrict": [
      {"t": "తెలుగుపదం", "m": "meaning", "s": "Crowd"}
    ]
  }
}
```

Edit `data/processed/processed_dialects.json` (a list of districts, each with its region and words):
```json
[
  {
    "name": "YourDistrict",
    "region": "Region Name",
    "words": [
      {"t": "తెలుగుపదం", "m": "meaning", "s": "Source"}
    ]
  }
]
```

Then refresh the browser!
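
Before refreshing, it can help to sanity-check an edited file. The sketch below is a hypothetical Node.js validator for the two formats shown above; it assumes every word entry needs non-empty string `t`, `m`, and `s` fields, which the page itself may not strictly require.

```javascript
// Hypothetical check for one word entry: t (Telugu), m (meaning), s (source).
function checkWord(w, where) {
  for (const key of ["t", "m", "s"]) {
    if (typeof w?.[key] !== "string" || w[key].trim() === "") {
      return [`${where}: missing or empty "${key}"`];
    }
  }
  return [];
}

// digiwords_grouped.json: { State: { District: [word, ...] } }
function validateDigiwords(data) {
  const errors = [];
  for (const [state, districts] of Object.entries(data ?? {})) {
    if (typeof districts !== "object" || districts === null || Array.isArray(districts)) {
      errors.push(`${state}: expected an object of districts`);
      continue;
    }
    for (const [district, words] of Object.entries(districts)) {
      if (!Array.isArray(words)) { errors.push(`${state}/${district}: words must be an array`); continue; }
      words.forEach((w, i) => errors.push(...checkWord(w, `${state}/${district}[${i}]`)));
    }
  }
  return errors;
}

// processed_dialects.json: [{ name, region, words: [word, ...] }]
function validateProcessed(data) {
  if (!Array.isArray(data)) return ["top level must be an array"];
  const errors = [];
  data.forEach((d, i) => {
    if (typeof d?.name !== "string") errors.push(`[${i}]: missing "name"`);
    if (!Array.isArray(d?.words)) { errors.push(`[${i}]: "words" must be an array`); return; }
    d.words.forEach((w, j) => errors.push(...checkWord(w, `${d.name}[${j}]`)));
  });
  return errors;
}

// Example: an entry without a source is reported.
console.log(validateDigiwords({ Telangana: { Warangal: [{ t: "పదం", m: "word" }] } }));
```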

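### Add New AP District Coordinates

If you add a new Andhra Pradesh district, also add its coordinates to `AP_COORDINATES` in `index.html`:

```javascript
const AP_COORDINATES = {
  "YourDistrict": {
    lat: 0.0,   // replace with the district's latitude (decimal degrees)
    lng: 0.0,   // replace with the district's longitude (decimal degrees)
    region: "Region Name",
    history: "Historical context..."
  },
  // ...
};
```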
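
A quick way to catch an Andhra Pradesh district that has words but no map position is to compare the keys of `digiwords_grouped.json` with `AP_COORDINATES`. The helper below is a hypothetical sketch; treating `0, 0` as an unfilled placeholder is an assumption.

```javascript
// Hypothetical check: every Andhra Pradesh district in digiwords_grouped.json
// needs a usable AP_COORDINATES entry, otherwise no marker can be placed for it.
function findMissingApCoordinates(digiwords, apCoordinates) {
  const apDistricts = Object.keys(digiwords["Andhra Pradesh"] ?? {});
  return apDistricts.filter((name) => {
    const c = apCoordinates[name];
    // Absent entries, non-numeric values, and the assumed 0/0 placeholder count as missing.
    return !c || !Number.isFinite(c.lat) || !Number.isFinite(c.lng) || (c.lat === 0 && c.lng === 0);
  });
}

const AP_COORDINATES_SAMPLE = {
  Kurnool: { lat: 15.83, lng: 78.04, region: "Rayalaseema", history: "..." },
};
console.log(findMissingApCoordinates({ "Andhra Pradesh": { Kurnool: [], YourDistrict: [] } }, AP_COORDINATES_SAMPLE));
```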

## 🛠️ Technical Stack

- Frontend: HTML5, vanilla JavaScript, Leaflet.js
- Data Format: JSON (pre-processed)
- Map Library: Leaflet with a CartoDB basemap
- Server: Any static HTTP server (Python, Node, etc.)
- No Build Step: Pure static site

## 🐛 Troubleshooting

### Data Not Loading?

✅ Check #1: Are you using `http://localhost:8080/`?
- ❌ Don't use `file://` URLs
- ✅ Use an HTTP server

✅ Check #2: Is the server running?
```bash
python3 -m http.server 8080
```

✅ Check #3: Hard-refresh the page
- Press `Ctrl + Shift + R` (Windows/Linux)
- Press `Cmd + Shift + R` (Mac)

✅ Check #4: Check the browser console (F12)
- Look for fetch errors
- You should see the data-loading messages

### Server Won't Start?

```bash
# Kill the process using port 8080
lsof -ti:8080 | xargs kill -9

# Start fresh
python3 -m http.server 8080
```

### Still Not Working?

1. Open the browser console (F12)
2. Look for error messages
3. Check that the JSON files exist in `data/processed/`
4. Verify that the URL uses the host and port the server is listening on (e.g. `http://localhost:8080/`)
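
For step 3, a small Node.js script (Node 16 or later, run from the project root) can confirm that both data files exist and parse as JSON. This is a hypothetical helper, not part of the repository; the file list mirrors the Project Structure section.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Data files the page fetches, relative to the project root (per Project Structure).
const DATA_FILES = [
  "data/processed/processed_dialects.json",
  "data/processed/digiwords_grouped.json",
];

// Report, for each file, whether it exists and contains valid JSON.
function checkDataFiles(rootDir, files = DATA_FILES) {
  return files.map((rel) => {
    const full = path.join(rootDir, rel);
    if (!fs.existsSync(full)) return { file: rel, ok: false, error: "file not found" };
    try {
      JSON.parse(fs.readFileSync(full, "utf8"));
      return { file: rel, ok: true };
    } catch (err) {
      return { file: rel, ok: false, error: `invalid JSON: ${err.message}` };
    }
  });
}

for (const r of checkDataFiles(process.cwd())) {
  console.log(r.ok ? `OK       ${r.file}` : `PROBLEM  ${r.file}: ${r.error}`);
}
```

A file that exists but fails to parse usually points to a hand edit (a trailing comma, for instance) rather than a server problem.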

## 📝 Technical Notes

- Web fonts: Poppins and Ramabhadra (Ramabhadra covers the Telugu script)
- The console shows detailed merge logs for debugging
- The map is centered at (16.5°N, 79.8°E) to show both states
- Zoom level: 6.5 (fits both Telangana and AP)
- Data loads asynchronously with `async/await`
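
The asynchronous loading and the graceful fallback for missing files could look roughly like this. It is a hypothetical sketch, not the page's code: `fetchImpl` is a parameter only so the function can be exercised outside a browser, and fetching both files in parallel is an assumption (the page may fetch them one after the other before merging).

```javascript
// Hypothetical loader: fetch a JSON file, returning null instead of throwing
// so the map can still render with the hardcoded districts.
async function loadJsonOrNull(url, fetchImpl = globalThis.fetch) {
  try {
    const res = await fetchImpl(url);
    if (!res.ok) {
      console.warn(`Could not load ${url}: HTTP ${res.status}`);
      return null;
    }
    return await res.json(); // invalid JSON also lands in the catch below
  } catch (err) {
    console.warn(`Could not load ${url}: ${err.message}`);
    return null;
  }
}

// Load both data files concurrently; either result may be null.
async function loadAllData(fetchImpl = globalThis.fetch) {
  const [processed, digiwords] = await Promise.all([
    loadJsonOrNull("data/processed/processed_dialects.json", fetchImpl),
    loadJsonOrNull("data/processed/digiwords_grouped.json", fetchImpl),
  ]);
  return { processed, digiwords };
}
```

In the browser this would be called as `const { processed, digiwords } = await loadAllData();`, and any `null` result simply skips that merge step.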

## 🤝 Contributing

To add more dialect data:
1. Edit the JSON files in `data/processed/`
2. Refresh `http://localhost:8080/`
3. That's it!

---

Made with ❤️ for preserving Telugu linguistic heritage across Telangana and Andhra Pradesh