Kakashi75 committed on
Commit
c7297b8
·
1 Parent(s): a25ac93

Made the initial files ready for the HF workflow

Files changed (8)
  1. .env.example +12 -0
  2. .gitignore +1 -0
  3. QUICKSTART.md +109 -0
  4. README.md +16 -0
  5. SECRETS_SETUP.md +361 -0
  6. app.py +184 -0
  7. config.json.example +27 -0
  8. credentials.json.example +12 -0
.env.example ADDED
@@ -0,0 +1,12 @@
+ # Hugging Face Spaces Environment Variables
+ # Copy this file to .env for local testing
+ # On HF Spaces, set these as Space secrets
+
+ # Full JSON content of config.json (paste the entire JSON as one line or use multiline)
+ HF_CONFIG_JSON='{"google_sheets":{"enabled":true,"sync_interval_minutes":5,"credentials_file":"credentials.json","spreadsheets":[{"id":"YOUR_SPREADSHEET_ID_1","sheet_name":"processed_dialects","output_file":"sheets_output/processed_dialects.csv"},{"id":"YOUR_SPREADSHEET_ID_2","sheet_name":"digiwords_grouped","output_file":"sheets_output/digiwords_grouped.csv"}]},"file_watcher":{"enabled":true,"watch_directory":"sheets_output","file_patterns":["*.csv"]},"output":{"json_directory":"data/processed"}}'
+
+ # Full JSON content of credentials.json (paste the entire service account JSON)
+ HF_CREDENTIALS_JSON='{"type":"service_account","project_id":"your-project","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"...@...iam.gserviceaccount.com","client_id":"...","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"..."}'
+
+ # Optional: Port for local development (HF Spaces uses 7860 by default)
+ PORT=7860
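Reviewer note: writing the two single-line values by hand is easy to get wrong. A minimal sketch of one way to generate them follows, assuming the source data is already a parsed JSON object. `to_env_line` is a hypothetical helper and is not part of this commit.

```python
import json


def to_env_line(name: str, data: dict) -> str:
    """Return NAME='<minified JSON>' in the style used by .env.example."""
    # separators=(",", ":") removes the spaces json.dumps inserts by default;
    # newlines inside strings (such as private_key) are emitted as \n escapes.
    compact = json.dumps(data, separators=(",", ":"))
    if "'" in compact:
        # A single quote would terminate the quoted .env value early.
        raise ValueError(f"{name}: value contains a single quote")
    return f"{name}='{compact}'"


if __name__ == "__main__":
    sample = {"google_sheets": {"enabled": True, "sync_interval_minutes": 5}}
    print(to_env_line("HF_CONFIG_JSON", sample))
```

The same compact string, without the `NAME='...'` wrapper, is what would be pasted into a Space secret.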
.gitignore CHANGED
@@ -31,6 +31,7 @@ token.json
  
  # Environment variables
  .env
+ .env.local
  
  # IDE
  .vscode/
QUICKSTART.md ADDED
@@ -0,0 +1,109 @@
+ # Quick Start Guide
+
+ ## For Local Development
+
+ ### 1. Install Dependencies
+
+ ```bash
+ cd /path/to/dialect-map
+
+ # Install Python dependencies
+ pip install -r requirements.txt
+ # OR, if your system Python is externally managed, use a virtual environment:
+ python3 -m venv .venv && . .venv/bin/activate && pip install -r requirements.txt
+ ```
+
+ ### 2. Configure Secrets
+
+ Choose ONE of these methods:
+
+ #### Method A: Using actual config files (easier for local dev)
+
+ ```bash
+ # Copy templates
+ cp config.json.example config.json
+ cp credentials.json.example credentials.json
+
+ # Edit with your actual Google Cloud credentials
+ nano config.json       # Update spreadsheet IDs
+ nano credentials.json  # Paste your service account JSON
+ ```
+
+ #### Method B: Using environment variables (simulates HF Spaces)
+
+ ```bash
+ # Copy .env template
+ cp .env.example .env
+
+ # Edit .env with your actual JSON content (spreadsheet IDs and credentials)
+ nano .env
+
+ set -a && . ./.env && set +a  # export the variables; app.py does not read .env itself
+ ```
+
+ ### 3. Run the Application
+
+ ```bash
+ # Start the app
+ python3 app.py
+
+ # Open your browser to:
+ # http://localhost:7860/index.html
+ ```
+
+ The app will:
+ - ✅ Load secrets from environment or files
+ - ✅ Start Google Sheets sync automation (every 5 minutes)
+ - ✅ Serve the interactive map on port 7860
+
+ ---
+
+ ## For Hugging Face Spaces Deployment
+
+ See **[SECRETS_SETUP.md](SECRETS_SETUP.md)** for complete deployment instructions.
+
+ **Quick summary:**
+ 1. Create a new Space on Hugging Face
+ 2. Add two secrets in Space settings:
+    - `HF_CONFIG_JSON` - your entire config.json content
+    - `HF_CREDENTIALS_JSON` - your entire credentials.json content
+ 3. Push your code to the Space
+ 4. Access your live app!
+
+ ---
+
+ ## Files You Need
+
+ | File | Purpose | How to Get |
+ |------|---------|------------|
+ | `config.json` | App configuration with spreadsheet IDs | Copy from `config.json.example` and edit |
+ | `credentials.json` | Google service account credentials | Download from Google Cloud Console |
+
+ **Important:** These files are in `.gitignore` - never commit them!
+
+ ---
+
+ ## Troubleshooting
+
+ **"ModuleNotFoundError: No module named 'google'"**
+ - Install dependencies: `pip install -r requirements.txt`
+
+ **"HF_CONFIG_JSON not found"**
+ - You need to either:
+   - Create `config.json` file locally, OR
+   - Set `HF_CONFIG_JSON` environment variable
+
+ **"Credentials file not found"**
+ - Follow [SECRETS_SETUP.md](SECRETS_SETUP.md) steps 1.1-1.4 to get credentials
+
+ ---
+
+ ## What Gets Created
+
+ When you run `app.py`:
+ - `config.json` - Created from `HF_CONFIG_JSON` env var (if set)
+ - `credentials.json` - Created from `HF_CREDENTIALS_JSON` env var (if set)
+ - `sheets_output/*.csv` - Downloaded from Google Sheets
+ - `data/processed/*.json` - Converted from CSV files
+
+ All of these are in `.gitignore` and safe to regenerate.
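Reviewer note: the troubleshooting entries in QUICKSTART.md come down to one question, namely where each secret will come from at startup. A minimal sketch of a preflight check follows, mirroring the precedence `app.py` uses (environment variable first, then an existing file). `secret_source` is a hypothetical helper and is not part of this commit.

```python
import os
from pathlib import Path


def secret_source(env_name: str, filename: str, base_dir: Path, env=None) -> str:
    """Return 'env', 'file', or 'missing' for one secret."""
    env = os.environ if env is None else env
    if env.get(env_name):  # an empty value counts as unset, as in app.py
        return "env"
    if (base_dir / filename).exists():
        return "file"
    return "missing"


if __name__ == "__main__":
    base = Path.cwd()
    for env_name, filename in (
        ("HF_CONFIG_JSON", "config.json"),
        ("HF_CREDENTIALS_JSON", "credentials.json"),
    ):
        print(f"{filename}: {secret_source(env_name, filename, base)}")
```

Running it from the project directory before `python3 app.py` shows which of Method A or Method B is actually in effect.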
README.md CHANGED
@@ -9,6 +9,22 @@
  - **Interactive Map**: Click districts to explore local vocabulary, meanings, and sources
  - **Rich Content**: 3000+ verified dialect terms from crowdsourced and JSONL data
  - **Zero Build Required**: Pure static site with automatic data loading
+ - **Google Sheets Integration**: Automated synchronization with Google Sheets
+
+ ## 🚀 Deployment Options
+
+ ### Option 1: Hugging Face Spaces (Recommended for Public Access)
+
+ Deploy to Hugging Face Spaces with continuous automation:
+
+ 1. **Create a Space** on [Hugging Face](https://huggingface.co/spaces)
+ 2. **Configure secrets** for `config.json` and `credentials.json`
+ 3. **Push your code** to the Space repository
+ 4. **Access your app** at `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE`
+
+ 📖 **[Complete HF Spaces Setup Guide →](SECRETS_SETUP.md)**
+
+ ### Option 2: Local Development
  
  ## 🚀 How to Run
  
SECRETS_SETUP.md ADDED
@@ -0,0 +1,361 @@
+ # Hugging Face Spaces Secrets Setup Guide
+
+ This guide explains how to configure and deploy your Telugu Dialect Map to Hugging Face Spaces with secure secrets management.
+
+ ## Overview
+
+ Your application requires two secret files:
+ - **`config.json`**: Configuration for Google Sheets sync and automation
+ - **`credentials.json`**: Google Cloud service account credentials
+
+ These files contain sensitive information and should NEVER be committed to git. Instead, we'll use Hugging Face Spaces **secrets** (environment variables) to store them securely.
+
+ ---
+
+ ## Step 1: Obtain Google Service Account Credentials
+
+ ### 1.1 Create a Google Cloud Project
+
+ 1. Go to [Google Cloud Console](https://console.cloud.google.com/)
+ 2. Create a new project or select an existing one
+ 3. Note your project ID
+
+ ### 1.2 Enable Google Sheets API
+
+ 1. In your project, go to **APIs & Services** → **Library**
+ 2. Search for "Google Sheets API"
+ 3. Click **Enable**
+
+ ### 1.3 Create a Service Account
+
+ 1. Go to **APIs & Services** → **Credentials**
+ 2. Click **Create Credentials** → **Service Account**
+ 3. Fill in the details:
+    - **Service account name**: `dialect-map-automation`
+    - **Service account ID**: (auto-generated)
+    - **Description**: "Service account for dialect map Google Sheets automation"
+ 4. Click **Create and Continue**
+ 5. Skip the optional steps (roles and user access)
+ 6. Click **Done**
+
+ ### 1.4 Create and Download Service Account Key
+
+ 1. Click on the service account you just created
+ 2. Go to the **Keys** tab
+ 3. Click **Add Key** → **Create new key**
+ 4. Select **JSON** format
+ 5. Click **Create**
+ 6. A JSON file will be downloaded - this is your `credentials.json`
+ 7. **Keep this file secure!** It grants access to every Google Sheet shared with the service account
+
+ ### 1.5 Share Your Google Sheets with the Service Account
+
+ 1. Open your `credentials.json` file
+ 2. Find the `client_email` field (e.g., `dialect-map-automation@your-project.iam.gserviceaccount.com`)
+ 3. Copy this email address
+ 4. Open each Google Sheet you want to sync
+ 5. Click the **Share** button
+ 6. Paste the service account email
+ 7. Give it **Editor** or **Viewer** access (Editor if you want to write data back)
+ 8. Click **Send**
+
+ ---
+
+ ## Step 2: Configure config.json
+
+ Create your `config.json` file with your specific settings:
+
+ ```json
+ {
+   "google_sheets": {
+     "enabled": true,
+     "sync_interval_minutes": 5,
+     "credentials_file": "credentials.json",
+     "spreadsheets": [
+       {
+         "id": "YOUR_ACTUAL_SPREADSHEET_ID_HERE",
+         "sheet_name": "processed_dialects",
+         "output_file": "sheets_output/processed_dialects.csv"
+       },
+       {
+         "id": "YOUR_ACTUAL_SPREADSHEET_ID_HERE",
+         "sheet_name": "digiwords_grouped",
+         "output_file": "sheets_output/digiwords_grouped.csv"
+       }
+     ]
+   },
+   "file_watcher": {
+     "enabled": true,
+     "watch_directory": "sheets_output",
+     "file_patterns": ["*.csv"]
+   },
+   "output": {
+     "json_directory": "data/processed"
+   }
+ }
+ ```
+
+ ### Finding Your Spreadsheet ID
+
+ Your Google Sheets URL looks like:
+ ```
+ https://docs.google.com/spreadsheets/d/1AbC123XyZ456_Example_ID/edit#gid=0
+                                        ^^^^^^^^^^^^^^^^^^^^^^^^
+                                        This is your spreadsheet ID
+ ```
+
+ Copy the ID from your URL and replace `YOUR_ACTUAL_SPREADSHEET_ID_HERE` in the config.
+
+ ---
+
+ ## Step 3: Deploy to Hugging Face Spaces
+
+ ### 3.1 Create a New Space
+
+ 1. Go to [Hugging Face](https://huggingface.co/)
+ 2. Click your profile → **New Space**
+ 3. Fill in the details:
+    - **Space name**: `telugu-dialect-map` (or your choice)
+    - **License**: Choose appropriate license
+    - **Space SDK**: Select **Docker** (a **Static** Space only serves files and cannot run the custom Python `app.py`)
+    - **Visibility**: Public or Private
+ 4. Click **Create Space**
+
+ ### 3.2 Push Your Code to the Space
+
+ You can either:
+
+ **Option A: Use Git**
+ ```bash
+ # Clone the Space repository
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/telugu-dialect-map
+ cd telugu-dialect-map
+
+ # Copy your project files (excluding secrets!)
+ cp -r /path/to/dialect-map/* .
+
+ # The * glob skips dotfiles, so copy .gitignore explicitly and check it
+ cp /path/to/dialect-map/.gitignore . && cat .gitignore  # should list config.json and credentials.json
+
+ # Commit and push
+ git add .
+ git commit -m "Initial commit"
+ git push
+ ```
+
+ **Option B: Upload via Web Interface**
+ 1. In your Space, click the **Files** tab
+ 2. Click **Add file** → **Upload files**
+ 3. Select all your project files (EXCEPT `config.json` and `credentials.json`)
+ 4. Click **Commit changes**
+
+ ### 3.3 Add Secrets to Your Space
+
+ This is the **critical step** - we'll add your sensitive credentials as secrets.
+
+ 1. In your Space, click the **Settings** tab
+ 2. Scroll down to **Repository secrets**
+ 3. Add the following secrets:
+
+ #### Secret 1: HF_CONFIG_JSON
+
+ - **Name**: `HF_CONFIG_JSON`
+ - **Value**: Paste the **entire contents** of your `config.json` file
+
+ Example (all on one line):
+ ```
+ {"google_sheets":{"enabled":true,"sync_interval_minutes":5,"credentials_file":"credentials.json","spreadsheets":[{"id":"1AbC123XyZ456_Example","sheet_name":"processed_dialects","output_file":"sheets_output/processed_dialects.csv"}]},"file_watcher":{"enabled":true,"watch_directory":"sheets_output","file_patterns":["*.csv"]},"output":{"json_directory":"data/processed"}}
+ ```
+
+ #### Secret 2: HF_CREDENTIALS_JSON
+
+ - **Name**: `HF_CREDENTIALS_JSON`
+ - **Value**: Paste the **entire contents** of your `credentials.json` file
+
+ Example (all on one line, with the `\n` escapes in the private key left exactly as downloaded):
+ ```
+ {"type":"service_account","project_id":"your-project","private_key_id":"abc123...","private_key":"-----BEGIN PRIVATE KEY-----\nMIIEvQIB...\n-----END PRIVATE KEY-----\n","client_email":"name@project.iam.gserviceaccount.com",...}
+ ```
+
+ 4. Click **Add secret** for each one
+
+ **Important Notes:**
+ - Keep the JSON on one line. The `\n` sequences inside `private_key` are JSON escapes and must stay single (`\n`, not `\\n`); doubling them leaves a literal backslash in the key once `app.py` parses the secret
+ - The value must be valid JSON; `app.py` exits with a parse error otherwise
+ - You can use a JSON minifier tool to compact your JSON
+
+ ### 3.4 Rebuild Your Space
+
+ After adding secrets:
+ 1. Your Space should automatically rebuild
+ 2. Watch the **Logs** tab for any errors
+ 3. Once built, click the **App** tab to view your running application
+
+ ---
+
+ ## Step 4: Verify Deployment
+
+ ### 4.1 Check Space Logs
+
+ 1. Go to your Space's **Logs** tab
+ 2. You should see:
+    ```
+    🔐 Loading secrets from environment variables...
+    ✅ Created config.json from HF_CONFIG_JSON secret
+    ✅ Created credentials.json from HF_CREDENTIALS_JSON secret
+    🚀 Starting automation runner...
+    ✅ Automation runner started
+    🌐 Starting web server on port 7860...
+    ```
+
+ 3. If you see errors, check:
+    - JSON formatting in your secrets
+    - Spreadsheet IDs are correct
+    - Service account has access to sheets
+
+ ### 4.2 Access Your Application
+
+ 1. Click the **App** tab
+ 2. You should see your Telugu Dialect Map interface
+ 3. The map should load with data from your Google Sheets
+
+ ### 4.3 Verify Automation
+
+ 1. Edit your Google Sheet (add/modify some dialect data)
+ 2. Wait 5 minutes (or your configured interval)
+ 3. Check the Space logs - you should see sync messages
+ 4. Refresh your app - changes should appear
+
+ ---
+
+ ## Troubleshooting
+
+ ### "HF_CONFIG_JSON not found in environment"
+
+ **Problem**: The secret wasn't added or has the wrong name.
+
+ **Solution**:
+ 1. Go to Space Settings → Repository secrets
+ 2. Verify the secret name is exactly `HF_CONFIG_JSON` (case-sensitive)
+ 3. Add it if missing
+ 4. Rebuild the Space
+
+ ### "Error parsing HF_CONFIG_JSON"
+
+ **Problem**: The JSON is malformed.
+
+ **Solution**:
+ 1. Copy your `config.json` content
+ 2. Use a JSON validator (e.g., [jsonlint.com](https://jsonlint.com/))
+ 3. Make sure it's valid JSON
+ 4. Minify it onto a single line, leaving any `\n` escapes inside strings unchanged (do not double them to `\\n`)
+ 5. Update the secret with the corrected value
+
+ ### "Connection refused" or "Credentials invalid"
+
+ **Problem**: The Google service account credentials are wrong, or the sheets are not shared with the service account.
+
+ **Solution**:
+ 1. Verify `credentials.json` content is correct
+ 2. Check that you shared your Google Sheets with the service account email
+ 3. Verify the Sheets API is enabled in Google Cloud Console
+ 4. Regenerate the service account key if needed
+
+ ### "Automation not syncing"
+
+ **Problem**: Automation runner isn't working.
+
+ **Solution**:
+ 1. Check Space logs for error messages
+ 2. Verify the `id` values under `google_sheets.spreadsheets` in config.json match your actual sheet IDs
+ 3. Verify `sheet_name` values match the tab names in your spreadsheets
+ 4. Check that the service account has at least Viewer access to each sheet (Editor only if you write data back)
+
+ ### Space keeps crashing or restarting
+
+ **Problem**: HF Spaces free tier may have resource limits.
+
+ **Solution**:
+ 1. Consider upgrading to a paid Space for guaranteed uptime
+ 2. Increase the sync interval (e.g., 15-30 minutes instead of 5) so syncs run less often
+ 3. Check logs for memory/CPU issues
+
+ ---
+
+ ## Local Testing (Before Deploying)
+
+ Before deploying to HF Spaces, test locally:
+
+ ### 1. Create actual config files
+
+ ```bash
+ cd /path/to/dialect-map
+
+ # Copy examples and fill in real values
+ cp config.json.example config.json
+ cp credentials.json.example credentials.json
+
+ # Edit with your actual values
+ nano config.json
+ nano credentials.json
+ ```
+
+ ### 2. Test the app
+
+ ```bash
+ # Run the app locally
+ python app.py
+
+ # Should see:
+ # ✅ Created config.json from HF_CONFIG_JSON secret (if using .env)
+ # 🚀 Starting automation runner...
+ # 🌐 Starting web server on port 7860...
+
+ # Open browser to http://localhost:7860
+ ```
+
+ ### 3. Test with environment variables (simulating HF Spaces)
+
+ ```bash
+ # Create .env file
+ cp .env.example .env
+
+ # Edit .env with your actual JSON (minified)
+ nano .env
+
+ # Export the variables into this shell, then run (a separate python -c process cannot pass them on)
+ set -a && . ./.env && set +a && python app.py
+ ```
+
+ ---
+
+ ## Security Best Practices
+
+ 1. **Never commit secrets to git**
+    - Keep `config.json` and `credentials.json` in `.gitignore`
+    - Double-check before pushing code
+
+ 2. **Rotate credentials regularly**
+    - Generate new service account keys periodically
+    - Update HF Spaces secrets
+
+ 3. **Use minimal permissions**
+    - Service account should only have access to necessary sheets
+    - Use Viewer access if you don't need to write back
+
+ 4. **Monitor usage**
+    - Check Google Cloud Console for API usage
+    - Set up billing alerts
+    - Review Space logs regularly
+
+ ---
+
+ ## Need Help?
+
+ - **Google Cloud Issues**: [Google Cloud Support](https://cloud.google.com/support)
+ - **Hugging Face Spaces**: [HF Documentation](https://huggingface.co/docs/hub/spaces)
+ - **Project Issues**: Check the Space logs first, then review this guide
+
+ ---
+
+ **Happy Mapping! 🗺️**
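Reviewer note: the "Finding Your Spreadsheet ID" step in SECRETS_SETUP.md asks the reader to copy the ID out of the URL by eye. A minimal sketch of doing the same thing in code follows, assuming the URL has the `/spreadsheets/d/<ID>/...` shape shown in the guide. `spreadsheet_id` is a hypothetical helper and is not part of this commit.

```python
import re

# The ID is the path segment after /spreadsheets/d/ ; the character set
# (letters, digits, '-' and '_') is an assumption based on the guide's example.
_SHEET_ID = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def spreadsheet_id(url: str) -> str:
    """Extract the spreadsheet ID from a Google Sheets URL."""
    match = _SHEET_ID.search(url)
    if match is None:
        raise ValueError(f"no spreadsheet ID found in {url!r}")
    return match.group(1)


if __name__ == "__main__":
    example = "https://docs.google.com/spreadsheets/d/1AbC123XyZ456_Example_ID/edit#gid=0"
    print(spreadsheet_id(example))
```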
app.py ADDED
@@ -0,0 +1,184 @@
+ #!/usr/bin/env python3
+ """
+ Hugging Face Spaces Entry Point
+ This script:
+ 1. Loads secrets from HF Spaces environment variables
+ 2. Creates config.json and credentials.json from those secrets
+ 3. Starts the automation runner in the background
+ 4. Serves the static web interface
+ """
+
+ import os
+ import json
+ import subprocess
+ import signal
+ import sys
+ import time
+ import threading
+ from pathlib import Path
+ from http.server import HTTPServer, SimpleHTTPRequestHandler
+ from functools import partial
+
+ # Configuration
+ PORT = int(os.getenv('PORT', 7860))
+ BASE_DIR = Path(__file__).parent
+ AUTOMATION_RUNNER = BASE_DIR / "scripts" / "automation_runner.py"
+
+ # Global process reference for cleanup
+ automation_process = None
+
+
+ def load_secrets_from_env():
+     """Load secrets from HF Spaces environment variables and create config files"""
+     print("🔐 Loading secrets from environment variables...")
+
+     # Load config.json from environment
+     config_json_str = os.getenv('HF_CONFIG_JSON')
+     if config_json_str:
+         try:
+             config_data = json.loads(config_json_str)
+             config_file = BASE_DIR / "config.json"
+             with open(config_file, 'w') as f:
+                 json.dump(config_data, f, indent=2)
+             print(f"✅ Created config.json from HF_CONFIG_JSON secret")
+         except json.JSONDecodeError as e:
+             print(f"❌ Error parsing HF_CONFIG_JSON: {e}")
+             sys.exit(1)
+     else:
+         config_file = BASE_DIR / "config.json"
+         if not config_file.exists():
+             print("⚠️ HF_CONFIG_JSON not found in environment")
+             print("⚠️ Please set HF_CONFIG_JSON secret in your Hugging Face Space settings")
+             print("⚠️ See SECRETS_SETUP.md for instructions")
+             # Don't exit - allow the app to run without automation
+
+     # Load credentials.json from environment
+     credentials_json_str = os.getenv('HF_CREDENTIALS_JSON')
+     if credentials_json_str:
+         try:
+             credentials_data = json.loads(credentials_json_str)
+             credentials_file = BASE_DIR / "credentials.json"
+             with open(credentials_file, 'w') as f:
+                 json.dump(credentials_data, f, indent=2)
+             print(f"✅ Created credentials.json from HF_CREDENTIALS_JSON secret")
+         except json.JSONDecodeError as e:
+             print(f"❌ Error parsing HF_CREDENTIALS_JSON: {e}")
+             sys.exit(1)
+     else:
+         credentials_file = BASE_DIR / "credentials.json"
+         if not credentials_file.exists():
+             print("⚠️ HF_CREDENTIALS_JSON not found in environment")
+             print("⚠️ Google Sheets sync will not work without credentials")
+
+     print()
+
+
+ def start_automation():
+     """Start the automation runner in the background"""
+     global automation_process
+
+     # Check if automation script exists
+     if not AUTOMATION_RUNNER.exists():
+         print(f"⚠️ Automation runner not found: {AUTOMATION_RUNNER}")
+         return
+
+     # Check if config exists
+     config_file = BASE_DIR / "config.json"
+     if not config_file.exists():
+         print("⚠️ config.json not found, skipping automation startup")
+         return
+
+     print("🚀 Starting automation runner...")
+     try:
+         automation_process = subprocess.Popen(
+             [sys.executable, str(AUTOMATION_RUNNER)],
+             stdout=subprocess.PIPE,
+             stderr=subprocess.STDOUT,
+             text=True,
+             bufsize=1
+         )
+
+         # Stream automation output in a separate thread
+         def stream_output():
+             for line in automation_process.stdout:
+                 print(f"[AUTOMATION] {line}", end='')
+
+         threading.Thread(target=stream_output, daemon=True).start()
+         print("✅ Automation runner started\n")
+     except Exception as e:
+         print(f"❌ Failed to start automation: {e}\n")
+
+
+ def cleanup(signum=None, frame=None):
+     """Terminate the background automation process, then exit"""
+     print("\n\n🛑 Shutting down...")
+     if automation_process:
+         print("🧹 Stopping automation runner...")
+         automation_process.terminate()
+         try:
+             automation_process.wait(timeout=5)
+         except subprocess.TimeoutExpired:
+             automation_process.kill()  # escalate if the runner ignored SIGTERM
+     print("✅ Cleanup complete\n")
+     sys.exit(0)
+
+
+ class CustomHTTPRequestHandler(SimpleHTTPRequestHandler):
+     """Custom handler to serve from the correct directory"""
+
+     def __init__(self, *args, **kwargs):
+         super().__init__(*args, directory=str(BASE_DIR), **kwargs)
+
+     def log_message(self, format, *args):
+         """Custom logging to show requests"""
+         print(f"[WEB] {self.address_string()} - {format % args}")
+
+
+ def start_web_server():
+     """Start the HTTP server to serve the static files"""
+     print(f"🌐 Starting web server on port {PORT}...")
+
+     handler = CustomHTTPRequestHandler
+     httpd = HTTPServer(('0.0.0.0', PORT), handler)
+
+     print(f"✅ Web server running at http://0.0.0.0:{PORT}")
+     print(f"📊 Open the map: http://0.0.0.0:{PORT}/index.html")
+     print(f"💡 Press Ctrl+C to stop\n")
+
+     try:
+         httpd.serve_forever()
+     except KeyboardInterrupt:
+         pass
+     finally:
+         httpd.server_close()  # release the listening socket; serve_forever() has already returned
+
+
+ def main():
+     """Main entry point"""
+     print("=" * 70)
+     print("🗺️ Telugu Dialect Map - Hugging Face Spaces")
+     print("=" * 70)
+     print()
+
+     # Register signal handlers for graceful shutdown
+     signal.signal(signal.SIGINT, cleanup)
+     signal.signal(signal.SIGTERM, cleanup)
+
+     # Load secrets from environment variables
+     load_secrets_from_env()
+
+     # Start background automation
+     start_automation()
+
+     # Give automation a moment to start
+     time.sleep(2)
+
+     # Start web server (blocks here)
+     start_web_server()
+
+     # Cleanup (if we ever get here)
+     cleanup()
+
+
+ if __name__ == "__main__":
+     main()
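Reviewer note: `CustomHTTPRequestHandler` serves everything under `BASE_DIR`, which is also where `load_secrets_from_env()` writes `config.json` and `credentials.json`. As written, a request for `/credentials.json` would return the service account key. A minimal sketch of one possible guard follows. `SecretBlockingHandler`, `is_blocked`, and `BLOCKED_NAMES` are hypothetical names and are not part of this commit.

```python
import posixpath
import urllib.parse
from http.server import SimpleHTTPRequestHandler

# File names taken from this commit; extend the set as needed.
BLOCKED_NAMES = {"config.json", "credentials.json", ".env", ".env.local"}


def is_blocked(request_path: str) -> bool:
    """True when the final path segment names one of the secret files."""
    # Drop the query string and fragment, then percent-decode the path.
    path = urllib.parse.unquote(urllib.parse.urlsplit(request_path).path)
    name = posixpath.basename(path.rstrip("/"))
    return name.lower() in BLOCKED_NAMES


class SecretBlockingHandler(SimpleHTTPRequestHandler):
    """Static file handler that answers 404 for secret files."""

    def send_head(self):
        # send_head() backs both GET and HEAD in SimpleHTTPRequestHandler.
        if is_blocked(self.path):
            self.send_error(404, "File not found")
            return None
        return super().send_head()
```

A sturdier alternative would be to serve a dedicated directory that holds only the static site and to write the generated secret files outside it, so nothing depends on a deny list.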
config.json.example ADDED
@@ -0,0 +1,27 @@
+ {
+   "google_sheets": {
+     "enabled": true,
+     "sync_interval_minutes": 5,
+     "credentials_file": "credentials.json",
+     "spreadsheets": [
+       {
+         "id": "YOUR_SPREADSHEET_ID_1_HERE",
+         "sheet_name": "processed_dialects",
+         "output_file": "sheets_output/processed_dialects.csv"
+       },
+       {
+         "id": "YOUR_SPREADSHEET_ID_2_HERE",
+         "sheet_name": "digiwords_grouped",
+         "output_file": "sheets_output/digiwords_grouped.csv"
+       }
+     ]
+   },
+   "file_watcher": {
+     "enabled": true,
+     "watch_directory": "sheets_output",
+     "file_patterns": ["*.csv"]
+   },
+   "output": {
+     "json_directory": "data/processed"
+   }
+ }
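Reviewer note: `app.py` only checks that `HF_CONFIG_JSON` parses, so a config that still holds the template placeholders is accepted and fails later in the automation runner. A minimal sketch of a structural check follows, based only on the keys visible in `config.json.example`. `check_config` is a hypothetical helper, and the rules it applies are assumptions rather than the runner's actual requirements.

```python
def check_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed config.json."""
    problems = []
    sheets = config.get("google_sheets", {})

    interval = sheets.get("sync_interval_minutes")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(interval, bool) or not isinstance(interval, (int, float)) or interval <= 0:
        problems.append("google_sheets.sync_interval_minutes must be a positive number")

    entries = sheets.get("spreadsheets", [])
    if not entries:
        problems.append("google_sheets.spreadsheets is empty")
    for index, entry in enumerate(entries):
        for key in ("id", "sheet_name", "output_file"):
            if not entry.get(key):
                problems.append(f"spreadsheets[{index}].{key} is missing")
        if str(entry.get("id", "")).startswith("YOUR_"):
            problems.append(f"spreadsheets[{index}].id still holds the placeholder")
    return problems


if __name__ == "__main__":
    import json
    import sys

    with open(sys.argv[1] if len(sys.argv) > 1 else "config.json") as handle:
        for problem in check_config(json.load(handle)):
            print(problem)
```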
credentials.json.example ADDED
@@ -0,0 +1,12 @@
+ {
+   "type": "service_account",
+   "project_id": "your-project-id",
+   "private_key_id": "your-private-key-id",
+   "private_key": "-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY_HERE\n-----END PRIVATE KEY-----\n",
+   "client_email": "your-service-account@your-project.iam.gserviceaccount.com",
+   "client_id": "your-client-id",
+   "auth_uri": "https://accounts.google.com/o/oauth2/auth",
+   "token_uri": "https://oauth2.googleapis.com/token",
+   "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
+   "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/your-service-account%40your-project.iam.gserviceaccount.com"
+ }
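Reviewer note: the most common way to break `HF_CREDENTIALS_JSON` is to double the `\n` escapes in `private_key`, which leaves a literal backslash in the key after parsing. A minimal sketch of a check for that follows. `check_credentials` is a hypothetical helper, and treating the five listed fields as the required minimum is an assumption drawn from `credentials.json.example`.

```python
import json

# Field names taken from credentials.json.example.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_credentials(raw: str) -> list[str]:
    """Return a list of problems found in a service-account JSON string."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]

    key = data.get("private_key") or ""
    if "\\n" in key:
        # A backslash followed by 'n' survived parsing, so the escapes were doubled.
        problems.append("private_key contains literal backslash-n; escapes were doubled")
    elif key and "\n" not in key:
        problems.append("private_key has no line breaks")
    return problems


if __name__ == "__main__":
    import os

    raw_secret = os.getenv("HF_CREDENTIALS_JSON")
    if raw_secret:
        for problem in check_credentials(raw_secret):
            print(problem)
```

Running it in a shell where `HF_CREDENTIALS_JSON` is exported prints nothing when the value looks sound.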