Spaces: mohanbot799s/civicconnect-ai-engine

MOHAN799S committed on Mar 18

Commit 0c4bd2e (1 parent: 3d3f5e1)

fix: register BERT model folders as proper git submodules

Files changed (6)

.gitmodules +12 -0
README.md +939 -8
civicconnect-bert-en +1 -0
civicconnect-bert-indic +1 -0
civicconnect-urgency-en +1 -0
civicconnect-urgency-indic +1 -0

.gitmodules ADDED

	@@ -0,0 +1,12 @@

+[submodule "civicconnect-bert-en"]
+	path = civicconnect-bert-en
+	url = https://huggingface.co/mohanbot799s/civicconnect-bert-en
+[submodule "civicconnect-bert-indic"]
+	path = civicconnect-bert-indic
+	url = https://huggingface.co/mohanbot799s/civicconnect-bert-indic
+[submodule "civicconnect-urgency-en"]
+	path = civicconnect-urgency-en
+	url = https://huggingface.co/mohanbot799s/civicconnect-urgency-en
+[submodule "civicconnect-urgency-indic"]
+	path = civicconnect-urgency-indic
+	url = https://huggingface.co/mohanbot799s/civicconnect-urgency-indic

README.md CHANGED

@@ -1,11 +1,942 @@
 ---
-title: Civicconnect Ai Engine
-emoji: 😻
-colorFrom: yellow
-colorTo: green
-sdk: docker
-pinned: false
-license: mit
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# CivicConnect AI Engine
+> Multilingual Civic Grievance Classification API
+> Deployed on Hugging Face Spaces · Built for Kakinada Municipal Corporation
+---
+## Table of Contents
+1. [Project Overview](#1-project-overview)
+2. [System Architecture](#2-system-architecture)
+3. [Folder Structure](#3-folder-structure)
+4. [API Endpoints](#4-api-endpoints)
+5. [Input Modes](#5-input-modes)
+6. [Grievance Categories](#6-grievance-categories)
+7. [Grievance Validation Logic](#7-grievance-validation-logic)
+8. [Image Pipeline](#8-image-pipeline)
+9. [Audio Pipeline](#9-audio-pipeline)
+10. [Language Support](#10-language-support)
+11. [Priority Engine (XPE)](#11-priority-engine-xpe)
+12. [Explainability (Integrated Gradients)](#12-explainability-integrated-gradients)
+13. [Fairness Audit (GFAS)](#13-fairness-audit-gfas)
+14. [Hotspot Forecasting](#14-hotspot-forecasting)
+15. [Location Validation](#15-location-validation)
+16. [Ward Bounding Boxes](#16-ward-bounding-boxes)
+17. [Models Used](#17-models-used)
+18. [Requirements](#18-requirements)
+19. [Environment Variables](#19-environment-variables)
+20. [Running Locally](#20-running-locally)
+21. [Deploying to Hugging Face Spaces](#21-deploying-to-hugging-face-spaces)
+22. [API Request & Response Examples](#22-api-request--response-examples)
+23. [Error Codes Reference](#23-error-codes-reference)
+24. [Testing Grievance Inputs](#24-testing-grievance-inputs)
+25. [Known Limitations](#25-known-limitations)
+---
+## 1. Project Overview
+CivicConnect AI Engine is the machine learning backend for the CivicConnect platform — a civic grievance management system built for Kakinada Municipal Corporation, Andhra Pradesh, India.
+Citizens submit complaints about public infrastructure issues via text, voice, or photo. This engine:
+- Classifies the grievance into one of 8 civic categories
+- Detects urgency level
+- Computes a priority score for routing
+- Validates the image location against Kakinada ward boundaries
+- Generates explainability tokens showing why the classification was made
+- Detects and rejects non-grievance inputs (greetings, observations, off-topic messages)
+- Supports English, Hindi, and Telugu
+The API is consumed by the CivicConnect Node.js/Express backend. MongoDB storage and Cloudinary media handling are managed by the Express layer — this engine handles only ML inference.
+---
+## 2. System Architecture
+```
+Citizen App (React Native / Web)
+           │
+           ▼
+  Express / Node.js Backend
+  (MongoDB · Cloudinary · Auth)
+           │
+           ▼  HTTP POST
+  ┌─────────────────────────────────────────┐
+  │         CivicConnect AI Engine          │
+  │              (Flask API)                │
+  │                                         │
+  │  ┌─────────────────────────────────┐    │
+  │  │     /predict  (main endpoint)   │    │
+  │  │                                 │    │
+  │  │  Input Mode Detection           │    │
+  │  │    A: Image only                │    │
+  │  │    B: Audio only                │    │
+  │  │    C: Text only                 │    │
+  │  │    D: Text + Image (evidence)   │    │
+  │  │    E: Audio + Image (evidence)  │    │
+  │  │                                 │    │
+  │  │  Grievance Validation Gate      │    │
+  │  │  ├─ Reject greetings/fillers    │    │
+  │  │  ├─ Detect civic topic          │    │
+  │  │  └─ Detect animal harm          │    │
+  │  │                                 │    │
+  │  │  OpenCV + GIT-large (image)     │    │
+  │  │  Whisper (audio)                │    │
+  │  │                                 │    │
+  │  │  BERT / IndicBERT               │    │
+  │  │  ├─ Category classification     │    │
+  │  │  └─ Urgency classification      │    │
+  │  │                                 │    │
+  │  │  XPE Priority Engine            │    │
+  │  │  IG Explainability              │    │
+  │  └─────────────────────────────────┘    │
+  │                                         │
+  │  /fairness-audit  (GFAS)                │
+  │  /hotspot-forecast  (Prophet)           │
+  └─────────────────────────────────────────┘
+```
+---
+## 3. Folder Structure
+```
+civicconnect-ai-engine/
+│
+├── app.py                          # Main Flask API — all endpoints
+│
+├── classification/
+│   ├── bert_classify.py            # English BERT category classifier
+│   └── indic_bert_classify.py      # Hindi/Telugu IndicBERT classifier
+│
+├── sentiment_analysis/
+│   ├── bert_predict.py             # English BERT urgency classifier
+│   └── indic_bert_predict.py       # Hindi/Telugu urgency classifier
+│
+├── multi_modal/
+│   ├── image_to_text.py            # OpenCV preprocessing + GIT-large captioning + EasyOCR
+│   └── audio_to_text.py            # Whisper audio transcription
+│
+├── xpe/
+│   ├── priority_engine.py          # Computes priority score + band
+│   ├── integrated_gradients_explainer.py  # IG token attribution
+│   └── hybrid_explainer.py         # Generates human-readable explanation
+│
+├── gfas/
+│   └── __init__.py                 # Grievance Fairness Audit System
+├── requirements.txt                # Python dependencies
+└── README.md                       # This file
+```
+---
+## 4. API Endpoints
+### `GET /`
+Health check. Returns API version, status, and available endpoints.
+### `GET /health`
+Lightweight liveness probe. Returns `{"status": "ok"}`.
+### `POST /predict`
+Main inference endpoint. Accepts text, audio, image, or combinations.
+Content-Type: `multipart/form-data` or `application/json`.
+### `POST /fairness-audit`
+Runs GFAS fairness audit on a batch of grievances.
+Content-Type: `application/json`.
+### `POST /hotspot-forecast`
+Runs Prophet time-series forecasting to predict civic issue hotspots.
+Content-Type: `application/json`.
+---
+## 5. Input Modes
+The `/predict` endpoint auto-detects which mode to use based on what fields are present in the request.
+| Mode | Fields Sent | Description |
+|------|-------------|-------------|
+| A | `image` only | Image with GPS — location validated, GIT caption extracted |
+| B | `audio` only | Audio file — transcribed via Whisper, then classified |
+| C | `text` only | Plain text complaint — validated and classified |
+| D | `text` + `image` | Text is the grievance, image is location evidence |
+| E | `audio` + `image` | Audio is the grievance, image is location evidence |
+**Mode A** — GPS location is validated against Kakinada Municipal Corporation boundaries. Hard reject if outside jurisdiction or no GPS data.
+**Modes D & E** — Text/audio is the primary grievance. Image is evidence only — soft-flagged if non-civic, never a hard reject.
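The mode table above can be read as a small decision function. The sketch below is an assumption about how that detection might look, not the code in `app.py`: the name `detect_input_mode` is hypothetical, and so is the handling of combinations the table does not list (for example text plus audio).

```python
from typing import Optional


def detect_input_mode(has_text: bool, has_audio: bool, has_image: bool) -> Optional[str]:
    """Return the mode letter from the table above, or None if no mode applies.

    Hypothetical helper: combinations the table does not list
    (e.g. text + audio) are treated as unsupported in this sketch.
    """
    if has_text and has_audio:
        return None  # not covered by the table; an assumption of this sketch
    if has_text:
        return "D" if has_image else "C"
    if has_audio:
        return "E" if has_image else "B"
    if has_image:
        return "A"
    return None  # nothing supplied (the API reports this as `missing_input`)
```

For example, `detect_input_mode(True, False, True)` returns `"D"`: the text is the grievance and the image is treated as evidence.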
+---
+## 6. Grievance Categories
+The classifier outputs one of these 8 categories:
+| Category | Examples |
+|----------|---------|
+| `electricity` | Broken streetlight, fallen pole, dangling wire, power cut |
+| `garbage` | Uncollected waste, overflowing bin, garbage dumped on road |
+| `pollution` | Factory smoke, burning garbage, chemical spill |
+| `public transport` | Broken bus stop, auto stand encroachment, accident scene |
+| `roads` | Pothole, road crack, footpath broken, road excavation |
+| `sanitation` | Open manhole, blocked drain, sewage overflow, open defecation |
+| `stray animals` | Dogs biting residents, cattle blocking road, animal carcass |
+| `water` | No water supply, pipe burst, waterlogging, flooded road |
+---
+## 7. Grievance Validation Logic
+User-typed text (Mode C and Mode D) goes through a three-stage validation gate before classification. Machine-generated text (GIT captions, Whisper transcripts) skips the intent check.
+### Stage 1 — Conversational rejection
+Full-string anchored pattern. Fires only when the **entire** input is a greeting, filler, or non-content phrase.
+```
+"Good morning"     → REJECTED  (full match)
+"Hi"               → REJECTED  (full match)
+"Namaste"          → REJECTED  (full match)
+"Good morning, pothole on road"  → PASSES  (has content after greeting)
+```
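A full-string anchored check of this kind could be sketched with `re.fullmatch`. The phrase list below is a short, hypothetical stand-in (the engine's real pattern is not shown in this README), and `is_pure_conversational` is an illustrative name.

```python
import re

# Hypothetical, deliberately short phrase list; the engine's own pattern is not shown here.
_CONVERSATIONAL = re.compile(
    r"(?:hi|hello|hey|namaste|good\s+(?:morning|afternoon|evening)|thank\s+you|ok|test)"
    r"[\s.!,]*",
    re.IGNORECASE,
)


def is_pure_conversational(text: str) -> bool:
    """True only when the ENTIRE input is a greeting or filler phrase."""
    return _CONVERSATIONAL.fullmatch(text.strip()) is not None
```

Because `fullmatch` must consume the whole string, `"Good morning, pothole on road"` is not rejected: the text after the greeting prevents a full match.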
+### Stage 2a — Animal harm pattern
+Self-contained check. Fires when animal + harm verb + victim are all present within 50 characters of each other. No separate civic noun required.
+```
+"dogs biting people"                     → GRIEVANCE ✅
+"stray dogs attacked my child"           → GRIEVANCE ✅
+"there are lots of dogs in the area biting people"  → GRIEVANCE ✅
+"dogs are barking at night"              → NOT (no harm + victim)
+"there are dogs in the area"             → NOT (no harm signal)
+```
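One way to express "animal, harm verb, and victim close together" is a single regular expression with bounded gaps. This is a sketch under assumptions: the word lists are illustrative, and the 50-character window is applied to each gap (animal to verb, verb to victim), which may differ from how the engine measures distance.

```python
import re

# Illustrative word lists (assumptions); the engine's actual vocabulary is not shown here.
_ANIMAL = r"(?:dogs?|cattle|cows?|monkeys?|pigs?)"
_HARM = r"(?:bit(?:e|es|ing|ten)?|attack(?:s|ed|ing)?|chas(?:e|es|ed|ing))"
_VICTIM = r"(?:people|residents?|child(?:ren)?|kids?|me|us)"

# animal ... harm verb ... victim, with at most 50 characters in each gap
_ANIMAL_HARM = re.compile(
    rf"\b{_ANIMAL}\b.{{0,50}}?\b{_HARM}\b.{{0,50}}?\b{_VICTIM}\b",
    re.IGNORECASE,
)


def mentions_animal_harm(text: str) -> bool:
    """True when an animal, a harm verb, and a victim appear in that order."""
    return _ANIMAL_HARM.search(text) is not None
```

With these lists, `"dogs are barking at night"` does not match because no harm verb is present, so the decision falls through to the civic topic check.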
+### Stage 2b — Civic topic presence
+A civic infrastructure term alone is sufficient. The user observing a civic issue IS reporting it — formal complaint language is not required.
+```
+"Hello, I can see garbage on the road"   → GRIEVANCE ✅
+"Hi, the road has a pothole"             → GRIEVANCE ✅
+"there is water on the street"           → GRIEVANCE ✅ (waterlogging)
+"I see a broken pipe nearby"             → GRIEVANCE ✅
+"I notice the streetlight is off"        → GRIEVANCE ✅
+"dogs are barking at night"              → NOT ❌ (not a civic topic)
+"there are people on the road"           → NOT ❌ (no civic topic)
+```
+### Civic topic terms (selected)
+Roads: `pothole`, `road damage`, `footpath broken`, `pavement crack`
+Water: `waterlogging`, `pipe burst`, `drain overflow`, `sewage overflow`, `water on the road/street`
+Electricity: `streetlight`, `fallen electric pole`, `live wire`, `dangling wire`
+Garbage: `garbage`, `waste`, `overflowing bin`, `garbage dump`
+Sanitation: `manhole`, `drain blocked`, `sewage`, `open sewer`
+Animals: `stray dogs`, `cattle blocking`, `stray animal`
+Pollution: `smoke`, `pollution`, `burning garbage`, `chemical spill`
+### What gets rejected
+| Input | Reason |
+|-------|--------|
+| `Good morning` | Pure greeting (Stage 1) |
+| `hi` | Pure greeting (Stage 1) |
+| `test` | Test input (Stage 1) |
+| `thank you` | Filler (Stage 1) |
+| `dogs are barking at night` | No civic topic (Stage 2b) |
+| `there are people on the road` | No civic topic (Stage 2b) |
+| `I see a car on the street` | No civic topic (Stage 2b) |
+| `nice day today` | No civic topic (Stage 2b) |
+---
+## 8. Image Pipeline
+**File:** `multi_modal/image_to_text.py`
+Images are processed through a 5-step pipeline. The output is a natural language description sent to BERT for classification.
+### Step 1 — OpenCV Preprocessing (9 techniques)
+OpenCV is used for all image processing before model inference.
+| # | Technique | Purpose |
+|---|-----------|---------|
+| 1 | EXIF auto-orientation | Fixes sideways/upside-down phone photos |
+| 2 | Resize LANCZOS4 (≤1024px) | Optimal input size for GIT model |
+| 3 | NL-means denoising | Removes phone camera sensor noise |
+| 4 | Gray-world white balance | Corrects colour casts (tungsten, fluorescent, overcast) |
+| 5 | CLAHE on LAB L-channel | Adaptive contrast for dark/overexposed shots |
+| 6 | Adaptive gamma correction | Brightens night shots, dampens overexposed ones |
+| 7 | Bilateral filter | Edge-preserving smooth (keeps structural edges sharp) |
+| 8 | Unsharp mask sharpening | Recovers blurry edges from phone camera motion |
+| 9 | Percentile contrast stretch | Eliminates washed-out highlights |
+### Step 2 — EasyOCR (EN + HI + TE)
+Extracts any printed or handwritten text visible in the image. Useful when the photo contains a complaint notice, signboard, or label. Returns empty string if nothing meaningful is found (minimum 6 characters).
+### Step 3 — Microsoft GIT-large-coco Captioning
+Model: `microsoft/git-large-coco` (~700 MB)
+GIT (Generative Image-to-text Transformer) generates an unconditional visual description of what the image contains. No text prompt is used — the model describes freely based on what it sees.
+**Why GIT over BLIP-base:**
+| Model | Caption for pothole image | Problem |
+|-------|--------------------------|---------|
+| BLIP-base | "a road with cars on it" | Too generic |
+| GIT-large-coco | "a large hole in the middle of a cracked road surface" | Specific and accurate |
+BLIP-base was trained on web images and produces one-line generic captions. GIT-large is more accurate for real-world outdoor civic scenes including roads, drains, garbage piles, and broken infrastructure.
+### Step 4 — Civic Grievance Scorer
+Scores the GIT caption against a weighted civic keyword lexicon:
+- Primary terms (specific problem language): **score +2**
+- Secondary terms (supporting context): **score +1**
+- Minimum threshold: **score ≥ 2** to flag as civic
+Non-civic captions (selfies, food, nature, indoor scenes) are detected by override patterns and flagged. This score populates the `civic_score` and `evidence_relevant` fields in the response. It never modifies the text sent to BERT.
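The weighting rule can be illustrated with a few lines of Python. The term lists here are tiny hypothetical samples, so the scores will not match the engine's real lexicon, and the non-civic override patterns are not modelled.

```python
# Tiny illustrative lexicon (assumption); the engine's real term lists are larger.
PRIMARY_TERMS = ("pothole", "garbage", "sewage", "manhole")  # +2 each
SECONDARY_TERMS = ("road", "street", "drain", "pile")        # +1 each
CIVIC_THRESHOLD = 2                                          # score >= 2 flags as civic


def civic_score(caption: str) -> int:
    """Sum +2 per primary term and +1 per secondary term found in the caption."""
    text = caption.lower()
    score = 2 * sum(term in text for term in PRIMARY_TERMS)
    score += sum(term in text for term in SECONDARY_TERMS)
    return score


def is_civic(caption: str) -> bool:
    """True when the caption reaches the civic threshold."""
    return civic_score(caption) >= CIVIC_THRESHOLD
```

With this sample lexicon, a caption that mentions only "road" scores 1 and stays below the threshold, while one that mentions "garbage" passes on the primary term alone.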
+### Step 5 — Clean Fusion (OCR + Caption)
+```
+OCR > 20 chars  → OCR is primary (actual text from image)
+                   Caption appended only if it adds new information
+OCR short/none  → GIT caption is the full output
+Both empty      → return ""  (image unreadable)
+```
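The fusion rules above map onto a short function. This is a sketch, not the code in `image_to_text.py`: the function name is hypothetical, and "adds new information" is approximated here as "the caption contains at least one word the OCR text lacks".

```python
def fuse_ocr_and_caption(ocr_text: str, caption: str, min_ocr_chars: int = 20) -> str:
    """Combine OCR text and an image caption following the rules above."""
    ocr_text, caption = ocr_text.strip(), caption.strip()
    if len(ocr_text) > min_ocr_chars:
        # OCR is primary; append the caption only if it contributes new words.
        # (Word-overlap test is an assumption of this sketch.)
        ocr_words = set(ocr_text.lower().split())
        new_words = [w for w in caption.lower().split() if w not in ocr_words]
        if new_words:
            return f"{ocr_text}. {caption}"
        return ocr_text
    # OCR short or missing: the caption is the full output ("" if both are empty).
    return caption
```

An empty return value corresponds to the unreadable-image case, which the API reports as `image_unreadable`.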
+### HF API fallback
+When `IMAGE_BACKEND=hf_api`, the preprocessed image is sent to HuggingFace Inference API (`blip-image-captioning-large`). GIT is not available on the HF Inference API. OpenCV preprocessing still runs before the API call.
+---
+## 9. Audio Pipeline
+**File:** `multi_modal/audio_to_text.py`
+Audio files are transcribed using OpenAI Whisper. The transcript is treated as machine-generated text, so the grievance intent check is skipped and only length and junk validation apply.
+Supported formats: WAV, MP3, M4A, OGG, FLAC (via `pydub` conversion).
+---
+## 10. Language Support
+| Language | Script detection | Models used |
+|----------|-----------------|-------------|
+| English | Default (no script match) | `civicconnect-bert-en`, `civicconnect-urgency-en` |
+| Hindi | Unicode range U+0900–U+097F | `civicconnect-bert-indic`, `civicconnect-urgency-indic` |
+| Telugu | Unicode range U+0C00–U+0C7F | `civicconnect-bert-indic`, `civicconnect-urgency-indic` |
+Language is auto-detected from the grievance text. The correct model pair is selected automatically — no language parameter needed in the request.
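Script-range detection of this kind fits in a few lines. The sketch below uses the Unicode ranges and model names from the table; the function names are hypothetical, and "first Indic character wins" is an assumption about how mixed input is resolved.

```python
# Model pairs taken from the table above: (category model, urgency model).
MODEL_PAIRS = {
    "english": ("civicconnect-bert-en", "civicconnect-urgency-en"),
    "hindi": ("civicconnect-bert-indic", "civicconnect-urgency-indic"),
    "telugu": ("civicconnect-bert-indic", "civicconnect-urgency-indic"),
}


def detect_language(text: str) -> str:
    """Return 'hindi', 'telugu', or 'english' based on Unicode script ranges."""
    for ch in text:
        code = ord(ch)
        if 0x0900 <= code <= 0x097F:   # Devanagari block
            return "hindi"
        if 0x0C00 <= code <= 0x0C7F:   # Telugu block
            return "telugu"
    return "english"                   # default when no script range matches


def select_models(text: str) -> tuple:
    """Pick the (category, urgency) model pair for the detected language."""
    return MODEL_PAIRS[detect_language(text)]
```

Romanised input such as `"Kachra nahi utha hai"` contains no characters from either block, so it falls through to the English pair.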
+**Hindi grievance validation keywords (sample):**
+`समस्या`, `शिकायत`, `बिजली`, `पानी`, `सड़क`, `कचरा`, `नाली`
+**Telugu grievance validation keywords (sample):**
+`సమస్య`, `ఫిర్యాదు`, `విద్యుత్`, `నీరు`, `రోడ్డు`, `చెత్త`, `మురుగు`
+---
+## 11. Priority Engine (XPE)
+**File:** `xpe/priority_engine.py`
+Computes a numeric priority score (0–100) and a priority band for routing the grievance to the right department queue.
+Inputs: `category`, `urgency`, `urgency_confidence`
+| Priority Band | Score Range | Meaning |
+|---------------|-------------|---------|
+| `Critical` | 75–100 | Immediate action required |
+| `High` | 50–74 | Resolve within 24 hours |
+| `Medium` | 25–49 | Resolve within 3 days |
+| `Low` | 0–24 | Routine queue |
+Certain category + urgency combinations automatically elevate priority — for example, `stray animals` + `high urgency` (biting incident) or `electricity` + `high urgency` (live wire on road).
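The band table can be read as a threshold lookup. The sketch below covers only the score-to-band step; how the score itself is computed from `category`, `urgency`, and `urgency_confidence` is not described here. Treating the lower bounds as inclusive, so that a fractional score such as 74.2 falls in `High`, is an assumption.

```python
def priority_band(score: float) -> str:
    """Map a 0-100 priority score to its band (lower bounds assumed inclusive)."""
    if not 0 <= score <= 100:
        raise ValueError("priority score must be between 0 and 100")
    if score >= 75:
        return "Critical"
    if score >= 50:
        return "High"
    if score >= 25:
        return "Medium"
    return "Low"
```

For example, `priority_band(74.2)` returns `"High"` and `priority_band(75)` returns `"Critical"`.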
+---
+## 12. Explainability (Integrated Gradients)
+**Files:** `xpe/integrated_gradients_explainer.py`, `xpe/hybrid_explainer.py`
+When `explain=true` is sent in the request, Integrated Gradients attribution is computed for both the category and urgency predictions.
+**What it returns:**
+```json
+"explanation": {
+  "category_tokens": [
+    {"token": "pothole", "score": 0.87},
+    {"token": "road", "score": 0.64}
+  ],
+  "urgency_tokens": [
+    {"token": "since", "score": 0.71},
+    {"token": "3", "score": 0.68},
+    {"token": "days", "score": 0.65}
+  ],
+  "category_decision": "Classified as 'roads' because of strong signals: pothole, road damage",
+  "urgency_decision": "Urgency is 'high' because complaint has been pending for a duration",
+  "priority_summary": "High priority — road infrastructure issue with time-based urgency",
+  "final_reason": "Grievance about road damage (pothole) pending since 3 days. Routed as High priority."
+}
+```
+Integrated Gradients computes the contribution of each input token to the final prediction by interpolating between a baseline (zero embedding) and the actual input. It is the only explainability method used — SHAP was evaluated and removed due to BERT incompatibility.
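To make the interpolation idea concrete, here is a toy example in plain Python. It is not the engine's explainer (which works on BERT embeddings via `captum`): the "model" is a hypothetical logistic function over three features, and the path integral is approximated with a midpoint Riemann sum.

```python
import math


def model(x, w):
    """Toy model: sigmoid of a weighted sum (a stand-in for a classifier logit)."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))


def model_grad(x, w):
    """Analytic gradient of the toy model with respect to each input feature."""
    p = model(x, w)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, w, baseline=None, steps=200):
    """Approximate IG: (x - baseline) times the average gradient along the path."""
    if baseline is None:
        baseline = [0.0] * len(x)  # zero baseline, as described above
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, model_grad(point, w))]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x, w = [1.0, 0.5, 0.0], [2.0, -1.0, 3.0]
attributions = integrated_gradients(x, w)
```

A useful sanity check is the completeness property: the attributions should sum (approximately) to `model(x, w) - model(baseline, w)`. A feature equal to its baseline value, like the third one here, receives zero attribution.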
 ---
+## 13. Fairness Audit (GFAS)
+**File:** `gfas/__init__.py`
+**Endpoint:** `POST /fairness-audit`
+GFAS (Grievance Fairness Audit System) audits a batch of grievance records for demographic or geographic bias in classification and priority assignment.
+**Request body:**
+```json
+{
+  "grievances": [
+    {
+      "id": "abc123",
+      "text": "Pothole on main road",
+      "category": "roads",
+      "urgency": "high",
+      "priority_score": 72,
+      "area": "Gandhi Nagar",
+      "language": "english"
+    }
+  ]
+}
+```
+**Returns:** Fairness metrics, disparity scores by area/language, and flagged anomalies.
+---
+## 14. Hotspot Forecasting
+**Endpoint:** `POST /hotspot-forecast`
+Uses Facebook Prophet time-series forecasting to predict which area+category combinations are likely to see increased grievance volumes.
+**Request body:**
+```json
+{
+  "grievances": [...],
+  "horizon_days": 7,
+  "top_n": 10,
+  "source_window_days": 45
+}
+```
+**How the risk score is computed:**
+```
+raw_risk = 0.5 × (growth%) + 0.3 × (avg_priority) + 0.2 × (recent_avg / 5)
+risk_100 = 100 / (1 + e^(-raw_risk))   ← sigmoid normalisation to 0–100
+```
+| Risk Level | Score |
+|------------|-------|
+| Critical | ≥ 75 |
+| High | ≥ 50 |
+| Medium | ≥ 25 |
+| Low | < 25 |
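The formula and the level table translate directly into Python. The function names are hypothetical, and the units of the three inputs (for example whether growth is a fraction or a percentage) are not stated in this README, so the sketch takes them as given.

```python
import math


def risk_score(growth_pct: float, avg_priority: float, recent_avg: float) -> float:
    """Weighted raw risk passed through a sigmoid and scaled to 0-100."""
    raw_risk = 0.5 * growth_pct + 0.3 * avg_priority + 0.2 * (recent_avg / 5)
    raw_risk = max(-700.0, min(700.0, raw_risk))  # keep math.exp within float range
    return 100.0 / (1.0 + math.exp(-raw_risk))


def risk_level(score: float) -> str:
    """Map a 0-100 risk score to the level table above."""
    if score >= 75:
        return "Critical"
    if score >= 50:
        return "High"
    if score >= 25:
        return "Medium"
    return "Low"
```

Note that the sigmoid saturates quickly: a raw risk of 0 maps to exactly 50, and a raw risk of about 5 already maps to more than 99. The inputs therefore need to be on a small scale for the four levels to be distinguishable; the scaling the engine applies is not documented here.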
+Prophet requires a minimum of 2 unique dates per area+category group. Groups with fewer data points are skipped. Forecasting runs in parallel via `ThreadPoolExecutor` (default 4 workers, configurable via `PROPHET_MAX_WORKERS`).
+---
+## 15. Location Validation
+**File:** `app.py` — `resolve_location_status()`
+All images are validated for GPS location before processing.
+### Validation flow
+```
+1. Extract GPS from EXIF metadata (piexif)
+      ↓ if no EXIF
+2. Read lat/lng from form fields (latitude, longitude)
+      ↓ if none supplied
+3. Return status="no_gps" → request rejected (Mode A)
+4. Kakinada boundary check:
+   16.85°N–17.10°N, 82.00°E–82.35°E
+      ↓ if outside
+5. Return status="invalid" → request rejected
+6. Ward bounding box check (if area field supplied)
+   Tolerance: ±0.015° (~1.5 km)
+      ↓ if GPS doesn't match declared ward
+7. Return status="invalid" with specific ward mismatch message
+```
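Steps 3 to 7 of the flow can be sketched as a pure function over already-resolved coordinates (EXIF or form fields). This is not `resolve_location_status()` itself: the function names are hypothetical, only two sample wards from this README are included, and skipping the ward check for an unknown ward name is an assumption.

```python
from typing import Optional

# (lat_min, lat_max, lon_min, lon_max), values taken from this README
KAKINADA_BOUNDS = (16.85, 17.10, 82.00, 82.35)
WARD_TOLERANCE_DEG = 0.015
SAMPLE_WARDS = {
    "gandhi nagar": (16.975, 17.005, 82.240, 82.270),
    "kakinada port area": (16.940, 16.970, 82.260, 82.300),
}


def in_box(lat: float, lon: float, box: tuple, tol: float = 0.0) -> bool:
    """True when (lat, lon) lies inside the box, widened by tol on every side."""
    lat_min, lat_max, lon_min, lon_max = box
    return (lat_min - tol <= lat <= lat_max + tol) and (lon_min - tol <= lon <= lon_max + tol)


def location_status(lat: Optional[float], lon: Optional[float], area: Optional[str] = None) -> str:
    """Return 'no_gps', 'invalid', or 'valid' following the flow above."""
    if lat is None or lon is None:
        return "no_gps"
    if not in_box(lat, lon, KAKINADA_BOUNDS):
        return "invalid"  # outside the municipal boundary
    if area:
        box = SAMPLE_WARDS.get(area.strip().lower())
        # Unknown ward names skip the ward check (an assumption of this sketch).
        if box is not None and not in_box(lat, lon, box, WARD_TOLERANCE_DEG):
            return "invalid"  # GPS does not match the declared ward
    return "valid"
```

For example, `location_status(16.99, 82.25, "Gandhi Nagar")` returns `"valid"`, while the same ward with a point in the port area (`16.95, 82.29`) returns `"invalid"`.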
+### Location behaviour by mode
+| Mode | Location failure | Action |
+|------|-----------------|--------|
+| A (image only) | Invalid or no GPS | **Hard reject** — 403 response |
+| D (text + image) | Invalid GPS | **Soft flag** — `location: "invalid"` in response, grievance still processed |
+| E (audio + image) | Invalid GPS | **Soft flag** — same as Mode D |
+---
+## 16. Ward Bounding Boxes
+49 Kakinada Municipal Corporation wards are defined with bounding box coordinates (lat_min, lat_max, lon_min, lon_max). A ±0.015° tolerance (~1.5 km) is applied to account for GPS drift.
+Sample wards defined:
+| Ward | Lat Range | Lon Range |
+|------|-----------|-----------|
+| Suryaraopeta | 16.980–17.010 | 82.230–82.260 |
+| Gandhi Nagar | 16.975–17.005 | 82.240–82.270 |
+| Old Town | 16.990–17.020 | 82.220–82.250 |
+| Kakinada Port Area | 16.940–16.970 | 82.260–82.300 |
+| Surampalem | 17.075–17.105 | 82.050–82.085 |
+| JNTU Kakinada Area | 16.950–16.980 | 82.260–82.300 |
+| ... | ... | ... |
+Full list of all 49 wards is defined in `WARD_BOUNDS` in `app.py`.
+---
+## 17. Models Used
+| Model | Purpose | Size | Source |
+|-------|---------|------|--------|
+| `civicconnect-bert-en` | English category classification | ~440 MB | Fine-tuned BERT (HF submodule) |
+| `civicconnect-bert-indic` | Hindi/Telugu category classification | ~580 MB | Fine-tuned IndicBERT (HF submodule) |
+| `civicconnect-urgency-en` | English urgency classification | ~440 MB | Fine-tuned BERT (HF submodule) |
+| `civicconnect-urgency-indic` | Hindi/Telugu urgency classification | ~580 MB | Fine-tuned IndicBERT (HF submodule) |
+| `microsoft/git-large-coco` | Image captioning | ~700 MB | HuggingFace Hub |
+| EasyOCR (en+hi+te) | OCR from images | ~400 MB | PyPI |
+| Whisper | Audio transcription | varies | OpenAI via HF |
+| Prophet | Hotspot time-series forecast | lightweight | Meta / PyPI |
+---
+## 18. Requirements
+```
+# Core ML
+torch
+transformers>=4.47.0,<4.50.0    # Pin — 4.50+ breaks GIT trust_remote_code
+tokenizers>=0.20.3,<0.22
+accelerate>=1.1.0
+safetensors>=0.4.3
+huggingface-hub>=0.26.0
+# Image
+opencv-python-headless           # 9-technique preprocessing pipeline
+Pillow
+piexif                           # EXIF GPS extraction
+easyocr                          # OCR (EN + HI + TE)
+# Audio
+pydub
+soundfile
+scipy
+# NLP
+sentencepiece
+tiktoken
+protobuf>=5.28.0
+regex
+nltk
+indic-nlp-library
+stopwordsiso
+# Explainability
+captum                           # Integrated Gradients
+shap>=0.44
+# Forecasting
+prophet
+# Data
+pandas
+numpy
+scikit-learn==1.5.2
+matplotlib
+seaborn
+# Backend
+flask
+flask-cors
+gunicorn
+werkzeug
+python-dotenv
+requests
+```
+> **Note:** `transformers` is pinned to `<4.50.0`. Versions 4.50 and above changed `GenerationMixin` inheritance in a way that breaks GIT's remote code loading, causing `AttributeError: _supports_sdpa`.
+---
+## 19. Environment Variables
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `PORT` | `7860` | Flask server port (HF Spaces uses 7860) |
+| `FLASK_DEBUG` | `false` | Enable Flask debug mode |
+| `MAX_UPLOAD_MB` | `32` | Maximum image/audio upload size in MB |
+| `IMAGE_BACKEND` | `local` | `local` = GIT runs on server, `hf_api` = HF Inference API |
+| `HF_TOKEN` | `""` | HuggingFace token (required when `IMAGE_BACKEND=hf_api`) |
+| `GIT_MODEL` | `microsoft/git-large-coco` | GIT model repo ID |
+| `PROPHET_MAX_WORKERS` | `4` | Thread pool size for hotspot forecasting |
+| `APP_VERSION` | `1.0.0` | Shown in health check response |
+---
+## 20. Running Locally
+### Prerequisites
+- Python 3.10+
+- pip
+### Setup
+```bash
+# Clone the repo
+git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+cd YOUR_SPACE_NAME
+# Install dependencies
+pip install -r requirements.txt
+# Pull submodules (BERT model weights)
+git submodule update --init --recursive
+```
+### Run
+```bash
+python app.py
+```
+API will be available at `http://localhost:7860`
+### Test
+```bash
+# Health check
+curl http://localhost:7860/health
+# Text grievance
+curl -X POST http://localhost:7860/predict \
+  -H "Content-Type: application/json" \
+  -d '{"text": "There is a pothole on main road not fixed since 3 weeks"}'
+# Image + text
+curl -X POST http://localhost:7860/predict \
+  -F "text=Garbage not collected since 5 days" \
+  -F "image=@/path/to/photo.jpg" \
+  -F "area=gandhi nagar"
+```
+---
+## 21. Deploying to Hugging Face Spaces
+### One-time setup
+```bash
+# Install HF CLI
+pip install huggingface_hub
+# Login (get token from https://huggingface.co/settings/tokens)
+huggingface-cli login
+# Add HF remote (if not already set)
+git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+```
+### Push changes
+```bash
+# Stage changed files
+git add app.py
+git add multi_modal/image_to_text.py
+git add requirements.txt
+# Commit
+git commit -m "your commit message"
+# Push
+git push origin main
+```
+### Roll back to a specific commit
+```bash
+# Reset to a specific commit and force push (this overwrites the history on the remote)
+git reset --hard COMMIT_HASH
+git push origin main --force
+```
+### Troubleshooting
+| Problem | Fix |
+|---------|-----|
+| `Authentication failed` | Run `huggingface-cli login` again with a Write token |
+| `rejected — non-fast-forward` | Run `git pull origin main --rebase` first |
+| Space stuck on Building | Go to Space → Settings → Factory Reboot |
+| `_supports_sdpa` error | Ensure `transformers<4.50.0` in requirements.txt |
+---
+## 22. API Request & Response Examples
+### Mode C — Text only
+**Request:**
+```
+POST /predict
+Content-Type: application/json
+{
+  "text": "There is a pothole on main road in Gandhi Nagar not repaired since 3 weeks",
+  "explain": true
+}
+```
+**Response:**
+```json
+{
+  "status": "success",
+  "input_mode": "text",
+  "text": "There is a pothole on main road in Gandhi Nagar not repaired since 3 weeks",
+  "language": "english",
+  "category": "roads",
+  "category_confidence": 0.9423,
+  "urgency": "high",
+  "urgency_confidence": 0.8761,
+  "priority_score": 74.2,
+  "priority_band": "High",
+  "explanation": {
+    "category_tokens": [
+      {"token": "pothole", "score": 0.91},
+      {"token": "road", "score": 0.67},
+      {"token": "repaired", "score": 0.54}
+    ],
+    "urgency_tokens": [
+      {"token": "since", "score": 0.78},
+      {"token": "3", "score": 0.71},
+      {"token": "weeks", "score": 0.69}
+    ],
+    "category_decision": "Classified as roads due to: pothole, road, repaired",
+    "urgency_decision": "High urgency due to duration signal: since 3 weeks",
+    "priority_summary": "Road damage with high urgency — pending for weeks",
+    "final_reason": "Pothole on main road in Gandhi Nagar unresolved for 3 weeks. Routed as High priority."
+  }
+}
+```
+---
+### Mode D — Text + Image
+**Request:**
+```
+POST /predict
+Content-Type: multipart/form-data
+text=Garbage not collected since 5 days
+image=<photo.jpg>
+area=ashok nagar
+explain=false
+```
+**Response:**
+```json
+{
+  "status": "success",
+  "input_mode": "text+image",
+  "text": "Garbage not collected since 5 days",
+  "language": "english",
+  "category": "garbage",
+  "category_confidence": 0.9812,
+  "urgency": "high",
+  "urgency_confidence": 0.8934,
+  "priority_score": 78.5,
+  "priority_band": "Critical",
+  "location": "valid",
+  "evidence_relevant": true,
+  "evidence_note": "Image contains civic content related to garbage (visual relevance score: 6). GIT scores the image visually; BERT classifies the complaint text.",
+  "civic_score": 6,
+  "image_caption": "a large pile of garbage on the side of a road near residential buildings",
+  "explanation": { ... }
+}
+```
+---
+### Rejected — not a grievance
+**Request:**
+```json
+{"text": "Good morning"}
+```
+**Response (422):**
+```json
+{
+  "status": "failed",
+  "code": "not_a_grievance",
+  "message": "Your message does not appear to be a grievance or civic complaint. Please describe the issue you are facing — for example: pothole on the road, water supply disruption, electricity outage, garbage not collected, stray dogs biting residents, or any other civic problem."
+}
+```
+---
+### Rejected — outside Kakinada
+**Response (403):**
+```json
+{
+  "status": "failed",
+  "code": "location_invalid",
+  "message": "Image location is outside Kakinada Municipal Corporation limits. Only grievances within Kakinada jurisdiction are accepted.",
+  "location": "invalid"
+}
+```
+---
+## 23. Error Codes Reference
+| Code | HTTP | Meaning |
+|------|------|---------|
+| `missing_input` | 400 | No text, audio, or image provided |
+| `too_short` | 422 | Text is fewer than 5 characters |
+| `junk_input` | 422 | Input contains only numbers or symbols |
+| `not_a_grievance` | 422 | Text does not contain a civic grievance signal |
+| `image_unreadable` | 422 | GIT/OCR could not extract content from image |
+| `audio_unreadable` | 422 | Whisper could not transcribe audio |
+| `location_invalid` | 403 | Image GPS outside Kakinada limits |
+| `payload_too_large` | 413 | Upload exceeds size limit (default 32 MB) |
+| `not_found` | 404 | Endpoint does not exist |
+| `method_not_allowed` | 405 | Wrong HTTP method |
+| `internal_error` | 500 | Unhandled server exception (trace included) |
+---
+## 24. Testing Grievance Inputs
+### Should be accepted ✅
+**English — civic observation (no complaint language needed):**
+```
+Hello, I can see garbage on the road
+Hi, the road has a pothole
+Good morning, there are stray dogs near my house
+there is water on the street
+I see a broken pipe nearby
+I notice the streetlight is off
+there is sewage on the road
+I can see a manhole without cover
+```
+**English — with complaint intent:**
+```
+There is a big pothole on the main road near Gandhi Nagar
+Road is completely broken in Suryaraopeta ward
+No water supply since 3 days in our area
+Garbage not collected in our area since 5 days
+Power cut since 2 days no response from electricity board
+Streetlight not working since last month
+Stray dogs biting residents in our colony
+Dogs attacking my child near school
+Drain is blocked and sewage is overflowing
+Manhole is open on the main road
+```
+**Hindi:**
+```
+हमारे इलाके में पानी नहीं आ रहा है
+सड़क बहुत खराब है कृपया ठीक करें
+कचरा नहीं उठाया जा रहा है
+बिजली कल से नहीं है
+नाली बंद है और पानी भर गया है
+```
+**Telugu:**
+```
+మా కాలనీలో నీళ్ళు రావడం లేదు
+రోడ్డు పాడైంది దయచేసి సరిచేయండి
+చెత్త తీయడం లేదు చాలా రోజులు అయింది
+విద్యుత్ సమస్య ఉంది
+మురుగు పొంగి రోడ్డు మీద పడుతోంది
+```
+### Should be rejected ❌
+```
+Good morning
+Hi
+Hello
+Namaste
+How are you
+test
+ok
+thank you
+Dogs are barking at night
+There are people on the road
+I see a car on the street
+Nice day today
+Happy Diwali
+```
+### Tricky edge cases
+| Input | Expected | Reason |
+|-------|----------|--------|
+| `There are lots of dogs in the area` | ❌ NOT | No civic topic, no harm signal |
+| `There are lots of dogs in the area biting people` | ✅ GRIEVANCE | Animal harm pattern |
+| `Good morning, garbage on the road` | ✅ GRIEVANCE | Greeting + civic topic |
+| `The road looks bad today` | ❌ NOT | Vague — no specific civic term |
+| `Road is damaged` | ✅ GRIEVANCE | Civic topic match |
+| `Pothole` | ❌ NOT | Too short (< 8 characters) |
+| `Big pothole` | ✅ GRIEVANCE | ≥ 8 chars + civic topic |
+| `Dogs are roaming in the colony` | ❌ NOT | Roaming ≠ civic harm |
+| `Stray cattle on the highway` | ✅ GRIEVANCE | `stray cattle` = civic topic |
+---
+## 25. Known Limitations
+**Image captioning accuracy**
+GIT-large-coco is a general-purpose captioning model fine-tuned on the COCO captions dataset, not on civic infrastructure damage. It gives more specific captions than BLIP-base for outdoor scenes, but it may still produce vague captions for very dark, blurry, or low-contrast photos. The OpenCV preprocessing steps are intended to reduce these cases.
+**Grievance validation false negatives**
+Unusual phrasing not covered by `_CIVIC_TOPIC` patterns may be rejected. Users can always rephrase using standard civic terminology. The pattern set is designed to be expanded over time.
+**Hotspot forecasting minimum data**
+Prophet requires at least 2 unique dates per area+category group. New wards or newly emerging issue categories with insufficient history will be skipped in forecasting output.
+**Language detection**
+Language is detected by Unicode script range, so Romanised (Latin-script) Hindi or Telugu is routed to the English models. Romanised or code-switched input ("Kachra nahi utha hai") may therefore be classified less accurately.
+**GPS tolerance**
+Ward boundary validation uses ±0.015° tolerance (~1.5 km). GPS drift from indoor locations, tunnels, or weak signal may cause valid grievances to be flagged as outside the ward boundary.
+**Transformer version pinning**
+`transformers<4.50.0` is required for GIT model loading. Upgrading to 4.50+ will break the image pipeline with `AttributeError: _supports_sdpa`. This limitation is expected to go away once image captioning is migrated from GIT to Florence-2, a different model, using the native `Florence2ForConditionalGeneration` class available in transformers 5.x.
 ---
+*CivicConnect AI Engine — Built for Kakinada Municipal Corporation*
+*Multilingual · Multimodal · Explainable · Fair*

civicconnect-bert-en ADDED

	@@ -0,0 +1 @@

+Subproject commit 55fee65aab4f41d4a584b1177facdd54c6f1dbcd

civicconnect-bert-indic ADDED

	@@ -0,0 +1 @@

+Subproject commit f852884dceff475ee499adf5994f765af5658455

civicconnect-urgency-en ADDED

	@@ -0,0 +1 @@

+Subproject commit ed301fe2c3c864cd431c50363d068f1b4dfefce0

civicconnect-urgency-indic ADDED

	@@ -0,0 +1 @@

+Subproject commit 63d31c859a86fe027522b20cfa147c9cbde15c09