title: BharatGraph
emoji: π
colorFrom: red
colorTo: green
sdk: docker
app_file: app.py
pinned: false
BharatGraph
India's open-source investigation intelligence platform
What is BharatGraph
BharatGraph aggregates 575+ official and verified government data sources into a Neo4j knowledge graph and runs 15 parallel AI investigators to surface corruption patterns, procurement irregularities, conflicts of interest, proxy ownership structures, and institutional risk indicators across Indian public life.
Every output includes a traceable citation to its primary source document. The system never makes legal findings or accusations -- it identifies structural patterns in public data.
Live
| Service | URL |
|---|---|
| Frontend | https://abinaze.github.io/bharatgraph |
| API | https://abinazebinoly-bharatgraph.hf.space |
| API docs | https://abinazebinoly-bharatgraph.hf.space/docs |
| Graph DB | neo4j+s://1a34e3b8.databases.neo4j.io |
| GitHub | https://github.com/abinaze/bharatgraph |
Who can use this
| User | Purpose |
|---|---|
| Investigative journalists | Search entities, generate sourced dossiers, export PDF |
| Academic researchers | Query the knowledge graph, run pattern detection |
| Civil society organisations | Monitor procurement, track affidavit wealth trajectories |
| Government agencies | Deploy internally with full audit trails and RBAC |
| Developers | Extend via the plugin system, add new data sources |
| Students | Study the codebase -- each component has documented theory |
Architecture
Browser (frontend/index.html -- vanilla HTML/CSS/JS, no build step)
|
v REST + WebSocket
FastAPI (api/main.py)
|
+-- 19 route modules
+-- 15 parallel AI investigators (ThreadPoolExecutor)
+-- 6-layer DeepInvestigator
+-- ConnectionMapper (shortest path + WHY explanations)
+-- RuntimeProfile (auto-detects hardware, assigns LOW/MEDIUM/HIGH)
|
v
Neo4j AuraDB / Custom Graph Engine (planned)
20 node types, 12 relationship types, MERGE with SHA-256 IDs
Data Sources (575+)
Official Government Sources (India)
| Source | What it provides | Official URL |
|---|---|---|
| MyNeta / ECI | Candidate affidavits, assets, criminal cases | myneta.info |
| MCA21 | Company registrations, directorships, CIN | mca.gov.in |
| GeM | Government e-Marketplace contract awards | gem.gov.in |
| CAG | Comptroller audit reports and irregularities | cag.gov.in |
| PIB | Press Information Bureau press releases | pib.gov.in |
| Lok Sabha | Parliamentary questions, division votes | loksabha.nic.in |
| SEBI | Securities enforcement orders | sebi.gov.in |
| ED | Enforcement Directorate press releases | enforcementdirectorate.gov.in |
| CVC | Central Vigilance Commission circulars | cvc.gov.in |
| Electoral Bonds | Bond transaction data (post-SC order 2024) | eci.gov.in |
| IBBI | Insolvency and Bankruptcy Board orders | ibbi.gov.in |
| NGO Darpan | NGO registration and CSR receipts | ngodarpan.gov.in |
| CPPP | Central Public Procurement Portal tenders | eprocure.gov.in |
| NCRB | Crime statistics by state and year | ncrb.gov.in |
| LGD | Local Government Directory entity codes | lgdirectory.gov.in |
| DataGov | Unified government dataset portal | data.gov.in |
| NJDG | Court pendency statistics | njdg.ecourts.gov.in |
| RTI Online | Filed RTI applications and outcomes | rtionline.gov.in |
| SFIO | Serious Fraud Investigation Office orders | sfio.nic.in |
| DGGI | GST fraud enforcement cases | cbic.gov.in |
| RBI | Bank enforcement actions and NPA data | rbi.org.in |
| IRDAI | Insurance regulatory orders | irdai.gov.in |
| DRI | Directorate of Revenue Intelligence seizures | dri.nic.in |
| BENAMI | Benami Prohibition Unit attachment orders | incometaxindia.gov.in |
| RERA (28 states) | Real estate project registrations | varies by state |
| MOSPI | Statistical data via microdata portal | microdata.gov.in |
| Lok Sabha votes | Division vote records per MP | loksabha.nic.in/Loksabha/Divisions |
| ECI Form 24A | Audited party accounts | eci.gov.in/disclosure-of-accounts |
| CPGRAMS | Citizen grievances by ministry | pgportal.gov.in |
| Social Audit | MGNREGS wage theft, ghost worker data | socialaudit.mahatma.net.in |
| NDAP | NITI Aayog district SDG and scheme data | ndap.nic.in |
| MIB registry | Media channel license holders | mib.gov.in |
| TRAI | Telecom spectrum allocations | trai.gov.in |
| NIC eBhumi | Land records digitization | dilrmp.gov.in |
| PM Kisan | Scheme disbursement data | pmkisan.gov.in |
| MGNREGS | Wage disbursements and job cards | nrega.nic.in |
| PM Awas | Housing scheme completions | pmaymis.gov.in |
International / Cross-Border Sources (Official)
| Source | What it provides | Official URL |
|---|---|---|
| ICIJ Offshore Leaks | Panama Papers, Pandora Papers, HSBC files | offshoreleaks.icij.org |
| OpenSanctions | Global sanctions and PEP lists | opensanctions.org |
| World Bank India | Project disbursements and loan data | projects.worldbank.org/en/api |
| ADB India | Asian Development Bank project portfolio | adb.org/projects/india |
| FATF India | AML/CFT risk evaluation | fatf-gafi.org |
| Wikidata | Structured biographical data for politicians | wikidata.org |
| OpenTimestamps | Bitcoin-anchored document timestamps | opentimestamps.org |
Supplementary Sources
These sources are used for corroboration only and are marked as NON_OFFICIAL_RECORD in the graph. Anything drawn from them must be confirmed by at least one official source before it can contribute to a HIGH-confidence finding.
| Source | What it provides | Note |
|---|---|---|
| TCPD / LokDhaba | Historical election data 1962-2024 | Academic research dataset |
| github.com/tcpd/ppi | Politician biographical records | Academic, crowdsourced |
| github.com/in-rolls | Indian politician bios (8000+) | Community dataset |
| github.com/datameet | Constituency-level election data | Community dataset |
| IndiaVotes.com | Constituency results 1952-2024 | Private aggregator |
| Praja.org | Legislator attendance and questions | Civil society monitoring |
| Global Witness India | Natural resource corruption reports | NGO reports |
| India Election Affidavits | Parsed affidavit data (github) | github.com/Vonter |
| dataforindia.com | Multi-source India indicators | Aggregator |
| Wayback Machine CDX | Historical snapshots of government pages | Archive.org |
Capabilities
Knowledge Graph
The graph models 20 node types and 12 relationship types with stable SHA-256-derived canonical IDs. All writes use MERGE semantics -- no duplicate nodes across pipeline runs.
Node types: Politician, Company, Contract, AuditReport, Ministry, Party, Scheme, PressRelease, Tender, RegulatoryOrder, EnforcementAction, ElectoralBond, InsolvencyOrder, NGO, ParliamentQuestion, VigilanceCircular, ICIJEntity, SanctionedEntity, CourtCase, LocalBody
Relationship types: MEMBER_OF, DIRECTOR_OF, CONTESTED_IN, WON_CONTRACT, AWARDED_BY, FLAGS, MENTIONS, ISSUED_BY, ASSOCIATED_WITH, TARGETS, AUDITS, SUBJECT_OF
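A minimal sketch of how a stable SHA-256-derived ID can be paired with a MERGE write. The normalisation rule, the 16-character truncation, and the helper names `canonical_id` and `merge_statement` are illustrative assumptions, not the project's actual loader code; the company name and registration number are placeholders.

```python
import hashlib


def canonical_id(label, name, source_key=""):
    """Derive a stable ID from normalised identifying fields (hypothetical scheme)."""
    parts = (label, name, source_key)
    normalised = "|".join(part.strip().lower() for part in parts)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:16]


def merge_statement(label):
    """Build a parameterised Cypher MERGE for one node label.

    Cypher cannot take a label as a query parameter, so the label is
    validated here and only the ID and properties are passed as parameters.
    """
    if not label.isalnum():
        raise ValueError("label must be alphanumeric")
    return f"MERGE (n:{label} {{id: $id}}) " + "SET n += $props"


# Placeholder values for illustration only.
node_id = canonical_id("Company", "  Example Infra Pvt Ltd ", "U12345MH2010PTC000001")
query = merge_statement("Company")
```

Because the ID depends only on the normalised inputs, re-running a pipeline over the same record produces the same ID, and MERGE then matches the existing node instead of creating a second one.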
AI Investigation Engine
15 parallel investigators each query the graph independently. A finding confirmed by three or more independent investigators is rated HIGH confidence; a finding confirmed by exactly two is rated MODERATE confidence.
| Investigator | Focus | Weight |
|---|---|---|
| Financial | Asset growth anomaly, Benford's Law | 0.12 |
| Political | Party-contract overlap, electoral proximity | 0.10 |
| Corporate | Director networks, shell company patterns | 0.10 |
| Judicial | Court cases, FIR patterns, PMLA exposure | 0.08 |
| Procurement | Bid rigging, cartel rotation, cover bids | 0.12 |
| Network | Graph centrality, bridge entities, Fiedler | 0.08 |
| Asset | Affidavit trajectory, Kalman filter | 0.10 |
| International | ICIJ links, sanctions, offshore jurisdictions | 0.10 |
| Media | PIB mentions, controversy timeline | 0.06 |
| Historical | 5-election career trajectory | 0.08 |
| Public Interest | RTI outcome, CPGRAMS complaint rate | 0.08 |
| Doubt | Forced counterevidence, adversarial probing | 0.08 |
| Math | Fourier FFT, spectral Fiedler, Benford | 0.08 |
| Affidavit | Kalman filter wealth trajectory | 0.10 |
| Benami | 5-factor proxy ownership score | 0.09 |
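The confirmation rule can be sketched as follows, assuming each investigator is a callable that returns a set of finding keys. The function names, the stub investigators, and the mapping of a single confirmation to LOW are assumptions for illustration; only the 3+ and 2 thresholds come from the description above, and the per-investigator weights are not used here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_investigators(investigators, entity_id, max_workers=4):
    """Run every investigator callable in parallel and collect its findings."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, entity_id) for name, fn in investigators.items()}
        return {name: future.result() for name, future in futures.items()}


def confidence_by_finding(results):
    """Map each finding key to a confidence level based on how many investigators raised it."""
    confirmations = defaultdict(set)
    for name, findings in results.items():
        for key in findings:
            confirmations[key].add(name)
    levels = {}
    for key, names in confirmations.items():
        if len(names) >= 3:
            levels[key] = "HIGH"
        elif len(names) == 2:
            levels[key] = "MODERATE"
        else:
            levels[key] = "LOW"  # assumption: the source does not define this case
    return levels


# Stub investigators standing in for the real modules.
stubs = {
    "financial": lambda entity_id: {"asset_growth", "contract_overlap"},
    "political": lambda entity_id: {"contract_overlap"},
    "corporate": lambda entity_id: {"contract_overlap", "asset_growth"},
    "media": lambda entity_id: {"timeline_gap"},
}
levels = confidence_by_finding(run_investigators(stubs, "pol_001"))
```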
6-layer deep investigation:
- Direct evidence (depth 1 graph traversal)
- Relationship expansion (depth 2)
- Structural patterns (circular ownership, ghost company)
- Temporal investigation (timeline construction, burst detection)
- Network influence (betweenness, authority/hub, community)
- Cross-source validation (multi-dataset corroboration)
Forensic Modules
Benami proxy detection: Director age anomaly + surname clustering + address clustering + pre-contract formation + single-director structure. Score >= 65 = HIGH.
Affidavit wealth trajectory: Kalman filter on 5-election asset series. Expected growth = 8% FD return + 60% salary savings. Residual > 5x = VERY HIGH.
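A rough sketch of the expected-growth baseline only, not the Kalman filter itself. It assumes the rule means 8% annual compounding on existing assets plus 60% of annual salary saved each year, and that "5x" compares declared assets with that baseline. Both readings, and the function names, are assumptions.

```python
def expected_assets(prior_assets, annual_salary, years, fd_rate=0.08, savings_rate=0.60):
    """Project assets forward under the assumed baseline: FD return plus salary savings."""
    assets = float(prior_assets)
    for _ in range(years):
        assets = assets * (1.0 + fd_rate) + savings_rate * annual_salary
    return assets


def exceeds_baseline(declared_assets, expected, factor=5.0):
    """Return True when declared assets are more than `factor` times the baseline."""
    if expected <= 0:
        raise ValueError("expected baseline must be positive")
    return declared_assets / expected > factor
```

A trajectory that stays under the threshold is not evidence of anything; the ratio only marks a series as worth a closer look at the underlying affidavits.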
Procurement DNA: TF-IDF cosine >= 0.72 between bid documents from separate vendors = cover-bid signal. Cartel detection via award rotation.
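The similarity check can be illustrated with a small standard-library TF-IDF and cosine implementation. The tokeniser, the smoothed IDF formula, and the function names are assumptions chosen for this sketch; the project's own module may weight terms differently, which would shift scores relative to the 0.72 threshold.

```python
import math
import re
from collections import Counter

COVER_BID_THRESHOLD = 0.72


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF keeps a non-zero weight for terms present in every document.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cover_bid_pairs(bids):
    """Return (vendor, vendor, score) for every pair at or above the threshold."""
    vendors = list(bids)
    vectors = tfidf_vectors([bids[v] for v in vendors])
    flagged = []
    for i in range(len(vendors)):
        for j in range(i + 1, len(vendors)):
            score = cosine(vectors[i], vectors[j])
            if score >= COVER_BID_THRESHOLD:
                flagged.append((vendors[i], vendors[j], round(score, 3)))
    return flagged
```

`cover_bid_pairs` takes a mapping of vendor name to bid text and compares every pair, so the cost grows quadratically with the number of bidders on a tender.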
Revolving door: Government-to-private career moves within 365-day cooling-off window. Pre-employment benefit scoring.
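The cooling-off check reduces to a date comparison. A minimal sketch, assuming the window is inclusive and measured from the date of leaving government service; the function name is hypothetical and the benefit scoring is not covered.

```python
from datetime import date

COOLING_OFF_DAYS = 365


def within_cooling_off(left_government, joined_private):
    """Return True when the private-sector start date falls inside the window."""
    gap_days = (joined_private - left_government).days
    return 0 <= gap_days <= COOLING_OFF_DAYS
```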
TBML indicators: Contract price 2.5-sigma anomaly, subcontract loop detection, director changes within 90 days of award.
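The price check can be sketched as a z-score test. This assumes each contract price is compared with the mean and population standard deviation of a peer group; how peers are chosen is not stated in the source, and the function name is hypothetical.

```python
import statistics


def price_anomalies(prices, threshold=2.5):
    """Return (index, price) for values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(prices)
    stdev = statistics.pstdev(prices)
    if stdev == 0:
        return []
    return [
        (index, price)
        for index, price in enumerate(prices)
        if abs(price - mean) / stdev > threshold
    ]
```

With a population standard deviation, a single outlier among n values can reach a z-score of at most sqrt(n - 1), so a 2.5-sigma threshold cannot trigger on groups of fewer than eight contracts.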
Linguistic fingerprinting: Burrows Delta authorship attribution, template reuse detection, ghost-writing similarity scoring.
Policy-benefit causal analysis: Granger causality (lags 1-6 months), transfer entropy, CACA cross-ministry benefit chain.
Dark pattern detection: PrefixSpan on administrative event sequences. 6 pre-defined high-risk sequence patterns.
Runtime Auto-Scaling
At startup, BharatGraph detects CPU cores, RAM, GPU availability, free disk space, Docker environment, and Neo4j URI location. It assigns one of three profiles:
| Profile | CPU | RAM | Workers | Batch | Depth |
|---|---|---|---|---|---|
| LOW | 1-2 | <8GB | 2 | 25 | 2 |
| MEDIUM | 4 | 8GB | 4 | 100 | 3 |
| HIGH | 8+ | 16GB | 8 | 500 | 5 |
Force a profile: BHARATGRAPH_PROFILE=low|medium|high
Check active profile: GET /runtime
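A sketch of the profile assignment, using the Workers, Batch, and Depth values from the table. The exact thresholds (for example, how a 3-core or 12 GB machine is classified) are not stated, so the cut-offs below are assumptions, as is the function name. GPU, disk, Docker, and Neo4j location checks are omitted.

```python
import os

PROFILES = {
    "LOW": {"workers": 2, "batch": 25, "depth": 2},
    "MEDIUM": {"workers": 4, "batch": 100, "depth": 3},
    "HIGH": {"workers": 8, "batch": 500, "depth": 5},
}


def select_profile(cpu_cores, ram_gb, env=None):
    """Pick a profile name, honouring a BHARATGRAPH_PROFILE override if it is valid."""
    env = os.environ if env is None else env
    forced = env.get("BHARATGRAPH_PROFILE", "").strip().upper()
    if forced in PROFILES:
        return forced
    # Assumed cut-offs: both CPU and RAM must meet a tier's minimum.
    if cpu_cores >= 8 and ram_gb >= 16:
        return "HIGH"
    if cpu_cores >= 4 and ram_gb >= 8:
        return "MEDIUM"
    return "LOW"
```

In practice `os.cpu_count()` could supply the core count; RAM detection is platform-specific and is left to the caller here.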
Security
- Sliding window rate limiter: 100/min search, 30/min investigation
- IP addresses stored as SHA-256 hashes only -- never plain text
- HTTP security headers: CSP, HSTS, X-Frame-Options
- Input validator blocks Cypher injection patterns
- Append-only SHA-256 hash-chained audit log at logs/audit.jsonl
- CORS origins via CORS_ORIGINS environment variable
- All outputs pass validate_language() -- no accusatory vocabulary
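The first two points can be combined in a short sketch: a sliding-window limiter that keys its state by the SHA-256 hash of the client IP, so the raw address is never stored. The class name, the in-memory storage, and the injectable clock are assumptions for illustration, not the project's middleware.

```python
import hashlib
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within a rolling time window."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # hashed IP -> timestamps of recent requests

    @staticmethod
    def client_key(ip):
        """Hash the IP so only the digest is kept in memory or logs."""
        return hashlib.sha256(ip.encode("utf-8")).hexdigest()

    def allow(self, ip):
        now = self.clock()
        hits = self.hits[self.client_key(ip)]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


search_limiter = SlidingWindowLimiter(limit=100)
investigation_limiter = SlidingWindowLimiter(limit=30)
```

An unsalted hash of an IPv4 address can be reversed by enumeration, so a deployment that relies on this for privacy may want a keyed hash instead.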
Multilingual
22 Indian scheduled languages supported across all API endpoints. Language auto-detection via Unicode block analysis. Helsinki-NLP translation models. Cross-script entity matching for all 22 languages.
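Unicode block analysis can be sketched by counting characters per script range and taking the most frequent script. The block ranges below are the standard Unicode ones; the function name and the "Unknown" fallback are assumptions. Note that this identifies a script rather than a language, since several scheduled languages share Devanagari or the Arabic script, so a real detector needs a further step.

```python
from collections import Counter

# (first code point, last code point, script name)
SCRIPT_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x0A00, 0x0A7F, "Gurmukhi"),
    (0x0A80, 0x0AFF, "Gujarati"),
    (0x0B00, 0x0B7F, "Oriya"),
    (0x0B80, 0x0BFF, "Tamil"),
    (0x0C00, 0x0C7F, "Telugu"),
    (0x0C80, 0x0CFF, "Kannada"),
    (0x0D00, 0x0D7F, "Malayalam"),
    (0x1C50, 0x1C7F, "Ol Chiki"),
    (0xABC0, 0xABFF, "Meetei Mayek"),
]


def detect_script(text):
    """Return the most frequent listed script in the text, or "Unknown"."""
    counts = Counter()
    for char in text:
        code_point = ord(char)
        for first, last, name in SCRIPT_BLOCKS:
            if first <= code_point <= last:
                counts[name] += 1
                break
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


# Escapes keep this file pure ASCII, as the contribution rules require.
hindi_sample = "\u092d\u093e\u0930\u0924"
tamil_sample = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
```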
Quick Start
Prerequisites
- Python 3.10+
- Neo4j AuraDB account (free tier: 50K nodes / 175K relationships)
Local setup
git clone https://github.com/abinaze/bharatgraph.git
cd bharatgraph
pip install -r requirements.txt
# Copy environment template and fill in secrets
cp .env.example .env
# Edit .env: set NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD
# Start the API
uvicorn api.main:app --reload --port 8000
# Open the frontend (macOS; on Linux use xdg-open, on Windows use start)
open frontend/index.html
# (or visit http://localhost:8000/docs for the API)
Seed sample data
curl -X POST http://localhost:8000/admin/seed
Run the full pipeline
curl -X POST http://localhost:8000/admin/pipeline
# Check status:
curl http://localhost:8000/admin/pipeline/status
API Reference
Core routes
| Method | Route | Description |
|---|---|---|
| GET | /search?q= | Full-text + label-scan search across all 20 node types |
| GET | /profile/{id} | Complete entity profile with all linked data |
| GET | /risk/{id} | Composite 0-100 risk score with factor breakdown |
| GET | /investigate/{id} | 6-layer deep investigation report |
| GET | /affidavit/{id} | Kalman filter wealth trajectory |
| GET | /benami/{id} | 5-factor proxy ownership score |
| GET | /debate/{id} | 7-agent structured debate |
| GET | /adversarial/{id} | Forced counterevidence report |
| GET | /connection-map?a=&b= | Shortest path with WHY explanations |
| GET | /export/pdf/{id} | Download SHA-256-signed PDF dossier |
| GET | /runtime | Hardware profile and active settings |
| GET | /health | Service health check |
| GET | /stats | Node and relationship counts |
| POST | /admin/seed | Load sample data |
| POST | /admin/pipeline | Trigger full 21-scraper pipeline |
| WS | /ws/feed | Real-time high-signal entity feed |
Search example
curl "https://abinazebinoly-bharatgraph.hf.space/search?q=Adani&limit=5"
Investigation example
curl "https://abinazebinoly-bharatgraph.hf.space/investigate/pol_001"
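The same two calls from Python, using only the standard library. The helper names are hypothetical, and the shape of the JSON responses is not documented here, so the sketch returns the decoded payload without assuming any fields.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://abinazebinoly-bharatgraph.hf.space"


def search_url(query, limit=5, base_url=BASE_URL):
    """Build the /search URL with a properly encoded query string."""
    return f"{base_url}/search?{urlencode({'q': query, 'limit': limit})}"


def investigate_url(entity_id, base_url=BASE_URL):
    """Build the /investigate/{id} URL, escaping the ID as a path segment."""
    return f"{base_url}/investigate/{quote(entity_id, safe='')}"


def get_json(url, timeout=30):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example (performs a network request):
# results = get_json(search_url("Adani"))
```

Search is limited to 100 requests per minute and investigation to 30, so a batch script should pace its calls accordingly.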
Project Structure
bharatgraph/
+-- api/
| +-- main.py # FastAPI app, middleware, WS feed
| +-- routes/ # 19 route modules
| +-- middleware/ # Rate limiter, security headers, audit logger
| +-- models.py # Pydantic response models
| +-- dependencies.py # Neo4j driver injection
+-- ai/
| +-- multi_investigator.py # 15 parallel investigators + synthesis
| +-- deep_investigator.py # 6-layer recursive investigation
| +-- risk_scorer.py # Composite 0-100 risk score
| +-- investigators/ # 15 specialist investigator modules
| +-- forensics/ # Benami, TBML, cartel, linguistic, policy
| +-- self_learning/ # Pattern learner, weight optimiser, case memory
| +-- graph_analytics.py # PageRank, Louvain, centrality
| +-- explainer.py # validate_language() enforcement
+-- config/
| +-- settings.py # Environment config
| +-- runtime_profile.py # Hardware detector + profile assignment
| +-- model_selector.py # Profile-aware model selection
+-- graph/
| +-- loader.py # Neo4j loader for all 20 node types
| +-- schema.py # Constraints and indexes
| +-- seed.py # Sample data for /admin/seed
+-- processing/
| +-- pipeline.py # Parallel orchestrator (21 scrapers)
| +-- cleaner.py # Indian name normalisation
| +-- entity_resolver.py # Jaccard deduplication
+-- scrapers/ # 21 scrapers for all data sources
+-- frontend/
| +-- index.html # Single-page app entry point
| +-- js/ # app.js, api.js, components.js, timeline.js
| +-- css/ # main.css, themes
| +-- sw.js # Service worker (cache-first)
+-- blockchain/
| +-- audit_chain.py # Append-only SHA-256 hash chain
+-- tests/ # pytest test suite
+-- issues/ # GitHub issue templates per phase
+-- .github/workflows/ # CI, daily scrape, weekly learning
Deployment
HuggingFace Spaces (production)
# Set these secrets in HuggingFace Space settings:
NEO4J_URI=neo4j+s://...
NEO4J_USER=neo4j
NEO4J_PASSWORD=...
CORS_ORIGINS=https://abinaze.github.io
# Deploy
git remote add hf https://huggingface.co/spaces/abinazebinoly/bharatgraph
git push hf main --force
Docker
docker build -t bharatgraph .
docker run -p 8000:8000 \
  -e NEO4J_URI=... \
  -e NEO4J_USER=neo4j \
  -e NEO4J_PASSWORD=... \
  bharatgraph
Legal and Ethics
BharatGraph analyses structural patterns in official public data. It does not make legal findings, accusations, or moral judgements about any individual or organisation.
All outputs use neutral analytical language enforced programmatically by
validate_language(). Forbidden words include: corrupt, guilty, criminal,
fraud, accused (as a judgement), fraudster, and similar accusatory terms.
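A minimal sketch of a vocabulary check of this kind, using whole-word matching on the terms listed above. The function name `find_forbidden_terms` is hypothetical and is not the signature of validate_language(); the real list is longer ("and similar accusatory terms"), and a plain word list cannot tell when "accused" is used as a judgement rather than as a factual description.

```python
import re

# Only the terms named in this README; the full list is an unknown.
FORBIDDEN_TERMS = ("corrupt", "guilty", "criminal", "fraud", "accused", "fraudster")

_PATTERN = re.compile(r"\b(" + "|".join(FORBIDDEN_TERMS) + r")\b", re.IGNORECASE)


def find_forbidden_terms(text):
    """Return the sorted, lowercased forbidden terms that appear as whole words."""
    return sorted({match.group(1).lower() for match in _PATTERN.finditer(text)})
```

Whole-word matching means "corruption" does not trip the "corrupt" rule; whether the project treats derived forms as forbidden is not stated.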
Every finding is labelled as a "structural indicator" and includes:
- The specific data sources consulted
- The confidence level (HIGH/MODERATE/LOW/INSUFFICIENT)
- A disclaimer that this is an analytical report and not a legal finding
Users are responsible for verifying all findings through independent research before publishing. The platform is designed to assist investigation, not replace journalistic verification.
Data sourced from official government portals is used under the National Data Sharing and Accessibility Policy (NDSAP) and Open Government Data (OGD) platform terms. International data (ICIJ, OpenSanctions) is used under their respective open-data licenses.
Contributing
See CONTRIBUTING.md for the full contribution guide.
Branch naming: feature/phase-N-description or fix/issue-description
Every phase has:
- A GitHub issue created first (template in issues/)
- A feature branch
- Commits with descriptive messages referencing the issue number
- A PR with the standard description template
- A version tag after merge
All Python files must be pure ASCII (no Unicode in comments or strings).
Run python3 -m py_compile <file> before every commit.
License
MIT License. See LICENSE.
Reference documents
- DEPLOYMENT.md -- deployment guide for all environments
- PHASE_ROADMAP.md -- full development roadmap with phase summaries
- CONTRIBUTING.md -- contribution guidelines
- SECURITY.md -- security policy and responsible disclosure
Developed by Abinaze Binoy