| sidebar_position: 5 | |
| # Enterprise Tech Integration Guide | |
| This guide documents the enterprise technology platforms and programs that support Open Navigator's data infrastructure. | |
| ## Implementation Status Legend | |
| - ✅ **Active** - Fully implemented and in production use | |
| - **Recommended** - Recommended for adoption; not yet deployed | |
| - **Reference** - Used as inspiration for data modeling | |
| - **Evaluation** - Under consideration for future adoption | |
| ## 1. Cloud & Data Platforms | |
| ### ✅ Microsoft: Tech for Social Impact | |
| **Status:** ACTIVE - Nonprofit CDM fully implemented | |
| **What we use:** | |
| - Nonprofit Common Data Model (CDM) for constituent management | |
| - 8 core entities: CONSTITUENT, DONATION, CAMPAIGN, DESIGNATION, MEMBERSHIP, VOLUNTEER_ACTIVITY, PROGRAM_DELIVERY, PROGRAM_OUTCOME | |
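A minimal sketch of how the eight entity names listed above could be referenced consistently in code. The `NonprofitEntity` enum and `is_core_entity` helper are hypothetical illustrations; they are not part of the Nonprofit CDM or of this repository.

```python
from enum import Enum


class NonprofitEntity(str, Enum):
    """Hypothetical enum of the eight core entities named in this guide."""

    CONSTITUENT = "CONSTITUENT"
    DONATION = "DONATION"
    CAMPAIGN = "CAMPAIGN"
    DESIGNATION = "DESIGNATION"
    MEMBERSHIP = "MEMBERSHIP"
    VOLUNTEER_ACTIVITY = "VOLUNTEER_ACTIVITY"
    PROGRAM_DELIVERY = "PROGRAM_DELIVERY"
    PROGRAM_OUTCOME = "PROGRAM_OUTCOME"


def is_core_entity(name: str) -> bool:
    """Return True if `name` (case-insensitive) is one of the core entities."""
    return name.strip().upper() in NonprofitEntity.__members__
```

Keeping the names in one place makes it harder for a misspelled entity name to slip into a pipeline or mapping table.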
| **Files:** | |
| - See [Nonprofit & Philanthropy](/docs/data-sources/citations#nonprofit--philanthropy) section | |
| - ERD: [Data Model](/data-sources/data-model-erd) | |
| **Resources:** | |
| - GitHub: https://github.com/microsoft/Industry-Accelerator-Nonprofit | |
| - License: MIT | |
| --- | |
| ### Google: Data Commons | |
| **Status:** RECOMMENDED - Implementation available, not yet deployed | |
| **Planned use:** | |
| - Knowledge Graph API for jurisdiction demographics | |
| - 100+ variables per jurisdiction (income, education, health, housing) | |
| - Simplifies Census Bureau data access | |
| **Implementation:** | |
| - Code: `discovery/google_data_commons.py` | |
| - Install: `pip install datacommons datacommons-pandas` | |
| - Documentation: https://docs.datacommons.org/api/ | |
| **Next Steps:** | |
| 1. Install dependencies: `pip install datacommons datacommons-pandas` | |
| 2. Update `discovery/census_ingestion.py` to use Data Commons client | |
| 3. Replace manual Census API calls with simplified DC API | |
| 4. Add time-series enrichment for historical trends | |
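Step 4 can be sketched without any network access. The example below assumes a time series arrives as a mapping from date strings to values, which is the shape returned by the legacy `datacommons.get_stat_series` call; the helper names and the sample numbers are hypothetical and for illustration only.

```python
from typing import Dict, Optional


def filter_series(series: Dict[str, float], start_year: int) -> Dict[str, float]:
    """Keep observations whose date key (e.g. "2019" or "2019-07") is >= start_year."""
    return {
        date: value
        for date, value in sorted(series.items())
        if int(date[:4]) >= start_year
    }


def percent_change(series: Dict[str, float]) -> Optional[float]:
    """Percent change from the earliest to the latest observation, or None."""
    if len(series) < 2:
        return None
    dates = sorted(series)
    first, last = series[dates[0]], series[dates[-1]]
    if first == 0:
        return None
    return (last - first) / first * 100.0


# Made-up values for illustration only; not real Census or Data Commons data.
sample = {"2013": 100.0, "2015": 110.0, "2019": 121.0}
recent = filter_series(sample, start_year=2015)
trend = percent_change(recent)
```

Separating the filtering from the trend calculation keeps both steps easy to test against fixed sample data before wiring them to a live client.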
| **Example Usage:** | |
| ```python | |
| from discovery.google_data_commons import DataCommonsClient | |
| client = DataCommonsClient() | |
| # Enrich a single jurisdiction | |
| data = client.enrich_jurisdiction("01073") # Jefferson County, AL | |
| print(data["Median_Income_Household"])  # e.g. 65000 (illustrative value, not real output) | |
| # Bulk enrich multiple jurisdictions | |
| fips_codes = ["01073", "01089", "01097"] | |
| df = client.enrich_jurisdictions_bulk(fips_codes) | |
| # Get time series | |
| df_ts = client.get_time_series("01073", start_year=2015) | |
| ``` | |
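The FIPS codes in the example above need to be translated into Data Commons place identifiers (DCIDs), which for US states and counties take the form `geoId/<FIPS>`. The following is a small sketch of that translation; `fips_to_dcid` is a hypothetical helper and may not match what `discovery/google_data_commons.py` actually does.

```python
import re


def fips_to_dcid(fips: str) -> str:
    """Convert a 2-digit state or 5-digit county FIPS code to a Data Commons DCID."""
    # FIPS codes must stay strings so that leading zeros are preserved.
    if not re.fullmatch(r"\d{2}|\d{5}", fips):
        raise ValueError(f"Expected a 2- or 5-digit FIPS code, got {fips!r}")
    return f"geoId/{fips}"


dcids = [fips_to_dcid(code) for code in ["01073", "01089", "01097"]]
```

With the legacy `datacommons` package installed, a DCID like these could be passed to `datacommons.get_stat_value(dcid, "Median_Income_Household")`. Newer Data Commons client versions may expose a different interface, so check the current documentation before relying on this call.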
| **Benefits:** | |
| - ✅ Simpler API than the raw Census Bureau API | |
| - ✅ 100+ pre-integrated variables | |
| - ✅ Automatic data quality validation | |
| - ✅ Time series support | |
| - ✅ No API key required on the free tier (confirm against the current Data Commons documentation) | |
| --- | |
| ### AWS: Open Data for Good | |
| **Status:** PLANNED - Best practices for dataset exports | |
| **Planned use:** | |
| - Parquet format best practices | |
| - S3 storage patterns | |
| - AWS Glue Data Catalog | |
| **Recommendations for `/exports` folder:** | |
| 1. **Format:** Use Parquet with Snappy compression | |
| 2. **Partitioning:** Partition by `state/county/year` | |
| 3. **Versioning:** Enable S3 versioning for lineage | |
| 4. **Catalog:** Use AWS Glue for schema management | |
| 5. **Querying:** Athena for SQL without ETL | |
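The `state/county/year` partitioning in recommendation 2 is commonly laid out as Hive-style `key=value` directories, which Glue and Athena can read as partition columns. The sketch below only builds such a key; the `exports` root, the file name, and the `partition_path` helper are assumptions for illustration.

```python
from pathlib import PurePosixPath


def partition_path(root: str, state: str, county: str, year: int,
                   filename: str = "part-0000.snappy.parquet") -> str:
    """Build a Hive-style key: <root>/state=../county=../year=../<filename>."""
    return str(
        PurePosixPath(root)
        / f"state={state}"
        / f"county={county}"
        / f"year={year}"
        / filename
    )


key = partition_path("exports", "AL", "01073", 2023)
```

In practice the layout rarely needs to be built by hand: with pandas and pyarrow installed, `DataFrame.to_parquet(path, compression="snappy", partition_cols=["state", "county", "year"])` writes the same directory structure with generated file names.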
| **Next Steps:** | |
| 1. Review AWS Registry examples: https://registry.opendata.aws | |
| 2. Update export scripts to generate Parquet | |
| 3. Document partitioning strategy | |
| 4. Consider AWS Glue for metadata | |
| --- | |
| ## 2. Data Engineering Platforms | |
| ### ✅ Databricks: Databricks for Good | |
| **Status:** ACTIVE - Full implementation | |
| **What we use:** | |
| - **Unity Catalog:** Model registry and data governance | |
| - **Delta Lake:** Bronze/Silver/Gold lakehouse architecture | |
| - **MLflow:** Agent deployment and experiment tracking | |
| - **Model Serving:** Auto-scaling REST endpoints for agents | |
| - **Agent Bricks:** Mosaic AI Agent Framework | |
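To make the Bronze/Silver/Gold idea concrete, here is a plain-Python stand-in for the three layers: raw records in, cleaned and de-duplicated records in the middle, an aggregated reporting table out. It deliberately avoids Spark and Delta Lake, and the field names (`meeting_id`, `jurisdiction`) are hypothetical; it does not reflect the contents of `pipeline/delta_lake.py`.

```python
from collections import Counter
from typing import Dict, Iterable, List


def to_silver(bronze: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Clean raw rows: drop rows without an id, normalize text, de-duplicate by id."""
    seen = set()
    silver = []
    for row in bronze:
        meeting_id = (row.get("meeting_id") or "").strip()
        if not meeting_id or meeting_id in seen:
            continue
        seen.add(meeting_id)
        silver.append({
            "meeting_id": meeting_id,
            "jurisdiction": (row.get("jurisdiction") or "").strip().upper(),
        })
    return silver


def to_gold(silver: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Aggregate cleaned rows into a reporting table: meetings per jurisdiction."""
    return dict(Counter(row["jurisdiction"] for row in silver))


# Bronze: raw rows exactly as ingested, including a duplicate and a bad row.
bronze_rows = [
    {"meeting_id": "m1", "jurisdiction": " al "},
    {"meeting_id": "m1", "jurisdiction": "AL"},  # duplicate id
    {"meeting_id": "", "jurisdiction": "AL"},    # missing id
    {"meeting_id": "m2", "jurisdiction": "ga"},
]
gold = to_gold(to_silver(bronze_rows))
```

In a Delta Lake pipeline each function would typically read from one table and write to the next, so every layer can be rebuilt from the one before it.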
| **Files:** | |
| - `pipeline/delta_lake.py` - Delta Lake pipeline | |
| - `agents/mlflow_classifier.py` - Policy classifier agent | |
| - `agents/mlflow_base.py` - Base MLflow agent class | |
| - `databricks/deployment.py` - Unity Catalog deployment | |
| - `databricks/evaluation.py` - Agent evaluation framework | |
| - `databricks/notebooks/01_agent_bricks_quickstart.py` - Quickstart notebook | |
| **Resources:** | |
| - Documentation: https://docs.databricks.com/ | |
| - Unity Catalog: https://docs.databricks.com/en/data-governance/unity-catalog/ | |
| - Solution Accelerators: https://www.databricks.com/solutions/accelerators | |
| **Delta Sharing for Public Exports:** | |
| ```python | |
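| # Run in a Databricks notebook, where `spark` is predefined and Unity Catalog is enabled. | |
| # The open-source `delta_sharing` client is recipient-side and cannot create shares, | |
| # so the share is created with Databricks SQL instead. | |
| share_name = "one_civic_data" | |
| gold_tables = ["gold.jurisdictions", "gold.meetings", "gold.nonprofits"] | |
| # Create the share, then add each Gold layer table to it | |
| spark.sql(f"CREATE SHARE IF NOT EXISTS {share_name}") | |
| for table in gold_tables: | |
|     # Unity Catalog may require fully qualified catalog.schema.table names | |
|     spark.sql(f"ALTER SHARE {share_name} ADD TABLE {table}") | |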
| ``` | |
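On the consuming side, a recipient typically reads a shared table through the open-source `delta-sharing` Python client, which addresses tables as `<profile-file>#<share>.<schema>.<table>`. The sketch below only assembles that address; the `config.share` profile file name and the `sharing_table_url` helper are assumptions for illustration.

```python
def sharing_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile-file>#<share>.<schema>.<table>' address for a shared table."""
    for part in (share, schema, table):
        # Each name component must be non-empty and free of the separator characters.
        if not part or "." in part or "#" in part:
            raise ValueError(f"Invalid name component: {part!r}")
    return f"{profile_path}#{share}.{schema}.{table}"


url = sharing_table_url("config.share", "one_civic_data", "gold", "jurisdictions")
# A recipient with the open-source client installed could then run:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```

The profile file is issued by the data provider and holds the endpoint and bearer token, so it should be treated as a credential and kept out of version control.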
| --- | |
| ### Snowflake: Snowflake for Good | |
| **Status:** EVALUATION - Consider for enterprise data sharing | |
| **Under evaluation:** | |
| - Data Marketplace for Census/ESG data | |
| - Data sharing capabilities | |
| **Evaluation Criteria:** | |
| - Cost vs. Databricks | |
| - Data Marketplace value-add | |
| - Enterprise collaboration needs | |
| --- | |
| ### Oracle: NetSuite Social Impact | |
| **Status:** REFERENCE - Inspiration for nonprofit accounting | |
| **What we use:** | |
| - Fund accounting model patterns | |
| - Grant tracking workflows | |
| **Resources:** | |
| - https://netsuite.com/social-impact | |
| --- | |
| ### Salesforce: Nonprofit Success Pack (NPSP) | |
| **Status:** REFERENCE - Inspiration for constituent management | |
| **What we use:** | |
| - Household accounts model | |
| - Recurring donations pattern | |
| - Program engagement tracking | |
| **NPSP → ONE Mappings:** | |

| NPSP Object | Our Entity | Use Case |
|-------------|------------|----------|
| Contact | CONSTITUENT | Donor, volunteer, beneficiary |
| Opportunity | DONATION | Financial contributions |
| Campaign | CAMPAIGN | Fundraising campaigns |
| Engagement Plan | VOLUNTEER_ACTIVITY | Volunteer tracking |
| Program Cohort | PROGRAM_DELIVERY | Program participants |

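The mapping table above translates directly into a lookup. This is a minimal sketch, assuming NPSP object names arrive as plain strings; `NPSP_TO_ENTITY` and `map_npsp_object` are hypothetical names, not code from this repository or from NPSP.

```python
from typing import Optional

# Copied from the mapping table: NPSP object -> entity name used in this guide.
NPSP_TO_ENTITY = {
    "Contact": "CONSTITUENT",
    "Opportunity": "DONATION",
    "Campaign": "CAMPAIGN",
    "Engagement Plan": "VOLUNTEER_ACTIVITY",
    "Program Cohort": "PROGRAM_DELIVERY",
}


def map_npsp_object(npsp_object: str) -> Optional[str]:
    """Return the mapped entity name, or None when no mapping is documented."""
    return NPSP_TO_ENTITY.get(npsp_object.strip())
```

Returning `None` for unmapped objects, rather than raising, lets an import job log and skip NPSP objects that have no counterpart yet.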
| **Resources:** | |
| - GitHub: https://github.com/SalesforceFoundation/NPSP | |
| - License: BSD-3-Clause | |
| --- | |
| ## 3. Infrastructure & AI | |
| ### Cisco: Crisis Response | |
| **Status:** REFERENCE - Inspiration for platform resilience | |
| **Focus:** | |
| - Network connectivity during emergencies | |
| - System resilience patterns | |
| **Resources:** | |
| - https://cisco.com/crisis-response | |
| --- | |
| ### IBM: Science for Social Good | |
| **Status:** REFERENCE - AI/ML use case patterns | |
| **Focus:** | |
| - Watson AI for civic applications | |
| - Blockchain for transparency | |
| - Quantum computing potential | |
| **Resources:** | |
| - https://ibm.com/social-good | |
| --- | |
| ### Meta: Data for Good | |
| **Status:** EVALUATION - Population mapping potential | |
| **Under evaluation:** | |
| - High-Resolution Population Density Maps | |
| - Social Connectedness Index | |
| **Evaluation:** | |
| - Integration with demographics | |
| - Use for underserved area identification | |
| **Resources:** | |
| - https://dataforgood.facebook.com | |
| --- | |
| ## Summary: Current vs. Planned Integrations | |

| Platform | Status | Priority | Effort | Value |
|----------|--------|----------|--------|-------|
| Microsoft CDM | ✅ Active | - | - | HIGH |
| Databricks | ✅ Active | - | - | HIGH |
| Google Data Commons | Recommended | HIGH | Low | HIGH |
| AWS Best Practices | Planned | MEDIUM | Medium | MEDIUM |
| Snowflake | Evaluation | LOW | Medium | MEDIUM |
| Meta Data for Good | Evaluation | LOW | Medium | MEDIUM |
| Salesforce NPSP | Reference | - | - | - |
| Oracle NetSuite | Reference | - | - | - |
| Cisco | Reference | - | - | - |
| IBM | Reference | - | - | - |

| ## Recommended Implementation Order | |
| 1. **Google Data Commons** (Immediate - Low effort, High value) | |
| - Install dependencies | |
| - Update census ingestion | |
| - Test with sample jurisdictions | |
| - Deploy to production | |
| 2. **AWS Export Optimization** (Next sprint - Medium effort, Medium value) | |
| - Convert exports to Parquet | |
| - Implement partitioning | |
| - Document patterns | |
| 3. **Databricks Delta Sharing** (Future - Medium effort, Medium value) | |
| - Configure sharing | |
| - Create public share | |
| - Document access | |
| 4. **Snowflake/Meta Evaluation** (Backlog - TBD) | |
| - POC evaluation | |
| - Cost-benefit analysis | |
| - Decision by end of quarter | |
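An ordering like the one above can be derived mechanically from the Priority, Effort, and Value columns of the summary table. The sketch below is one possible ranking rule (priority first, then lower effort, then higher value); the rule itself and the `Integration` type are assumptions, and Delta Sharing is omitted because it has no row of its own in the table.

```python
from typing import List, NamedTuple

RANK = {"HIGH": 3, "MEDIUM": 2, "LOW": 1}


class Integration(NamedTuple):
    name: str
    priority: str
    effort: str
    value: str


def implementation_order(items: List[Integration]) -> List[str]:
    """Sort by priority (high first), then effort (low first), then value (high first)."""
    ranked = sorted(
        items,
        key=lambda i: (
            -RANK[i.priority.upper()],
            RANK[i.effort.upper()],
            -RANK[i.value.upper()],
        ),
    )
    return [i.name for i in ranked]


# Rows taken from the summary table (non-active, non-reference platforms only).
candidates = [
    Integration("Snowflake", "LOW", "Medium", "MEDIUM"),
    Integration("AWS Best Practices", "MEDIUM", "Medium", "MEDIUM"),
    Integration("Google Data Commons", "HIGH", "Low", "HIGH"),
    Integration("Meta Data for Good", "LOW", "Medium", "MEDIUM"),
]
order = implementation_order(candidates)
```

Because Python's sort is stable, platforms with identical scores (here Snowflake and Meta) keep their input order, which matches their being grouped into a single backlog item.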
| --- | |
| ## How to Cite These Partnerships | |
| All enterprise technology partnerships are cited in: | |
| **[Citations & Data Sources - Enterprise Tech for Social Good](/docs/data-sources/citations#-enterprise-tech-for-social-good)** | |
| Includes: | |
| - Full program URLs | |
| - Implementation status | |
| - License information | |
| - BibTeX citations (where applicable) | |
| - Code examples | |