---
sidebar_position: 5
---
# Enterprise Tech Integration Guide
This guide documents the enterprise technology platforms and programs that support Open Navigator's data infrastructure.
## Implementation Status Legend
- ✅ **Active** - Fully implemented and in production use
- 🔄 **Recommended** - Implementation recommended for enhancement
- 📚 **Reference** - Used as inspiration for data modeling
- 🔍 **Evaluation** - Under consideration for future adoption
## 1. Cloud & Data Platforms
### ✅ Microsoft: Tech for Social Impact
**Status:** ACTIVE - Nonprofit CDM fully implemented
**What we use:**
- Nonprofit Common Data Model (CDM) for constituent management
- 8 core entities: CONSTITUENT, DONATION, CAMPAIGN, DESIGNATION, MEMBERSHIP, VOLUNTEER_ACTIVITY, PROGRAM_DELIVERY, PROGRAM_OUTCOME
**Files:**
- See [Nonprofit & Philanthropy](/docs/data-sources/citations#nonprofit--philanthropy) section
- ERD: [Data Model](/data-sources/data-model-erd)
**Resources:**
- GitHub: https://github.com/microsoft/Industry-Accelerator-Nonprofit
- License: MIT
---
### 🔄 Google: Data Commons
**Status:** RECOMMENDED - Implementation available, not yet deployed
**What we use:**
- Knowledge Graph API for jurisdiction demographics
- 100+ variables per jurisdiction (income, education, health, housing)
- Simplifies Census Bureau data access
**Implementation:**
- Code: `discovery/google_data_commons.py`
- Install: `pip install datacommons datacommons-pandas`
- Documentation: https://docs.datacommons.org/api/
**Next Steps:**
1. Install dependencies: `pip install datacommons datacommons-pandas`
2. Update `discovery/census_ingestion.py` to use Data Commons client
3. Replace manual Census API calls with simplified DC API
4. Add time-series enrichment for historical trends
**Example Usage:**
```python
from discovery.google_data_commons import DataCommonsClient
client = DataCommonsClient()
# Enrich a single jurisdiction
data = client.enrich_jurisdiction("01073") # Jefferson County, AL
print(data["Median_Income_Household"]) # $65,000
# Bulk enrich multiple jurisdictions
fips_codes = ["01073", "01089", "01097"]
df = client.enrich_jurisdictions_bulk(fips_codes)
# Get time series
df_ts = client.get_time_series("01073", start_year=2015)
```
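For context, a minimal sketch of what such a wrapper might do underneath, assuming it calls the public `datacommons` package directly. Data Commons identifies US states and counties by a `geoId/<FIPS>` DCID. The helper names `fips_to_dcid` and `fetch_stat` are hypothetical and are not taken from `discovery/google_data_commons.py`.

```python
from typing import Dict, List


def fips_to_dcid(fips: str) -> str:
    """Convert a 2-digit state or 5-digit county FIPS code to a Data Commons DCID."""
    if not (fips.isdigit() and len(fips) in (2, 5)):
        raise ValueError(f"Expected a 2- or 5-digit FIPS code, got {fips!r}")
    return f"geoId/{fips}"


def fetch_stat(fips_codes: List[str], stat_var: str) -> Dict[str, float]:
    """Fetch the latest value of one statistical variable per jurisdiction.

    Hypothetical helper: needs network access and `pip install datacommons`.
    """
    import datacommons as dc  # imported lazily so fips_to_dcid has no dependency

    return {
        fips: dc.get_stat_value(fips_to_dcid(fips), stat_var)
        for fips in fips_codes
    }
```

Calling `fetch_stat(["01073", "01089"], "Median_Income_Household")` would return one value per FIPS code. Depending on the library version, an API key may need to be configured before the call succeeds.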
**Benefits:**
- ✅ Simpler API than raw Census Bureau
- ✅ 100+ pre-integrated variables
- ✅ Automatic data quality validation
- ✅ Time series support
- ✅ No API key required (free tier)
---
### 🔄 AWS: Open Data for Good
**Status:** PLANNED - Best practices for dataset exports
**What we use:**
- Parquet format best practices
- S3 storage patterns
- AWS Glue Data Catalog
**Recommendations for `/exports` folder:**
1. **Format:** Use Parquet with Snappy compression
2. **Partitioning:** Partition by `state/county/year`
3. **Versioning:** Enable S3 versioning for lineage
4. **Catalog:** Use AWS Glue for schema management
5. **Querying:** Athena for SQL without ETL
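The format and partitioning recommendations could be sketched as follows, assuming exports are pandas DataFrames with `state`, `county`, and `year` columns. Those column names and both helper names are assumptions for illustration, not existing project code.

```python
def partition_prefix(base: str, state: str, county: str, year: int) -> str:
    """Build a Hive-style prefix such as exports/state=AL/county=01073/year=2023."""
    return "/".join(
        [base.rstrip("/"), f"state={state}", f"county={county}", f"year={year}"]
    )


def write_partitioned(df, base: str) -> None:
    """Write a DataFrame as Snappy-compressed Parquet, one directory per partition.

    Requires pandas with pyarrow installed; df must contain the three
    partition columns (assumed names: state, county, year).
    """
    df.to_parquet(
        base,
        engine="pyarrow",
        compression="snappy",
        partition_cols=["state", "county", "year"],
    )
```

Hive-style `key=value` directories are the layout that `partition_cols` produces, and Glue crawlers and Athena can read them as partition columns.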
**Next Steps:**
1. Review AWS Registry examples: https://registry.opendata.aws
2. Update export scripts to generate Parquet
3. Document partitioning strategy
4. Consider AWS Glue for metadata
---
## 2. Data Engineering Platforms
### ✅ Databricks: Databricks for Good
**Status:** ACTIVE - Full implementation
**What we use:**
- **Unity Catalog:** Model registry and data governance
- **Delta Lake:** Bronze/Silver/Gold lakehouse architecture
- **MLflow:** Agent deployment and experiment tracking
- **Model Serving:** Auto-scaling REST endpoints for agents
- **Agent Bricks:** Mosaic AI Agent Framework
**Files:**
- `pipeline/delta_lake.py` - Delta Lake pipeline
- `agents/mlflow_classifier.py` - Policy classifier agent
- `agents/mlflow_base.py` - Base MLflow agent class
- `databricks/deployment.py` - Unity Catalog deployment
- `databricks/evaluation.py` - Agent evaluation framework
- `databricks/notebooks/01_agent_bricks_quickstart.py` - Quickstart notebook
**Resources:**
- Documentation: https://docs.databricks.com/
- Unity Catalog: https://docs.databricks.com/en/data-governance/unity-catalog/
- Solution Accelerators: https://www.databricks.com/solutions/accelerators
**Delta Sharing for Public Exports:**
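```python
# Run in a Databricks notebook, where `spark` is predefined.
# Shares are created with Unity Catalog SQL; the `delta_sharing` Python
# package is a recipient-side client and cannot create shares.
share_name = "one_civic_data"
gold_tables = ["gold.jurisdictions", "gold.meetings", "gold.nonprofits"]

# Share Gold layer tables
spark.sql(f"CREATE SHARE IF NOT EXISTS {share_name}")
for table in gold_tables:
    # Two-level names resolve against the current catalog
    spark.sql(f"ALTER SHARE {share_name} ADD TABLE {table}")
```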
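On the consuming side, a recipient could read the shared tables with the open-source `delta-sharing` Python client. This is a minimal sketch: the helper names are hypothetical, and the profile file (for example `config.share`) is the credential file that the share provider issues to each recipient.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address the client expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load one shared table into a pandas DataFrame.

    Requires `pip install delta-sharing` and a valid profile file.
    """
    import delta_sharing  # imported lazily so table_url has no dependency

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))
```

For example, `load_shared_table("config.share", "one_civic_data", "gold", "jurisdictions")` would fetch the jurisdictions table, assuming the share exposes it under the `gold` schema.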
---
### 🔍 Snowflake: Snowflake for Good
**Status:** EVALUATION - Consider for enterprise data sharing
**What we use:**
- Data Marketplace for Census/ESG data
- Data sharing capabilities
**Evaluation Criteria:**
- Cost vs. Databricks
- Data Marketplace value-add
- Enterprise collaboration needs
---
### 📚 Oracle: NetSuite Social Impact
**Status:** REFERENCE - Inspiration for nonprofit accounting
**What we use:**
- Fund accounting model patterns
- Grant tracking workflows
**Resources:**
- https://netsuite.com/social-impact
---
### 📚 Salesforce: Nonprofit Success Pack (NPSP)
**Status:** REFERENCE - Inspiration for constituent management
**What we use:**
- Household accounts model
- Recurring donations pattern
- Program engagement tracking
**NPSP → ONE Mappings:**
| NPSP Object | Our Entity | Use Case |
|-------------|------------|----------|
| Contact | CONSTITUENT | Donor, volunteer, beneficiary |
| Opportunity | DONATION | Financial contributions |
| Campaign | CAMPAIGN | Fundraising campaigns |
| Engagement Plan | VOLUNTEER_ACTIVITY | Volunteer tracking |
| Program Cohort | PROGRAM_DELIVERY | Program participants |
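If these mappings were ever needed in code, for instance when importing an NPSP export, they could be held in a simple lookup. The sketch below only restates the table; the names `NPSP_TO_ENTITY` and `to_entity` are hypothetical.

```python
# NPSP object name -> entity name, copied from the mapping table above
NPSP_TO_ENTITY = {
    "Contact": "CONSTITUENT",
    "Opportunity": "DONATION",
    "Campaign": "CAMPAIGN",
    "Engagement Plan": "VOLUNTEER_ACTIVITY",
    "Program Cohort": "PROGRAM_DELIVERY",
}


def to_entity(npsp_object: str) -> str:
    """Return the entity for an NPSP object, failing loudly on unmapped objects."""
    try:
        return NPSP_TO_ENTITY[npsp_object]
    except KeyError:
        raise ValueError(f"No mapping defined for NPSP object {npsp_object!r}") from None
```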
**Resources:**
- GitHub: https://github.com/SalesforceFoundation/NPSP
- License: BSD-3-Clause
---
## 3. Infrastructure & AI
### 📚 Cisco: Crisis Response
**Status:** REFERENCE - Inspiration for platform resilience
**Focus:**
- Network connectivity during emergencies
- System resilience patterns
**Resources:**
- https://cisco.com/crisis-response
---
### 📚 IBM: Science for Social Good
**Status:** REFERENCE - AI/ML use case patterns
**Focus:**
- Watson AI for civic applications
- Blockchain for transparency
- Quantum computing potential
**Resources:**
- https://ibm.com/social-good
---
### 🔍 Meta: Data for Good
**Status:** EVALUATION - Population mapping potential
**What we use:**
- High-Resolution Population Density Maps
- Social Connectedness Index
**Evaluation:**
- Integration with demographics
- Use for underserved area identification
**Resources:**
- https://dataforgood.facebook.com
---
## Summary: Current vs. Planned Integrations
| Platform | Status | Priority | Effort | Value |
|----------|--------|----------|--------|-------|
| Microsoft CDM | ✅ Active | - | - | HIGH |
| Databricks | ✅ Active | - | - | HIGH |
| Google Data Commons | 🔄 Recommended | HIGH | Low | HIGH |
| AWS Best Practices | 🔄 Planned | MEDIUM | Medium | MEDIUM |
| Snowflake | 🔍 Evaluation | LOW | Medium | MEDIUM |
| Meta Data for Good | 🔍 Evaluation | LOW | Medium | MEDIUM |
| Salesforce NPSP | 📚 Reference | - | - | - |
| Oracle NetSuite | 📚 Reference | - | - | - |
| Cisco | 📚 Reference | - | - | - |
| IBM | 📚 Reference | - | - | - |
## Recommended Implementation Order
1. **Google Data Commons** (Immediate - Low effort, High value)
   - Install dependencies
   - Update census ingestion
   - Test with sample jurisdictions
   - Deploy to production
2. **AWS Export Optimization** (Next sprint - Medium effort, Medium value)
   - Convert exports to Parquet
   - Implement partitioning
   - Document patterns
3. **Databricks Delta Sharing** (Future - Medium effort, Medium value)
   - Configure sharing
   - Create public share
   - Document access
4. **Snowflake/Meta Evaluation** (Backlog - TBD)
   - POC evaluation
   - Cost-benefit analysis
   - Decision by end of quarter
---
## How to Cite These Partnerships
All enterprise technology partnerships are properly cited in:
**[Citations & Data Sources - Enterprise Tech for Social Good](/docs/data-sources/citations#-enterprise-tech-for-social-good)**
Includes:
- Full program URLs
- Implementation status
- License information
- BibTeX citations (where applicable)
- Code examples