---
sidebar_position: 5
---
# Enterprise Tech Integration Guide
This guide documents the enterprise technology platforms and programs that support Open Navigator's data infrastructure.
## Implementation Status Legend
- ✅ **Active** - Fully implemented and in production use
- **Recommended** - Implementation recommended for enhancement
- **Reference** - Used as inspiration for data modeling
- **Evaluation** - Under consideration for future adoption
## 1. Cloud & Data Platforms
### ✅ Microsoft: Tech for Social Impact
**Status:** ACTIVE - Nonprofit CDM fully implemented
**What we use:**
- Nonprofit Common Data Model (CDM) for constituent management
- 8 core entities: CONSTITUENT, DONATION, CAMPAIGN, DESIGNATION, MEMBERSHIP, VOLUNTEER_ACTIVITY, PROGRAM_DELIVERY, PROGRAM_OUTCOME
**Files:**
- See [Nonprofit & Philanthropy](/docs/data-sources/citations#nonprofit--philanthropy) section
- ERD: [Data Model](/data-sources/data-model-erd)
**Resources:**
- GitHub: https://github.com/microsoft/Industry-Accelerator-Nonprofit
- License: MIT
---
### Google: Data Commons
**Status:** RECOMMENDED - Implementation available, not yet deployed
**What we use:**
- Knowledge Graph API for jurisdiction demographics
- 100+ variables per jurisdiction (income, education, health, housing)
- Simplifies Census Bureau data access
**Implementation:**
- Code: `discovery/google_data_commons.py`
- Install: `pip install datacommons datacommons-pandas`
- Documentation: https://docs.datacommons.org/api/
**Next Steps:**
1. Install dependencies: `pip install datacommons datacommons-pandas`
2. Update `discovery/census_ingestion.py` to use Data Commons client
3. Replace manual Census API calls with simplified DC API
4. Add time-series enrichment for historical trends
**Example Usage:**
```python
from discovery.google_data_commons import DataCommonsClient
client = DataCommonsClient()
# Enrich a single jurisdiction
data = client.enrich_jurisdiction("01073") # Jefferson County, AL
print(data["Median_Income_Household"])  # median household income in USD (illustrative: 65000)
# Bulk enrich multiple jurisdictions
fips_codes = ["01073", "01089", "01097"]
df = client.enrich_jurisdictions_bulk(fips_codes)
# Get time series
df_ts = client.get_time_series("01073", start_year=2015)
```
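Data Commons identifies US places by DCIDs of the form `geoId/<FIPS>`, so a wrapper such as `DataCommonsClient` would need to translate FIPS codes before querying. The helper below is a minimal sketch of that translation. `fips_to_dcid` is a hypothetical name used for illustration and is not taken from `discovery/google_data_commons.py`.

```python
def fips_to_dcid(fips: str) -> str:
    """Convert a 2-digit state or 5-digit county FIPS code to a Data Commons DCID.

    Hypothetical helper: the real client may handle this differently.
    """
    code = fips.strip()
    # FIPS codes are zero-padded strings, so validate length and digits
    # instead of converting to int (which would drop leading zeros).
    if not code.isdigit() or len(code) not in (2, 5):
        raise ValueError(f"expected a 2- or 5-digit FIPS code, got {fips!r}")
    return f"geoId/{code}"


if __name__ == "__main__":
    print(fips_to_dcid("01073"))  # geoId/01073
```

Keeping FIPS codes as strings throughout avoids losing the leading zero in states such as Alabama (`01`).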
**Benefits:**
- ✅ Simpler API than raw Census Bureau
- ✅ 100+ pre-integrated variables
- ✅ Automatic data quality validation
- ✅ Time series support
- ✅ No API key required (free tier)
---
### AWS: Open Data for Good
**Status:** PLANNED - Best practices for dataset exports
**What we use:**
- Parquet format best practices
- S3 storage patterns
- AWS Glue Data Catalog
**Recommendations for `/exports` folder:**
1. **Format:** Use Parquet with Snappy compression
2. **Partitioning:** Partition by `state/county/year`
3. **Versioning:** Enable S3 versioning for lineage
4. **Catalog:** Use AWS Glue for schema management
5. **Querying:** Athena for SQL without ETL
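The partitioning recommendation above can be sketched as a function that maps each export record to its partition directory. This is an illustration only: the Hive-style `key=value` directory names, the `exports` root, and the `partition_path` and `group_by_partition` helpers are assumptions, not the project's actual export code.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Partition columns, in directory order (state/county/year).
PARTITION_KEYS = ("state", "county", "year")


def partition_path(record: dict, root: str = "exports") -> PurePosixPath:
    """Return the partition directory for one record (hypothetical layout)."""
    parts = [f"{key}={record[key]}" for key in PARTITION_KEYS]
    return PurePosixPath(root, *parts)


def group_by_partition(records: list) -> dict:
    """Group records so each partition can be written as its own Parquet file."""
    groups = defaultdict(list)
    for record in records:
        groups[partition_path(record)].append(record)
    return dict(groups)


if __name__ == "__main__":
    rows = [
        {"state": "AL", "county": "01073", "year": 2023, "meetings": 12},
        {"state": "AL", "county": "01073", "year": 2023, "meetings": 7},
        {"state": "AL", "county": "01089", "year": 2023, "meetings": 4},
    ]
    for path, members in group_by_partition(rows).items():
        print(path, len(members))
```

If pandas with the pyarrow engine is available, `DataFrame.to_parquet(path, compression="snappy", partition_cols=["state", "county", "year"])` should produce a comparable directory layout in one call.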
**Next Steps:**
1. Review AWS Registry examples: https://registry.opendata.aws
2. Update export scripts to generate Parquet
3. Document partitioning strategy
4. Consider AWS Glue for metadata
---
## 2. Data Engineering Platforms
### ✅ Databricks: Databricks for Good
**Status:** ACTIVE - Full implementation
**What we use:**
- **Unity Catalog:** Model registry and data governance
- **Delta Lake:** Bronze/Silver/Gold lakehouse architecture
- **MLflow:** Agent deployment and experiment tracking
- **Model Serving:** Auto-scaling REST endpoints for agents
- **Agent Bricks:** Mosaic AI Agent Framework
**Files:**
- `pipeline/delta_lake.py` - Delta Lake pipeline
- `agents/mlflow_classifier.py` - Policy classifier agent
- `agents/mlflow_base.py` - Base MLflow agent class
- `databricks/deployment.py` - Unity Catalog deployment
- `databricks/evaluation.py` - Agent evaluation framework
- `databricks/notebooks/01_agent_bricks_quickstart.py` - Quickstart notebook
**Resources:**
- Documentation: https://docs.databricks.com/
- Unity Catalog: https://docs.databricks.com/en/data-governance/unity-catalog/
- Solution Accelerators: https://www.databricks.com/solutions/accelerators
**Delta Sharing for Public Exports:**
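```python
# Run in a Databricks notebook with Unity Catalog enabled, where `spark` is predefined.
# Shares are created with SQL; the open-source `delta_sharing` client only reads them.
share_name = "one_civic_data"
gold_tables = ["gold.jurisdictions", "gold.meetings", "gold.nonprofits"]

spark.sql(f"CREATE SHARE IF NOT EXISTS {share_name}")
for table in gold_tables:
    # Assumes `gold` is a schema in the current catalog; otherwise use catalog.schema.table.
    spark.sql(f"ALTER SHARE {share_name} ADD TABLE {table}")
```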
---
### Snowflake: Snowflake for Good
**Status:** EVALUATION - Consider for enterprise data sharing
**What we use:**
- Data Marketplace for Census/ESG data
- Data sharing capabilities
**Evaluation Criteria:**
- Cost vs. Databricks
- Data Marketplace value-add
- Enterprise collaboration needs
---
### Oracle: NetSuite Social Impact
**Status:** REFERENCE - Inspiration for nonprofit accounting
**What we use:**
- Fund accounting model patterns
- Grant tracking workflows
**Resources:**
- https://netsuite.com/social-impact
---
### Salesforce: Nonprofit Success Pack (NPSP)
**Status:** REFERENCE - Inspiration for constituent management
**What we use:**
- Household accounts model
- Recurring donations pattern
- Program engagement tracking
**NPSP → ONE Mappings:**
| NPSP Object | Our Entity | Use Case |
|-------------|------------|----------|
| Contact | CONSTITUENT | Donor, volunteer, beneficiary |
| Opportunity | DONATION | Financial contributions |
| Campaign | CAMPAIGN | Fundraising campaigns |
| Engagement Plan | VOLUNTEER_ACTIVITY | Volunteer tracking |
| Program Cohort | PROGRAM_DELIVERY | Program participants |
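The mapping table can also be kept as a lookup in code, for example when importing NPSP-style records. The sketch below is illustrative: `NPSP_TO_ENTITY` and `map_npsp_object` are hypothetical names, and only the object and entity names come from the table.

```python
# NPSP object name -> entity name, copied from the mapping table.
NPSP_TO_ENTITY = {
    "Contact": "CONSTITUENT",
    "Opportunity": "DONATION",
    "Campaign": "CAMPAIGN",
    "Engagement Plan": "VOLUNTEER_ACTIVITY",
    "Program Cohort": "PROGRAM_DELIVERY",
}


def map_npsp_object(npsp_object: str) -> str:
    """Return the entity name for an NPSP object; raise KeyError if unmapped."""
    try:
        return NPSP_TO_ENTITY[npsp_object]
    except KeyError:
        raise KeyError(f"no entity mapping for NPSP object {npsp_object!r}") from None


if __name__ == "__main__":
    print(map_npsp_object("Opportunity"))  # DONATION
```

Failing loudly on unmapped objects makes gaps in the table visible instead of silently dropping records.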
**Resources:**
- GitHub: https://github.com/SalesforceFoundation/NPSP
- License: BSD-3-Clause
---
## 3. Infrastructure & AI
### Cisco: Crisis Response
**Status:** REFERENCE - Inspiration for platform resilience
**Focus:**
- Network connectivity during emergencies
- System resilience patterns
**Resources:**
- https://cisco.com/crisis-response
---
### IBM: Science for Social Good
**Status:** REFERENCE - AI/ML use case patterns
**Focus:**
- Watson AI for civic applications
- Blockchain for transparency
- Quantum computing potential
**Resources:**
- https://ibm.com/social-good
---
### Meta: Data for Good
**Status:** EVALUATION - Population mapping potential
**What we use:**
- High-Resolution Population Density Maps
- Social Connectedness Index
**Evaluation:**
- Integration with demographics
- Use for underserved area identification
**Resources:**
- https://dataforgood.facebook.com
---
## Summary: Current vs. Planned Integrations
| Platform | Status | Priority | Effort | Value |
|----------|--------|----------|--------|-------|
| Microsoft CDM | ✅ Active | - | - | HIGH |
| Databricks | ✅ Active | - | - | HIGH |
| Google Data Commons | Recommended | HIGH | Low | HIGH |
| AWS Best Practices | Planned | MEDIUM | Medium | MEDIUM |
| Snowflake | Evaluation | LOW | Medium | MEDIUM |
| Meta Data for Good | Evaluation | LOW | Medium | MEDIUM |
| Salesforce NPSP | Reference | - | - | - |
| Oracle NetSuite | Reference | - | - | - |
| Cisco | Reference | - | - | - |
| IBM | Reference | - | - | - |
## Recommended Implementation Order
1. **Google Data Commons** (Immediate - Low effort, High value)
   - Install dependencies
   - Update census ingestion
   - Test with sample jurisdictions
   - Deploy to production
2. **AWS Export Optimization** (Next sprint - Medium effort, Medium value)
   - Convert exports to Parquet
   - Implement partitioning
   - Document patterns
3. **Databricks Delta Sharing** (Future - Medium effort, Medium value)
   - Configure sharing
   - Create public share
   - Document access
4. **Snowflake/Meta Evaluation** (Backlog - TBD)
   - POC evaluation
   - Cost-benefit analysis
   - Decision by end of quarter
---
## How to Cite These Partnerships
All enterprise technology partnerships are cited in:
**[Citations & Data Sources - Enterprise Tech for Social Good](/docs/data-sources/citations#-enterprise-tech-for-social-good)**
Includes:
- Full program URLs
- Implementation status
- License information
- BibTeX citations (where applicable)
- Code examples