# Architecture & Solution Blueprint

## High-level flow

1. **Ingest**: CSV files for local development; Oracle Autonomous Database for enterprise deployments.
2. **Process**: Config-driven preprocessing, implemented in a reusable Python package.
3. **Model**: Scikit-learn pipeline with TF-IDF + Logistic Regression.
4. **Serve**: Streamlit dashboard and CLI automation.
5. **Operate**: GitHub Actions CI, a retraining script, and a deployment path to Oracle Cloud Infrastructure (OCI).
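
The **Model** step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `build_pipeline`, the hyperparameters, and the six sample sentences are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_pipeline() -> Pipeline:
    """Build a TF-IDF + Logistic Regression text classifier."""
    return Pipeline(
        steps=[
            # Hyperparameters are illustrative assumptions, not project settings.
            ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
            ("clf", LogisticRegression(max_iter=1000)),
        ]
    )


# Tiny made-up sample, only to show the fit / predict_proba round trip.
texts = [
    "I love this product",
    "great service, very happy",
    "this is terrible",
    "worst experience ever",
    "it arrived on tuesday",
    "the package is on the table",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

pipeline = build_pipeline()
pipeline.fit(texts, labels)

# One row per input text, one column per class in pipeline.classes_.
probabilities = pipeline.predict_proba(["I love the service"])
```

Keeping the vectorizer and classifier in one `Pipeline` object means the same text transformation is applied at training and at inference time, which is presumably why the serving path can call the pipeline directly.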

```mermaid
sequenceDiagram
    participant User
    participant Streamlit
    participant Predictor
    participant Pipeline
    participant OracleDB

    User->>Streamlit: Input tweet
    Streamlit->>Predictor: call predict_with_threshold
    Predictor->>Pipeline: transform + predict_proba
    Pipeline-->>Predictor: labels & probabilities
    Predictor-->>Streamlit: curated response
    Streamlit-->>User: sentiment insights & KPIs
    Predictor->>OracleDB: (optional) pull latest training data
```

## Key metrics & KPIs

| KPI | Description | Target |
| --- | --- | --- |
| Macro F1 | F1 averaged equally across the positive, neutral, and negative classes | ≥ 0.80 |
| Prediction latency | Inference response time in the Streamlit app | < 200 ms |
| Data freshness | Time since the last Oracle sync | < 24 hours |
| Model drift (PSI) | Population stability index | < 0.2 |
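
For reference, the population stability index in the table can be computed along these lines. The sketch uses the standard definition, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ) over binned proportions. The function name, the equal-width binning, the bin count, and the `eps` floor are assumptions, and the source does not say which variable the project monitors.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline sample and a recent sample.

    Bin edges are equal-width over the baseline range; values outside
    that range are clamped into the first or last bin. Both samples
    must be non-empty.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Made-up data: a uniform baseline and a copy shifted upward by 0.3.
baseline = [i / 100 for i in range(100)]
shifted = [min(v + 0.3, 0.99) for v in baseline]

psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, shifted)
```

With identical samples the index is zero, while the shifted sample lands well above the 0.2 target in the table.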

## Extensibility roadmap

- Plug-in architecture for additional languages.
- OCI Data Science jobs for scheduled retraining.
- Oracle APEX dashboard embedding the Streamlit app.
- Integration with Deloitte's accelerators for risk & compliance logging.