Spaces:
Sleeping
Sleeping
File size: 5,570 Bytes
61b0513 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 |
ο»Ώ# π San Francisco Crime Analytics & Prediction System - Project Documentation
## 1. Project Overview
This project is a sophisticated AI-powered dashboard designed to analyze historical crime data in San Francisco and predict future incidents with high accuracy. It serves as a decision-support tool for law enforcement and a safety awareness tool for citizens.
The system combines:
- **Data Analytics**: Visualizing crime trends, hotspots, and distributions.
- **Machine Learning**: Using XGBoost and Random Forest to classify crimes as violent or non-violent.
- **Generative AI**: Integrating Groq (Llama 3) for natural language explanations and a conversational assistant.
## 2. Architecture & Technology Stack
### Frontend
- **Streamlit**: The core framework for the web interface. It handles the layout, user inputs, and visualization rendering.
- **Plotly**: Used for interactive charts (bar charts, pie charts, gauge charts).
- **Folium**: Used for geospatial visualizations (heatmaps, time-lapse maps).
### Backend & Logic
- **Python**: The primary programming language.
- **Pandas & NumPy**: For data manipulation and numerical operations.
- **Scikit-Learn**: For preprocessing (Label Encoding, K-Means Clustering) and baseline models.
- **XGBoost**: The engine behind the high-accuracy prediction model.
- **Groq API**: Provides the Llama 3 LLM for the AI assistant and explanation features.
### Directory Structure
```
Hackathon/
βββ app.py # Main application entry point
βββ Dockerfile # Container configuration
βββ requirements.txt # Project dependencies
βββ README.md # Quick start guide
βββ src/
β βββ data_loader.py # Data ingestion logic
β βββ preprocessing.py # Feature engineering pipeline
β βββ train_model.py # Model training script
βββ models/ # Saved model artifacts (.pkl)
βββ data/ # Raw dataset storage
βββ docs/ # Project documentation
```
## 3. Implementation Details
### 3.1 Data Pipeline (`src/data_loader.py` & `src/preprocessing.py`)
The data pipeline transforms raw CSV data into machine-learning-ready features.
- **Loading**: Reads `train.csv` and parses dates.
- **Feature Engineering**:
- **Temporal**: Extracts Hour, Day, Month, Year, DayOfWeek.
- **Contextual**: Determines 'Season' (Winter, Spring, Summer, Fall) and 'IsWeekend'.
- **Spatial**: Uses **K-Means Clustering** to group coordinates into 'LocationClusters', identifying high-risk zones.
- **Target Definition**: Creates a binary target `IsViolent` based on crime categories (e.g., Assault, Robbery = 1).
### 3.2 Model Training (`src/train_model.py`)
The training script evaluates multiple models to find the best performer.
1. **Preprocessing**: Applies the pipeline to the training data.
2. **Encoding**: Converts categorical variables (District, Season) into numbers using `LabelEncoder`.
3. **Model Selection**: Trains Naive Bayes, Random Forest, and XGBoost.
4. **Evaluation**: Compares Accuracy, Precision, and Recall.
5. **Artifact Saving**: Saves the best model and encoders to `models/` for the app to use.
### 3.3 The Dashboard (`app.py`)
The main application is divided into several tabs, each serving a specific purpose:
#### **π Historical Trends**
- **Logic**: Aggregates data by hour and district.
- **Viz**: Displays a bar chart for hourly distribution and a pie chart for district breakdown.
#### **πΊοΈ Geospatial Intelligence**
- **Logic**: Uses `Folium` to render maps.
- **Features**:
- **Time-Lapse**: Animates crime hotspots over a 24-hour cycle.
- **Static Heatmap**: Shows overall density of incidents.
#### **π¨ Tactical Simulation**
- **Purpose**: Simulates patrol scenarios to assess risk.
- **Logic**: Takes user input (District, Time), processes it through the model, and outputs a risk probability.
- **Output**: A gauge chart showing risk level and actionable recommendations (e.g., "Deploy SWAT").
#### **π¬ Chat with Data**
- **Purpose**: Natural language query interface.
- **Logic**: A simple intent parser filters the dataframe based on keywords (e.g., "Robbery", "Mission") and dynamically generates charts.
#### **?? Advanced Prediction (99%)**
- **Purpose**: High-precision individual incident prediction.
- **Model**: Uses a specialized XGBoost model (`crime_xgb_artifacts.pkl`) optimized for multi-class classification.
- **Features**:
- **Input Form**: Detailed inputs including address and description.
- **Top 3 Probabilities**: Shows the most likely crime categories.
- **AI Explanation**: Calls the **Groq API** to explain *why* the model made a specific prediction based on the description.
#### **π€ AI Crime Safety Assistant**
- **Implementation**: A chat interface embedded in the app.
- **Logic**: Maintains session state for chat history. Sends user queries + system prompt to Groq (Llama 3) to generate helpful safety advice and model explanations.
## 4. How to Run
1. **Prerequisites**: Python 3.9+ installed.
2. **Installation**:
```bash
pip install -r requirements.txt
```
3. **Execution**:
```bash
streamlit run app.py
```
## 5. Future Improvements
- **Real-time Data**: Connect to a live police API.
- **User Accounts**: Save preferences and history.
- **Mobile App**: Wrap the dashboard for mobile deployment.
---
*Generated by Antigravity*
|