# San Francisco Crime Analytics & Prediction System - Project Documentation
## 1. Project Overview

This project is an AI-powered dashboard that analyzes historical crime data in San Francisco and predicts future incidents. It serves as a decision-support tool for law enforcement and a safety-awareness tool for citizens.

The system combines:

- **Data Analytics**: Visualizing crime trends, hotspots, and distributions.
- **Machine Learning**: Using XGBoost and Random Forest to classify crimes as violent or non-violent.
- **Generative AI**: Integrating Groq (Llama 3) for natural-language explanations and a conversational assistant.
## 2. Architecture & Technology Stack

### Frontend

- **Streamlit**: The core framework for the web interface; handles layout, user inputs, and visualization rendering.
- **Plotly**: Interactive charts (bar, pie, and gauge charts).
- **Folium**: Geospatial visualizations (heatmaps, time-lapse maps).

### Backend & Logic

- **Python**: The primary programming language.
- **Pandas & NumPy**: Data manipulation and numerical operations.
- **Scikit-learn**: Preprocessing (label encoding, K-Means clustering) and baseline models.
- **XGBoost**: The engine behind the high-accuracy prediction model.
- **Groq API**: Provides the Llama 3 LLM for the AI assistant and explanation features.
### Directory Structure

```
Hackathon/
├── app.py               # Main application entry point
├── Dockerfile           # Container configuration
├── requirements.txt     # Project dependencies
├── README.md            # Quick start guide
├── src/
│   ├── data_loader.py   # Data ingestion logic
│   ├── preprocessing.py # Feature engineering pipeline
│   └── train_model.py   # Model training script
├── models/              # Saved model artifacts (.pkl)
├── data/                # Raw dataset storage
└── docs/                # Project documentation
```
## 3. Implementation Details

### 3.1 Data Pipeline (`src/data_loader.py` & `src/preprocessing.py`)

The data pipeline transforms raw CSV data into machine-learning-ready features.

- **Loading**: Reads `train.csv` and parses dates.
- **Feature Engineering**:
  - **Temporal**: Extracts Hour, Day, Month, Year, and DayOfWeek.
  - **Contextual**: Derives 'Season' (Winter, Spring, Summer, Fall) and 'IsWeekend'.
  - **Spatial**: Uses **K-Means clustering** to group coordinates into 'LocationClusters', identifying high-risk zones.
- **Target Definition**: Creates a binary target `IsViolent` based on crime categories (e.g., Assault, Robbery = 1).
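The steps above can be sketched roughly as follows. Column names (`Dates`, `X`, `Y`, `Category`) follow the SF crime dataset; the helper name, cluster count, and the violent-category set are illustrative assumptions, not the actual `src/preprocessing.py` API.

```python
# Sketch of the feature-engineering steps: temporal, contextual, spatial,
# and the binary target. Names here are illustrative.
import pandas as pd
from sklearn.cluster import KMeans

VIOLENT = {"ASSAULT", "ROBBERY", "KIDNAPPING", "SEX OFFENSES FORCIBLE"}  # assumed set

def add_features(df: pd.DataFrame, n_clusters: int = 5) -> pd.DataFrame:
    df = df.copy()
    dates = pd.to_datetime(df["Dates"])
    # Temporal features
    df["Hour"] = dates.dt.hour
    df["Day"] = dates.dt.day
    df["Month"] = dates.dt.month
    df["Year"] = dates.dt.year
    df["DayOfWeek"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    # Contextual features
    df["IsWeekend"] = (df["DayOfWeek"] >= 5).astype(int)
    df["Season"] = dates.dt.month.map(
        lambda m: ["Winter", "Spring", "Summer", "Fall"][(m % 12) // 3]
    )
    # Spatial clusters from longitude/latitude
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    df["LocationCluster"] = km.fit_predict(df[["X", "Y"]])
    # Binary target
    df["IsViolent"] = df["Category"].isin(VIOLENT).astype(int)
    return df
```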
### 3.2 Model Training (`src/train_model.py`)

The training script evaluates multiple models to find the best performer.

1. **Preprocessing**: Applies the feature pipeline to the training data.
2. **Encoding**: Converts categorical variables (District, Season) to integers with `LabelEncoder`.
3. **Model Selection**: Trains Naive Bayes, Random Forest, and XGBoost.
4. **Evaluation**: Compares accuracy, precision, and recall.
5. **Artifact Saving**: Saves the best model and encoders to `models/` for the app to load.
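A minimal sketch of this selection loop, assuming scikit-learn candidates (an `XGBClassifier` can be added to the same dict). The function name and split parameters are illustrative, not the actual script:

```python
# Sketch: train each candidate, score it on a held-out split, keep the best.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def select_best_model(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    candidates = {
        "naive_bayes": GaussianNB(),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
        # "xgboost": XGBClassifier(...)  # added the same way if installed
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        scores[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred, zero_division=0),
            "recall": recall_score(y_te, pred, zero_division=0),
        }
    best = max(scores, key=lambda n: scores[n]["accuracy"])
    # The winning model would then be persisted, e.g. with joblib, to models/.
    return candidates[best], scores
```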
### 3.3 The Dashboard (`app.py`)

The main application is divided into several tabs, each serving a specific purpose:

#### **Historical Trends**

- **Logic**: Aggregates data by hour and district.
- **Viz**: Displays a bar chart of the hourly distribution and a pie chart of the district breakdown.
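The aggregation behind this tab amounts to two small groupbys; the column names below assume the feature-engineered dataframe, and the function names are illustrative:

```python
# Sketch of the Historical Trends aggregations feeding the two charts.
import pandas as pd

def hourly_counts(df: pd.DataFrame) -> pd.Series:
    """Incidents per hour of day, for the bar chart."""
    return df.groupby("Hour").size()

def district_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of incidents per district, for the pie chart."""
    return df["PdDistrict"].value_counts(normalize=True)
```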
#### **Geospatial Intelligence**

- **Logic**: Uses `Folium` to render maps.
- **Features**:
  - **Time-Lapse**: Animates crime hotspots over a 24-hour cycle.
  - **Static Heatmap**: Shows the overall density of incidents.
#### **Tactical Simulation**

- **Purpose**: Simulates patrol scenarios to assess risk.
- **Logic**: Takes user input (District, Time), runs it through the model, and outputs a risk probability.
- **Output**: A gauge chart showing the risk level, plus actionable recommendations (e.g., "Deploy SWAT").
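The mapping from model probability to gauge band and recommendation can be sketched as below; the thresholds and the recommendation wording are illustrative assumptions, not the app's actual values:

```python
# Sketch: bucket a violent-crime probability into the gauge bands and
# attach a recommendation. Thresholds and wording are assumed.
def risk_recommendation(p_violent: float) -> tuple[str, str]:
    if not 0.0 <= p_violent <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p_violent < 0.33:
        return "LOW", "Routine patrol is sufficient."
    if p_violent < 0.66:
        return "MEDIUM", "Increase patrol frequency in this district."
    return "HIGH", "Deploy additional units and alert supervisors."
```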
#### **Chat with Data**

- **Purpose**: Natural-language query interface.
- **Logic**: A simple intent parser filters the dataframe on keywords (e.g., "Robbery", "Mission") and dynamically generates charts.
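A keyword parser of this kind can be sketched in a few lines; the keyword dictionaries here are small illustrative samples, not the app's full vocabulary:

```python
# Minimal sketch of the keyword-based intent parser: match known category
# and district words in the query, then filter the dataframe accordingly.
import pandas as pd

CATEGORIES = {"robbery": "ROBBERY", "assault": "ASSAULT", "theft": "LARCENY/THEFT"}
DISTRICTS = {"mission": "MISSION", "central": "CENTRAL", "tenderloin": "TENDERLOIN"}

def filter_by_query(df: pd.DataFrame, query: str) -> pd.DataFrame:
    q = query.lower()
    out = df
    for word, cat in CATEGORIES.items():
        if word in q:
            out = out[out["Category"] == cat]
    for word, dist in DISTRICTS.items():
        if word in q:
            out = out[out["PdDistrict"] == dist]
    return out
```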
#### **Advanced Prediction (99%)**

- **Purpose**: High-precision prediction for individual incidents.
- **Model**: A specialized XGBoost model (`crime_xgb_artifacts.pkl`) optimized for multi-class classification.
- **Features**:
  - **Input Form**: Detailed inputs, including address and description.
  - **Top-3 Probabilities**: Shows the most likely crime categories.
  - **AI Explanation**: Calls the **Groq API** to explain *why* the model made a specific prediction based on the description.
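The explanation request can be sketched as a payload builder; the message format follows the OpenAI-style chat schema the Groq SDK uses, but the function name and system-prompt wording are illustrative assumptions:

```python
# Sketch: assemble the chat messages sent to Groq to explain a prediction.
# The prompt text and helper name are assumed, not the app's actual code.
def build_explanation_messages(description: str,
                               top3: list[tuple[str, float]]) -> list[dict]:
    ranked = "\n".join(f"{i + 1}. {cat}: {p:.1%}" for i, (cat, p) in enumerate(top3))
    return [
        {"role": "system",
         "content": "You explain crime-category predictions in plain language."},
        {"role": "user",
         "content": f"Incident description: {description}\n"
                    f"Model's top-3 predictions:\n{ranked}\n"
                    "Explain briefly why these predictions are plausible."},
    ]
```

With the `groq` SDK, this payload would then be passed to `client.chat.completions.create(model=..., messages=...)` to obtain the explanation text.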
#### **AI Crime Safety Assistant**

- **Implementation**: A chat interface embedded in the app.
- **Logic**: Maintains chat history in session state and sends user queries plus a system prompt to Groq (Llama 3), which generates safety advice and model explanations.
## 4. How to Run

1. **Prerequisites**: Python 3.9+ installed.
2. **Installation**:
   ```bash
   pip install -r requirements.txt
   ```
3. **Execution**:
   ```bash
   streamlit run app.py
   ```
## 5. Future Improvements

- **Real-Time Data**: Connect to a live police API.
- **User Accounts**: Save preferences and history.
- **Mobile App**: Wrap the dashboard for mobile deployment.

---
*Generated by Antigravity*