Spaces:
Sleeping
Sleeping
ο»Ώ# π San Francisco Crime Analytics & Prediction System - Project Documentation
1. Project Overview
This project is a sophisticated AI-powered dashboard designed to analyze historical crime data in San Francisco and predict future incidents with high accuracy. It serves as a decision-support tool for law enforcement and a safety awareness tool for citizens.
The system combines:
- Data Analytics: Visualizing crime trends, hotspots, and distributions.
- Machine Learning: Using XGBoost and Random Forest to classify crimes as violent or non-violent.
- Generative AI: Integrating Groq (Llama 3) for natural language explanations and a conversational assistant.
2. Architecture & Technology Stack
Frontend
- Streamlit: The core framework for the web interface. It handles the layout, user inputs, and visualization rendering.
- Plotly: Used for interactive charts (bar charts, pie charts, gauge charts).
- Folium: Used for geospatial visualizations (heatmaps, time-lapse maps).
Backend & Logic
- Python: The primary programming language.
- Pandas & NumPy: For data manipulation and numerical operations.
- Scikit-Learn: For preprocessing (Label Encoding, K-Means Clustering) and baseline models.
- XGBoost: The engine behind the high-accuracy prediction model.
- Groq API: Provides the Llama 3 LLM for the AI assistant and explanation features.
Directory Structure
Hackathon/
βββ app.py # Main application entry point
βββ Dockerfile # Container configuration
βββ requirements.txt # Project dependencies
βββ README.md # Quick start guide
βββ src/
β βββ data_loader.py # Data ingestion logic
β βββ preprocessing.py # Feature engineering pipeline
β βββ train_model.py # Model training script
βββ models/ # Saved model artifacts (.pkl)
βββ data/ # Raw dataset storage
βββ docs/ # Project documentation
3. Implementation Details
3.1 Data Pipeline (src/data_loader.py & src/preprocessing.py)
The data pipeline transforms raw CSV data into machine-learning-ready features.
- Loading: Reads
train.csvand parses dates. - Feature Engineering:
- Temporal: Extracts Hour, Day, Month, Year, DayOfWeek.
- Contextual: Determines 'Season' (Winter, Spring, Summer, Fall) and 'IsWeekend'.
- Spatial: Uses K-Means Clustering to group coordinates into 'LocationClusters', identifying high-risk zones.
- Target Definition: Creates a binary target
IsViolentbased on crime categories (e.g., Assault, Robbery = 1).
3.2 Model Training (src/train_model.py)
The training script evaluates multiple models to find the best performer.
- Preprocessing: Applies the pipeline to the training data.
- Encoding: Converts categorical variables (District, Season) into numbers using
LabelEncoder. - Model Selection: Trains Naive Bayes, Random Forest, and XGBoost.
- Evaluation: Compares Accuracy, Precision, and Recall.
- Artifact Saving: Saves the best model and encoders to
models/for the app to use.
3.3 The Dashboard (app.py)
The main application is divided into several tabs, each serving a specific purpose:
π Historical Trends
- Logic: Aggregates data by hour and district.
- Viz: Displays a bar chart for hourly distribution and a pie chart for district breakdown.
πΊοΈ Geospatial Intelligence
- Logic: Uses
Foliumto render maps. - Features:
- Time-Lapse: Animates crime hotspots over a 24-hour cycle.
- Static Heatmap: Shows overall density of incidents.
π¨ Tactical Simulation
- Purpose: Simulates patrol scenarios to assess risk.
- Logic: Takes user input (District, Time), processes it through the model, and outputs a risk probability.
- Output: A gauge chart showing risk level and actionable recommendations (e.g., "Deploy SWAT").
π¬ Chat with Data
- Purpose: Natural language query interface.
- Logic: A simple intent parser filters the dataframe based on keywords (e.g., "Robbery", "Mission") and dynamically generates charts.
?? Advanced Prediction (99%)
- Purpose: High-precision individual incident prediction.
- Model: Uses a specialized XGBoost model (
crime_xgb_artifacts.pkl) optimized for multi-class classification. - Features:
- Input Form: Detailed inputs including address and description.
- Top 3 Probabilities: Shows the most likely crime categories.
- AI Explanation: Calls the Groq API to explain why the model made a specific prediction based on the description.
π€ AI Crime Safety Assistant
- Implementation: A chat interface embedded in the app.
- Logic: Maintains session state for chat history. Sends user queries + system prompt to Groq (Llama 3) to generate helpful safety advice and model explanations.
4. How to Run
- Prerequisites: Python 3.9+ installed.
- Installation:
pip install -r requirements.txt - Execution:
streamlit run app.py
5. Future Improvements
- Real-time Data: Connect to a live police API.
- User Accounts: Save preferences and history.
- Mobile App: Wrap the dashboard for mobile deployment.
Generated by Antigravity