Spaces:

MHuzaifaa
/

hackathonn

Sleeping

App Files Files Community

hackathonn / docs /PROJECT_DOCUMENTATION.md

MHuzaifaa

Upload project

61b0513 2 months ago

preview code

raw

history blame contribute delete

5.57 kB

	# 🚓 San Francisco Crime Analytics & Prediction System - Project Documentation

	## 1. Project Overview
	This project is a sophisticated AI-powered dashboard designed to analyze historical crime data in San Francisco and predict future incidents with high accuracy. It serves as a decision-support tool for law enforcement and a safety awareness tool for citizens.

	The system combines:
	- Data Analytics: Visualizing crime trends, hotspots, and distributions.
	- Machine Learning: Using XGBoost and Random Forest to classify crimes as violent or non-violent.
	- Generative AI: Integrating Groq (Llama 3) for natural language explanations and a conversational assistant.

	## 2. Architecture & Technology Stack

	### Frontend
	- Streamlit: The core framework for the web interface. It handles the layout, user inputs, and visualization rendering.
	- Plotly: Used for interactive charts (bar charts, pie charts, gauge charts).
	- Folium: Used for geospatial visualizations (heatmaps, time-lapse maps).

	### Backend & Logic
	- Python: The primary programming language.
	- Pandas & NumPy: For data manipulation and numerical operations.
	- Scikit-Learn: For preprocessing (Label Encoding, K-Means Clustering) and baseline models.
	- XGBoost: The engine behind the high-accuracy prediction model.
	- Groq API: Provides the Llama 3 LLM for the AI assistant and explanation features.

	### Directory Structure
	```
	Hackathon/
	├── app.py # Main application entry point
	├── Dockerfile # Container configuration
	├── requirements.txt # Project dependencies
	├── README.md # Quick start guide
	├── src/
	│ ├── data_loader.py # Data ingestion logic
	│ ├── preprocessing.py # Feature engineering pipeline
	│ └── train_model.py # Model training script
	├── models/ # Saved model artifacts (.pkl)
	├── data/ # Raw dataset storage
	└── docs/ # Project documentation
	```

	## 3. Implementation Details

	### 3.1 Data Pipeline (`src/data_loader.py` & `src/preprocessing.py`)
	The data pipeline transforms raw CSV data into machine-learning-ready features.

	- Loading: Reads `train.csv` and parses dates.
	- Feature Engineering:
	- Temporal: Extracts Hour, Day, Month, Year, DayOfWeek.
	- Contextual: Determines 'Season' (Winter, Spring, Summer, Fall) and 'IsWeekend'.
	- Spatial: Uses K-Means Clustering to group coordinates into 'LocationClusters', identifying high-risk zones.
	- Target Definition: Creates a binary target `IsViolent` based on crime categories (e.g., Assault, Robbery = 1).

	### 3.2 Model Training (`src/train_model.py`)
	The training script evaluates multiple models to find the best performer.

	1. Preprocessing: Applies the pipeline to the training data.
	2. Encoding: Converts categorical variables (District, Season) into numbers using `LabelEncoder`.
	3. Model Selection: Trains Naive Bayes, Random Forest, and XGBoost.
	4. Evaluation: Compares Accuracy, Precision, and Recall.
	5. Artifact Saving: Saves the best model and encoders to `models/` for the app to use.

	### 3.3 The Dashboard (`app.py`)
	The main application is divided into several tabs, each serving a specific purpose:

	#### 📊 Historical Trends
	- Logic: Aggregates data by hour and district.
	- Viz: Displays a bar chart for hourly distribution and a pie chart for district breakdown.

	#### 🗺️ Geospatial Intelligence
	- Logic: Uses `Folium` to render maps.
	- Features:
	- Time-Lapse: Animates crime hotspots over a 24-hour cycle.
	- Static Heatmap: Shows overall density of incidents.

	#### 🚨 Tactical Simulation
	- Purpose: Simulates patrol scenarios to assess risk.
	- Logic: Takes user input (District, Time), processes it through the model, and outputs a risk probability.
	- Output: A gauge chart showing risk level and actionable recommendations (e.g., "Deploy SWAT").

	#### 💬 Chat with Data
	- Purpose: Natural language query interface.
	- Logic: A simple intent parser filters the dataframe based on keywords (e.g., "Robbery", "Mission") and dynamically generates charts.

	#### ?? Advanced Prediction (99%)
	- Purpose: High-precision individual incident prediction.
	- Model: Uses a specialized XGBoost model (`crime_xgb_artifacts.pkl`) optimized for multi-class classification.
	- Features:
	- Input Form: Detailed inputs including address and description.
	- Top 3 Probabilities: Shows the most likely crime categories.
	- AI Explanation: Calls the Groq API to explain why the model made a specific prediction based on the description.

	#### 🤖 AI Crime Safety Assistant
	- Implementation: A chat interface embedded in the app.
	- Logic: Maintains session state for chat history. Sends user queries + system prompt to Groq (Llama 3) to generate helpful safety advice and model explanations.

	## 4. How to Run

	1. Prerequisites: Python 3.9+ installed.
	2. Installation:
	```bash
	pip install -r requirements.txt
	```
	3. Execution:
	```bash
	streamlit run app.py
	```

	## 5. Future Improvements
	- Real-time Data: Connect to a live police API.
	- User Accounts: Save preferences and history.
	- Mobile App: Wrap the dashboard for mobile deployment.

	---
	Generated by Antigravity

	# 🚓 San Francisco Crime Analytics & Prediction System - Project Documentation

	## 1. Project Overview
	This project is a sophisticated AI-powered dashboard designed to analyze historical crime data in San Francisco and predict future incidents with high accuracy. It serves as a decision-support tool for law enforcement and a safety awareness tool for citizens.

	The system combines:
	- Data Analytics: Visualizing crime trends, hotspots, and distributions.
	- Machine Learning: Using XGBoost and Random Forest to classify crimes as violent or non-violent.
	- Generative AI: Integrating Groq (Llama 3) for natural language explanations and a conversational assistant.

	## 2. Architecture & Technology Stack

	### Frontend
	- Streamlit: The core framework for the web interface. It handles the layout, user inputs, and visualization rendering.
	- Plotly: Used for interactive charts (bar charts, pie charts, gauge charts).
	- Folium: Used for geospatial visualizations (heatmaps, time-lapse maps).

	### Backend & Logic
	- Python: The primary programming language.
	- Pandas & NumPy: For data manipulation and numerical operations.
	- Scikit-Learn: For preprocessing (Label Encoding, K-Means Clustering) and baseline models.
	- XGBoost: The engine behind the high-accuracy prediction model.
	- Groq API: Provides the Llama 3 LLM for the AI assistant and explanation features.

	### Directory Structure
	```
	Hackathon/
	├── app.py # Main application entry point
	├── Dockerfile # Container configuration
	├── requirements.txt # Project dependencies
	├── README.md # Quick start guide
	├── src/
	│ ├── data_loader.py # Data ingestion logic
	│ ├── preprocessing.py # Feature engineering pipeline
	│ └── train_model.py # Model training script
	├── models/ # Saved model artifacts (.pkl)
	├── data/ # Raw dataset storage
	└── docs/ # Project documentation
	```

	## 3. Implementation Details

	### 3.1 Data Pipeline (`src/data_loader.py` & `src/preprocessing.py`)
	The data pipeline transforms raw CSV data into machine-learning-ready features.

	- Loading: Reads `train.csv` and parses dates.
	- Feature Engineering:
	- Temporal: Extracts Hour, Day, Month, Year, DayOfWeek.
	- Contextual: Determines 'Season' (Winter, Spring, Summer, Fall) and 'IsWeekend'.
	- Spatial: Uses K-Means Clustering to group coordinates into 'LocationClusters', identifying high-risk zones.
	- Target Definition: Creates a binary target `IsViolent` based on crime categories (e.g., Assault, Robbery = 1).

	### 3.2 Model Training (`src/train_model.py`)
	The training script evaluates multiple models to find the best performer.

	1. Preprocessing: Applies the pipeline to the training data.
	2. Encoding: Converts categorical variables (District, Season) into numbers using `LabelEncoder`.
	3. Model Selection: Trains Naive Bayes, Random Forest, and XGBoost.
	4. Evaluation: Compares Accuracy, Precision, and Recall.
	5. Artifact Saving: Saves the best model and encoders to `models/` for the app to use.

	### 3.3 The Dashboard (`app.py`)
	The main application is divided into several tabs, each serving a specific purpose:

	#### 📊 Historical Trends
	- Logic: Aggregates data by hour and district.
	- Viz: Displays a bar chart for hourly distribution and a pie chart for district breakdown.

	#### 🗺️ Geospatial Intelligence
	- Logic: Uses `Folium` to render maps.
	- Features:
	- Time-Lapse: Animates crime hotspots over a 24-hour cycle.
	- Static Heatmap: Shows overall density of incidents.

	#### 🚨 Tactical Simulation
	- Purpose: Simulates patrol scenarios to assess risk.
	- Logic: Takes user input (District, Time), processes it through the model, and outputs a risk probability.
	- Output: A gauge chart showing risk level and actionable recommendations (e.g., "Deploy SWAT").

	#### 💬 Chat with Data
	- Purpose: Natural language query interface.
	- Logic: A simple intent parser filters the dataframe based on keywords (e.g., "Robbery", "Mission") and dynamically generates charts.

	#### ?? Advanced Prediction (99%)
	- Purpose: High-precision individual incident prediction.
	- Model: Uses a specialized XGBoost model (`crime_xgb_artifacts.pkl`) optimized for multi-class classification.
	- Features:
	- Input Form: Detailed inputs including address and description.
	- Top 3 Probabilities: Shows the most likely crime categories.
	- AI Explanation: Calls the Groq API to explain why the model made a specific prediction based on the description.

	#### 🤖 AI Crime Safety Assistant
	- Implementation: A chat interface embedded in the app.
	- Logic: Maintains session state for chat history. Sends user queries + system prompt to Groq (Llama 3) to generate helpful safety advice and model explanations.

	## 4. How to Run

	1. Prerequisites: Python 3.9+ installed.
	2. Installation:
	```bash
	pip install -r requirements.txt
	```
	3. Execution:
	```bash
	streamlit run app.py
	```

	## 5. Future Improvements
	- Real-time Data: Connect to a live police API.
	- User Accounts: Save preferences and history.
	- Mobile App: Wrap the dashboard for mobile deployment.

	---
	Generated by Antigravity