Spaces:

MHuzaifaa
/

hackathonn

Sleeping

App Files Files Community

hackathonn / docs /PROJECT_DOCUMENTATION.md

MHuzaifaa

Upload project

61b0513 2 months ago

preview code

raw

history blame contribute delete

5.57 kB

# 🚓 San Francisco Crime Analytics & Prediction System - Project Documentation

1. Project Overview

This project is a sophisticated AI-powered dashboard designed to analyze historical crime data in San Francisco and predict future incidents with high accuracy. It serves as a decision-support tool for law enforcement and a safety awareness tool for citizens.

The system combines:

Data Analytics: Visualizing crime trends, hotspots, and distributions.
Machine Learning: Using XGBoost and Random Forest to classify crimes as violent or non-violent.
Generative AI: Integrating Groq (Llama 3) for natural language explanations and a conversational assistant.

2. Architecture & Technology Stack

Frontend

Streamlit: The core framework for the web interface. It handles the layout, user inputs, and visualization rendering.
Plotly: Used for interactive charts (bar charts, pie charts, gauge charts).
Folium: Used for geospatial visualizations (heatmaps, time-lapse maps).

Backend & Logic

Python: The primary programming language.
Pandas & NumPy: For data manipulation and numerical operations.
Scikit-Learn: For preprocessing (Label Encoding, K-Means Clustering) and baseline models.
XGBoost: The engine behind the high-accuracy prediction model.
Groq API: Provides the Llama 3 LLM for the AI assistant and explanation features.

Directory Structure

Hackathon/
├── app.py                 # Main application entry point
├── Dockerfile             # Container configuration
├── requirements.txt       # Project dependencies
├── README.md              # Quick start guide
├── src/
│   ├── data_loader.py     # Data ingestion logic
│   ├── preprocessing.py   # Feature engineering pipeline
│   └── train_model.py     # Model training script
├── models/                # Saved model artifacts (.pkl)
├── data/                  # Raw dataset storage
└── docs/                  # Project documentation

3. Implementation Details

3.1 Data Pipeline (`src/data_loader.py` & `src/preprocessing.py`)

The data pipeline transforms raw CSV data into machine-learning-ready features.

Loading: Reads train.csv and parses dates.
Feature Engineering:
- Temporal: Extracts Hour, Day, Month, Year, DayOfWeek.
- Contextual: Determines 'Season' (Winter, Spring, Summer, Fall) and 'IsWeekend'.
- Spatial: Uses K-Means Clustering to group coordinates into 'LocationClusters', identifying high-risk zones.
- Target Definition: Creates a binary target IsViolent based on crime categories (e.g., Assault, Robbery = 1).

3.2 Model Training (`src/train_model.py`)

The training script evaluates multiple models to find the best performer.

Preprocessing: Applies the pipeline to the training data.
Encoding: Converts categorical variables (District, Season) into numbers using LabelEncoder.
Model Selection: Trains Naive Bayes, Random Forest, and XGBoost.
Evaluation: Compares Accuracy, Precision, and Recall.
Artifact Saving: Saves the best model and encoders to models/ for the app to use.

3.3 The Dashboard (`app.py`)

The main application is divided into several tabs, each serving a specific purpose:

📊 Historical Trends

Logic: Aggregates data by hour and district.
Viz: Displays a bar chart for hourly distribution and a pie chart for district breakdown.

🗺️ Geospatial Intelligence

Logic: Uses Folium to render maps.
Features:
- Time-Lapse: Animates crime hotspots over a 24-hour cycle.
- Static Heatmap: Shows overall density of incidents.

🚨 Tactical Simulation

Purpose: Simulates patrol scenarios to assess risk.
Logic: Takes user input (District, Time), processes it through the model, and outputs a risk probability.
Output: A gauge chart showing risk level and actionable recommendations (e.g., "Deploy SWAT").

💬 Chat with Data

Purpose: Natural language query interface.
Logic: A simple intent parser filters the dataframe based on keywords (e.g., "Robbery", "Mission") and dynamically generates charts.

?? Advanced Prediction (99%)

Purpose: High-precision individual incident prediction.
Model: Uses a specialized XGBoost model (crime_xgb_artifacts.pkl) optimized for multi-class classification.
Features:
- Input Form: Detailed inputs including address and description.
- Top 3 Probabilities: Shows the most likely crime categories.
- AI Explanation: Calls the Groq API to explain why the model made a specific prediction based on the description.

🤖 AI Crime Safety Assistant

Implementation: A chat interface embedded in the app.
Logic: Maintains session state for chat history. Sends user queries + system prompt to Groq (Llama 3) to generate helpful safety advice and model explanations.

4. How to Run

Prerequisites: Python 3.9+ installed.
Installation:
```
pip install -r requirements.txt
```
Execution:
```
streamlit run app.py
```

5. Future Improvements

Real-time Data: Connect to a live police API.
User Accounts: Save preferences and history.
Mobile App: Wrap the dashboard for mobile deployment.

Generated by Antigravity