# AI-Powered Excel Data Analysis App 

A Streamlit application that automates Excel data processing, analyzes the data with Google's Gemini AI, and provides interactive visualizations. It is designed for analyzing EOC (Emergency Operations Center) data, with automated designation-to-cadre mapping.

## Features 

- **File Upload & Processing**
  - Supports CSV, XLS, XLSX formats
  - Automatic data cleaning
  - Smart designation to cadre mapping
  - Handles multi-level headers

- **Interactive Data Preview**
  - Column selection
  - Global search functionality
  - Advanced column-specific filters
  - Customizable row display
  - Hide/show index options

- **AI-Powered Analysis**
  - Intelligent data insights using Gemini AI
  - Natural language queries
  - Automated data summaries
  - Pattern recognition
  - Follow-up question suggestions

- **Data Visualization**
  - Dynamic charts and graphs
  - Cadre distribution analysis
  - District-wise visualizations
  - Interactive dashboards
  - Correlation analysis
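
The designation-to-cadre mapping listed under File Upload & Processing can be pictured as a tolerant dictionary lookup. The sketch below is illustrative only: the table contents, the `normalize` and `map_cadre` helpers, and the `"Unmapped"` fallback are assumptions, since this README does not show the app's actual mapping rules.

```python
import re

# Hypothetical lookup table; the app's real mappings are not shown in this README.
DESIGNATION_TO_CADRE = {
    "medical officer": "Doctor",
    "staff nurse": "Nurse",
    "data entry operator": "Support Staff",
}


def normalize(designation: str) -> str:
    """Collapse whitespace, trim, and lowercase so lookups tolerate messy input."""
    return re.sub(r"\s+", " ", designation).strip().lower()


def map_cadre(designation: str, default: str = "Unmapped") -> str:
    """Return the cadre for a designation, or `default` when no mapping exists."""
    return DESIGNATION_TO_CADRE.get(normalize(designation), default)
```

Returning a visible fallback value instead of raising keeps unmapped designations easy to spot and add to the table later.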

## Setup & Installation 

1. **Clone the repository**
   ```bash
   git clone https://github.com/HussainM899/AI-Data-Processing-Analytics.git
   cd AI-Data-Processing-Analytics
   ```

2. **Create and activate virtual environment**
   ```bash
   python -m venv venv
   source venv/bin/activate      # Linux/macOS
   # venv\Scripts\activate       # Windows: run this instead of the line above
   ```

3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

4. **Set up environment variables**
   - Create a `.env` file in the root directory
   - Add required credentials (see `.env.example`)

## Required Environment Variables 
   ```env
   GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
   GOOGLE_API_KEY=your_api_key_here
   ```
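
A startup check along the following lines can catch a missing credential before the first Gemini call fails. This is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper, not a function from `app.py`. If `python-dotenv` is installed, calling its `load_dotenv()` first would copy the `.env` entries into `os.environ`.

```python
import os

# The two variable names come from the section above.
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Example use at application startup (assumption: failing fast is desired):
#     missing = missing_env_vars()
#     if missing:
#         raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```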

## Usage 

1. **Start the application**
   ```bash
   streamlit run app.py
   ```

2. **Upload Data**
   - Use the file uploader to import your Excel/CSV file
   - The app automatically processes and cleans the data
   - Multi-level headers are automatically handled

3. **Analyze Data**
   - Use the navigation sidebar to switch between modes:
     - Data Processing
     - Analysis & Visualization
     - About
   - Ask questions in natural language
   - View automated insights and visualizations

4. **Export Results**
   - Download processed data in Excel format
   - Export updated designation mappings
   - Save analysis reports

## Project Structure 
```
AI-Data-Processing-Analytics/
├── app.py               # Main application file
├── requirements.txt     # Project dependencies
├── .env.example         # Example environment variables
├── .gitignore           # Git ignore rules
└── README.md            # Project documentation
```


## Dependencies 

- `streamlit`: Web application framework
- `pandas`: Data manipulation and analysis
- `plotly`: Interactive visualizations
- `google-generativeai`: Gemini AI integration
- `langchain-google-genai`: LangChain integration
- `python-dotenv`: Environment variable management
- `openpyxl`: Excel file handling

## Security Notes 

- Never commit sensitive credentials
- Use environment variables for API keys
- Keep service account JSON file secure
- Regularly rotate credentials
- Avoid sharing API keys publicly

## Features in Detail 

### Data Processing
- Automatic cleaning of data
- Handling of missing values
- Removal of duplicates
- Smart string cleaning
- Multi-level header handling
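
The cleaning steps above could look roughly like this in pandas. It is a sketch under assumptions, not the code in `app.py`: the function names are hypothetical, missing-value handling is limited to dropping fully empty rows, and multi-level headers are flattened by joining their parts with a space.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Join multi-level header tuples into single strings.

    pandas names blank header cells "Unnamed: ..."; those parts are skipped.
    """
    if not isinstance(df.columns, pd.MultiIndex):
        return df
    df = df.copy()
    df.columns = [
        " ".join(
            str(part).strip()
            for part in col
            if str(part).strip() and not str(part).startswith("Unnamed")
        )
        for col in df.columns
    ]
    return df


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Flatten headers, tidy strings, drop empty rows and exact duplicates."""
    df = flatten_columns(df).copy()
    # Trim and collapse whitespace in string cells; leave other values untouched.
    for col in df.select_dtypes(exclude="number").columns:
        df[col] = df[col].map(
            lambda v: " ".join(v.split()) if isinstance(v, str) else v
        )
    # Strings are cleaned first so near-identical rows collapse into one.
    return df.dropna(how="all").drop_duplicates().reset_index(drop=True)
```

Cleaning strings before `drop_duplicates()` matters: rows that differ only in stray whitespace would otherwise survive as separate records.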

### AI Analysis
- District-wise analysis
- Cadre distribution insights
- Trend identification
- Anomaly detection
- Custom query handling

### Visualization
- Pie charts for distributions
- Bar charts for comparisons
- Histograms for numerical data
- Correlation matrices
- Interactive filters
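
Most of these charts start from a small aggregate rather than the raw rows. The sketch below shows only that preparation step in pandas; the helper names and the `"Cadre"` column name are assumptions. The results could then be handed to a charting call such as Plotly Express's `px.pie` or `px.imshow`.

```python
import pandas as pd


def cadre_distribution(df: pd.DataFrame, column: str = "Cadre") -> pd.Series:
    """Count rows per category: the input a pie or bar chart needs."""
    return df[column].value_counts()


def numeric_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlation of the numeric columns only."""
    return df.select_dtypes(include="number").corr()
```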

## Contributing 

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## License 

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contact 

Hussain - hussainmurtaza899@gmail.com

Project Link: [https://github.com/HussainM899/AI-Data-Processing-Analytics](https://github.com/HussainM899/AI-Data-Processing-Analytics)

---
Built using Streamlit and Gemini AI