Commit ca855c4 by Mahmoud-adel25 · verified · 1 Parent(s): 5e2aaa0

Update README.md

Files changed (1)
  1. README.md +9 -422
README.md CHANGED
@@ -1,423 +1,10 @@
1
- # 🛍️ Customer Segmentation Analysis
2
-
3
- [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://customer-segmentation-mqnhet38emja8xtgffpzjt.streamlit.app/)
4
- [![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://www.python.org/downloads/)
5
- [![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
6
- [![Streamlit](https://img.shields.io/badge/Streamlit-1.28+-red.svg)](https://streamlit.io/)
7
-
8
- > **🎯 Live Application**: [Customer Segmentation Analysis](https://customer-segmentation-mqnhet38emja8xtgffpzjt.streamlit.app/)
9
-
10
- A comprehensive, interactive web application for customer segmentation analysis using machine learning clustering algorithms. This project provides an end-to-end solution for identifying distinct customer groups based on purchasing behavior and demographic characteristics.
11
-
12
- ## 🌟 Live Demo
13
-
14
- **🚀 Try the application now:** [Customer Segmentation Analysis](https://customer-segmentation-mqnhet38emja8xtgffpzjt.streamlit.app/)
15
-
16
- The live application features:
17
- - ✨ **Interactive Data Exploration** with real-time visualizations
18
- - 🎯 **K-Means & DBSCAN Clustering** with optimal parameter selection
19
- - 📊 **Beautiful Visualizations** with dark theme and modern UI
20
- - 💡 **Business Insights** and actionable recommendations
21
- - 📱 **Responsive Design** that works on all devices
22
-
23
  ---
24
-
25
- ## 📋 Table of Contents
26
-
27
- - [🎯 Project Overview](#-project-overview)
28
- - [✨ Key Features](#-key-features)
29
- - [📊 Dataset Information](#-dataset-information)
30
- - [🛠️ Technology Stack](#️-technology-stack)
31
- - [🚀 Quick Start](#-quick-start)
32
- - [📁 Project Structure](#-project-structure)
33
- - [🔍 Analysis Workflow](#-analysis-workflow)
34
- - [📈 Results & Insights](#-results--insights)
35
- - [🎨 Screenshots](#-screenshots)
36
- - [⚙️ Configuration](#️-configuration)
37
- - [🤝 Contributing](#-contributing)
38
- - [📝 License](#-license)
39
-
40
- ---
41
-
42
- ## 🎯 Project Overview
43
-
44
- This project implements advanced customer segmentation using unsupervised machine learning techniques. It provides a complete solution for businesses to understand their customer base through data-driven insights and actionable recommendations.
45
-
46
- ### 🎯 Business Value
47
-
48
- - **Customer Understanding**: Identify distinct customer segments based on behavior patterns
49
- - **Targeted Marketing**: Develop personalized marketing strategies for each segment
50
- - **Resource Optimization**: Allocate marketing budgets more effectively
51
- - **Product Development**: Tailor products and services to specific customer needs
52
- - **Customer Retention**: Implement segment-specific retention strategies
53
-
54
- ---
55
-
56
- ## ✨ Key Features
57
-
58
- ### 🎨 **Modern User Interface**
59
- - **Dark Theme**: Beautiful, modern dark interface with gradient accents
60
- - **Responsive Design**: Works seamlessly on desktop, tablet, and mobile
61
- - **Interactive Elements**: Hover effects, animations, and smooth transitions
62
- - **Real-time Updates**: Dynamic visualizations that update instantly
63
-
64
- ### 📊 **Comprehensive Data Analysis**
65
- - **Data Exploration**: Interactive histograms, scatter plots, and correlation matrices
66
- - **Statistical Summary**: Detailed descriptive statistics and data quality checks
67
- - **Feature Relationships**: Visual analysis of correlations between variables
68
- - **Missing Value Detection**: Automatic identification and handling of data issues
69
-
70
- ### 🎯 **Advanced Clustering Algorithms**
71
- - **K-Means Clustering**: With optimal cluster determination using multiple metrics
72
- - **DBSCAN Clustering**: Density-based clustering for comparison
73
- - **Parameter Optimization**: Automatic selection of optimal clustering parameters
74
- - **Performance Metrics**: Silhouette score, Calinski-Harabasz score, and inertia
75
-
76
- ### 📈 **Rich Visualizations**
77
- - **2D Cluster Plots**: Interactive scatter plots with cluster assignments
78
- - **Distribution Analysis**: Box plots and histograms for each segment
79
- - **Comparative Analysis**: Side-by-side comparison of different algorithms
80
- - **Business Metrics**: Spending analysis and customer profile visualizations
81
-
82
- ### 💡 **Business Intelligence**
83
- - **Customer Profiles**: Detailed characteristics of each segment
84
- - **Spending Analysis**: Average spending patterns and trends
85
- - **Actionable Recommendations**: Specific strategies for each customer segment
86
- - **Download Results**: Export analysis results for further processing
87
-
88
- ---
89
-
90
- ## 📊 Dataset Information
91
-
92
- The application uses the **Mall Customer Segmentation** dataset, which simulates real-world customer data with the following features:
93
-
94
- | Feature | Description | Type | Range |
95
- |---------|-------------|------|-------|
96
- | **CustomerID** | Unique customer identifier | Integer | 1-200 |
97
- | **Gender** | Customer gender | Categorical | Male/Female |
98
- | **Age** | Customer age in years | Integer | 18-70 |
99
- | **Annual Income (k$)** | Annual income in thousands | Integer | 15-137 |
100
- | **Spending Score (1-100)** | Mall-assigned spending score | Integer | 1-100 |
101
-
102
- ### 📈 **Dataset Characteristics**
103
- - **Size**: 200 customers
104
- - **Features**: 5 variables (4 numeric, 1 categorical)
105
- - **Quality**: Clean data with no missing values
106
- - **Realism**: Simulates realistic customer behavior patterns
107
-
108
- ---
109
-
110
- ## 🛠️ Technology Stack
111
-
112
- ### **Core Technologies**
113
- - **Python 3.8+**: Primary programming language
114
- - **Streamlit 1.28+**: Interactive web application framework
115
- - **Pandas**: Data manipulation and analysis
116
- - **NumPy**: Numerical computing and array operations
117
-
118
- ### **Machine Learning**
119
- - **Scikit-learn**: Clustering algorithms (K-Means, DBSCAN)
120
- - **Silhouette Analysis**: Cluster quality evaluation
121
- - **StandardScaler**: Feature normalization
122
-
123
- ### **Visualization**
124
- - **Plotly**: Interactive charts and graphs
125
- - **Custom CSS**: Modern dark theme styling
126
- - **Responsive Design**: Mobile-friendly interface
127
-
128
- ### **Development Tools**
129
- - **YAML**: Configuration management
130
- - **Git**: Version control
131
- - **Streamlit Cloud**: Deployment platform
132
-
133
- ---
134
-
135
- ## 🚀 Quick Start
136
-
137
- ### **Option 1: Use the Live Application**
138
- 1. Visit [Customer Segmentation Analysis](https://customer-segmentation-mqnhet38emja8xtgffpzjt.streamlit.app/)
139
- 2. Start exploring the data immediately
140
- 3. No installation required!
141
-
142
- ### **Option 2: Run Locally**
143
-
144
- #### **Prerequisites**
145
- ```bash
146
- # Ensure you have Python 3.8+ installed
147
- python --version
148
-
149
- # Confirm Git is installed (install it first if this command fails)
150
- git --version
151
- ```
152
-
153
- #### **Installation Steps**
154
-
155
- 1. **Clone the repository**
156
- ```bash
157
- git clone https://github.com/yourusername/customer-segmentation.git
158
- cd customer-segmentation
159
- ```
160
-
161
- 2. **Install dependencies**
162
- ```bash
163
- pip install -r requirements.txt
164
- ```
165
-
166
- 3. **Launch the application**
167
- ```bash
168
- python run_app.py
169
- ```
170
-
171
- Or directly with Streamlit:
172
- ```bash
173
- streamlit run streamlit_app/main.py
174
- ```
175
-
176
- 4. **Access the application**
177
- - Open your browser and navigate to `http://localhost:8501`
178
- - The application will automatically load the sample dataset
179
- - Start exploring the different analysis sections
180
-
181
- ---
182
-
183
- ## 📁 Project Structure
184
-
185
- ```
186
- Customer segmentation/
187
- ├── 📁 streamlit_app/
188
- │ └── 🐍 main.py # Main Streamlit application
189
- ├── 📁 src/
190
- │ ├── 🐍 __init__.py # Package initialization
191
- │ ├── 🐍 data_loader.py # Data loading and preprocessing
192
- │ ├── 🐍 clustering.py # Clustering algorithms
193
- │ └── 🐍 visualizations.py # Visualization components
194
- ├── 📁 utils/
195
- │ ├── 🐍 __init__.py # Utilities package
196
- │ └── 🐍 data_generator.py # Sample data generation
197
- ├── 📁 config/
198
- │ └── ⚙️ config.yaml # Configuration settings
199
- ├── 📁 data/
200
- │ └── 📊 Mall_Customers.csv # Main dataset
201
- ├── 📁 .streamlit/
202
- │ └── ⚙️ config.toml # Streamlit configuration
203
- ├── 📋 requirements.txt # Python dependencies
204
- ├── 🚀 run_app.py # Application launcher
205
- └── 📖 README.md # Project documentation
206
- ```
207
-
208
- ---
209
-
210
- ## 🔍 Analysis Workflow
211
-
212
- ### **1. Data Exploration** 📊
213
- - **Dataset Overview**: Basic statistics and data quality assessment
214
- - **Distribution Analysis**: Histograms and density plots for all features
215
- - **Correlation Analysis**: Heatmaps showing feature relationships
216
- - **Visual Exploration**: Interactive scatter plots and box plots
217
-
218
- ### **2. Data Preprocessing** ⚙️
219
- - **Feature Selection**: Choose relevant variables for clustering
220
- - **Data Scaling**: Normalize features using StandardScaler
221
- - **Missing Value Handling**: Automatic detection and treatment
222
- - **Data Validation**: Ensure data quality and consistency
223
-
224
- ### **3. Optimal Cluster Determination** 🎯
225
- - **Elbow Method**: Find optimal number of clusters using inertia
226
- - **Silhouette Analysis**: Evaluate cluster quality and separation
227
- - **Calinski-Harabasz Score**: Alternative cluster evaluation metric
228
- - **Visual Assessment**: Interactive plots for parameter selection
229
-
230
- ### **4. K-Means Clustering** 🔵
231
- - **Algorithm Application**: Apply K-Means with optimal parameters
232
- - **Cluster Assignment**: Generate labels for each customer
233
- - **Performance Metrics**: Calculate silhouette and Calinski-Harabasz scores
234
- - **Center Visualization**: Plot cluster centroids
235
-
236
- ### **5. DBSCAN Clustering** 🌟
237
- - **Density-Based Clustering**: Apply DBSCAN algorithm
238
- - **Parameter Tuning**: Adjust epsilon and min_samples
239
- - **Noise Detection**: Identify outlier points
240
- - **Comparison Analysis**: Compare with K-Means results
241
-
242
- ### **6. Visualization & Analysis** 📈
243
- - **2D Cluster Plots**: Interactive scatter plots with cluster assignments
244
- - **Distribution Analysis**: Box plots showing feature distributions per cluster
245
- - **Spending Analysis**: Detailed spending patterns for each segment
246
- - **Comparative Visualizations**: Side-by-side algorithm comparison
247
-
248
- ### **7. Business Intelligence** 💡
249
- - **Customer Profiling**: Detailed characteristics of each segment
250
- - **Spending Patterns**: Average spending and variance analysis
251
- - **Actionable Insights**: Specific recommendations for each segment
252
- - **Export Results**: Download analysis results for further use
253
-
254
- ---
255
-
256
- ## 📈 Results & Insights
257
-
258
- ### **Typical Customer Segments Identified**
259
-
260
- | Segment | Characteristics | Business Strategy |
261
- |---------|----------------|-------------------|
262
- | **💎 High Value** | High income, high spending | Premium products, VIP services |
263
- | **💼 Conservative** | High income, low spending | Upselling, value propositions |
264
- | **🎯 Budget Spenders** | Low income, high spending | Value-based offerings, loyalty programs |
265
- | **📉 Low Engagement** | Low income, low spending | Retention strategies, engagement campaigns |
266
- | **⚖️ Balanced** | Moderate income and spending | Personalized marketing, core offerings |
267
-
268
- ### **Performance Metrics**
269
-
270
- The analysis provides comprehensive evaluation metrics:
271
-
272
- - **Silhouette Score**: Measures cluster cohesion and separation (-1 to 1, higher is better)
273
- - **Calinski-Harabasz Score**: Evaluates cluster definition quality
274
- - **Inertia**: Within-cluster sum of squares for K-Means
275
- - **Number of Clusters**: Optimal cluster count determined automatically
276
- - **Noise Points**: Outlier detection in DBSCAN
277
-
278
- ### **Business Recommendations**
279
-
280
- Based on clustering results, the application provides:
281
-
282
- - **Marketing Strategies**: Segment-specific campaign recommendations
283
- - **Product Positioning**: Align products with cluster preferences
284
- - **Pricing Strategies**: Dynamic pricing based on segment characteristics
285
- - **Customer Retention**: Targeted programs for each segment
286
- - **Growth Opportunities**: Cross-selling and upselling strategies
287
-
288
- ---
289
-
290
- ## 🎨 Screenshots
291
-
292
- ### **Main Dashboard**
293
- ![Dashboard](https://via.placeholder.com/800x400/0F172A/E5E7EB?text=Main+Dashboard)
294
-
295
- ### **Data Exploration**
296
- ![Data Exploration](https://via.placeholder.com/800x400/0F172A/E5E7EB?text=Data+Exploration)
297
-
298
- ### **Clustering Results**
299
- ![Clustering](https://via.placeholder.com/800x400/0F172A/E5E7EB?text=Clustering+Results)
300
-
301
- ### **Business Insights**
302
- ![Insights](https://via.placeholder.com/800x400/0F172A/E5E7EB?text=Business+Insights)
303
-
304
- ---
305
-
306
- ## ⚙️ Configuration
307
-
308
- ### **Customizing Clustering Parameters**
309
-
310
- #### **K-Means Parameters**
311
- ```python
312
- # In the application interface
313
- n_clusters = 5 # Number of clusters
314
- random_state = 42 # For reproducible results
315
- ```
316
-
317
- #### **DBSCAN Parameters**
318
- ```python
319
- eps = 0.5 # Neighborhood distance
320
- min_samples = 5 # Minimum points in a neighborhood for a core point
321
- ```
322
-
323
- ### **Feature Selection**
324
- ```python
325
- # Default features for clustering
326
- features = ['Annual Income (k$)', 'Spending Score (1-100)']
327
-
328
- # Custom feature selection
329
- features = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']
330
- ```
331
-
332
- ### **Visualization Settings**
333
- ```python
334
- # Color schemes
335
- colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4', '#FFEAA7']
336
-
337
- # Chart dimensions
338
- height = 450
339
- width = '100%'
340
- ```
341
-
342
- ---
343
-
344
- ## 🤝 Contributing
345
-
346
- We welcome contributions! Here's how you can help:
347
-
348
- ### **How to Contribute**
349
-
350
- 1. **Fork the repository**
351
- 2. **Create a feature branch**
352
- ```bash
353
- git checkout -b feature/amazing-feature
354
- ```
355
- 3. **Make your changes**
356
- 4. **Test thoroughly**
357
- 5. **Commit your changes**
358
- ```bash
359
- git commit -m 'Add amazing feature'
360
- ```
361
- 6. **Push to the branch**
362
- ```bash
363
- git push origin feature/amazing-feature
364
- ```
365
- 7. **Open a Pull Request**
366
-
367
- ### **Areas for Improvement**
368
-
369
- - **Additional Algorithms**: Hierarchical clustering, Gaussian Mixture Models
370
- - **Enhanced Visualizations**: 3D plots, interactive dashboards
371
- - **Advanced Analytics**: Customer lifetime value, churn prediction
372
- - **Performance Optimization**: Faster processing for large datasets
373
- - **Mobile Experience**: Improved mobile interface
374
- - **API Integration**: REST API for programmatic access
375
-
376
- ### **Bug Reports**
377
-
378
- Please use the [GitHub Issues](https://github.com/yourusername/customer-segmentation/issues) page to report bugs or request features.
379
-
380
- ---
381
-
382
- ## 📝 License
383
-
384
- This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
385
-
386
- ### **MIT License Summary**
387
- - ✅ **Commercial Use**: Allowed
388
- - ✅ **Modification**: Allowed
389
- - ✅ **Distribution**: Allowed
390
- - ✅ **Private Use**: Allowed
391
- - ❌ **Liability**: None
392
- - ❌ **Warranty**: None
393
-
394
- ---
395
-
396
- ## 🙏 Acknowledgments
397
-
398
- - **Dataset Source**: [Kaggle Mall Customer Segmentation](https://www.kaggle.com/vjchoudhary7/customer-segmentation-tutorial-in-python)
399
- - **Streamlit**: For the amazing web application framework
400
- - **Scikit-learn**: For robust machine learning algorithms
401
- - **Plotly**: For beautiful interactive visualizations
402
- - **Open Source Community**: For inspiration and support
403
-
404
- ---
405
-
406
- ## 📞 Support & Contact
407
-
408
- - **Live Application**: [Customer Segmentation Analysis](https://customer-segmentation-mqnhet38emja8xtgffpzjt.streamlit.app/)
409
- - **GitHub Repository**: [Customer Segmentation](https://github.com/yourusername/customer-segmentation)
410
- - **Issues**: [GitHub Issues](https://github.com/yourusername/customer-segmentation/issues)
411
- - **Email**: your.email@example.com
412
-
413
- ---
414
-
415
- <div align="center">
416
-
417
- **🎯 Happy Clustering! 📊**
418
-
419
- [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://customer-segmentation-mqnhet38emja8xtgffpzjt.streamlit.app/)
420
-
421
- *Made with ❤️ using Streamlit and Python*
422
-
423
- </div>
1
  ---
2
+ title: Customer Segmentation Analysis
3
+ emoji: 📊
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: streamlit
7
+ sdk_version: 1.28.0
8
+ app_file: streamlit_app/main.py
9
+ pinned: false
10
+ ---