Guilherme Favaron committed
Commit 9968889 · 1 Parent(s): 4253f33

Add Streamlit data analytics app

Files changed (7)
  1. .gitattributes +0 -35
  2. .streamlit/config.toml +6 -0
  3. Dockerfile +4 -13
  4. README.md +49 -20
  5. app.py +489 -0
  6. requirements.txt +4 -3
  7. src/streamlit_app.py +0 -40
.gitattributes DELETED
@@ -1,35 +0,0 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
.streamlit/config.toml ADDED
@@ -0,0 +1,6 @@
+ [theme]
+ base = "light"
+ primaryColor = "#ffbe00"
+
+ [server]
+ maxUploadSize = 20000
Dockerfile CHANGED
@@ -2,20 +2,11 @@ FROM python:3.9-slim

  WORKDIR /app

- RUN apt-get update && apt-get install -y \
-     build-essential \
-     curl \
-     software-properties-common \
-     git \
-     && rm -rf /var/lib/apt/lists/*
-
- COPY requirements.txt ./
- COPY src/ ./src/
-
- RUN pip3 install -r requirements.txt
+ COPY requirements.txt .
+ RUN pip install -r requirements.txt
+
+ COPY . .

  EXPOSE 8501

- HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
-
- ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
+ CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
README.md CHANGED
@@ -1,20 +1,49 @@
- ---
- title: Data Analytics
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
- pinned: false
- short_description: Analyze and visualize data with interactive charts
- license: mit
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ # Data Analysis Dashboard
+
+ A Streamlit application for analyzing and visualizing data with interactive features (including support for Brazilian-format numbers).
+
+ **Developed by:** Guilherme Favaron
+ **Website:** [guilhermefavaron.com.br](https://guilhermefavaron.com.br)
+
+ ## Features
+
+ ### 📁 Data Management
+ - **CSV Upload**: Drag-and-drop interface for easy uploads
+ - **Sample Data**: Built-in demo dataset for immediate exploration
+ - **Data Preview**: Quick overview of the dataset structure
+ - **Error Handling**: Robust validation for file formats and data types
+
+ ### 📊 Visualizations
+ - **Bar Charts**: Compare categories and values
+ - **Line Charts**: Track trends over time
+ - **Scatter Plots**: Explore relationships between variables
+ - **Histograms**: Analyze data distributions
+ - **Box Plots**: Compare distributions across groups
+
+ ### 🔍 Interactive Features
+ - **Dynamic Filtering**: Real-time data filtering with sliders and multi-select
+ - **Column Selection**: Choose X/Y axes and grouping variables
+ - **Live Updates**: Charts refresh automatically based on your selections
+ - **Responsive Design**: Optimized for different screen sizes
+
+ ### 💾 Export Options
+ - **Filtered Data Download**: Export processed data as CSV
+ - **Save Visualizations**: Download charts as HTML files
+ - **Timestamped Files**: Automatic file naming with timestamps
+
+ ### 📈 Analytics
+ - **Statistical Summary**: Descriptive statistics for numeric columns
+ - **Missing Data Analysis**: Identify and quantify missing values
+ - **Quick Insights**: Key metrics and a data overview
+ - **Quality Checks**: Automatic validation and reporting
+
+ ### 🇧🇷 Brazilian Format Support
+ - **Automatic Conversion**: Detects and converts numbers in the Brazilian format (xx.xxx.xxx,xx)
+ - **Regional Compatibility**: Native support for Brazilian numeric conventions
+
+ ## Installation
+
+ 1. **Clone or download** this repository
+ 2. **Install the dependencies**:
+ ```bash
+ pip install -r requirements.txt
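The Brazilian-format support advertised above comes down to stripping the dot thousand separators and swapping the decimal comma for a dot. A minimal standalone sketch of that idea (the helper name `parse_brazilian_number` is illustrative, not the app's API):

```python
import re

def parse_brazilian_number(value: str) -> float:
    """Parse a Brazilian-format number string like '1.234.567,89' into a float."""
    # Optional sign, 1-3 leading digits, dot-separated thousands groups, comma decimals.
    pattern = r'^-?\d{1,3}(?:\.\d{3})*(?:,\d+)?$'
    text = value.strip()
    if not re.match(pattern, text):
        raise ValueError(f"not a Brazilian-format number: {value!r}")
    # Drop thousand separators, then turn the decimal comma into a dot.
    return float(text.replace('.', '').replace(',', '.'))

print(parse_brazilian_number("1.234.567,89"))  # → 1234567.89
```

Note that a plain run of digits such as `"1234"` fails this pattern; the app handles that case by trying `float()` first.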
app.py ADDED
@@ -0,0 +1,489 @@
+ import streamlit as st
+ import pandas as pd
+ import plotly.express as px
+ import plotly.graph_objects as go
+ import numpy as np
+ import io
+ import base64
+ import re
+ from datetime import datetime
+
+ # Page configuration
+ st.set_page_config(
+     page_title="Data Analysis Dashboard",
+     page_icon="📊",
+     layout="wide",
+     initial_sidebar_state="expanded"
+ )
+
+ # Custom CSS for better styling
+ st.markdown("""
+ <style>
+     .main-header {
+         font-size: 2.5rem;
+         font-weight: bold;
+         color: #1f77b4;
+         text-align: center;
+         margin-bottom: 2rem;
+     }
+     .metric-container {
+         background-color: #f0f2f6;
+         padding: 1rem;
+         border-radius: 0.5rem;
+         margin: 0.5rem 0;
+     }
+     .stSelectbox > div > div {
+         background-color: white;
+     }
+     .upload-section {
+         border: 2px dashed #cccccc;
+         border-radius: 10px;
+         padding: 2rem;
+         text-align: center;
+         margin: 1rem 0;
+     }
+ </style>
+ """, unsafe_allow_html=True)
+
+ def convert_brazilian_number(value):
+     """Convert Brazilian number format (xx.xxx.xxx,xx) to float"""
+     if pd.isna(value) or value == '':
+         return np.nan
+
+     # Convert to string if not already
+     str_value = str(value).strip()
+
+     # Check if it's already a number
+     try:
+         return float(str_value)
+     except ValueError:
+         pass
+
+     # Brazilian number pattern: can have dots as thousand separators and comma as decimal
+     # Examples: "1.234.567,89", "1.234,56", "1234,56", "1234"
+     brazilian_pattern = r'^-?\d{1,3}(?:\.\d{3})*(?:,\d+)?$'
+
+     if re.match(brazilian_pattern, str_value):
+         # Remove thousand separators (dots) and replace decimal comma with dot
+         converted = str_value.replace('.', '').replace(',', '.')
+         try:
+             return float(converted)
+         except ValueError:
+             return np.nan
+
+     return np.nan
+
+ def detect_and_convert_brazilian_numbers(df):
+     """Detect and convert Brazilian number format columns to numeric"""
+     converted_columns = []
+     df_converted = df.copy()
+
+     for col in df.columns:
+         if df[col].dtype == 'object':  # Only check string columns
+             # Sample some non-null values to check if they look like Brazilian numbers
+             sample_values = df[col].dropna().astype(str).head(10)
+
+             if len(sample_values) > 0:
+                 # Check if most values match the Brazilian number pattern
+                 brazilian_count = 0
+                 total_count = 0
+
+                 for value in sample_values:
+                     value = str(value).strip()
+                     if value and value != 'nan':
+                         total_count += 1
+                         # Brazilian number pattern
+                         if re.match(r'^-?\d{1,3}(?:\.\d{3})*(?:,\d+)?$', value) or re.match(r'^-?\d+,\d+$', value):
+                             brazilian_count += 1
+
+                 # If more than 70% of values look like Brazilian numbers, convert the column
+                 if total_count > 0 and (brazilian_count / total_count) > 0.7:
+                     converted_series = df[col].apply(convert_brazilian_number)
+
+                     # Only convert if we successfully converted most values
+                     non_null_original = df[col].notna().sum()
+                     non_null_converted = converted_series.notna().sum()
+
+                     if non_null_converted >= (non_null_original * 0.8):  # At least 80% conversion success
+                         df_converted[col] = converted_series
+                         converted_columns.append(col)
+
+     return df_converted, converted_columns
+
+ def load_sample_data():
+     """Generate sample data for demonstration"""
+     np.random.seed(42)
+     n_samples = 1000
+
+     data = {
+         'Date': pd.date_range('2023-01-01', periods=n_samples, freq='D'),
+         'Sales': np.random.normal(1000, 200, n_samples),
+         'Profit': np.random.normal(150, 50, n_samples),
+         'Category': np.random.choice(['Electronics', 'Clothing', 'Books', 'Home'], n_samples),
+         'Region': np.random.choice(['North', 'South', 'East', 'West'], n_samples),
+         'Customer_Age': np.random.randint(18, 80, n_samples),
+         'Rating': np.random.uniform(1, 5, n_samples)
+     }
+
+     df = pd.DataFrame(data)
+     df['Sales'] = np.where(df['Sales'] < 0, abs(df['Sales']), df['Sales'])
+     df['Profit'] = np.where(df['Category'] == 'Electronics', df['Profit'] * 1.5, df['Profit'])
+
+     # Add some Brazilian formatted numbers for demonstration
+     df['Vendas_BR'] = df['Sales'].apply(lambda x: f"{x:,.2f}".replace(',', 'X').replace('.', ',').replace('X', '.'))
+     df['Lucro_BR'] = df['Profit'].apply(lambda x: f"{x:,.2f}".replace(',', 'X').replace('.', ',').replace('X', '.'))
+
+     return df
+
+ def get_numeric_columns(df):
+     """Get numeric columns from dataframe"""
+     return df.select_dtypes(include=[np.number]).columns.tolist()
+
+ def get_categorical_columns(df):
+     """Get categorical columns from dataframe"""
+     return df.select_dtypes(include=['object', 'category']).columns.tolist()
+
+ def create_download_link(df, filename="filtered_data.csv"):
+     """Create download link for dataframe"""
+     csv = df.to_csv(index=False)
+     b64 = base64.b64encode(csv.encode()).decode()
+     href = f'<a href="data:file/csv;base64,{b64}" download="{filename}">Download CSV File</a>'
+     return href
+
+ def main():
+     # Header
+     st.markdown('<h1 class="main-header">📊 Data Analysis Dashboard</h1>', unsafe_allow_html=True)
+
+     # Sidebar
+     st.sidebar.title("🔧 Controls")
+     st.sidebar.markdown("---")
+
+     # File upload section
+     st.sidebar.subheader("📁 Data Upload")
+     uploaded_file = st.sidebar.file_uploader(
+         "Choose a CSV file",
+         type="csv",
+         help="Upload a CSV file to analyze your data"
+     )
+
+     use_sample = st.sidebar.checkbox(
+         "Use Sample Data",
+         value=True if uploaded_file is None else False,
+         help="Check this to use built-in sample data for demonstration"
+     )
+
+     # Brazilian number conversion option
+     convert_brazilian = st.sidebar.checkbox(
+         "🇧🇷 Auto-convert Brazilian Numbers",
+         value=True,
+         help="Automatically detect and convert Brazilian number format (xx.xxx.xxx,xx) to numeric"
+     )
+
+     # Load data
+     try:
+         if uploaded_file is not None:
+             df = pd.read_csv(uploaded_file)
+             st.sidebar.success(f"✅ File uploaded successfully! ({len(df)} rows)")
+         elif use_sample:
+             df = load_sample_data()
+             st.sidebar.info("📋 Using sample data")
+         else:
+             st.warning("Please upload a CSV file or use sample data to get started.")
+             st.markdown("""
+             ### 🚀 Welcome to the Data Analysis Dashboard!
+
+             This app helps you analyze and visualize your data with:
+             - **Interactive charts** (bar, line, scatter, histogram)
+             - **Dynamic filtering** and data exploration
+             - **Statistical summaries** and insights
+             - **Export capabilities** for data and visualizations
+             - **🇧🇷 Brazilian number format support** (xx.xxx.xxx,xx)
+
+             **To get started:**
+             1. Upload a CSV file using the sidebar, or
+             2. Check "Use Sample Data" to explore with demo data
+             """)
+             return
+
+         # Apply Brazilian number conversion if enabled
+         if convert_brazilian:
+             df_original = df.copy()
+             df, converted_cols = detect_and_convert_brazilian_numbers(df)
+
+             if converted_cols:
+                 st.sidebar.success(f"🇧🇷 Converted {len(converted_cols)} columns from Brazilian format: {', '.join(converted_cols)}")
+
+     except Exception as e:
+         st.error(f"❌ Error loading file: {str(e)}")
+         st.info("Please make sure your file is a valid CSV format.")
+         return
+
+     # Data preview section
+     st.subheader("📋 Data Preview")
+
+     col1, col2, col3, col4 = st.columns(4)
+     with col1:
+         st.metric("Total Rows", len(df))
+     with col2:
+         st.metric("Total Columns", len(df.columns))
+     with col3:
+         st.metric("Numeric Columns", len(get_numeric_columns(df)))
+     with col4:
+         st.metric("Text Columns", len(get_categorical_columns(df)))
+
+     # Show data preview
+     with st.expander("🔍 View Raw Data", expanded=False):
+         st.dataframe(df.head(100), use_container_width=True)
+
+     # Data summary
+     with st.expander("📊 Statistical Summary", expanded=False):
+         col1, col2 = st.columns(2)
+
+         with col1:
+             st.subheader("Numeric Columns")
+             numeric_cols = get_numeric_columns(df)
+             if numeric_cols:
+                 st.dataframe(df[numeric_cols].describe())
+             else:
+                 st.info("No numeric columns found")
+
+         with col2:
+             st.subheader("Categorical Columns")
+             cat_cols = get_categorical_columns(df)
+             if cat_cols:
+                 for col in cat_cols[:5]:  # Show first 5 categorical columns
+                     st.write(f"**{col}:** {df[col].nunique()} unique values")
+                     if df[col].nunique() <= 10:
+                         st.write(df[col].value_counts().head())
+             else:
+                 st.info("No categorical columns found")
+
+     # Show conversion info if Brazilian conversion was applied
+     if convert_brazilian and 'converted_cols' in locals() and converted_cols:
+         with st.expander("🇧🇷 Brazilian Number Conversion Details", expanded=False):
+             st.write("**Converted Columns:**")
+             for col in converted_cols:
+                 original_sample = df_original[col].dropna().head(3).tolist()
+                 converted_sample = df[col].dropna().head(3).tolist()
+                 st.write(f"**{col}:**")
+                 st.write(f" - Original: {original_sample}")
+                 st.write(f" - Converted: {converted_sample}")
+
+     # Filtering section
+     st.sidebar.markdown("---")
+     st.sidebar.subheader("🔍 Data Filters")
+
+     # Create a copy for filtering
+     filtered_df = df.copy()
+
+     # Numeric filters
+     numeric_cols = get_numeric_columns(df)
+     for col in numeric_cols:
+         if df[col].dtype in ['int64', 'float64']:
+             min_val = float(df[col].min())
+             max_val = float(df[col].max())
+
+             if min_val != max_val:
+                 selected_range = st.sidebar.slider(
+                     f"{col} Range",
+                     min_value=min_val,
+                     max_value=max_val,
+                     value=(min_val, max_val),
+                     help=f"Filter data by {col} values"
+                 )
+                 filtered_df = filtered_df[
+                     (filtered_df[col] >= selected_range[0]) &
+                     (filtered_df[col] <= selected_range[1])
+                 ]
+
+     # Categorical filters
+     cat_cols = get_categorical_columns(df)
+     for col in cat_cols:
+         unique_values = df[col].unique().tolist()
+         if len(unique_values) <= 50:  # Only show filter for columns with reasonable number of unique values
+             selected_values = st.sidebar.multiselect(
+                 f"Select {col}",
+                 options=unique_values,
+                 default=unique_values,
+                 help=f"Filter data by {col} categories"
+             )
+             if selected_values:
+                 filtered_df = filtered_df[filtered_df[col].isin(selected_values)]
+
+     # Show filtered data info
+     if len(filtered_df) != len(df):
+         st.sidebar.info(f"Filtered: {len(filtered_df)} of {len(df)} rows")
+
+     # Visualization section
+     st.markdown("---")
+     st.subheader("📈 Data Visualization")
+
+     # Chart type selection
+     chart_type = st.selectbox(
+         "Select Chart Type",
+         ["Bar Chart", "Line Chart", "Scatter Plot", "Histogram", "Box Plot"],
+         help="Choose the type of visualization"
+     )
+
+     col1, col2, col3 = st.columns(3)
+
+     with col1:
+         if chart_type in ["Bar Chart", "Line Chart", "Scatter Plot", "Box Plot"]:
+             x_column = st.selectbox(
+                 "X-axis Column",
+                 options=df.columns.tolist(),
+                 help="Select column for X-axis"
+             )
+         else:
+             x_column = st.selectbox(
+                 "Column to Analyze",
+                 options=numeric_cols,
+                 help="Select numeric column for histogram"
+             )
+
+     with col2:
+         if chart_type in ["Bar Chart", "Line Chart", "Scatter Plot", "Box Plot"]:
+             y_column = st.selectbox(
+                 "Y-axis Column",
+                 options=numeric_cols,
+                 help="Select numeric column for Y-axis"
+             )
+         else:
+             y_column = None
+
+     with col3:
+         if chart_type in ["Bar Chart", "Scatter Plot", "Box Plot"]:
+             color_column = st.selectbox(
+                 "Color/Group By (Optional)",
+                 options=[None] + cat_cols,
+                 help="Select column to group/color data"
+             )
+         else:
+             color_column = None
+
+     # Create visualization
+     if chart_type == "Bar Chart" and x_column and y_column:
+         if x_column in cat_cols:
+             # Aggregate data for categorical x-axis
+             agg_df = filtered_df.groupby(x_column)[y_column].mean().reset_index()
+             fig = px.bar(
+                 agg_df,
+                 x=x_column,
+                 y=y_column,
+                 title=f"Average {y_column} by {x_column}",
+                 color=color_column if color_column and color_column in agg_df.columns else None
+             )
+         else:
+             fig = px.bar(
+                 filtered_df,
+                 x=x_column,
+                 y=y_column,
+                 title=f"{y_column} vs {x_column}",
+                 color=color_column
+             )
+
+     elif chart_type == "Line Chart" and x_column and y_column:
+         fig = px.line(
+             filtered_df,
+             x=x_column,
+             y=y_column,
+             title=f"{y_column} vs {x_column}",
+             color=color_column
+         )
+
+     elif chart_type == "Scatter Plot" and x_column and y_column:
+         fig = px.scatter(
+             filtered_df,
+             x=x_column,
+             y=y_column,
+             title=f"{y_column} vs {x_column}",
+             color=color_column,
+             size=y_column if y_column in numeric_cols else None
+         )
+
+     elif chart_type == "Histogram" and x_column:
+         fig = px.histogram(
+             filtered_df,
+             x=x_column,
+             title=f"Distribution of {x_column}",
+             nbins=30
+         )
+
+     elif chart_type == "Box Plot" and x_column and y_column:
+         fig = px.box(
+             filtered_df,
+             x=x_column,
+             y=y_column,
+             title=f"{y_column} Distribution by {x_column}",
+             color=color_column
+         )
+
+     else:
+         st.warning("Please select appropriate columns for the chosen chart type.")
+         return
+
+     # Update layout for better appearance
+     fig.update_layout(
+         height=500,
+         showlegend=True,
+         title_x=0.5,
+         font=dict(size=12)
+     )
+
+     # Display chart
+     st.plotly_chart(fig, use_container_width=True)
+
+     # Download section
+     st.markdown("---")
+     st.subheader("💾 Download Options")
+
+     col1, col2 = st.columns(2)
+
+     with col1:
+         st.markdown("**Download Filtered Data**")
+         if st.button("Generate CSV Download Link"):
+             download_link = create_download_link(filtered_df, f"filtered_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv")
+             st.markdown(download_link, unsafe_allow_html=True)
+
+     with col2:
+         st.markdown("**Download Chart**")
+         if st.button("Download Chart as HTML"):
+             html_string = fig.to_html(include_plotlyjs='cdn')
+             st.download_button(
+                 label="Download HTML",
+                 data=html_string,
+                 file_name=f"chart_{datetime.now().strftime('%Y%m%d_%H%M%S')}.html",
+                 mime="text/html"
+             )
+
+     # Additional insights
+     if len(filtered_df) > 0:
+         st.markdown("---")
+         st.subheader("🔍 Quick Insights")
+
+         col1, col2 = st.columns(2)
+
+         with col1:
+             st.markdown("**Data Overview**")
+             st.write(f"• Total records: {len(filtered_df):,}")
+             st.write(f"• Columns: {len(filtered_df.columns)}")
+
+             if numeric_cols:
+                 st.write(f"• Numeric columns: {len(numeric_cols)}")
+                 for col in numeric_cols[:3]:
+                     mean_val = filtered_df[col].mean()
+                     st.write(f" - {col}: avg = {mean_val:.2f}")
+
+         with col2:
+             st.markdown("**Missing Data**")
+             missing_data = filtered_df.isnull().sum()
+             if missing_data.sum() > 0:
+                 for col, missing in missing_data.items():
+                     if missing > 0:
+                         pct = (missing / len(filtered_df)) * 100
+                         st.write(f"• {col}: {missing} ({pct:.1f}%)")
+             else:
+                 st.write("✅ No missing data found")
+
+ if __name__ == "__main__":
+     main()
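`create_download_link` in app.py works by embedding the CSV text, base64-encoded, inside a `data:` URI so the browser can save it without a server round-trip. The same technique in a self-contained sketch (the DataFrame contents are illustrative):

```python
import base64

import pandas as pd

def csv_data_uri_link(df: pd.DataFrame, filename: str) -> str:
    """Build an HTML anchor whose href embeds the CSV as a base64 data: URI."""
    csv_text = df.to_csv(index=False)
    b64 = base64.b64encode(csv_text.encode()).decode()
    return f'<a href="data:file/csv;base64,{b64}" download="{filename}">Download CSV File</a>'

link = csv_data_uri_link(pd.DataFrame({"a": [1, 2]}), "demo.csv")
```

Rendering `link` with `st.markdown(..., unsafe_allow_html=True)` gives a clickable download, which is what the app does; for large frames, `st.download_button` avoids inflating the page with the base64 payload.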
requirements.txt CHANGED
@@ -1,3 +1,4 @@
- altair
- pandas
- streamlit
+ streamlit>=1.28.0
+ pandas>=2.0.0
+ plotly>=5.15.0
+ numpy>=1.24.0
src/streamlit_app.py DELETED
@@ -1,40 +0,0 @@
- import altair as alt
- import numpy as np
- import pandas as pd
- import streamlit as st
-
- """
- # Welcome to Streamlit!
-
- Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
-
- In the meantime, below is an example of what you can do with just a few lines of code:
- """
-
- num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
- num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
-
- indices = np.linspace(0, 1, num_points)
- theta = 2 * np.pi * num_turns * indices
- radius = indices
-
- x = radius * np.cos(theta)
- y = radius * np.sin(theta)
-
- df = pd.DataFrame({
-     "x": x,
-     "y": y,
-     "idx": indices,
-     "rand": np.random.randn(num_points),
- })
-
- st.altair_chart(alt.Chart(df, height=700, width=700)
-     .mark_point(filled=True)
-     .encode(
-         x=alt.X("x", axis=None),
-         y=alt.Y("y", axis=None),
-         color=alt.Color("idx", legend=None, scale=alt.Scale()),
-         size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
-     ))