pilotstuki Claude committed
Commit 002262c · 0 Parent(s)

Initial commit: IIS Log Performance Analyzer


Add complete Streamlit application for analyzing large IIS log files:
- High-performance log parsing with Polars
- Interactive web UI with Streamlit
- Comprehensive metrics and visualizations
- Support for multi-file analysis
- Smart filtering for monitoring requests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (7)
  1. .gitignore +44 -0
  2. README.md +208 -0
  3. app.py +499 -0
  4. log_parser.py +419 -0
  5. requirements.txt +8 -0
  6. run.sh +20 -0
  7. test_parser.py +118 -0
.gitignore ADDED
@@ -0,0 +1,44 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ env/
+ venv/
+ ENV/
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Streamlit
+ .streamlit/
+
+ # Log files (example/sample files - users will upload their own)
+ *.log
+
+ # PDF files (example reports/analysis)
+ *.pdf
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
README.md ADDED
@@ -0,0 +1,208 @@
+ # IIS Log Performance Analyzer
+
+ High-performance web application for analyzing large IIS log files (200MB-1GB+). Built with Streamlit and Polars for fast, efficient processing.
+
+ **GitHub Repository**: [https://github.com/pilot-stuk/odata_log_parser](https://github.com/pilot-stuk/odata_log_parser)
+
+ **Live Demo**: Deploy on [Streamlit Cloud](https://streamlit.io/cloud)
+
+ ## Features
+
+ - **Fast Processing**: Uses the Polars library for 10-100x faster parsing than pandas
+ - **Large File Support**: Efficiently handles files up to 1GB+
+ - **Comprehensive Metrics**:
+   - Total requests (before/after filtering)
+   - Error rates and breakdown by status code
+   - Response time statistics (min/max/avg)
+   - Slow request detection (configurable threshold)
+   - Peak RPS (Requests Per Second) with timestamp
+   - Top methods by request count and response time
+ - **Multi-File Analysis**: Upload and compare multiple log files side-by-side
+ - **Interactive Visualizations**: Charts and graphs using Plotly
+ - **Smart Filtering**: Automatically excludes monitoring requests (Zabbix HEAD) and 401 Unauthorized responses
+
+ ## Requirements
+
+ - Python 3.8+
+ - See `requirements.txt` for package dependencies
+
+ ## Installation
+
+ ### Local Installation
+
+ 1. Clone the repository:
+ ```bash
+ git clone https://github.com/pilot-stuk/odata_log_parser.git
+ cd odata_log_parser
+ ```
+
+ 2. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Deploy to Streamlit Cloud
+
+ 1. Fork or clone this repository to your GitHub account
+ 2. Go to [share.streamlit.io](https://share.streamlit.io/)
+ 3. Sign in with your GitHub account
+ 4. Click "New app"
+ 5. Select your repository: `pilot-stuk/odata_log_parser`
+ 6. Set the main file path: `app.py`
+ 7. Click "Deploy"
+
+ The app will be live at: `https://share.streamlit.io/pilot-stuk/odata_log_parser/main/app.py`
+
+ ## Usage
+
+ ### Run the Streamlit App
+
+ ```bash
+ streamlit run app.py
+ ```
+
+ The application will open in your browser at `http://localhost:8501`.
+
+ ### Upload Log Files
+
+ 1. Click "Browse files" in the sidebar
+ 2. Select one or more IIS log files (.log or .txt)
+ 3. View the analysis results
+
+ ### Configuration Options
+
+ - **Upload Mode**: Single or Multiple files
+ - **Top N Methods**: Number of top methods to display (3-20)
+ - **Slow Request Threshold**: Configure what constitutes a "slow" request (default: 3000ms)
+
+ ## Log Format
+
+ This tool supports **IIS W3C Extended Log Format** with the following fields:
+
+ ```
+ date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip
+ cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
+ ```
+
+ Example log line:
+ ```
+ 2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24
+ ```
+
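A minimal, hypothetical illustration of how one data line in this format splits into named fields (the field list and example line come from the README; this is not the project's actual parser, which lives in `log_parser.py`):

```python
# Hypothetical sketch: split one W3C Extended log line into named fields.
# Field names mirror the README's field list; all values stay as strings.
FIELDS = [
    "date", "time", "s-ip", "cs-method", "cs-uri-stem", "cs-uri-query",
    "s-port", "cs-username", "c-ip", "cs(User-Agent)", "cs(Referer)",
    "sc-status", "sc-substatus", "sc-win32-status", "time-taken",
]

line = ("2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get "
        "sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24")

# Whitespace-separated fields map 1:1 onto the column names.
record = dict(zip(FIELDS, line.split()))
print(record["cs-method"], record["sc-status"], record["time-taken"])  # GET 200 24
```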
92
+ ## Filtering Rules
93
+
94
+ The analyzer applies the following filters automatically:
95
+
96
+ 1. **Monitoring Exclusion**: Lines containing both `HEAD` method and `Zabbix` are excluded
97
+ 2. **401 Handling**: 401 Unauthorized responses are excluded from error counts (considered authentication attempts, not system errors)
98
+ 3. **Error Definition**: Errors are HTTP status codes ≠ 200 and ≠ 401
99
+
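The three rules above can be sketched in plain Python over hypothetical row dicts (`method`, `user_agent`, `status` are illustrative names; the project implements the same rules with Polars filters):

```python
# Sketch of the filtering rules on hypothetical parsed rows.
def is_excluded(row: dict) -> bool:
    """Rule 1: drop Zabbix monitoring probes sent as HEAD requests."""
    return row["method"] == "HEAD" and "Zabbix" in row["user_agent"]

def is_error(row: dict) -> bool:
    """Rules 2-3: an error is any status other than 200, except 401."""
    return row["status"] not in (200, 401)

rows = [
    {"method": "HEAD", "user_agent": "Zabbix", "status": 200},  # excluded by rule 1
    {"method": "GET", "user_agent": "curl", "status": 401},     # kept, not an error
    {"method": "GET", "user_agent": "curl", "status": 500},     # kept, counted as error
]
kept = [r for r in rows if not is_excluded(r)]
errors = [r for r in kept if is_error(r)]
print(len(kept), len(errors))  # 2 1
```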
100
+ ## Metrics Explained
101
+
102
+ | Metric | Description |
103
+ |--------|-------------|
104
+ | **Total Requests (before filtering)** | Raw number of log entries |
105
+ | **Excluded Requests** | Lines filtered out (HEAD+Zabbix + 401) |
106
+ | **Processed Requests** | Valid requests included in analysis |
107
+ | **Errors** | Requests with status ≠ 200 and ≠ 401 |
108
+ | **Slow Requests** | Requests exceeding threshold (default: 3000ms) |
109
+ | **Peak RPS** | Maximum requests per second observed |
110
+ | **Avg/Max/Min Response Time** | Response time statistics in milliseconds |
111
+
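The Peak RPS metric amounts to bucketing requests by their one-second timestamp and taking the busiest bucket. A small sketch with hand-made timestamps (the app computes this with a Polars group-by):

```python
from collections import Counter

# Sketch of Peak RPS: count requests per one-second timestamp bucket
# and report the busiest second.
timestamps = [
    "2025-09-22 00:00:46", "2025-09-22 00:00:46",
    "2025-09-22 00:00:46", "2025-09-22 00:00:47",
]
counts = Counter(timestamps)
peak_ts, peak_rps = counts.most_common(1)[0]
print(peak_ts, peak_rps)  # 2025-09-22 00:00:46 3
```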
+ ## Performance
+
+ - **Small files** (<50MB): process in seconds
+ - **Medium files** (50-200MB): process in 10-30 seconds
+ - **Large files** (200MB-1GB): process in 1-3 minutes
+
+ Performance depends on:
+ - File size
+ - Number of log entries
+ - System CPU and RAM
+ - Disk I/O speed
+
+ ## Architecture
+
+ ```
+ app.py            # Streamlit UI application
+ log_parser.py     # Core parsing and analysis logic using Polars
+ requirements.txt  # Python dependencies
+ README.md         # This file
+ ```
+
+ ### Key Components
+
+ - **IISLogParser**: Parses the IIS W3C log format into a Polars DataFrame
+ - **LogAnalyzer**: Calculates metrics and statistics
+ - **Streamlit UI**: Interactive web interface with visualizations
+
+ ## Use Cases
+
+ - **Performance Analysis**: Identify slow endpoints and response time patterns
+ - **Error Investigation**: Track error rates and problematic methods
+ - **Capacity Planning**: Analyze peak load and RPS patterns
+ - **Service Comparison**: Compare performance across multiple services
+ - **Incident Review**: Analyze logs from specific time periods
+
+ ## Troubleshooting
+
+ ### Large File Upload Issues
+
+ If Streamlit has trouble with very large files (>500MB):
+
+ 1. Increase Streamlit's upload size limit:
+ ```bash
+ streamlit run app.py --server.maxUploadSize=1024
+ ```
+
+ 2. Or modify `.streamlit/config.toml`:
+ ```toml
+ [server]
+ maxUploadSize = 1024
+ ```
+
+ ### Memory Issues
+
+ For files >1GB, you may need to:
+ - Increase available system memory
+ - Process files in smaller chunks
+ - Use a CLI version (could be developed if needed)
+
+ ### Performance Tips
+
+ - Close other memory-intensive applications
+ - Process very large files one at a time
+ - Use an SSD for faster I/O
+ - Ensure adequate RAM (8GB+ recommended for 1GB files)
+
+ ## Future Enhancements
+
+ Potential features for future versions:
+ - CLI tool for batch processing
+ - Export results to PDF/Excel
+ - Real-time log monitoring
+ - Custom metric definitions
+ - Time range filtering
+ - IP address analysis
+ - Session tracking
+
+ ## Example Output
+
+ The application generates:
+
+ 1. **Summary Table**: Key metrics for each log file
+ 2. **Top Methods Chart**: Most frequently called endpoints
+ 3. **Response Time Distribution**: Histogram of response times
+ 4. **Error Breakdown**: Pie chart of error types
+ 5. **Service Comparison**: Side-by-side comparison for multiple files
+
+ ## License
+
+ This tool is provided as-is for log analysis purposes.
+
+ ## Support
+
+ For issues or questions:
+ 1. Check that the log file format matches the IIS W3C Extended format
+ 2. Verify all required fields are present
+ 3. Ensure Python and dependencies are correctly installed
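The first support step (checking the log format) can be approximated with a quick sanity check: in this W3C profile a data line has exactly 15 space-separated fields. This helper is hypothetical, not part of the repository:

```python
# Hypothetical sanity check: a data line in this W3C profile should have
# exactly 15 whitespace-separated fields; '#' lines are directives/comments.
def looks_like_w3c_line(line: str, expected_fields: int = 15) -> bool:
    return not line.startswith("#") and len(line.split()) == expected_fields

good = ("2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get "
        "sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24")
print(looks_like_w3c_line(good))            # True
print(looks_like_w3c_line("#Fields: ..."))  # False
```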
app.py ADDED
@@ -0,0 +1,499 @@
+ """
+ IIS Log Analyzer - Streamlit Application
+ High-performance log analysis tool for large IIS log files (200MB-1GB+)
+ """
+
+ import streamlit as st
+ import plotly.graph_objects as go
+ import plotly.express as px
+ from plotly.subplots import make_subplots
+ import pandas as pd
+ from pathlib import Path
+ import tempfile
+ from typing import List
+ import time
+
+ from log_parser import IISLogParser, LogAnalyzer, analyze_multiple_logs
+
+
+ # Page configuration
+ st.set_page_config(
+     page_title="IIS Log Analyzer",
+     page_icon="📊",
+     layout="wide",
+     initial_sidebar_state="expanded"
+ )
+
+ # Custom CSS
+ st.markdown("""
+ <style>
+     .metric-card {
+         background-color: #f0f2f6;
+         padding: 20px;
+         border-radius: 10px;
+         margin: 10px 0;
+     }
+     .error-metric {
+         background-color: #ffebee;
+     }
+     .success-metric {
+         background-color: #e8f5e9;
+     }
+     .warning-metric {
+         background-color: #fff3e0;
+     }
+ </style>
+ """, unsafe_allow_html=True)
+
+
+ def format_number(num: int) -> str:
+     """Format large numbers with thousand separators."""
+     return f"{num:,}"
+
+
+ def create_summary_table(stats: dict) -> pd.DataFrame:
+     """Create summary statistics table."""
+     data = {
+         "Metric": [
+             "Total Requests (before filtering)",
+             "Excluded Requests (HEAD+Zabbix + 401)",
+             "Processed Requests",
+             "Errors (≠200, ≠401)",
+             "Slow Requests (>3s)",
+             "Peak RPS",
+             "Peak Timestamp",
+             "Avg Response Time (ms)",
+             "Max Response Time (ms)",
+             "Min Response Time (ms)",
+         ],
+         "Value": [
+             format_number(stats["total_requests_before"]),
+             format_number(stats["excluded_requests"]),
+             format_number(stats["total_requests_after"]),
+             format_number(stats["errors"]),
+             format_number(stats["slow_requests"]),
+             format_number(stats["peak_rps"]),
+             stats["peak_timestamp"] or "N/A",
+             format_number(stats["avg_time_ms"]),
+             format_number(stats["max_time_ms"]),
+             format_number(stats["min_time_ms"]),
+         ]
+     }
+     return pd.DataFrame(data)
+
+
+ def create_response_time_chart(dist: dict, title: str) -> go.Figure:
+     """Create response time distribution chart."""
+     labels = list(dist.keys())
+     values = list(dist.values())
+
+     fig = go.Figure(data=[
+         go.Bar(
+             x=labels,
+             y=values,
+             marker_color='lightblue',
+             text=values,
+             textposition='auto',
+         )
+     ])
+
+     fig.update_layout(
+         title=title,
+         xaxis_title="Response Time Range",
+         yaxis_title="Request Count",
+         height=400,
+         showlegend=False
+     )
+
+     return fig
+
+
+ def create_top_methods_chart(methods: List[dict], title: str) -> go.Figure:
+     """Create top methods bar chart."""
+     if not methods:
+         return go.Figure()
+
+     df = pd.DataFrame(methods)
+
+     fig = make_subplots(
+         rows=1, cols=2,
+         subplot_titles=("Request Count", "Avg Response Time (ms)")
+     )
+
+     # Request count
+     fig.add_trace(
+         go.Bar(
+             x=df["method_name"],
+             y=df["count"],
+             name="Count",
+             marker_color='steelblue',
+             text=df["count"],
+             textposition='auto',
+         ),
+         row=1, col=1
+     )
+
+     # Average time
+     fig.add_trace(
+         go.Bar(
+             x=df["method_name"],
+             y=df["avg_time"].round(1),
+             name="Avg Time",
+             marker_color='coral',
+             text=df["avg_time"].round(1),
+             textposition='auto',
+         ),
+         row=1, col=2
+     )
+
+     fig.update_layout(
+         title_text=title,
+         height=400,
+         showlegend=False
+     )
+
+     return fig
+
+
+ def create_metrics_comparison(individual_stats: List[dict]) -> go.Figure:
+     """Create comparison chart for multiple services."""
+     services = [s["summary"]["service_name"] for s in individual_stats]
+     requests = [s["summary"]["total_requests_after"] for s in individual_stats]
+     errors = [s["summary"]["errors"] for s in individual_stats]
+     avg_times = [s["summary"]["avg_time_ms"] for s in individual_stats]
+
+     fig = make_subplots(
+         rows=1, cols=3,
+         subplot_titles=("Processed Requests", "Errors", "Avg Response Time (ms)"),
+         specs=[[{"type": "bar"}, {"type": "bar"}, {"type": "bar"}]]
+     )
+
+     fig.add_trace(
+         go.Bar(x=services, y=requests, marker_color='lightblue', text=requests, textposition='auto'),
+         row=1, col=1
+     )
+
+     fig.add_trace(
+         go.Bar(x=services, y=errors, marker_color='salmon', text=errors, textposition='auto'),
+         row=1, col=2
+     )
+
+     fig.add_trace(
+         go.Bar(x=services, y=avg_times, marker_color='lightgreen', text=avg_times, textposition='auto'),
+         row=1, col=3
+     )
+
+     fig.update_layout(
+         title_text="Service Comparison",
+         height=400,
+         showlegend=False
+     )
+
+     return fig
+
+
+ def process_log_file(file_path: str, service_name: str = None) -> dict:
+     """Process a single log file and return statistics."""
+     parser = IISLogParser(file_path)
+     if service_name:
+         parser.service_name = service_name
+
+     with st.spinner(f"Parsing {Path(file_path).name}..."):
+         df = parser.parse()
+
+     if df.height == 0:
+         st.error(f"No valid log entries found in {Path(file_path).name}")
+         return None
+
+     with st.spinner(f"Analyzing {parser.service_name}..."):
+         analyzer = LogAnalyzer(df, parser.service_name)
+
+         stats = {
+             "summary": analyzer.get_summary_stats(),
+             "top_methods": analyzer.get_top_methods(),
+             "error_breakdown": analyzer.get_error_breakdown(),
+             "errors_by_method": analyzer.get_errors_by_method(n=10),
+             "response_time_dist": analyzer.get_response_time_distribution(),
+             "analyzer": analyzer,  # Keep reference for detailed error queries
+         }
+
+     return stats
+
+
+ def main():
+     st.title("📊 IIS Log Performance Analyzer")
+     st.markdown("High-performance analysis tool for large IIS log files (up to 1GB+)")
+
+     # Sidebar
+     st.sidebar.header("Configuration")
+
+     # File upload mode
+     upload_mode = st.sidebar.radio(
+         "Upload Mode",
+         ["Single File", "Multiple Files"],
+         help="Analyze one or multiple log files"
+     )
+
+     # File uploader
+     if upload_mode == "Single File":
+         uploaded_files = st.sidebar.file_uploader(
+             "Upload IIS Log File",
+             type=["log", "txt"],
+             help="Upload IIS W3C Extended format log file"
+         )
+         uploaded_files = [uploaded_files] if uploaded_files else []
+     else:
+         uploaded_files = st.sidebar.file_uploader(
+             "Upload IIS Log Files",
+             type=["log", "txt"],
+             accept_multiple_files=True,
+             help="Upload multiple IIS log files for comparison"
+         )
+
+     # Analysis options
+     st.sidebar.header("Analysis Options")
+     show_top_n = st.sidebar.slider("Top N Methods", 3, 20, 5)
+     slow_threshold = st.sidebar.number_input(
+         "Slow Request Threshold (ms)",
+         min_value=100,
+         max_value=10000,
+         value=3000,
+         step=100
+     )
+
+     # Process files
+     if uploaded_files:
+         st.info(f"Processing {len(uploaded_files)} file(s)...")
+
+         # Save uploaded files to temp directory
+         temp_files = []
+         for uploaded_file in uploaded_files:
+             with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
+                 tmp.write(uploaded_file.getvalue())
+                 temp_files.append(tmp.name)
+
+         start_time = time.time()
+
+         # Process each file
+         all_stats = []
+         for i, temp_file in enumerate(temp_files):
+             file_name = uploaded_files[i].name
+             st.subheader(f"📄 {file_name}")
+
+             stats = process_log_file(temp_file, None)
+             if stats:
+                 all_stats.append(stats)
+
+                 # Display summary metrics
+                 col1, col2, col3, col4 = st.columns(4)
+                 with col1:
+                     st.metric(
+                         "Total Requests",
+                         format_number(stats["summary"]["total_requests_after"])
+                     )
+                 with col2:
+                     st.metric(
+                         "Errors",
+                         format_number(stats["summary"]["errors"]),
+                         delta=None,
+                         delta_color="inverse"
+                     )
+                 with col3:
+                     st.metric(
+                         "Avg Time (ms)",
+                         format_number(stats["summary"]["avg_time_ms"])
+                     )
+                 with col4:
+                     st.metric(
+                         "Peak RPS",
+                         format_number(stats["summary"]["peak_rps"])
+                     )
+
+                 # Tabs for detailed analysis
+                 tab1, tab2, tab3, tab4, tab5 = st.tabs([
+                     "Summary", "Top Methods", "Response Time", "Error Breakdown", "Errors by Method"
+                 ])
+
+                 with tab1:
+                     st.dataframe(
+                         create_summary_table(stats["summary"]),
+                         hide_index=True,
+                         use_container_width=True
+                     )
+
+                 with tab2:
+                     if stats["top_methods"]:
+                         st.plotly_chart(
+                             create_top_methods_chart(
+                                 stats["top_methods"][:show_top_n],
+                                 f"Top {show_top_n} Methods - {stats['summary']['service_name']}"
+                             ),
+                             use_container_width=True
+                         )
+
+                         # Show table
+                         methods_df = pd.DataFrame(stats["top_methods"][:show_top_n])
+                         methods_df["avg_time"] = methods_df["avg_time"].round(1)
+                         st.dataframe(methods_df, hide_index=True, use_container_width=True)
+                     else:
+                         st.info("No method data available")
+
+                 with tab3:
+                     if stats["response_time_dist"]:
+                         st.plotly_chart(
+                             create_response_time_chart(
+                                 stats["response_time_dist"],
+                                 f"Response Time Distribution - {stats['summary']['service_name']}"
+                             ),
+                             use_container_width=True
+                         )
+                     else:
+                         st.info("No response time distribution data")
+
+                 with tab4:
+                     if stats["error_breakdown"]:
+                         error_df = pd.DataFrame(stats["error_breakdown"])
+                         error_df.columns = ["Status Code", "Count"]
+                         st.dataframe(error_df, hide_index=True, use_container_width=True)
+
+                         # Pie chart
+                         fig = px.pie(
+                             error_df,
+                             values="Count",
+                             names="Status Code",
+                             title=f"Error Distribution - {stats['summary']['service_name']}"
+                         )
+                         st.plotly_chart(fig, use_container_width=True)
+                     else:
+                         st.success("No errors found! ✓")
+
+                 with tab5:
+                     st.markdown("### 🔍 Errors by Method")
+                     st.markdown("This view shows which specific methods are causing errors, with full context for debugging.")
+
+                     if stats["errors_by_method"]:
+                         # Display summary table
+                         errors_method_df = pd.DataFrame(stats["errors_by_method"])
+                         errors_method_df["error_rate_percent"] = errors_method_df["error_rate_percent"].round(2)
+                         errors_method_df["avg_response_time_ms"] = errors_method_df["avg_response_time_ms"].round(1)
+
+                         # Rename columns for better display
+                         errors_method_df.columns = [
+                             "Method Path", "Total Calls", "Error Count",
+                             "Most Common Error", "Avg Response Time (ms)", "Error Rate (%)"
+                         ]
+
+                         st.dataframe(errors_method_df, hide_index=True, use_container_width=True)
+
+                         # Bar chart of top error-prone methods
+                         fig = go.Figure()
+                         fig.add_trace(go.Bar(
+                             x=errors_method_df["Method Path"],
+                             y=errors_method_df["Error Count"],
+                             marker_color='red',
+                             text=errors_method_df["Error Count"],
+                             textposition='auto',
+                             name="Error Count"
+                         ))
+
+                         fig.update_layout(
+                             title=f"Top Error-Prone Methods - {stats['summary']['service_name']}",
+                             xaxis_title="Method Path",
+                             yaxis_title="Error Count",
+                             height=400,
+                             showlegend=False
+                         )
+                         st.plotly_chart(fig, use_container_width=True)
+
+                         # Allow users to drill down into specific methods
+                         st.markdown("#### 🔎 Detailed Error Logs")
+                         selected_method = st.selectbox(
+                             "Select a method to view detailed error logs:",
+                             options=["All"] + errors_method_df["Method Path"].tolist(),
+                             key=f"method_select_{file_name}"
+                         )
+
+                         if selected_method and selected_method != "All":
+                             error_details = stats["analyzer"].get_error_details(
+                                 method_path=selected_method,
+                                 limit=50
+                             )
+                             if error_details:
+                                 details_df = pd.DataFrame(error_details)
+                                 st.dataframe(details_df, hide_index=True, use_container_width=True)
+                                 st.info(f"Showing up to 50 most recent errors for {selected_method}")
+                             else:
+                                 st.info(f"No error details found for {selected_method}")
+                         elif selected_method == "All":
+                             error_details = stats["analyzer"].get_error_details(limit=50)
+                             if error_details:
+                                 details_df = pd.DataFrame(error_details)
+                                 st.dataframe(details_df, hide_index=True, use_container_width=True)
+                                 st.info("Showing up to 50 most recent errors across all methods")
+                     else:
+                         st.success("No errors found in any methods! ✓")
+
+                 st.divider()
+
+         # Multi-file comparison
+         if len(all_stats) > 1:
+             st.header("📊 Service Comparison")
+             st.plotly_chart(
+                 create_metrics_comparison(all_stats),
+                 use_container_width=True
+             )
+
+             # Combined summary
+             st.subheader("Combined Statistics")
+             combined = {
+                 "total_requests_before": sum(s["summary"]["total_requests_before"] for s in all_stats),
+                 "excluded_requests": sum(s["summary"]["excluded_requests"] for s in all_stats),
+                 "total_requests_after": sum(s["summary"]["total_requests_after"] for s in all_stats),
+                 "errors": sum(s["summary"]["errors"] for s in all_stats),
+                 "slow_requests": sum(s["summary"]["slow_requests"] for s in all_stats),
+             }
+
+             col1, col2, col3 = st.columns(3)
+             with col1:
+                 st.metric("Total Requests (All Services)", format_number(combined["total_requests_after"]))
+             with col2:
+                 st.metric("Total Errors (All Services)", format_number(combined["errors"]))
+             with col3:
+                 st.metric("Total Slow Requests (All Services)", format_number(combined["slow_requests"]))
+
+         processing_time = time.time() - start_time
+         st.success(f"✓ Analysis completed in {processing_time:.2f} seconds")
+
+         # Clean up temp files
+         for temp_file in temp_files:
+             Path(temp_file).unlink(missing_ok=True)
+
+     else:
+         # Welcome screen
+         st.info("👆 Upload one or more IIS log files to begin analysis")
+
+         st.markdown("""
+         ### Features
+         - ⚡ **Fast processing** of large files (200MB-1GB+) using Polars
+         - 📊 **Comprehensive metrics**: RPS, response times, error rates
+         - 🔍 **Detailed analysis**: Top methods, error breakdown, time distribution
+         - 📈 **Visual reports**: Interactive charts with Plotly
+         - 🔄 **Multi-file support**: Compare multiple services side-by-side
+
+         ### Log Format
+         This tool supports **IIS W3C Extended Log Format** with the following fields:
+         ```
+         date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username
+         c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
+         ```
+
+         ### Filtering Rules
+         - Excludes lines with both `HEAD` method and `Zabbix` in User-Agent
+         - 401 Unauthorized responses are excluded from error counts
+         - Errors are defined as status codes ≠ 200 and ≠ 401
+         - Slow requests are those with response time > 3000ms (configurable)
+         """)
+
+
+ if __name__ == "__main__":
+     main()
log_parser.py ADDED
@@ -0,0 +1,419 @@
+ """
+ IIS Log Parser using Polars for high-performance processing.
+ Handles large log files (200MB-1GB+) efficiently with streaming.
+ """
+
+ import polars as pl
+ from pathlib import Path
+ from typing import Dict, List, Tuple, Optional
+ from datetime import datetime
+ import re
+
+
+ class IISLogParser:
+     """Parser for IIS W3C Extended Log Format."""
+
+     # IIS log column names
+     COLUMNS = [
+         "date", "time", "s_ip", "cs_method", "cs_uri_stem", "cs_uri_query",
+         "s_port", "cs_username", "c_ip", "cs_user_agent", "cs_referer",
+         "sc_status", "sc_substatus", "sc_win32_status", "time_taken"
+     ]
+
+     def __init__(self, file_path: str):
+         self.file_path = Path(file_path)
+         self.service_name = None  # Will be determined from URI paths during parsing
+
+     def parse(self, chunk_size: Optional[int] = None) -> pl.DataFrame:
+         """
+         Parse IIS log file.
+
+         Args:
+             chunk_size: If provided, process in chunks (for very large files)
+
+         Returns:
+             Polars DataFrame with parsed log data
+         """
+         # Read file, skip comment lines
+         with open(self.file_path, 'r', encoding='utf-8', errors='ignore') as f:
+             lines = []
+             for line in f:
+                 # Skip header/comment lines starting with #
+                 if not line.startswith('#'):
+                     lines.append(line.strip())
+
+         # Create DataFrame from lines
+         if not lines:
+             return pl.DataFrame()
+
+         # Split each line by space and create DataFrame
+         data = [line.split() for line in lines if line]
+
+         # Filter out lines that don't have the correct number of columns
+         data = [row for row in data if len(row) == len(self.COLUMNS)]
+
+         if not data:
+             return pl.DataFrame()
+
+         df = pl.DataFrame(data, schema=self.COLUMNS, orient="row")
+
+         # Convert data types
+         df = df.with_columns([
+             pl.col("date").cast(pl.Utf8),
+             pl.col("time").cast(pl.Utf8),
+             pl.col("sc_status").cast(pl.Int32),
+             pl.col("sc_substatus").cast(pl.Int32),
+             pl.col("sc_win32_status").cast(pl.Int32),
+             pl.col("time_taken").cast(pl.Int32),
+         ])
+
+         # Create timestamp column
+         df = df.with_columns([
+             (pl.col("date") + " " + pl.col("time")).alias("timestamp")
+         ])
+
+         # Convert timestamp to datetime
+         df = df.with_columns([
+             pl.col("timestamp").str.strptime(pl.Datetime, format="%Y-%m-%d %H:%M:%S")
+         ])
+
+         # Extract service name and method name from URI
+         df = df.with_columns([
+             self._extract_service_name().alias("service_name"),
+             self._extract_method_name().alias("method_name"),
+             self._extract_full_method_path().alias("full_method_path")
+         ])
+
+         # Determine the primary service name for this log file
+         if df.height > 0:
+             # Get the most common service name
+             service_counts = df.group_by("service_name").agg([
+                 pl.count().alias("count")
+             ]).sort("count", descending=True)
+
+             if service_counts.height > 0:
+                 self.service_name = service_counts.row(0, named=True)["service_name"]
+             else:
+                 self.service_name = "Unknown"
+         else:
+             self.service_name = "Unknown"
+
+         return df
+
+     def _extract_service_name(self) -> pl.Expr:
+         """Extract service name from URI stem (e.g., AdministratorOfficeService, CustomerOfficeService)."""
+         # Extract the first meaningful part after the leading slash
+         # Example: /AdministratorOfficeService/Contact/Get -> AdministratorOfficeService
+         return (
+             pl.col("cs_uri_stem")
+             .str.split("/")
+             .list.get(1)  # Get first element after leading /
+             .fill_null("Unknown")
+         )
+
+     def _extract_full_method_path(self) -> pl.Expr:
+         """Extract full method path for better error tracking (e.g., Contact/Get, Order/Create)."""
+         # Extract everything after the service name
+         # Example: /AdministratorOfficeService/Contact/Get -> Contact/Get
+         return (
+             pl.col("cs_uri_stem")
+             .str.split("/")
+             .list.slice(2)  # Skip leading / and service name
+             .list.join("/")
+             .fill_null("Unknown")
+         )
+
+     def _extract_method_name(self) -> pl.Expr:
+         """Extract method name from URI stem."""
+         # Extract last part of URI path (e.g., /Service/Contact/Get -> Get)
+         return pl.col("cs_uri_stem").str.split("/").list.last().fill_null("Unknown")
+
+
+ class LogAnalyzer:
+     """Analyze parsed IIS logs and generate performance metrics."""
+
+     def __init__(self, df: pl.DataFrame, service_name: str = "Unknown"):
+         self.df = df
+         self.service_name = service_name
+         self._filtered_df = None
+
+     def filter_logs(self) -> pl.DataFrame:
+         """
+         Apply filtering rules:
+         1. Exclude lines with both HEAD and Zabbix
+         2. Exclude 401 status codes (for error counting)
+
+         Returns:
+             Filtered DataFrame
+         """
+         if self._filtered_df is not None:
+             return self._filtered_df
+
+         # Filter out HEAD + Zabbix
+         filtered = self.df.filter(
+             ~(
+                 (pl.col("cs_method") == "HEAD") &
+                 (
+                     pl.col("cs_user_agent").str.contains("Zabbix") |
+                     pl.col("cs_uri_stem").str.contains("Zabbix")
+                 )
+             )
+         )
+
+         self._filtered_df = filtered
+         return filtered
+
+     def get_summary_stats(self) -> Dict:
+         """Get overall summary statistics."""
+         df = self.filter_logs()
+
+         # Count requests
+         total_before = self.df.height
+         total_after = df.height
+         excluded = total_before - total_after
+
+         # Count 401s separately
+         count_401 = self.df.filter(pl.col("sc_status") == 401).height
+
+         # Count errors (status != 200 and != 401)
+         errors = df.filter(
+             (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
+         ).height
+
+         # Count slow requests (>3000ms)
+         slow_requests = df.filter(pl.col("time_taken") > 3000).height
+
+         # Response time statistics
+         time_stats = df.select([
+             pl.col("time_taken").min().alias("min_time"),
+             pl.col("time_taken").max().alias("max_time"),
+             pl.col("time_taken").mean().alias("avg_time"),
+         ]).to_dicts()[0]
+
+         # Peak RPS
+         rps_data = self._calculate_peak_rps(df)
+
+         return {
+             "service_name": self.service_name,
+             "total_requests_before": total_before,
+             "excluded_requests": excluded,
+             "excluded_401": count_401,
+             "total_requests_after": total_after,
+             "errors": errors,
+             "slow_requests": slow_requests,
+             "min_time_ms": int(time_stats["min_time"]) if time_stats["min_time"] else 0,
+             "max_time_ms": int(time_stats["max_time"]) if time_stats["max_time"] else 0,
+             "avg_time_ms": int(time_stats["avg_time"]) if time_stats["avg_time"] else 0,
+             "peak_rps": rps_data["peak_rps"],
+             "peak_timestamp": rps_data["peak_timestamp"],
+         }
+
+     def _calculate_peak_rps(self, df: pl.DataFrame) -> Dict:
+         """Calculate peak requests per second."""
+         if df.height == 0:
+             return {"peak_rps": 0, "peak_timestamp": None}
+
+         # Group by second and count requests
+         rps = df.group_by("timestamp").agg([
+             pl.count().alias("count")
+         ]).sort("count", descending=True)
+
+         if rps.height == 0:
+             return {"peak_rps": 0, "peak_timestamp": None}
+
+         peak_row = rps.row(0, named=True)
+
+         return {
+             "peak_rps": peak_row["count"],
+             "peak_timestamp": str(peak_row["timestamp"])
+         }
+
+     def get_top_methods(self, n: int = 5) -> List[Dict]:
+         """Get top N methods by request count."""
233
+ df = self.filter_logs()
234
+
235
+ if df.height == 0:
236
+ return []
237
+
238
+ # Group by method name
239
+ method_stats = df.group_by("method_name").agg([
240
+ pl.count().alias("count"),
241
+ pl.col("time_taken").mean().alias("avg_time"),
242
+ pl.col("sc_status").filter(
243
+ (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
244
+ ).count().alias("errors")
245
+ ]).sort("count", descending=True).limit(n)
246
+
247
+ return method_stats.to_dicts()
248
+
249
+ def get_error_breakdown(self) -> List[Dict]:
250
+ """Get breakdown of errors by status code."""
251
+ df = self.filter_logs()
252
+
253
+ errors = df.filter(
254
+ (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
255
+ )
256
+
257
+ if errors.height == 0:
258
+ return []
259
+
260
+ error_stats = errors.group_by("sc_status").agg([
261
+ pl.count().alias("count")
262
+ ]).sort("count", descending=True)
263
+
264
+ return error_stats.to_dicts()
265
+
266
+ def get_errors_by_method(self, n: int = 10) -> List[Dict]:
267
+ """
268
+ Get detailed error breakdown by method with full context.
269
+ Shows which methods are causing the most errors.
270
+
271
+ Args:
272
+ n: Number of top error-prone methods to return
273
+
274
+ Returns:
275
+ List of dicts with method, error count, total calls, and error rate
276
+ """
277
+ df = self.filter_logs()
278
+
279
+ if df.height == 0:
280
+ return []
281
+
282
+ # Get error counts and total counts per full method path
283
+ method_errors = df.group_by("full_method_path").agg([
284
+ pl.count().alias("total_calls"),
285
+ pl.col("sc_status").filter(
286
+ (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
287
+ ).count().alias("error_count"),
288
+ pl.col("sc_status").filter(
289
+ (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
290
+ ).first().alias("most_common_error_status"),
291
+ pl.col("time_taken").mean().alias("avg_response_time_ms"),
292
+ ]).filter(
293
+ pl.col("error_count") > 0
294
+ ).with_columns([
295
+ (pl.col("error_count") * 100.0 / pl.col("total_calls")).alias("error_rate_percent")
296
+ ]).sort("error_count", descending=True).limit(n)
297
+
298
+ return method_errors.to_dicts()
299
+
300
+ def get_error_details(self, method_path: str = None, limit: int = 100) -> List[Dict]:
301
+ """
302
+ Get detailed error logs with full context for debugging.
303
+
304
+ Args:
305
+ method_path: Optional filter for specific method path
306
+ limit: Maximum number of error records to return
307
+
308
+ Returns:
309
+ List of error records with timestamp, method, status, response time, etc.
310
+ """
311
+ df = self.filter_logs()
312
+
313
+ # Filter for errors only
314
+ errors = df.filter(
315
+ (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
316
+ )
317
+
318
+ # Apply method filter if specified
319
+ if method_path:
320
+ errors = errors.filter(pl.col("full_method_path") == method_path)
321
+
322
+ if errors.height == 0:
323
+ return []
324
+
325
+ # Select relevant columns for debugging
326
+ error_details = errors.select([
327
+ "timestamp",
328
+ "service_name",
329
+ "full_method_path",
330
+ "method_name",
331
+ "sc_status",
332
+ "sc_substatus",
333
+ "sc_win32_status",
334
+ "time_taken",
335
+ "c_ip",
336
+ "cs_uri_query"
337
+ ]).sort("timestamp", descending=True).limit(limit)
338
+
339
+ return error_details.to_dicts()
340
+
341
+ def get_response_time_distribution(self, buckets: List[int] = None) -> Dict:
342
+ """Get response time distribution by buckets."""
343
+ if buckets is None:
344
+ buckets = [0, 50, 100, 200, 500, 1000, 3000, 10000]
345
+
346
+ df = self.filter_logs()
347
+
348
+ if df.height == 0:
349
+ return {}
350
+
351
+ distribution = {}
352
+ for i in range(len(buckets) - 1):
353
+ lower = buckets[i]
354
+ upper = buckets[i + 1]
355
+ count = df.filter(
356
+ (pl.col("time_taken") >= lower) & (pl.col("time_taken") < upper)
357
+ ).height
358
+ distribution[f"{lower}-{upper}ms"] = count
359
+
360
+ # Add bucket for values above last threshold
361
+ count = df.filter(pl.col("time_taken") >= buckets[-1]).height
362
+ distribution[f">{buckets[-1]}ms"] = count
363
+
364
+ return distribution
365
+
366
+ def get_rps_timeline(self, interval: str = "1m") -> pl.DataFrame:
367
+ """Get RPS over time with specified interval."""
368
+ df = self.filter_logs()
369
+
370
+ if df.height == 0:
371
+ return pl.DataFrame()
372
+
373
+ # Group by time interval
374
+ timeline = df.group_by_dynamic("timestamp", every=interval).agg([
375
+ pl.count().alias("requests")
376
+ ]).sort("timestamp")
377
+
378
+ return timeline
379
+
380
+
381
+ def analyze_multiple_logs(log_files: List[str]) -> Tuple[Dict, List[Dict]]:
382
+ """
383
+ Analyze multiple log files and generate combined report.
384
+
385
+ Args:
386
+ log_files: List of log file paths
387
+
388
+ Returns:
389
+ Tuple of (combined_stats, individual_stats)
390
+ """
391
+ individual_stats = []
392
+
393
+ for log_file in log_files:
394
+ parser = IISLogParser(log_file)
395
+ df = parser.parse()
396
+ analyzer = LogAnalyzer(df, parser.service_name)
397
+
398
+ stats = {
399
+ "summary": analyzer.get_summary_stats(),
400
+ "top_methods": analyzer.get_top_methods(),
401
+ "error_breakdown": analyzer.get_error_breakdown(),
402
+ "errors_by_method": analyzer.get_errors_by_method(n=10),
403
+ "response_time_dist": analyzer.get_response_time_distribution(),
404
+ "analyzer": analyzer,
405
+ }
406
+
407
+ individual_stats.append(stats)
408
+
409
+ # Calculate combined statistics
410
+ combined = {
411
+ "total_requests_before": sum(s["summary"]["total_requests_before"] for s in individual_stats),
412
+ "excluded_requests": sum(s["summary"]["excluded_requests"] for s in individual_stats),
413
+ "excluded_401": sum(s["summary"]["excluded_401"] for s in individual_stats),
414
+ "total_requests_after": sum(s["summary"]["total_requests_after"] for s in individual_stats),
415
+ "errors": sum(s["summary"]["errors"] for s in individual_stats),
416
+ "slow_requests": sum(s["summary"]["slow_requests"] for s in individual_stats),
417
+ }
418
+
419
+ return combined, individual_stats
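A note on the exclusion rule in `LogAnalyzer.filter_logs`: only requests that are *both* HEAD and Zabbix-related are dropped, so GET traffic from a Zabbix agent still counts. A minimal stdlib sketch on synthetic dict rows (field values here are made up for illustration) makes the predicate easy to verify outside of Polars:

```python
def is_monitoring_probe(row: dict) -> bool:
    """Mirrors LogAnalyzer.filter_logs: a row is dropped only when it is a
    HEAD request AND mentions Zabbix in the user agent or the URI stem."""
    return row["cs_method"] == "HEAD" and (
        "Zabbix" in row["cs_user_agent"] or "Zabbix" in row["cs_uri_stem"]
    )

rows = [
    {"cs_method": "HEAD", "cs_user_agent": "Zabbix 6.0", "cs_uri_stem": "/Service/Ping"},
    {"cs_method": "GET", "cs_user_agent": "Zabbix 6.0", "cs_uri_stem": "/Service/Ping"},
    {"cs_method": "GET", "cs_user_agent": "Mozilla/5.0", "cs_uri_stem": "/Service/Contact/Get"},
]
kept = [r for r in rows if not is_monitoring_probe(r)]
# Only the HEAD+Zabbix row is excluded; both GET requests survive.
```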
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ # Core dependencies
+ streamlit>=1.28.0
+ polars>=0.20.5
+ plotly>=5.17.0
+ pandas>=2.0.0
+ 
+ # Optional performance improvements
+ pyarrow>=13.0.0
run.sh ADDED
@@ -0,0 +1,20 @@
+ #!/bin/bash
+ # Launch script for IIS Log Analyzer
+ 
+ echo "🚀 Starting IIS Log Analyzer..."
+ echo ""
+ 
+ # Check if dependencies are installed
+ if ! python -c "import streamlit" 2>/dev/null; then
+     echo "📦 Installing dependencies..."
+     pip install -r requirements.txt
+     echo ""
+ fi
+ 
+ # Launch Streamlit app
+ echo "✓ Launching web application..."
+ echo "  URL: http://localhost:8501"
+ echo "  Press Ctrl+C to stop"
+ echo ""
+ 
+ streamlit run app.py --server.maxUploadSize=1024
test_parser.py ADDED
@@ -0,0 +1,118 @@
+ """
+ Test script for the IIS log parser.
+ """
+ 
+ import time
+ 
+ from log_parser import IISLogParser, LogAnalyzer
+ 
+ 
+ def test_log_file(file_path: str):
+     """Test parsing a single log file."""
+     print(f"\n{'='*80}")
+     print(f"Testing: {file_path}")
+     print(f"{'='*80}")
+ 
+     start_time = time.time()
+ 
+     # Parse
+     parser = IISLogParser(file_path)
+     df = parser.parse()
+     parse_time = time.time() - start_time
+     print(f"Service Name: {parser.service_name}")
+     print(f"✓ Parsed {df.height:,} log entries in {parse_time:.2f}s")
+ 
+     # Analyze
+     analyzer = LogAnalyzer(df, parser.service_name)
+     stats = analyzer.get_summary_stats()
+ 
+     analyze_time = time.time() - start_time - parse_time
+     print(f"✓ Analyzed in {analyze_time:.2f}s")
+ 
+     # Display summary
+     print("\n📊 Summary Statistics:")
+     print(f"  Total Requests (before): {stats['total_requests_before']:,}")
+     print(f"  Excluded Requests: {stats['excluded_requests']:,}")
+     print(f"  Total Requests (after): {stats['total_requests_after']:,}")
+     print(f"  Errors (≠200, ≠401): {stats['errors']:,}")
+     print(f"  Slow Requests (>3s): {stats['slow_requests']:,}")
+     print(f"  Peak RPS: {stats['peak_rps']:,} @ {stats['peak_timestamp']}")
+     print(f"  Avg Response Time: {stats['avg_time_ms']:,}ms")
+     print(f"  Max Response Time: {stats['max_time_ms']:,}ms")
+     print(f"  Min Response Time: {stats['min_time_ms']:,}ms")
+ 
+     # Top methods
+     print("\n🔝 Top 5 Methods:")
+     top_methods = analyzer.get_top_methods(5)
+     for i, method in enumerate(top_methods, 1):
+         print(f"  {i}. {method['method_name']}")
+         print(f"     Count: {method['count']:,} | Avg Time: {method['avg_time']:.1f}ms | Errors: {method['errors']}")
+ 
+     # Error breakdown
+     errors = analyzer.get_error_breakdown()
+     if errors:
+         print("\n❌ Error Breakdown:")
+         for error in errors:
+             print(f"  Status {error['sc_status']}: {error['count']:,} occurrences")
+     else:
+         print("\n✓ No errors found!")
+ 
+     # Errors by method
+     errors_by_method = analyzer.get_errors_by_method(5)
+     if errors_by_method:
+         print("\n⚠️ Top 5 Error-Prone Methods:")
+         for i, method_error in enumerate(errors_by_method, 1):
+             print(f"  {i}. {method_error['full_method_path']}")
+             print(f"     Total Calls: {method_error['total_calls']:,} | Errors: {method_error['error_count']:,} | "
+                   f"Error Rate: {method_error['error_rate_percent']:.2f}% | "
+                   f"Most Common Error: {method_error.get('most_common_error_status', 'N/A')}")
+     else:
+         print("\n✓ No method errors found!")
+ 
+     # Response time distribution
+     dist = analyzer.get_response_time_distribution()
+     print("\n⏱️ Response Time Distribution:")
+     for bucket, count in dist.items():
+         print(f"  {bucket}: {count:,}")
+ 
+     total_time = time.time() - start_time
+     print(f"\n⏱️ Total processing time: {total_time:.2f}s")
+ 
+     return stats
+ 
+ 
+ if __name__ == "__main__":
+     # Test with both sample log files
+     files = [
+         "administrator_rhr_ex250922.log",
+         "customer_rhr_ex250922.log"
+     ]
+ 
+     all_stats = []
+     total_start = time.time()
+ 
+     for file_path in files:
+         try:
+             stats = test_log_file(file_path)
+             all_stats.append(stats)
+         except Exception as e:
+             print(f"\n❌ Error processing {file_path}: {e}")
+             import traceback
+             traceback.print_exc()
+ 
+     # Combined summary
+     if len(all_stats) > 1:
+         print(f"\n{'='*80}")
+         print("COMBINED STATISTICS")
+         print(f"{'='*80}")
+         total_requests = sum(s['total_requests_after'] for s in all_stats)
+         total_errors = sum(s['errors'] for s in all_stats)
+         total_slow = sum(s['slow_requests'] for s in all_stats)
+ 
+         print(f"Total Requests (all services): {total_requests:,}")
+         print(f"Total Errors (all services): {total_errors:,}")
+         print(f"Total Slow Requests (all services): {total_slow:,}")
+ 
+     total_elapsed = time.time() - total_start
+     print(f"\n⏱️ Total elapsed time: {total_elapsed:.2f}s")
+     print("\n✓ All tests completed!")
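For reference, the bucketing performed by `LogAnalyzer.get_response_time_distribution` can be reproduced with the standard library alone. This sketch uses the same default bucket edges as the analyzer (half-open `[lower, upper)` ranges plus an overflow bucket) and assumes non-negative times:

```python
import bisect
from collections import Counter

def bucket_distribution(times_ms, edges=(0, 50, 100, 200, 500, 1000, 3000, 10000)):
    """Count response times into [lower, upper) buckets plus an overflow
    bucket, matching LogAnalyzer.get_response_time_distribution."""
    labels = [f"{edges[i]}-{edges[i + 1]}ms" for i in range(len(edges) - 1)]
    labels.append(f">{edges[-1]}ms")
    counts = Counter()
    for t in times_ms:
        # bisect_right - 1 yields the index of the bucket's lower edge;
        # anything >= edges[-1] falls into the overflow label.
        counts[labels[bisect.bisect_right(edges, t) - 1]] += 1
    return {label: counts.get(label, 0) for label in labels}

dist = bucket_distribution([10, 75, 75, 4000, 12000])
# dist["0-50ms"] == 1, dist["50-100ms"] == 2, dist[">10000ms"] == 1
```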