Commit 002262c
Initial commit: IIS Log Performance Analyzer
Add complete Streamlit application for analyzing large IIS log files:
- High-performance log parsing with Polars
- Interactive web UI with Streamlit
- Comprehensive metrics and visualizations
- Support for multi-file analysis
- Smart filtering for monitoring requests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- .gitignore +44 -0
- README.md +208 -0
- app.py +499 -0
- log_parser.py +419 -0
- requirements.txt +8 -0
- run.sh +20 -0
- test_parser.py +118 -0
.gitignore
ADDED
@@ -0,0 +1,44 @@
```
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Streamlit
.streamlit/

# Log files (example/sample files - users will upload their own)
*.log

# PDF files (example reports/analysis)
*.pdf

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db
```
README.md
ADDED
@@ -0,0 +1,208 @@

# IIS Log Performance Analyzer

High-performance web application for analyzing large IIS log files (200MB-1GB+). Built with Streamlit and Polars for fast, efficient processing.

**GitHub Repository**: [https://github.com/pilot-stuk/odata_log_parser](https://github.com/pilot-stuk/odata_log_parser)

**Live Demo**: Deploy on [Streamlit Cloud](https://streamlit.io/cloud)

## Features

- **Fast Processing**: Uses the Polars library for 10-100x faster parsing compared to pandas
- **Large File Support**: Efficiently handles files up to 1GB+
- **Comprehensive Metrics**:
  - Total requests (before/after filtering)
  - Error rates and breakdown by status code
  - Response time statistics (min/max/avg)
  - Slow request detection (configurable threshold)
  - Peak RPS (Requests Per Second) with timestamp
  - Top methods by request count and response time
- **Multi-File Analysis**: Upload and compare multiple log files side by side
- **Interactive Visualizations**: Charts and graphs using Plotly
- **Smart Filtering**: Automatically excludes monitoring requests (Zabbix HEAD) and 401 Unauthorized responses

## Requirements

- Python 3.8+
- See `requirements.txt` for package dependencies

## Installation

### Local Installation

1. Clone the repository:
```bash
git clone https://github.com/pilot-stuk/odata_log_parser.git
cd odata_log_parser
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

### Deploy to Streamlit Cloud

1. Fork or clone this repository to your GitHub account
2. Go to [share.streamlit.io](https://share.streamlit.io/)
3. Sign in with your GitHub account
4. Click "New app"
5. Select your repository: `pilot-stuk/odata_log_parser`
6. Set the main file path: `app.py`
7. Click "Deploy"

The app will be live at: `https://share.streamlit.io/pilot-stuk/odata_log_parser/main/app.py`

## Usage

### Run the Streamlit App

```bash
streamlit run app.py
```

The application will open in your browser at `http://localhost:8501`.

### Upload Log Files

1. Click "Browse files" in the sidebar
2. Select one or more IIS log files (.log or .txt)
3. View the analysis results

### Configuration Options

- **Upload Mode**: Single or Multiple files
- **Top N Methods**: Number of top methods to display (3-20)
- **Slow Request Threshold**: Configure what constitutes a "slow" request (default: 3000ms)

## Log Format

This tool supports **IIS W3C Extended Log Format** with the following fields:

```
date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip
cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
```

Example log line:
```
2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24
```

## Filtering Rules

The analyzer applies the following filters automatically:

1. **Monitoring Exclusion**: Lines containing both the `HEAD` method and `Zabbix` are excluded
2. **401 Handling**: 401 Unauthorized responses are excluded from error counts (considered authentication attempts, not system errors)
3. **Error Definition**: Errors are HTTP status codes ≠ 200 and ≠ 401

## Metrics Explained

| Metric | Description |
|--------|-------------|
| **Total Requests (before filtering)** | Raw number of log entries |
| **Excluded Requests** | Lines filtered out (HEAD+Zabbix + 401) |
| **Processed Requests** | Valid requests included in analysis |
| **Errors** | Requests with status ≠ 200 and ≠ 401 |
| **Slow Requests** | Requests exceeding the threshold (default: 3000ms) |
| **Peak RPS** | Maximum requests per second observed |
| **Avg/Max/Min Response Time** | Response time statistics in milliseconds |

## Performance

- **Small files** (<50MB): process in seconds
- **Medium files** (50-200MB): process in 10-30 seconds
- **Large files** (200MB-1GB): process in 1-3 minutes

Performance depends on:
- File size
- Number of log entries
- System CPU and RAM
- Disk I/O speed

## Architecture

```
app.py           # Streamlit UI application
log_parser.py    # Core parsing and analysis logic using Polars
requirements.txt # Python dependencies
README.md        # This file
```

### Key Components

- **IISLogParser**: Parses IIS W3C log format into a Polars DataFrame
- **LogAnalyzer**: Calculates metrics and statistics
- **Streamlit UI**: Interactive web interface with visualizations

## Use Cases

- **Performance Analysis**: Identify slow endpoints and response time patterns
- **Error Investigation**: Track error rates and problematic methods
- **Capacity Planning**: Analyze peak load and RPS patterns
- **Service Comparison**: Compare performance across multiple services
- **Incident Review**: Analyze logs from specific time periods

## Troubleshooting

### Large File Upload Issues

If Streamlit has trouble with very large files (>500MB):

1. Increase Streamlit's upload size limit:
```bash
streamlit run app.py --server.maxUploadSize=1024
```

2. Or modify `.streamlit/config.toml`:
```toml
[server]
maxUploadSize = 1024
```

### Memory Issues

For files >1GB, you may need to:
- Increase available system memory
- Process files in smaller chunks
- Use a CLI version (could be developed if needed)

### Performance Tips

- Close other memory-intensive applications
- Process very large files one at a time
- Use an SSD for faster I/O
- Ensure adequate RAM (8GB+ recommended for 1GB files)

## Future Enhancements

Potential features for future versions:
- CLI tool for batch processing
- Export results to PDF/Excel
- Real-time log monitoring
- Custom metric definitions
- Time range filtering
- IP address analysis
- Session tracking

## Example Output

The application generates:

1. **Summary Table**: Key metrics for each log file
2. **Top Methods Chart**: Most frequently called endpoints
3. **Response Time Distribution**: Histogram of response times
4. **Error Breakdown**: Pie chart of error types
5. **Service Comparison**: Side-by-side comparison for multiple files

## License

This tool is provided as-is for log analysis purposes.

## Support

For issues or questions:
1. Check that the log file format matches IIS W3C Extended format
2. Verify all required fields are present
3. Ensure Python and dependencies are correctly installed
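The W3C line format documented in the README can be parsed field by field with a plain split. The sketch below is illustrative pure Python under the assumption of the fixed field order listed above; it is not the repository's Polars-based parser:

```python
from typing import Optional

# Field order follows the README's list. Real IIS logs declare their columns
# in a "#Fields:" directive, which a robust parser should honor; this fixed
# list is an assumption for illustration only.
FIELDS = [
    "date", "time", "s-ip", "cs-method", "cs-uri-stem", "cs-uri-query",
    "s-port", "cs-username", "c-ip", "cs(User-Agent)", "cs(Referer)",
    "sc-status", "sc-substatus", "sc-win32-status", "time-taken",
]

def parse_line(line: str) -> Optional[dict]:
    """Split one space-delimited W3C line; skip directives and malformed rows."""
    if line.startswith("#"):
        return None  # comment/directive lines such as "#Fields: ..."
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None  # wrong column count -> treat as malformed
    row = dict(zip(FIELDS, parts))
    row["sc-status"] = int(row["sc-status"])
    row["time-taken"] = int(row["time-taken"])
    return row

example = ("2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get "
           "sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24")
```

Applied to the README's example line, this yields `cs-method` of `GET`, `sc-status` of `200`, and `time-taken` of `24` ms.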
app.py
ADDED
|
@@ -0,0 +1,499 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
IIS Log Analyzer - Streamlit Application
|
| 3 |
+
High-performance log analysis tool for large IIS log files (200MB-1GB+)
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import streamlit as st
|
| 7 |
+
import plotly.graph_objects as go
|
| 8 |
+
import plotly.express as px
|
| 9 |
+
from plotly.subplots import make_subplots
|
| 10 |
+
import pandas as pd
|
| 11 |
+
from pathlib import Path
|
| 12 |
+
import tempfile
|
| 13 |
+
from typing import List
|
| 14 |
+
import time
|
| 15 |
+
|
| 16 |
+
from log_parser import IISLogParser, LogAnalyzer, analyze_multiple_logs
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
# Page configuration
|
| 20 |
+
st.set_page_config(
|
| 21 |
+
page_title="IIS Log Analyzer",
|
| 22 |
+
page_icon="📊",
|
| 23 |
+
layout="wide",
|
| 24 |
+
initial_sidebar_state="expanded"
|
| 25 |
+
)
|
| 26 |
+
|
| 27 |
+
# Custom CSS
|
| 28 |
+
st.markdown("""
|
| 29 |
+
<style>
|
| 30 |
+
.metric-card {
|
| 31 |
+
background-color: #f0f2f6;
|
| 32 |
+
padding: 20px;
|
| 33 |
+
border-radius: 10px;
|
| 34 |
+
margin: 10px 0;
|
| 35 |
+
}
|
| 36 |
+
.error-metric {
|
| 37 |
+
background-color: #ffebee;
|
| 38 |
+
}
|
| 39 |
+
.success-metric {
|
| 40 |
+
background-color: #e8f5e9;
|
| 41 |
+
}
|
| 42 |
+
.warning-metric {
|
| 43 |
+
background-color: #fff3e0;
|
| 44 |
+
}
|
| 45 |
+
</style>
|
| 46 |
+
""", unsafe_allow_html=True)
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
def format_number(num: int) -> str:
|
| 50 |
+
"""Format large numbers with thousand separators."""
|
| 51 |
+
return f"{num:,}"
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
def create_summary_table(stats: dict) -> pd.DataFrame:
|
| 55 |
+
"""Create summary statistics table."""
|
| 56 |
+
data = {
|
| 57 |
+
"Metric": [
|
| 58 |
+
"Total Requests (before filtering)",
|
| 59 |
+
"Excluded Requests (HEAD+Zabbix + 401)",
|
| 60 |
+
"Processed Requests",
|
| 61 |
+
"Errors (≠200, ≠401)",
|
| 62 |
+
"Slow Requests (>3s)",
|
| 63 |
+
"Peak RPS",
|
| 64 |
+
"Peak Timestamp",
|
| 65 |
+
"Avg Response Time (ms)",
|
| 66 |
+
"Max Response Time (ms)",
|
| 67 |
+
"Min Response Time (ms)",
|
| 68 |
+
],
|
| 69 |
+
"Value": [
|
| 70 |
+
format_number(stats["total_requests_before"]),
|
| 71 |
+
format_number(stats["excluded_requests"]),
|
| 72 |
+
format_number(stats["total_requests_after"]),
|
| 73 |
+
format_number(stats["errors"]),
|
| 74 |
+
format_number(stats["slow_requests"]),
|
| 75 |
+
format_number(stats["peak_rps"]),
|
| 76 |
+
stats["peak_timestamp"] or "N/A",
|
| 77 |
+
format_number(stats["avg_time_ms"]),
|
| 78 |
+
format_number(stats["max_time_ms"]),
|
| 79 |
+
format_number(stats["min_time_ms"]),
|
| 80 |
+
]
|
| 81 |
+
}
|
| 82 |
+
return pd.DataFrame(data)
|
| 83 |
+
|
| 84 |
+
|
| 85 |
+
def create_response_time_chart(dist: dict, title: str) -> go.Figure:
|
| 86 |
+
"""Create response time distribution chart."""
|
| 87 |
+
labels = list(dist.keys())
|
| 88 |
+
values = list(dist.values())
|
| 89 |
+
|
| 90 |
+
fig = go.Figure(data=[
|
| 91 |
+
go.Bar(
|
| 92 |
+
x=labels,
|
| 93 |
+
y=values,
|
| 94 |
+
marker_color='lightblue',
|
| 95 |
+
text=values,
|
| 96 |
+
textposition='auto',
|
| 97 |
+
)
|
| 98 |
+
])
|
| 99 |
+
|
| 100 |
+
fig.update_layout(
|
| 101 |
+
title=title,
|
| 102 |
+
xaxis_title="Response Time Range",
|
| 103 |
+
yaxis_title="Request Count",
|
| 104 |
+
height=400,
|
| 105 |
+
showlegend=False
|
| 106 |
+
)
|
| 107 |
+
|
| 108 |
+
return fig
|
| 109 |
+
|
| 110 |
+
|
| 111 |
+
def create_top_methods_chart(methods: List[dict], title: str) -> go.Figure:
|
| 112 |
+
"""Create top methods bar chart."""
|
| 113 |
+
if not methods:
|
| 114 |
+
return go.Figure()
|
| 115 |
+
|
| 116 |
+
df = pd.DataFrame(methods)
|
| 117 |
+
|
| 118 |
+
fig = make_subplots(
|
| 119 |
+
rows=1, cols=2,
|
| 120 |
+
subplot_titles=("Request Count", "Avg Response Time (ms)")
|
| 121 |
+
)
|
| 122 |
+
|
| 123 |
+
# Request count
|
| 124 |
+
fig.add_trace(
|
| 125 |
+
go.Bar(
|
| 126 |
+
x=df["method_name"],
|
| 127 |
+
y=df["count"],
|
| 128 |
+
name="Count",
|
| 129 |
+
marker_color='steelblue',
|
| 130 |
+
text=df["count"],
|
| 131 |
+
textposition='auto',
|
| 132 |
+
),
|
| 133 |
+
row=1, col=1
|
| 134 |
+
)
|
| 135 |
+
|
| 136 |
+
# Average time
|
| 137 |
+
fig.add_trace(
|
| 138 |
+
go.Bar(
|
| 139 |
+
x=df["method_name"],
|
| 140 |
+
y=df["avg_time"].round(1),
|
| 141 |
+
name="Avg Time",
|
| 142 |
+
marker_color='coral',
|
| 143 |
+
text=df["avg_time"].round(1),
|
| 144 |
+
textposition='auto',
|
| 145 |
+
),
|
| 146 |
+
row=1, col=2
|
| 147 |
+
)
|
| 148 |
+
|
| 149 |
+
fig.update_layout(
|
| 150 |
+
title_text=title,
|
| 151 |
+
height=400,
|
| 152 |
+
showlegend=False
|
| 153 |
+
)
|
| 154 |
+
|
| 155 |
+
return fig
|
| 156 |
+
|
| 157 |
+
|
| 158 |
+
def create_metrics_comparison(individual_stats: List[dict]) -> go.Figure:
|
| 159 |
+
"""Create comparison chart for multiple services."""
|
| 160 |
+
services = [s["summary"]["service_name"] for s in individual_stats]
|
| 161 |
+
requests = [s["summary"]["total_requests_after"] for s in individual_stats]
|
| 162 |
+
errors = [s["summary"]["errors"] for s in individual_stats]
|
| 163 |
+
avg_times = [s["summary"]["avg_time_ms"] for s in individual_stats]
|
| 164 |
+
|
| 165 |
+
fig = make_subplots(
|
| 166 |
+
rows=1, cols=3,
|
| 167 |
+
subplot_titles=("Processed Requests", "Errors", "Avg Response Time (ms)"),
|
| 168 |
+
specs=[[{"type": "bar"}, {"type": "bar"}, {"type": "bar"}]]
|
| 169 |
+
)
|
| 170 |
+
|
| 171 |
+
fig.add_trace(
|
| 172 |
+
go.Bar(x=services, y=requests, marker_color='lightblue', text=requests, textposition='auto'),
|
| 173 |
+
row=1, col=1
|
| 174 |
+
)
|
| 175 |
+
|
| 176 |
+
fig.add_trace(
|
| 177 |
+
go.Bar(x=services, y=errors, marker_color='salmon', text=errors, textposition='auto'),
|
| 178 |
+
row=1, col=2
|
| 179 |
+
)
|
| 180 |
+
|
| 181 |
+
fig.add_trace(
|
| 182 |
+
go.Bar(x=services, y=avg_times, marker_color='lightgreen', text=avg_times, textposition='auto'),
|
| 183 |
+
row=1, col=3
|
| 184 |
+
)
|
| 185 |
+
|
| 186 |
+
fig.update_layout(
|
| 187 |
+
title_text="Service Comparison",
|
| 188 |
+
height=400,
|
| 189 |
+
showlegend=False
|
| 190 |
+
)
|
| 191 |
+
|
| 192 |
+
return fig
|
| 193 |
+
|
| 194 |
+
|
| 195 |
+
def process_log_file(file_path: str, service_name: str = None) -> dict:
|
| 196 |
+
"""Process a single log file and return statistics."""
|
| 197 |
+
parser = IISLogParser(file_path)
|
| 198 |
+
if service_name:
|
| 199 |
+
parser.service_name = service_name
|
| 200 |
+
|
| 201 |
+
with st.spinner(f"Parsing {Path(file_path).name}..."):
|
| 202 |
+
df = parser.parse()
|
| 203 |
+
|
| 204 |
+
if df.height == 0:
|
| 205 |
+
st.error(f"No valid log entries found in {Path(file_path).name}")
|
| 206 |
+
return None
|
| 207 |
+
|
| 208 |
+
with st.spinner(f"Analyzing {parser.service_name}..."):
|
| 209 |
+
analyzer = LogAnalyzer(df, parser.service_name)
|
| 210 |
+
|
| 211 |
+
stats = {
|
| 212 |
+
"summary": analyzer.get_summary_stats(),
|
| 213 |
+
"top_methods": analyzer.get_top_methods(),
|
| 214 |
+
"error_breakdown": analyzer.get_error_breakdown(),
|
| 215 |
+
"errors_by_method": analyzer.get_errors_by_method(n=10),
|
| 216 |
+
"response_time_dist": analyzer.get_response_time_distribution(),
|
| 217 |
+
"analyzer": analyzer, # Keep reference for detailed error queries
|
| 218 |
+
}
|
| 219 |
+
|
| 220 |
+
return stats
|
| 221 |
+
|
| 222 |
+
|
| 223 |
+
def main():
|
| 224 |
+
st.title("📊 IIS Log Performance Analyzer")
|
| 225 |
+
st.markdown("High-performance analysis tool for large IIS log files (up to 1GB+)")
|
| 226 |
+
|
| 227 |
+
# Sidebar
|
| 228 |
+
st.sidebar.header("Configuration")
|
| 229 |
+
|
| 230 |
+
# File upload mode
|
| 231 |
+
upload_mode = st.sidebar.radio(
|
| 232 |
+
"Upload Mode",
|
| 233 |
+
["Single File", "Multiple Files"],
|
| 234 |
+
help="Analyze one or multiple log files"
|
| 235 |
+
)
|
| 236 |
+
|
| 237 |
+
# File uploader
|
| 238 |
+
if upload_mode == "Single File":
|
| 239 |
+
uploaded_files = st.sidebar.file_uploader(
|
| 240 |
+
"Upload IIS Log File",
|
| 241 |
+
type=["log", "txt"],
|
| 242 |
+
help="Upload IIS W3C Extended format log file"
|
| 243 |
+
)
|
| 244 |
+
uploaded_files = [uploaded_files] if uploaded_files else []
|
| 245 |
+
else:
|
| 246 |
+
uploaded_files = st.sidebar.file_uploader(
|
| 247 |
+
"Upload IIS Log Files",
|
| 248 |
+
type=["log", "txt"],
|
| 249 |
+
accept_multiple_files=True,
|
| 250 |
+
help="Upload multiple IIS log files for comparison"
|
| 251 |
+
)
|
| 252 |
+
|
| 253 |
+
# Analysis options
|
| 254 |
+
st.sidebar.header("Analysis Options")
|
| 255 |
+
show_top_n = st.sidebar.slider("Top N Methods", 3, 20, 5)
|
| 256 |
+
slow_threshold = st.sidebar.number_input(
|
| 257 |
+
"Slow Request Threshold (ms)",
|
| 258 |
+
min_value=100,
|
| 259 |
+
max_value=10000,
|
| 260 |
+
value=3000,
|
| 261 |
+
step=100
|
| 262 |
+
)
|
| 263 |
+
|
| 264 |
+
# Process files
|
| 265 |
+
if uploaded_files:
|
| 266 |
+
st.info(f"Processing {len(uploaded_files)} file(s)...")
|
| 267 |
+
|
| 268 |
+
# Save uploaded files to temp directory
|
| 269 |
+
temp_files = []
|
| 270 |
+
for uploaded_file in uploaded_files:
|
| 271 |
+
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
|
| 272 |
+
tmp.write(uploaded_file.getvalue())
|
| 273 |
+
temp_files.append(tmp.name)
|
| 274 |
+
|
| 275 |
+
start_time = time.time()
|
| 276 |
+
|
| 277 |
+
# Process each file
|
| 278 |
+
all_stats = []
|
| 279 |
+
for i, temp_file in enumerate(temp_files):
|
| 280 |
+
file_name = uploaded_files[i].name
|
| 281 |
+
st.subheader(f"📄 {file_name}")
|
| 282 |
+
|
| 283 |
+
stats = process_log_file(temp_file, None)
|
| 284 |
+
if stats:
|
| 285 |
+
all_stats.append(stats)
|
| 286 |
+
|
| 287 |
+
# Display summary metrics
|
| 288 |
+
col1, col2, col3, col4 = st.columns(4)
|
| 289 |
+
with col1:
|
| 290 |
+
st.metric(
|
| 291 |
+
"Total Requests",
|
| 292 |
+
format_number(stats["summary"]["total_requests_after"])
|
| 293 |
+
)
|
| 294 |
+
with col2:
|
| 295 |
+
st.metric(
|
| 296 |
+
"Errors",
|
| 297 |
+
format_number(stats["summary"]["errors"]),
|
| 298 |
+
delta=None,
|
| 299 |
+
delta_color="inverse"
|
| 300 |
+
)
|
| 301 |
+
with col3:
|
| 302 |
+
st.metric(
|
| 303 |
+
"Avg Time (ms)",
|
| 304 |
+
format_number(stats["summary"]["avg_time_ms"])
|
| 305 |
+
)
|
| 306 |
+
with col4:
|
| 307 |
+
st.metric(
|
| 308 |
+
"Peak RPS",
|
| 309 |
+
format_number(stats["summary"]["peak_rps"])
|
| 310 |
+
)
|
| 311 |
+
|
| 312 |
+
# Tabs for detailed analysis
|
| 313 |
+
tab1, tab2, tab3, tab4, tab5 = st.tabs([
|
| 314 |
+
"Summary", "Top Methods", "Response Time", "Error Breakdown", "Errors by Method"
|
| 315 |
+
])
|
| 316 |
+
|
| 317 |
+
with tab1:
|
| 318 |
+
st.dataframe(
|
| 319 |
+
create_summary_table(stats["summary"]),
|
| 320 |
+
hide_index=True,
|
| 321 |
+
use_container_width=True
|
| 322 |
+
)
|
| 323 |
+
|
| 324 |
+
with tab2:
|
| 325 |
+
if stats["top_methods"]:
|
| 326 |
+
st.plotly_chart(
|
| 327 |
+
create_top_methods_chart(
|
| 328 |
+
stats["top_methods"][:show_top_n],
|
| 329 |
+
f"Top {show_top_n} Methods - {stats['summary']['service_name']}"
|
| 330 |
+
),
|
| 331 |
+
use_container_width=True
|
| 332 |
+
)
|
| 333 |
+
|
| 334 |
+
# Show table
|
| 335 |
+
methods_df = pd.DataFrame(stats["top_methods"][:show_top_n])
|
| 336 |
+
methods_df["avg_time"] = methods_df["avg_time"].round(1)
|
| 337 |
+
st.dataframe(methods_df, hide_index=True, use_container_width=True)
|
| 338 |
+
else:
|
| 339 |
+
st.info("No method data available")
|
| 340 |
+
|
| 341 |
+
with tab3:
|
| 342 |
+
if stats["response_time_dist"]:
|
| 343 |
+
st.plotly_chart(
|
| 344 |
+
create_response_time_chart(
|
| 345 |
+
stats["response_time_dist"],
|
| 346 |
+
f"Response Time Distribution - {stats['summary']['service_name']}"
|
| 347 |
+
),
|
| 348 |
+
use_container_width=True
|
| 349 |
+
)
|
| 350 |
+
else:
|
| 351 |
+
st.info("No response time distribution data")
|
| 352 |
+
|
| 353 |
+
with tab4:
|
| 354 |
+
if stats["error_breakdown"]:
|
| 355 |
+
error_df = pd.DataFrame(stats["error_breakdown"])
|
| 356 |
+
error_df.columns = ["Status Code", "Count"]
|
| 357 |
+
st.dataframe(error_df, hide_index=True, use_container_width=True)
|
| 358 |
+
|
| 359 |
+
# Pie chart
|
| 360 |
+
fig = px.pie(
|
| 361 |
+
error_df,
|
| 362 |
+
values="Count",
|
| 363 |
+
names="Status Code",
|
| 364 |
+
title=f"Error Distribution - {stats['summary']['service_name']}"
|
| 365 |
+
)
|
| 366 |
+
st.plotly_chart(fig, use_container_width=True)
|
| 367 |
+
else:
|
| 368 |
+
st.success("No errors found! ✓")
|
| 369 |
+
|
| 370 |
+
with tab5:
|
| 371 |
+
st.markdown("### 🔍 Errors by Method")
|
| 372 |
+
st.markdown("This view shows which specific methods are causing errors, with full context for debugging.")
|
| 373 |
+
|
| 374 |
+
if stats["errors_by_method"]:
|
| 375 |
+
# Display summary table
|
| 376 |
+
errors_method_df = pd.DataFrame(stats["errors_by_method"])
|
| 377 |
+
errors_method_df["error_rate_percent"] = errors_method_df["error_rate_percent"].round(2)
|
| 378 |
+
errors_method_df["avg_response_time_ms"] = errors_method_df["avg_response_time_ms"].round(1)
|
| 379 |
+
|
| 380 |
+
# Rename columns for better display
|
| 381 |
+
errors_method_df.columns = [
|
| 382 |
+
"Method Path", "Total Calls", "Error Count",
|
| 383 |
+
"Most Common Error", "Avg Response Time (ms)", "Error Rate (%)"
|
| 384 |
+
]
|
| 385 |
+
|
| 386 |
+
st.dataframe(errors_method_df, hide_index=True, use_container_width=True)
|
| 387 |
+
|
| 388 |
+
# Bar chart of top error-prone methods
|
| 389 |
+
fig = go.Figure()
|
| 390 |
+
fig.add_trace(go.Bar(
|
| 391 |
+
x=errors_method_df["Method Path"],
|
| 392 |
+
y=errors_method_df["Error Count"],
|
| 393 |
+
marker_color='red',
|
| 394 |
+
text=errors_method_df["Error Count"],
|
| 395 |
+
textposition='auto',
|
| 396 |
+
name="Error Count"
|
| 397 |
+
))
|
| 398 |
+
|
| 399 |
+
fig.update_layout(
|
| 400 |
+
title=f"Top Error-Prone Methods - {stats['summary']['service_name']}",
|
| 401 |
+
xaxis_title="Method Path",
|
| 402 |
+
yaxis_title="Error Count",
|
| 403 |
+
height=400,
|
| 404 |
+
showlegend=False
|
| 405 |
+
)
|
| 406 |
+
st.plotly_chart(fig, use_container_width=True)
|
| 407 |
+
|
| 408 |
+
# Allow users to drill down into specific methods
|
| 409 |
+
st.markdown("#### 🔎 Detailed Error Logs")
|
| 410 |
+
selected_method = st.selectbox(
|
| 411 |
+
"Select a method to view detailed error logs:",
|
| 412 |
+
options=["All"] + errors_method_df["Method Path"].tolist(),
|
| 413 |
+
key=f"method_select_{file_name}"
|
| 414 |
+
)
|
| 415 |
+
|
| 416 |
+
if selected_method and selected_method != "All":
|
| 417 |
+
error_details = stats["analyzer"].get_error_details(
|
| 418 |
+
method_path=selected_method,
|
| 419 |
+
limit=50
|
| 420 |
+
)
|
| 421 |
+
if error_details:
|
| 422 |
+
details_df = pd.DataFrame(error_details)
|
| 423 |
+
st.dataframe(details_df, hide_index=True, use_container_width=True)
|
| 424 |
+
st.info(f"Showing up to 50 most recent errors for {selected_method}")
|
| 425 |
+
else:
|
| 426 |
+
st.info(f"No error details found for {selected_method}")
|
| 427 |
+
elif selected_method == "All":
|
| 428 |
+
error_details = stats["analyzer"].get_error_details(limit=50)
|
| 429 |
+
if error_details:
|
| 430 |
+
details_df = pd.DataFrame(error_details)
|
| 431 |
+
st.dataframe(details_df, hide_index=True, use_container_width=True)
|
| 432 |
+
st.info("Showing up to 50 most recent errors across all methods")
|
        else:
            st.success("No errors found in any methods! ✓")

        st.divider()

        # Multi-file comparison
        if len(all_stats) > 1:
            st.header("📊 Service Comparison")
            st.plotly_chart(
                create_metrics_comparison(all_stats),
                use_container_width=True
            )

            # Combined summary
            st.subheader("Combined Statistics")
            combined = {
                "total_requests_before": sum(s["summary"]["total_requests_before"] for s in all_stats),
                "excluded_requests": sum(s["summary"]["excluded_requests"] for s in all_stats),
                "total_requests_after": sum(s["summary"]["total_requests_after"] for s in all_stats),
                "errors": sum(s["summary"]["errors"] for s in all_stats),
                "slow_requests": sum(s["summary"]["slow_requests"] for s in all_stats),
            }

            col1, col2, col3 = st.columns(3)
            with col1:
                st.metric("Total Requests (All Services)", format_number(combined["total_requests_after"]))
            with col2:
                st.metric("Total Errors (All Services)", format_number(combined["errors"]))
            with col3:
                st.metric("Total Slow Requests (All Services)", format_number(combined["slow_requests"]))

        processing_time = time.time() - start_time
        st.success(f"✓ Analysis completed in {processing_time:.2f} seconds")

        # Clean up temp files
        for temp_file in temp_files:
            Path(temp_file).unlink(missing_ok=True)

    else:
        # Welcome screen
        st.info("👆 Upload one or more IIS log files to begin analysis")

        st.markdown("""
        ### Features
        - ⚡ **Fast processing** of large files (200MB-1GB+) using Polars
        - 📊 **Comprehensive metrics**: RPS, response times, error rates
        - 🔍 **Detailed analysis**: Top methods, error breakdown, time distribution
        - 📈 **Visual reports**: Interactive charts with Plotly
        - 🔄 **Multi-file support**: Compare multiple services side-by-side

        ### Log Format
        This tool supports **IIS W3C Extended Log Format** with the following fields:
        ```
        date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username
        c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
        ```

        ### Filtering Rules
        - Excludes lines with both `HEAD` method and `Zabbix` in User-Agent
        - 401 Unauthorized responses are excluded from error counts
        - Errors are defined as status codes ≠ 200 and ≠ 401
        - Slow requests are those with response time > 3000ms (configurable)
        """)


if __name__ == "__main__":
    main()
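The W3C field list shown in the welcome text above is exactly what the parser splits on: whitespace-separated tokens, one per field, with `#`-prefixed header lines skipped. A minimal stdlib-only sketch of that split (the sample log line below is invented for illustration):

```python
from typing import Optional

# Field names mirror the W3C format described above.
FIELDS = [
    "date", "time", "s_ip", "cs_method", "cs_uri_stem", "cs_uri_query",
    "s_port", "cs_username", "c_ip", "cs_user_agent", "cs_referer",
    "sc_status", "sc_substatus", "sc_win32_status", "time_taken",
]

def parse_line(line: str) -> Optional[dict]:
    """Return a field dict, or None for W3C comment lines and malformed rows."""
    if line.startswith("#"):           # directive/header lines (#Fields:, #Date:, ...)
        return None
    parts = line.split()
    if len(parts) != len(FIELDS):      # drop truncated rows
        return None
    return dict(zip(FIELDS, parts))

# Invented sample line with 15 tokens, matching the field list
sample = ("2025-09-22 10:15:03 10.0.0.1 GET /CustomerOfficeService/Contact/Get - "
          "443 - 10.0.0.9 Mozilla/5.0 - 200 0 0 42")
row = parse_line(sample)
print(row["cs_uri_stem"], row["sc_status"], row["time_taken"])
# → /CustomerOfficeService/Contact/Get 200 42
```

`log_parser.py` below applies the same idea vectorized in Polars: rows whose token count does not match the field list are simply dropped, which is how truncated lines are handled.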
log_parser.py
ADDED
"""
IIS Log Parser using Polars for high-performance processing.
Handles large log files (200MB-1GB+) efficiently with streaming.
"""

import polars as pl
from pathlib import Path
from typing import Dict, List, Tuple, Optional
from datetime import datetime
import re


class IISLogParser:
    """Parser for IIS W3C Extended Log Format."""

    # IIS log column names
    COLUMNS = [
        "date", "time", "s_ip", "cs_method", "cs_uri_stem", "cs_uri_query",
        "s_port", "cs_username", "c_ip", "cs_user_agent", "cs_referer",
        "sc_status", "sc_substatus", "sc_win32_status", "time_taken"
    ]

    def __init__(self, file_path: str):
        self.file_path = Path(file_path)
        self.service_name = None  # Will be determined from URI paths during parsing

    def parse(self, chunk_size: Optional[int] = None) -> pl.DataFrame:
        """
        Parse IIS log file.

        Args:
            chunk_size: If provided, process in chunks (for very large files)

        Returns:
            Polars DataFrame with parsed log data
        """
        # Read file, skip comment lines
        with open(self.file_path, 'r', encoding='utf-8', errors='ignore') as f:
            lines = []
            for line in f:
                # Skip header/comment lines starting with #
                if not line.startswith('#'):
                    lines.append(line.strip())

        # Create DataFrame from lines
        if not lines:
            return pl.DataFrame()

        # Split each line by space and create DataFrame
        data = [line.split() for line in lines if line]

        # Filter out lines that don't have the correct number of columns
        data = [row for row in data if len(row) == len(self.COLUMNS)]

        if not data:
            return pl.DataFrame()

        df = pl.DataFrame(data, schema=self.COLUMNS, orient="row")

        # Convert data types
        df = df.with_columns([
            pl.col("date").cast(pl.Utf8),
            pl.col("time").cast(pl.Utf8),
            pl.col("sc_status").cast(pl.Int32),
            pl.col("sc_substatus").cast(pl.Int32),
            pl.col("sc_win32_status").cast(pl.Int32),
            pl.col("time_taken").cast(pl.Int32),
        ])

        # Create timestamp column
        df = df.with_columns([
            (pl.col("date") + " " + pl.col("time")).alias("timestamp")
        ])

        # Convert timestamp to datetime
        df = df.with_columns([
            pl.col("timestamp").str.strptime(pl.Datetime, format="%Y-%m-%d %H:%M:%S")
        ])

        # Extract service name and method name from URI
        df = df.with_columns([
            self._extract_service_name().alias("service_name"),
            self._extract_method_name().alias("method_name"),
            self._extract_full_method_path().alias("full_method_path")
        ])

        # Determine the primary service name for this log file
        if df.height > 0:
            # Get the most common service name
            service_counts = df.group_by("service_name").agg([
                pl.count().alias("count")
            ]).sort("count", descending=True)

            if service_counts.height > 0:
                self.service_name = service_counts.row(0, named=True)["service_name"]
            else:
                self.service_name = "Unknown"
        else:
            self.service_name = "Unknown"

        return df

    def _extract_service_name(self) -> pl.Expr:
        """Extract service name from URI stem (e.g., AdministratorOfficeService, CustomerOfficeService)."""
        # Extract the first meaningful part after the leading slash
        # Example: /AdministratorOfficeService/Contact/Get -> AdministratorOfficeService
        return (
            pl.col("cs_uri_stem")
            .str.split("/")
            .list.get(1)  # Get first element after leading /
            .fill_null("Unknown")
        )

    def _extract_full_method_path(self) -> pl.Expr:
        """Extract full method path for better error tracking (e.g., Contact/Get, Order/Create)."""
        # Extract everything after the service name
        # Example: /AdministratorOfficeService/Contact/Get -> Contact/Get
        return (
            pl.col("cs_uri_stem")
            .str.split("/")
            .list.slice(2)  # Skip leading / and service name
            .list.join("/")
            .fill_null("Unknown")
        )

    def _extract_method_name(self) -> pl.Expr:
        """Extract method name from URI stem."""
        # Extract last part of URI path (e.g., /Service/Contact/Get -> Get)
        return pl.col("cs_uri_stem").str.split("/").list.last().fill_null("Unknown")


class LogAnalyzer:
    """Analyze parsed IIS logs and generate performance metrics."""

    def __init__(self, df: pl.DataFrame, service_name: str = "Unknown"):
        self.df = df
        self.service_name = service_name
        self._filtered_df = None

    def filter_logs(self) -> pl.DataFrame:
        """
        Apply filtering rules:
        1. Exclude lines with both HEAD and Zabbix
        2. Exclude 401 status codes (for error counting)

        Returns:
            Filtered DataFrame
        """
        if self._filtered_df is not None:
            return self._filtered_df

        # Filter out HEAD + Zabbix
        filtered = self.df.filter(
            ~(
                (pl.col("cs_method") == "HEAD") &
                (
                    pl.col("cs_user_agent").str.contains("Zabbix") |
                    pl.col("cs_uri_stem").str.contains("Zabbix")
                )
            )
        )

        self._filtered_df = filtered
        return filtered

    def get_summary_stats(self) -> Dict:
        """Get overall summary statistics."""
        df = self.filter_logs()

        # Count requests
        total_before = self.df.height
        total_after = df.height
        excluded = total_before - total_after

        # Count 401s separately
        count_401 = self.df.filter(pl.col("sc_status") == 401).height

        # Count errors (status != 200 and != 401)
        errors = df.filter(
            (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
        ).height

        # Count slow requests (>3000ms)
        slow_requests = df.filter(pl.col("time_taken") > 3000).height

        # Response time statistics
        time_stats = df.select([
            pl.col("time_taken").min().alias("min_time"),
            pl.col("time_taken").max().alias("max_time"),
            pl.col("time_taken").mean().alias("avg_time"),
        ]).to_dicts()[0]

        # Peak RPS
        rps_data = self._calculate_peak_rps(df)

        return {
            "service_name": self.service_name,
            "total_requests_before": total_before,
            "excluded_requests": excluded,
            "excluded_401": count_401,
            "total_requests_after": total_after,
            "errors": errors,
            "slow_requests": slow_requests,
            "min_time_ms": int(time_stats["min_time"]) if time_stats["min_time"] else 0,
            "max_time_ms": int(time_stats["max_time"]) if time_stats["max_time"] else 0,
            "avg_time_ms": int(time_stats["avg_time"]) if time_stats["avg_time"] else 0,
            "peak_rps": rps_data["peak_rps"],
            "peak_timestamp": rps_data["peak_timestamp"],
        }

    def _calculate_peak_rps(self, df: pl.DataFrame) -> Dict:
        """Calculate peak requests per second."""
        if df.height == 0:
            return {"peak_rps": 0, "peak_timestamp": None}

        # Group by second and count requests
        rps = df.group_by("timestamp").agg([
            pl.count().alias("count")
        ]).sort("count", descending=True)

        if rps.height == 0:
            return {"peak_rps": 0, "peak_timestamp": None}

        peak_row = rps.row(0, named=True)

        return {
            "peak_rps": peak_row["count"],
            "peak_timestamp": str(peak_row["timestamp"])
        }

    def get_top_methods(self, n: int = 5) -> List[Dict]:
        """Get top N methods by request count."""
        df = self.filter_logs()

        if df.height == 0:
            return []

        # Group by method name
        method_stats = df.group_by("method_name").agg([
            pl.count().alias("count"),
            pl.col("time_taken").mean().alias("avg_time"),
            pl.col("sc_status").filter(
                (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
            ).count().alias("errors")
        ]).sort("count", descending=True).limit(n)

        return method_stats.to_dicts()

    def get_error_breakdown(self) -> List[Dict]:
        """Get breakdown of errors by status code."""
        df = self.filter_logs()

        errors = df.filter(
            (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
        )

        if errors.height == 0:
            return []

        error_stats = errors.group_by("sc_status").agg([
            pl.count().alias("count")
        ]).sort("count", descending=True)

        return error_stats.to_dicts()

    def get_errors_by_method(self, n: int = 10) -> List[Dict]:
        """
        Get detailed error breakdown by method with full context.
        Shows which methods are causing the most errors.

        Args:
            n: Number of top error-prone methods to return

        Returns:
            List of dicts with method, error count, total calls, and error rate
        """
        df = self.filter_logs()

        if df.height == 0:
            return []

        # Get error counts and total counts per full method path
        method_errors = df.group_by("full_method_path").agg([
            pl.count().alias("total_calls"),
            pl.col("sc_status").filter(
                (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
            ).count().alias("error_count"),
            pl.col("sc_status").filter(
                (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
            ).first().alias("most_common_error_status"),
            pl.col("time_taken").mean().alias("avg_response_time_ms"),
        ]).filter(
            pl.col("error_count") > 0
        ).with_columns([
            (pl.col("error_count") * 100.0 / pl.col("total_calls")).alias("error_rate_percent")
        ]).sort("error_count", descending=True).limit(n)

        return method_errors.to_dicts()

    def get_error_details(self, method_path: str = None, limit: int = 100) -> List[Dict]:
        """
        Get detailed error logs with full context for debugging.

        Args:
            method_path: Optional filter for a specific method path
            limit: Maximum number of error records to return

        Returns:
            List of error records with timestamp, method, status, response time, etc.
        """
        df = self.filter_logs()

        # Filter for errors only
        errors = df.filter(
            (pl.col("sc_status") != 200) & (pl.col("sc_status") != 401)
        )

        # Apply method filter if specified
        if method_path:
            errors = errors.filter(pl.col("full_method_path") == method_path)

        if errors.height == 0:
            return []

        # Select relevant columns for debugging
        error_details = errors.select([
            "timestamp",
            "service_name",
            "full_method_path",
            "method_name",
            "sc_status",
            "sc_substatus",
            "sc_win32_status",
            "time_taken",
            "c_ip",
            "cs_uri_query"
        ]).sort("timestamp", descending=True).limit(limit)

        return error_details.to_dicts()

    def get_response_time_distribution(self, buckets: List[int] = None) -> Dict:
        """Get response time distribution by buckets."""
        if buckets is None:
            buckets = [0, 50, 100, 200, 500, 1000, 3000, 10000]

        df = self.filter_logs()

        if df.height == 0:
            return {}

        distribution = {}
        for i in range(len(buckets) - 1):
            lower = buckets[i]
            upper = buckets[i + 1]
            count = df.filter(
                (pl.col("time_taken") >= lower) & (pl.col("time_taken") < upper)
            ).height
            distribution[f"{lower}-{upper}ms"] = count

        # Add bucket for values above the last threshold
        count = df.filter(pl.col("time_taken") >= buckets[-1]).height
        distribution[f">{buckets[-1]}ms"] = count

        return distribution
    def get_rps_timeline(self, interval: str = "1m") -> pl.DataFrame:
        """Get RPS over time with the specified interval."""
        df = self.filter_logs()

        if df.height == 0:
            return pl.DataFrame()

        # Group by time interval (group_by_dynamic expects chronological order,
        # so sort by timestamp first)
        timeline = df.sort("timestamp").group_by_dynamic("timestamp", every=interval).agg([
            pl.count().alias("requests")
        ]).sort("timestamp")

        return timeline


def analyze_multiple_logs(log_files: List[str]) -> Tuple[Dict, List[Dict]]:
    """
    Analyze multiple log files and generate a combined report.

    Args:
        log_files: List of log file paths

    Returns:
        Tuple of (combined_stats, individual_stats)
    """
    individual_stats = []

    for log_file in log_files:
        parser = IISLogParser(log_file)
        df = parser.parse()
        analyzer = LogAnalyzer(df, parser.service_name)

        stats = {
            "summary": analyzer.get_summary_stats(),
            "top_methods": analyzer.get_top_methods(),
            "error_breakdown": analyzer.get_error_breakdown(),
            "errors_by_method": analyzer.get_errors_by_method(n=10),
            "response_time_dist": analyzer.get_response_time_distribution(),
            "analyzer": analyzer,
        }

        individual_stats.append(stats)

    # Calculate combined statistics
    combined = {
        "total_requests_before": sum(s["summary"]["total_requests_before"] for s in individual_stats),
        "excluded_requests": sum(s["summary"]["excluded_requests"] for s in individual_stats),
        "excluded_401": sum(s["summary"]["excluded_401"] for s in individual_stats),
        "total_requests_after": sum(s["summary"]["total_requests_after"] for s in individual_stats),
        "errors": sum(s["summary"]["errors"] for s in individual_stats),
        "slow_requests": sum(s["summary"]["slow_requests"] for s in individual_stats),
    }

    return combined, individual_stats
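The filtering and peak-RPS rules the analyzer implements in Polars can be checked against a few rows in plain Python. A stdlib-only sketch (the sample rows are invented): drop HEAD+Zabbix monitoring probes, count errors as any status outside {200, 401}, and take the busiest one-second bucket as peak RPS.

```python
from collections import Counter

# Invented sample rows: (timestamp_second, method, user_agent, status)
rows = [
    ("10:00:01", "GET",  "Mozilla/5.0", 200),
    ("10:00:01", "GET",  "Mozilla/5.0", 500),
    ("10:00:01", "HEAD", "Zabbix",      200),  # monitoring probe -> excluded
    ("10:00:02", "POST", "Mozilla/5.0", 401),  # 401 -> kept, but not counted as an error
]

# Rule 1: exclude HEAD requests coming from Zabbix monitoring
kept = [r for r in rows if not (r[1] == "HEAD" and "Zabbix" in r[2])]

# Rule 2: errors are any status other than 200 and 401
errors = [r for r in kept if r[3] not in (200, 401)]

# Peak RPS: busiest one-second bucket among the kept rows
per_second = Counter(r[0] for r in kept)
peak_second, peak_rps = per_second.most_common(1)[0]

print(len(kept), len(errors), peak_second, peak_rps)
# → 3 1 10:00:01 2
```

`filter_logs`, `get_summary_stats`, and `_calculate_peak_rps` above express the same logic as vectorized Polars expressions rather than Python loops.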
requirements.txt
ADDED
# Core dependencies
streamlit>=1.28.0
polars>=0.19.0
plotly>=5.17.0
pandas>=2.0.0

# Optional performance improvements
pyarrow>=13.0.0
run.sh
ADDED
#!/bin/bash
# Launch script for IIS Log Analyzer

echo "🚀 Starting IIS Log Analyzer..."
echo ""

# Check if dependencies are installed
if ! python -c "import streamlit" 2>/dev/null; then
    echo "📦 Installing dependencies..."
    pip install -r requirements.txt
    echo ""
fi

# Launch Streamlit app
echo "✓ Launching web application..."
echo "  URL: http://localhost:8501"
echo "  Press Ctrl+C to stop"
echo ""

streamlit run app.py --server.maxUploadSize=1024
test_parser.py
ADDED
"""
Test script for the IIS log parser.
"""

from log_parser import IISLogParser, LogAnalyzer
import time


def test_log_file(file_path: str):
    """Test parsing a single log file."""
    print(f"\n{'='*80}")
    print(f"Testing: {file_path}")
    print(f"{'='*80}")

    start_time = time.time()

    # Parse
    parser = IISLogParser(file_path)
    df = parser.parse()
    parse_time = time.time() - start_time
    print(f"Service Name: {parser.service_name}")
    print(f"✓ Parsed {df.height:,} log entries in {parse_time:.2f}s")

    # Analyze
    analyzer = LogAnalyzer(df, parser.service_name)
    stats = analyzer.get_summary_stats()

    analyze_time = time.time() - start_time - parse_time
    print(f"✓ Analyzed in {analyze_time:.2f}s")

    # Display summary
    print(f"\n📊 Summary Statistics:")
    print(f"  Total Requests (before): {stats['total_requests_before']:,}")
    print(f"  Excluded Requests: {stats['excluded_requests']:,}")
    print(f"  Total Requests (after): {stats['total_requests_after']:,}")
    print(f"  Errors (≠200,≠401): {stats['errors']:,}")
    print(f"  Slow Requests (>3s): {stats['slow_requests']:,}")
    print(f"  Peak RPS: {stats['peak_rps']:,} @ {stats['peak_timestamp']}")
    print(f"  Avg Response Time: {stats['avg_time_ms']:,}ms")
    print(f"  Max Response Time: {stats['max_time_ms']:,}ms")
    print(f"  Min Response Time: {stats['min_time_ms']:,}ms")

    # Top methods
    print(f"\n🔝 Top 5 Methods:")
    top_methods = analyzer.get_top_methods(5)
    for i, method in enumerate(top_methods, 1):
        print(f"  {i}. {method['method_name']}")
        print(f"     Count: {method['count']:,} | Avg Time: {method['avg_time']:.1f}ms | Errors: {method['errors']}")

    # Error breakdown
    errors = analyzer.get_error_breakdown()
    if errors:
        print(f"\n❌ Error Breakdown:")
        for error in errors:
            print(f"  Status {error['sc_status']}: {error['count']:,} occurrences")
    else:
        print(f"\n✓ No errors found!")

    # Errors by method
    errors_by_method = analyzer.get_errors_by_method(5)
    if errors_by_method:
        print(f"\n⚠️ Top 5 Error-Prone Methods:")
        for i, method_error in enumerate(errors_by_method, 1):
            print(f"  {i}. {method_error['full_method_path']}")
            print(f"     Total Calls: {method_error['total_calls']:,} | Errors: {method_error['error_count']:,} | "
                  f"Error Rate: {method_error['error_rate_percent']:.2f}% | "
                  f"Most Common Error: {method_error.get('most_common_error_status', 'N/A')}")
    else:
        print(f"\n✓ No method errors found!")

    # Response time distribution
    dist = analyzer.get_response_time_distribution()
    print(f"\n⏱️ Response Time Distribution:")
    for bucket, count in dist.items():
        print(f"  {bucket}: {count:,}")

    total_time = time.time() - start_time
    print(f"\n⏱️ Total processing time: {total_time:.2f}s")

    return stats


if __name__ == "__main__":
    import sys

    # Test with both log files
    files = [
        "administrator_rhr_ex250922.log",
        "customer_rhr_ex250922.log"
    ]

    all_stats = []
    total_start = time.time()

    for file_path in files:
        try:
            stats = test_log_file(file_path)
            all_stats.append(stats)
        except Exception as e:
            print(f"\n❌ Error processing {file_path}: {e}")
            import traceback
            traceback.print_exc()

    # Combined summary
    if len(all_stats) > 1:
        print(f"\n{'='*80}")
        print(f"COMBINED STATISTICS")
        print(f"{'='*80}")
        total_requests = sum(s['total_requests_after'] for s in all_stats)
        total_errors = sum(s['errors'] for s in all_stats)
        total_slow = sum(s['slow_requests'] for s in all_stats)

        print(f"Total Requests (all services): {total_requests:,}")
        print(f"Total Errors (all services): {total_errors:,}")
        print(f"Total Slow Requests (all services): {total_slow:,}")

    total_elapsed = time.time() - total_start
    print(f"\n⏱️ Total elapsed time: {total_elapsed:.2f}s")
    print(f"\n✓ All tests completed successfully!")