Commit · ddfcf3e
Parent(s): 002262c
Add Hugging Face Spaces deployment configuration
- Add Dockerfile for Docker-based deployment on HF Spaces
- Add .dockerignore to optimize Docker build
- Update README.md with HF Space metadata (YAML frontmatter)
- Keep README_GITHUB.md for detailed GitHub documentation
- Configure app to run on port 7860 (HF Spaces standard)
Ready for deployment to: https://huggingface.co/spaces/pilotstuki/odatalogparser
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- .dockerignore +42 -0
- Dockerfile +33 -0
- README.md +38 -182
- README_GITHUB.md +208 -0
.dockerignore
ADDED
@@ -0,0 +1,42 @@
+# Git
+.git
+.gitignore
+
+# Python cache
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+
+# Virtual environments
+env/
+venv/
+ENV/
+
+# Log files (sample data - users upload their own)
+*.log
+
+# PDF files (sample reports)
+*.pdf
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+.claude/
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Test files
+test_*.py
+*_test.py
+
+# Scripts
+run.sh
+
+# Documentation (GitHub-specific)
+README_GITHUB.md
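Reviewer note: the `.dockerignore` patterns can be sanity-checked against the files the build actually needs. The Python sketch below is only an approximation. Docker, as far as I understand it, matches `.dockerignore` patterns with Go's `filepath.Match` rules relative to the context root, while this sketch applies Python's `fnmatch` to top-level names only. The helper `is_ignored` and the sample file names are hypothetical and not part of the commit.

```python
from fnmatch import fnmatchcase

# Patterns copied from the .dockerignore in this commit
# (comments and blank lines dropped).
PATTERNS = [
    ".git", ".gitignore", "__pycache__/", "*.py[cod]", "*$py.class", "*.so",
    "env/", "venv/", "ENV/", "*.log", "*.pdf", ".vscode/", ".idea/",
    "*.swp", "*.swo", "*~", ".claude/", ".DS_Store", "Thumbs.db",
    "test_*.py", "*_test.py", "run.sh", "README_GITHUB.md",
]


def is_ignored(name: str) -> bool:
    """Approximate check: does a top-level entry name match any pattern?"""
    # A trailing slash marks a directory pattern; strip it for name matching.
    return any(fnmatchcase(name, pattern.rstrip("/")) for pattern in PATTERNS)


# Hypothetical listing of the repository root.
sample = ["app.py", "log_parser.py", "requirements.txt", "test_parser.py",
          "sample.log", "README_GITHUB.md", "README.md"]
kept = [name for name in sample if not is_ignored(name)]
print(kept)
```

Under these assumptions, `app.py`, `log_parser.py`, and `requirements.txt` stay in the build context, so the `COPY` instructions in the Dockerfile can still find them.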
Dockerfile
ADDED
@@ -0,0 +1,33 @@
+# Dockerfile for Hugging Face Spaces - Streamlit App
+FROM python:3.10-slim
+
+# Set working directory
+WORKDIR /app
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy requirements file
+COPY requirements.txt .
+
+# Install Python dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application files
+COPY app.py .
+COPY log_parser.py .
+
+# Expose port 7860 (Hugging Face Spaces default)
+EXPOSE 7860
+
+# Set environment variables for Streamlit
+ENV STREAMLIT_SERVER_PORT=7860
+ENV STREAMLIT_SERVER_ADDRESS=0.0.0.0
+ENV STREAMLIT_SERVER_HEADLESS=true
+ENV STREAMLIT_BROWSER_GATHER_USAGE_STATS=false
+
+# Run the application
+CMD ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0"]
README.md
CHANGED
@@ -1,208 +1,64 @@
+---
+title: IIS Log Performance Analyzer
+emoji: 📊
+colorFrom: blue
+colorTo: purple
+sdk: docker
+pinned: false
+license: mit
+app_port: 7860
+---
+
 # IIS Log Performance Analyzer
 
 High-performance web application for analyzing large IIS log files (200MB-1GB+). Built with Streamlit and Polars for fast, efficient processing.
 
 **GitHub Repository**: [https://github.com/pilot-stuk/odata_log_parser](https://github.com/pilot-stuk/odata_log_parser)
 
-**Live Demo**: Deploy on [Streamlit Cloud](https://streamlit.io/cloud)
-
 ## Features
 
-- **Fast Processing**: Uses Polars library for 10-100x faster parsing compared to pandas
-- **Large File Support**: Efficiently handles files up to 1GB+
-- **Comprehensive Metrics**:
-  - Total requests (before/after filtering)
-  - Error rates and breakdown by status code
-  - Response time statistics (min/max/avg)
-  - Slow request detection (configurable threshold)
-  - Peak RPS (Requests Per Second) with timestamp
-  - Top methods by request count and response time
-- **Multi-File Analysis**: Upload and compare multiple log files side-by-side
-- **Interactive Visualizations**: Charts and graphs using Plotly
-- **Smart Filtering**: Automatically excludes monitoring requests (Zabbix HEAD) and 401 unauthorized
-
-## Requirements
-
-- Python 3.8+
-- See `requirements.txt` for package dependencies
-
-## Installation
-
-### Local Installation
-
-1. Clone the repository:
-```bash
-git clone https://github.com/pilot-stuk/odata_log_parser.git
-cd odata_log_parser
-```
-
-2. Install dependencies:
-```bash
-pip install -r requirements.txt
-```
-
-### Deploy to Streamlit Cloud
-
-1. Fork or clone this repository to your GitHub account
-2. Go to [share.streamlit.io](https://share.streamlit.io/)
-3. Sign in with your GitHub account
-4. Click "New app"
-5. Select your repository: `pilot-stuk/odata_log_parser`
-6. Set the main file path: `app.py`
-7. Click "Deploy"
-
-The app will be live at: `https://share.streamlit.io/pilot-stuki/odata_log_parser/main/app.py`
+- ⚡ **Fast Processing**: Uses Polars library for 10-100x faster parsing compared to pandas
+- 📦 **Large File Support**: Efficiently handles files up to 1GB+
+- 📊 **Comprehensive Metrics**: RPS, response times, error rates, and more
+- 🔍 **Detailed Analysis**: Top methods, error breakdown, time distribution
+- 📈 **Visual Reports**: Interactive charts with Plotly
+- 🔄 **Multi-file Support**: Compare multiple services side-by-side
 
-## Usage
+## How to Use
 
-### Run the Streamlit App
-
-```bash
-streamlit run app.py
-```
-
-The application will open in your browser at `http://localhost:8501`
-
-### Upload Log Files
-
-1. Click "Browse files" in the sidebar
-2. Select one or more IIS log files (.log or .txt)
-3. View the analysis results
-
-### Configuration Options
-
-- **Upload Mode**: Single or Multiple files
-- **Top N Methods**: Number of top methods to display (3-20)
-- **Slow Request Threshold**: Configure what constitutes a "slow" request (default: 3000ms)
+1. Upload one or more IIS log files (W3C Extended format)
+2. View comprehensive performance metrics
+3. Analyze errors, slow requests, and response time distribution
+4. Compare multiple services side-by-side
 
 ## Log Format
 
-This tool supports **IIS W3C Extended Log Format** with the following fields:
-
-```
-date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip
-cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
-```
-
-Example log line:
+Supports **IIS W3C Extended Log Format** with fields:
 ```
-2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24
+date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username
+c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
 ```
 
 ## Filtering Rules
 
-The analyzer applies the following filters automatically:
-
-1. **Monitoring Exclusion**: Lines containing both `HEAD` method and `Zabbix` are excluded
-2. **401 Handling**: 401 Unauthorized responses are excluded from error counts (considered authentication attempts, not system errors)
-3. **Error Definition**: Errors are HTTP status codes ≠ 200 and ≠ 401
+- Excludes monitoring requests (HEAD + Zabbix)
+- 401 Unauthorized responses excluded from error counts
+- Errors defined as status codes ≠ 200 and ≠ 401
+- Slow requests: response time > 3000ms (configurable)
 
-## Metrics Explained
+## Technology Stack
 
-| Metric | Description |
-|--------|-------------|
-| **Total Requests (before filtering)** | Raw number of log entries |
-| **Excluded Requests** | Lines filtered out (HEAD+Zabbix + 401) |
-| **Processed Requests** | Valid requests included in analysis |
-| **Errors** | Requests with status ≠ 200 and ≠ 401 |
-| **Slow Requests** | Requests exceeding threshold (default: 3000ms) |
-| **Peak RPS** | Maximum requests per second observed |
-| **Avg/Max/Min Response Time** | Response time statistics in milliseconds |
+- **Frontend**: Streamlit
+- **Data Processing**: Polars
+- **Visualizations**: Plotly
+- **Deployment**: Docker on Hugging Face Spaces
 
 ## Performance
 
-- **Small files** (<50MB): Process in seconds
-- **Medium files** (50-200MB): Process in 10-30 seconds
-- **Large files** (200MB-1GB): Process in 1-3 minutes
-
-Performance depends on:
-- File size
-- Number of log entries
-- System CPU and RAM
-- Disk I/O speed
-
-## Architecture
-
-```
-app.py # Streamlit UI application
-log_parser.py # Core parsing and analysis logic using Polars
-requirements.txt # Python dependencies
-README.md # This file
-```
-
-### Key Components
-
-- **IISLogParser**: Parses IIS W3C log format into Polars DataFrame
-- **LogAnalyzer**: Calculates metrics and statistics
-- **Streamlit UI**: Interactive web interface with visualizations
-
-## Use Cases
-
-- **Performance Analysis**: Identify slow endpoints and response time patterns
-- **Error Investigation**: Track error rates and problematic methods
-- **Capacity Planning**: Analyze peak load and RPS patterns
-- **Service Comparison**: Compare performance across multiple services
-- **Incident Review**: Analyze logs from specific time periods
-
-## Troubleshooting
-
-### Large File Upload Issues
-
-If Streamlit has trouble with very large files (>500MB):
-
-1. Increase Streamlit's upload size limit:
-```bash
-streamlit run app.py --server.maxUploadSize=1024
-```
-
-2. Or modify `.streamlit/config.toml`:
-```toml
-[server]
-maxUploadSize = 1024
-```
-
-### Memory Issues
-
-For files >1GB, you may need to:
-- Increase available system memory
-- Process files in smaller chunks
-- Use the CLI version (can be developed if needed)
-
-### Performance Tips
-
-- Close other memory-intensive applications
-- Process files one at a time for very large files
-- Use SSD for faster I/O
-- Ensure adequate RAM (8GB+ recommended for 1GB files)
-
-## Future Enhancements
-
-Potential features for future versions:
-- CLI tool for batch processing
-- Export results to PDF/Excel
-- Real-time log monitoring
-- Custom metric definitions
-- Time range filtering
-- IP address analysis
-- Session tracking
-
-## Example Output
-
-The application generates:
-
-1. **Summary Table**: Key metrics for each log file
-2. **Top Methods Chart**: Most frequently called endpoints
-3. **Response Time Distribution**: Histogram of response times
-4. **Error Breakdown**: Pie chart of error types
-5. **Service Comparison**: Side-by-side comparison for multiple files
+- Small files (<50MB): Process in seconds
+- Medium files (50-200MB): Process in 10-30 seconds
+- Large files (200MB-1GB): Process in 1-3 minutes
 
 ## License
 
-This tool is provided as-is for log analysis purposes.
-
-## Support
-
-For issues or questions:
-1. Check log file format matches IIS W3C Extended format
-2. Verify all required fields are present
-3. Ensure Python and dependencies are correctly installed
+MIT License - See GitHub repository for details
README_GITHUB.md
ADDED
@@ -0,0 +1,208 @@
+# IIS Log Performance Analyzer
+
+High-performance web application for analyzing large IIS log files (200MB-1GB+). Built with Streamlit and Polars for fast, efficient processing.
+
+**GitHub Repository**: [https://github.com/pilot-stuk/odata_log_parser](https://github.com/pilot-stuk/odata_log_parser)
+
+**Live Demo**: Deploy on [Streamlit Cloud](https://streamlit.io/cloud)
+
+## Features
+
+- **Fast Processing**: Uses Polars library for 10-100x faster parsing compared to pandas
+- **Large File Support**: Efficiently handles files up to 1GB+
+- **Comprehensive Metrics**:
+  - Total requests (before/after filtering)
+  - Error rates and breakdown by status code
+  - Response time statistics (min/max/avg)
+  - Slow request detection (configurable threshold)
+  - Peak RPS (Requests Per Second) with timestamp
+  - Top methods by request count and response time
+- **Multi-File Analysis**: Upload and compare multiple log files side-by-side
+- **Interactive Visualizations**: Charts and graphs using Plotly
+- **Smart Filtering**: Automatically excludes monitoring requests (Zabbix HEAD) and 401 unauthorized
+
+## Requirements
+
+- Python 3.8+
+- See `requirements.txt` for package dependencies
+
+## Installation
+
+### Local Installation
+
+1. Clone the repository:
+```bash
+git clone https://github.com/pilot-stuk/odata_log_parser.git
+cd odata_log_parser
+```
+
+2. Install dependencies:
+```bash
+pip install -r requirements.txt
+```
+
+### Deploy to Streamlit Cloud
+
+1. Fork or clone this repository to your GitHub account
+2. Go to [share.streamlit.io](https://share.streamlit.io/)
+3. Sign in with your GitHub account
+4. Click "New app"
+5. Select your repository: `pilot-stuk/odata_log_parser`
+6. Set the main file path: `app.py`
+7. Click "Deploy"
+
+The app will be live at: `https://share.streamlit.io/pilot-stuki/odata_log_parser/main/app.py`
+
+## Usage
+
+### Run the Streamlit App
+
+```bash
+streamlit run app.py
+```
+
+The application will open in your browser at `http://localhost:8501`
+
+### Upload Log Files
+
+1. Click "Browse files" in the sidebar
+2. Select one or more IIS log files (.log or .txt)
+3. View the analysis results
+
+### Configuration Options
+
+- **Upload Mode**: Single or Multiple files
+- **Top N Methods**: Number of top methods to display (3-20)
+- **Slow Request Threshold**: Configure what constitutes a "slow" request (default: 3000ms)
+
+## Log Format
+
+This tool supports **IIS W3C Extended Log Format** with the following fields:
+
+```
+date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip
+cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
+```
+
+Example log line:
+```
+2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24
+```
+
+## Filtering Rules
+
+The analyzer applies the following filters automatically:
+
+1. **Monitoring Exclusion**: Lines containing both `HEAD` method and `Zabbix` are excluded
+2. **401 Handling**: 401 Unauthorized responses are excluded from error counts (considered authentication attempts, not system errors)
+3. **Error Definition**: Errors are HTTP status codes ≠ 200 and ≠ 401
+
+## Metrics Explained
+
+| Metric | Description |
+|--------|-------------|
+| **Total Requests (before filtering)** | Raw number of log entries |
+| **Excluded Requests** | Lines filtered out (HEAD+Zabbix + 401) |
+| **Processed Requests** | Valid requests included in analysis |
+| **Errors** | Requests with status ≠ 200 and ≠ 401 |
+| **Slow Requests** | Requests exceeding threshold (default: 3000ms) |
+| **Peak RPS** | Maximum requests per second observed |
+| **Avg/Max/Min Response Time** | Response time statistics in milliseconds |
+
+## Performance
+
+- **Small files** (<50MB): Process in seconds
+- **Medium files** (50-200MB): Process in 10-30 seconds
+- **Large files** (200MB-1GB): Process in 1-3 minutes
+
+Performance depends on:
+- File size
+- Number of log entries
+- System CPU and RAM
+- Disk I/O speed
+
+## Architecture
+
+```
+app.py # Streamlit UI application
+log_parser.py # Core parsing and analysis logic using Polars
+requirements.txt # Python dependencies
+README.md # This file
+```
+
+### Key Components
+
+- **IISLogParser**: Parses IIS W3C log format into Polars DataFrame
+- **LogAnalyzer**: Calculates metrics and statistics
+- **Streamlit UI**: Interactive web interface with visualizations
+
+## Use Cases
+
+- **Performance Analysis**: Identify slow endpoints and response time patterns
+- **Error Investigation**: Track error rates and problematic methods
+- **Capacity Planning**: Analyze peak load and RPS patterns
+- **Service Comparison**: Compare performance across multiple services
+- **Incident Review**: Analyze logs from specific time periods
+
+## Troubleshooting
+
+### Large File Upload Issues
+
+If Streamlit has trouble with very large files (>500MB):
+
+1. Increase Streamlit's upload size limit:
+```bash
+streamlit run app.py --server.maxUploadSize=1024
+```
+
+2. Or modify `.streamlit/config.toml`:
+```toml
+[server]
+maxUploadSize = 1024
+```
+
+### Memory Issues
+
+For files >1GB, you may need to:
+- Increase available system memory
+- Process files in smaller chunks
+- Use the CLI version (can be developed if needed)
+
+### Performance Tips
+
+- Close other memory-intensive applications
+- Process files one at a time for very large files
+- Use SSD for faster I/O
+- Ensure adequate RAM (8GB+ recommended for 1GB files)
+
+## Future Enhancements
+
+Potential features for future versions:
+- CLI tool for batch processing
+- Export results to PDF/Excel
+- Real-time log monitoring
+- Custom metric definitions
+- Time range filtering
+- IP address analysis
+- Session tracking
+
+## Example Output
+
+The application generates:
+
+1. **Summary Table**: Key metrics for each log file
+2. **Top Methods Chart**: Most frequently called endpoints
+3. **Response Time Distribution**: Histogram of response times
+4. **Error Breakdown**: Pie chart of error types
+5. **Service Comparison**: Side-by-side comparison for multiple files
+
+## License
+
+This tool is provided as-is for log analysis purposes.
+
+## Support
+
+For issues or questions:
+1. Check log file format matches IIS W3C Extended format
+2. Verify all required fields are present
+3. Ensure Python and dependencies are correctly installed
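Reviewer note: the filtering rules and metrics that README_GITHUB.md describes can be illustrated with a short, standard-library-only Python sketch. It is not the project's `log_parser.py` (which, per the README, uses Polars); the helper names `parse_line` and `summarize` are hypothetical. The first sample line is the README's own example, and the other four are made up for illustration.

```python
from collections import Counter

# Field order as listed in the "Log Format" section of the README.
FIELDS = ("date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username "
          "c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status "
          "time-taken").split()


def parse_line(line):
    """Split one space-delimited log line into a dict, or None if it does not fit."""
    parts = line.split()
    if line.startswith("#") or len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def summarize(lines, slow_ms=3000):
    """Apply the README's filtering rules and count the headline metrics."""
    total = excluded = errors = slow = 0
    per_second = Counter()
    for line in lines:
        row = parse_line(line)
        if row is None:
            continue
        total += 1
        status = int(row["sc-status"])
        # Monitoring exclusion: HEAD method and "Zabbix" on the same line.
        is_monitoring = row["cs-method"] == "HEAD" and "Zabbix" in line
        if is_monitoring or status == 401:
            excluded += 1
            continue
        if status != 200:  # 401 was already excluded above
            errors += 1
        if int(row["time-taken"]) > slow_ms:
            slow += 1
        per_second[f"{row['date']} {row['time']}"] += 1
    peak_time, peak_rps = per_second.most_common(1)[0] if per_second else (None, 0)
    return {"total": total, "excluded": excluded, "processed": total - excluded,
            "errors": errors, "slow": slow,
            "peak_rps": peak_rps, "peak_time": peak_time}


sample_lines = [
    "2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24",
    "2025-09-22 00:00:46 10.21.31.42 HEAD / - 443 - 10.0.0.5 Zabbix - 200 0 0 3",
    "2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get - 443 - 10.0.0.6 - - 401 0 0 5",
    "2025-09-22 00:00:46 10.21.31.42 POST /Service/Order/Save - 443 - 10.0.0.7 - - 500 0 0 4200",
    "2025-09-22 00:00:47 10.21.31.42 GET /Service/Contact/Get - 443 - 10.0.0.7 - - 200 0 0 3100",
]
result = summarize(sample_lines)
print(result)
```

The sketch counts Peak RPS over processed requests only, which is an assumption; the README does not say whether excluded lines contribute to the peak.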