# IIS Log Performance Analyzer
High-performance web application for analyzing large IIS log files (200MB-1GB+). Built with Streamlit and Polars for fast, efficient processing.
**GitHub Repository**: [https://github.com/pilot-stuk/odata_log_parser](https://github.com/pilot-stuk/odata_log_parser)
**Live Demo**: Deployable on [Streamlit Cloud](https://streamlit.io/cloud)
## Features
- **Fast Processing**: Uses Polars library for 10-100x faster parsing compared to pandas
- **Large File Support**: Efficiently handles files up to 1GB+
- **Comprehensive Metrics**:
- Total requests (before/after filtering)
- Error rates and breakdown by status code
- Response time statistics (min/max/avg)
- Slow request detection (configurable threshold)
- Peak RPS (Requests Per Second) with timestamp
- Top methods by request count and response time
- **Multi-File Analysis**: Upload and compare multiple log files side-by-side
- **Interactive Visualizations**: Charts and graphs using Plotly
- **Smart Filtering**: Automatically excludes monitoring requests (Zabbix `HEAD` probes) and 401 Unauthorized responses
## Requirements
- Python 3.8+
- See `requirements.txt` for package dependencies
## Installation
### Local Installation
1. Clone the repository:
```bash
git clone https://github.com/pilot-stuk/odata_log_parser.git
cd odata_log_parser
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
### Deploy to Streamlit Cloud
1. Fork or clone this repository to your GitHub account
2. Go to [share.streamlit.io](https://share.streamlit.io/)
3. Sign in with your GitHub account
4. Click "New app"
5. Select your repository: `pilot-stuk/odata_log_parser`
6. Set the main file path: `app.py`
7. Click "Deploy"
The app will be live at a URL like: `https://share.streamlit.io/pilot-stuk/odata_log_parser/main/app.py`
## Usage
### Run the Streamlit App
```bash
streamlit run app.py
```
The application will open in your browser at `http://localhost:8501`
### Upload Log Files
1. Click "Browse files" in the sidebar
2. Select one or more IIS log files (.log or .txt)
3. View the analysis results
### Configuration Options
- **Upload Mode**: Single or Multiple files
- **Top N Methods**: Number of top methods to display (3-20)
- **Slow Request Threshold**: Configure what constitutes a "slow" request (default: 3000ms)
## Log Format
This tool supports **IIS W3C Extended Log Format** with the following fields:
```
date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip
cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
```
Example log line:
```
2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24
```
## Filtering Rules
The analyzer applies the following filters automatically:
1. **Monitoring Exclusion**: Lines containing both `HEAD` method and `Zabbix` are excluded
2. **401 Handling**: 401 Unauthorized responses are excluded from error counts (considered authentication attempts, not system errors)
3. **Error Definition**: Errors are HTTP status codes ≠ 200 and ≠ 401
## Metrics Explained
| Metric | Description |
|--------|-------------|
| **Total Requests (before filtering)** | Raw number of log entries |
| **Excluded Requests** | Lines filtered out (HEAD+Zabbix + 401) |
| **Processed Requests** | Valid requests included in analysis |
| **Errors** | Requests with status ≠ 200 and ≠ 401 |
| **Slow Requests** | Requests exceeding threshold (default: 3000ms) |
| **Peak RPS** | Maximum requests per second observed |
| **Avg/Max/Min Response Time** | Response time statistics in milliseconds |
## Performance
- **Small files** (<50MB): Process in seconds
- **Medium files** (50-200MB): Process in 10-30 seconds
- **Large files** (200MB-1GB): Process in 1-3 minutes
Performance depends on:
- File size
- Number of log entries
- System CPU and RAM
- Disk I/O speed
## Architecture
```
app.py # Streamlit UI application
log_parser.py # Core parsing and analysis logic using Polars
requirements.txt # Python dependencies
README.md # This file
```
### Key Components
- **IISLogParser**: Parses IIS W3C log format into Polars DataFrame
- **LogAnalyzer**: Calculates metrics and statistics
- **Streamlit UI**: Interactive web interface with visualizations
## Use Cases
- **Performance Analysis**: Identify slow endpoints and response time patterns
- **Error Investigation**: Track error rates and problematic methods
- **Capacity Planning**: Analyze peak load and RPS patterns
- **Service Comparison**: Compare performance across multiple services
- **Incident Review**: Analyze logs from specific time periods
## Troubleshooting
### Large File Upload Issues
If Streamlit has trouble with very large files (>500MB):
1. Increase Streamlit's upload size limit:
```bash
streamlit run app.py --server.maxUploadSize=1024
```
2. Or modify `.streamlit/config.toml`:
```toml
[server]
maxUploadSize = 1024
```
### Memory Issues
For files >1GB, you may need to:
- Increase available system memory
- Process files in smaller chunks
- Use a CLI version for batch processing (not yet available; could be developed if needed)
### Performance Tips
- Close other memory-intensive applications
- Process files one at a time for very large files
- Use SSD for faster I/O
- Ensure adequate RAM (8GB+ recommended for 1GB files)
## Future Enhancements
Potential features for future versions:
- CLI tool for batch processing
- Export results to PDF/Excel
- Real-time log monitoring
- Custom metric definitions
- Time range filtering
- IP address analysis
- Session tracking
## Example Output
The application generates:
1. **Summary Table**: Key metrics for each log file
2. **Top Methods Chart**: Most frequently called endpoints
3. **Response Time Distribution**: Histogram of response times
4. **Error Breakdown**: Pie chart of error types
5. **Service Comparison**: Side-by-side comparison for multiple files
## License
This tool is provided as-is for log analysis purposes.
## Support
For issues or questions:
1. Check log file format matches IIS W3C Extended format
2. Verify all required fields are present
3. Ensure Python and dependencies are correctly installed