
IIS Log Performance Analyzer

High-performance web application for analyzing large IIS log files (200MB-1GB+). Built with Streamlit and Polars for fast, efficient processing.

GitHub Repository: https://github.com/pilot-stuk/odata_log_parser

Live Demo: Deploy on Streamlit Cloud

Features

  • Fast Processing: Uses Polars library for 10-100x faster parsing compared to pandas
  • Large File Support: Efficiently handles files up to 1GB+
  • Comprehensive Metrics:
    • Total requests (before/after filtering)
    • Error rates and breakdown by status code
    • Response time statistics (min/max/avg)
    • Slow request detection (configurable threshold)
    • Peak RPS (Requests Per Second) with timestamp
    • Top methods by request count and response time
  • Multi-File Analysis: Upload and compare multiple log files side-by-side
  • Interactive Visualizations: Charts and graphs using Plotly
  • Smart Filtering: Automatically excludes monitoring requests (Zabbix HEAD probes) and 401 Unauthorized responses

Requirements

  • Python 3.8+
  • See requirements.txt for package dependencies

Installation

Local Installation

  1. Clone the repository:
git clone https://github.com/pilot-stuk/odata_log_parser.git
cd odata_log_parser
  2. Install dependencies:
pip install -r requirements.txt

Deploy to Streamlit Cloud

  1. Fork or clone this repository to your GitHub account
  2. Go to share.streamlit.io
  3. Sign in with your GitHub account
  4. Click "New app"
  5. Select your repository: pilot-stuk/odata_log_parser
  6. Set the main file path: app.py
  7. Click "Deploy"

The app will be live at: https://share.streamlit.io/pilot-stuk/odata_log_parser/main/app.py

Usage

Run the Streamlit App

streamlit run app.py

The application will open in your browser at http://localhost:8501

Upload Log Files

  1. Click "Browse files" in the sidebar
  2. Select one or more IIS log files (.log or .txt)
  3. View the analysis results

Configuration Options

  • Upload Mode: Single or Multiple files
  • Top N Methods: Number of top methods to display (3-20)
  • Slow Request Threshold: Configure what constitutes a "slow" request (default: 3000ms)
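
The slow-request check is a simple threshold comparison. Below is a minimal plain-Python sketch of how the configurable threshold classifies requests; the function name and default constant are assumptions for illustration, not the actual API in log_parser.py:

```python
DEFAULT_SLOW_MS = 3000  # matches the README's default threshold


def is_slow(time_taken_ms: int, threshold_ms: int = DEFAULT_SLOW_MS) -> bool:
    """Return True when a request's time-taken exceeds the slow threshold."""
    return time_taken_ms > threshold_ms


# time-taken values come from the IIS `time-taken` field (milliseconds)
flags = [is_slow(t) for t in (24, 2999, 3001, 15000)]
# flags == [False, False, True, True]
```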

Log Format

This tool supports IIS W3C Extended Log Format with the following fields:

date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip
cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken

Example log line:

2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24
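
Each line maps positionally onto the field list above. The following is a plain-Python sketch of that mapping (the real parser uses Polars for speed; the field names here are illustrative snake_case renderings of the W3C names):

```python
# Field names follow the #Fields order shown above.
FIELDS = [
    "date", "time", "s_ip", "cs_method", "cs_uri_stem", "cs_uri_query",
    "s_port", "cs_username", "c_ip", "cs_user_agent", "cs_referer",
    "sc_status", "sc_substatus", "sc_win32_status", "time_taken",
]


def parse_line(line: str) -> dict:
    """Split one W3C Extended log line into named fields."""
    # IIS replaces spaces inside field values with '+', so a plain
    # whitespace split keeps the columns aligned.
    record = dict(zip(FIELDS, line.split()))
    # Numeric fields arrive as strings; cast the ones metrics rely on.
    for key in ("sc_status", "time_taken"):
        record[key] = int(record[key])
    return record


row = parse_line(
    "2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get "
    "sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24"
)
# row["cs_method"] == "GET", row["sc_status"] == 200, row["time_taken"] == 24
```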

Filtering Rules

The analyzer applies the following filters automatically:

  1. Monitoring Exclusion: Lines containing both HEAD method and Zabbix are excluded
  2. 401 Handling: 401 Unauthorized responses are excluded from error counts (considered authentication attempts, not system errors)
  3. Error Definition: Errors are HTTP status codes ≠ 200 and ≠ 401
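
The three rules above can be sketched as two predicates (plain Python for clarity; the production code applies the same logic as Polars filter expressions, and these function names are assumptions):

```python
def is_excluded(record: dict, raw_line: str) -> bool:
    """Rules 1 and 2: drop Zabbix HEAD probes and 401 responses."""
    if record["cs_method"] == "HEAD" and "Zabbix" in raw_line:
        return True
    return record["sc_status"] == 401


def is_error(record: dict) -> bool:
    """Rule 3: an error is any status other than 200 or 401."""
    return record["sc_status"] not in (200, 401)


probe = {"cs_method": "HEAD", "sc_status": 200}   # monitoring request
failed = {"cs_method": "GET", "sc_status": 500}   # genuine error
```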

Metrics Explained

Metric                               Description
-----------------------------------  ------------------------------------------------
Total Requests (before filtering)    Raw number of log entries
Excluded Requests                    Lines filtered out (HEAD+Zabbix and 401)
Processed Requests                   Valid requests included in analysis
Errors                               Requests with status ≠ 200 and ≠ 401
Slow Requests                        Requests exceeding threshold (default: 3000ms)
Peak RPS                             Maximum requests per second observed
Avg/Max/Min Response Time            Response time statistics in milliseconds
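
Peak RPS in particular is just a per-second bucket count with a max. A self-contained sketch of that metric (timestamps here are the "date time" prefix of each parsed entry; the real implementation aggregates with Polars):

```python
from collections import Counter

# One entry per processed request, truncated to one-second resolution.
timestamps = [
    "2025-09-22 00:00:46",
    "2025-09-22 00:00:46",
    "2025-09-22 00:00:46",
    "2025-09-22 00:00:47",
]

# Count requests falling into each one-second bucket, then take the max.
per_second = Counter(timestamps)
peak_ts, peak_rps = per_second.most_common(1)[0]
# peak_rps == 3 at timestamp "2025-09-22 00:00:46"
```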

Performance

  • Small files (<50MB): Process in seconds
  • Medium files (50-200MB): Process in 10-30 seconds
  • Large files (200MB-1GB): Process in 1-3 minutes

Performance depends on:

  • File size
  • Number of log entries
  • System CPU and RAM
  • Disk I/O speed

Architecture

app.py              # Streamlit UI application
log_parser.py       # Core parsing and analysis logic using Polars
requirements.txt    # Python dependencies
README.md           # This file

Key Components

  • IISLogParser: Parses IIS W3C log format into Polars DataFrame
  • LogAnalyzer: Calculates metrics and statistics
  • Streamlit UI: Interactive web interface with visualizations
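
To show how the pieces fit, here is a minimal self-contained sketch of the LogAnalyzer role: given already-parsed records, compute the headline metrics. The function name and record field names are assumptions for illustration:

```python
def analyze(records: list[dict]) -> dict:
    """Compute summary metrics over parsed, filtered log records."""
    total = len(records)
    # Error rule from the README: status other than 200 or 401.
    errors = sum(1 for r in records if r["sc_status"] not in (200, 401))
    times = [r["time_taken"] for r in records]
    return {
        "total": total,
        "errors": errors,
        "avg_ms": sum(times) / total if total else 0.0,
        "max_ms": max(times, default=0),
        "min_ms": min(times, default=0),
    }


stats = analyze([
    {"sc_status": 200, "time_taken": 24},
    {"sc_status": 500, "time_taken": 5000},
])
# stats == {"total": 2, "errors": 1, "avg_ms": 2512.0, "max_ms": 5000, "min_ms": 24}
```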

Use Cases

  • Performance Analysis: Identify slow endpoints and response time patterns
  • Error Investigation: Track error rates and problematic methods
  • Capacity Planning: Analyze peak load and RPS patterns
  • Service Comparison: Compare performance across multiple services
  • Incident Review: Analyze logs from specific time periods

Troubleshooting

Large File Upload Issues

If Streamlit has trouble with very large files (>500MB):

  1. Increase Streamlit's upload size limit:
streamlit run app.py --server.maxUploadSize=1024
  2. Or modify .streamlit/config.toml:
[server]
maxUploadSize = 1024

Memory Issues

For files >1GB, you may need to:

  • Increase available system memory
  • Process files in smaller chunks
  • Use a CLI version for batch processing (a potential future enhancement)

Performance Tips

  • Close other memory-intensive applications
  • Process files one at a time for very large files
  • Use SSD for faster I/O
  • Ensure adequate RAM (8GB+ recommended for 1GB files)

Future Enhancements

Potential features for future versions:

  • CLI tool for batch processing
  • Export results to PDF/Excel
  • Real-time log monitoring
  • Custom metric definitions
  • Time range filtering
  • IP address analysis
  • Session tracking

Example Output

The application generates:

  1. Summary Table: Key metrics for each log file
  2. Top Methods Chart: Most frequently called endpoints
  3. Response Time Distribution: Histogram of response times
  4. Error Breakdown: Pie chart of error types
  5. Service Comparison: Side-by-side comparison for multiple files

License

This tool is provided as-is for log analysis purposes.

Support

For issues or questions:

  1. Check log file format matches IIS W3C Extended format
  2. Verify all required fields are present
  3. Ensure Python and dependencies are correctly installed