
IIS Log Performance Analyzer

High-performance web application for analyzing large IIS log files (200MB-1GB+). Built with Streamlit and Polars for fast, efficient processing.

GitHub Repository: https://github.com/pilot-stuk/odata_log_parser

Live Demo: Deploy on Streamlit Cloud

Features

  • Fast Processing: Uses Polars library for 10-100x faster parsing compared to pandas
  • Large File Support: Efficiently handles files up to 1GB+
  • Comprehensive Metrics:
    • Total requests (before/after filtering)
    • Error rates and breakdown by status code
    • Response time statistics (min/max/avg)
    • Slow request detection (configurable threshold)
    • Peak RPS (Requests Per Second) with timestamp
    • Top methods by request count and response time
  • Multi-File Analysis: Upload and compare multiple log files side-by-side
  • Interactive Visualizations: Charts and graphs using Plotly
  • Smart Filtering: Automatically excludes monitoring requests (Zabbix HEAD probes) and 401 Unauthorized responses

Requirements

  • Python 3.8+
  • See requirements.txt for package dependencies

Installation

Local Installation

  1. Clone the repository:
git clone https://github.com/pilot-stuk/odata_log_parser.git
cd odata_log_parser
  2. Install dependencies:
pip install -r requirements.txt

Deploy to Streamlit Cloud

  1. Fork or clone this repository to your GitHub account
  2. Go to share.streamlit.io
  3. Sign in with your GitHub account
  4. Click "New app"
  5. Select your repository: pilot-stuk/odata_log_parser
  6. Set the main file path: app.py
  7. Click "Deploy"

The app will be live at: https://share.streamlit.io/pilot-stuk/odata_log_parser/main/app.py

Usage

Run the Streamlit App

streamlit run app.py

The application will open in your browser at http://localhost:8501

Upload Log Files

  1. Click "Browse files" in the sidebar
  2. Select one or more IIS log files (.log or .txt)
  3. View the analysis results

Configuration Options

  • Upload Mode: Single or Multiple files
  • Top N Methods: Number of top methods to display (3-20)
  • Slow Request Threshold: Configure what constitutes a "slow" request (default: 3000ms)
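
The slow-request check is a simple threshold comparison. Below is a minimal plain-Python sketch of how the configurable threshold classifies requests; the function name and default constant are assumptions for illustration, not the actual API in log_parser.py:

```python
DEFAULT_SLOW_MS = 3000  # matches the README's default threshold


def is_slow(time_taken_ms: int, threshold_ms: int = DEFAULT_SLOW_MS) -> bool:
    """Return True when a request's time-taken exceeds the slow threshold."""
    return time_taken_ms > threshold_ms


# time-taken values come from the IIS `time-taken` field (milliseconds)
flags = [is_slow(t) for t in (24, 2999, 3001, 15000)]
# flags == [False, False, True, True]
```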

Log Format

This tool supports IIS W3C Extended Log Format with the following fields:

date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip
cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken

Example log line:

2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24
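
Each line maps positionally onto the field list above. The following is a plain-Python sketch of that mapping (the real parser uses Polars for speed; the field names here are illustrative snake_case renderings of the W3C names):

```python
# Field names follow the #Fields order shown above.
FIELDS = [
    "date", "time", "s_ip", "cs_method", "cs_uri_stem", "cs_uri_query",
    "s_port", "cs_username", "c_ip", "cs_user_agent", "cs_referer",
    "sc_status", "sc_substatus", "sc_win32_status", "time_taken",
]


def parse_line(line: str) -> dict:
    """Split one W3C Extended log line into named fields."""
    # IIS replaces spaces inside field values with '+', so a plain
    # whitespace split keeps the columns aligned.
    record = dict(zip(FIELDS, line.split()))
    # Numeric fields arrive as strings; cast the ones metrics rely on.
    for key in ("sc_status", "time_taken"):
        record[key] = int(record[key])
    return record


row = parse_line(
    "2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get "
    "sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24"
)
# row["cs_method"] == "GET", row["sc_status"] == 200, row["time_taken"] == 24
```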

Filtering Rules

The analyzer applies the following filters automatically:

  1. Monitoring Exclusion: Lines containing both HEAD method and Zabbix are excluded
  2. 401 Handling: 401 Unauthorized responses are excluded from error counts (considered authentication attempts, not system errors)
  3. Error Definition: Errors are HTTP status codes ≠ 200 and ≠ 401
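
The three rules above can be sketched as two predicates (plain Python for clarity; the production code applies the same logic as Polars filter expressions, and these function names are assumptions):

```python
def is_excluded(record: dict, raw_line: str) -> bool:
    """Rules 1 and 2: drop Zabbix HEAD probes and 401 responses."""
    if record["cs_method"] == "HEAD" and "Zabbix" in raw_line:
        return True
    return record["sc_status"] == 401


def is_error(record: dict) -> bool:
    """Rule 3: an error is any status other than 200 or 401."""
    return record["sc_status"] not in (200, 401)


probe = {"cs_method": "HEAD", "sc_status": 200}   # monitoring request
failed = {"cs_method": "GET", "sc_status": 500}   # genuine error
```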

Metrics Explained

Metric                               Description
-----------------------------------  ------------------------------------------------
Total Requests (before filtering)    Raw number of log entries
Excluded Requests                    Lines filtered out (HEAD+Zabbix and 401)
Processed Requests                   Valid requests included in analysis
Errors                               Requests with status ≠ 200 and ≠ 401
Slow Requests                        Requests exceeding threshold (default: 3000ms)
Peak RPS                             Maximum requests per second observed
Avg/Max/Min Response Time            Response time statistics in milliseconds
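
Peak RPS in particular is just a per-second bucket count with a max. A self-contained sketch of that metric (timestamps here are the "date time" prefix of each parsed entry; the real implementation aggregates with Polars):

```python
from collections import Counter

# One entry per processed request, truncated to one-second resolution.
timestamps = [
    "2025-09-22 00:00:46",
    "2025-09-22 00:00:46",
    "2025-09-22 00:00:46",
    "2025-09-22 00:00:47",
]

# Count requests falling into each one-second bucket, then take the max.
per_second = Counter(timestamps)
peak_ts, peak_rps = per_second.most_common(1)[0]
# peak_rps == 3 at timestamp "2025-09-22 00:00:46"
```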

Performance

  • Small files (<50MB): Process in seconds
  • Medium files (50-200MB): Process in 10-30 seconds
  • Large files (200MB-1GB): Process in 1-3 minutes

Performance depends on:

  • File size
  • Number of log entries
  • System CPU and RAM
  • Disk I/O speed

Architecture

app.py              # Streamlit UI application
log_parser.py       # Core parsing and analysis logic using Polars
requirements.txt    # Python dependencies
README.md           # This file

Key Components

  • IISLogParser: Parses IIS W3C log format into Polars DataFrame
  • LogAnalyzer: Calculates metrics and statistics
  • Streamlit UI: Interactive web interface with visualizations
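
To show how the pieces fit, here is a minimal self-contained sketch of the LogAnalyzer role: given already-parsed records, compute the headline metrics. The function name and record field names are assumptions for illustration:

```python
def analyze(records: list[dict]) -> dict:
    """Compute summary metrics over parsed, filtered log records."""
    total = len(records)
    # Error rule from the README: status other than 200 or 401.
    errors = sum(1 for r in records if r["sc_status"] not in (200, 401))
    times = [r["time_taken"] for r in records]
    return {
        "total": total,
        "errors": errors,
        "avg_ms": sum(times) / total if total else 0.0,
        "max_ms": max(times, default=0),
        "min_ms": min(times, default=0),
    }


stats = analyze([
    {"sc_status": 200, "time_taken": 24},
    {"sc_status": 500, "time_taken": 5000},
])
# stats == {"total": 2, "errors": 1, "avg_ms": 2512.0, "max_ms": 5000, "min_ms": 24}
```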

Use Cases

  • Performance Analysis: Identify slow endpoints and response time patterns
  • Error Investigation: Track error rates and problematic methods
  • Capacity Planning: Analyze peak load and RPS patterns
  • Service Comparison: Compare performance across multiple services
  • Incident Review: Analyze logs from specific time periods

Troubleshooting

Large File Upload Issues

If Streamlit has trouble with very large files (>500MB):

  1. Increase Streamlit's upload size limit:
streamlit run app.py --server.maxUploadSize=1024
  2. Or modify .streamlit/config.toml:
[server]
maxUploadSize = 1024

Memory Issues

For files >1GB, you may need to:

  • Increase available system memory
  • Process files in smaller chunks
  • Use a CLI version for batch processing (a potential future enhancement)

Performance Tips

  • Close other memory-intensive applications
  • Process files one at a time for very large files
  • Use SSD for faster I/O
  • Ensure adequate RAM (8GB+ recommended for 1GB files)

Future Enhancements

Potential features for future versions:

  • CLI tool for batch processing
  • Export results to PDF/Excel
  • Real-time log monitoring
  • Custom metric definitions
  • Time range filtering
  • IP address analysis
  • Session tracking

Example Output

The application generates:

  1. Summary Table: Key metrics for each log file
  2. Top Methods Chart: Most frequently called endpoints
  3. Response Time Distribution: Histogram of response times
  4. Error Breakdown: Pie chart of error types
  5. Service Comparison: Side-by-side comparison for multiple files

License

This tool is provided as-is for log analysis purposes.

Support

For issues or questions:

  1. Check log file format matches IIS W3C Extended format
  2. Verify all required fields are present
  3. Ensure Python and dependencies are correctly installed