# IIS Log Performance Analyzer
High-performance web application for analyzing large IIS log files (200MB-1GB+). Built with Streamlit and Polars for fast, efficient processing.
**GitHub Repository**: [https://github.com/pilot-stuk/odata_log_parser](https://github.com/pilot-stuk/odata_log_parser)
**Live Demo**: Deployable on [Streamlit Cloud](https://streamlit.io/cloud)
## Features
- **Fast Processing**: Uses Polars library for 10-100x faster parsing compared to pandas
- **Large File Support**: Efficiently handles files up to 1GB+
- **Comprehensive Metrics**:
- Total requests (before/after filtering)
- Error rates and breakdown by status code
- Response time statistics (min/max/avg)
- Slow request detection (configurable threshold)
- Peak RPS (Requests Per Second) with timestamp
- Top methods by request count and response time
- **Multi-File Analysis**: Upload and compare multiple log files side-by-side
- **Interactive Visualizations**: Charts and graphs using Plotly
- **Smart Filtering**: Automatically excludes monitoring requests (Zabbix `HEAD` probes) and 401 Unauthorized responses
## Requirements
- Python 3.8+
- See `requirements.txt` for package dependencies
## Installation
### Local Installation
1. Clone the repository:
```bash
git clone https://github.com/pilot-stuk/odata_log_parser.git
cd odata_log_parser
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
### Deploy to Streamlit Cloud
1. Fork or clone this repository to your GitHub account
2. Go to [share.streamlit.io](https://share.streamlit.io/)
3. Sign in with your GitHub account
4. Click "New app"
5. Select your repository: `pilot-stuk/odata_log_parser`
6. Set the main file path: `app.py`
7. Click "Deploy"
The app will be live at a URL like: `https://share.streamlit.io/pilot-stuk/odata_log_parser/main/app.py`
## Usage
### Run the Streamlit App
```bash
streamlit run app.py
```
The application will open in your browser at `http://localhost:8501`
### Upload Log Files
1. Click "Browse files" in the sidebar
2. Select one or more IIS log files (.log or .txt)
3. View the analysis results
### Configuration Options
- **Upload Mode**: Single or Multiple files
- **Top N Methods**: Number of top methods to display (3-20)
- **Slow Request Threshold**: Configure what constitutes a "slow" request (default: 3000ms)
## Log Format
This tool supports **IIS W3C Extended Log Format** with the following fields:
```
date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip
cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
```
Example log line:
```
2025-09-22 00:00:46 10.21.31.42 GET /Service/Contact/Get sessionid='xxx' 443 - 212.233.92.232 - - 200 0 0 24
```
## Filtering Rules
The analyzer applies the following filters automatically:
1. **Monitoring Exclusion**: Lines containing both `HEAD` method and `Zabbix` are excluded
2. **401 Handling**: 401 Unauthorized responses are excluded from error counts (considered authentication attempts, not system errors)
3. **Error Definition**: Errors are HTTP status codes ≠ 200 and ≠ 401
## Metrics Explained
| Metric | Description |
|--------|-------------|
| **Total Requests (before filtering)** | Raw number of log entries |
| **Excluded Requests** | Lines filtered out (HEAD+Zabbix + 401) |
| **Processed Requests** | Valid requests included in analysis |
| **Errors** | Requests with status ≠ 200 and ≠ 401 |
| **Slow Requests** | Requests exceeding threshold (default: 3000ms) |
| **Peak RPS** | Maximum requests per second observed |
| **Avg/Max/Min Response Time** | Response time statistics in milliseconds |
## Performance
- **Small files** (<50MB): Process in seconds
- **Medium files** (50-200MB): Process in 10-30 seconds
- **Large files** (200MB-1GB): Process in 1-3 minutes
Performance depends on:
- File size
- Number of log entries
- System CPU and RAM
- Disk I/O speed
## Architecture
```
app.py # Streamlit UI application
log_parser.py # Core parsing and analysis logic using Polars
requirements.txt # Python dependencies
README.md # This file
```
### Key Components
- **IISLogParser**: Parses IIS W3C log format into Polars DataFrame
- **LogAnalyzer**: Calculates metrics and statistics
- **Streamlit UI**: Interactive web interface with visualizations
## Use Cases
- **Performance Analysis**: Identify slow endpoints and response time patterns
- **Error Investigation**: Track error rates and problematic methods
- **Capacity Planning**: Analyze peak load and RPS patterns
- **Service Comparison**: Compare performance across multiple services
- **Incident Review**: Analyze logs from specific time periods
## Troubleshooting
### Large File Upload Issues
If Streamlit has trouble with very large files (>500MB):
1. Increase Streamlit's upload size limit:
```bash
streamlit run app.py --server.maxUploadSize=1024
```
2. Or modify `.streamlit/config.toml`:
```toml
[server]
maxUploadSize = 1024
```
### Memory Issues
For files >1GB, you may need to:
- Increase available system memory
- Process files in smaller chunks
- Use a CLI version for batch processing (not yet available; could be developed if needed)
### Performance Tips
- Close other memory-intensive applications
- Process files one at a time for very large files
- Use SSD for faster I/O
- Ensure adequate RAM (8GB+ recommended for 1GB files)
## Future Enhancements
Potential features for future versions:
- CLI tool for batch processing
- Export results to PDF/Excel
- Real-time log monitoring
- Custom metric definitions
- Time range filtering
- IP address analysis
- Session tracking
## Example Output
The application generates:
1. **Summary Table**: Key metrics for each log file
2. **Top Methods Chart**: Most frequently called endpoints
3. **Response Time Distribution**: Histogram of response times
4. **Error Breakdown**: Pie chart of error types
5. **Service Comparison**: Side-by-side comparison for multiple files
## License
This tool is provided as-is for log analysis purposes.
## Support
For issues or questions:
1. Check log file format matches IIS W3C Extended format
2. Verify all required fields are present
3. Ensure Python and dependencies are correctly installed