---
title: Log Classification System
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.23.0
app_file: app.py
pinned: false
license: mit
---
# 🔍 Log Classification System
A **production-inspired hybrid log classification pipeline** that routes enterprise logs through three tiers (Regex → BERT + Logistic Regression → LLM). A log moves to the next tier when no regex pattern matches or the classifier's confidence is low, and some source systems are sent straight to the LLM.
## Architecture
```
Input Log
   │
   ├─► [Tier 1] Regex Classifier   → Fixed patterns (sub-ms latency)
   │       │ No match?
   │       ▼
   ├─► [Tier 2] BERT + LogReg      → High-confidence ML (conf > 0.5)
   │       │ Low confidence?
   │       ▼
   └─► [Tier 3] LLM (HF Inference) → LegacyCRM / rare patterns
```
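The routing in the diagram can be sketched as a short cascade that tries the cheapest tier first. This is a minimal, dependency-free illustration, not the code in `app.py`: the function name `classify_cascade`, the tier callables, and the `(label, confidence)` shape returned by the ML tier are assumptions made for the example. Only the `0.5` threshold is taken from the diagram.

```python
from typing import Callable, Optional, Tuple

# Hypothetical tier signatures (assumptions, not this project's actual API):
#   regex tier: log -> label, or None when no pattern matches
#   ML tier:    log -> (label, confidence in [0, 1])
#   LLM tier:   log -> label
RegexTier = Callable[[str], Optional[str]]
MLTier = Callable[[str], Tuple[str, float]]
LLMTier = Callable[[str], str]


def classify_cascade(
    log: str,
    regex_tier: RegexTier,
    ml_tier: MLTier,
    llm_tier: LLMTier,
    threshold: float = 0.5,
) -> Tuple[str, str]:
    """Return (label, tier_name), escalating only when a cheaper tier gives up."""
    # Tier 1: fixed patterns; any match short-circuits the slower tiers.
    label = regex_tier(log)
    if label is not None:
        return label, "regex"

    # Tier 2: accept the ML prediction only if it clears the threshold.
    ml_label, confidence = ml_tier(log)
    if confidence > threshold:
        return ml_label, "bert"

    # Tier 3: low-confidence cases fall through to the LLM.
    return llm_tier(log), "llm"


if __name__ == "__main__":
    # Toy stand-ins for the three tiers, for demonstration only.
    def toy_regex(log: str) -> Optional[str]:
        return "User Action" if "logged in" in log else None

    def toy_ml(log: str) -> Tuple[str, float]:
        return ("HTTP Status", 0.92) if "HTTP" in log else ("Error", 0.31)

    def toy_llm(log: str) -> str:
        return "Workflow Error"

    print(classify_cascade("User User123 logged in.", toy_regex, toy_ml, toy_llm))
    # ('User Action', 'regex')
    print(classify_cascade("GET /api/items HTTP/1.1 status: 200", toy_regex, toy_ml, toy_llm))
    # ('HTTP Status', 'bert')
    print(classify_cascade("Escalation rule execution failed", toy_regex, toy_ml, toy_llm))
    # ('Workflow Error', 'llm')
```

Passing the tiers in as callables keeps the cascade itself trivial to test with stubs; in a real deployment each callable would wrap the regex rules, the embedding classifier, and the hosted LLM respectively.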
## Categories
| Category | Tier Used |
|---|---|
| User Action | Regex |
| System Notification | Regex |
| HTTP Status | BERT |
| Security Alert | BERT |
| Critical Error | BERT |
| Error | BERT |
| Resource Usage | BERT |
| Workflow Error | LLM |
| Deprecation Warning | LLM |
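Tier 1 only has to cover the two regex-handled categories in the table above. The sketch below is a minimal illustration; the patterns and the `classify_with_regex` name are hypothetical examples of the kind of fixed log templates that suit regex, not the rules this project actually ships.

```python
import re
from typing import Optional

# Hypothetical (pattern, category) rules, checked in order; first match wins.
REGEX_RULES = [
    (re.compile(r"User \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"Account with ID \S+ created", re.IGNORECASE), "User Action"),
    (re.compile(r"Backup (started|ended|completed)", re.IGNORECASE), "System Notification"),
    (re.compile(r"System reboot(ed)? initiated", re.IGNORECASE), "System Notification"),
]


def classify_with_regex(log: str) -> Optional[str]:
    """Return the first matching category, or None so the caller can defer to Tier 2."""
    for pattern, category in REGEX_RULES:
        if pattern.search(log):
            return category
    return None
```

Returning `None` instead of a default label is what lets the caller fall through to the ML tier; because the first match wins, more specific patterns should be listed before broader ones.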
## Setup
### HuggingFace Spaces Secrets Required
- `HF_TOKEN`: your HuggingFace token (for LLM inference on LegacyCRM logs)
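On Spaces, secrets are exposed to the running app as environment variables, so the LLM tier can read `HF_TOKEN` at startup. The sketch below assumes hypothetical helper names (`get_hf_token`, `build_prompt`, `parse_llm_label`) and an illustrative prompt wording. It also shows one way to keep the LLM's free-text answer inside the two LLM-handled categories from the table, mapping anything else to an assumed `Unclassified` label.

```python
import os
from typing import Mapping

# LLM-handled categories from the table; FALLBACK is an assumption of this sketch.
LLM_CATEGORIES = ("Workflow Error", "Deprecation Warning")
FALLBACK = "Unclassified"


def get_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Read the token from the environment, failing loudly if the secret is missing."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to the Space's secrets.")
    return token


def build_prompt(log: str) -> str:
    """Ask the model to answer with exactly one known category name."""
    categories = ", ".join(LLM_CATEGORIES)
    return (
        f"Classify the log message into exactly one of: {categories}. "
        f"If none fit, answer {FALLBACK}. Reply with the category name only.\n\n"
        f"Log: {log}"
    )


def parse_llm_label(response: str) -> str:
    """Map a free-text reply onto a known category, case-insensitively."""
    text = response.strip().lower()
    for category in LLM_CATEGORIES:
        if category.lower() in text:
            return category
    return FALLBACK
```

The network call itself is left out. With `huggingface_hub` it would typically go through `InferenceClient(token=get_hf_token())`, but this README does not say which model the Space queries.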
### Local Setup
```bash
pip install -r requirements.txt
python app.py
```
## Source Systems
- `ModernCRM`, `ModernHR`, `BillingSystem`, `AnalyticsEngine`, `ThirdPartyAPI` → Regex → BERT
- `LegacyCRM` → LLM directly (too few training samples for ML)
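The source-based rule amounts to a small dispatch placed in front of the regex/BERT path. A minimal sketch, with hypothetical names (`route_by_source`, `classify_batch`, `LLM_ONLY_SOURCES`) and the two paths passed in as callables:

```python
from typing import Callable, Iterable, List, Tuple

# Sources that skip Tiers 1-2; per this README, only LegacyCRM (too little training data).
LLM_ONLY_SOURCES = frozenset({"LegacyCRM"})


def route_by_source(
    source: str,
    log: str,
    cascade: Callable[[str], str],
    llm: Callable[[str], str],
) -> str:
    """Send low-data sources straight to the LLM; everything else takes the cascade."""
    if source in LLM_ONLY_SOURCES:
        return llm(log)
    return cascade(log)


def classify_batch(
    records: Iterable[Tuple[str, str]],
    cascade: Callable[[str], str],
    llm: Callable[[str], str],
) -> List[str]:
    """Classify (source, log) pairs in order."""
    return [route_by_source(source, log, cascade, llm) for source, log in records]
```

Keeping the LLM-only sources in a set makes routing another low-data source directly to the LLM a one-line change.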
## Tech Stack
`sentence-transformers` · `scikit-learn` · `huggingface-hub` · `gradio` · `pandas`
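Tier 2 pairs a `sentence-transformers` encoder with a `scikit-learn` logistic regression. The sketch below shows only the confidence gate, written against scikit-learn's standard classifier interface (`predict_proba` and `classes_`) so that it runs without either library installed. The function name and the choice to return `None` below the threshold are assumptions; the `0.5` cut-off comes from the architecture diagram.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def classify_with_embeddings(
    log: str,
    embed: Callable[[List[str]], Sequence[Sequence[float]]],
    clf: Any,  # fitted estimator exposing predict_proba() and classes_
    threshold: float = 0.5,
) -> Tuple[Optional[str], float]:
    """Return (label, confidence); label is None when confidence <= threshold."""
    # Embed a batch of one log and take its row of class probabilities.
    probs = clf.predict_proba(embed([log]))[0]
    # Index of the most probable class (first one wins on ties).
    best = max(range(len(probs)), key=lambda i: probs[i])
    confidence = float(probs[best])
    label = clf.classes_[best] if confidence > threshold else None
    return label, confidence
```

With the real libraries, `embed` could be `SentenceTransformer("all-MiniLM-L6-v2").encode` and `clf` a `LogisticRegression` fitted on those embeddings; that model name is only an example, since this README does not say which encoder the Space uses.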