--- title: Log Classification System emoji: ๐Ÿ” colorFrom: blue colorTo: indigo sdk: gradio sdk_version: 5.23.0 app_file: app.py pinned: false license: mit --- # ๐Ÿ” Log Classification System A **production-inspired hybrid log classification pipeline** that routes enterprise logs through 3 tiers โ€” Regex โ†’ BERT + Logistic Regression โ†’ LLM โ€” based on pattern confidence and source system. ## Architecture ``` Input Log โ”‚ โ”œโ”€โ–บ [Tier 1] Regex Classifier โ†’ Fixed patterns (sub-ms latency) โ”‚ โ”‚ No match? โ”‚ โ–ผ โ”œโ”€โ–บ [Tier 2] BERT + LogReg โ†’ High-confidence ML (conf > 0.5) โ”‚ โ”‚ Low confidence? โ”‚ โ–ผ โ””โ”€โ–บ [Tier 3] LLM (HF Inference) โ†’ LegacyCRM / rare patterns ``` ## Categories | Category | Tier Used | |---|---| | User Action | Regex | | System Notification | Regex | | HTTP Status | BERT | | Security Alert | BERT | | Critical Error | BERT | | Error | BERT | | Resource Usage | BERT | | Workflow Error | LLM | | Deprecation Warning | LLM | ## Setup ### HuggingFace Spaces Secrets Required - `HF_TOKEN` โ€” your HuggingFace token (for LLM inference on LegacyCRM logs) ### Local Setup ```bash pip install -r requirements.txt python app.py ``` ## Source Systems - `ModernCRM`, `ModernHR`, `BillingSystem`, `AnalyticsEngine`, `ThirdPartyAPI` โ†’ Regex โ†’ BERT - `LegacyCRM` โ†’ LLM directly (too few training samples for ML) ## Tech Stack `sentence-transformers` ยท `scikit-learn` ยท `huggingface-hub` ยท `gradio` ยท `pandas`