---
title: Log Classification System
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.23.0
app_file: app.py
pinned: false
license: mit
---

# 🔍 Log Classification System

A **production-inspired hybrid log classification pipeline** that routes enterprise logs through three tiers (Regex → BERT + Logistic Regression → LLM) based on pattern confidence and source system.

## Architecture

```
Input Log
    │
    ├─► [Tier 1] Regex Classifier      → Fixed patterns (sub-ms latency)
    │         │ No match?
    │         ▼
    ├─► [Tier 2] BERT + LogReg         → High-confidence ML (conf > 0.5)
    │         │ Low confidence?
    │         ▼
    └─► [Tier 3] LLM (HF Inference)   → LegacyCRM / rare patterns
```
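
The cascade in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the regex patterns, the function names, and the `ml_tier` / `llm_tier` callables are hypothetical stand-ins, and only the `0.5` threshold comes from the diagram.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical patterns for illustration only; the project's real patterns may differ.
REGEX_PATTERNS = {
    r"User \w+ logged (in|out)": "User Action",
    r"Backup (started|completed)": "System Notification",
}

CONFIDENCE_THRESHOLD = 0.5  # from the diagram: Tier 2 accepts conf > 0.5


def classify_with_regex(message: str) -> Optional[str]:
    """Tier 1: return a label if any fixed pattern matches, else None."""
    for pattern, label in REGEX_PATTERNS.items():
        if re.search(pattern, message, re.IGNORECASE):
            return label
    return None


def classify_cascade(
    message: str,
    ml_tier: Callable[[str], Tuple[str, float]],  # assumed to return (label, confidence)
    llm_tier: Callable[[str], str],               # assumed to return a label
) -> Tuple[str, str]:
    """Return (label, tier_used), trying the cheapest tier first."""
    label = classify_with_regex(message)
    if label is not None:
        return label, "regex"

    label, confidence = ml_tier(message)
    if confidence > CONFIDENCE_THRESHOLD:
        return label, "bert"

    # Low confidence: fall through to the LLM tier.
    return llm_tier(message), "llm"
```

Passing the two model tiers in as callables keeps the sketch independent of any particular embedding model or inference client, so the routing logic can be exercised with simple stubs.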

## Categories

| Category | Tier Used |
|---|---|
| User Action | Regex |
| System Notification | Regex |
| HTTP Status | BERT |
| Security Alert | BERT |
| Critical Error | BERT |
| Error | BERT |
| Resource Usage | BERT |
| Workflow Error | LLM |
| Deprecation Warning | LLM |

## Setup

### HuggingFace Spaces Secrets Required
- `HF_TOKEN`: your HuggingFace token (for LLM inference on LegacyCRM logs)
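
Spaces exposes secrets to the app as environment variables, so the token can be read at startup. The helper below is a hedged sketch; `get_hf_token` is a hypothetical name and is not necessarily how `app.py` loads the secret.

```python
import os
from typing import Mapping


def get_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Read HF_TOKEN from the environment and fail early if it is missing."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a Space secret or export it locally."
        )
    return token
```

For local runs, exporting `HF_TOKEN` in the shell before `python app.py` has the same effect.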

### Local Setup
```bash
pip install -r requirements.txt
python app.py
```

## Source Systems
- `ModernCRM`, `ModernHR`, `BillingSystem`, `AnalyticsEngine`, `ThirdPartyAPI` → Regex → BERT
- `LegacyCRM` → LLM directly (too few training samples for ML)
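
The source-based routing above amounts to choosing an entry tier per source system. A minimal sketch, assuming a hypothetical `entry_tier` helper; rejecting unknown sources is also an assumption, since the list does not say how they are handled.

```python
# Source systems as listed above.
ML_SOURCES = {"ModernCRM", "ModernHR", "BillingSystem", "AnalyticsEngine", "ThirdPartyAPI"}
LLM_ONLY_SOURCES = {"LegacyCRM"}


def entry_tier(source: str) -> str:
    """Return the first tier a log from `source` is sent to."""
    if source in LLM_ONLY_SOURCES:
        return "llm"    # skips regex and BERT entirely
    if source in ML_SOURCES:
        return "regex"  # starts at regex, then may fall through to BERT
    # Assumption: unknown sources are rejected rather than guessed at.
    raise ValueError(f"Unknown source system: {source!r}")
```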

## Tech Stack
`sentence-transformers` · `scikit-learn` · `huggingface-hub` · `gradio` · `pandas`