ayushgupta7777 committed on
Commit 59a99db · verified · 1 Parent(s): 1de6e05

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +113 -0
README.md ADDED
@@ -0,0 +1,113 @@
---
language: en
license: apache-2.0
library_name: pytorch
tags:
- distilbert
- text-classification
- multi-task
- incident-response
- sre
- sentinelops
pipeline_tag: text-classification
datasets:
- ayushgupta07xx/sentinelops-corpus
base_model: distilbert-base-uncased
---

# SentinelOps Incident Classifier

Multi-task DistilBERT classifier that predicts **severity** (P0/P1/P2/P3) and **category** (networking, database, deploy, capacity, auth, other) for SRE incident postmortems. Non-generative baseline for the [SentinelOps](https://github.com/ayushgupta07xx/sentinelops) flagship project, benchmarked against a fine-tuned Mistral-7B generation model.

## Intended use

Given an incident summary, returns `{severity, category}` predictions for routing, prioritization, or retrieval filtering inside the SentinelOps agent.

**Not intended for**: standalone production incident triage, compliance decisions, or any use case where a miscategorization has safety or financial impact.

## Training data

- **Corpus**: 2,000+ public postmortems scraped from `danluu/post-mortems`, the Cloudflare blog, GitHub status (Atom feed), and AWS post-event summaries.
- **Labeled subset**: ~270 postmortems manually labeled for severity (4 classes) and category (6 classes). The remaining examples are weakly labeled via regex and keyword rules.
- **Preprocessing**: HTML→text; MinHash dedup (threshold 0.85, 5-word shingles, 128 permutations); PII scrub (regex for emails/IPs/tokens + Presidio for names/phones).
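
The MinHash dedup step can be sketched in plain Python using the parameters listed above (5-word shingles, 128 permutations, threshold 0.85). This is a minimal illustration, not the pipeline's actual code: the function names, the seeded BLAKE2b hashing used in place of true permutations, and the keep-first policy are all assumptions.

```python
import hashlib

NUM_PERM = 128     # number of hash "permutations", per the preprocessing list
SHINGLE_SIZE = 5   # words per shingle
THRESHOLD = 0.85   # estimated Jaccard similarity at or above this is a duplicate


def shingles(text, k=SHINGLE_SIZE):
    """Return the set of k-word shingles of a lowercased, whitespace-split text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(shingle, seed):
    """64-bit hash of one shingle; the seed plays the role of one permutation."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_perm=NUM_PERM):
    """Signature = minimum hash of the shingle set under each seeded hash."""
    sh = shingles(text)
    if not sh:
        return None
    return [min(_seeded_hash(s, seed) for s in sh) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def dedup(docs, threshold=THRESHOLD):
    """Keep the first document of each near-duplicate group (pairwise, O(n^2))."""
    kept, kept_sigs = [], []
    for doc in docs:
        sig = minhash_signature(doc)
        if sig is None:
            continue  # skip empty documents
        if any(estimated_jaccard(sig, s) >= threshold for s in kept_sigs):
            continue  # near-duplicate of something already kept
        kept.append(doc)
        kept_sigs.append(sig)
    return kept
```

The pairwise comparison is fine for a corpus of a few thousand documents; a larger corpus would normally add locality-sensitive hashing on top of the signatures.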

Dataset: https://huggingface.co/datasets/ayushgupta07xx/sentinelops-corpus

## Training

- **Base**: `distilbert-base-uncased`
- **Architecture**: Shared DistilBERT encoder with two classification heads (severity + category), joint loss.
- **Hardware**: Kaggle T4 GPU (PyTorch).
- **Optimizer**: AdamW with cosine LR schedule.
- **Tracking**: Weights & Biases (project `sentinelops`, job_type `classifier_pt`).
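
One common way to write a joint loss for two heads on a shared encoder is a weighted sum of per-head cross-entropies. The README does not state how the two terms are combined, so the weight $\lambda$ below is an assumption (with $\lambda = 1$ giving a plain sum), and this is a sketch rather than the formula used in `train_pt.py`.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{sev}}(\theta) + \lambda\,\mathcal{L}_{\text{cat}}(\theta),
\qquad
\mathcal{L}_{h}(\theta)
  = -\frac{1}{N}\sum_{i=1}^{N}
    \log p_{\theta}^{(h)}\!\left(y_i^{(h)} \mid x_i\right),
\quad h \in \{\text{sev}, \text{cat}\}
```

Here $x_i$ is an incident summary, $y_i^{(h)}$ is its label for head $h$, $N$ is the batch size, and $p_{\theta}^{(h)}$ is the softmax output of head $h$. Both terms backpropagate into the same encoder parameters.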

Training code: https://github.com/ayushgupta07xx/sentinelops/blob/main/training/classifier/train_pt.py

## Evaluation

### ⚠️ Test set is 27 examples

This is a **small held-out set**: per-class F1 numbers have high variance, and a single misclassification moves the F1 of a 9-example class by ~0.1. Treat these numbers as directional indicators, not population estimates. The limit reflects the labeled-data budget (~270 manually labeled examples, standard 80/10/10 split), which is the realistic ceiling for a solo 5-week project.
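
To see where a swing of about 0.1 can come from, here is a small worked example. The counts (5 of 9 correct before, 4 of 9 after, no false positives) are hypothetical and chosen only for illustration; they are not taken from `eval_report.json`.

```python
def f1_score(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_drop_from_one_miss(tp, fp, support):
    """F1 lost when one more true example of the class is misclassified.

    `support` is the number of true examples of the class in the test set,
    so false negatives are always `support - tp`.
    """
    if not 1 <= tp <= support:
        raise ValueError("tp must be between 1 and support")
    before = f1_score(tp, fp, support - tp)
    after = f1_score(tp - 1, fp, support - tp + 1)
    return before - after


# Hypothetical 9-example class: 5 correct, then 4 correct after one more miss.
drop = f1_drop_from_one_miss(tp=5, fp=0, support=9)
print(f"F1 drop: {drop:.3f}")
```

With these counts F1 falls from 10/14 to 8/13, a drop of roughly 0.1. Starting from a perfect 9 of 9, the same single error costs about 0.06, so the size of the swing depends on where the class starts.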

### Headline metrics

| Head     | Accuracy | Macro-F1 | Weighted-F1 |
|----------|----------|----------|-------------|
| Severity | 0.48     | 0.36     | 0.52        |
| Category | 0.30     | 0.36     | 0.24        |

Full per-class precision/recall/F1 is in `eval_report.json`. Confusion matrices: `assets/confusion_matrix_severity.png`, `assets/confusion_matrix_category.png`.

## Limitations

1. **Tiny test set (27)**: confidence intervals on per-class F1 are wide. Do not use these numbers to support any state-of-the-art claim.
2. **Class imbalance**: some classes (e.g., `auth`) had fewer than 10 training examples; the model underperforms there.
3. **Weak labels**: the majority of training data uses regex/keyword rules, so the model partially learns those rules rather than semantic features. Performance on incidents whose vocabulary doesn't match the rules is likely worse.
4. **Domain drift**: trained on public postmortems from hyperscalers and open-source infra. Generalization to internal enterprise incidents without further fine-tuning is unverified.
5. **English only**: all training data is English.
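
To put a number on how wide the uncertainty is with 27 test examples, the sketch below computes a Wilson score interval for accuracy. The correct-prediction counts are assumptions: 13 of 27 and 8 of 27 are simply the counts closest to the reported accuracies of 0.48 and 0.30, and the function name is illustrative.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95% coverage)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z * z / (4 * total * total)
    ) / denom
    return center - half_width, center + half_width


# Assumed counts: the integers closest to the reported accuracies on 27 examples.
for head, correct in [("severity", 13), ("category", 8)]:
    low, high = wilson_interval(correct, 27)
    print(f"{head}: accuracy {correct / 27:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

Under these assumed counts the severity interval spans roughly 0.31 to 0.66 and the category interval roughly 0.16 to 0.48, which is why the headline numbers are best read as directional.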

## Bias considerations

- Overrepresentation of hyperscaler incidents (AWS, GitHub, Cloudflare) vs. small-org or on-prem incidents.
- "Severity" labels reflect the reporting org's conventions, which differ across companies; the model learns an average that may not match your org's definitions.
- Weak labels encode my prior about what keywords indicate each category, which may bias category boundaries.

## Files

- `model.pt`: PyTorch state dict (~253 MB)
- `tokenizer/`: HF tokenizer files
- `config.json`: model hyperparameters
- `label_mappings.json`: `severity` and `category` label lists
- `eval_report.json`: full classification reports
- `assets/confusion_matrix_{severity,category}.png`: confusion matrices

## Loading

```python
import json

import torch
from huggingface_hub import snapshot_download
from transformers import DistilBertTokenizerFast

# DistilBertMultiTask and Config come from model_pt.py in the SentinelOps GitHub
# repo; that file must be importable (e.g. on your Python path) before running this:
# https://github.com/ayushgupta07xx/sentinelops/blob/main/training/classifier/model_pt.py
from model_pt import DistilBertMultiTask, Config

# Download the model repo (weights, tokenizer, label mappings) to the local cache.
local = snapshot_download("ayushgupta07xx/sentinelops-classifier")

with open(f"{local}/label_mappings.json") as f:
    lm = json.load(f)

# Size each classification head from its label list.
cfg = Config(num_severity=len(lm["severity"]), num_category=len(lm["category"]))
model = DistilBertMultiTask(cfg)
model.load_state_dict(torch.load(f"{local}/model.pt", map_location="cpu"))
model.eval()  # inference mode: disables dropout

tokenizer = DistilBertTokenizerFast.from_pretrained(f"{local}/tokenizer")
```
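
Once the model produces one logit vector per head, turning them into the `{severity, category}` result is an argmax plus a lookup in `label_mappings.json`. The helper below is a hypothetical sketch: the README does not document the output format of `DistilBertMultiTask`, so it takes plain lists of floats, and the label order in the example is assumed rather than read from the real file.

```python
def decode_predictions(severity_logits, category_logits, label_mappings):
    """Map per-head logits to label names via argmax.

    The logits are plain sequences of floats, one entry per class, in the same
    order as the corresponding list in `label_mappings`.
    """
    def pick(logits, labels):
        if len(logits) != len(labels):
            raise ValueError("logit count does not match label count")
        best = max(range(len(logits)), key=lambda i: logits[i])
        return labels[best]

    return {
        "severity": pick(severity_logits, label_mappings["severity"]),
        "category": pick(category_logits, label_mappings["category"]),
    }


# Assumed label order, for illustration only; load the real lists from
# label_mappings.json as in the loading snippet.
example_mappings = {
    "severity": ["P0", "P1", "P2", "P3"],
    "category": ["networking", "database", "deploy", "capacity", "auth", "other"],
}
result = decode_predictions(
    [0.1, 2.0, -1.0, 0.3],
    [0.0, 1.5, 0.2, -0.4, 0.1, 0.9],
    example_mappings,
)
print(result)
```

If the model returns tensors, converting each row with `.tolist()` before calling the helper keeps it free of framework dependencies.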

## Citation

Built as part of [SentinelOps](https://github.com/ayushgupta07xx/sentinelops). No paper: this is engineering, not research.

## Changelog

- **2026-04 v0.1**: Initial release; 27-example test set.