dongkoony committed
Commit e5e8b0d · verified · 1 Parent(s): 3c77d42

release: v0.3.0 model upload

Files changed (1)
  1. README.md +150 -48
README.md CHANGED
@@ -1,72 +1,174 @@
- # Model Card: DevOps-Incident-Triage-Model
-
- ## Model Details
-
- - Model name: `DevOps-Incident-Triage-Model`
- - Base checkpoint: `distilbert-base-uncased` (baseline)
- - Task: DevOps incident text classification (multiclass)
- - Labels:
-   - `k8s_cluster`
-   - `cicd_pipeline`
-   - `aws_iam_network`
-   - `deployment_release`
-   - `container_runtime`
-   - `observability_alerting`
-   - `database_state`

  ## Intended Use

- This model is intended to quickly classify incident text in DevOps/Platform/SRE environments
- and to assist initial-response routing.

- - Recommended: on-call triage assistance, ticket auto-tagging assistance
- - Not recommended: fully automated decision-making, executing actions without human review
- - Operational recommendation: use confidence threshold gating to send low-confidence predictions to `needs_human_review`
- - Operational recommendation: when using the FastAPI `/predict/batch` endpoint, set a batch cap (`BATCH_MAX_ITEMS`) to keep the API stable

  ## Training Data

- The current version is trained on synthetic starter data based on `data/sample/incidents_synthetic.csv`.

- - This data was not collected directly from the logs or tickets of a live service.
- - Real-world generalization can only be verified by retraining and re-evaluating on operational data.

  ## Training Procedure

- - Data split: train/validation/test
- - Input max length: 256
- - Metrics: accuracy, macro F1, per-label precision/recall/F1
- - Optional: PEFT LoRA (`--use-peft`)

- ## Evaluation

- Evaluation script:

- ```bash
- uv run ditri-eval --model-path models/devops-incident-triage --data-dir data/processed --report-dir reports
  ```

- Outputs:
- - `reports/evaluation_metrics.json`
- - `reports/per_label_metrics.json`
- - `reports/confusion_matrix.csv`
- - `reports/sample_predictions.jsonl`

  ## Limitations

- - Training is centered on synthetic data, so domain coverage is biased and the variety of phrasing is limited.
- - Labels are defined as a single primary label (multiclass), so incidents with multiple causes are poorly represented.
- - Performance on long logs and large contexts (multi-line stack traces) needs further validation.

  ## Ethical and Operational Considerations

- - Model predictions only assist prioritization; the final decision rests with the operator.
- - Misclassification can cause wrong routing and delayed response, so a human in the loop is required.
- - Operational logs may contain sensitive information, so the data must be de-identified first.

- ## Recommended Next Improvements

- 1. Build a dataset of real, de-identified incidents
- 2. Experiment with multilabel and hierarchical classification
- 3. Introduce a labeling guide and a quality metric (IAA)
- 4. Link offline and online monitoring (data/model drift)
+ ---
+ license: mit
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-classification
+ tags:
+ - devops
+ - sre
+ - incident-triage
+ - text-classification
+ - mlops
+ - fastapi
+ - transformers
+ - python
+ base_model: distilbert-base-uncased
+ ---
+
+ # devops-incident-triage
+
+ `devops-incident-triage` is a multiclass text classification model for routing DevOps incident summaries and error messages to the most likely operational domain.
+
+ In short: give it an incident sentence such as a deployment failure, Kubernetes cluster issue, IAM/network error, or database state problem, and it predicts which team/domain should review it first.
+
+ ## Model Summary
+
+ - Task: DevOps incident text classification
+ - Problem type: multiclass classification
+ - Base model: `distilbert-base-uncased`
+ - Project release: `v0.3.0`
+ - Intended role: first-pass triage support, not autonomous decision-making
+
+ ## Labels
+
+ | Label | Meaning |
+ |---|---|
+ | `k8s_cluster` | Kubernetes scheduling, node, or cluster-state issues |
+ | `cicd_pipeline` | CI/CD build, test, or deployment pipeline failures |
+ | `aws_iam_network` | AWS IAM, VPC, network, or permission-related issues |
+ | `deployment_release` | Helm, rollout, release, or deployment operation issues |
+ | `container_runtime` | Docker, containerd, image, or container runtime issues |
+ | `observability_alerting` | Monitoring, logging, tracing, or alerting issues |
+ | `database_state` | Database connectivity, replication, lock, or storage-state issues |

  ## Intended Use

+ This model is designed for:

+ - incident triage assistance in DevOps, Platform, and SRE workflows
+ - ticket auto-tagging support
+ - queue recommendation support before a human reviews the issue
+
+ This model is not designed for:
+
+ - fully autonomous production actions
+ - incident severity decisions without human review
+ - root-cause analysis by itself
+
+ ## Important Scope Note
+
+ The published model performs classification only.
+
+ Operational behaviors such as:
+
+ - confidence threshold gating
+ - `needs_human_review` fallback
+ - synchronous batch inference
+ - asynchronous batch jobs
+ - API observability and metrics
+
+ are implemented in the service layer of the project, not inside the model weights themselves.
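
To make the split between model and service layer concrete, here is a minimal sketch of what confidence threshold gating with a `needs_human_review` fallback could look like on top of raw logits. It is not the project's actual service code: the function names, the `0.6` threshold, and the three-label mapping are assumptions chosen for illustration.

```python
import math

# Hypothetical names and values: not taken from the project's service layer.
CONFIDENCE_THRESHOLD = 0.6
FALLBACK_LABEL = "needs_human_review"


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_prediction(logits, id2label, threshold=CONFIDENCE_THRESHOLD):
    """Return (label, confidence); use the fallback label below the threshold."""
    probs = softmax(logits)
    best_id = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best_id]
    label = id2label[best_id] if confidence >= threshold else FALLBACK_LABEL
    return label, confidence


# Toy label mapping (a subset of the real labels, for brevity).
id2label = {0: "k8s_cluster", 1: "cicd_pipeline", 2: "aws_iam_network"}
print(gate_prediction([4.0, 0.5, 0.2], id2label))  # one clear winner: model label is kept
print(gate_prediction([1.0, 0.9, 0.8], id2label))  # near-uniform: falls back to review
```

Keeping the threshold outside the weights means it can be tuned per queue without retraining the classifier.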
+
+ Project repository:
+
+ - GitHub: `dongkoony/DevOps-Incident-Triage-Model`

  ## Training Data

+ This version was trained on a synthetic starter dataset derived from DevOps-style incident examples.
+
+ - Source file in project: `data/sample/incidents_synthetic.csv`
+ - The dataset was not collected from a real production environment.
+ - The reported behavior should be interpreted as portfolio and pipeline evidence, not as validated real-world generalization.

+ If this model is to be used beyond demonstration or experimentation, it should be retrained and reevaluated on anonymized real incident data.

  ## Training Procedure

+ - Data split: train / validation / test
+ - Max input length: 256
+ - Baseline checkpoint: `distilbert-base-uncased`
+ - Evaluation metrics: accuracy, macro F1, weighted F1, per-label precision/recall/F1
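
The difference between macro F1 and weighted F1 matters when labels are imbalanced, so a small worked sketch may help. The code below computes the listed metrics from paired label lists in plain Python. The function names and the toy data are illustrative assumptions, and the project's own evaluation code may compute these differently (for example via a metrics library).

```python
from collections import Counter


def per_label_f1(y_true, y_pred):
    """Return {label: (f1, support)} computed from paired label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def summarize(y_true, y_pred):
    """Accuracy, macro F1 (unweighted mean), and weighted F1 (mean by support)."""
    scores = per_label_f1(y_true, y_pred)
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    macro_f1 = sum(f1 for f1, _ in scores.values()) / len(scores)
    weighted_f1 = sum(f1 * n for f1, n in scores.values()) / total
    return {"accuracy": accuracy, "macro_f1": macro_f1, "weighted_f1": weighted_f1}


# Toy example: the rare label is always missed.
y_true = ["k8s_cluster", "k8s_cluster", "k8s_cluster", "database_state"]
y_pred = ["k8s_cluster", "k8s_cluster", "k8s_cluster", "k8s_cluster"]
print(summarize(y_true, y_pred))
```

In this toy case macro F1 drops well below weighted F1 because the missed minority label counts as much as the majority label, which is why both are worth reporting.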
+
+ The project also includes a benchmark workflow to compare multiple backbones under the same setup:
+
+ - `distilbert-base-uncased`
+ - `sentence-transformers/all-MiniLM-L6-v2`
+ - `xlm-roberta-base`
+
+ ## How To Use

+ ### Transformers pipeline

+ ```python
+ from transformers import pipeline

+ # Load the published checkpoint and its tokenizer from the Hub.
+ classifier = pipeline(
+     "text-classification",
+     model="dongkoony/devops-incident-triage",
+     tokenizer="dongkoony/devops-incident-triage",
+ )
+
+ # The pipeline returns a list with the top label and its score.
+ result = classifier(
+     "GitHub Actions deployment failed because IAM role assumption was denied."
+ )
+ print(result)
  ```

+ ### With `AutoTokenizer` and `AutoModelForSequenceClassification`
+
+ ```python
+ import torch
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ model_id = "dongkoony/devops-incident-triage"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+ text = "EKS worker nodes became NotReady after CNI upgrade."
+ # Truncate to the 256-token maximum input length used in training.
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
+
+ # Inference only, so skip gradient tracking.
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ # Map the highest-scoring class index back to its label name.
+ predicted_id = int(logits.argmax(dim=-1))
+ print(model.config.id2label[predicted_id])
+ ```
+
+ ## Evaluation Artifacts
+
+ The project evaluation pipeline produces:
+
+ - `evaluation_metrics.json`
+ - `per_label_metrics.json`
+ - `threshold_metrics.json`
+ - `confusion_matrix.csv`
+ - `sample_predictions.jsonl`
+
+ These artifacts are generated in the project repository and are intended to make the evaluation process reproducible and inspectable.

  ## Limitations

+ - trained on synthetic incident text rather than real anonymized production tickets/logs
+ - single-label formulation, while real incidents may have multiple contributing domains
+ - long, noisy, or multi-line logs may require additional preprocessing
+ - classification confidence should not be treated as an operational decision guarantee

  ## Ethical and Operational Considerations

+ - keep a human in the loop for low-confidence or high-impact decisions
+ - do not use the model as the sole authority for remediation actions
+ - ensure sensitive log data is anonymized before retraining or evaluation
+ - review failure cases regularly to avoid silently reinforcing routing bias
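
The anonymization point above can be sketched as a simple regex-based redaction pass. This is only a starting point under stated assumptions: the pattern list and the `redact` helper are hypothetical, they cover just three common identifier shapes, and a real deployment would need a reviewed, much broader rule set.

```python
import re

# Illustrative patterns only; these are assumptions, not the project's rules.
REDACTION_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "<EMAIL>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IPV4>"),
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "<AWS_ACCESS_KEY_ID>"),
]


def redact(text):
    """Replace each matched identifier with a fixed placeholder."""
    for pattern, placeholder in REDACTION_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = "user alice@example.com from 10.0.3.17 used AKIAABCDEFGHIJKLMNOP"
print(redact(sample))
```

Fixed placeholders keep the sentence structure intact, so redacted text remains usable as classifier input.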
+
+ ## Recommended Next Steps
+
+ 1. Retrain on anonymized real incident data.
+ 2. Add multilabel classification experiments.
+ 3. Improve labeling guidelines and label quality review.
+ 4. Connect offline evaluation with online drift monitoring.

+ ## Citation

+ If you reference the project, please cite the GitHub repository and the released model version together so the implementation context and operational assumptions remain clear.