Arvind2006 committed on
Commit
3068b6b
·
verified ·
1 Parent(s): d7d9c59

Upload 13 files

Files changed (8)
  1. README.md +127 -10
  2. extract_error_features.py +24 -0
  3. image.png +0 -0
  4. ingest_docs.py +54 -0
  5. main.py +13 -0
  6. preload_model.py +9 -0
  7. requirements.txt +10 -0
  8. retrieve_docs.py +53 -0
README.md CHANGED
@@ -1,10 +1,127 @@
- ---
- title: Jenkins Error Explainer
- emoji: 📉
- colorFrom: indigo
- colorTo: green
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Jenkins Error Explainer
+
+ A documentation-grounded system that explains Jenkins pipeline errors using official Jenkins documentation.
+
+ This project analyzes raw Jenkins build logs, extracts structured error signals, retrieves relevant documentation sections, and generates clear, human-readable explanations without relying on supervised training data.
+
+ ---
+
+ ## Motivation
+
+ Jenkins error logs are often verbose, difficult to interpret, and highly contextual.
+ There is no standardized dataset of Jenkins errors or canonical explanations.
+
+ This project addresses that gap by:
+ - Using heuristic-based error feature extraction
+ - Grounding explanations strictly in official Jenkins documentation
+ - Avoiding hallucinated or unsafe advice
+
+ ---
+
+ ## What This Project Does
+
+ Given a Jenkins pipeline error log, the system:
+ 1. Extracts key error signals (syntax errors, missing agents, missing plugins, etc.)
+ 2. Retrieves relevant sections from Jenkins documentation
+ 3. Generates a structured explanation including:
+    - Error summary
+    - Likely causes
+    - Links to official documentation
+
+ ---
+
+ ## What This Project Does NOT Do
+
+ - Does not train or fine-tune a language model
+ - Does not rely on labeled error datasets
+ - Does not scrape community forums or Stack Overflow
+
+ ---
+
+ ## Project Structure
+
+ ![Project structure](image.png)
+
+ ## Error Categories Covered
+
+ - Pipeline syntax errors (invalid Groovy, missing braces)
+ - Missing agent or unavailable nodes
+ - Missing plugins / undefined DSL methods
+ - Missing credentials
+ - Workspace and file system errors
+ - Git / SCM authentication failures during checkout
+
+ ---
+
+ ## Model Usage
+
+ This project does not train or fine-tune any machine learning models.
+
+ Pre-trained sentence embedding models are used solely for semantic retrieval
+ over Jenkins documentation. No Jenkins error logs are used for training.
+
+ This design avoids the need for labeled datasets and ensures predictable,
+ reproducible behavior.
+
+ ---
+ ## Example
+
+ **Input:**
+ Raw Jenkins console output containing a Groovy syntax error.
+
+ **Output:**
+ Error Category:
+ groovy_syntax_error
+
+ Error Summary:
+ The pipeline failed due to a Groovy syntax error, most likely caused by an invalid or incomplete Jenkinsfile.
+
+ Likely Causes:
+ - Missing or mismatched braces in the Jenkinsfile
+ - Invalid declarative pipeline structure
+
+ Relevant Documentation:
+ - pipeline_syntax.txt (https://www.jenkins.io/doc/)
+ - using_a_jenkinsfile.txt (https://www.jenkins.io/doc/)
+ - using_a_jenkinsfile.txt (https://www.jenkins.io/doc/)
+ - using_a_jenkinsfile.txt (https://www.jenkins.io/doc/)
+ - using_a_jenkinsfile.txt (https://www.jenkins.io/doc/)
+
+ ## Design Principles
+
+ - Documentation-first retrieval (RAG)
+ - Heuristic-driven error understanding
+ - Explicit handling of uncertainty
+ - Reproducible and explainable behavior
+
+ ---
+
+ ## Future Extensions
+
+ - Jenkinsfile explanation support
+ - Plugin-aware error analysis
+ - CLI or web-based interface
+ - Version-aware documentation indexing
+
+ ---
+
+ ## Guarantees and Limitations
+
+ This tool does NOT act as a general-purpose AI assistant.
+
+ Guarantees:
+ - Explanations are grounded in official Jenkins documentation only.
+ - Error classification is deterministic and rule-based.
+ - If relevant documentation is not found, the tool explicitly reports uncertainty.
+
+ Limitations:
+ - The tool does not attempt to fix errors automatically.
+ - It does not infer causes beyond documented Jenkins behavior.
+ - Unknown or plugin-specific errors may be reported as unsupported.
+
+ ---
+
+ ## Disclaimer
+
+ This project is an independent personal project and is not affiliated with or endorsed by the Jenkins project.
+
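Note on the README's output format: the four-part explanation (category, summary, likely causes, documentation links) and the stated guarantee about reporting uncertainty could be rendered roughly as follows. This is only a sketch. The commit does not include `explain_error.py`, so `format_explanation`, its signature, and the wording of the uncertainty message are hypothetical; the `source_file` and `source` keys are the ones written by `ingest_docs.py`.

```python
from typing import Dict, List


def format_explanation(category: str, summary: str,
                       causes: List[str], docs: List[Dict[str, str]]) -> str:
    """Render the explanation layout shown in the README example.

    Hypothetical helper: name, signature, and wording are illustrative,
    not taken from the project's explain_error module.
    """
    lines = ["Error Category:", category, "", "Error Summary:", summary, ""]
    lines.append("Likely Causes:")
    lines.extend(f"- {cause}" for cause in causes)
    lines.append("")
    lines.append("Relevant Documentation:")
    if not docs:
        # README guarantee: report uncertainty rather than guess.
        lines.append("- No relevant documentation found; explanation is uncertain.")
    for doc in docs:
        lines.append(f"- {doc['source_file']} ({doc['source']})")
    return "\n".join(lines)


print(format_explanation(
    "groovy_syntax_error",
    "The pipeline failed due to a Groovy syntax error.",
    ["Missing or mismatched braces in the Jenkinsfile"],
    [{"source_file": "pipeline_syntax.txt", "source": "https://www.jenkins.io/doc/"}],
))
```

Keeping the empty-documentation case explicit in the formatter is one way to make the "reports uncertainty" guarantee hard to bypass by accident.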
extract_error_features.py ADDED
@@ -0,0 +1,24 @@
+ import re
+ from error_taxonomy import ERROR_CATEGORIES
+
+ COMPILED_PATTERNS = {
+     cat: [re.compile(p) for p in pats]
+     for cat, pats in ERROR_CATEGORIES.items()
+ }
+
+ def extract_error_features(log_text: str) -> dict:
+     result = {
+         "category": "unknown",
+         "matched_signals": [],
+         "line_numbers": []
+     }
+
+     for category, patterns in COMPILED_PATTERNS.items():
+         for pattern in patterns:
+             if pattern.search(log_text):
+                 result["category"] = category  # the last matching category wins
+                 result["matched_signals"].append(pattern.pattern)
+
+     result["line_numbers"] = re.findall(r"line (\d+)", log_text)
+
+     return result
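Note on `extract_error_features.py`: the module imports `ERROR_CATEGORIES` from `error_taxonomy`, which is not among the files shown in this diff. The sketch below assumes it is a dict mapping category names to lists of regex strings, and the two patterns are hypothetical stand-ins rather than the project's real taxonomy. The extraction logic is copied from the file above so the example runs on its own, and the sample log is the Groovy error quoted in `retrieve_docs.py`.

```python
import re

# Hypothetical stand-in for error_taxonomy.ERROR_CATEGORIES (not in this commit).
ERROR_CATEGORIES = {
    "groovy_syntax_error": [r"MultipleCompilationErrorsException", r"expecting '\}'"],
    "missing_plugin": [r"No such DSL method"],
}

COMPILED_PATTERNS = {
    cat: [re.compile(p) for p in pats]
    for cat, pats in ERROR_CATEGORIES.items()
}


def extract_error_features(log_text: str) -> dict:
    """Same logic as the committed function: scan every pattern, keep all hits."""
    result = {"category": "unknown", "matched_signals": [], "line_numbers": []}
    for category, patterns in COMPILED_PATTERNS.items():
        for pattern in patterns:
            if pattern.search(log_text):
                result["category"] = category  # last matching category wins
                result["matched_signals"].append(pattern.pattern)
    result["line_numbers"] = re.findall(r"line (\d+)", log_text)
    return result


SAMPLE_LOG = (
    "org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: "
    "WorkflowScript: 10: expecting '}', found '' @ line 10, column 1."
)

features = extract_error_features(SAMPLE_LOG)
print(features["category"], features["line_numbers"])
```

Because the loop never breaks, a log that matches patterns from several categories is labelled with whichever category comes last in dict order, while `matched_signals` accumulates patterns from all of them.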
image.png ADDED
ingest_docs.py ADDED
@@ -0,0 +1,54 @@
+ # ingest_docs.py
+
+ import os
+ import json
+ import faiss
+ from sentence_transformers import SentenceTransformer
+
+ RAW_DOCS_DIR = "data/docs/raw"
+ INDEX_PATH = "data/docs/docs.index"
+ META_PATH = "data/docs/docs_meta.json"
+
+ CHUNK_SIZE = 400
+
+ def chunk_text(text, size):
+     chunks = []
+     for i in range(0, len(text), size):
+         chunk = text[i:i+size].strip()
+         if chunk:
+             chunks.append(chunk)
+     return chunks
+
+ def main():
+     model = SentenceTransformer("paraphrase-MiniLM-L3-v2", cache_folder="./model_cache")
+
+     documents = []
+     metadata = []
+
+     for fname in os.listdir(RAW_DOCS_DIR):
+         path = os.path.join(RAW_DOCS_DIR, fname)
+         with open(path, "r", encoding="utf-8") as f:
+             text = f.read()
+
+         chunks = chunk_text(text, CHUNK_SIZE)
+         for chunk in chunks:
+             documents.append(chunk)
+             metadata.append({
+                 "source_file": fname,
+                 "source": "https://www.jenkins.io/doc/"
+             })
+
+     embeddings = model.encode(documents)
+     index = faiss.IndexFlatL2(embeddings.shape[1])
+     index.add(embeddings)
+
+     os.makedirs("data/docs", exist_ok=True)
+     faiss.write_index(index, INDEX_PATH)
+
+     with open(META_PATH, "w", encoding="utf-8") as f:
+         json.dump(metadata, f, indent=2)
+
+     print(f"Ingested {len(documents)} document chunks.")
+
+ if __name__ == "__main__":
+     main()
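Note on `ingest_docs.py`: `chunk_text` cuts the text every `CHUNK_SIZE` characters, so a sentence that straddles a boundary is split across two chunks. A common alternative, which is not what this commit does, is to let consecutive chunks overlap. The sketch below is a hypothetical variant: `chunk_text_overlap` and its `overlap` parameter are illustrative names, and with `overlap=0` it reduces to the committed behaviour.

```python
from typing import List


def chunk_text_overlap(text: str, size: int, overlap: int = 0) -> List[str]:
    """Split text into character windows of `size`, sharing `overlap` characters.

    Hypothetical variant of chunk_text; overlap=0 gives the same chunks
    as the fixed-size splitter in ingest_docs.py.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size].strip()
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break  # the window already reached the end of the text
    return chunks


print(chunk_text_overlap("abcdefghij", 4))             # fixed-size windows
print(chunk_text_overlap("abcdefghij", 4, overlap=2))  # windows sharing 2 chars
```

Overlap increases the number of chunks and therefore the index size, so it trades storage and ingest time for a lower chance of cutting a relevant passage in half.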
main.py ADDED
@@ -0,0 +1,13 @@
+ from fastapi import FastAPI
+ from explain_error import explain_error
+
+ app = FastAPI()
+
+ @app.get("/")
+ def home():
+     return {"message": "Jenkins Error Explainer API running"}
+
+ @app.post("/explain")
+ def explain(payload: dict):
+     log = payload["log_text"]
+     return explain_error(log)
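Note on `main.py`: the `/explain` route reads `payload["log_text"]` from a JSON body, so a client has to POST an object with that key. The sketch below builds such a request with the standard library only. The base URL `http://localhost:7860` is an assumption (the commit does not state where the app is served), and `build_explain_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_explain_request(base_url: str, log_text: str) -> urllib.request.Request:
    """Build a POST request for the /explain route defined in main.py."""
    body = json.dumps({"log_text": log_text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/explain",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local address; replace with wherever the API is actually running.
request = build_explain_request(
    "http://localhost:7860",
    "WorkflowScript: 10: expecting '}', found '' @ line 10, column 1.",
)
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

A body without the `log_text` key would raise `KeyError` inside the handler as written, so clients should always include it.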
preload_model.py ADDED
@@ -0,0 +1,9 @@
+ from sentence_transformers import SentenceTransformer
+
+ print("Downloading model...")
+ MODEL = SentenceTransformer(
+     "paraphrase-MiniLM-L3-v2",
+     cache_folder="./model_cache"
+ )
+
+ print("Model cached successfully!")
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ fastapi
+ uvicorn
+ sentence-transformers==2.2.2
+ huggingface_hub==0.19.4
+ faiss-cpu
+ numpy
+ pydantic
+ torch
+ transformers==4.35.2
+ tokenizers==0.15.2
retrieve_docs.py ADDED
@@ -0,0 +1,53 @@
+ # retrieve_docs.py
+
+ import json
+ import faiss
+ import numpy as np
+ from sentence_transformers import SentenceTransformer
+
+ INDEX_PATH = "data/docs/docs.index"
+ META_PATH = "data/docs/docs_meta.json"
+ TOP_K = 5
+
+ QUERY_TEMPLATES = {
+     "groovy_syntax_error": "Jenkins pipeline Groovy syntax error missing brace",
+     "missing_agent": "Jenkins pipeline agent none requires node context",
+     "no_node_available": "Jenkins no nodes with label scheduling executor",
+     "missing_plugin": "Jenkins No such DSL method pipeline step",
+     "missing_credentials": "Jenkins credentials not found pipeline",
+     "file_not_found": "Jenkins pipeline workspace file not found",
+     "git_authentication_error": "Jenkins git authentication failed checkout"
+ }
+ model = SentenceTransformer(
+     "paraphrase-MiniLM-L3-v2",
+     cache_folder="./model_cache"
+ )
+
+ def retrieve_docs(error_category: str):
+
+     index = faiss.read_index(INDEX_PATH)
+     with open(META_PATH, "r", encoding="utf-8") as f:
+         metadata = json.load(f)
+
+     query = QUERY_TEMPLATES.get(
+         error_category,
+         "Jenkins pipeline error"
+     )
+
+     query_embedding = model.encode([query])
+     distances, indices = index.search(query_embedding, TOP_K)
+
+     results = []
+     for idx in indices[0]:
+         results.append({
+             "text": None,
+             "meta": metadata[idx]
+         })
+
+     return results
+
+
+ if __name__ == "__main__":
+     results = retrieve_docs("groovy_syntax_error")  # expects a category key, not a raw log
+     for r in results:
+         print(r)
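Note on `retrieve_docs.py`: the loop appends one result per returned index, so several chunks from the same file produce repeated entries, as in the README example where `using_a_jenkinsfile.txt` is listed four times. FAISS also fills the index array with `-1` when fewer than `TOP_K` neighbours exist, and `metadata[-1]` would then silently return the last entry. The sketch below shows one possible post-processing step; `collect_unique_sources` is a hypothetical helper, not part of the commit, and it works on plain lists so it runs without FAISS installed.

```python
from typing import Dict, List, Sequence


def collect_unique_sources(indices: Sequence[int],
                           metadata: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Map search indices to metadata, skipping invalid ids and repeated files.

    Hypothetical helper: keeps the first (closest) hit per source_file and
    preserves the ranking order of the remaining hits.
    """
    seen = set()
    unique = []
    for idx in indices:
        if idx < 0 or idx >= len(metadata):
            continue  # -1 marks "no neighbour found" in FAISS results
        meta = metadata[idx]
        if meta["source_file"] in seen:
            continue
        seen.add(meta["source_file"])
        unique.append(meta)
    return unique


# Toy metadata in the shape written by ingest_docs.py.
metadata = [
    {"source_file": "pipeline_syntax.txt", "source": "https://www.jenkins.io/doc/"},
    {"source_file": "using_a_jenkinsfile.txt", "source": "https://www.jenkins.io/doc/"},
    {"source_file": "using_a_jenkinsfile.txt", "source": "https://www.jenkins.io/doc/"},
]

for meta in collect_unique_sources([0, 1, 2, 1, -1], metadata):
    print(meta["source_file"])
```

In `retrieve_docs`, a helper like this could take `indices[0]` in place of the current loop, at the cost of sometimes returning fewer than `TOP_K` entries.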