Upload 13 files
- README.md +127 -10
- extract_error_features.py +24 -0
- image.png +0 -0
- ingest_docs.py +54 -0
- main.py +13 -0
- preload_model.py +9 -0
- requirements.txt +10 -0
- retrieve_docs.py +53 -0
README.md
CHANGED
# Jenkins Error Explainer

A documentation-grounded system that explains Jenkins pipeline errors using official Jenkins documentation.

This project analyzes raw Jenkins build logs, extracts structured error signals, retrieves relevant documentation sections, and generates clear, human-readable explanations without relying on supervised training data.

---

## Motivation

Jenkins error logs are often verbose, difficult to interpret, and highly contextual. There is no standardized dataset of Jenkins errors or canonical explanations.

This project addresses that gap by:

- Using heuristic-based error feature extraction
- Grounding explanations strictly in official Jenkins documentation
- Avoiding hallucinated or unsafe advice

---

## What This Project Does

Given a Jenkins pipeline error log, the system:

1. Extracts key error signals (syntax errors, missing agents, missing plugins, etc.)
2. Retrieves relevant sections from Jenkins documentation
3. Generates a structured explanation including:
   - Error summary
   - Likely causes
   - Links to official documentation

---
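The three steps above can be sketched end to end in a few lines. This is a minimal, self-contained illustration: the regex patterns, the `classify_log`/`explain` helpers, and the canned documentation links are hypothetical stand-ins for `error_taxonomy.py`, `extract_error_features.py`, and the FAISS-backed retriever, not the project's actual rules.

```python
import re

# Hypothetical stand-ins for the real taxonomy and documentation index.
TOY_PATTERNS = {
    "groovy_syntax_error": [r"MultipleCompilationErrorsException", r"expecting '\}'"],
    "missing_plugin": [r"No such DSL method"],
}
TOY_DOCS = {
    "groovy_syntax_error": ["pipeline_syntax.txt (https://www.jenkins.io/doc/)"],
    "missing_plugin": ["using_a_jenkinsfile.txt (https://www.jenkins.io/doc/)"],
}

def classify_log(log_text: str) -> str:
    # Step 1: extract the error signal with regex heuristics.
    for category, patterns in TOY_PATTERNS.items():
        if any(re.search(p, log_text) for p in patterns):
            return category
    return "unknown"

def explain(log_text: str) -> dict:
    category = classify_log(log_text)   # step 1: error signal
    docs = TOY_DOCS.get(category, [])   # step 2: retrieve documentation
    if category == "unknown":
        summary = "No known error signal found in the log."
    else:
        summary = f"The pipeline failed with a {category.replace('_', ' ')}."
    return {"category": category, "summary": summary, "documentation": docs}  # step 3

log = "WorkflowScript: 10: expecting '}', found '' @ line 10, column 1."
print(explain(log)["category"])  # groovy_syntax_error
```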

## What This Project Does NOT Do

- Does not train or fine-tune a language model
- Does not rely on labeled error datasets
- Does not scrape community forums or StackOverflow

---

## Project Structure

![Project structure](image.png)

## Error Categories Covered

- Pipeline syntax errors (invalid Groovy, missing braces)
- Missing agent or unavailable nodes
- Missing plugins / undefined DSL methods
- Missing credentials
- Workspace and file system errors
- Git / SCM authentication failures during checkout

---
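These categories are matched against the console log by a regex taxonomy (`error_taxonomy.py`, which `extract_error_features.py` imports but which is not part of this upload). A plausible sketch of its shape, with illustrative patterns that are assumptions rather than the project's actual rules:

```python
import re

# Hypothetical sketch of ERROR_CATEGORIES (error_taxonomy.py): each
# category maps to regexes that are searched in the raw console log.
ERROR_CATEGORIES = {
    "groovy_syntax_error": [r"MultipleCompilationErrorsException", r"expecting '\}'"],
    "missing_agent": [r"Required context class hudson\.FilePath is missing"],
    "no_node_available": [r"There are no nodes with the label"],
    "missing_plugin": [r"No such DSL method"],
    "missing_credentials": [r"CredentialNotFoundException"],
    "file_not_found": [r"No such file or directory"],
    "git_authentication_error": [r"Authentication failed"],
}

# Mirror the pre-compilation step in extract_error_features.py.
COMPILED = {cat: [re.compile(p) for p in pats] for cat, pats in ERROR_CATEGORIES.items()}
print(len(COMPILED))  # 7
```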
## Model Usage

This project does not train or fine-tune any machine learning models.

Pre-trained sentence embedding models are used solely for semantic retrieval over Jenkins documentation. No Jenkins error logs are used for training.

This design avoids the need for labeled datasets and ensures predictable, reproducible behavior.

---
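Concretely, semantic retrieval means embedding the documentation chunks and the query into vectors and taking nearest neighbours. The project does this with `sentence-transformers` embeddings in a FAISS index; the sketch below swaps in a toy bag-of-words "embedding" and brute-force cosine similarity so the mechanism is visible without downloading a model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a sentence-embedding model: word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Declarative pipeline syntax requires matching braces in the Jenkinsfile",
    "Credentials can be bound to pipeline steps with the credentials helper",
]

query = "Jenkins pipeline Groovy syntax error missing brace"
q = embed(query)
# Rank chunks by similarity to the query; FAISS does the same at scale.
best = max(chunks, key=lambda chunk: cosine(q, embed(chunk)))
print(best)
```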
## Example

**Input:**
Raw Jenkins console output containing a Groovy syntax error.

**Output:**

    Error Category:
    groovy_syntax_error

    Error Summary:
    The pipeline failed due to a Groovy syntax error, most likely caused by an invalid or incomplete Jenkinsfile.

    Likely Causes:
    - Missing or mismatched braces in the Jenkinsfile
    - Invalid declarative pipeline structure

    Relevant Documentation:
    - pipeline_syntax.txt (https://www.jenkins.io/doc/)
    - using_a_jenkinsfile.txt (https://www.jenkins.io/doc/)

---
## Design Principles

- Documentation-first retrieval (RAG)
- Heuristic-driven error understanding
- Explicit handling of uncertainty
- Reproducible and explainable behavior

---
## Future Extensions

- Jenkinsfile explanation support
- Plugin-aware error analysis
- CLI or web-based interface
- Version-aware documentation indexing

---
## Guarantees and Limitations

This tool does NOT act as a general-purpose AI assistant.

Guarantees:

- Explanations are grounded in official Jenkins documentation only.
- Error classification is deterministic and rule-based.
- If relevant documentation is not found, the tool explicitly reports uncertainty.

Limitations:

- The tool does not attempt to fix errors automatically.
- It does not infer causes beyond documented Jenkins behavior.
- Unknown or plugin-specific errors may be reported as unsupported.

---
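The third guarantee implies a retrieval-quality check. One common way to implement it, sketched here with a hypothetical L2-distance threshold (the project's actual cutoff, if any, is not shown in this upload): if even the best match is too far away, report uncertainty instead of explaining.

```python
# Hypothetical uncertainty gate over FAISS retrieval results.
# `distances` plays the role of the first row returned by index.search
# (L2 distances, smaller = more relevant).
MAX_L2_DISTANCE = 1.2  # assumed threshold, not taken from the project

def grounded_or_uncertain(distances: list, results: list) -> dict:
    if not distances or min(distances) > MAX_L2_DISTANCE:
        # Explicitly report uncertainty instead of fabricating an answer.
        return {"status": "uncertain",
                "message": "No sufficiently relevant Jenkins documentation found."}
    return {"status": "grounded", "results": results}

print(grounded_or_uncertain([2.7, 3.1], [])["status"])  # uncertain
```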
## Disclaimer

This project is an independent personal project and is not affiliated with or endorsed by the Jenkins project.
extract_error_features.py
ADDED
import re
from error_taxonomy import ERROR_CATEGORIES

# Compile the taxonomy's regexes once at import time.
COMPILED_PATTERNS = {
    cat: [re.compile(p) for p in pats]
    for cat, pats in ERROR_CATEGORIES.items()
}

def extract_error_features(log_text: str) -> dict:
    result = {
        "category": "unknown",
        "matched_signals": [],
        "line_numbers": []
    }

    for category, patterns in COMPILED_PATTERNS.items():
        for pattern in patterns:
            if pattern.search(log_text):
                # First matching category wins; later matches still
                # contribute their signals.
                if result["category"] == "unknown":
                    result["category"] = category
                result["matched_signals"].append(pattern.pattern)

    result["line_numbers"] = re.findall(r"line (\d+)", log_text)

    return result
image.png
ADDED
ingest_docs.py
ADDED
# ingest_docs.py

import os
import json

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

RAW_DOCS_DIR = "data/docs/raw"
INDEX_PATH = "data/docs/docs.index"
META_PATH = "data/docs/docs_meta.json"

# Fixed-size character chunks: simple, at the cost of splitting sentences.
CHUNK_SIZE = 400

def chunk_text(text, size):
    chunks = []
    for i in range(0, len(text), size):
        chunk = text[i:i + size].strip()
        if chunk:
            chunks.append(chunk)
    return chunks

def main():
    model = SentenceTransformer("paraphrase-MiniLM-L3-v2", cache_folder="./model_cache")

    documents = []
    metadata = []

    for fname in os.listdir(RAW_DOCS_DIR):
        path = os.path.join(RAW_DOCS_DIR, fname)
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()

        for chunk in chunk_text(text, CHUNK_SIZE):
            documents.append(chunk)
            metadata.append({
                "text": chunk,  # keep the chunk text so retrieval can return it
                "source_file": fname,
                "source": "https://www.jenkins.io/doc/"
            })

    embeddings = model.encode(documents)
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(np.asarray(embeddings, dtype="float32"))

    os.makedirs("data/docs", exist_ok=True)
    faiss.write_index(index, INDEX_PATH)

    with open(META_PATH, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)

    print(f"Ingested {len(documents)} document chunks.")

if __name__ == "__main__":
    main()
main.py
ADDED
from fastapi import FastAPI
from explain_error import explain_error

app = FastAPI()

@app.get("/")
def home():
    return {"message": "Jenkins Error Explainer API running"}

@app.post("/explain")
def explain(payload: dict):
    # Avoid a KeyError (500 response) when the client omits "log_text".
    log = payload.get("log_text", "")
    return explain_error(log)
preload_model.py
ADDED
from sentence_transformers import SentenceTransformer

print("Downloading model...")
MODEL = SentenceTransformer(
    "paraphrase-MiniLM-L3-v2",
    cache_folder="./model_cache"
)

print("Model cached successfully!")
requirements.txt
ADDED
fastapi
uvicorn
sentence-transformers==2.2.2
huggingface_hub==0.19.4
faiss-cpu
numpy
pydantic
torch
transformers==4.35.2
tokenizers==0.15.2
retrieve_docs.py
ADDED
# retrieve_docs.py

import json

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

INDEX_PATH = "data/docs/docs.index"
META_PATH = "data/docs/docs_meta.json"
TOP_K = 5

# One hand-written retrieval query per error category; unknown
# categories fall back to a generic query.
QUERY_TEMPLATES = {
    "groovy_syntax_error": "Jenkins pipeline Groovy syntax error missing brace",
    "missing_agent": "Jenkins pipeline agent none requires node context",
    "no_node_available": "Jenkins no nodes with label scheduling executor",
    "missing_plugin": "Jenkins No such DSL method pipeline step",
    "missing_credentials": "Jenkins credentials not found pipeline",
    "file_not_found": "Jenkins pipeline workspace file not found",
    "git_authentication_error": "Jenkins git authentication failed checkout"
}

model = SentenceTransformer(
    "paraphrase-MiniLM-L3-v2",
    cache_folder="./model_cache"
)

def retrieve_docs(error_category: str):
    index = faiss.read_index(INDEX_PATH)
    with open(META_PATH, "r", encoding="utf-8") as f:
        metadata = json.load(f)

    query = QUERY_TEMPLATES.get(error_category, "Jenkins pipeline error")

    query_embedding = np.asarray(model.encode([query]), dtype="float32")
    distances, indices = index.search(query_embedding, TOP_K)

    results = []
    for idx in indices[0]:
        results.append({
            # Chunk text, if ingest_docs stored it in the metadata.
            "text": metadata[idx].get("text"),
            "meta": metadata[idx]
        })

    return results

if __name__ == "__main__":
    # retrieve_docs expects an error category (a QUERY_TEMPLATES key),
    # not a raw console log.
    results = retrieve_docs("groovy_syntax_error")
    for r in results:
        print(r)