Vito Proscia committed on
Commit 4ba57df · unverified · 2 Parent(s): 1708201 94af946

Merge pull request #32 from se4ai2526-uniba/grafana-drift-monitoring

docker-compose.yml CHANGED
@@ -72,6 +72,50 @@ services:
       - hopcroft-net
     restart: unless-stopped
 
+  grafana:
+    image: grafana/grafana:latest
+    container_name: grafana
+    ports:
+      - "3000:3000"
+    environment:
+      - GF_SECURITY_ADMIN_USER=admin
+      - GF_SECURITY_ADMIN_PASSWORD=admin
+      - GF_USERS_ALLOW_SIGN_UP=false
+      - GF_SERVER_ROOT_URL=http://localhost:3000
+    volumes:
+      # Provisioning: auto-configure datasources and dashboards
+      - ./monitoring/grafana/provisioning/datasources:/etc/grafana/provisioning/datasources
+      - ./monitoring/grafana/provisioning/dashboards:/etc/grafana/provisioning/dashboards
+      - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards
+      # Persistent storage for Grafana data
+      - grafana-data:/var/lib/grafana
+    networks:
+      - hopcroft-net
+    depends_on:
+      - prometheus
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+
+  pushgateway:
+    image: prom/pushgateway:latest
+    container_name: pushgateway
+    ports:
+      - "9091:9091"
+    networks:
+      - hopcroft-net
+    restart: unless-stopped
+    command:
+      - '--web.listen-address=:9091'
+      - '--persistence.file=/data/pushgateway.data'
+      - '--persistence.interval=5m'
+    volumes:
+      - pushgateway-data:/data
+
+
 networks:
   hopcroft-net:
     driver: bridge
@@ -79,3 +123,7 @@ networks:
 volumes:
   hopcroft-logs:
     driver: local
+  grafana-data:
+    driver: local
+  pushgateway-data:
+    driver: local
monitoring/README.md CHANGED
@@ -55,3 +55,153 @@ We used Better Stack Uptime to monitor the availability of the production deploy
 - A failure scenario was tested to confirm Better Stack reports the server error details.
 
 - Screenshots are available in `monitoring/screenshots/`.
+
+---
+
+## Grafana Dashboard
+
+Grafana visualizes system metrics and the result of the latest drift check.
+
+### Configuration
+- **Port**: `3000`
+- **Credentials**: `admin` / `admin`
+- **Dashboard**: Hopcroft ML Model Monitoring
+- **Datasource**: Prometheus (auto-provisioned)
+- **Provisioning Files**:
+  - Datasources: `grafana/provisioning/datasources/prometheus.yml`
+  - Dashboards: `grafana/provisioning/dashboards/dashboard.yml`
+  - Dashboard JSON: `grafana/dashboards/hopcroft_dashboard.json`
+
+### Dashboard Panels
+1. **Request Rate**: Requests per second handled by the API
+2. **Request Latency (p50, p95)**: Median and 95th-percentile response time, in milliseconds
+3. **Data Drift Status**: Result of the latest drift check (0 = no drift, 1 = drift detected)
+4. **Drift P-Value**: Smallest per-feature p-value from the latest check (lower = stronger evidence of drift)
+5. **Drift Distance Over Time**: Largest per-feature Kolmogorov-Smirnov statistic
+
+### Access
+Navigate to `http://localhost:3000` and log in with the credentials above. The dashboard refreshes every 10 seconds.
+
+---
+
+## Data Drift Detection
+
+Automated detection of distribution shift in model input data, based on statistical testing.
+
+### Algorithm
+- **Method**: Kolmogorov-Smirnov two-sample test (`scipy.stats.ks_2samp`), run once per feature
+- **Baseline Data**: 1000 samples from the training set
+- **Detection Threshold**: drift is flagged when the smallest per-feature p-value is below 0.05 / number of features (Bonferroni correction)
+- **Metrics Published**: `drift_detected`, `drift_p_value`, `drift_distance`, `drift_check_timestamp`
+
+### Scripts
+
+#### Baseline Preparation
+**Script**: `drift/scripts/prepare_baseline.py`
+
+Functionality:
+- Loads up to 10,000 rows from the SQLite database (`data/raw/skillscope_data.db`)
+- Extracts numeric features only
+- Samples 1000 records (stratified by `label` when that column exists)
+- Saves to `drift/baseline/reference_data.pkl`
+
+Usage:
+```bash
+cd monitoring/drift/scripts
+python prepare_baseline.py
+```
+
+#### Drift Detection
+**Script**: `drift/scripts/run_drift_check.py`
+
+Functionality:
+- Loads baseline reference data
+- Loads new data from `data/test.csv`, or simulates it from the baseline when that file is missing
+- Performs a KS test on each feature
+- Pushes metrics to Pushgateway
+- Saves results to `drift/reports/`
+
+Usage:
+```bash
+cd monitoring/drift/scripts
+python run_drift_check.py
+```
+
+### Verification
+Check Pushgateway metrics:
+```bash
+curl http://localhost:9091/metrics | grep drift
+```
+
+Query in Prometheus:
+```promql
+drift_detected
+drift_p_value
+drift_distance
+```
+
+---
+
+## Pushgateway
+
+Pushgateway collects metrics from short-lived jobs such as the drift detection script.
+
+### Configuration
+- **Port**: `9091`
+- **Persistence**: Enabled, written to disk every 5 minutes
+- **Data Volume**: `pushgateway-data`
+
+### Metrics Endpoint
+Access metrics at `http://localhost:9091/metrics`
+
+### Integration
+The drift detection script pushes metrics to Pushgateway; Prometheus scrapes them from there and Grafana displays them.
+
+---
+
+## Alerting
+
+Alert rules are defined in `prometheus/alert_rules.yml`:
+
+- **High Latency**: Triggered when average latency exceeds 2 seconds
+- **High Error Rate**: Triggered when error rate exceeds 5%
+- **Data Drift Detected**: Triggered when `drift_detected` = 1
+
+Alerts are routed to Alertmanager (`http://localhost:9093`), which can be configured in `alertmanager/config.yml` to send notifications via email, Slack, or other channels.
+
+---
+
+## Complete Stack Usage
+
+### Starting All Services
+```bash
+# Start all monitoring services
+docker compose up -d
+
+# Verify all containers are running
+docker compose ps
+
+# Check Prometheus targets (JSON from the HTTP API)
+curl http://localhost:9090/api/v1/targets
+
+# Check Grafana health
+curl http://localhost:3000/api/health
+```
+
+### Running Drift Detection Workflow
+
+1. **Prepare Baseline (One-time setup)**
+   ```bash
+   cd monitoring/drift/scripts
+   python prepare_baseline.py
+   ```
+
+2. **Execute Drift Check**
+   ```bash
+   python run_drift_check.py
+   ```
+
+3. **Verify Results**
+   - Check Pushgateway: `http://localhost:9091`
+   - Check Prometheus: `http://localhost:9090/graph`
+   - Check Grafana: `http://localhost:3000`
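The README's verification step pipes the Pushgateway `/metrics` output through `grep drift`. As an illustration only (not part of this commit), the same check can be sketched in Python by parsing the Prometheus text exposition format. The function name `parse_drift_metrics` and the `SAMPLE` payload are hypothetical; the sample assumes Pushgateway attaches the `job` and `instance` labels taken from the push URL and that samples carry no timestamps.

```python
"""Sketch: extract drift_* samples from Prometheus text exposition output."""


def parse_drift_metrics(text: str) -> dict[str, float]:
    """Return {metric_name: value} for `drift_*` samples without timestamps.

    Comment lines (# HELP / # TYPE) and blank lines are skipped, and any
    label set (for example job/instance) is dropped from the key.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name_part, _, value_part = line.rpartition(" ")
        name = name_part.split("{", 1)[0].strip()
        if name.startswith("drift_"):
            samples[name] = float(value_part)
    return samples


# Hypothetical payload, shaped like the output of `curl .../metrics`.
SAMPLE = """\
# TYPE drift_detected gauge
drift_detected{instance="hopcroft",job="drift_detection"} 1
# TYPE drift_p_value gauge
drift_p_value{instance="hopcroft",job="drift_detection"} 0.0004
# TYPE drift_distance gauge
drift_distance{instance="hopcroft",job="drift_detection"} 0.31
push_time_seconds{instance="hopcroft",job="drift_detection"} 1.7e+09
"""

if __name__ == "__main__":
    print(parse_drift_metrics(SAMPLE))
```

In practice the text would come from `http://localhost:9091/metrics`; keeping the parser separate from the HTTP call makes it easy to test without a running stack.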
monitoring/drift/scripts/prepare_baseline.py ADDED
@@ -0,0 +1,119 @@
+"""
+Prepare baseline/reference data for drift detection.
+This script samples representative data from the training set.
+"""
+
+import pickle
+import pandas as pd
+import numpy as np
+import sqlite3
+from pathlib import Path
+from sklearn.model_selection import train_test_split
+
+# Paths
+PROJECT_ROOT = Path(__file__).parent.parent.parent.parent
+BASELINE_DIR = Path(__file__).parent.parent / "baseline"
+BASELINE_DIR.mkdir(parents=True, exist_ok=True)
+
+
+def load_training_data():
+    """Load the original training dataset from SQLite database."""
+    # Load from SQLite database
+    db_path = PROJECT_ROOT / "data" / "raw" / "skillscope_data.db"
+
+    if not db_path.exists():
+        raise FileNotFoundError(f"Database not found at {db_path}")
+
+    print(f"Loading data from database: {db_path}")
+    conn = sqlite3.connect(db_path)
+
+    # Load the first 10,000 rows of the main table
+    query = "SELECT * FROM nlbse_tool_competition_data_by_issue LIMIT 10000"
+    df = pd.read_sql_query(query, conn)
+    conn.close()
+
+    print(f"Loaded {len(df)} training samples")
+    return df
+
+
+def prepare_baseline(df, sample_size=1000, random_state=42):
+    """
+    Sample representative baseline data.
+
+    Args:
+        df: Training dataframe
+        sample_size: Number of samples for baseline
+        random_state: Random seed for reproducibility
+
+    Returns:
+        Baseline dataframe
+    """
+    # Stratify on the label when it exists and the frame is larger than the sample
+    if 'label' in df.columns and sample_size < len(df):
+        _, baseline_df = train_test_split(
+            df,
+            test_size=sample_size,
+            random_state=random_state,
+            stratify=df['label']
+        )
+    else:
+        baseline_df = df.sample(n=min(sample_size, len(df)), random_state=random_state)
+
+    print(f"Sampled {len(baseline_df)} baseline samples")
+    return baseline_df
+
+
+def extract_features(df):
+    """
+    Extract features used for drift detection.
+    Should match the features used by your model.
+    """
+
+    # Select only numeric columns, exclude labels and IDs
+    numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
+    exclude_cols = ['label', 'id', 'timestamp', 'issue_id', 'file_id', 'method_id', 'class_id']
+    feature_columns = [col for col in numeric_cols if col not in exclude_cols]
+
+    X = df[feature_columns].values
+
+    print(f"Extracted {X.shape[1]} numeric features from {X.shape[0]} samples")
+    return X
+
+
+def save_baseline(baseline_data, filename="reference_data.pkl"):
+    """Save baseline data to disk."""
+    baseline_path = BASELINE_DIR / filename
+
+    with open(baseline_path, 'wb') as f:
+        pickle.dump(baseline_data, f)
+
+    print(f"Baseline saved to {baseline_path}")
+    print(f" Shape: {baseline_data.shape}")
+    print(f" Size: {baseline_path.stat().st_size / 1024:.2f} KB")
+
+
+def main():
+    """Main execution."""
+    print("=" * 60)
+    print("Preparing Baseline Data for Drift Detection")
+    print("=" * 60)
+
+    # Load data
+    df = load_training_data()
+
+    # Sample baseline
+    baseline_df = prepare_baseline(df, sample_size=1000)
+
+    # Extract features
+    X_baseline = extract_features(baseline_df)
+
+    # Save
+    save_baseline(X_baseline)
+
+    print("\n" + "=" * 60)
+    print("Baseline preparation complete!")
+    print("=" * 60)
+
+
+if __name__ == "__main__":
+    main()
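`prepare_baseline` relies on `train_test_split(..., stratify=...)` to keep label proportions in the sample. As an illustration only (not part of this commit), the idea behind stratified sampling can be sketched with the standard library: each group receives the floor of its proportional share, and any shortfall is topped up at random. The helper name `stratified_sample` and its signature are hypothetical, and scikit-learn's implementation differs in detail.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, sample_size, seed=42):
    """Draw `sample_size` rows, keeping each group's share roughly proportional."""
    rows = list(rows)
    if sample_size >= len(rows):
        return rows  # nothing to sample; return everything
    rng = random.Random(seed)

    # Bucket the rows by their stratum (for example, the label).
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    sample, leftovers = [], []
    for label in sorted(groups, key=repr):  # fixed order keeps runs reproducible
        members = groups[label]
        rng.shuffle(members)
        quota = sample_size * len(members) // len(rows)  # floor of the proportional share
        sample.extend(members[:quota])
        leftovers.extend(members[quota:])

    # Flooring can leave a shortfall; top up at random from rows not yet chosen.
    rng.shuffle(leftovers)
    sample.extend(leftovers[: sample_size - len(sample)])
    return sample
```

For example, sampling 20 rows from 90 rows of class `a` and 10 rows of class `b` yields 18 and 2 rows respectively, and asking for more rows than exist returns the input unchanged instead of raising.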
monitoring/drift/scripts/requirements.txt ADDED
@@ -0,0 +1,7 @@
+alibi-detect>=0.11.4
+pandas>=2.0.0
+numpy>=1.24.0
+scikit-learn>=1.3.0
+requests>=2.31.0
+mlflow>=2.8.0
+scipy  # imported directly by run_drift_check.py (ks_2samp); left unpinned here
monitoring/drift/scripts/run_drift_check.py ADDED
@@ -0,0 +1,216 @@
+"""
+Data Drift Detection using Scipy KS Test.
+Detects distribution shifts between baseline and new data.
+"""
+
+import pickle
+import json
+import requests
+import numpy as np
+import pandas as pd
+from pathlib import Path
+from datetime import datetime
+from scipy.stats import ks_2samp
+from typing import Dict
+
+# Configuration
+PROJECT_ROOT = Path(__file__).parent.parent.parent.parent
+BASELINE_DIR = Path(__file__).parent.parent / "baseline"
+REPORTS_DIR = Path(__file__).parent.parent / "reports"
+REPORTS_DIR.mkdir(parents=True, exist_ok=True)
+
+PUSHGATEWAY_URL = "http://localhost:9091"
+P_VALUE_THRESHOLD = 0.05  # Significance level (before Bonferroni correction)
+
+
+def load_baseline() -> np.ndarray:
+    """Load reference/baseline data."""
+    baseline_path = BASELINE_DIR / "reference_data.pkl"
+
+    if not baseline_path.exists():
+        raise FileNotFoundError(
+            f"Baseline data not found at {baseline_path}\n"
+            f"Run `python prepare_baseline.py` first!"
+        )
+
+    with open(baseline_path, 'rb') as f:
+        X_baseline = pickle.load(f)
+
+    print(f"Loaded baseline data: {X_baseline.shape}")
+    return X_baseline
+
+
+def load_new_data() -> np.ndarray:
+    """
+    Load new/production data to check for drift.
+
+    In production, this would fetch from:
+    - Database
+    - S3 bucket
+    - API logs
+    - Data lake
+
+    For now, simulate or load from file.
+    """
+
+    # Option 1: Load from file
+    data_path = PROJECT_ROOT / "data" / "test.csv"
+    if data_path.exists():
+        df = pd.read_csv(data_path)
+        # Extract same features as baseline: numeric columns only, minus labels/IDs
+        feature_columns = [col for col in df.select_dtypes(include=[np.number]).columns if col not in ['label', 'id', 'timestamp', 'issue_id', 'file_id', 'method_id', 'class_id']]
+        X_new = df[feature_columns].values[:500]  # Take 500 samples
+        print(f"Loaded new data from file: {X_new.shape}")
+        return X_new
+
+    # Option 2: Simulate (for testing)
+    print("Simulating new data (no test file found)")
+    X_baseline = load_baseline()
+    # Add zero-mean noise to up to 500 baseline rows to simulate slight drift
+    X_new = X_baseline[:500] + np.random.normal(0, 0.1, X_baseline[:500].shape)
+    return X_new
+
+
+def run_drift_detection(X_baseline: np.ndarray, X_new: np.ndarray) -> Dict:
+    """
+    Run Kolmogorov-Smirnov drift detection using scipy.
+
+    Args:
+        X_baseline: Reference data
+        X_new: New data to check
+
+    Returns:
+        Drift detection results
+    """
+    print("\n" + "=" * 60)
+    print("Running Drift Detection (Kolmogorov-Smirnov Test)")
+    print("=" * 60)
+
+    # Run KS test for each feature
+    p_values = []
+    distances = []
+
+    for i in range(X_baseline.shape[1]):
+        statistic, p_value = ks_2samp(X_baseline[:, i], X_new[:, i])
+        p_values.append(p_value)
+        distances.append(statistic)
+
+    # Aggregate results
+    min_p_value = np.min(p_values)
+    max_distance = np.max(distances)
+
+    # Apply Bonferroni correction for multiple testing
+    adjusted_threshold = P_VALUE_THRESHOLD / X_baseline.shape[1]
+    drift_detected = min_p_value < adjusted_threshold
+
+    # Extract results
+    results = {
+        "timestamp": datetime.now().isoformat(),
+        "drift_detected": int(drift_detected),
+        "p_value": float(min_p_value),
+        "threshold": adjusted_threshold,
+        "distance": float(max_distance),
+        "baseline_samples": X_baseline.shape[0],
+        "new_samples": X_new.shape[0],
+        "num_features": X_baseline.shape[1]
+    }
+
+    # Print results
+    print(f"\nResults:")
+    print(f" Drift Detected: {'YES' if results['drift_detected'] else 'NO'}")
+    print(f" P-Value: {results['p_value']:.6f} (adjusted threshold: {adjusted_threshold:.6f})")
+    print(f" Distance: {results['distance']:.6f}")
+    print(f" Baseline: {X_baseline.shape[0]} samples")
+    print(f" New Data: {X_new.shape[0]} samples")
+
+    return results
+
+
+def save_report(results: Dict):
+    """Save drift detection report to file."""
+    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+    report_path = REPORTS_DIR / f"drift_report_{timestamp}.json"
+
+    with open(report_path, 'w') as f:
+        json.dump(results, f, indent=2)
+
+    print(f"\nReport saved to: {report_path}")
+
+
+def push_to_prometheus(results: Dict):
+    """
+    Push drift metrics to Prometheus via Pushgateway.
+
+    This allows Prometheus to scrape short-lived job metrics.
+    """
+    metrics = f"""# TYPE drift_detected gauge
+# HELP drift_detected Whether data drift was detected (1=yes, 0=no)
+drift_detected {results['drift_detected']}
+
+# TYPE drift_p_value gauge
+# HELP drift_p_value P-value from drift detection test
+drift_p_value {results['p_value']}
+
+# TYPE drift_distance gauge
+# HELP drift_distance Statistical distance between distributions
+drift_distance {results['distance']}
+
+# TYPE drift_check_timestamp gauge
+# HELP drift_check_timestamp Unix timestamp of last drift check
+drift_check_timestamp {datetime.now().timestamp()}
+"""
+
+    try:
+        response = requests.post(
+            f"{PUSHGATEWAY_URL}/metrics/job/drift_detection/instance/hopcroft",
+            data=metrics,
+            headers={'Content-Type': 'text/plain'}, timeout=10
+        )
+        response.raise_for_status()
+        print(f"Metrics pushed to Pushgateway at {PUSHGATEWAY_URL}")
+    except requests.exceptions.RequestException as e:
+        print(f"Failed to push to Pushgateway: {e}")
+        print(f" Make sure Pushgateway is running: docker compose ps pushgateway")
+
+
+def main():
+    """Main execution."""
+    print("\n" + "=" * 60)
+    print("Hopcroft Data Drift Detection")
+    print("=" * 60)
+
+    try:
+        # Load data
+        X_baseline = load_baseline()
+        X_new = load_new_data()
+
+        # Run drift detection
+        results = run_drift_detection(X_baseline, X_new)
+
+        # Save report
+        save_report(results)
+
+        # Push to Prometheus
+        push_to_prometheus(results)
+
+        print("\n" + "=" * 60)
+        print("Drift Detection Complete!")
+        print("=" * 60)
+
+        if results['drift_detected']:
+            print("\nWARNING: Data drift detected!")
+            print(f" P-value: {results['p_value']:.6f} < {results['threshold']:.6f}")
+            return 1
+        else:
+            print("\nNo significant drift detected")
+            return 0
+
+    except Exception as e:
+        print(f"\nError: {e}")
+        import traceback
+        traceback.print_exc()
+        return 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
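`run_drift_detection` combines two ideas: a per-feature Kolmogorov-Smirnov distance and a Bonferroni-corrected decision rule. As an illustration only (not part of this commit), both can be sketched with the standard library. The helper names `ks_statistic` and `bonferroni_drift` are hypothetical. The sketch computes only the KS distance, not its p-value; the script obtains both from `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS distance: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The supremum of |F_a(x) - F_b(x)| is reached at an observed data point,
    # so it is enough to evaluate both step functions at every pooled value.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def bonferroni_drift(p_values, alpha=0.05):
    """Flag drift when any per-feature p-value falls below alpha / number of tests."""
    threshold = alpha / len(p_values)
    return min(p_values) < threshold, threshold
```

With four features and `alpha=0.05` the per-test threshold becomes 0.0125, so a smallest p-value of 0.03 would not be flagged even though it is below 0.05. Dividing by the number of features keeps the overall false-alarm rate near `alpha`, at the cost of some sensitivity.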
monitoring/grafana/dashboards/hopcroft_dashboard.json ADDED
@@ -0,0 +1,358 @@
+{
+  "annotations": {
+    "list": [
+      {
+        "builtIn": 1,
+        "datasource": "-- Grafana --",
+        "enable": true,
+        "hide": true,
+        "iconColor": "rgba(0, 211, 255, 1)",
+        "name": "Annotations & Alerts",
+        "type": "dashboard"
+      }
+    ]
+  },
+  "editable": true,
+  "gnetId": null,
+  "graphTooltip": 1,
+  "id": null,
+  "links": [],
+  "panels": [
+    {
+      "datasource": "Prometheus",
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "thresholds"
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              },
+              {
+                "color": "red",
+                "value": 80
+              }
+            ]
+          },
+          "unit": "reqps"
+        }
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 6,
+        "x": 0,
+        "y": 0
+      },
+      "id": 1,
+      "options": {
+        "orientation": "auto",
+        "reduceOptions": {
+          "calcs": ["lastNotNull"],
+          "fields": "",
+          "values": false
+        },
+        "showThresholdLabels": false,
+        "showThresholdMarkers": true
+      },
+      "pluginVersion": "9.0.0",
+      "targets": [
+        {
+          "expr": "rate(fastapi_requests_total[1m])",
+          "refId": "A"
+        }
+      ],
+      "title": "Request Rate",
+      "type": "gauge",
+      "description": "Number of requests per second handled by the API"
+    },
+    {
+      "datasource": "Prometheus",
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": true
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "ms"
+        }
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 18,
+        "x": 6,
+        "y": 0
+      },
+      "id": 2,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "right"
+        },
+        "tooltip": {
+          "mode": "multi"
+        }
+      },
+      "pluginVersion": "9.0.0",
+      "targets": [
+        {
+          "expr": "histogram_quantile(0.95, rate(fastapi_request_duration_seconds_bucket[5m])) * 1000",
+          "legendFormat": "p95",
+          "refId": "A"
+        },
+        {
+          "expr": "histogram_quantile(0.50, rate(fastapi_request_duration_seconds_bucket[5m])) * 1000",
+          "legendFormat": "p50 (median)",
+          "refId": "B"
+        }
+      ],
+      "title": "Request Latency (p50, p95)",
+      "type": "timeseries",
+      "description": "API response time percentiles over time"
+    },
+    {
+      "datasource": "Prometheus",
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "thresholds"
+          },
+          "mappings": [
+            {
+              "options": {
+                "0": {
+                  "color": "green",
+                  "index": 1,
+                  "text": "No Drift"
+                },
+                "1": {
+                  "color": "red",
+                  "index": 0,
+                  "text": "Drift Detected"
+                }
+              },
+              "type": "value"
+            }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          }
+        }
+      },
+      "gridPos": {
+        "h": 6,
+        "w": 6,
+        "x": 0,
+        "y": 8
+      },
+      "id": 3,
+      "options": {
+        "orientation": "auto",
+        "reduceOptions": {
+          "calcs": ["lastNotNull"],
+          "fields": "",
+          "values": false
+        },
+        "showThresholdLabels": false,
+        "showThresholdMarkers": true,
+        "text": {}
+      },
+      "pluginVersion": "9.0.0",
+      "targets": [
+        {
+          "expr": "drift_detected",
+          "refId": "A"
+        }
+      ],
+      "title": "Data Drift Status",
+      "type": "stat",
+      "description": "Current data drift detection status (1 = drift detected, 0 = no drift)"
+    },
+    {
+      "datasource": "Prometheus",
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "thresholds"
+          },
+          "decimals": 4,
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "red",
+                "value": null
+              },
+              {
+                "color": "yellow",
+                "value": 0.01
+              },
+              {
+                "color": "green",
+                "value": 0.05
+              }
+            ]
+          },
+          "unit": "short"
+        }
+      },
+      "gridPos": {
+        "h": 6,
+        "w": 6,
+        "x": 6,
+        "y": 8
+      },
+      "id": 4,
+      "options": {
+        "orientation": "auto",
+        "reduceOptions": {
+          "calcs": ["lastNotNull"],
+          "fields": "",
+          "values": false
+        },
+        "showThresholdLabels": false,
+        "showThresholdMarkers": true,
+        "text": {}
+      },
+      "pluginVersion": "9.0.0",
+      "targets": [
+        {
+          "expr": "drift_p_value",
+          "refId": "A"
+        }
+      ],
+      "title": "Drift P-Value",
+      "type": "stat",
+      "description": "Statistical significance of detected drift (lower = more significant)"
+    },
+    {
+      "datasource": "Prometheus",
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "auto",
+            "spanNulls": false
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "short"
+        }
+      },
+      "gridPos": {
+        "h": 6,
+        "w": 12,
+        "x": 12,
+        "y": 8
+      },
+      "id": 5,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "lastNotNull"],
+          "displayMode": "table",
+          "placement": "right"
+        },
+        "tooltip": {
+          "mode": "multi"
+        }
+      },
+      "pluginVersion": "9.0.0",
+      "targets": [
+        {
+          "expr": "drift_distance",
+          "legendFormat": "Distance",
+          "refId": "A"
+        }
+      ],
+      "title": "Drift Distance Over Time",
+      "type": "timeseries",
+      "description": "Statistical distance between baseline and current data distribution"
+    }
+  ],
+  "refresh": "10s",
+  "schemaVersion": 36,
+  "style": "dark",
+  "tags": ["hopcroft", "ml", "monitoring"],
+  "templating": {
+    "list": []
+  },
+  "time": {
+    "from": "now-1h",
+    "to": "now"
+  },
+  "timepicker": {},
+  "timezone": "",
+  "title": "Hopcroft ML Model Monitoring",
+  "uid": "hopcroft-ml-dashboard",
+  "version": 1,
+  "weekStart": ""
+}
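A hand-written dashboard model like the one above is easy to break with a duplicated panel `id` or a panel that overflows the grid. As an illustration only (not part of this commit), a small lint pass can be sketched in Python. The function name `lint_dashboard` and the specific checks are hypothetical choices; the only Grafana fact relied on is that the dashboard grid is 24 columns wide.

```python
import json


def lint_dashboard(dashboard: dict) -> list[str]:
    """Return human-readable problems found in a Grafana dashboard model."""
    problems = []
    seen_ids = set()
    for panel in dashboard.get("panels", []):
        title = panel.get("title", "<untitled>")
        pid = panel.get("id")
        if pid in seen_ids:
            problems.append(f"{title}: duplicate panel id {pid}")
        seen_ids.add(pid)
        pos = panel.get("gridPos", {})
        if pos.get("x", 0) + pos.get("w", 0) > 24:  # the grid is 24 columns wide
            problems.append(f"{title}: panel exceeds the 24-column grid")
        if not any(t.get("expr") for t in panel.get("targets", [])):
            problems.append(f"{title}: no query expression")
    return problems


if __name__ == "__main__":
    # Tiny inline model; a real run would json.load() the dashboard file instead.
    model = json.loads(
        '{"panels": [{"id": 1, "title": "Request Rate",'
        ' "gridPos": {"x": 0, "w": 6}, "targets": [{"expr": "up"}]}]}'
    )
    print(lint_dashboard(model))
```

Running such a check before `docker compose up` catches layout and query omissions earlier than discovering an empty panel in the Grafana UI.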
monitoring/grafana/provisioning/dashboards/dashboard.yml ADDED
@@ -0,0 +1,13 @@
+apiVersion: 1
+
+providers:
+  - name: 'Hopcroft Dashboards'
+    orgId: 1
+    folder: ''
+    type: file
+    disableDeletion: false
+    updateIntervalSeconds: 10
+    allowUiUpdates: true
+    options:
+      path: /var/lib/grafana/dashboards
+      foldersFromFilesStructure: true
monitoring/grafana/provisioning/dashboards/hopcroft_dashboard.json ADDED
@@ -0,0 +1,358 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "annotations": {
3
+ "list": [
4
+ {
5
+ "builtIn": 1,
6
+ "datasource": "-- Grafana --",
7
+ "enable": true,
8
+ "hide": true,
9
+ "iconColor": "rgba(0, 211, 255, 1)",
10
+ "name": "Annotations & Alerts",
11
+ "type": "dashboard"
12
+ }
13
+ ]
14
+ },
15
+ "editable": true,
16
+ "gnetId": null,
17
+ "graphTooltip": 1,
18
+ "id": null,
19
+ "links": [],
20
+ "panels": [
21
+ {
22
+ "datasource": "Prometheus",
23
+ "fieldConfig": {
24
+ "defaults": {
25
+ "color": {
26
+ "mode": "thresholds"
27
+ },
28
+ "mappings": [],
29
+ "thresholds": {
30
+ "mode": "absolute",
31
+ "steps": [
32
+ {
33
+ "color": "green",
34
+ "value": null
35
+ },
36
+ {
37
+ "color": "red",
38
+ "value": 80
39
+ }
40
+ ]
41
+ },
42
+ "unit": "reqps"
43
+ }
44
+ },
45
+ "gridPos": {
46
+ "h": 8,
47
+ "w": 6,
48
+ "x": 0,
49
+ "y": 0
50
+ },
51
+ "id": 1,
52
+ "options": {
53
+ "orientation": "auto",
54
+ "reduceOptions": {
55
+ "calcs": ["lastNotNull"],
56
+ "fields": "",
57
+ "values": false
58
+ },
59
+ "showThresholdLabels": false,
60
+ "showThresholdMarkers": true
61
+ },
62
+ "pluginVersion": "9.0.0",
63
+ "targets": [
64
+ {
65
+ "expr": "rate(fastapi_requests_total[1m])",
66
+ "refId": "A"
67
+ }
68
+ ],
69
+ "title": "Request Rate",
70
+ "type": "gauge",
71
+ "description": "Number of requests per second handled by the API"
72
+ },
73
+ {
74
+ "datasource": "Prometheus",
75
+ "fieldConfig": {
76
+ "defaults": {
77
+ "color": {
78
+ "mode": "palette-classic"
79
+ },
80
+ "custom": {
81
+ "axisLabel": "",
82
+ "axisPlacement": "auto",
83
+ "barAlignment": 0,
84
+ "drawStyle": "line",
85
+ "fillOpacity": 10,
86
+ "gradientMode": "none",
87
+ "hideFrom": {
88
+ "tooltip": false,
89
+ "viz": false,
90
+ "legend": false
91
+ },
92
+ "lineInterpolation": "linear",
93
+ "lineWidth": 1,
94
+ "pointSize": 5,
95
+ "scaleDistribution": {
96
+ "type": "linear"
97
+ },
98
+ "showPoints": "never",
99
+ "spanNulls": true
100
+ },
101
+ "mappings": [],
102
+ "thresholds": {
103
+ "mode": "absolute",
104
+ "steps": [
105
+ {
106
+ "color": "green",
107
+ "value": null
108
+ }
109
+ ]
110
+ },
111
+ "unit": "ms"
112
+ }
113
+ },
114
+ "gridPos": {
115
+ "h": 8,
116
+ "w": 18,
117
+ "x": 6,
118
+ "y": 0
119
+ },
120
+ "id": 2,
121
+ "options": {
122
+ "legend": {
123
+ "calcs": ["mean", "max"],
124
+ "displayMode": "table",
125
+ "placement": "right"
126
+ },
127
+ "tooltip": {
128
+ "mode": "multi"
129
+ }
130
+ },
131
+ "pluginVersion": "9.0.0",
132
+ "targets": [
133
+ {
134
+ "expr": "histogram_quantile(0.95, rate(fastapi_request_duration_seconds_bucket[5m])) * 1000",
135
+ "legendFormat": "p95",
136
+ "refId": "A"
137
+ },
138
+ {
139
+ "expr": "histogram_quantile(0.50, rate(fastapi_request_duration_seconds_bucket[5m])) * 1000",
140
+ "legendFormat": "p50 (median)",
141
+ "refId": "B"
142
+ }
143
+ ],
144
+ "title": "Request Latency (p50, p95)",
145
+ "type": "timeseries",
146
+ "description": "API response time percentiles over time"
147
+ },
+    {
+      "datasource": "Prometheus",
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "thresholds"
+          },
+          "mappings": [
+            {
+              "options": {
+                "0": {
+                  "color": "green",
+                  "index": 1,
+                  "text": "No Drift"
+                },
+                "1": {
+                  "color": "red",
+                  "index": 0,
+                  "text": "Drift Detected"
+                }
+              },
+              "type": "value"
+            }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          }
+        }
+      },
+      "gridPos": {
+        "h": 6,
+        "w": 6,
+        "x": 0,
+        "y": 8
+      },
+      "id": 3,
+      "options": {
+        "orientation": "auto",
+        "reduceOptions": {
+          "calcs": ["lastNotNull"],
+          "fields": "",
+          "values": false
+        },
+        "showThresholdLabels": false,
+        "showThresholdMarkers": true,
+        "text": {}
+      },
+      "pluginVersion": "9.0.0",
+      "targets": [
+        {
+          "expr": "drift_detected",
+          "refId": "A"
+        }
+      ],
+      "title": "Data Drift Status",
+      "type": "stat",
+      "description": "Current data drift detection status (1 = drift detected, 0 = no drift)"
+    },
+    {
+      "datasource": "Prometheus",
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "thresholds"
+          },
+          "decimals": 4,
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "red",
+                "value": null
+              },
+              {
+                "color": "yellow",
+                "value": 0.01
+              },
+              {
+                "color": "green",
+                "value": 0.05
+              }
+            ]
+          },
+          "unit": "short"
+        }
+      },
+      "gridPos": {
+        "h": 6,
+        "w": 6,
+        "x": 6,
+        "y": 8
+      },
+      "id": 4,
+      "options": {
+        "orientation": "auto",
+        "reduceOptions": {
+          "calcs": ["lastNotNull"],
+          "fields": "",
+          "values": false
+        },
+        "showThresholdLabels": false,
+        "showThresholdMarkers": true,
+        "text": {}
+      },
+      "pluginVersion": "9.0.0",
+      "targets": [
+        {
+          "expr": "drift_p_value",
+          "refId": "A"
+        }
+      ],
+      "title": "Drift P-Value",
+      "type": "stat",
+      "description": "P-value of the drift test (lower = stronger evidence of drift)"
+    },
+    {
+      "datasource": "Prometheus",
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "auto",
+            "spanNulls": false
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "short"
+        }
+      },
+      "gridPos": {
+        "h": 6,
+        "w": 12,
+        "x": 12,
+        "y": 8
+      },
+      "id": 5,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "lastNotNull"],
+          "displayMode": "table",
+          "placement": "right"
+        },
+        "tooltip": {
+          "mode": "multi"
+        }
+      },
+      "pluginVersion": "9.0.0",
+      "targets": [
+        {
+          "expr": "drift_distance",
+          "legendFormat": "Distance",
+          "refId": "A"
+        }
+      ],
+      "title": "Drift Distance Over Time",
+      "type": "timeseries",
+      "description": "Statistical distance between baseline and current data distribution"
+    }
+  ],
+  "refresh": "10s",
+  "schemaVersion": 36,
+  "style": "dark",
+  "tags": ["hopcroft", "ml", "monitoring"],
+  "templating": {
+    "list": []
+  },
+  "time": {
+    "from": "now-1h",
+    "to": "now"
+  },
+  "timepicker": {},
+  "timezone": "",
+  "title": "Hopcroft ML Model Monitoring",
+  "uid": "hopcroft-ml-dashboard",
+  "version": 1,
+  "weekStart": ""
+}
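
The drift panels in this dashboard query three series: `drift_detected`, `drift_p_value`, and `drift_distance`. This diff does not show how those values are computed. Purely as an illustration, and assuming the distance is a two-sample Kolmogorov-Smirnov statistic (an assumption, not something this commit confirms), a dependency-free sketch might look like the following. `ks_distance` is a hypothetical helper name.

```python
from bisect import bisect_right


def ks_distance(baseline, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the empirical CDFs of the two samples:
    0.0 when the samples are identical, 1.0 when their ranges do not overlap.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(baseline), sorted(current)
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every point that appears in either sample.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )
```

A value like this would feed the "Drift Distance Over Time" panel, while the matching p-value would feed the "Drift P-Value" panel.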
monitoring/grafana/provisioning/datasources/prometheus.yml ADDED
@@ -0,0 +1,14 @@
+apiVersion: 1
+
+datasources:
+  - name: Prometheus
+    type: prometheus
+    access: proxy
+    uid: prometheus
+    orgId: 1
+    url: http://prometheus:9090
+    isDefault: true
+    editable: true
+    jsonData:
+      httpMethod: POST
+      timeInterval: "15s"
monitoring/prometheus/prometheus.yml CHANGED
@@ -1,6 +1,9 @@
 global:
   scrape_interval: 15s
   evaluation_interval: 15s
+  external_labels:
+    monitor: 'hopcroft-monitor'
+    environment: 'development'
 
 rule_files:
   - "alert_rules.yml"
@@ -13,5 +16,17 @@ alerting:
 
 scrape_configs:
   - job_name: 'hopcroft-api'
+    metrics_path: '/metrics'
     static_configs:
       - targets: ['hopcroft-api:8080']
+    scrape_interval: 10s
+
+  - job_name: 'prometheus'
+    static_configs:
+      - targets: ['localhost:9090']
+
+  - job_name: 'pushgateway'
+    honor_labels: true
+    static_configs:
+      - targets: ['pushgateway:9091']
+    scrape_interval: 30s
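
The `pushgateway` scrape job only collects what some other process has pushed. As a minimal sketch of the producing side, the snippet below renders the three gauges the dashboard queries (`drift_detected`, `drift_p_value`, `drift_distance`) in the Prometheus text exposition format and sends them to the Pushgateway's `/metrics/job/<job>` endpoint using only the standard library. The function names, the `drift_detection` job name, and the `localhost:9091` address (the host-mapped port from `docker-compose.yml`) are assumptions for illustration; the project's actual drift script may well use a client library instead.

```python
import urllib.request

# Assumed address: the host-mapped Pushgateway port from docker-compose.yml.
PUSHGATEWAY_URL = "http://localhost:9091"


def format_drift_metrics(detected, p_value, distance):
    """Render the three drift gauges in the Prometheus text exposition format."""
    samples = [
        ("drift_detected", "1 if drift was detected, else 0", 1 if detected else 0),
        ("drift_p_value", "P-value of the drift test", p_value),
        ("drift_distance", "Distance between baseline and current data", distance),
    ]
    lines = []
    for name, help_text, value in samples:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The exposition format requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


def push_drift_metrics(detected, p_value, distance, job="drift_detection"):
    """PUT the metrics, replacing whatever was previously pushed for this job."""
    request = urllib.request.Request(
        url=f"{PUSHGATEWAY_URL}/metrics/job/{job}",
        data=format_drift_metrics(detected, p_value, distance).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Because the scrape job sets `honor_labels: true`, the `job` label from the push URL is kept on the scraped series instead of being overwritten with `pushgateway`.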