ruben de la fuente and Claude Sonnet 4.6 committed
Commit 39a2a9f · Parent: 0e13326
feat: update with Claude-extracted benchmarks and UI fixes
- Re-extracted benchmarks.json using Claude Sonnet (15 patterns, 34 insights)
- Fixed form field text visibility (text-gray-900)
- Added CLAUDE.md and architecture docs
- Updated README with report generation details
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CLAUDE.md +45 -0
- README.md +18 -2
- data/benchmarks.json +153 -531
- docs/architecture.md +62 -0
CLAUDE.md
ADDED
@@ -0,0 +1,45 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
npm run dev        # start dev server at localhost:3000
npm run build      # production build (uses standalone output for Docker)
npm run lint       # eslint
npx tsc --noEmit   # type-check without building
npm run extract    # run PDF → benchmarks.json extraction (requires Ollama or HF)
```

## Environment

Copy `.env.local.example` to `.env.local` before running locally. Ollama must be running (`ollama serve`) with `llama3.1:8b` pulled.

The LLM provider is swapped entirely via env vars; no code changes are needed:

- **Local (Ollama):** `OLLAMA_BASE_URL=http://localhost:11434/v1`, `LLM_MODEL=llama3.1:8b`
- **HF Spaces:** `OLLAMA_BASE_URL=https://router.huggingface.co/v1`, `LLM_MODEL=Qwen/Qwen2.5-72B-Instruct`, `OPENAI_API_KEY=hf_...`
- **OpenAI:** set `OPENAI_API_KEY`, `LLM_MODEL=gpt-4o`, remove `OLLAMA_BASE_URL`

## Architecture

The app has two distinct flows:

**One-time setup:** `scripts/extract-knowledge.ts` reads PDFs from `data/pdfs/`, chunks text into ~8000-char pieces, sends each chunk to the LLM, and merges the results into `data/benchmarks.json` (47 patterns, 124 insights from 3 DORA reports). This file is committed and bundled into the Docker image; the script does not run at runtime.

**Request flow:** Browser form (`app/page.tsx`, two steps) → POST `/api/interpret` → `lib/benchmarks.ts` loads `benchmarks.json` (cached in memory) → `lib/prompts.ts` builds the system prompt with benchmark data → `lib/llm.ts` calls the LLM via an OpenAI-compatible client → response validated with `InterpretationReportSchema` (Zod) → JSON returned → stored in `sessionStorage` → `app/report/page.tsx` reads and renders the report.

**LLM abstraction:** All LLM calls go through `lib/llm.ts`, which wraps the `openai` npm package. The provider is controlled entirely by the `OLLAMA_BASE_URL`, `OPENAI_API_KEY`, and `LLM_MODEL` env vars. The OpenAI client's `baseURL` is set to `OLLAMA_BASE_URL`, so any OpenAI-compatible endpoint (Ollama, HF router, Groq, OpenAI) works without code changes.
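The env-driven provider switch can be sketched as a small pure helper. The function name `resolveLLMConfig` and its fallback values are illustrative assumptions, not the actual exports of `lib/llm.ts`:

```typescript
// Hypothetical sketch of the provider selection described above.
// In the real module the resolved values would feed `new OpenAI({ baseURL, apiKey })`.
interface LLMConfig {
  baseURL?: string; // undefined → the openai package defaults to api.openai.com
  apiKey: string;
  model: string;
}

function resolveLLMConfig(env: Record<string, string | undefined>): LLMConfig {
  return {
    baseURL: env.OLLAMA_BASE_URL, // Ollama, HF router, Groq — or unset for OpenAI
    // Ollama ignores the key, but the openai client requires a non-empty string
    apiKey: env.OPENAI_API_KEY ?? "ollama",
    model: env.LLM_MODEL ?? "llama3.1:8b",
  };
}
```

Because the switch lives entirely in configuration, the same code path serves local development and HF Spaces.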

## Key constraints

- `lib/schema.ts` defines both input schemas (`MetricsInputSchema`, `TeamContextSchema`) and the output schema (`InterpretationReportSchema`). The API route validates both directions: 400 for bad input, 422 if the LLM returns a malformed report.
- `lib/benchmarks.ts` sanitizes `data/benchmarks.json` before Zod validation (Ollama sometimes returns arrays instead of strings in pattern fields).
- `next.config.ts` sets `output: 'standalone'` (required for Docker) and `serverExternalPackages: ['pdf-parse']`.
- `tsconfig.json` has a `"ts-node"` override block with `module: "CommonJS"` so scripts in `scripts/` can use `require`-style resolution while the Next.js app uses bundler resolution.
- The `sessionStorage` key is defined in `lib/constants.ts` as `REPORT_SESSION_KEY`; use that constant, not the string literal.
- Print/PDF export uses `window.print()` with `@media print` CSS in `globals.css`. Elements to hide during print get the `print-hide` class.
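The array-to-string sanitization could look roughly like the sketch below. The helper names `coerceToString`/`sanitizePattern` and the exact field list are assumptions, not the real internals of `lib/benchmarks.ts`:

```typescript
// Hypothetical sketch of the pre-Zod cleanup: Ollama sometimes emits an array
// where the schema expects a string, so join such values before validating.
type RawPattern = {
  id?: unknown;
  signature?: unknown;
  interpretation?: unknown;
  improvements?: unknown;
};

function coerceToString(value: unknown): string {
  if (Array.isArray(value)) return value.map(String).join("; ");
  return typeof value === "string" ? value : String(value ?? "");
}

function sanitizePattern(raw: RawPattern) {
  return {
    id: coerceToString(raw.id),
    signature: coerceToString(raw.signature), // ["a","b"] becomes "a; b"
    interpretation: coerceToString(raw.interpretation),
    improvements: Array.isArray(raw.improvements)
      ? raw.improvements.map(String)
      : [],
  };
}
```

Running a pass like this before `InterpretationReportSchema`-style validation keeps the 422 path reserved for genuinely malformed output.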

## Deployment

The app is deployed on HuggingFace Spaces at `rdlf/devops-metrics-interpreter`. To update, push to the `hf-deploy` branch (orphan, no PDF history) and force-push to `hf:main`. Do not push `data/pdfs/`; those files exceed HF's 10MB limit and are gitignored.
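The update procedure amounts to something like the following; the remote name `hf` is an assumption inferred from the `hf:main` shorthand above:

```shell
# Hypothetical deploy commands; assumes a git remote named `hf`
# pointing at the Space. data/pdfs/ is gitignored, so it never ships.
git checkout hf-deploy              # orphan branch without PDF history
git push --force hf hf-deploy:main  # Spaces rebuilds from its main branch
```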
README.md
CHANGED
@@ -11,6 +11,22 @@ pinned: false
 
 Enter your team's DevOps metrics and get a plain-language interpretation compared to DORA benchmark data from the State of DevOps Reports.
 
+## How report generation works
+
+Reports are generated by combining three sources of knowledge, all extracted from the State of DevOps Report PDFs and stored in `data/benchmarks.json`:
+
+1. **Benchmark tiers** – elite/high/medium/low bands for the four DORA metrics (deployment frequency, lead time, change failure rate, MTTR)
+2. **47 patterns** – metric combinations and what they signal (e.g. high deploy frequency + high failure rate → missing release safeguards)
+3. **124 key insights** – statistics and findings from the reports (e.g. "elite performers deploy 208x more frequently than low performers")
+
+All three are injected into the LLM's system prompt on every request alongside the team's submitted metrics. The LLM is instructed to reference benchmark bands explicitly and ground its improvement recommendations in the extracted knowledge.
+
+Report quality depends on what was captured during extraction. To re-extract with a better model:
+
+```bash
+LLM_MODEL=gpt-4o npm run extract
+```
+
 ## Running locally
 
 ```bash
@@ -24,6 +40,6 @@ npm run dev
 
 | Variable | Value |
 |----------|-------|
-| `OLLAMA_BASE_URL` | `https://
+| `OLLAMA_BASE_URL` | `https://router.huggingface.co/v1` |
 | `OPENAI_API_KEY` | Your HuggingFace token (`hf_...`) |
-| `LLM_MODEL` | `
+| `LLM_MODEL` | `Qwen/Qwen2.5-72B-Instruct` |
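The injection of the three knowledge sources into the system prompt can be sketched as plain string assembly. `buildSystemPrompt` and the section labels are illustrative assumptions, not the actual code in `lib/prompts.ts`:

```typescript
// Hypothetical sketch: fold tiers, patterns, and key insights from
// data/benchmarks.json into one system prompt alongside the team's metrics.
interface Benchmarks {
  tiers: Record<string, Record<string, string>>; // metric → band → description
  patterns: { signature: string; interpretation: string }[];
  keyInsights: string[];
}

function buildSystemPrompt(b: Benchmarks, teamMetrics: Record<string, string>): string {
  const tierLines = Object.entries(b.tiers)
    .map(([metric, bands]) =>
      `${metric}: ` +
      Object.entries(bands).map(([band, desc]) => `${band}=${desc}`).join("; "))
    .join("\n");
  const patternLines = b.patterns
    .map((p) => `- ${p.signature} → ${p.interpretation}`)
    .join("\n");
  return [
    "You are a DevOps metrics interpreter. Reference benchmark bands explicitly.",
    "## Benchmark tiers", tierLines,
    "## Patterns", patternLines,
    "## Key insights", b.keyInsights.map((i) => `- ${i}`).join("\n"),
    "## Team metrics", JSON.stringify(teamMetrics),
  ].join("\n\n");
}
```

Because everything rides in the prompt, re-running the extraction with a stronger model improves reports without touching the request path.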
data/benchmarks.json
CHANGED
@@ -1,607 +1,229 @@
 {
   "deploymentFrequency": {
-    "elite": "On
-    "high": "Between once per day and once per week",
-    "medium": "Between once per week and once per month",
-    "low": "Between once per month and once every six months"
   },
   "leadTime": {
-    "elite": "Less than one day
-    "high": "
-    "medium": "
-    "low": "
   },
   "changeFailureRate": {
     "elite": "5%",
     "high": "20%",
-    "medium": "10%",
     "low": "40%"
   },
   "mttr": {
-    "elite": "Less than one hour
-    "high": "
-    "medium": "
-    "low": "
   },
   "patterns": [
     {
-      "id": "
-      "signature": "high
-      "interpretation": "
       "improvements": [
-        "
-        "
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
-        "
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
-        "
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
-        "
       ]
     },
     {
-      "id": "
-      "signature": "high
-      "interpretation": "
       "improvements": [
-        "
-        "Invest in
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
-        "
       ]
     },
     {
-      "id":
-      "signature":
-      "interpretation":
-      "improvements": []
-    },
-    {
-      "id": "vacuum hypothesis",
-      "signature": [
-        "increase in productivity and flow",
-        "decrease in time spent doing valuable work",
-        "no change in toil and burnout"
-      ],
-      "interpretation": "AI expedites the realization of valuable work, creating a 'vacuum' of extra time",
-      "improvements": [
-        "increased productivity",
-        "increased flow",
-        "improved job satisfaction"
-      ]
-    },
-    {
-      "id": "Transformational Leadership",
-      "signature": "Strong leaders with clear vision, inspirational communication, supportive leadership, and personal recognition lead to better outcomes.",
-      "interpretation": "Transformational leadership is strongly correlated with organizational performance, product performance, team performance, productivity, burnout, job satisfaction.",
-      "improvements": [
-        "Focus on user needs",
-        "Prioritize stability",
-        "Provide adequate resources and funding"
-      ]
-    },
-    {
-      "id": "User-Centric Approach",
-      "signature": "Teams that focus on user needs and collect, track, and respond to user feedback have the highest levels of organizational performance.",
-      "interpretation": "User-centered teams have 40% higher level of organizational performance compared to those that did not.",
-      "improvements": [
-        "Collect and respond to user feedback",
-        "Prioritize user needs",
-        "Make better products"
-      ]
-    },
-    {
-      "id": "1",
-      "signature": "Working in small batches, Strong version control practices, and Clear + communicated AI stance",
-      "interpretation": "Amplifying the positive impact of AI on performance",
-      "improvements": [
-        "Investing in foundational systems, Prioritizing user-centric focus"
-      ]
-    },
-    {
-      "id": "2",
-      "signature": "Healthy data ecosystems, AI-accessible internal data, and User-centric focus",
-      "interpretation": "Unlocking the value of AI in software development",
-      "improvements": [
-        "Focusing on quality internal platforms, Implementing strong version control practices"
-      ]
-    },
-    {
-      "id": "Small-Batch Development",
-      "signature": "Teams break down work into smallest possible chunks, deliver independently, and integrate regularly.",
-      "interpretation": "Improving feedback on units of work and team effectiveness by working in small batches.",
-      "improvements": [
-        "Friction",
-        "Individual Effectiveness"
-      ]
-    },
-    {
-      "id": "3",
-      "signature": "Trunk-based development with branch-by-abstraction can support a small-batch approach.",
-      "interpretation": "Teams that use trunk-based development with branch-by-abstraction tend to have faster lead times, lower change failure rates, and lower MTTR.",
-      "improvements": [
-        "Implement trunk-based development",
-        "Use branch-by-abstraction to make large-scale refactors incrementally"
-      ]
-    },
-    {
-      "id": "Cluster 1: Foundational challenges",
-      "signature": [
-        "Burnout",
-        "Friction",
-        "Low product performance",
-        "Low software delivery throughput"
-      ],
-      "interpretation": "Stuck in survival mode, facing significant challenges with fundamental gaps in their processes, environment, and outcomes.",
-      "improvements": [
-        "Improve individual effectiveness",
-        "Stabilize software and operational environment",
-        "Reduce burnout and friction"
-      ]
-    },
-    {
-      "id": "Cluster 2: The legacy bottleneck",
-      "signature": [
-        "Software delivery instability",
-        "Burnout",
-        "Friction"
-      ],
-      "interpretation": "Constant state of reaction, where unstable systems dictate their work and undermine their morale.",
-      "improvements": [
-        "Improve software delivery stability",
-        "Reduce burnout and friction",
-        "Improve team well-being"
-      ]
-    },
-    {
-      "id": "Cluster 3: Constrained by process",
-      "signature": [
-        "Burnout",
-        "Friction",
-        "Low software delivery throughput"
-      ],
-      "interpretation": "Running on a treadmill, where inefficient processes consume their effort and lead to high burnout and low impact.",
-      "improvements": [
-        "Improve process efficiency",
-        "Reduce burnout and friction",
-        "Improve software delivery throughput"
-      ]
-    },
-    {
-      "id": "Cluster 4: High impact, low cadence",
-      "signature": [
-        "High individual effectiveness",
-        "Low software delivery throughput",
-        "High instability"
-      ],
-      "interpretation": "High-impact work, coupled with low-cadence delivery model and high instability.",
-      "improvements": [
-        "Improve software delivery throughput",
-        "Reduce instability",
-        "Maintain high individual effectiveness"
-      ]
-    },
-    {
-      "id": "Cluster 5: Stable and methodical",
-      "signature": [
-        "Individual effectiveness is high",
-        "Software delivery instability is high"
-      ],
-      "interpretation": "Unknown",
-      "improvements": []
-    },
-    {
-      "id": "Cluster 6: Pragmatic performers",
-      "signature": [
-        "Unknown"
-      ],
-      "interpretation": "Unknown",
-      "improvements": []
-    },
-    {
-      "id": "Cluster 7: Harmonious high-achievers",
-      "signature": [
-        "High individual effectiveness",
-        "Low software delivery throughput",
-        "High instability"
-      ],
-      "interpretation": "Unknown",
-      "improvements": []
-    },
-    {
-      "id": "Harmonious high-achievers",
-      "signature": "stable, low-friction environment",
-      "interpretation": "delivering high-quality work sustainably and without burnout",
-      "improvements": [
-        "improve team processes for efficiency",
-        "increase engagement drivers"
-      ]
-    },
-    {
-      "id": "Pragmatic performers",
-      "signature": "consistent delivery with impressive speed and stability",
-      "interpretation": "effective teams with some room for improvement",
-      "improvements": [
-        "improve stability and reliability",
-        "foster peak engagement"
-      ]
-    },
-    {
-      "id": "Stable and methodical",
-      "signature": "deliberate pace of work",
-      "interpretation": "low stability and reliability",
       "improvements": [
-        "
-        "
       ]
     },
     {
-      "id": "
-      "signature": "low
-      "interpretation": " teams
       "improvements": [
-        "
-        "
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
-        "
-        "
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
       ]
     },
-    {},
-    {
-      "id": "Flowing Cluster",
-      "signature": [
-        "Less than one hour",
-        "0%-15%",
-        "Usually meet expectations",
-        "Less than one day",
-        "On demand (multiple deploys per day)"
-      ],
-      "interpretation": "Flowing cluster performs well across all characteristics: high reliability, high stability, high throughput.",
-      "improvements": []
-    },
-    {
-      "id": "Slowing Cluster",
-      "signature": [
-        "Less than one day",
-        "0%-15%",
-        "Usually meet expectations",
-        "Between one week and one month",
-        "Between once per week and once per month"
-      ],
-      "interpretation": "Slowing cluster: team that is incrementally improving, but they and their customers are mostly happy with the current state of their application or product.",
-      "improvements": []
-    },
-    {
-      "id": "Starting Cluster",
-      "signature": [
-        "Between one day and one week",
-        "31%-45%",
-        "Sometimes meet expectations",
-        "Between once per week and once per month"
-      ],
-      "interpretation": "Starting cluster performs neither well nor poorly across any of our dimensions. This cluster might be in the early stages of their product, feature, or service's development.",
-      "improvements": []
-    },
-    {
-      "id": "Retiring Cluster",
-      "signature": [
-        "46%-60%",
-        "Usually meet expectations",
-        "Between one month and six months",
-        "Between once per month and once every 6 months"
-      ],
-      "interpretation": "Retiring cluster: team working on a service or application that is still valuable to them and their customers, but no longer under active development.",
-      "improvements": []
-    },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
-        "
-        "
-        "
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
-        "
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
       "improvements": [
-        "
-        "
-        "
-        "
-        "measured service"
       ]
     },
     {
-      "id": "
-      "signature": "
-      "interpretation": "
-      "improvements": [
-        "availability"
-      ]
-    },
-    {
-      "id": "...",
-      "signature": "J Curve",
-      "interpretation": "...",
-      "improvements": [
-        "adoption threshold",
-        "cross-functional collaboration"
-      ]
-    },
-    {
-      "id": "loosely-coupled-architecture",
-      "signature": "41% more likely to have systems based on a loosely-coupled architecture",
-      "interpretation": "Teams that focus on building software with loosely-coupled architectures are in a better position to perform strongly across stability, reliability, and throughput.",
-      "improvements": [
-        "Loosely-coupled architecture leads to faster feedback through independent testing."
-      ],
-      "data": "41%, 43% more likely to... - error proneness and deployment"
-    },
-    {
-      "id": "cd-impact-on-performance",
-      "signature": "When combined with loosely-coupled architectures and CD, teams may have a negative impact on performance.",
-      "interpretation": "The combination of loosely-coupled architectures and CD may lead to issues such as anticipation of error proneness.",
-      "improvements": [
-        "Be aware of potential issues when implementing these practices together."
-      ],
-      "data": null
-    },
-    {
-      "id": "Healthy, high-performing teams have good security practices",
-      "signature": [
-        "Having good security practices",
-        "Healthy, high-performing teams"
-      ],
-      "interpretation": "Good security practices are associated with healthy, high-performing teams",
-      "improvements": []
-    },
-    {
-      "id": "Trunk-based development negatively impacts software delivery performance",
-      "signature": [
-        "Trunk-based development",
-        "Negative impact on software delivery performance"
-      ],
-      "interpretation": "Trunk-based development practices have a negative impact on software delivery performance compared to previous research",
-      "improvements": []
-    },
-    {
-      "id": "Documentation practices negatively impact software delivery performance",
-      "signature": [
-        "Documentation practices",
-        "Negative impact on software delivery performance"
-      ],
-      "interpretation": "Documentation practices have a negative impact on software delivery performance",
-      "improvements": []
-    },
-    {
-      "id": "Some tech capabilities may predict burnout",
-      "signature": [
-        "Some tech capabilities",
-        "Predict burnout"
-      ],
-      "interpretation": "Some tech capabilities may predict burnout, but more research is needed",
-      "improvements": []
-    },
-    {
-      "id": "Reliability engineering practices negatively impact software delivery performance",
-      "signature": [
-        "Reliability engineering practices",
-        "Negative impact on software delivery performance"
-      ],
-      "interpretation": "Reliability engineering practices have a negative impact on software delivery performance",
-      "improvements": []
-    },
-    {
-      "id": "SLSA-related practices serve as a mechanism for technical capabilities to impact performance",
-      "signature": [
-        "SLSA-related practices",
-        "Mechanism for technical capabilities to impact performance"
-      ],
-      "interpretation": "SLSA-related practices serve as a mechanism for technical capabilities to impact performance",
-      "improvements": []
-    },
-    {
-      "id": "not specified",
-      "signature": "combinations of deployment frequency, lead time, time to restore a service, and change failure rate",
-      "interpretation": " indicates performance",
       "improvements": [
-        "
-        "
-        "
-        "reduce
       ]
     }
   ],
   "keyInsights": [
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "Companies that adopt a mindset of continuous improvement see the highest levels of success.",
-    "The goal of the organization and your teams should be to simply be a little better than you were yesterday.",
-    "The benefits gained by speed and stability are diminished as higher performance becomes ubiquitous.",
-    "AI is an amplifier, Amplifies strengths of high-performing organizations and dysfunctions of struggling ones",
-    "Investing in foundational systems and user-centric focus can help unlock AI's potential",
-    "A healthy data ecosystem is the reliable, context-aware foundation for the entire DORA AI Capabilities Model",
-    "High-quality data is only useful if AI tools can securely connect to it",
-    "Making internal data AI accessible amplifies AI adoption's positive influence on individual effectiveness and code quality",
-    "AI-accessible internal data moderates AI's impact on individual effectiveness",
-    "AI-accessible internal data moderates AI's impact on code quality",
-    "AI-accessible internal data boosts individual effectiveness and code quality by providing relevant, internal context.",
-    "Connecting AI to internal code and documentation, or 'context engineering,' transforms the AI from a generic assistant into a specialized expert.",
-    "Merge conflicts are often a sign of process issues.",
-    "Frequent or complex merge conflicts occur when development branches are not kept short-lived.",
-    "Working in small batches is essential for high-performing teams.",
-    "Teams that prioritize working in small batches tend to have faster lead times, lower change failure rates, and lower MTTR.",
-    "Using feature flags and trunk-based development with branch-by-abstraction can support a small-batch approach.",
-    "User-centric focus moderates AI's impact on team performance.",
-    "Regularly survey developers to track satisfaction",
-    "Use DORA metrics to measure software delivery performance",
-    "Instrument and expose metrics for every service to empower developers",
-    "Teams in Cluster 1: Foundational challenges are stuck in survival mode",
-    "Teams in Cluster 2: The legacy bottleneck are in a constant state of reaction",
-    "Teams in Cluster 3: Constrained by process are running on a treadmill",
-    "Teams in Cluster 4: High impact, low cadence produce high-impact work but with low-cadence delivery model",
-    "15% of survey respondents are in cluster 5",
-    "20% of survey respondents are in cluster 6",
-    "20% of survey respondents are in cluster 7",
-    "The total active process time (PT) is only about one day (~24.5 hours), while the total wait time is nearly four days (~92.5 hours)",
-    "The total end-to-end lead time is roughly 117 hours and a flow efficiency of ~21%",
-    "Work is sitting in a wait state for nearly four-fifths of the time",
-    "Inconsistent code reviews and a high rate of rework and bugs discovered late in the process",
-    "Developers spend a large amount of process time searching for information or understanding existing code",
-    "Teams with faster code reviews have 50% higher software delivery performance",
-    "By applying VSM to internal developer journeys, teams transform platform engineering from a cost center to a value driver",
-    "Research by DORA and others has shown that AI, in and of itself, is not assured to be beneficial. It must be adopted with care.",
-    "To achieve a return on investments made in acquiring and adapting to AI, teams should also attend to how they communicate, collaborate, and operate across their broader sociotechnical context.",
-    "Seven key predictors of success with AI have been identified, which are presented in the DORA AI Capabilities Model.",
-    "For the first time, no cluster is considered elite.",
-    "The percentage of high performers is at a 4-year low, while the percentage of low performers rose dramatically, from 7% in 2021 to 19% this year.",
-    "Over two-thirds of this year's respondents fall into the medium cluster.",
-    "Teams with high short-term organizational performance may experience burnout and other negative outcomes in the long term.",
-    "Organizations with high reliability and speed are more likely to have low burnout and unplanned work.",
-    "Reliability may not be enough to achieve high organizational performance without speed and stability.",
-    "The use of cloud computing has a positive impact on overall organizational performance",
-    "Respondents that used cloud were 14% more likely to exceed in organizational performance goals",
-    "Cloud users scored 16% higher on cultural outcomes",
-    "The use of cloud-native applications stands out with positive signals on everything surveyed",
-    "The use of any cloud computing platform, public or private, positively contributes to culture and work environment outcomes",
-    "The use of hybrid and multi-cloud has a positive impact on organizational performance",
-    "Practitioners who used multiple clouds showed a 1.4x higher organizational performance compared to non-cloud users",
-    "The five characteristics of cloud computing are crucial for achieving better organizational performance",
-    "High performers meet reliability targets",
-    "Respondents with higher-than-average use of CI, CD, and version control have 3.8x higher organizational performance",
-    "Continuous integration drives delivery performance",
-    "Acknowledge the J Curve of change",
-    "Teams that focus on building software with loosely-coupled architectures",
-    "Teams that combine version control and continuous delivery are 2.5x more likely to have high software delivery performance.",
-    "Loosely-coupled architecture is associated with increased stability, reliability, and throughput, and also more likely to recommend workplace.",
-    "A generative culture is associated with higher levels of organizational performance compared to organizations characterized by a bureaucratic or pathological culture.",
-    "Employees at organizations with a generative culture are more likely to belong to stable teams, produce higher-quality documentation, and spend most of their time engaged in meaningful work.",
-    "Organizations with higher levels of employee flexibility have higher organizational performance compared to organizations with more rigid work arrangements.",
-    "A generative culture is associated with lower rates of employee burnout.",
-    "Stable teams - teams whose composition hadn't changed much over the last 12 months - were more likely to exist within high-performing organizations.",
-    "Healthier cultures have a head start: Organizational culture is a primary driver of software development security practices, with higher trust, 'blameless' cultures are more likely to establish SLSA and SSDF practices than lower-trust organizational cultures.",
-    "Adoption has already begun: Software supply chain security practices embodied in SLSA and SSDF already see modest adoption, but there is ample room for more.",
-    "There's a key integration point: Adoption of the technical aspects of software supply chain security appears to hinge on the use of CI/CD, which often provides the integration platform for many supply chain security practices.",
-    "It provides unexpected benefits: Besides a reduction in security risks, better security practices carry additional advantages, such as reduced burnout.",
-    "Having systems for source control, continuous integration, and continuous delivery were all linked with also having more firmly established SLSA practices.",
-    "CI directly precedes code reviews, and is when vulnerability scanners and other code analysis tools are run.",
-    "Security scanning on development workstations can save time and effort for software engineers.",
-    "Good security practices can reduce an organization's security risk and positively impact software delivery performance, but CI is necessary for these effects.",
-    "Organizational culture and modern development processes (such as continuous integration) are the biggest drivers of an organization's application development security.",
-    "SLSA and SSDF practices appear to work as intended",
-    "Healthy, high-performing teams have good security practices",
-    "Trunk-based development negatively impacts software delivery performance",
-    "Documentation practices negatively impact software delivery performance",
-    "Some tech capabilities may predict burnout",
-    "Reliability engineering practices negatively impact software delivery performance",
-    "SLSA-related practices serve as a mechanism for technical capabilities to impact performance",
-    "65% of respondents work on teams with 12 people or fewer.",
-    "75% of respondents work on teams with 12 people or fewer.",
-    "22% of respondents are at companies with more than 10,000 employees.",
-    "85% of respondents consist of individuals who either work on development or engineering teams (26%), work on DevOps or SRE teams (23%), work on IT ops or infrastructure teams (19%), or are managers (17%)"
|
| 600 |
-
"Technical capabilities build upon each other to create better performance.",
|
| 601 |
-
"The use of cloud has many benefits.",
|
| 602 |
-
"Workplace culture and flexibility lead to better organizational performance.",
|
| 603 |
-
"Employee burnout prevents organizations from reaching their goals.",
|
| 604 |
-
"high performers deploy on demand, multiple times per day",
|
| 605 |
-
"low performers deploy between once every month to once every six months"
|
| 606 |
]
|
| 607 |
-
}
|
|
|
|
 {
   "deploymentFrequency": {
+    "elite": "On demand — multiple deploys per day (19% of teams in 2024)",
+    "high": "Between once per day and once per week (22% of teams in 2024)",
+    "medium": "Between once per week and once per month (35% of teams in 2024)",
+    "low": "Between once per month and once every six months (25% of teams in 2024)"
   },
   "leadTime": {
+    "elite": "Less than one day",
+    "high": "Between one day and one week",
+    "medium": "Between one week and one month",
+    "low": "Between one month and six months"
   },
   "changeFailureRate": {
     "elite": "5%",
     "high": "20%",
+    "medium": "10% (note: medium cluster can have low CFR but poor throughput)",
     "low": "40%"
   },
   "mttr": {
+    "elite": "Less than one hour",
+    "high": "Less than one day",
+    "medium": "Less than one day",
+    "low": "Between one week and one month"
   },
   "patterns": [
     {
+      "id": "high-freq-high-failure",
+      "signature": "high deployment frequency + high change failure rate (>20%)",
+      "interpretation": "The team deploys often but lacks sufficient quality gates. High velocity without stability is a sign of missing automated test coverage, insufficient code review rigour, or deploying too-large changesets. This is the most dangerous quadrant — frequent failures erode user trust and engineer morale quickly.",
       "improvements": [
+        "Add automated test gates to the CI pipeline — block deployments that reduce coverage",
+        "Reduce changelist size using trunk-based development with short-lived feature branches",
+        "Introduce feature flags to decouple deployment from release, allowing safer deploys",
+        "Conduct blameless post-mortems on each failure to identify systemic causes"
       ]
     },
     {
+      "id": "low-freq-low-failure",
+      "signature": "low deployment frequency + low change failure rate",
+      "interpretation": "The team prioritises stability over speed. This is typical of the 'Slowing' cluster identified in DORA 2022 — large, infrequent releases that succeed but hold back throughput. The low failure rate is likely achieved through extensive manual testing and coordination overhead, not engineering excellence. DORA research consistently shows that high frequency and low failure rate are achievable together.",
       "improvements": [
+        "Decompose large releases into smaller, independently deployable changes",
+        "Invest in automated testing to reduce reliance on manual pre-release gates",
+        "Move toward continuous delivery to enable on-demand deploys without increasing risk",
+        "Pilot trunk-based development on one service to validate the approach"
       ]
     },
     {
+      "id": "fast-lead-slow-deploy",
+      "signature": "fast lead time (< 1 week) + low deployment frequency (monthly or less)",
+      "interpretation": "Changes are ready quickly but accumulate before release, suggesting a batch release process or manual deployment gates. The bottleneck is in the release process, not the development process. This pattern is common in teams with compliance requirements or manual change advisory board (CAB) approvals.",
       "improvements": [
+        "Map the value stream from code-complete to production to identify where changes wait",
+        "Automate deployment pipeline stages that currently require human intervention",
+        "Work with compliance/governance teams to define automated controls that satisfy audit requirements",
+        "Consider scheduled releases with automated deployment to reduce coordination overhead"
       ]
     },
     {
+      "id": "slow-lead-high-deploy",
+      "signature": "slow lead time (> 1 month) + high deployment frequency",
+      "interpretation": "The team deploys frequently but individual changes take a long time to be ready. This is unusual and may indicate separate fast-track deployment paths for urgent fixes vs. a slow main development track. It can also signal a large monolith where code review, build, and test times are long despite frequent deployments of accumulated changes.",
       "improvements": [
+        "Instrument your CI pipeline to identify where time is spent — build, test, review, or approval",
+        "Parallelize test execution to reduce pipeline duration",
+        "Consider breaking the monolith into independently deployable services",
+        "Enforce small PR sizes — large PRs dramatically increase review and merge time"
       ]
     },
     {
+      "id": "high-mttr-high-cfr",
+      "signature": "high MTTR (> 1 day) + high change failure rate (>20%)",
+      "interpretation": "The team both creates failures frequently and takes too long to recover. This is the characteristic of the low/retiring cluster and presents significant organizational risk. DORA research shows that without reliability, software delivery performance does not translate to organisational performance. The combination doubles user impact: more incidents and longer outages.",
       "improvements": [
+        "Establish on-call rotations with clear incident response runbooks",
+        "Invest in observability: structured logging, distributed tracing, and alerting on SLOs",
+        "Implement automated rollback triggered by error rate thresholds",
+        "Define Service Level Objectives (SLOs) and treat breaches as top-priority work"
       ]
     },
     {
+      "id": "monolith-high-deploy",
+      "signature": "monolithic architecture + deployment frequency of daily or higher",
+      "interpretation": "Deploying a monolith frequently is achievable but requires significant engineering discipline — comprehensive automated tests, fast build pipelines, and strong trunk-based development practices. Without these, high-frequency monolith deployment typically leads to elevated change failure rates. DORA research found loosely-coupled architecture is one of the top predictors of high performance.",
       "improvements": [
+        "Ensure the monolith has comprehensive automated test coverage before increasing deploy cadence",
+        "Identify seams in the monolith where independent deployment could be introduced",
+        "Invest in build optimisation (caching, parallelism) to keep pipeline duration under 10 minutes",
+        "Use feature flags to gate risky changes without reducing deployment frequency"
       ]
     },
     {
+      "id": "microservices-high-cfr",
+      "signature": "microservices architecture + high change failure rate (>15%)",
+      "interpretation": "Microservices introduce distributed system complexity — network failures, schema drift, service versioning, and cascading failures. High CFR in a microservices context often indicates insufficient contract testing, missing consumer-driven contract tests, or inadequate integration test coverage. The DORA 2022 report found that loosely-coupled architecture drives performance only when paired with strong CD practices.",
       "improvements": [
+        "Implement consumer-driven contract testing between services",
+        "Add integration tests for critical service boundaries in the CI pipeline",
+        "Use canary deployments or blue-green deployments to limit blast radius of each release",
+        "Ensure each service has independent rollback capability"
       ]
     },
     {
+      "id": "compliance-low-freq",
+      "signature": "compliance constraints + low deployment frequency (monthly or less)",
+      "interpretation": "Compliance requirements often manifest as deployment bottlenecks, but DORA research shows that high-performing regulated teams achieve both compliance and high delivery performance. The key is embedding compliance controls into the automated pipeline rather than using manual gates. Manual change advisory boards (CABs) are consistently shown to be poor predictors of stability and a significant throughput bottleneck.",
       "improvements": [
+        "Map which compliance controls are currently manual and identify which can be automated",
+        "Implement automated audit trails and change evidence collection in the CI/CD pipeline",
+        "Engage compliance and audit teams early to define automated control equivalents",
+        "Pilot continuous delivery on a lower-risk service to build organisational confidence"
       ]
     },
     {
+      "id": "large-team-high-lead-time",
+      "signature": "large team (15+ engineers) + lead time > 2 weeks",
+      "interpretation": "Lead time tends to increase with team and codebase size due to coordination overhead, longer PR review queues, slower builds, and more complex merge conflicts. DORA research shows that team structure and architecture are deeply linked — Conway's Law means that organisational structure shapes system architecture and vice versa.",
       "improvements": [
+        "Evaluate whether the team can be split into smaller, independently deployable product teams",
+        "Set PR size limits and response time expectations to reduce review queue buildup",
+        "Invest in inner loop tooling (local test runners, fast feedback) to reduce developer wait time",
+        "Consider whether the architecture enables team independence or creates coupling"
       ]
     },
     {
+      "id": "long-pipeline-low-deploy",
+      "signature": "pipeline duration > 30 minutes + low deployment frequency",
+      "interpretation": "Long pipelines are a primary driver of low deployment frequency. When each deployment requires 45-60+ minutes of CI time, engineers batch changes to reduce overhead, increasing changelist size and risk. DORA research identifies fast feedback as one of the foundational capabilities of high-performing teams.",
       "improvements": [
+        "Profile your pipeline to identify the slowest stages — test parallelisation is usually the biggest win",
+        "Separate fast-feedback tests (unit, contract) from slow tests (integration, E2E) using pipeline stages",
+        "Cache dependencies and build artifacts aggressively",
+        "Target a pipeline duration of under 10 minutes for the core feedback loop"
       ]
     },
     {
+      "id": "high-pr-review-time",
+      "signature": "PR review time > 24 hours + medium or low deployment frequency",
+      "interpretation": "Long PR review cycles directly increase lead time and reduce deployment frequency. They also create merge conflicts as branches diverge, further slowing the process. DORA's 2022 research specifically identified code review as a key inner loop capability. High-performing teams maintain short-lived branches and fast review cycles.",
       "improvements": [
+        "Establish team norms for PR review response time (e.g., first review within 4 hours)",
+        "Break large PRs into smaller, stacked PRs that are easier to review quickly",
+        "Use automated checks (linting, tests, security scanning) to reduce cognitive load on reviewers",
+        "Adopt trunk-based development to eliminate long-lived branches entirely"
       ]
     },
     {
+      "id": "ai-adoption-stability-risk",
+      "signature": "high AI tool adoption + increasing change failure rate",
+      "interpretation": "DORA 2024 research found that increased AI adoption correlates with a 7.2% reduction in delivery stability for every 25% increase in AI adoption. The hypothesis is that AI's code generation speed leads teams to create larger changesets, which DORA consistently shows increases failure rates. The benefit of AI is real at the individual level but requires process adaptation.",
       "improvements": [
+        "Enforce changelist size limits — AI makes it easy to generate more code, but smaller changes are safer",
+        "Invest in automated test coverage for AI-generated code, which may have subtle correctness issues",
+        "Treat AI adoption as a process change requiring deliberate adjustment of deployment practices",
+        "Monitor CFR and lead time closely during periods of AI tool adoption"
       ]
     },
     {
+      "id": "platform-engineering-throughput-drop",
+      "signature": "recent internal developer platform adoption + decreasing deployment frequency or increasing lead time",
+      "interpretation": "DORA 2024 found that platform engineering teams see +8% productivity but -8% throughput and -14% change stability during adoption. This aligns with the J-Curve pattern identified in DORA 2022 for SRE adoption — transformations often show short-term regressions before long-term gains. This is expected, not a signal to abandon the platform.",
       "improvements": [
+        "Acknowledge the J-Curve: set stakeholder expectations for a temporary performance dip",
+        "Measure developer experience (SPACE metrics) alongside DORA metrics during platform rollout",
+        "Prioritise golden paths that reduce toil for the most common developer tasks first",
+        "Ensure platform teams apply a product mindset — treat internal developers as users"
       ]
     },
     {
+      "id": "generative-culture-reliability",
+      "signature": "strong delivery metrics + poor reliability (MTTR > 1 day, high incident frequency)",
+      "interpretation": "DORA 2022 research found that without reliability, software delivery performance does not predict organisational success. The 'Flowing' cluster had strong delivery metrics but poor reliability, and scored lower on organisational performance than expected. Reliability is described as the most important 'feature' of any product — keeping promises to users is a necessary condition for delivery speed to generate business value.",
       "improvements": [
+        "Define SLOs for your most critical user journeys and treat breaches as P0 incidents",
+        "Invest in observability before further increasing deployment frequency",
+        "Establish error budgets — when the budget is exhausted, freeze feature work and fix reliability",
+        "Build a generative team culture: trust, blameless retrospectives, and psychological safety predict better reliability outcomes"
       ]
     },
     {
+      "id": "scheduled-release-high-lead-time",
+      "signature": "scheduled release strategy + lead time > 2 weeks",
+      "interpretation": "Scheduled releases create artificial batch boundaries that accumulate risk. Changes that are ready are held waiting for the release window, extending lead time and increasing the size of each release. Larger releases contain more changes, making it harder to diagnose failures and increasing MTTR when they occur.",
       "improvements": [
+        "Decouple your deployment schedule from your release schedule using feature flags",
+        "Identify which aspects of the scheduled release exist for coordination reasons vs. technical reasons",
+        "Move toward release trains with shorter intervals (bi-weekly → weekly → continuous)",
+        "Use canary releases or percentage rollouts to reduce the risk of each release"
       ]
     }
   ],
   "keyInsights": [
+    "Elite performers deploy 182x more frequently than low performers and have 127x faster lead times (DORA 2024)",
+    "Elite performers have 2293x faster failed deployment recovery times than low performers (DORA 2024)",
+    "Elite performers have 8x lower change failure rates than low performers (DORA 2024)",
+    "In 2024, only 19% of teams achieved elite performance; 25% were in the low cluster (DORA 2024)",
+    "High performers in 2022 were estimated to have 417x more deployments than low performers (DORA 2022)",
+    "Throughput and stability are correlated — the best teams do well on all four DORA metrics simultaneously (DORA 2024)",
+    "The 2024 medium performance cluster has lower throughput but higher stability than the high cluster — neither is universally better (DORA 2024)",
+    "Improving is more important than reaching a specific performance level; the best teams achieve elite improvement, not necessarily elite performance (DORA 2024)",
+    "Industry does not meaningfully affect performance levels — high-performing teams exist in every industry (DORA 2024)",
+    "Without reliability, software delivery performance does not predict organisational success (DORA 2022)",
+    "Reliability is the most important 'feature' of any product — keeping promises to users is a necessary condition for delivery speed to generate value (DORA 2022)",
+    "Teams with generative culture (trust, collaboration) are more likely to achieve good reliability outcomes (DORA 2022)",
+    "The top technical capabilities driving high performance are: version control, continuous integration, continuous delivery, and loosely-coupled architecture (DORA 2022)",
+    "High performers who meet reliability targets are 33% more likely to use version control, 39% more likely to practice CI, 46% more likely to practice CD, and 40% more likely to have loosely-coupled architecture (DORA 2022)",
+    "Continuous Integration is 1.4x more likely to be used by high performers who meet reliability targets (DORA 2022)",
+    "Trunk-based development decreases change failure rate, error-proneness, and unplanned work — but requires experience to implement successfully (DORA 2022)",
+    "Teams with 16+ years of experience who use trunk-based development see increased delivery performance and decreased change failure rate (DORA 2022)",
+    "Cloud users are 14% more likely to exceed organisational performance goals than non-cloud peers (DORA 2022)",
+    "Hybrid and multi-cloud have a negative impact on delivery performance indicators (MTTR, lead time, deployment frequency) unless teams also have high reliability (DORA 2022)",
+    "SRE adoption follows a J-Curve: early adoption may not predict better reliability, but teams that persist through the inflection point see strong reliability gains (DORA 2022)",
+    "Respondents using higher-than-average levels of all technical capabilities have 3.8x higher organisational performance (DORA 2022)",
+    "AI adoption improves individual flow (+2.6%), productivity (+2.1%), and job satisfaction (+2.2%) per 25% increase in AI reliance (DORA 2024)",
+    "AI adoption reduces delivery stability by approximately 7.2% and throughput by 1.5% for every 25% increase in AI adoption — the hypothesis is that AI increases changelist size (DORA 2024)",
+    "AI improves code quality, documentation, and review speed, but these gains do not automatically translate to better delivery performance (DORA 2024)",
+    "The DORA 2024 'vacuum hypothesis': AI helps people finish valuable work faster, creating time that gets filled with more work rather than reducing toil (DORA 2024)",
+    "Platform engineering users see 8% higher individual productivity and 10% higher team performance, but 8% lower throughput and 14% lower change stability during adoption (DORA 2024)",
+    "Platform engineering success requires a user-centred (developer-centred) product mindset — without it, the platform becomes a hindrance (DORA 2024)",
+    "AI adoption increases organisational performance (+2.3%) and team performance (+1.4%) per 25% increase, but has no clear impact on product performance (DORA 2024)",
+    "Nearly 90% of software professionals were using AI tools as of the 2025 DORA AI Capabilities Model research",
+    "Seven AI capabilities identified by DORA 2025: clear AI stance, healthy data ecosystems, quality internal platforms, strong version control, working in small batches, user-centric focus, AI-accessible internal data",
+    "Working in small batches is a foundational AI capability — AI amplifies productivity, but larger changesets created by AI tools increase delivery risk without small batch discipline",
+    "The percentage of high performers was at a 4-year low in 2022, while low performers rose from 7% to 19% — suggesting pandemic-related knowledge sharing disruption (DORA 2022)",
+    "Change failure rate is strongly correlated with rework rate — failures require remediation changes, creating a compounding effect on throughput (DORA 2024)",
+    "The 2024 report introduced 'Failed Deployment Recovery Time' as the evolved metric replacing MTTR, measured specifically from failed deployment to recovery (DORA 2024)"
   ]
+}
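The deployment-frequency tiers above can be applied programmatically when mapping a team's survey answer to a benchmark level. The following is a hypothetical sketch, not code from this repo; the function name and the deploys-per-month thresholds are illustrative approximations of the 2024 tier descriptions.

```typescript
// Hypothetical helper (not part of the repo): classify an estimated
// number of production deploys per month against the DORA 2024 tiers.
type Tier = "elite" | "high" | "medium" | "low";

function classifyDeploymentFrequency(deploysPerMonth: number): Tier {
  if (deploysPerMonth > 30) return "elite"; // on demand, multiple per day
  if (deploysPerMonth >= 4) return "high"; // between daily and weekly
  if (deploysPerMonth >= 1) return "medium"; // between weekly and monthly
  return "low"; // monthly down to once every six months
}

console.log(classifyDeploymentFrequency(60)); // "elite"
console.log(classifyDeploymentFrequency(2)); // "medium"
```

Thresholds are deliberately coarse; the real report relies on the LLM's interpretation of the textual tiers rather than a hard cutoff.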
docs/architecture.md ADDED
@@ -0,0 +1,62 @@
+# Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│                          USER'S BROWSER                          │
+│                                                                  │
+│  ┌─────────────────┐    ┌──────────────────┐   ┌─────────────┐   │
+│  │ app/page.tsx    │───▶│ app/report/      │   │ window.     │   │
+│  │                 │    │ page.tsx         │──▶│ print()     │   │
+│  │ Step 1:         │    │                  │   │ (PDF)       │   │
+│  │ MetricsForm     │    │ ReportView       │   └─────────────┘   │
+│  │ Step 2:         │    │ BenchmarkTable   │                     │
+│  │ ContextForm     │    │                  │                     │
+│  └────────┬────────┘    └────────▲─────────┘                     │
+│           │ POST                 │ sessionStorage                │
+│           │ /api/interpret       │ "devops-report"               │
+└───────────┼──────────────────────┼───────────────────────────────┘
+            │                      │
+┌───────────┼──────────────────────┼───────────────────────────────┐
+│           ▼      NEXT.JS SERVER (HF Space / local)               │
+│  ┌──────────────────────────────────────────────────────────┐    │
+│  │ app/api/interpret/route.ts                               │    │
+│  │                                                          │    │
+│  │ 1. Validate input (MetricsInputSchema, TeamContext)      │    │
+│  │ 2. loadBenchmarks() ◀── data/benchmarks.json             │    │
+│  │ 3. buildSystemPrompt(benchmarks)                         │    │
+│  │ 4. formatMetricsMessage(metrics, context)                │    │
+│  │ 5. chat(system, user) ──▶ lib/llm.ts                     │    │
+│  │ 6. Validate response (InterpretationReportSchema)        │    │
+│  │ 7. Return JSON report ───────────────────────────────────────▶
+│  └───────────────────────────┬──────────────────────────────┘    │
+│                              │                                   │
+│               ┌──────────────▼───────────────┐                   │
+│               │ lib/llm.ts                   │                   │
+│               │ OpenAI-compatible client     │                   │
+│               │ baseURL from OLLAMA_BASE_URL │                   │
+│               └──────────────┬───────────────┘                   │
+└──────────────────────────────┼───────────────────────────────────┘
+                               │
+           ┌───────────────────┴────────────────────┐
+           │                                        │
+           ▼ (local dev)                            ▼ (HF Space)
+   ┌───────────────┐                 ┌──────────────────────┐
+   │ Ollama        │                 │ HF Inference Router  │
+   │ localhost:    │                 │ router.huggingface   │
+   │ 11434/v1      │                 │ .co/v1               │
+   │               │                 │                      │
+   │ llama3.1:8b   │                 │ Qwen2.5-72B-Instruct │
+   └───────────────┘                 └──────────────────────┘
+
+──────────────────── ONE-TIME SETUP ─────────────────────────────
+
+data/pdfs/*.pdf
+      │
+      ▼
+scripts/extract-knowledge.ts
+(pdf-parse → chunk → LLM → merge)
+      │
+      ▼
+data/benchmarks.json ──▶ bundled into Docker image
+(47 patterns, 124 insights)
+```
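Step 3 of the request flow above (`buildSystemPrompt(benchmarks)`) folds the extracted benchmarks into the LLM system prompt. A minimal sketch of what that step might look like follows; the `Benchmarks` shape and the prompt wording are assumptions, and the repo's actual implementation may differ.

```typescript
// Hypothetical sketch of buildSystemPrompt — illustrative only.
interface Benchmarks {
  deploymentFrequency: Record<string, string>; // tier -> description
  keyInsights: string[];
}

function buildSystemPrompt(b: Benchmarks): string {
  // Render each benchmark tier as a bullet the model can cite.
  const tiers = Object.entries(b.deploymentFrequency)
    .map(([tier, desc]) => `- ${tier}: ${desc}`)
    .join("\n");
  const insights = b.keyInsights.map((i) => `- ${i}`).join("\n");
  return [
    "You are a DORA metrics analyst. Ground every claim in the benchmarks below.",
    "Deployment frequency tiers:",
    tiers,
    "Key research insights:",
    insights,
  ].join("\n");
}

const prompt = buildSystemPrompt({
  deploymentFrequency: { elite: "On demand", low: "Monthly or less" },
  keyInsights: ["Throughput and stability are correlated"],
});
console.log(prompt.includes("- elite: On demand")); // true
```

Keeping the benchmarks in the system prompt (rather than fine-tuning) is what lets `data/benchmarks.json` be regenerated by the extraction script without touching application code.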