Nomearod Claude Opus 4.6 (1M context) committed on
Commit ce7247c · 1 Parent(s): 68d96ea

feat: K8s pilot corpus — 8 pages + config entry + JSON rewrite

Land the minimal K8s corpus needed to run the 6-question pilot:

- Fetch 8 pages from kubernetes.io/docs (pods, deployment,
  replicaset, configmap, secret, node-pressure-eviction,
  network-policies, pod-security-admission) via defuddle. Flat
  `k8s_*.md` naming mirrors the existing `fastapi_*.md` precedent.
- Expand `corpora.k8s` in `configs/default.yaml`: flip
  `available: true`, add `golden_dataset` pointer, drop
  `refusal_threshold` from 0.30 placeholder to 0.02 for the pilot
  smoke test. Two-line inline comment preserves the 0.30
  launch-intent rationale pending the full tuning sweep.
- Rewrite `k8s_golden_pilot.json` `expected_sources` from
  path-style (`concepts/workloads/pods`) to filename stems
  (`k8s_pods.md`) so the exact-string match in
  `retrieval_precision_at_k` works. `source_pages` stays as the
  human-readable path anchor.
- Fix three `source_snippets` that drifted from the live page text:
  pilot_002 (deployment rollout sentence paraphrase), pilot_003
  (secret snippet now link-free substring), pilot_006 (added
  backticks to match fetched markdown).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
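
The `expected_sources` rewrite described above can be pictured with a small sketch. This is not the project's `retrieval_precision_at_k`, whose implementation is not part of this diff: `page_path_to_filename` and `precision_at_k` are hypothetical names, and dividing by the number of returned sources is an assumption. The sketch only illustrates why a path-style entry can never equal a retrieved filename under exact-string comparison, and how the docs paths line up with the flat `k8s_*.md` names.

```python
from pathlib import PurePosixPath


def page_path_to_filename(page_path: str, prefix: str = "k8s") -> str:
    """Map a docs path to the flat corpus filename (hypothetical helper).

    Takes the last path segment, swaps hyphens for underscores, and adds
    the corpus prefix and the .md extension.
    """
    slug = PurePosixPath(page_path).name.replace("-", "_")
    return f"{prefix}_{slug}.md"


def precision_at_k(retrieved: list[str], expected: list[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved sources that exactly match an expected source.

    Assumption: the denominator is the number of sources actually returned
    (at most k); the real metric may divide by k instead.
    """
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    expected_set = set(expected)
    hits = sum(1 for source in top_k if source in expected_set)
    return hits / len(top_k)


retrieved = ["k8s_pods.md", "k8s_deployment.md"]

# Path-style expectation: no string is equal, so the score is 0.0.
print(precision_at_k(retrieved, ["concepts/workloads/pods"]))

# Filename-style expectation: one of the two retrieved sources matches, so 0.5.
print(precision_at_k(retrieved, [page_path_to_filename("concepts/workloads/pods")]))
```

Keeping `source_pages` in path form while `expected_sources` uses filenames means the dataset carries both the human-readable anchor and the string the metric compares.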

agent_bench/evaluation/datasets/k8s_golden_pilot.json CHANGED
@@ -12,7 +12,7 @@
  "id": "k8s_pilot_001",
  "question": "In Kubernetes, does each Pod receive its own IP address, and how do containers inside the same Pod talk to each other?",
  "expected_answer_keywords": ["unique", "IP address", "shared", "localhost"],
- "expected_sources": ["concepts/workloads/pods"],
+ "expected_sources": ["k8s_pods.md"],
  "category": "retrieval",
  "difficulty": "easy",
  "requires_calculator": false,
@@ -31,8 +31,8 @@
  "question": "When you update a Deployment's pod template, what mechanism does Kubernetes use to transition Pods from the old version to the new one, and what role does the ReplicaSet play?",
  "expected_answer_keywords": ["ReplicaSet", "new ReplicaSet", "old ReplicaSet", "controlled rate", "replicas", "selector"],
  "expected_sources": [
- "concepts/workloads/controllers/deployment",
- "concepts/workloads/controllers/replicaset"
+ "k8s_deployment.md",
+ "k8s_replicaset.md"
  ],
  "category": "retrieval",
  "difficulty": "hard",
@@ -42,7 +42,7 @@
  "is_multi_hop": true,
  "source_chunk_ids": [],
  "source_snippets": [
- "A new ReplicaSet is created and the Deployment manages moving the Pods from the old ReplicaSet to the new one at a controlled rate",
+ "A new ReplicaSet is created, and the Deployment gradually scales it up while scaling down the old ReplicaSet, ensuring Pods are replaced at a controlled rate",
  "A ReplicaSet is defined with fields, including a selector that specifies how to identify Pods it can acquire, a number of replicas indicating how many Pods it should be maintaining"
  ],
  "source_pages": [
@@ -56,8 +56,8 @@
  "question": "What is the key difference between a ConfigMap and a Secret when deciding where to store sensitive application data like database passwords?",
  "expected_answer_keywords": ["non-confidential", "confidential", "Secret", "ConfigMap", "encryption", "etcd"],
  "expected_sources": [
- "concepts/configuration/configmap",
- "concepts/configuration/secret"
+ "k8s_configmap.md",
+ "k8s_secret.md"
  ],
  "category": "retrieval",
  "difficulty": "medium",
@@ -68,7 +68,7 @@
  "source_chunk_ids": [],
  "source_snippets": [
  "A ConfigMap is an API object used to store non-confidential data in key-value pairs",
- "Secrets are similar to ConfigMaps but are specifically intended to hold confidential data"
+ "specifically intended to hold confidential data"
  ],
  "source_pages": [
  "concepts/configuration/configmap",
@@ -80,7 +80,7 @@
  "id": "k8s_pilot_004",
  "question": "If I set a custom value for one hard eviction threshold on the kubelet (e.g., memory.available) but leave the other thresholds unset, what happens to the defaults for the thresholds I didn't override?",
  "expected_answer_keywords": ["zero", "default", "not inherited", "custom", "all thresholds", "explicit"],
- "expected_sources": ["concepts/scheduling-eviction/node-pressure-eviction"],
+ "expected_sources": ["k8s_node_pressure_eviction.md"],
  "category": "retrieval",
  "difficulty": "hard",
  "requires_calculator": false,
@@ -98,7 +98,7 @@
  "id": "k8s_pilot_005",
  "question": "How do I configure a Kubernetes NetworkPolicy to enforce mutual TLS (mTLS) between Pods in the same namespace?",
  "expected_answer_keywords": ["not", "does not", "NetworkPolicy", "service mesh", "TLS", "ingress controller"],
- "expected_sources": ["concepts/services-networking/network-policies"],
+ "expected_sources": ["k8s_network_policies.md"],
  "category": "retrieval",
  "difficulty": "medium",
  "requires_calculator": false,
@@ -116,7 +116,7 @@
  "id": "k8s_pilot_006",
  "question": "As of the Kubernetes v1.31 snapshot, what is the feature state (alpha, beta, or stable) of the built-in Pod Security admission controller, and in which version did it reach that state?",
  "expected_answer_keywords": ["stable", "v1.25", "Pod Security", "admission controller"],
- "expected_sources": ["concepts/security/pod-security-admission"],
+ "expected_sources": ["k8s_pod_security_admission.md"],
  "category": "retrieval",
  "difficulty": "easy",
  "requires_calculator": false,
@@ -125,7 +125,7 @@
  "is_multi_hop": false,
  "source_chunk_ids": [],
  "source_snippets": [
- "FEATURE STATE: Kubernetes v1.25 [stable]"
+ "FEATURE STATE: `Kubernetes v1.25 [stable]`"
  ],
  "source_pages": ["concepts/security/pod-security-admission"],
  "source_sections": [""]
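
The snippet fixes in this file suggest a simple guard: every `source_snippets` entry should be a literal substring of at least one file named in `expected_sources`. The sketch below is one way to check that, not something this commit adds. `find_drifted_snippets` is a hypothetical helper, and it assumes the dataset items are available as a list of dicts with the `id`, `expected_sources`, and `source_snippets` keys seen in the diff (the file's top-level layout is not shown here). The demo file content is a one-line stand-in for the fetched page.

```python
import tempfile
from pathlib import Path


def find_drifted_snippets(items, corpus_dir):
    """Return (item id, snippet) pairs whose snippet appears in none of the
    item's expected source files (plain substring check)."""
    drifted = []
    for item in items:
        texts = []
        for name in item.get("expected_sources", []):
            path = Path(corpus_dir) / name
            if path.is_file():
                texts.append(path.read_text(encoding="utf-8"))
        for snippet in item.get("source_snippets", []):
            if not any(snippet in text for text in texts):
                drifted.append((item["id"], snippet))
    return drifted


with tempfile.TemporaryDirectory() as corpus_dir:
    # Stand-in for the fetched markdown, which wraps the version in backticks.
    page = Path(corpus_dir) / "k8s_pod_security_admission.md"
    page.write_text("FEATURE STATE: `Kubernetes v1.25 [stable]`\n", encoding="utf-8")

    before = [{
        "id": "k8s_pilot_006",
        "expected_sources": ["k8s_pod_security_admission.md"],
        "source_snippets": ["FEATURE STATE: Kubernetes v1.25 [stable]"],
    }]
    after = [{
        "id": "k8s_pilot_006",
        "expected_sources": ["k8s_pod_security_admission.md"],
        "source_snippets": ["FEATURE STATE: `Kubernetes v1.25 [stable]`"],
    }]

    print(find_drifted_snippets(before, corpus_dir))  # the backtick-free snippet is flagged
    print(find_drifted_snippets(after, corpus_dir))   # nothing is flagged
```

A check like this could run before an evaluation so that drift in the upstream pages shows up as a dataset error instead of a silently lower score.
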
configs/default.yaml CHANGED
@@ -103,15 +103,10 @@ corpora:
  label: "Kubernetes"
  store_path: .cache/store_k8s
  data_path: data/k8s_docs
- # PLACEHOLDER tune against K8s golden dataset once it exists.
- # K8s has more cross-referenced concepts than FastAPI, so relevance
- # spreads across more chunks; the threshold likely lands higher.
- refusal_threshold: 0.30
+ refusal_threshold: 0.02 # PILOT: matches fastapi working value for 6-pilot smoke test.
+ # 0.30 placeholder remains the launch-intent; full tuning sweep
+ # lands with the 25-question golden set (see DECISIONS.md).
  top_k: 5
  max_iterations: 3
- # available=false keeps the K8s corpus in the schema (dashboard
- # shows the toggle as disabled) but skips it from corpus_map at
- # startup. Flip to true after data/k8s_docs/ is curated and
- # `make ingest-k8s` has built .cache/store_k8s. See
- # data/k8s_docs/SOURCES.md for the curation policy.
- available: false
+ golden_dataset: agent_bench/evaluation/datasets/k8s_golden_pilot.json
+ available: true
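
To make the threshold change concrete, here is a minimal sketch under a stated assumption: that `refusal_threshold` is compared against the best retrieval score and the agent declines to answer when nothing clears it. The diff does not show how the project applies the value, so `should_refuse` is a hypothetical function and the scores are made-up numbers chosen only for illustration.

```python
def should_refuse(scores: list[float], refusal_threshold: float) -> bool:
    """Refuse when no retrieved chunk scores at or above the threshold.

    Assumption: higher scores mean more relevant chunks, and an empty
    result set counts as a best score of 0.0.
    """
    return max(scores, default=0.0) < refusal_threshold


# Hypothetical relevance scores for the top retrieved chunks.
scores = [0.12, 0.08, 0.05]

print(should_refuse(scores, 0.30))  # True: the 0.30 placeholder would refuse
print(should_refuse(scores, 0.02))  # False: the 0.02 pilot value lets the answer through
```

Under that reading, the low pilot value keeps the six-question smoke test from being dominated by refusals until the threshold is tuned against the larger golden set.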
 
 
 
 
data/k8s_docs/k8s_configmap.md ADDED
@@ -0,0 +1,281 @@
+ A ConfigMap is an API object used to store non-confidential data in key-value pairs. [Pods](https://kubernetes.io/docs/concepts/workloads/pods/ "A Pod represents a set of running containers in your cluster.") can consume ConfigMaps as environment variables, command-line arguments, or as configuration files in a [volume](https://kubernetes.io/docs/concepts/storage/volumes/ "A directory containing data, accessible to the containers in a pod.").
+
+ A ConfigMap allows you to decouple environment-specific configuration from your [container images](https://kubernetes.io/docs/reference/glossary/?all=true#term-image "Stored instance of a container that holds a set of software needed to run an application."), so that your applications are easily portable.
+
+ > [!caution] Caution:
+ > ConfigMap does not provide secrecy or encryption. If the data you want to store are confidential, use a [Secret](https://kubernetes.io/docs/concepts/configuration/secret/ "Stores sensitive information, such as passwords, OAuth tokens, and ssh keys.") rather than a ConfigMap, or use additional (third party) tools to keep your data private.
+
+ ## Motivation
+
+ Use a ConfigMap for setting configuration data separately from application code.
+
+ For example, imagine that you are developing an application that you can run on your own computer (for development) and in the cloud (to handle real traffic). You write the code to look in an environment variable named `DATABASE_HOST`. Locally, you set that variable to `localhost`. In the cloud, you set it to refer to a Kubernetes [Service](https://kubernetes.io/docs/concepts/services-networking/service/ "A way to expose an application running on a set of Pods as a network service.") that exposes the database component to your cluster. This lets you fetch a container image running in the cloud and debug the exact same code locally if needed.
+
+ > [!info] Note:
+ > A ConfigMap is not designed to hold large chunks of data. The data stored in a ConfigMap cannot exceed 1 MiB. If you need to store settings that are larger than this limit, you may want to consider mounting a volume or use a separate database or file service.
+
+ ## ConfigMap object
+
+ A ConfigMap is an [API object](https://kubernetes.io/docs/concepts/overview/working-with-objects/#kubernetes-objects "An entity in the Kubernetes system, representing part of the state of your cluster.") that lets you store configuration for other objects to use. Unlike most Kubernetes objects that have a `spec`, a ConfigMap has `data` and `binaryData` fields. These fields accept key-value pairs as their values. Both the `data` field and the `binaryData` are optional. The `data` field is designed to contain UTF-8 strings while the `binaryData` field is designed to contain binary data as base64-encoded strings.
+
+ The name of a ConfigMap must be a valid [DNS subdomain name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names).
+
+ Each key under the `data` or the `binaryData` field must consist of alphanumeric characters, `-`, `_` or `.`. The keys stored in `data` must not overlap with the keys in the `binaryData` field.
+
+ Starting from v1.19, you can add an `immutable` field to a ConfigMap definition to create an [immutable ConfigMap](#configmap-immutable).
+
+ ## ConfigMaps and Pods
+
+ You can write a Pod `spec` that refers to a ConfigMap and configures the container(s) in that Pod based on the data in the ConfigMap. The Pod and the ConfigMap must be in the same [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces "An abstraction used by Kubernetes to support isolation of groups of resources within a single cluster.").
+
+ > [!info] Note:
+ > The `spec` of a [static Pod](https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/ "A pod managed directly by the kubelet daemon on a specific node.") cannot refer to a ConfigMap or any other API objects.
+
+ Here's an example ConfigMap that has some keys with single values, and other keys where the value looks like a fragment of a configuration format.
+
+ ```yaml
+ apiVersion: v1
+ kind: ConfigMap
+ metadata:
+   name: game-demo
+ data:
+   # property-like keys; each key maps to a simple value
+   player_initial_lives: "3"
+   ui_properties_file_name: "user-interface.properties"
+
+   # file-like keys
+   game.properties: |
+     enemy.types=aliens,monsters
+     player.maximum-lives=5
+   user-interface.properties: |
+     color.good=purple
+     color.bad=yellow
+     allow.textmode=true
+ ```
+
+ There are four different ways that you can use a ConfigMap to configure a container inside a Pod:
+
+ 1. Inside a container command and args
+ 2. Environment variables for a container
+ 3. Add a file in read-only volume, for the application to read
+ 4. Write code to run inside the Pod that uses the Kubernetes API to read a ConfigMap
+
+ These different methods lend themselves to different ways of modeling the data being consumed. For the first three methods, the [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet "An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.") uses the data from the ConfigMap when it launches container(s) for a Pod.
+
+ The fourth method means you have to write code to read the ConfigMap and its data. However, because you're using the Kubernetes API directly, your application can subscribe to get updates whenever the ConfigMap changes, and react when that happens. By accessing the Kubernetes API directly, this technique also lets you access a ConfigMap in a different namespace.
+
+ Here's an example Pod that uses values from `game-demo` to configure a Pod:
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: configmap-demo-pod
+ spec:
+   containers:
+     - name: demo
+       image: alpine
+       command: ["sleep", "3600"]
+       env:
+         # Define the environment variable
+         - name: PLAYER_INITIAL_LIVES # Notice that the case is different here
+                                      # from the key name in the ConfigMap.
+           valueFrom:
+             configMapKeyRef:
+               name: game-demo # The ConfigMap this value comes from.
+               key: player_initial_lives # The key to fetch.
+         - name: UI_PROPERTIES_FILE_NAME
+           valueFrom:
+             configMapKeyRef:
+               name: game-demo
+               key: ui_properties_file_name
+       volumeMounts:
+         - name: config
+           mountPath: "/config"
+           readOnly: true
+   volumes:
+     # You set volumes at the Pod level, then mount them into containers inside that Pod
+     - name: config
+       configMap:
+         # Provide the name of the ConfigMap you want to mount.
+         name: game-demo
+         # An array of keys from the ConfigMap to create as files
+         items:
+           - key: "game.properties"
+             path: "game.properties"
+           - key: "user-interface.properties"
+             path: "user-interface.properties"
+ ```
+
+ A ConfigMap doesn't differentiate between single line property values and multi-line file-like values. What matters is how Pods and other objects consume those values.
+
+ For this example, defining a volume and mounting it inside the `demo` container as `/config` creates two files, `/config/game.properties` and `/config/user-interface.properties`, even though there are four keys in the ConfigMap. This is because the Pod definition specifies an `items` array in the `volumes` section. If you omit the `items` array entirely, every key in the ConfigMap becomes a file with the same name as the key, and you get 4 files.
+
+ ## Using ConfigMaps
+
+ ConfigMaps can be mounted as data volumes. ConfigMaps can also be used by other parts of the system, without being directly exposed to the Pod. For example, ConfigMaps can hold data that other parts of the system should use for configuration.
+
+ The most common way to use ConfigMaps is to configure settings for containers running in a Pod in the same namespace. You can also use a ConfigMap separately.
+
+ For example, you might encounter [addons](https://kubernetes.io/docs/concepts/cluster-administration/addons/ "Resources that extend the functionality of Kubernetes.") or [operators](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ "A specialized controller used to manage a custom resource") that adjust their behavior based on a ConfigMap.
+
+ ### Using ConfigMaps as files from a Pod
+
+ To consume a ConfigMap in a volume in a Pod:
+
+ 1. Create a ConfigMap or use an existing one. Multiple Pods can reference the same ConfigMap.
+ 2. Modify your Pod definition to add a volume under `.spec.volumes[]`. Name the volume anything, and have a `.spec.volumes[].configMap.name` field set to reference your ConfigMap object.
+ 3. Add a `.spec.containers[].volumeMounts[]` to each container that needs the ConfigMap. Specify `.spec.containers[].volumeMounts[].readOnly = true` and `.spec.containers[].volumeMounts[].mountPath` to an unused directory name where you would like the ConfigMap to appear.
+ 4. Modify your image or command line so that the program looks for files in that directory. Each key in the ConfigMap `data` map becomes the filename under `mountPath`.
+
+ This is an example of a Pod that mounts a ConfigMap in a volume:
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: mypod
+ spec:
+   containers:
+     - name: mypod
+       image: redis
+       volumeMounts:
+         - name: foo
+           mountPath: "/etc/foo"
+           readOnly: true
+   volumes:
+     - name: foo
+       configMap:
+         name: myconfigmap
+ ```
+
+ Each ConfigMap you want to use needs to be referred to in `.spec.volumes`.
+
+ If there are multiple containers in the Pod, then each container needs its own `volumeMounts` block, but only one `.spec.volumes` is needed per ConfigMap.
+
+ #### Mounted ConfigMaps are updated automatically
+
+ When a ConfigMap currently consumed in a volume is updated, projected keys are eventually updated as well. The kubelet checks whether the mounted ConfigMap is fresh on every periodic sync. However, the kubelet uses its local cache for getting the current value of the ConfigMap. The type of the cache is configurable using the `configMapAndSecretChangeDetectionStrategy` field in the [KubeletConfiguration struct](https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/). A ConfigMap can be either propagated by watch (default), ttl-based, or by redirecting all requests directly to the API server. As a result, the total delay from the moment when the ConfigMap is updated to the moment when new keys are projected to the Pod can be as long as the kubelet sync period + cache propagation delay, where the cache propagation delay depends on the chosen cache type (it equals to watch propagation delay, ttl of cache, or zero correspondingly).
+
+ ConfigMaps consumed as environment variables are not updated automatically and require a pod restart.
+
+ > [!info] Note:
+ > A container using a ConfigMap as a [subPath](https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath) volume mount will not receive ConfigMap updates.
+
+ ### Using Configmaps as environment variables
+
+ To use a Configmap in an [environment variable](https://kubernetes.io/docs/concepts/containers/container-environment/ "Container environment variables are name=value pairs that provide useful information into containers running in a Pod.") in a Pod:
+
+ 1. For each container in your Pod specification, add an environment variable for each Configmap key that you want to use to the `env[].valueFrom.configMapKeyRef` field.
+ 2. Modify your image and/or command line so that the program looks for values in the specified environment variables.
+
+ This is an example of defining a ConfigMap as a pod environment variable:
+
+ The following ConfigMap (myconfigmap.yaml) stores two properties: username and access\_level:
+
+ ```yaml
+ apiVersion: v1
+ kind: ConfigMap
+ metadata:
+   name: myconfigmap
+ data:
+   username: k8s-admin
+   access_level: "1"
+ ```
+
+ The following command will create the ConfigMap object:
+
+ ```shell
+ kubectl apply -f myconfigmap.yaml
+ ```
+
+ The following Pod consumes the content of the ConfigMap as environment variables:
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: env-configmap
+ spec:
+   containers:
+     - name: app
+       command: ["/bin/sh", "-c", "printenv"]
+       image: busybox:latest
+       envFrom:
+         - configMapRef:
+             name: myconfigmap
+ ```
+
+ The `envFrom` field instructs Kubernetes to create environment variables from the sources nested within it. The inner `configMapRef` refers to a ConfigMap by its name and selects all its key-value pairs. Add the Pod to your cluster, then retrieve its logs to see the output from the printenv command. This should confirm that the two key-value pairs from the ConfigMap have been set as environment variables:
+
+ ```shell
+ kubectl apply -f env-configmap.yaml
+ ```
+ ```shell
+ kubectl logs pod/env-configmap
+ ```
+
+ The output is similar to this:
+
+ ```console
+ ...
+ username: "k8s-admin"
+ access_level: "1"
+ ...
+ ```
+
+ Sometimes a Pod won't require access to all the values in a ConfigMap. For example, you could have another Pod which only uses the username value from the ConfigMap. For this use case, you can use the `env.valueFrom` syntax instead, which lets you select individual keys in a ConfigMap. The name of the environment variable can also be different from the key within the ConfigMap. For example:
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: env-configmap
+ spec:
+   containers:
+     - name: envars-test-container
+       image: nginx
+       env:
+         - name: CONFIGMAP_USERNAME
+           valueFrom:
+             configMapKeyRef:
+               name: myconfigmap
+               key: username
+ ```
+
+ In the Pod created from this manifest, you will see that the environment variable `CONFIGMAP_USERNAME` is set to the value of the `username` value from the ConfigMap. Other keys from the ConfigMap data are not copied into the environment.
+
+ It's important to note that the range of characters allowed for environment variable names in pods is [restricted](https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/#using-environment-variables-inside-of-your-config). If any keys do not meet the rules, those keys are not made available to your container, though the Pod is allowed to start.
+
+ ## Immutable ConfigMaps
+
+ FEATURE STATE: `Kubernetes v1.21 [stable]`
+
+ The Kubernetes feature *Immutable Secrets and ConfigMaps* provides an option to set individual Secrets and ConfigMaps as immutable. For clusters that extensively use ConfigMaps (at least tens of thousands of unique ConfigMap to Pod mounts), preventing changes to their data has the following advantages:
+
+ - protects you from accidental (or unwanted) updates that could cause applications outages
+ - improves performance of your cluster by significantly reducing load on kube-apiserver, by closing watches for ConfigMaps marked as immutable.
+
+ You can create an immutable ConfigMap by setting the `immutable` field to `true`. For example:
+
+ ```yaml
+ apiVersion: v1
+ kind: ConfigMap
+ metadata:
+   ...
+ data:
+   ...
+ immutable: true
+ ```
+
+ Once a ConfigMap is marked as immutable, it is *not* possible to revert this change nor to mutate the contents of the `data` or the `binaryData` field. You can only delete and recreate the ConfigMap. Because existing Pods maintain a mount point to the deleted ConfigMap, it is recommended to recreate these pods.
+
+ ## What's next
+
+ - Read about [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/).
+ - Read [Configure a Pod to Use a ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/).
+ - Read about [changing a ConfigMap (or any other Kubernetes object)](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/)
+ - Read [The Twelve-Factor App](https://12factor.net/) to understand the motivation for separating code from configuration.
+
+
+ Last modified November 21, 2025 at 2:18 PM PST: [Fix formatting of kubectl logs command (69fb346f79)](https://github.com/kubernetes/website/commit/69fb346f79076561c9e5fdb6e65aed5b927e8ce5)
data/k8s_docs/k8s_deployment.md ADDED
@@ -0,0 +1,1092 @@
1
+ A Deployment manages a set of Pods to run an application workload, usually one that doesn't maintain state.
2
+
3
+ A *Deployment* provides declarative updates for [Pods](https://kubernetes.io/docs/concepts/workloads/pods/ "A Pod represents a set of running containers in your cluster.") and [ReplicaSets](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/ "ReplicaSet ensures that a specified number of Pod replicas are running at one time").
4
+
5
+ You describe a *desired state* in a Deployment, and the Deployment [Controller](https://kubernetes.io/docs/concepts/architecture/controller/ "A control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state.") changes the actual state to the desired state at a controlled rate. You can define Deployments to create new ReplicaSets, or to remove existing Deployments and adopt all their resources with new Deployments.
6
+
7
+ > [!info] Note:
8
+ > Do not manage ReplicaSets owned by a Deployment. Consider opening an issue in the main Kubernetes repository if your use case is not covered below.
9
+
10
+ ## Use Case
11
+
12
+ The following are typical use cases for Deployments:
13
+
14
+ - [Create a Deployment to rollout a ReplicaSet](#creating-a-deployment). The ReplicaSet creates Pods in the background. Check the status of the rollout to see if it succeeds or not.
15
+ - [Declare the new state of the Pods](#updating-a-deployment) by updating the PodTemplateSpec of the Deployment. A new ReplicaSet is created, and the Deployment gradually scales it up while scaling down the old ReplicaSet, ensuring Pods are replaced at a controlled rate. Each new ReplicaSet updates the revision of the Deployment.
16
+ - [Rollback to an earlier Deployment revision](#rolling-back-a-deployment) if the current state of the Deployment is not stable. Each rollback updates the revision of the Deployment.
17
+ - [Scale up the Deployment to facilitate more load](#scaling-a-deployment).
18
+ - [Pause the rollout of a Deployment](#pausing-and-resuming-a-deployment) to apply multiple fixes to its PodTemplateSpec and then resume it to start a new rollout.
19
+ - [Use the status of the Deployment](#deployment-status) as an indicator that a rollout has stuck.
20
+ - [Clean up older ReplicaSets](#clean-up-policy) that you don't need anymore.
21
+
22
+ ## Creating a Deployment
23
+
24
+ The following is an example of a Deployment. It creates a ReplicaSet to bring up three `nginx` Pods:
25
+
26
+ ```yaml
27
+ apiVersion: apps/v1
28
+ kind: Deployment
29
+ metadata:
30
+   name: nginx-deployment
31
+   labels:
32
+     app: nginx
33
+ spec:
34
+   replicas: 3
35
+   selector:
36
+     matchLabels:
37
+       app: nginx
38
+   template:
39
+     metadata:
40
+       labels:
41
+         app: nginx
42
+     spec:
43
+       containers:
44
+       - name: nginx
45
+         image: nginx:1.14.2
46
+         ports:
47
+         - containerPort: 80
48
+ ```
49
+
50
+ In this example:
51
+
52
+ - A Deployment named `nginx-deployment` is created, indicated by the `.metadata.name` field. This name will become the basis for the ReplicaSets and Pods which are created later. See [Writing a Deployment Spec](#writing-a-deployment-spec) for more details.
53
+ - The Deployment creates a ReplicaSet that creates three replicated Pods, indicated by the `.spec.replicas` field.
54
+ - The `.spec.selector` field defines how the created ReplicaSet finds which Pods to manage. In this case, you select a label that is defined in the Pod template (`app: nginx`). However, more sophisticated selection rules are possible, as long as the Pod template itself satisfies the rule.
55
+ > [!info] Note:
56
+ > The `.spec.selector.matchLabels` field is a map of {key,value} pairs. A single {key,value} in the `matchLabels` map is equivalent to an element of `matchExpressions`, whose `key` field is "key", the `operator` is "In", and the `values` array contains only "value". All of the requirements, from both `matchLabels` and `matchExpressions`, must be satisfied in order to match.
57
+ - The `.spec.template` field contains the following sub-fields:
58
+ - The Pods are labeled `app: nginx` using the `.metadata.labels` field.
59
+ - The Pod template's specification, or `.spec` field, indicates that the Pods run one container, `nginx`, which runs the `nginx` [Docker Hub](https://hub.docker.com/) image at version 1.14.2.
60
+ - Create one container and name it `nginx` using the `.spec.containers[0].name` field.
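
The equivalence between a `matchLabels` entry and an `In` expression with a single value can also be seen from the command line. The following is a minimal sketch, assuming kubectl's equality-based and set-based forms of the `-l` label selector; the function names `pods_by_equality` and `pods_by_set` are hypothetical helpers, not part of kubectl:

```shell
# Hypothetical helpers (not part of kubectl) that list Pods with two
# equivalent label selectors.

# Equality-based form, comparable to a single matchLabels entry.
# Usage: pods_by_equality KEY VALUE
pods_by_equality() {
  kubectl get pods -l "$1=$2"
}

# Set-based form, comparable to a matchExpressions element whose
# operator is "In" and whose values array holds one value.
# Usage: pods_by_set KEY VALUE
pods_by_set() {
  kubectl get pods -l "$1 in ($2)"
}
```

For the example Deployment, `pods_by_equality app nginx` and `pods_by_set app nginx` should select the same Pods.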
61
+
62
+ Before you begin, make sure your Kubernetes cluster is up and running. Follow the steps given below to create the above Deployment:
63
+
64
+ 1. Create the Deployment by running the following command:
65
+ ```shell
66
+ kubectl apply -f https://k8s.io/examples/controllers/nginx-deployment.yaml
67
+ ```
68
+ 2. Run `kubectl get deployments` to check if the Deployment was created.
69
+ If the Deployment is still being created, the output is similar to the following:
70
+ ```
71
+ NAME READY UP-TO-DATE AVAILABLE AGE
72
+ nginx-deployment 0/3 0 0 1s
73
+ ```
74
+ When you inspect the Deployments in your cluster, the following fields are displayed:
75
+ - `NAME` lists the names of the Deployments in the namespace.
76
+ - `READY` displays how many replicas of the application are available to your users. It follows the pattern ready/desired.
77
+ - `UP-TO-DATE` displays the number of replicas that have been updated to achieve the desired state.
78
+ - `AVAILABLE` displays how many replicas of the application are available to your users.
79
+ - `AGE` displays the amount of time that the application has been running.
80
+ Notice how the number of desired replicas is 3 according to the `.spec.replicas` field.
81
+ 3. To see the Deployment rollout status, run `kubectl rollout status deployment/nginx-deployment`.
82
+ The output is similar to:
83
+ ```
84
+ Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
85
+ deployment "nginx-deployment" successfully rolled out
86
+ ```
87
+ 4. Run `kubectl get deployments` again a few seconds later. The output is similar to this:
88
+ ```
89
+ NAME READY UP-TO-DATE AVAILABLE AGE
90
+ nginx-deployment 3/3 3 3 18s
91
+ ```
92
+ Notice that the Deployment has created all three replicas, and all replicas are up-to-date (they contain the latest Pod template) and available.
93
+ 5. To see the ReplicaSet (`rs`) created by the Deployment, run `kubectl get rs`. The output is similar to this:
94
+ ```
95
+ NAME DESIRED CURRENT READY AGE
96
+ nginx-deployment-75675f5897 3 3 3 18s
97
+ ```
98
+ ReplicaSet output shows the following fields:
99
+ - `NAME` lists the names of the ReplicaSets in the namespace.
100
+ - `DESIRED` displays the desired number of *replicas* of the application, which you define when you create the Deployment. This is the *desired state*.
101
+ - `CURRENT` displays how many replicas are currently running.
102
+ - `READY` displays how many replicas of the application are available to your users.
103
+ - `AGE` displays the amount of time that the application has been running.
104
+ Notice that the name of the ReplicaSet is always formatted as `[DEPLOYMENT-NAME]-[HASH]`. This name will become the basis for the Pods which are created.
105
+ The `HASH` string is the same as the `pod-template-hash` label on the ReplicaSet.
106
+ 6. To see the labels automatically generated for each Pod, run `kubectl get pods --show-labels`. The output is similar to:
107
+ ```
108
+ NAME READY STATUS RESTARTS AGE LABELS
109
+ nginx-deployment-75675f5897-7ci7o 1/1 Running 0 18s app=nginx,pod-template-hash=75675f5897
110
+ nginx-deployment-75675f5897-kzszj 1/1 Running 0 18s app=nginx,pod-template-hash=75675f5897
111
+ nginx-deployment-75675f5897-qqcnn 1/1 Running 0 18s app=nginx,pod-template-hash=75675f5897
112
+ ```
113
+ The created ReplicaSet ensures that there are three `nginx` Pods.
114
+
115
+ > [!info] Note:
116
+ > You must specify an appropriate selector and Pod template labels in a Deployment (in this case, `app: nginx`).
117
+ >
118
+ > Do not overlap labels or selectors with other controllers (including other Deployments and StatefulSets). Kubernetes doesn't stop you from overlapping, and if multiple controllers have overlapping selectors those controllers might conflict and behave unexpectedly.
119
+
120
+ ### Pod-template-hash label
121
+
122
+ > [!caution] Caution:
123
+ > Do not change this label.
124
+
125
+ The `pod-template-hash` label is added by the Deployment controller to every ReplicaSet that a Deployment creates or adopts.
126
+
127
+ This label ensures that child ReplicaSets of a Deployment do not overlap. It is generated by hashing the `PodTemplate` of the ReplicaSet and using the resulting hash as the label value that is added to the ReplicaSet selector, Pod template labels, and in any existing Pods that the ReplicaSet might have.
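
Because the hash is present on the Pods as well as on the ReplicaSet, it can be used to list the Pods that belong to one particular ReplicaSet. A minimal sketch, assuming the Pods carry an `app` label as in the example above; the function name `pods_for_template_hash` is a hypothetical helper:

```shell
# Hypothetical helper: list the Pods that carry a given app label and
# pod-template-hash, that is, the Pods of one ReplicaSet of a Deployment.
# Usage: pods_for_template_hash APP HASH
pods_for_template_hash() {
  kubectl get pods -l "app=$1,pod-template-hash=$2" --show-labels
}
```

For example, `pods_for_template_hash nginx 75675f5897` would list the three Pods shown in the earlier output. The label is only read here, never modified.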
128
+
129
+ ## Updating a Deployment
130
+
131
+ > [!info] Note:
132
+ > A Deployment's rollout is triggered if and only if the Deployment's Pod template (that is, `.spec.template`) is changed, for example if the labels or container images of the template are updated. Other updates, such as scaling the Deployment, do not trigger a rollout.
133
+
134
+ Follow the steps given below to update your Deployment:
135
+
136
+ 1. Let's update the nginx Pods to use the `nginx:1.16.1` image instead of the `nginx:1.14.2` image.
137
+ ```shell
138
+ kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1
139
+ ```
140
+ or use the following command:
141
+ ```shell
142
+ kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1
143
+ ```
144
+ where `deployment/nginx-deployment` indicates the Deployment, `nginx` indicates the container in which the update takes place, and `nginx:1.16.1` indicates the new image and its tag.
145
+ The output is similar to:
146
+ ```
147
+ deployment.apps/nginx-deployment image updated
148
+ ```
149
+ Alternatively, you can `edit` the Deployment and change `.spec.template.spec.containers[0].image` from `nginx:1.14.2` to `nginx:1.16.1`:
150
+ ```shell
151
+ kubectl edit deployment/nginx-deployment
152
+ ```
153
+ The output is similar to:
154
+ ```
155
+ deployment.apps/nginx-deployment edited
156
+ ```
157
+ 2. To see the rollout status, run:
158
+ ```shell
159
+ kubectl rollout status deployment/nginx-deployment
160
+ ```
161
+ The output is similar to this:
162
+ ```
163
+ Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
164
+ ```
165
+ or
166
+ ```
167
+ deployment "nginx-deployment" successfully rolled out
168
+ ```
169
+
170
+ Get more details on your updated Deployment:
171
+
172
+ - After the rollout succeeds, you can view the Deployment by running `kubectl get deployments`. The output is similar to this:
173
+ ```
174
+ NAME READY UP-TO-DATE AVAILABLE AGE
175
+ nginx-deployment 3/3 3 3 36s
176
+ ```
177
+ - Run `kubectl get rs` to see that the Deployment updated the Pods by creating a new ReplicaSet and scaling it up to 3 replicas, as well as scaling down the old ReplicaSet to 0 replicas.
178
+ ```shell
179
+ kubectl get rs
180
+ ```
181
+ The output is similar to this:
182
+ ```
183
+ NAME DESIRED CURRENT READY AGE
184
+ nginx-deployment-1564180365 3 3 3 6s
185
+ nginx-deployment-2035384211 0 0 0 36s
186
+ ```
187
+ - Running `get pods` should now show only the new Pods:
188
+ ```shell
189
+ kubectl get pods
190
+ ```
191
+ The output is similar to this:
192
+ ```
193
+ NAME READY STATUS RESTARTS AGE
194
+ nginx-deployment-1564180365-khku8 1/1 Running 0 14s
195
+ nginx-deployment-1564180365-nacti 1/1 Running 0 14s
196
+ nginx-deployment-1564180365-z9gth 1/1 Running 0 14s
197
+ ```
198
+ Next time you want to update these Pods, you only need to update the Deployment's Pod template again.
199
+ Deployment ensures that only a certain number of Pods are down while they are being updated. By default, it ensures that at least 75% of the desired number of Pods are up (25% max unavailable).
200
+ Deployment also ensures that only a certain number of Pods are created above the desired number of Pods. By default, it ensures that at most 125% of the desired number of Pods are up (25% max surge).
201
+ For example, if you look at the above Deployment closely, you will see that it first creates a new Pod, then deletes an old Pod, and creates another new one. It does not kill old Pods until a sufficient number of new Pods have come up, and does not create new Pods until a sufficient number of old Pods have been killed. It makes sure that at least 3 Pods are available and that at most 4 Pods in total are available. For a Deployment with 4 replicas, the number of Pods would be between 3 and 5.
202
+ - Get details of your Deployment:
203
+ ```shell
204
+ kubectl describe deployments
205
+ ```
206
+ The output is similar to this:
207
+ ```
208
+ Name: nginx-deployment
209
+ Namespace: default
210
+ CreationTimestamp: Thu, 30 Nov 2017 10:56:25 +0000
211
+ Labels: app=nginx
212
+ Annotations: deployment.kubernetes.io/revision=2
213
+ Selector: app=nginx
214
+ Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
215
+ StrategyType: RollingUpdate
216
+ MinReadySeconds: 0
217
+ RollingUpdateStrategy: 25% max unavailable, 25% max surge
218
+ Pod Template:
219
+ Labels: app=nginx
220
+ Containers:
221
+ nginx:
222
+ Image: nginx:1.16.1
223
+ Port: 80/TCP
224
+ Environment: <none>
225
+ Mounts: <none>
226
+ Volumes: <none>
227
+ Conditions:
228
+ Type Status Reason
229
+ ---- ------ ------
230
+ Available True MinimumReplicasAvailable
231
+ Progressing True NewReplicaSetAvailable
232
+ OldReplicaSets: <none>
233
+ NewReplicaSet: nginx-deployment-1564180365 (3/3 replicas created)
234
+ Events:
235
+ Type Reason Age From Message
236
+ ---- ------ ---- ---- -------
237
+ Normal ScalingReplicaSet 2m deployment-controller Scaled up replica set nginx-deployment-2035384211 to 3
238
+ Normal ScalingReplicaSet 24s deployment-controller Scaled up replica set nginx-deployment-1564180365 to 1
239
+ Normal ScalingReplicaSet 22s deployment-controller Scaled down replica set nginx-deployment-2035384211 to 2
240
+ Normal ScalingReplicaSet 22s deployment-controller Scaled up replica set nginx-deployment-1564180365 to 2
241
+ Normal ScalingReplicaSet 19s deployment-controller Scaled down replica set nginx-deployment-2035384211 to 1
242
+ Normal ScalingReplicaSet 19s deployment-controller Scaled up replica set nginx-deployment-1564180365 to 3
243
+ Normal ScalingReplicaSet 14s deployment-controller Scaled down replica set nginx-deployment-2035384211 to 0
244
+ ```
245
+ Here you see that when you first created the Deployment, it created a ReplicaSet (nginx-deployment-2035384211) and scaled it up to 3 replicas directly. When you updated the Deployment, it created a new ReplicaSet (nginx-deployment-1564180365) and scaled it up to 1 and waited for it to come up. Then it scaled down the old ReplicaSet to 2 and scaled up the new ReplicaSet to 2 so that at least 3 Pods were available and at most 4 Pods were created at all times. It then continued scaling up and down the new and the old ReplicaSet, with the same rolling update strategy. Finally, you'll have 3 available replicas in the new ReplicaSet, and the old ReplicaSet is scaled down to 0.
246
+
247
+ > [!info] Note:
248
+ > Kubernetes doesn't count terminating Pods when calculating the number of `availableReplicas`, which must be between `replicas - maxUnavailable` and `replicas + maxSurge`. As a result, you might notice that there are more Pods than expected during a rollout, and that the total resources consumed by the Deployment are more than `replicas + maxSurge` until the `terminationGracePeriodSeconds` of the terminating Pods expires.
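
The bounds quoted above (3 to 4 Pods for 3 replicas, 3 to 5 Pods for 4 replicas) can be reproduced with integer arithmetic. This is a minimal sketch, assuming that a percentage `maxSurge` is rounded up and a percentage `maxUnavailable` is rounded down; the function name `rolling_update_bounds` is hypothetical and it does not query a cluster:

```shell
# Hypothetical helper: compute the Pod-count bounds of a rolling update
# from percentages. Assumes maxSurge rounds up and maxUnavailable rounds down.
# Usage: rolling_update_bounds REPLICAS [MAX_SURGE_PCT] [MAX_UNAVAILABLE_PCT]
rolling_update_bounds() {
  replicas=$1
  surge_pct=${2:-25}          # default from the text: 25% max surge
  unavailable_pct=${3:-25}    # default from the text: 25% max unavailable

  # Ceiling division for the surge, floor division for unavailability.
  max_surge=$(( (replicas * surge_pct + 99) / 100 ))
  max_unavailable=$(( replicas * unavailable_pct / 100 ))

  echo "min_available=$((replicas - max_unavailable)) max_total=$((replicas + max_surge))"
}
```

With the defaults, `rolling_update_bounds 3` prints `min_available=3 max_total=4`, and `rolling_update_bounds 4` prints `min_available=3 max_total=5`.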
249
+
250
+ ### Rollover (aka multiple updates in-flight)
251
+
252
+ Each time a new Deployment is observed by the Deployment controller, a ReplicaSet is created to bring up the desired Pods. If the Deployment is updated, the existing ReplicaSet that controls Pods whose labels match `.spec.selector` but whose template does not match `.spec.template` is scaled down. Eventually, the new ReplicaSet is scaled to `.spec.replicas` and all old ReplicaSets are scaled to 0.
253
+
254
+ If you update a Deployment while an existing rollout is in progress, the Deployment creates a new ReplicaSet as per the update and starts scaling it up, and rolls over the ReplicaSet that it was scaling up previously: it adds that ReplicaSet to its list of old ReplicaSets and starts scaling it down.
255
+
256
+ For example, suppose you create a Deployment to create 5 replicas of `nginx:1.14.2`, but then update the Deployment to create 5 replicas of `nginx:1.16.1`, when only 3 replicas of `nginx:1.14.2` had been created. In that case, the Deployment immediately starts killing the 3 `nginx:1.14.2` Pods that it had created, and starts creating `nginx:1.16.1` Pods. It does not wait for the 5 replicas of `nginx:1.14.2` to be created before changing course.
257
+
258
+ ### Label selector updates
259
+
260
+ It is generally discouraged to make label selector updates and it is suggested to plan your selectors up front. A Deployment's label selector is **immutable** after creation; it cannot be updated via `kubectl patch`, `kubectl edit`, `kubectl apply`, or tools like `helm upgrade`.
261
+
262
+ If you must change the selector, you have to delete the Deployment and recreate it. Exercise great caution and ensure you grasp the following implications:
263
+
264
+ - **Additions:** When you create a new Deployment with a narrower selector, the new Deployment **must** also have a suitable Pod template. If you have an existing manifest and you edit the manifest to narrow the selector, you need to edit the metadata of the Pod template inside that Deployment, adding the new labels to match, as otherwise the API server returns a validation error. This is a *non-overlapping* change: the new Deployment will not "see" the old Pods (which lack the new label), causing the old ReplicaSet to be **orphaned** and a brand-new ReplicaSet to be created.
265
+ - **Value Updates:** Changing the existing value in a selector key (e.g., from `v1` to `v2`) results in the same behavior as additions (orphaning and recreation).
266
+ - **Removals:** Removing an existing key from the Deployment selector does not require any changes in the Pod template labels. This is an *overlapping* change: the new, broader selector would match the old Pods. Existing ReplicaSets are not orphaned, and a new ReplicaSet is not created, but note that the removed label still exists in any existing Pods and ReplicaSets. You can clean that up by triggering a rollout for the Deployment.
267
+
268
+ ## Rolling Back a Deployment
269
+
270
+ Sometimes, you may want to roll back a Deployment; for example, when the Deployment is not stable, such as crash looping. By default, all of the Deployment's rollout history is kept in the system so that you can roll back anytime you want (you can change that by modifying the revision history limit).
271
+
272
+ > [!info] Note:
273
+ > A Deployment's revision is created when a Deployment's rollout is triggered. This means that the new revision is created if and only if the Deployment's Pod template (`.spec.template`) is changed, for example if you update the labels or container images of the template. Other updates, such as scaling the Deployment, do not create a Deployment revision, so that you can facilitate simultaneous manual- or auto-scaling. This means that when you roll back to an earlier revision, only the Deployment's Pod template part is rolled back.
274
+
275
+ - Suppose that you made a typo while updating the Deployment, by putting the image name as `nginx:1.161` instead of `nginx:1.16.1`:
276
+ ```shell
277
+ kubectl set image deployment/nginx-deployment nginx=nginx:1.161
278
+ ```
279
+ The output is similar to this:
280
+ ```
281
+ deployment.apps/nginx-deployment image updated
282
+ ```
283
+ - The rollout gets stuck. You can verify it by checking the rollout status:
284
+ ```shell
285
+ kubectl rollout status deployment/nginx-deployment
286
+ ```
287
+ The output is similar to this:
288
+ ```
289
+ Waiting for rollout to finish: 1 out of 3 new replicas have been updated...
290
+ ```
291
+ - Press Ctrl-C to stop the above rollout status watch. For more information on stuck rollouts, [read more here](#deployment-status).
292
+ - You see that the number of old replicas (adding the replica count from `nginx-deployment-1564180365` and `nginx-deployment-2035384211`) is 3, and the number of new replicas (from `nginx-deployment-3066724191`) is 1.
293
+ ```shell
294
+ kubectl get rs
295
+ ```
296
+ The output is similar to this:
297
+ ```
298
+ NAME DESIRED CURRENT READY AGE
299
+ nginx-deployment-1564180365 3 3 3 25s
300
+ nginx-deployment-2035384211 0 0 0 36s
301
+ nginx-deployment-3066724191 1 1 0 6s
302
+ ```
303
+ - Looking at the Pods created, you see that 1 Pod created by the new ReplicaSet is stuck in an image pull loop.
304
+ ```shell
305
+ kubectl get pods
306
+ ```
307
+ The output is similar to this:
308
+ ```
309
+ NAME READY STATUS RESTARTS AGE
310
+ nginx-deployment-1564180365-70iae 1/1 Running 0 25s
311
+ nginx-deployment-1564180365-jbqqo 1/1 Running 0 25s
312
+ nginx-deployment-1564180365-hysrc 1/1 Running 0 25s
313
+ nginx-deployment-3066724191-08mng 0/1 ImagePullBackOff 0 6s
314
+ ```
315
+ > [!info] Note:
316
+ > The Deployment controller stops the bad rollout automatically, and stops scaling up the new ReplicaSet. This depends on the rollingUpdate parameters (`maxUnavailable` specifically) that you have specified. Kubernetes by default sets the value to 25%.
317
+ - Get the description of the Deployment:
318
+ ```shell
319
+ kubectl describe deployment
320
+ ```
321
+ The output is similar to this:
322
+ ```
323
+ Name: nginx-deployment
324
+ Namespace: default
325
+ CreationTimestamp: Tue, 15 Mar 2016 14:48:04 -0700
326
+ Labels: app=nginx
327
+ Selector: app=nginx
328
+ Replicas: 3 desired | 1 updated | 4 total | 3 available | 1 unavailable
329
+ StrategyType: RollingUpdate
330
+ MinReadySeconds: 0
331
+ RollingUpdateStrategy: 25% max unavailable, 25% max surge
332
+ Pod Template:
333
+ Labels: app=nginx
334
+ Containers:
335
+ nginx:
336
+ Image: nginx:1.161
337
+ Port: 80/TCP
338
+ Host Port: 0/TCP
339
+ Environment: <none>
340
+ Mounts: <none>
341
+ Volumes: <none>
342
+ Conditions:
343
+ Type Status Reason
344
+ ---- ------ ------
345
+ Available True MinimumReplicasAvailable
346
+ Progressing True ReplicaSetUpdated
347
+ OldReplicaSets: nginx-deployment-1564180365 (3/3 replicas created)
348
+ NewReplicaSet: nginx-deployment-3066724191 (1/1 replicas created)
349
+ Events:
350
+ FirstSeen LastSeen Count From SubObjectPath Type Reason Message
351
+ --------- -------- ----- ---- ------------- -------- ------ -------
352
+ 1m 1m 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set nginx-deployment-2035384211 to 3
353
+ 22s 22s 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set nginx-deployment-1564180365 to 1
354
+ 22s 22s 1 {deployment-controller } Normal ScalingReplicaSet Scaled down replica set nginx-deployment-2035384211 to 2
355
+ 22s 22s 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set nginx-deployment-1564180365 to 2
356
+ 21s 21s 1 {deployment-controller } Normal ScalingReplicaSet Scaled down replica set nginx-deployment-2035384211 to 1
357
+ 21s 21s 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set nginx-deployment-1564180365 to 3
358
+ 13s 13s 1 {deployment-controller } Normal ScalingReplicaSet Scaled down replica set nginx-deployment-2035384211 to 0
359
+ 13s 13s 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set nginx-deployment-3066724191 to 1
360
+ ```
361
+ To fix this, you need to roll back to a previous revision of the Deployment that is stable.
362
+
363
+ ### Checking Rollout History of a Deployment
364
+
365
+ Follow the steps given below to check the rollout history:
366
+
367
+ 1. First, check the revisions of this Deployment:
368
+ ```shell
369
+ kubectl rollout history deployment/nginx-deployment
370
+ ```
371
+ The output is similar to this:
372
+ ```
373
+ deployments "nginx-deployment"
374
+ REVISION CHANGE-CAUSE
375
+ 1 <none>
376
+ 2 <none>
377
+ 3 <none>
378
+ ```
379
+ `CHANGE-CAUSE` is copied from the Deployment annotation `kubernetes.io/change-cause` to its revisions upon creation. You can specify the `CHANGE-CAUSE` message by:
380
+ - Annotating the Deployment with `kubectl annotate deployment/nginx-deployment kubernetes.io/change-cause="image updated to 1.16.1"`
381
+ - Manually editing the manifest of the resource.
382
+ - Using tooling that sets the annotation automatically.
383
+ > [!info] Note:
384
+ > In older versions of Kubernetes, you could use the `--record` flag with kubectl commands to automatically populate the `CHANGE-CAUSE` field. This flag is deprecated and will be removed in a future release.
385
+ 2. To see the details of each revision, run:
386
+ ```shell
387
+ kubectl rollout history deployment/nginx-deployment --revision=2
388
+ ```
389
+ The output is similar to this:
390
+ ```
391
+ deployments "nginx-deployment" revision 2
392
+ Labels: app=nginx
393
+ pod-template-hash=1159050644
394
+ Containers:
395
+ nginx:
396
+ Image: nginx:1.16.1
397
+ Port: 80/TCP
398
+ QoS Tier:
399
+ cpu: BestEffort
400
+ memory: BestEffort
401
+ Environment Variables: <none>
402
+ No volumes.
403
+ ```
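
The annotation step and the history check can be combined so that each revision gets a readable `CHANGE-CAUSE`. A minimal sketch, assuming the annotation should be overwritten when it already exists (hence `--overwrite`); the function name `record_change_cause` is a hypothetical helper:

```shell
# Hypothetical helper: set the change-cause annotation on a Deployment,
# then print its rollout history.
# Usage: record_change_cause DEPLOYMENT MESSAGE
record_change_cause() {
  # --overwrite replaces an existing annotation instead of failing.
  kubectl annotate "deployment/$1" \
    "kubernetes.io/change-cause=$2" --overwrite &&
    kubectl rollout history "deployment/$1"
}
```

For example: `record_change_cause nginx-deployment "image updated to 1.16.1"`.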
404
+
405
+ ### Rolling Back to a Previous Revision
406
+
407
+ Follow the steps given below to roll back the Deployment from the current version to the previous version, which is version 2.
408
+
409
+ 1. Now you've decided to undo the current rollout and rollback to the previous revision:
410
+ ```shell
411
+ kubectl rollout undo deployment/nginx-deployment
412
+ ```
413
+ The output is similar to this:
414
+ ```
415
+ deployment.apps/nginx-deployment rolled back
416
+ ```
417
+ Alternatively, you can rollback to a specific revision by specifying it with `--to-revision`:
418
+ ```shell
419
+ kubectl rollout undo deployment/nginx-deployment --to-revision=2
420
+ ```
421
+ The output is similar to this:
422
+ ```
423
+ deployment.apps/nginx-deployment rolled back
424
+ ```
425
+ For more details about rollout related commands, read [`kubectl rollout`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#rollout).
426
+ The Deployment is now rolled back to a previous stable revision. As you can see, a `DeploymentRollback` event for rolling back to revision 2 is generated from Deployment controller.
427
+ 2. To check whether the rollback was successful and the Deployment is running as expected, run:
428
+ ```shell
429
+ kubectl get deployment nginx-deployment
430
+ ```
431
+ The output is similar to this:
432
+ ```
433
+ NAME READY UP-TO-DATE AVAILABLE AGE
434
+ nginx-deployment 3/3 3 3 30m
435
+ ```
436
+ 3. Get the description of the Deployment:
437
+ ```shell
438
+ kubectl describe deployment nginx-deployment
439
+ ```
440
+ The output is similar to this:
441
+ ```
442
+ Name: nginx-deployment
443
+ Namespace: default
444
+ CreationTimestamp: Sun, 02 Sep 2018 18:17:55 -0500
445
+ Labels: app=nginx
446
+ Annotations: deployment.kubernetes.io/revision=4
447
+ Selector: app=nginx
448
+ Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
449
+ StrategyType: RollingUpdate
450
+ MinReadySeconds: 0
451
+ RollingUpdateStrategy: 25% max unavailable, 25% max surge
452
+ Pod Template:
453
+ Labels: app=nginx
454
+ Containers:
455
+ nginx:
456
+ Image: nginx:1.16.1
457
+ Port: 80/TCP
458
+ Host Port: 0/TCP
459
+ Environment: <none>
460
+ Mounts: <none>
461
+ Volumes: <none>
462
+ Conditions:
463
+ Type Status Reason
464
+ ---- ------ ------
465
+ Available True MinimumReplicasAvailable
466
+ Progressing True NewReplicaSetAvailable
467
+ OldReplicaSets: <none>
468
+ NewReplicaSet: nginx-deployment-c4747d96c (3/3 replicas created)
469
+ Events:
470
+ Type Reason Age From Message
471
+ ---- ------ ---- ---- -------
472
+ Normal ScalingReplicaSet 12m deployment-controller Scaled up replica set nginx-deployment-75675f5897 to 3
473
+ Normal ScalingReplicaSet 11m deployment-controller Scaled up replica set nginx-deployment-c4747d96c to 1
474
+ Normal ScalingReplicaSet 11m deployment-controller Scaled down replica set nginx-deployment-75675f5897 to 2
475
+ Normal ScalingReplicaSet 11m deployment-controller Scaled up replica set nginx-deployment-c4747d96c to 2
476
+ Normal ScalingReplicaSet 11m deployment-controller Scaled down replica set nginx-deployment-75675f5897 to 1
477
+ Normal ScalingReplicaSet 11m deployment-controller Scaled up replica set nginx-deployment-c4747d96c to 3
478
+ Normal ScalingReplicaSet 11m deployment-controller Scaled down replica set nginx-deployment-75675f5897 to 0
479
+ Normal ScalingReplicaSet 11m deployment-controller Scaled up replica set nginx-deployment-595696685f to 1
480
+ Normal DeploymentRollback 15s deployment-controller Rolled back deployment "nginx-deployment" to revision 2
481
+ Normal ScalingReplicaSet 15s deployment-controller Scaled down replica set nginx-deployment-595696685f to 0
482
+ ```
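
The manual sequence above (watch the rollout, notice that it is stuck, undo it) can be scripted. This is a minimal sketch, assuming that a rollout which does not finish within a chosen timeout should be rolled back to the previous revision; the function name `rollout_or_undo` and the default of `120s` are hypothetical choices, not recommendations from the Kubernetes documentation:

```shell
# Hypothetical helper: wait for a rollout to finish and roll back to the
# previous revision if it does not complete within the timeout.
# Usage: rollout_or_undo DEPLOYMENT [TIMEOUT]
rollout_or_undo() {
  deployment=$1
  timeout=${2:-120s}   # assumed default; tune for your workload

  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    echo "rollout of $deployment completed"
  else
    echo "rollout of $deployment did not complete within $timeout; rolling back" >&2
    kubectl rollout undo "deployment/$deployment"
  fi
}
```

For example: `rollout_or_undo nginx-deployment 60s`. An automatic undo is a design choice; in some environments it may be preferable to alert and leave the decision to an operator.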
483
+
484
+ ## Scaling a Deployment
485
+
486
+ You can scale a Deployment by using the following command:
487
+
488
+ ```shell
489
+ kubectl scale deployment/nginx-deployment --replicas=10
490
+ ```
491
+
492
+ The output is similar to this:
493
+
494
+ ```
495
+ deployment.apps/nginx-deployment scaled
496
+ ```
497
+
498
+ Assuming [horizontal Pod autoscaling](https://kubernetes.io/docs/concepts/workloads/autoscaling/horizontal-pod-autoscale/) is enabled in your cluster, you can set up an autoscaler for your Deployment and choose the minimum and maximum number of Pods you want to run based on the CPU utilization of your existing Pods.
499
+
500
+ ```shell
501
+ kubectl autoscale deployment/nginx-deployment --min=10 --max=15 --cpu-percent=80
502
+ ```
503
+
504
+ The output is similar to this:
505
+
506
+ ```
507
+ deployment.apps/nginx-deployment scaled
508
+ ```
509
+
510
+ ### Proportional scaling
511
+
512
+ RollingUpdate Deployments support running multiple versions of an application at the same time. When you or an autoscaler scales a RollingUpdate Deployment that is in the middle of a rollout (either in progress or paused), the Deployment controller balances the additional replicas in the existing active ReplicaSets (ReplicaSets with Pods) in order to mitigate risk. This is called *proportional scaling*.
513
+
514
+ For example, you are running a Deployment with 10 replicas, [maxSurge](#max-surge) =3, and [maxUnavailable](#max-unavailable) =2.
515
+
516
+ - Ensure that the 10 replicas in your Deployment are running.
517
+ ```shell
518
+ kubectl get deploy
519
+ ```
520
+ The output is similar to this:
521
+ ```
522
+ NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
523
+ nginx-deployment 10 10 10 10 50s
524
+ ```
525
+ - You update to a new image which happens to be unresolvable from inside the cluster.
526
+ ```shell
527
+ kubectl set image deployment/nginx-deployment nginx=nginx:sometag
528
+ ```
529
+ The output is similar to this:
530
+ ```
531
+ deployment.apps/nginx-deployment image updated
532
+ ```
533
+ - The image update starts a new rollout with ReplicaSet nginx-deployment-1989198191, but it's blocked due to the `maxUnavailable` requirement that you mentioned above. Check out the rollout status:
534
+ ```shell
535
+ kubectl get rs
536
+ ```
537
+ The output is similar to this:
538
+ ```
539
+ NAME DESIRED CURRENT READY AGE
540
+ nginx-deployment-1989198191 5 5 0 9s
541
+ nginx-deployment-618515232 8 8 8 1m
542
+ ```
543
+ - Then a new scaling request for the Deployment comes along. The autoscaler increments the Deployment replicas to 15. The Deployment controller needs to decide where to add these new 5 replicas. If you weren't using proportional scaling, all 5 of them would be added in the new ReplicaSet. With proportional scaling, you spread the additional replicas across all ReplicaSets. Bigger proportions go to the ReplicaSets with the most replicas and lower proportions go to ReplicaSets with fewer replicas. Any leftovers are added to the ReplicaSet with the most replicas. ReplicaSets with zero replicas are not scaled up.
544
+
545
+ In our example above, 3 replicas are added to the old ReplicaSet and 2 replicas are added to the new ReplicaSet. The rollout process should eventually move all replicas to the new ReplicaSet, assuming the new replicas become healthy. To confirm this, run:
546
+
547
+ ```shell
548
+ kubectl get deploy
549
+ ```
550
+
551
+ The output is similar to this:
552
+
553
+ ```
554
+ NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
555
+ nginx-deployment 15 18 7 8 7m
556
+ ```
557
+
558
+ The rollout status confirms how the replicas were added to each ReplicaSet.
559
+
560
+ ```shell
561
+ kubectl get rs
562
+ ```
563
+
564
+ The output is similar to this:
565
+
566
+ ```
567
+ NAME DESIRED CURRENT READY AGE
568
+ nginx-deployment-1989198191 7 7 0 7m
569
+ nginx-deployment-618515232 11 11 11 7m
570
+ ```
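
The split in this example (3 extra replicas for the ReplicaSet with 8, 2 for the one with 5) can be approximated with integer arithmetic. This is a simplified sketch and not the Deployment controller's actual algorithm: it assumes each share is the extra replica count weighted by ReplicaSet size and rounded to the nearest integer, with any rounding leftover going to the largest ReplicaSet. The function name `proportional_split` is hypothetical:

```shell
# Hypothetical helper: approximate how extra replicas are spread across
# ReplicaSets in proportion to their current sizes. Simplified; the real
# controller logic takes more inputs into account.
# Usage: proportional_split EXTRA SIZE [SIZE...]
proportional_split() {
  extra=$1; shift
  total=0; largest=0
  for size in "$@"; do
    total=$((total + size))
    if [ "$size" -gt "$largest" ]; then largest=$size; fi
  done
  [ "$total" -gt 0 ] || return 1

  # First pass: sum the rounded shares to find the rounding leftover.
  assigned=0
  for size in "$@"; do
    assigned=$((assigned + (2 * extra * size + total) / (2 * total)))
  done
  leftover=$((extra - assigned))

  # Second pass: print each share; the first largest ReplicaSet absorbs
  # the leftover. A ReplicaSet of size 0 always receives 0.
  for size in "$@"; do
    share=$(( (2 * extra * size + total) / (2 * total) ))
    if [ "$size" -eq "$largest" ] && [ "$leftover" -ne 0 ]; then
      share=$((share + leftover)); leftover=0
    fi
    echo "$size -> +$share"
  done
}
```

Running `proportional_split 5 8 5` prints `8 -> +3` and `5 -> +2`, which matches the 11 and 7 replicas shown in the output above.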
571
+
572
+ ## Pausing and Resuming a rollout of a Deployment
573
+
574
+ When you update a Deployment, or plan to, you can pause rollouts for that Deployment before you trigger one or more updates. When you're ready to apply those changes, you resume rollouts for the Deployment. This approach allows you to apply multiple fixes in between pausing and resuming without triggering unnecessary rollouts.
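
The pause, update, and resume cycle can be wrapped in a single function so that several changes produce one rollout. A minimal sketch, assuming one image change and one resource-limit change per batch; the function name `batched_update` and its argument order are hypothetical:

```shell
# Hypothetical helper: apply an image change and a resource-limit change
# while rollouts are paused, then resume so both land in one rollout.
# Usage: batched_update DEPLOYMENT CONTAINER IMAGE LIMITS
batched_update() {
  # Each step runs only if the previous one succeeded. If an update fails,
  # the Deployment stays paused so the partial change can be inspected.
  kubectl rollout pause "deployment/$1" &&
    kubectl set image "deployment/$1" "$2=$3" &&
    kubectl set resources "deployment/$1" -c="$2" --limits="$4" &&
    kubectl rollout resume "deployment/$1"
}
```

For example: `batched_update nginx-deployment nginx nginx:1.16.1 cpu=200m,memory=512Mi`.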
575
+
576
+ - For example, with a Deployment that was created:
577
+ Get the Deployment details:
578
+ ```shell
579
+ kubectl get deploy
580
+ ```
581
+ The output is similar to this:
582
+ ```
583
+ NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
584
+ nginx 3 3 3 3 1m
585
+ ```
586
+ Get the rollout status:
587
+ ```shell
588
+ kubectl get rs
589
+ ```
590
+ The output is similar to this:
591
+ ```
592
+ NAME DESIRED CURRENT READY AGE
593
+ nginx-2142116321 3 3 3 1m
594
+ ```
595
+ - Pause by running the following command:
596
+ ```shell
597
+ kubectl rollout pause deployment/nginx-deployment
598
+ ```
599
+ The output is similar to this:
600
+ ```
601
+ deployment.apps/nginx-deployment paused
602
+ ```
603
+ - Then update the image of the Deployment:
604
+ ```shell
605
+ kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1
606
+ ```
607
+ The output is similar to this:
608
+ ```
609
+ deployment.apps/nginx-deployment image updated
610
+ ```
611
+ - Notice that no new rollout started:
612
+ ```shell
613
+ kubectl rollout history deployment/nginx-deployment
614
+ ```
615
+ The output is similar to this:
616
+ ```
617
+ deployments "nginx"
618
+ REVISION CHANGE-CAUSE
619
+ 1 <none>
620
+ ```
621
+ - Get the rollout status to verify that the existing ReplicaSet has not changed:
622
+ ```shell
623
+ kubectl get rs
624
+ ```
625
+ The output is similar to this:
626
+ ```
627
+ NAME DESIRED CURRENT READY AGE
628
+ nginx-2142116321 3 3 3 2m
629
+ ```
630
+ - You can make as many updates as you wish, for example, update the resources that will be used:
631
+ ```shell
632
+ kubectl set resources deployment/nginx-deployment -c=nginx --limits=cpu=200m,memory=512Mi
633
+ ```
634
+ The output is similar to this:
635
+ ```
636
+ deployment.apps/nginx-deployment resource requirements updated
637
+ ```
638
+ The initial state of the Deployment prior to pausing its rollout will continue its function, but new updates to the Deployment will not have any effect as long as the Deployment rollout is paused.
639
+ - Eventually, resume the Deployment rollout and observe a new ReplicaSet coming up with all the new updates:
640
+ ```shell
641
+ kubectl rollout resume deployment/nginx-deployment
642
+ ```
643
+ The output is similar to this:
644
+ ```
645
+ deployment.apps/nginx-deployment resumed
646
+ ```
647
+ - [Watch](https://kubernetes.io/docs/reference/using-api/api-concepts/#api-verbs "A verb that is used to track changes to an object in Kubernetes as a stream.") the status of the rollout until it's done.
648
+ ```shell
649
+ kubectl get rs --watch
650
+ ```
651
+ The output is similar to this:
652
+ ```
653
+ NAME DESIRED CURRENT READY AGE
654
+ nginx-2142116321 2 2 2 2m
655
+ nginx-3926361531 2 2 0 6s
656
+ nginx-3926361531 2 2 1 18s
657
+ nginx-2142116321 1 2 2 2m
658
+ nginx-2142116321 1 2 2 2m
659
+ nginx-3926361531 3 2 1 18s
660
+ nginx-3926361531 3 2 1 18s
661
+ nginx-2142116321 1 1 1 2m
662
+ nginx-3926361531 3 3 1 18s
663
+ nginx-3926361531 3 3 2 19s
664
+ nginx-2142116321 0 1 1 2m
665
+ nginx-2142116321 0 1 1 2m
666
+ nginx-2142116321 0 0 0 2m
667
+ nginx-3926361531 3 3 3 20s
668
+ ```
669
+ - Get the status of the latest rollout:
670
+ ```shell
671
+ kubectl get rs
672
+ ```
673
+ The output is similar to this:
674
+ ```
675
+ NAME DESIRED CURRENT READY AGE
676
+ nginx-2142116321 0 0 0 2m
677
+ nginx-3926361531 3 3 3 28s
678
+ ```
679
+
680
+ > [!info] Note:
681
+ > You cannot roll back a paused Deployment until you resume it.
682
+
683
+ ## Deployment status
684
+
685
+ A Deployment enters various states during its lifecycle. It can be [progressing](#progressing-deployment) while rolling out a new ReplicaSet, it can be [complete](#complete-deployment), or it can [fail to progress](#failed-deployment).
686
+
687
+ ### Progressing Deployment
688
+
689
+ Kubernetes marks a Deployment as *progressing* when one of the following tasks is performed:
690
+
691
+ - The Deployment creates a new ReplicaSet.
692
+ - The Deployment is scaling up its newest ReplicaSet.
693
+ - The Deployment is scaling down its older ReplicaSet(s).
694
+ - New Pods become ready or available (ready for at least [MinReadySeconds](#min-ready-seconds)).
695
+
696
+ When the rollout becomes “progressing”, the Deployment controller adds a condition with the following attributes to the Deployment's `.status.conditions`:
697
+
698
+ - `type: Progressing`
699
+ - `status: "True"`
700
+ - `reason: NewReplicaSetCreated` | `reason: FoundNewReplicaSet` | `reason: ReplicaSetUpdated`
701
+
702
+ You can monitor the progress for a Deployment by using `kubectl rollout status`.
703
+
704
+ ### Complete Deployment
705
+
706
+ Kubernetes marks a Deployment as *complete* when it has the following characteristics:
707
+
708
+ - All of the replicas associated with the Deployment have been updated to the latest version you've specified, meaning any updates you've requested have been completed.
709
+ - All of the replicas associated with the Deployment are available.
710
+ - No old replicas for the Deployment are running.
711
+
712
+ When the rollout becomes “complete”, the Deployment controller sets a condition with the following attributes to the Deployment's `.status.conditions`:
713
+
714
+ - `type: Progressing`
715
+ - `status: "True"`
716
+ - `reason: NewReplicaSetAvailable`
717
+
718
+ This `Progressing` condition will retain a status value of `"True"` until a new rollout is initiated. The condition holds even when availability of replicas changes (which does instead affect the `Available` condition).
719
+
720
+ You can check if a Deployment has completed by using `kubectl rollout status`. If the rollout completed successfully, `kubectl rollout status` returns a zero exit code.
721
+
722
+ ```shell
723
+ kubectl rollout status deployment/nginx-deployment
724
+ ```
725
+
726
+ The output is similar to this:
727
+
728
+ ```
729
+ Waiting for rollout to finish: 2 of 3 updated replicas are available...
730
+ deployment "nginx-deployment" successfully rolled out
731
+ ```
732
+
733
+ and the exit status from `kubectl rollout` is 0 (success):
734
+
735
+ ```shell
736
+ echo $?
737
+ ```
738
+ ```
739
+ 0
740
+ ```
741
+
742
+ ### Failed Deployment
743
+
744
+ Your Deployment may get stuck trying to deploy its newest ReplicaSet without ever completing. This can occur due to some of the following factors:
745
+
746
+ - Insufficient quota
747
+ - Readiness probe failures
748
+ - Image pull errors
749
+ - Insufficient permissions
750
+ - Limit ranges
751
+ - Application runtime misconfiguration
752
+
753
+ One way you can detect this condition is to specify a deadline parameter in your Deployment spec: ([`.spec.progressDeadlineSeconds`](#progress-deadline-seconds)). `.spec.progressDeadlineSeconds` denotes the number of seconds the Deployment controller waits before indicating (in the Deployment status) that the Deployment progress has stalled.
754
+
755
+ The following `kubectl` command sets the spec with `progressDeadlineSeconds` to make the controller report lack of progress of a rollout for a Deployment after 10 minutes:
756
+
757
+ ```shell
758
+ kubectl patch deployment/nginx-deployment -p '{"spec":{"progressDeadlineSeconds":600}}'
759
+ ```
760
+
761
+ The output is similar to this:
762
+
763
+ ```
764
+ deployment.apps/nginx-deployment patched
765
+ ```
766
+
767
+ Once the deadline has been exceeded, the Deployment controller adds a DeploymentCondition with the following attributes to the Deployment's `.status.conditions`:
768
+
769
+ - `type: Progressing`
770
+ - `status: "False"`
771
+ - `reason: ProgressDeadlineExceeded`
772
+
773
+ This condition can also fail early, in which case its status is set to `"False"` for a reason such as `ReplicaSetCreateError`. Also, the deadline is no longer taken into account once the Deployment rollout completes.
774
+
775
+ See the [Kubernetes API conventions](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties) for more information on status conditions.
776
+
777
+ > [!info] Note:
778
+ > Kubernetes takes no action on a stalled Deployment other than to report a status condition with `reason: ProgressDeadlineExceeded`. Higher-level orchestrators can take advantage of it and act accordingly, for example by rolling back the Deployment to its previous version.
779
+
780
+ > [!info] Note:
781
+ > If you pause a Deployment rollout, Kubernetes does not check progress against your specified deadline. You can safely pause a Deployment rollout in the middle of a rollout and resume without triggering the condition for exceeding the deadline.
782
+
783
+ You may experience transient errors with your Deployments, either due to a low timeout that you have set or due to any other kind of error that can be treated as transient. For example, let's suppose you have insufficient quota. If you describe the Deployment you will notice the following section:
784
+
785
+ ```shell
786
+ kubectl describe deployment nginx-deployment
787
+ ```
788
+
789
+ The output is similar to this:
790
+
791
+ ```
792
+ <...>
793
+ Conditions:
794
+ Type Status Reason
795
+ ---- ------ ------
796
+ Available True MinimumReplicasAvailable
797
+ Progressing True ReplicaSetUpdated
798
+ ReplicaFailure True FailedCreate
799
+ <...>
800
+ ```
801
+
802
+ If you run `kubectl get deployment nginx-deployment -o yaml`, the Deployment status is similar to this:
803
+
804
+ ```
805
+ status:
806
+ availableReplicas: 2
807
+ conditions:
808
+ - lastTransitionTime: 2016-10-04T12:25:39Z
809
+ lastUpdateTime: 2016-10-04T12:25:39Z
810
+ message: Replica set "nginx-deployment-4262182780" is progressing.
811
+ reason: ReplicaSetUpdated
812
+ status: "True"
813
+ type: Progressing
814
+ - lastTransitionTime: 2016-10-04T12:25:42Z
815
+ lastUpdateTime: 2016-10-04T12:25:42Z
816
+ message: Deployment has minimum availability.
817
+ reason: MinimumReplicasAvailable
818
+ status: "True"
819
+ type: Available
820
+ - lastTransitionTime: 2016-10-04T12:25:39Z
821
+ lastUpdateTime: 2016-10-04T12:25:39Z
822
+ message: 'Error creating: pods "nginx-deployment-4262182780-" is forbidden: exceeded quota:
823
+ object-counts, requested: pods=1, used: pods=3, limited: pods=2'
824
+ reason: FailedCreate
825
+ status: "True"
826
+ type: ReplicaFailure
827
+ observedGeneration: 3
828
+ replicas: 2
829
+ unavailableReplicas: 2
830
+ ```
831
+
832
+ Eventually, once the Deployment progress deadline is exceeded, Kubernetes updates the status and the reason for the Progressing condition:
833
+
834
+ ```
835
+ Conditions:
836
+ Type Status Reason
837
+ ---- ------ ------
838
+ Available True MinimumReplicasAvailable
839
+ Progressing False ProgressDeadlineExceeded
840
+ ReplicaFailure True FailedCreate
841
+ ```
842
+
843
+ You can address an issue of insufficient quota by scaling down your Deployment, by scaling down other controllers you may be running, or by increasing quota in your namespace. If you satisfy the quota conditions and the Deployment controller then completes the Deployment rollout, you'll see the Deployment's status update with a successful condition (`status: "True"` and `reason: NewReplicaSetAvailable`).
844
+
845
+ ```
846
+ Conditions:
847
+ Type Status Reason
848
+ ---- ------ ------
849
+ Available True MinimumReplicasAvailable
850
+ Progressing True NewReplicaSetAvailable
851
+ ```
852
+
853
+ `type: Available` with `status: "True"` means that your Deployment has minimum availability. Minimum availability is dictated by the parameters specified in the deployment strategy. `type: Progressing` with `status: "True"` means that your Deployment is either in the middle of a rollout and it is progressing or that it has successfully completed its progress and the minimum required new replicas are available (see the Reason of the condition for the particulars - in our case `reason: NewReplicaSetAvailable` means that the Deployment is complete).
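As a rough illustration of how the `Progressing` condition maps onto the three states described in this section, here is a small Python sketch. The function name `rollout_state` and the returned labels are hypothetical; the input is assumed to be the `.status.conditions` list already parsed into dictionaries (for example from `kubectl get deployment -o json`), and only the `type`, `status`, and `reason` values named in the text are used.

```python
def rollout_state(conditions):
    """Classify a Deployment from its `.status.conditions` list.

    Returns "complete", "progressing", "failed", or "unknown".
    Hypothetical helper for illustration only.
    """
    progressing = next(
        (c for c in conditions if c.get("type") == "Progressing"), None
    )
    if progressing is None:
        return "unknown"
    if progressing.get("status") == "False":
        # For example reason ProgressDeadlineExceeded or ReplicaSetCreateError.
        return "failed"
    if progressing.get("status") == "True":
        if progressing.get("reason") == "NewReplicaSetAvailable":
            return "complete"
        # NewReplicaSetCreated, FoundNewReplicaSet, or ReplicaSetUpdated.
        return "progressing"
    return "unknown"


stalled = [
    {"type": "Available", "status": "True", "reason": "MinimumReplicasAvailable"},
    {"type": "Progressing", "status": "False", "reason": "ProgressDeadlineExceeded"},
    {"type": "ReplicaFailure", "status": "True", "reason": "FailedCreate"},
]
print(rollout_state(stalled))  # prints failed
```

Note that the `Available` condition is deliberately ignored here: as the text explains, it tracks minimum availability rather than rollout progress.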
854
+
855
+ You can check if a Deployment has failed to progress by using `kubectl rollout status`. `kubectl rollout status` returns a non-zero exit code if the Deployment has exceeded the progression deadline.
856
+
857
+ ```shell
858
+ kubectl rollout status deployment/nginx-deployment
859
+ ```
860
+
861
+ The output is similar to this:
862
+
863
+ ```
864
+ Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
865
+ error: deployment "nginx-deployment" exceeded its progress deadline
866
+ ```
867
+
868
+ and the exit status from `kubectl rollout` is 1 (indicating an error):
869
+
870
+ ```shell
871
+ echo $?
872
+ ```
873
+ ```
874
+ 1
875
+ ```
876
+
877
+ ### Operating on a failed deployment
878
+
879
+ All actions that apply to a complete Deployment also apply to a failed Deployment. You can scale it up/down, roll back to a previous revision, or even pause it if you need to apply multiple tweaks in the Deployment Pod template.
880
+
881
+ ## Clean up Policy
882
+
883
+ You can set the `.spec.revisionHistoryLimit` field in a Deployment to specify how many old ReplicaSets for this Deployment you want to retain. The rest will be garbage-collected in the background. The default is 10.
884
+
885
+ > [!info] Note:
886
+ > Explicitly setting this field to 0 cleans up all the history of your Deployment, so that Deployment will not be able to roll back.
887
+
888
+ The cleanup only starts **after** a Deployment reaches a [complete state](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#complete-deployment). If you set `.spec.revisionHistoryLimit` to 0, any rollout nonetheless triggers creation of a new ReplicaSet before Kubernetes removes the old one.
889
+
890
+ Even with a non-zero revision history limit, you can have more ReplicaSets than the limit you configure. For example, if Pods are crash looping and multiple rolling update events are triggered over time, you might end up with more ReplicaSets than the `.spec.revisionHistoryLimit` because the Deployment never reaches a complete state.
891
+
892
+ ## Canary Deployment
893
+
894
+ If you want to roll out releases to a subset of users or servers using the Deployment, you can create multiple Deployments, one for each release, following the canary pattern described in [managing resources](https://kubernetes.io/docs/concepts/workloads/management/#canary-deployments).
895
+
896
+ ## Writing a Deployment Spec
897
+
898
+ As with all other Kubernetes configs, a Deployment needs `.apiVersion`, `.kind`, and `.metadata` fields. For general information about working with config files, see [deploying applications](https://kubernetes.io/docs/tasks/run-application/run-stateless-application-deployment/), configuring containers, and [using kubectl to manage resources](https://kubernetes.io/docs/concepts/overview/working-with-objects/object-management/) documents.
899
+
900
+ When the control plane creates new Pods for a Deployment, the `.metadata.name` of the Deployment is part of the basis for naming those Pods. The name of a Deployment must be a valid [DNS subdomain](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names) value, but this can produce unexpected results for the Pod hostnames. For best compatibility, the name should follow the more restrictive rules for a [DNS label](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names).
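To make the difference between the two naming rules concrete, the following Python sketch checks a candidate Deployment name against both. The helper names `is_dns_label` and `is_dns_subdomain` are hypothetical, and the patterns and length limits (63 and 253 characters) reflect the rules as documented on the linked names page; treat this as an approximation rather than the API server's validation code.

```python
import re

# One DNS label: lowercase alphanumerics or '-', starting and ending alphanumeric.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
DNS_LABEL = re.compile(_LABEL)
# A DNS subdomain is one or more labels separated by dots.
DNS_SUBDOMAIN = re.compile(rf"{_LABEL}(\.{_LABEL})*")


def is_dns_label(name):
    """True if `name` satisfies the stricter DNS label rules (max 63 chars)."""
    return len(name) <= 63 and DNS_LABEL.fullmatch(name) is not None


def is_dns_subdomain(name):
    """True if `name` satisfies the DNS subdomain rules (max 253 chars)."""
    return len(name) <= 253 and DNS_SUBDOMAIN.fullmatch(name) is not None


for candidate in ("nginx-deployment", "nginx.v2"):
    print(candidate, is_dns_subdomain(candidate), is_dns_label(candidate))
```

A name such as `nginx.v2` is a valid DNS subdomain but not a valid DNS label, which is the case where Pod hostnames can behave unexpectedly.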
901
+
902
+ A Deployment also needs a [`.spec` section](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).
903
+
904
+ ### Pod Template
905
+
906
+ The `.spec.template` and `.spec.selector` are the only required fields of the `.spec`.
907
+
908
+ The `.spec.template` is a [Pod template](https://kubernetes.io/docs/concepts/workloads/pods/#pod-templates). It has exactly the same schema as a [Pod](https://kubernetes.io/docs/concepts/workloads/pods/ "A Pod represents a set of running containers in your cluster."), except it is nested and does not have an `apiVersion` or `kind`.
909
+
910
+ In addition to required fields for a Pod, a Pod template in a Deployment must specify appropriate labels and an appropriate restart policy. For labels, make sure not to overlap with other controllers. See [selector](#selector).
911
+
912
+ Only a [`.spec.template.spec.restartPolicy`](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy) equal to `Always` is allowed, which is the default if not specified.
913
+
914
+ ### Replicas
915
+
916
+ `.spec.replicas` is an optional field that specifies the number of desired Pods. It defaults to 1.
917
+
918
+ If you manually scale a Deployment, for example via `kubectl scale deployment deployment --replicas=X`, and then update that Deployment based on a manifest (for example, by running `kubectl apply -f deployment.yaml`), applying that manifest overwrites the manual scaling that you previously did.
919
+
920
+ If a [HorizontalPodAutoscaler](https://kubernetes.io/docs/concepts/workloads/autoscaling/horizontal-pod-autoscale/) (or any similar API for horizontal scaling) is managing scaling for a Deployment, don't set `.spec.replicas`.
921
+
922
+ Instead, allow the Kubernetes [control plane](https://kubernetes.io/docs/reference/glossary/?all=true#term-control-plane "The container orchestration layer that exposes the API and interfaces to define, deploy, and manage the lifecycle of containers.") to manage the `.spec.replicas` field automatically.
923
+
924
+ ### Selector
925
+
926
+ `.spec.selector` is a required field that specifies a [label selector](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/) for the Pods targeted by this Deployment.
927
+
928
+ `.spec.selector` must match `.spec.template.metadata.labels`, or it will be rejected by the API.
929
+
930
+ In API version `apps/v1`, `.spec.selector` and `.metadata.labels` do not default to `.spec.template.metadata.labels` if not set. So they must be set explicitly. Also note that `.spec.selector` is immutable after creation of the Deployment in `apps/v1`.
931
+
932
+ A Deployment may terminate Pods whose labels match the selector if their template is different from `.spec.template` or if the total number of such Pods exceeds `.spec.replicas`. It brings up new Pods with `.spec.template` if the number of Pods is less than the desired number.
933
+
934
+ > [!info] Note:
935
+ > You should not create other Pods whose labels match this selector, either directly, by creating another Deployment, or by creating another controller such as a ReplicaSet or a ReplicationController. If you do so, the first Deployment thinks that it created these other Pods. Kubernetes does not stop you from doing this.
936
+
937
+ If you have multiple controllers that have overlapping selectors, the controllers will fight with each other and won't behave correctly.
938
+
939
+ ### Strategy
940
+
941
+ `.spec.strategy` specifies the strategy used to replace old Pods by new ones. `.spec.strategy.type` can be "Recreate" or "RollingUpdate". "RollingUpdate" is the default value.
942
+
943
+ #### Recreate Deployment
944
+
945
+ All existing Pods are killed before new ones are created when `.spec.strategy.type==Recreate`.
946
+
947
+ > [!info] Note:
948
+ > This will only guarantee Pod termination prior to creation for upgrades. If you upgrade a Deployment, all Pods of the old revision will be terminated immediately. Successful removal is awaited before any Pod of the new revision is created. If you manually delete a Pod, the lifecycle is controlled by the ReplicaSet and the replacement will be created immediately (even if the old Pod is still in a Terminating state). If you need an "at most" guarantee for your Pods, you should consider using a [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/).
949
+
950
+ #### Rolling Update Deployment
951
+
952
+ The Deployment updates Pods in a rolling update fashion (gradually scale down the old ReplicaSets and scale up the new one) when `.spec.strategy.type==RollingUpdate`. You can specify `maxUnavailable` and `maxSurge` to control the rolling update process.
953
+
954
+ ##### Max Unavailable
955
+
956
+ `.spec.strategy.rollingUpdate.maxUnavailable` is an optional field that specifies the maximum number of Pods that can be unavailable during the update process. The value can be an absolute number (for example, 5) or a percentage of desired Pods (for example, 10%). The absolute number is calculated from the percentage by rounding down. The value cannot be 0 if `.spec.strategy.rollingUpdate.maxSurge` is 0. The default value is 25%.
957
+
958
+ For example, when this value is set to 30%, the old ReplicaSet can be scaled down to 70% of desired Pods immediately when the rolling update starts. Once new Pods are ready, the old ReplicaSet can be scaled down further, followed by scaling up the new ReplicaSet, ensuring that the total number of Pods available at all times during the update is at least 70% of the desired Pods.
959
+
960
+ ##### Max Surge
961
+
962
+ `.spec.strategy.rollingUpdate.maxSurge` is an optional field that specifies the maximum number of Pods that can be created over the desired number of Pods. The value can be an absolute number (for example, 5) or a percentage of desired Pods (for example, 10%). The value cannot be 0 if `maxUnavailable` is 0. The absolute number is calculated from the percentage by rounding up. The default value is 25%.
963
+
964
+ For example, when this value is set to 30%, the new ReplicaSet can be scaled up immediately when the rolling update starts, such that the total number of old and new Pods does not exceed 130% of desired Pods. Once old Pods have been killed, the new ReplicaSet can be scaled up further, ensuring that the total number of Pods running at any time during the update is at most 130% of desired Pods.
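The rounding rules for the two fields are easy to mix up, so here is a minimal Python sketch of the arithmetic. The function names `resolve` and `rolling_update_bounds` are hypothetical, and the zero check is a simplification that only looks at the configured values; it is meant to illustrate the rules stated above, not to mirror the controller's implementation.

```python
def resolve(value, replicas, round_up):
    """Turn an absolute number or a percentage string into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic: ceiling for maxSurge, floor for maxUnavailable.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rolling_update_bounds(replicas, max_unavailable="25%", max_surge="25%"):
    """Return (minimum available Pods, maximum total Pods) during a rollout."""
    if str(max_unavailable) in ("0", "0%") and str(max_surge) in ("0", "0%"):
        raise ValueError("maxUnavailable and maxSurge cannot both be 0")
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    surge = resolve(max_surge, replicas, round_up=True)
    return replicas - unavailable, replicas + surge


print(rolling_update_bounds(10, "30%", "30%"))  # prints (7, 13)
print(rolling_update_bounds(3))                 # prints (3, 4)
```

With 3 replicas and the 25% defaults, `maxUnavailable` rounds down to 0 and `maxSurge` rounds up to 1, so the rollout proceeds by adding one extra Pod at a time.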
965
+
966
+ Here are some Rolling Update Deployment examples that use `maxUnavailable` and `maxSurge`:
967
+
968
+ ```yaml
969
+ apiVersion: apps/v1
970
+ kind: Deployment
971
+ metadata:
972
+ name: nginx-deployment
973
+ labels:
974
+ app: nginx
975
+ spec:
976
+ replicas: 3
977
+ selector:
978
+ matchLabels:
979
+ app: nginx
980
+ template:
981
+ metadata:
982
+ labels:
983
+ app: nginx
984
+ spec:
985
+ containers:
986
+ - name: nginx
987
+ image: nginx:1.14.2
988
+ ports:
989
+ - containerPort: 80
990
+ strategy:
991
+ type: RollingUpdate
992
+ rollingUpdate:
993
+ maxUnavailable: 1
994
+ ```
995
+
996
+ ```yaml
997
+ apiVersion: apps/v1
998
+ kind: Deployment
999
+ metadata:
1000
+ name: nginx-deployment
1001
+ labels:
1002
+ app: nginx
1003
+ spec:
1004
+ replicas: 3
1005
+ selector:
1006
+ matchLabels:
1007
+ app: nginx
1008
+ template:
1009
+ metadata:
1010
+ labels:
1011
+ app: nginx
1012
+ spec:
1013
+ containers:
1014
+ - name: nginx
1015
+ image: nginx:1.14.2
1016
+ ports:
1017
+ - containerPort: 80
1018
+ strategy:
1019
+ type: RollingUpdate
1020
+ rollingUpdate:
1021
+ maxSurge: 1
1022
+ ```
1023
+
1024
+ ```yaml
1025
+ apiVersion: apps/v1
1026
+ kind: Deployment
1027
+ metadata:
1028
+ name: nginx-deployment
1029
+ labels:
1030
+ app: nginx
1031
+ spec:
1032
+ replicas: 3
1033
+ selector:
1034
+ matchLabels:
1035
+ app: nginx
1036
+ template:
1037
+ metadata:
1038
+ labels:
1039
+ app: nginx
1040
+ spec:
1041
+ containers:
1042
+ - name: nginx
1043
+ image: nginx:1.14.2
1044
+ ports:
1045
+ - containerPort: 80
1046
+ strategy:
1047
+ type: RollingUpdate
1048
+ rollingUpdate:
1049
+ maxSurge: 1
1050
+ maxUnavailable: 1
1051
+ ```
1052
+
1053
+ ### Progress Deadline Seconds
1054
+
1055
+ `.spec.progressDeadlineSeconds` is an optional field that specifies the number of seconds you want to wait for your Deployment to progress before the system reports back that the Deployment has [failed progressing](#failed-deployment) - surfaced as a condition with `type: Progressing`, `status: "False"`, and `reason: ProgressDeadlineExceeded` in the status of the resource. The Deployment controller will keep retrying the Deployment. This defaults to 600. In the future, once automatic rollback is implemented, the Deployment controller will roll back a Deployment as soon as it observes such a condition.
1056
+
1057
+ If specified, this field needs to be greater than `.spec.minReadySeconds`.
1058
+
1059
+ ### Min Ready Seconds
1060
+
1061
+ `.spec.minReadySeconds` is an optional field that specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing, for it to be considered available. This defaults to 0 (the Pod will be considered available as soon as it is ready). To learn more about when a Pod is considered ready, see [Container Probes](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes).
1062
+
1063
+ ### Terminating Pods
1064
+
1065
+ FEATURE STATE: `Kubernetes v1.35 [beta]` (enabled by default)
1066
+
1067
+ You can see the terminating pods only if the `DeploymentReplicaSetTerminatingReplicas` [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/) is enabled on the [API server](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/) and on the [kube-controller-manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/).
1068
+
1069
+ Pods that become terminating due to deletion or scale down may take a long time to terminate, and may consume additional resources during that period. As a result, the total number of all pods can temporarily exceed `.spec.replicas`. Terminating pods can be tracked using the `.status.terminatingReplicas` field of the Deployment.
1070
+
1071
+ ### Revision History Limit
1072
+
1073
+ A Deployment's revision history is stored in the ReplicaSets it controls.
1074
+
1075
+ `.spec.revisionHistoryLimit` is an optional field that specifies the number of old ReplicaSets to retain to allow rollback. These old ReplicaSets consume resources in `etcd` and crowd the output of `kubectl get rs`. The configuration of each Deployment revision is stored in its ReplicaSets; therefore, once an old ReplicaSet is deleted, you lose the ability to roll back to that revision of the Deployment. By default, 10 old ReplicaSets are kept; however, the ideal value depends on the frequency and stability of new Deployments.
1076
+
1077
+ More specifically, setting this field to zero means that all old ReplicaSets with 0 replicas will be cleaned up. In this case, a new Deployment rollout cannot be undone, since its revision history is cleaned up.
1078
+
1079
+ ### Paused
1080
+
1081
+ `.spec.paused` is an optional boolean field for pausing and resuming a Deployment. The only difference between a paused Deployment and one that is not paused is that any changes to the PodTemplateSpec of the paused Deployment will not trigger new rollouts as long as it is paused. A Deployment is not paused by default when it is created.
1082
+
1083
+ ## What's next
1084
+
1085
+ - Learn more about [Pods](https://kubernetes.io/docs/concepts/workloads/pods/).
1086
+ - [Run a stateless application using a Deployment](https://kubernetes.io/docs/tasks/run-application/run-stateless-application-deployment/).
1087
+ - Read the [Deployment](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/deployment-v1/) API reference to understand the Deployment API.
1088
+ - Read about [PodDisruptionBudget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) and how you can use it to manage application availability during disruptions.
1089
+ - Use kubectl to [create a Deployment](https://kubernetes.io/docs/tutorials/kubernetes-basics/deploy-app/deploy-intro/).
1090
+
1091
+
1092
+ Last modified March 15, 2026 at 3:21 PM PST: [fix: replace deprecated argument \`--cpu-percent\` with \`--cpu\` (af93a0a732)](https://github.com/kubernetes/website/commit/af93a0a732cf3057895c62e615a212a44aa6cec7)
data/k8s_docs/k8s_network_policies.md ADDED
@@ -0,0 +1,416 @@
1
+ If you want to control traffic flow at the IP address or port level (OSI layer 3 or 4), NetworkPolicies allow you to specify rules for traffic flow within your cluster, and also between Pods and the outside world. Your cluster must use a network plugin that supports NetworkPolicy enforcement.
2
+
3
+ If you want to control traffic flow at the IP address or port level for TCP, UDP, and SCTP protocols, then you might consider using Kubernetes NetworkPolicies for particular applications in your cluster. NetworkPolicies are an application-centric construct which allow you to specify how a [pod](https://kubernetes.io/docs/concepts/workloads/pods/ "A Pod represents a set of running containers in your cluster.") is allowed to communicate with various network "entities" (we use the word "entity" here to avoid overloading the more common terms such as "endpoints" and "services", which have specific Kubernetes connotations) over the network. NetworkPolicies apply to a connection with a pod on one or both ends, and are not relevant to other connections.
4
+
5
+ The entities that a Pod can communicate with are identified through a combination of the following three identifiers:
6
+
7
+ 1. Other pods that are allowed (exception: a pod cannot block access to itself)
8
+ 2. Namespaces that are allowed
9
+ 3. IP blocks (exception: traffic to and from the node where a Pod is running is always allowed, regardless of the IP address of the Pod or the node)
10
+
11
+ When defining a pod- or namespace-based NetworkPolicy, you use a [selector](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ "Allows users to filter a list of resources based on labels.") to specify what traffic is allowed to and from the Pod(s) that match the selector.
12
+
13
+ Meanwhile, when IP-based NetworkPolicies are created, we define policies based on IP blocks (CIDR ranges).
14
+
15
+ ## Prerequisites
16
+
17
+ Network policies are implemented by the [network plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/). To use network policies, you must be using a networking solution which supports NetworkPolicy. Creating a NetworkPolicy resource without a controller that implements it will have no effect.
18
+
19
+ ## The two sorts of pod isolation
20
+
21
+ There are two sorts of isolation for a pod: isolation for egress, and isolation for ingress. They concern what connections may be established. "Isolation" here is not absolute, rather it means "some restrictions apply". The alternative, "non-isolated for $direction", means that no restrictions apply in the stated direction. The two sorts of isolation (or not) are declared independently, and are both relevant for a connection from one pod to another.
22
+
23
+ By default, a pod is non-isolated for egress; all outbound connections are allowed. A pod is isolated for egress if there is any NetworkPolicy that both selects the pod and has "Egress" in its `policyTypes`; we say that such a policy applies to the pod for egress. When a pod is isolated for egress, the only allowed connections from the pod are those allowed by the `egress` list of some NetworkPolicy that applies to the pod for egress. Reply traffic for those allowed connections will also be implicitly allowed. The effects of those `egress` lists combine additively.
24
+
25
+ By default, a pod is non-isolated for ingress; all inbound connections are allowed. A pod is isolated for ingress if there is any NetworkPolicy that both selects the pod and has "Ingress" in its `policyTypes`; we say that such a policy applies to the pod for ingress. When a pod is isolated for ingress, the only allowed connections into the pod are those from the pod's node and those allowed by the `ingress` list of some NetworkPolicy that applies to the pod for ingress. Reply traffic for those allowed connections will also be implicitly allowed. The effects of those `ingress` lists combine additively.
26
+
27
+ Network policies do not conflict; they are additive. If any policy or policies apply to a given pod for a given direction, the connections allowed in that direction for that pod are the union of what the applicable policies allow. Thus, order of evaluation does not affect the policy result.
28
+
29
+ For a connection from a source pod to a destination pod to be allowed, both the egress policy on the source pod and the ingress policy on the destination pod need to allow the connection. If either side does not allow the connection, it will not happen.
30
+
31
+ ## The NetworkPolicy resource
32
+
33
+ See the [NetworkPolicy](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.35/#networkpolicy-v1-networking-k8s-io) reference for a full definition of the resource.
34
+
35
+ An example NetworkPolicy might look like this:
36
+
+ ```yaml
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: test-network-policy
+   namespace: default
+ spec:
+   podSelector:
+     matchLabels:
+       role: db
+   policyTypes:
+   - Ingress
+   - Egress
+   ingress:
+   - from:
+     - ipBlock:
+         cidr: 172.17.0.0/16
+         except:
+         - 172.17.1.0/24
+     - namespaceSelector:
+         matchLabels:
+           project: myproject
+     - podSelector:
+         matchLabels:
+           role: frontend
+     ports:
+     - protocol: TCP
+       port: 6379
+   egress:
+   - to:
+     - ipBlock:
+         cidr: 10.0.0.0/24
+     ports:
+     - protocol: TCP
+       port: 5978
+ ```
73
+
74
+ > [!info] Note:
75
+ > POSTing this to the API server for your cluster will have no effect unless your chosen networking solution supports network policy.
76
+
77
+ **Mandatory Fields**: As with all other Kubernetes config, a NetworkPolicy needs `apiVersion`, `kind`, and `metadata` fields. For general information about working with config files, see [Configure a Pod to Use a ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/), and [Object Management](https://kubernetes.io/docs/concepts/overview/working-with-objects/object-management/).
78
+
79
+ **spec**: NetworkPolicy [spec](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) has all the information needed to define a particular network policy in the given namespace.
80
+
81
+ **podSelector**: Each NetworkPolicy includes a `podSelector` which selects the grouping of pods to which the policy applies. The example policy selects pods with the label "role=db". An empty `podSelector` selects all pods in the namespace.
82
+
83
+ **policyTypes**: Each NetworkPolicy includes a `policyTypes` list which may include either `Ingress`, `Egress`, or both. The `policyTypes` field indicates whether the given policy applies to ingress traffic to selected pods, egress traffic from selected pods, or both. If no `policyTypes` are specified on a NetworkPolicy, then by default `Ingress` is always set, and `Egress` is set if the NetworkPolicy has any egress rules.
84
+
85
+ **ingress**: Each NetworkPolicy may include a list of allowed `ingress` rules. Each rule allows traffic which matches both the `from` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port, from one of three sources, the first specified via an `ipBlock`, the second via a `namespaceSelector` and the third via a `podSelector`.
86
+
87
+ **egress**: Each NetworkPolicy may include a list of allowed `egress` rules. Each rule allows traffic which matches both the `to` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port to any destination in `10.0.0.0/24`.
88
+
89
+ So, the example NetworkPolicy:
90
+
91
+ 1. isolates `role=db` pods in the `default` namespace for both ingress and egress traffic (if they weren't already isolated)
92
+ 2. (Ingress rules) allows connections to all pods in the `default` namespace with the label `role=db` on TCP port 6379 from:
93
+ - any pod in the `default` namespace with the label `role=frontend`
94
+ - any pod in a namespace with the label `project=myproject`
95
+ - IP addresses in the ranges `172.17.0.0` – `172.17.0.255` and `172.17.2.0` – `172.17.255.255` (i.e., all of `172.17.0.0/16` except `172.17.1.0/24`)
96
+ 3. (Egress rules) allows connections from any pod in the `default` namespace with the label `role=db` to CIDR `10.0.0.0/24` on TCP port 5978
97
+
98
+ See the [Declare Network Policy](https://kubernetes.io/docs/tasks/administer-cluster/declare-network-policy/) walkthrough for further examples.
99
+
100
+ ## Behavior of to and from selectors
101
+
102
+ There are four kinds of selectors that can be specified in an `ingress` `from` section or `egress` `to` section:
103
+
104
+ **podSelector**: This selects particular Pods in the same namespace as the NetworkPolicy which should be allowed as ingress sources or egress destinations.
105
+
106
+ **namespaceSelector**: This selects particular namespaces for which all Pods should be allowed as ingress sources or egress destinations.
107
+
108
+ **namespaceSelector** *and* **podSelector**: A single `to` / `from` entry that specifies both `namespaceSelector` and `podSelector` selects particular Pods within particular namespaces. Be careful to use correct YAML syntax. For example:
109
+
+ ```yaml
+ ...
+ ingress:
+ - from:
+   - namespaceSelector:
+       matchLabels:
+         user: alice
+     podSelector:
+       matchLabels:
+         role: client
+ ...
+ ```
122
+
123
+ This policy contains a single `from` element allowing connections from Pods with the label `role=client` in namespaces with the label `user=alice`. But the following policy is different:
124
+
+ ```yaml
+ ...
+ ingress:
+ - from:
+   - namespaceSelector:
+       matchLabels:
+         user: alice
+   - podSelector:
+       matchLabels:
+         role: client
+ ...
+ ```
137
+
138
+ It contains two elements in the `from` array, and allows connections from Pods in the local Namespace with the label `role=client`, *or* from any Pod in any namespace with the label `user=alice`.
139
+
140
+ When in doubt, use `kubectl describe` to see how Kubernetes has interpreted the policy.
141
+
142
+ **ipBlock**: This selects particular IP CIDR ranges to allow as ingress sources or egress destinations. These should be cluster-external IPs, since Pod IPs are ephemeral and unpredictable.
143
+
144
+ Cluster ingress and egress mechanisms often require rewriting the source or destination IP of packets. In cases where this happens, it is not defined whether this happens before or after NetworkPolicy processing, and the behavior may be different for different combinations of network plugin, cloud provider, `Service` implementation, etc.
145
+
146
+ In the case of ingress, this means that in some cases you may be able to filter incoming packets based on the actual original source IP, while in other cases, the "source IP" that the NetworkPolicy acts on may be the IP of a `LoadBalancer` or of the Pod's node, etc.
147
+
148
+ For egress, this means that connections from pods to `Service` IPs that get rewritten to cluster-external IPs may or may not be subject to `ipBlock`-based policies.
149
+
150
+ ## Default policies
151
+
152
+ By default, if no policies exist in a namespace, then all ingress and egress traffic is allowed to and from pods in that namespace. The following examples let you change the default behavior in that namespace.
153
+
154
+ ### Default deny all ingress traffic
155
+
156
+ You can create a "default" ingress isolation policy for a namespace by creating a NetworkPolicy that selects all pods but does not allow any ingress traffic to those pods.
157
+
+ ```yaml
+ ---
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: default-deny-ingress
+ spec:
+   podSelector: {}
+   policyTypes:
+   - Ingress
+ ```
169
+
170
+ This ensures that even pods that aren't selected by any other NetworkPolicy will still be isolated for ingress. This policy does not affect isolation for egress from any pod.
171
+
172
+ ### Allow all ingress traffic
173
+
174
+ If you want to allow all incoming connections to all pods in a namespace, you can create a policy that explicitly allows that.
175
+
+ ```yaml
+ ---
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: allow-all-ingress
+ spec:
+   podSelector: {}
+   ingress:
+   - {}
+   policyTypes:
+   - Ingress
+ ```
189
+
190
+ With this policy in place, no additional policy or policies can cause any incoming connection to those pods to be denied. This policy has no effect on isolation for egress from any pod.
191
+
192
+ ### Default deny all egress traffic
193
+
194
+ You can create a "default" egress isolation policy for a namespace by creating a NetworkPolicy that selects all pods but does not allow any egress traffic from those pods.
195
+
+ ```yaml
+ ---
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: default-deny-egress
+ spec:
+   podSelector: {}
+   policyTypes:
+   - Egress
+ ```
207
+
208
+ This ensures that even pods that aren't selected by any other NetworkPolicy will not be allowed egress traffic. This policy does not change the ingress isolation behavior of any pod.
209
+
210
+ > [!caution] Caution:
211
+ > A default deny-all egress policy also blocks DNS traffic. If your workloads need DNS resolution, you must add a separate NetworkPolicy that allows egress to your cluster's DNS service.
212
+
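+ As a minimal sketch of such a companion policy, the following allows DNS egress for all pods in the namespace. It assumes that the cluster DNS pods run in the `kube-system` namespace and carry the label `k8s-app: kube-dns`; both are common defaults rather than guarantees, so check your cluster before relying on them. The policy name `allow-dns-egress` is hypothetical.
+
+ ```yaml
+ ---
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: allow-dns-egress  # hypothetical name
+ spec:
+   podSelector: {}         # applies to every pod in this namespace
+   policyTypes:
+   - Egress
+   egress:
+   - to:
+     # Single entry with both selectors: DNS pods within kube-system only.
+     # Namespace and label are assumptions about where cluster DNS runs.
+     - namespaceSelector:
+         matchLabels:
+           kubernetes.io/metadata.name: kube-system
+       podSelector:
+         matchLabels:
+           k8s-app: kube-dns
+     ports:
+     - protocol: UDP
+       port: 53
+     - protocol: TCP
+       port: 53
+ ```
+
+ Because network policies are additive, this policy combines with `default-deny-egress`: the selected pods can reach the DNS pods on port 53, plus whatever other applicable policies allow.
+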
213
+ ### Allow all egress traffic
214
+
215
+ If you want to allow all connections from all pods in a namespace, you can create a policy that explicitly allows all outgoing connections from pods in that namespace.
216
+
+ ```yaml
+ ---
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: allow-all-egress
+ spec:
+   podSelector: {}
+   egress:
+   - {}
+   policyTypes:
+   - Egress
+ ```
230
+
231
+ With this policy in place, no additional policy or policies can cause any outgoing connection from those pods to be denied. This policy has no effect on isolation for ingress to any pod.
232
+
233
+ ### Default deny all ingress and all egress traffic
234
+
235
+ You can create a "default" policy for a namespace which prevents all ingress AND egress traffic by creating the following NetworkPolicy in that namespace.
236
+
+ ```yaml
+ ---
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: default-deny-all
+ spec:
+   podSelector: {}
+   policyTypes:
+   - Ingress
+   - Egress
+ ```
249
+
250
+ This ensures that even pods that aren't selected by any other NetworkPolicy will not be allowed ingress or egress traffic.
251
+
252
+ ## Network traffic filtering
253
+
254
+ NetworkPolicy is defined for [layer 4](https://en.wikipedia.org/wiki/OSI_model#Layer_4:_Transport_layer) connections (TCP, UDP, and optionally SCTP). For all the other protocols, the behaviour may vary across network plugins.
255
+
256
+ > [!info] Note:
257
+ > You must be using a [CNI](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/ "Container network interface (CNI) plugins are a type of Network plugin that adheres to the appc/CNI specification.") plugin that supports SCTP protocol NetworkPolicies.
258
+
259
+ When a `deny all` network policy is defined, it is only guaranteed to deny TCP, UDP and SCTP connections. For other protocols, such as ARP or ICMP, the behaviour is undefined. The same applies to allow rules: when a specific pod is allowed as ingress source or egress destination, it is undefined what happens with (for example) ICMP packets. Protocols such as ICMP may be allowed by some network plugins and denied by others.
260
+
261
+ ## Targeting a range of ports
262
+
263
+ FEATURE STATE: `Kubernetes v1.25 [stable]`
264
+
265
+ When writing a NetworkPolicy, you can target a range of ports instead of a single port.
266
+
267
+ You can do this with the `endPort` field, as in the following example:
268
+
+ ```yaml
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: multi-port-egress
+   namespace: default
+ spec:
+   podSelector:
+     matchLabels:
+       role: db
+   policyTypes:
+   - Egress
+   egress:
+   - to:
+     - ipBlock:
+         cidr: 10.0.0.0/24
+     ports:
+     - protocol: TCP
+       port: 32000
+       endPort: 32768
+ ```
290
+
291
+ The above rule allows any Pod with label `role=db` in the namespace `default` to communicate with any IP within the range `10.0.0.0/24` over TCP, provided that the target port is in the range 32000 to 32768, inclusive.
292
+
293
+ The following restrictions apply when using this field:
294
+
295
+ - The `endPort` field must be equal to or greater than the `port` field.
296
+ - `endPort` can only be defined if `port` is also defined.
297
+ - Both ports must be numeric.
298
+
299
+ > [!info] Note:
300
+ > Your cluster must be using a [CNI](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/ "Container network interface (CNI) plugins are a type of Network plugin that adheres to the appc/CNI specification.") plugin that supports the `endPort` field in NetworkPolicy specifications. If your [network plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/) does not support the `endPort` field and you specify a NetworkPolicy with that, the policy will be applied only for the single `port` field.
301
+
302
+ ## Targeting multiple namespaces by label
303
+
304
+ In this scenario, your `Egress` NetworkPolicy targets more than one namespace using their label names. For this to work, you need to label the target namespaces. For example:
305
+
306
+ ```shell
307
+ kubectl label namespace frontend namespace=frontend
308
+ kubectl label namespace backend namespace=backend
309
+ ```
310
+
311
+ Add the labels under `namespaceSelector` in your NetworkPolicy document. For example:
312
+
+ ```yaml
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: egress-namespaces
+ spec:
+   podSelector:
+     matchLabels:
+       app: myapp
+   policyTypes:
+   - Egress
+   egress:
+   - to:
+     - namespaceSelector:
+         matchExpressions:
+         - key: namespace
+           operator: In
+           values: ["frontend", "backend"]
+ ```
332
+
333
+ > [!info] Note:
334
+ > It is not possible to directly specify the name of the namespaces in a NetworkPolicy. You must use a `namespaceSelector` with `matchLabels` or `matchExpressions` to select the namespaces based on their labels.
335
+
336
+ ## Targeting a Namespace by its name
337
+
338
+ The Kubernetes control plane sets an immutable label `kubernetes.io/metadata.name` on all namespaces; the value of the label is the namespace name.
339
+
340
+ While NetworkPolicy cannot target a namespace by its name with some object field, you can use the standardized label to target a specific namespace.
341
+
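+ A minimal sketch of this approach: the policy below allows ingress to every pod in the `default` namespace from any pod in a namespace named `monitoring`. The namespace name `monitoring` and the policy name `allow-from-monitoring` are hypothetical.
+
+ ```yaml
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: allow-from-monitoring  # hypothetical name
+   namespace: default
+ spec:
+   podSelector: {}
+   policyTypes:
+   - Ingress
+   ingress:
+   - from:
+     - namespaceSelector:
+         matchLabels:
+           # Label set by the control plane; its value is the namespace name.
+           kubernetes.io/metadata.name: monitoring
+ ```
+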
342
+ ## Pod lifecycle
343
+
344
+ > [!info] Note:
345
+ > The following applies to clusters with a conformant networking plugin and a conformant implementation of NetworkPolicy.
346
+
347
+ When a new NetworkPolicy object is created, it may take some time for a network plugin to handle the new object. If a pod that is affected by a NetworkPolicy is created before the network plugin has completed NetworkPolicy handling, that pod may be started unprotected, and isolation rules will be applied when the NetworkPolicy handling is completed.
348
+
349
+ Once the NetworkPolicy is handled by a network plugin,
350
+
351
+ 1. All newly created pods affected by a given NetworkPolicy will be isolated before they are started. Implementations of NetworkPolicy must ensure that filtering is effective throughout the Pod lifecycle, even from the very first instant that any container in that Pod is started. Because they are applied at Pod level, NetworkPolicies apply equally to init containers, sidecar containers, and regular containers.
352
+ 2. Allow rules will be applied eventually after the isolation rules (or may be applied at the same time). In the worst case, a newly created pod may have no network connectivity at all when it is first started, if isolation rules were already applied, but no allow rules were applied yet.
353
+
354
+ Every created NetworkPolicy will be handled by a network plugin eventually, but there is no way to tell from the Kubernetes API when exactly that happens.
355
+
356
+ Therefore, pods must be resilient against being started up with different network connectivity than expected. If you need to make sure the pod can reach certain destinations before being started, you can use an [init container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) to wait for those destinations to be reachable before kubelet starts the app containers.
357
+
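+ One way to sketch this wait is shown below. The Service name `db.default.svc.cluster.local`, the port 5432, the image names, and the Pod name are all hypothetical, and the example assumes that the `nc` in the `busybox` image supports the `-z` (connect without sending data) and `-w` (timeout) options.
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: app-with-wait  # hypothetical name
+ spec:
+   initContainers:
+   - name: wait-for-db
+     image: busybox:1.36
+     # Retry until a TCP connection to the destination succeeds.
+     # Assumes this image's nc accepts -z and -w; adjust for your image.
+     command:
+     - sh
+     - -c
+     - until nc -z -w 2 db.default.svc.cluster.local 5432; do echo waiting for db; sleep 2; done
+   containers:
+   - name: app
+     image: registry.example/app:1.0  # hypothetical image
+ ```
+
+ The app container starts only after the init container exits successfully, so the application does not see the initial window in which allow rules may not yet be applied.
+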
358
+ Every NetworkPolicy will be applied to all selected pods eventually. Because the network plugin may implement NetworkPolicy in a distributed manner, it is possible that pods may see a slightly inconsistent view of network policies when the pod is first created, or when pods or policies change. For example, a newly-created pod that is supposed to be able to reach both Pod A on Node 1 and Pod B on Node 2 may find that it can reach Pod A immediately, but cannot reach Pod B until a few seconds later.
359
+
360
+ ## NetworkPolicy and hostNetwork pods
361
+
362
+ NetworkPolicy behaviour for `hostNetwork` pods is undefined, but it should be limited to two possibilities:
363
+
364
+ - The network plugin can distinguish `hostNetwork` pod traffic from all other traffic (including being able to distinguish traffic from different `hostNetwork` pods on the same node), and will apply NetworkPolicy to `hostNetwork` pods just like it does to pod-network pods.
365
+ - The network plugin cannot properly distinguish `hostNetwork` pod traffic, and so it ignores `hostNetwork` pods when matching `podSelector` and `namespaceSelector`. Traffic to/from `hostNetwork` pods is treated the same as all other traffic to/from the node IP. (This is the most common implementation.)
366
+
367
+ This applies when
368
+
+ 1. a `hostNetwork` pod is selected by `spec.podSelector`.
+    ```yaml
+    ...
+    spec:
+      podSelector:
+        matchLabels:
+          role: client
+    ...
+    ```
+ 2. a `hostNetwork` pod is selected by a `podSelector` or `namespaceSelector` in an `ingress` or `egress` rule.
+    ```yaml
+    ...
+    ingress:
+    - from:
+      - podSelector:
+          matchLabels:
+            role: client
+    ...
+    ```
388
+
389
+ At the same time, since `hostNetwork` pods have the same IP addresses as the nodes they reside on, their connections will be treated as node connections. For example, you can allow traffic from a `hostNetwork` Pod using an `ipBlock` rule.
390
+
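+ A sketch of such a rule follows. The node address `192.0.2.10` is a placeholder from a documentation address range, and the policy name and the `role: db` label are hypothetical; substitute the real IP of the node that hosts the `hostNetwork` Pod.
+
+ ```yaml
+ apiVersion: networking.k8s.io/v1
+ kind: NetworkPolicy
+ metadata:
+   name: allow-from-host-network-pod  # hypothetical name
+   namespace: default
+ spec:
+   podSelector:
+     matchLabels:
+       role: db
+   policyTypes:
+   - Ingress
+   ingress:
+   - from:
+     - ipBlock:
+         cidr: 192.0.2.10/32  # placeholder: IP of the node running the hostNetwork Pod
+ ```
+
+ Note that a rule like this matches any traffic that arrives with that node's IP as its source, not only traffic from the `hostNetwork` Pod.
+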
391
+ ## What you can't do with network policies (at least, not yet)
392
+
393
+ As of Kubernetes 1.35, the following functionality does not exist in the NetworkPolicy API, but you might be able to implement workarounds using operating system components (such as SELinux, Open vSwitch, iptables, and so on), Layer 7 technologies (Ingress controllers, service mesh implementations), or admission controllers. If you are new to network security in Kubernetes, it's worth noting that the following user stories cannot (yet) be implemented using the NetworkPolicy API.
394
+
395
+ - Forcing internal cluster traffic to go through a common gateway (this might be best served with a service mesh or other proxy).
396
+ - Anything TLS related (use a service mesh or ingress controller for this).
397
+ - Node specific policies (you can use CIDR notation for these, but you cannot target nodes by their Kubernetes identities specifically).
398
+ - Targeting of services by name (you can, however, target pods or namespaces by their [labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels "Tags objects with identifying attributes that are meaningful and relevant to users."), which is often a viable workaround).
399
+ - Creation or management of "Policy requests" that are fulfilled by a third party.
400
+ - Default policies which are applied to all namespaces or pods (there are some third party Kubernetes distributions and projects which can do this).
401
+ - Advanced policy querying and reachability tooling.
402
+ - The ability to log network security events (for example connections that are blocked or accepted).
403
+ - The ability to explicitly deny policies (currently the model for NetworkPolicies is deny by default, with only the ability to add allow rules).
404
+ - The ability to prevent loopback or incoming host traffic (Pods cannot currently block localhost access, nor do they have the ability to block access from their resident node).
405
+
406
+ ## NetworkPolicy's impact on existing connections
407
+
408
+ When the set of NetworkPolicies that applies to an existing connection changes - this could happen either due to a change in NetworkPolicies or if the relevant labels of the namespaces/pods selected by the policy (both subject and peers) are changed in the middle of an existing connection - it is implementation defined as to whether the change will take effect for that existing connection or not. For example, if a policy is created that leads to denying a previously allowed connection, the underlying network plugin implementation is responsible for defining whether that new policy will close the existing connections. It is recommended not to modify policies/pods/namespaces in ways that might affect existing connections.
409
+
410
+ ## What's next
411
+
412
+ - See the [Declare Network Policy](https://kubernetes.io/docs/tasks/administer-cluster/declare-network-policy/) walkthrough for further examples.
413
+ - See more [recipes](https://github.com/ahmetb/kubernetes-network-policy-recipes) for common scenarios enabled by the NetworkPolicy resource.
414
+
415
+
416
+ Last modified March 28, 2026 at 12:37 PM PST: [docs: add caution about DNS being blocked by deny-all egress (0a474b2b1a)](https://github.com/kubernetes/website/commit/0a474b2b1a8d5ac94d09fd5f4ee109a61e6ff511)
data/k8s_docs/k8s_node_pressure_eviction.md ADDED
@@ -0,0 +1,339 @@
1
+ Node-pressure eviction is the process by which the [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet "An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.") proactively terminates pods to reclaim [resources](https://kubernetes.io/docs/reference/glossary/?all=true#term-infrastructure-resource "A defined amount of infrastructure available for consumption (CPU, memory, etc).") on nodes.
2
+
3
+ The [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet "An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.") monitors resources like memory, disk space, and filesystem inodes on your cluster's nodes. When one or more of these resources reach specific consumption levels, the kubelet can proactively fail one or more pods on the node to reclaim resources and prevent starvation.
4
+
5
+ During a node-pressure eviction, the kubelet sets the [phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) for the selected pods to `Failed`, and terminates the Pod.
6
+
7
+ Node-pressure eviction is not the same as [API-initiated eviction](https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/).
8
+
9
+ The kubelet does not respect your configured [PodDisruptionBudget](https://kubernetes.io/docs/reference/glossary/?all=true#term-pod-disruption-budget "An object that limits the number of Pods of a replicated application that are down simultaneously from voluntary disruptions.") or the pod's `terminationGracePeriodSeconds`. If you use [soft eviction thresholds](#soft-eviction-thresholds), the kubelet respects your configured `eviction-max-pod-grace-period`. If you use [hard eviction thresholds](#hard-eviction-thresholds), the kubelet uses a `0s` grace period (immediate shutdown) for termination.
10
+
11
+ ## Self healing behavior
12
+
13
+ The kubelet attempts to [reclaim node-level resources](#reclaim-node-resources) before it terminates end-user pods. For example, it removes unused container images when disk resources are starved.
14
+
15
+ If the pods are managed by a [workload](https://kubernetes.io/docs/concepts/workloads/ "A workload is an application running on Kubernetes.") management object (such as [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ "A StatefulSet manages deployment and scaling of a set of Pods, with durable storage and persistent identifiers for each Pod.") or [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ "Manages a replicated application on your cluster.")) that replaces failed pods, the control plane (`kube-controller-manager`) creates new pods in place of the evicted pods.
16
+
17
+ ### Self healing for static pods
18
+
19
+ If you are running a [static pod](https://kubernetes.io/docs/concepts/workloads/pods/#static-pods) on a node that is under resource pressure, the kubelet may evict that static Pod. The kubelet then tries to create a replacement, because static Pods always represent an intent to run a Pod on that node.
20
+
21
+ The kubelet takes the *priority* of the static pod into account when creating a replacement. If the static pod manifest specifies a low priority, and there are higher-priority Pods defined within the cluster's control plane, and the node is under resource pressure, the kubelet may not be able to make room for that static pod. The kubelet continues to attempt to run all static pods even when there is resource pressure on a node.
22
+
23
+ ## Eviction signals and thresholds
24
+
25
+ The kubelet uses various parameters to make eviction decisions, like the following:
26
+
27
+ - Eviction signals
28
+ - Eviction thresholds
29
+ - Monitoring intervals
30
+
31
+ ### Eviction signals
32
+
33
+ Eviction signals are the current state of a particular resource at a specific point in time. The kubelet uses eviction signals to make eviction decisions by comparing the signals to eviction thresholds, which are the minimum amount of the resource that should be available on the node.
34
+
35
+ The kubelet uses the following eviction signals:
36
+
37
+ | Eviction Signal | Description | Linux Only |
38
+ | --- | --- | --- |
39
+ | `memory.available` | `memory.available`:= `node.status.capacity[memory]` - `node.stats.memory.workingSet` | |
40
+ | `nodefs.available` | `nodefs.available`:= `node.stats.fs.available` | |
41
+ | `nodefs.inodesFree` | `nodefs.inodesFree`:= `node.stats.fs.inodesFree` | • |
42
+ | `imagefs.available` | `imagefs.available`:= `node.stats.runtime.imagefs.available` | |
43
+ | `imagefs.inodesFree` | `imagefs.inodesFree`:= `node.stats.runtime.imagefs.inodesFree` | • |
44
+ | `containerfs.available` | `containerfs.available`:= `node.stats.runtime.containerfs.available` | |
45
+ | `containerfs.inodesFree` | `containerfs.inodesFree`:= `node.stats.runtime.containerfs.inodesFree` | • |
46
+ | `pid.available` | `pid.available`:= `node.stats.rlimit.maxpid` - `node.stats.rlimit.curproc` | • |
47
+
48
+ In this table, the **Description** column shows how kubelet gets the value of the signal. Each signal supports either a percentage or a literal value. The kubelet calculates the percentage value relative to the total capacity associated with the signal.
49
+
50
+ #### Memory signals
51
+
52
+ On Linux nodes, the value for `memory.available` is derived from the cgroupfs instead of tools like `free -m`. This is important because `free -m` does not work in a container, and if users use the [node allocatable](https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable) feature, out-of-resource decisions are made local to the end-user Pod part of the cgroup hierarchy as well as the root node. This [script](https://kubernetes.io/examples/admin/resource/memory-available.sh) or [cgroupv2 script](https://kubernetes.io/examples/admin/resource/memory-available-cgroupv2.sh) reproduces the same set of steps that the kubelet performs to calculate `memory.available`. The kubelet excludes `inactive_file` (the number of bytes of file-backed memory on the inactive LRU list) from its calculation, as it assumes that memory is reclaimable under pressure.
53
+
54
+ On Windows nodes, the value for `memory.available` is derived from the node's global memory commit levels (queried through the [`GetPerformanceInfo()`](https://learn.microsoft.com/windows/win32/api/psapi/nf-psapi-getperformanceinfo) system call) by subtracting the node's global [`CommitTotal`](https://learn.microsoft.com/windows/win32/api/psapi/ns-psapi-performance_information) from the node's [`CommitLimit`](https://learn.microsoft.com/windows/win32/api/psapi/ns-psapi-performance_information). Please note that `CommitLimit` can change if the node's page-file size changes!
55
+
56
+ #### Filesystem signals
57
+
58
+ The kubelet recognizes three specific filesystem identifiers that can be used with eviction signals (`<identifier>.inodesFree` or `<identifier>.available`):
59
+
60
+ 1. `nodefs`: The node's main filesystem, used for local disk volumes, emptyDir volumes not backed by memory, log storage, ephemeral storage, and more. For example, `nodefs` contains `/var/lib/kubelet`.
61
+ 2. `imagefs`: An optional filesystem that container runtimes can use to store container images (which are the read-only layers) and container writable layers.
62
+ 3. `containerfs`: An optional filesystem that container runtimes can use to store the writable layers. Similar to the main filesystem (see `nodefs`), it's used to store local disk volumes, emptyDir volumes not backed by memory, log storage, and ephemeral storage, except for the container images. When `containerfs` is used, the `imagefs` filesystem can be split to only store images (read-only layers) and nothing else.
63
+
64
+ > [!info] Note:
65
+ > FEATURE STATE: `Kubernetes v1.31 [beta]` (enabled by default)
66
+ >
67
+ > The *split image filesystem* feature, which enables support for the `containerfs` filesystem, adds several new eviction signals, thresholds and metrics. To use `containerfs`, the Kubernetes release v1.35 requires the `KubeletSeparateDiskGC` [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/) to be enabled. Currently, only CRI-O (v1.29 or higher) offers the `containerfs` filesystem support.
68
+
69
+ As such, kubelet generally allows three options for container filesystems:
70
+
71
+ - Everything is on the single `nodefs`, also referred to as "rootfs" or simply "root", and there is no dedicated image filesystem.
72
+ - Container storage (see `nodefs`) is on a dedicated disk, and `imagefs` (writable and read-only layers) is separate from the root filesystem. This is often referred to as "split disk" (or "separate disk") filesystem.
73
+ - Container filesystem `containerfs` (same as `nodefs` plus writable layers) is on root and the container images (read-only layers) are stored on separate `imagefs`. This is often referred to as "split image" filesystem.
74
+
75
+ The kubelet will attempt to auto-discover these filesystems with their current configuration directly from the underlying container runtime and will ignore other local node filesystems.
76
+
77
+ The kubelet does not support other container filesystems or storage configurations, and it does not currently support multiple filesystems for images and containers.
78
+
79
+ ### Deprecated kubelet garbage collection features
80
+
81
+ Some kubelet garbage collection features are deprecated in favor of eviction:
82
+
83
+ | Existing Flag | Rationale |
84
+ | --- | --- |
85
+ | `--maximum-dead-containers` | deprecated once old logs are stored outside of container's context |
86
+ | `--maximum-dead-containers-per-container` | deprecated once old logs are stored outside of container's context |
87
+ | `--minimum-container-ttl-duration` | deprecated once old logs are stored outside of container's context |
88
+
89
+ ### Eviction thresholds
90
+
91
+ You can specify custom eviction thresholds for the kubelet to use when it makes eviction decisions. You can configure [soft](#soft-eviction-thresholds) and [hard](#hard-eviction-thresholds) eviction thresholds.
92
+
93
+ Eviction thresholds have the form `[eviction-signal][operator][quantity]`, where:
94
+
95
+ - `eviction-signal` is the [eviction signal](#eviction-signals) to use.
96
+ - `operator` is the [relational operator](https://en.wikipedia.org/wiki/Relational_operator#Standard_relational_operators) you want, such as `<` (less than).
97
+ - `quantity` is the eviction threshold amount, such as `1Gi`. The value of `quantity` must match the quantity representation used by Kubernetes. You can use either literal values or percentages (`%`).
98
+
99
+ For example, if a node has 10GiB of total memory and you want to trigger eviction if the available memory falls below 1GiB, you can define the eviction threshold as either `memory.available<10%` or `memory.available<1Gi` (you cannot use both).
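
The threshold form can be illustrated with a minimal Python sketch. It is not the kubelet's parser: the helper names `parse_threshold` and `quantity_to_bytes` are hypothetical, only the `<` operator and binary suffixes are handled, and the 10GiB capacity is taken from the example above.

```python
import re

# Binary-suffix multipliers for the small subset of quantities handled here
# (an assumption; Kubernetes quantities support more suffixes than this).
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_threshold(expr: str) -> tuple[str, str, str]:
    """Split '[eviction-signal][operator][quantity]' into its three parts."""
    match = re.fullmatch(r"([A-Za-z.]+)(<)(.+)", expr)
    if match is None:
        raise ValueError(f"not a valid eviction threshold: {expr!r}")
    signal, operator, quantity = match.groups()
    return signal, operator, quantity


def quantity_to_bytes(quantity: str, capacity_bytes: int) -> int:
    """Resolve a literal quantity or a percentage of capacity into bytes."""
    if quantity.endswith("%"):
        return int(capacity_bytes * float(quantity[:-1]) / 100)
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count


capacity = 10 * 2**30  # the 10GiB node from the example
for expr in ("memory.available<10%", "memory.available<1Gi"):
    signal, operator, quantity = parse_threshold(expr)
    print(signal, operator, quantity_to_bytes(quantity, capacity))
```

On a 10GiB node both expressions resolve to the same number of bytes, which is why the two forms are interchangeable in the example.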
100
+
101
+ #### Soft eviction thresholds
102
+
103
+ A soft eviction threshold pairs an eviction threshold with a required administrator-specified grace period. The kubelet does not evict pods until the grace period is exceeded. The kubelet returns an error on startup if you do not specify a grace period.
104
+
105
+ You can specify both a soft eviction threshold grace period and a maximum allowed pod termination grace period for kubelet to use during evictions. If you specify a maximum allowed grace period and the soft eviction threshold is met, the kubelet uses the lesser of the two grace periods. If you do not specify a maximum allowed grace period, the kubelet kills evicted pods immediately without graceful termination.
106
+
107
+ You can use the following flags to configure soft eviction thresholds:
108
+
109
+ - `eviction-soft`: A set of eviction thresholds like `memory.available<1.5Gi` that can trigger pod eviction if held over the specified grace period.
110
+ - `eviction-soft-grace-period`: A set of eviction grace periods like `memory.available=1m30s` that define how long a soft eviction threshold must hold before triggering a Pod eviction.
111
+ - `eviction-max-pod-grace-period`: The maximum allowed grace period (in seconds) to use when terminating pods in response to a soft eviction threshold being met.
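
As a rough illustration of how a soft threshold interacts with its grace period, the sketch below tracks how long a threshold has been continuously met. The class `SoftThreshold` and its `observe` method are hypothetical names, and treating "exceeded" as "held for at least the grace period" is an assumption, not a statement about the kubelet's internals.

```python
from typing import Optional


class SoftThreshold:
    """Tracks how long a soft eviction threshold has been continuously met."""

    def __init__(self, grace_period_s: float) -> None:
        self.grace_period_s = grace_period_s
        self.first_met_at: Optional[float] = None

    def observe(self, now_s: float, threshold_met: bool) -> bool:
        """Record one observation; return True once eviction may start."""
        if not threshold_met:
            self.first_met_at = None  # the signal recovered, so reset the clock
            return False
        if self.first_met_at is None:
            self.first_met_at = now_s
        return now_s - self.first_met_at >= self.grace_period_s


# A grace period of 1m30s, as in `memory.available=1m30s`.
tracker = SoftThreshold(grace_period_s=90)
for now_s, met in [(0, True), (60, True), (70, False), (80, True), (170, True)]:
    print(now_s, tracker.observe(now_s, met))
```

In this run the brief recovery at 70 seconds resets the clock, so eviction only becomes possible once the threshold has held from 80 to 170 seconds.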
112
+
113
+ #### Hard eviction thresholds
114
+
115
+ A hard eviction threshold has no grace period. When a hard eviction threshold is met, the kubelet kills pods immediately without graceful termination to reclaim the starved resource.
116
+
117
+ You can use the `eviction-hard` flag to configure a set of hard eviction thresholds like `memory.available<1Gi`.
118
+
119
+ The kubelet has the following default hard eviction thresholds:
120
+
121
+ - `memory.available<100Mi` (Linux nodes)
122
+ - `memory.available<500Mi` (Windows nodes)
123
+ - `nodefs.available<10%`
124
+ - `imagefs.available<15%`
125
+ - `nodefs.inodesFree<5%` (Linux nodes)
126
+ - `imagefs.inodesFree<5%` (Linux nodes)
127
+
128
+ The kubelet applies these default hard eviction thresholds only if you change none of them. If you change the value of any one threshold, the other thresholds do not inherit their default values and are set to zero. To provide custom values, you should therefore specify all of the thresholds. Alternatively, you can set the kubelet config MergeDefaultEvictionSettings to true in the kubelet configuration file. If it is set to true and any threshold is changed, the other thresholds inherit their default values instead of 0.
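
The defaulting rule above can be sketched in a few lines of Python. This models only the behavior described in the text: the function `effective_hard_thresholds` and its `merge_defaults` parameter are hypothetical stand-ins, and the defaults listed are the Linux values from the list above.

```python
# Default hard eviction thresholds for Linux nodes, as listed above.
DEFAULT_HARD_LINUX = {
    "memory.available": "100Mi",
    "nodefs.available": "10%",
    "imagefs.available": "15%",
    "nodefs.inodesFree": "5%",
    "imagefs.inodesFree": "5%",
}


def effective_hard_thresholds(custom: dict, merge_defaults: bool = False) -> dict:
    """Model which hard thresholds apply for a given custom configuration."""
    if not custom:
        return dict(DEFAULT_HARD_LINUX)  # nothing changed: all defaults apply
    if merge_defaults:
        return {**DEFAULT_HARD_LINUX, **custom}  # unspecified keep defaults
    # Without merging, every threshold you did not specify drops to zero.
    result = {signal: "0" for signal in DEFAULT_HARD_LINUX}
    result.update(custom)
    return result


custom = {"memory.available": "500Mi"}
print(effective_hard_thresholds(custom))
print(effective_hard_thresholds(custom, merge_defaults=True))
```

The first call shows why overriding a single threshold without merging can silently disable the disk thresholds.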
129
+
130
+ The `containerfs.available` and `containerfs.inodesFree` (Linux nodes) default eviction thresholds will be set as follows:
131
+
132
+ - If a single filesystem is used for everything, then `containerfs` thresholds are set the same as `nodefs`.
133
+ - If separate filesystems are configured for both images and containers, then `containerfs` thresholds are set the same as `imagefs`.
134
+
135
+ Setting custom overrides for thresholds related to `containerfs` is currently not supported, and a warning will be issued if an attempt to do so is made; any provided custom values will, as such, be ignored.
136
+
137
+ ## Eviction monitoring interval
138
+
139
+ The kubelet evaluates eviction thresholds based on its configured `housekeeping-interval`, which defaults to `10s`.
140
+
141
+ ## Node conditions
142
+
143
+ The kubelet reports [node conditions](https://kubernetes.io/docs/concepts/architecture/nodes/#condition) to reflect that the node is under pressure because a hard or soft eviction threshold is met, independent of configured grace periods.
144
+
145
+ The kubelet maps eviction signals to node conditions as follows:
146
+
147
+ | Node Condition | Eviction Signal | Description |
148
+ | --- | --- | --- |
149
+ | `MemoryPressure` | `memory.available` | Available memory on the node has satisfied an eviction threshold |
150
+ | `DiskPressure` | `nodefs.available`, `nodefs.inodesFree`, `imagefs.available`, `imagefs.inodesFree`, `containerfs.available`, or `containerfs.inodesFree` | Available disk space and inodes on the node's root filesystem, image filesystem, or container filesystem have satisfied an eviction threshold |
151
+ | `PIDPressure` | `pid.available` | Available process identifiers on the (Linux) node have fallen below an eviction threshold |
152
+
153
+ The control plane also [maps](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-nodes-by-condition) these node conditions to taints.
154
+
155
+ The kubelet updates the node conditions based on the configured `--node-status-update-frequency`, which defaults to `10s`.
156
+
157
+ ### Node condition oscillation
158
+
159
+ In some cases, nodes oscillate above and below soft eviction thresholds without holding for the defined grace periods. This causes the reported node condition to constantly switch between `true` and `false`, leading to bad eviction decisions.
160
+
161
+ To protect against oscillation, you can use the `eviction-pressure-transition-period` flag, which controls how long the kubelet must wait before transitioning a node condition to a different state. The transition period has a default value of `5m`.
162
+
163
+ ### Reclaiming node level resources
164
+
165
+ The kubelet tries to reclaim node-level resources before it evicts end-user pods.
166
+
167
+ When a `DiskPressure` node condition is reported, the kubelet reclaims node-level resources based on the filesystems on the node.
168
+
169
+ #### Without imagefs or containerfs
170
+
171
+ If the node only has a `nodefs` filesystem that meets eviction thresholds, the kubelet frees up disk space in the following order:
172
+
173
+ 1. Garbage collect dead pods and containers.
174
+ 2. Delete unused images.
175
+
176
+ #### With imagefs
177
+
178
+ If the node has a dedicated `imagefs` filesystem for container runtimes to use, the kubelet does the following:
179
+
180
+ - If the `nodefs` filesystem meets the eviction thresholds, the kubelet garbage collects dead pods and containers.
181
+ - If the `imagefs` filesystem meets the eviction thresholds, the kubelet deletes all unused images.
182
+
183
+ #### With imagefs and containerfs
184
+
185
+ If the node has a dedicated `containerfs` alongside the `imagefs` filesystem configured for the container runtimes to use, then kubelet will attempt to reclaim resources as follows:
186
+
187
+ - If the `containerfs` filesystem meets the eviction thresholds, the kubelet garbage collects dead pods and containers.
188
+ - If the `imagefs` filesystem meets the eviction thresholds, the kubelet deletes all unused images.
189
+
190
+ ### Pod selection for kubelet eviction
191
+
192
+ If the kubelet's attempts to reclaim node-level resources don't bring the eviction signal below the threshold, the kubelet begins to evict end-user pods.
193
+
194
+ The kubelet uses the following parameters to determine the pod eviction order:
195
+
196
+ 1. Whether the pod's resource usage exceeds requests
197
+ 2. [Pod Priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/)
198
+ 3. The pod's resource usage relative to requests
199
+
200
+ As a result, kubelet ranks and evicts pods in the following order:
201
+
202
+ 1. `BestEffort` or `Burstable` pods where the usage exceeds requests. These pods are evicted based on their Priority and then by how much their usage level exceeds the request.
203
+ 2. `Guaranteed` pods and `Burstable` pods where the usage is less than requests are evicted last, based on their Priority.
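
The ranking above can be expressed as a sort key. The following is a simplified sketch for a single resource such as memory, not the kubelet's code: `PodUsage` and `eviction_order` are hypothetical names, and the pod values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int
    usage: int    # amount of the starved resource in use
    request: int  # amount requested (0 for BestEffort pods)


def eviction_order(pods: list) -> list:
    """Sort pods so that the first element is evicted first."""

    def key(pod: PodUsage):
        exceeds_request = pod.usage > pod.request
        overage = pod.usage - pod.request
        # 1. pods over their request come first,
        # 2. then lower Priority first,
        # 3. then the largest usage above request first.
        return (not exceeds_request, pod.priority, -overage)

    return sorted(pods, key=key)


pods = [
    PodUsage("guaranteed-db", priority=1000, usage=400, request=512),
    PodUsage("burst-low", priority=0, usage=900, request=256),
    PodUsage("burst-high", priority=100, usage=800, request=256),
    PodUsage("best-effort", priority=0, usage=300, request=0),
]
print([pod.name for pod in eviction_order(pods)])
```

Here the two priority-0 pods that exceed their requests are ranked ahead of the higher-priority one, and the pod that stays within its request is ranked last.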
204
+
205
+ > [!info] Note:
206
+ > The kubelet does not use the pod's [QoS class](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/) to determine the eviction order. You can use the QoS class to estimate the most likely pod eviction order when reclaiming resources like memory. QoS classification does not apply to EphemeralStorage requests, so the above scenario will not apply if the node is, for example, under `DiskPressure`.
207
+
208
+ `Guaranteed` pods are guaranteed only when requests and limits are specified for all the containers and they are equal. These pods will never be evicted because of another pod's resource consumption. If a system daemon (such as `kubelet` and `journald`) is consuming more resources than were reserved via `system-reserved` or `kube-reserved` allocations, and the node only has `Guaranteed` or `Burstable` pods using fewer resources than requests left on it, then the kubelet must choose to evict one of these pods to preserve node stability and to limit the impact of resource starvation on other pods. In this case, it will choose to evict pods of lowest Priority first.
209
+
210
+ If you are running a [static pod](https://kubernetes.io/docs/concepts/workloads/pods/#static-pods) and want to avoid having it evicted under resource pressure, set the `priority` field for that Pod directly. Static pods do not support the `priorityClassName` field.
211
+
212
+ When the kubelet evicts pods in response to inode or process ID starvation, it uses the Pods' relative priority to determine the eviction order, because inodes and PIDs have no requests.
213
+
214
+ The kubelet sorts pods differently based on whether the node has a dedicated `imagefs` or `containerfs` filesystem:
215
+
216
+ #### Without imagefs or containerfs (nodefs and imagefs use the same filesystem)
217
+
218
+ - If `nodefs` triggers evictions, the kubelet sorts pods based on their total disk usage (`local volumes + logs and a writable layer of all containers`).
219
+
220
+ #### With imagefs (nodefs and imagefs filesystems are separate)
221
+
222
+ - If `nodefs` triggers evictions, the kubelet sorts pods based on `nodefs` usage (`local volumes + logs of all containers`).
223
+ - If `imagefs` triggers evictions, the kubelet sorts pods based on the writable layer usage of all containers.
224
+
225
+ #### With imagefs and containerfs (imagefs and containerfs have been split)
226
+
227
+ - If `containerfs` triggers evictions, the kubelet sorts pods based on `containerfs` usage (`local volumes + logs and a writable layer of all containers`).
228
+ - If `imagefs` triggers evictions, the kubelet sorts pods based on the `storage of images` rank, which represents the disk usage of a given image.
229
+
230
+ ### Minimum eviction reclaim
231
+
232
+ > [!info] Note:
233
+ > As of Kubernetes v1.35, you cannot set a custom value for the `containerfs.available` metric. The configuration for this specific metric will be set automatically to reflect values set for either the `nodefs` or `imagefs`, depending on the configuration.
234
+
235
+ In some cases, pod eviction only reclaims a small amount of the starved resource. This can lead to the kubelet repeatedly hitting the configured eviction thresholds and triggering multiple evictions.
236
+
237
+ You can use the `--eviction-minimum-reclaim` flag or a [kubelet config file](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/) to configure a minimum reclaim amount for each resource. When the kubelet notices that a resource is starved, it continues to reclaim that resource until it reclaims the quantity you specify.
238
+
239
+ For example, the following configuration sets minimum reclaim amounts:
240
+
241
+ ```yaml
242
+ apiVersion: kubelet.config.k8s.io/v1beta1
243
+ kind: KubeletConfiguration
244
+ evictionHard:
245
+ memory.available: "500Mi"
246
+ nodefs.available: "1Gi"
247
+ imagefs.available: "100Gi"
248
+ evictionMinimumReclaim:
249
+ memory.available: "0Mi"
250
+ nodefs.available: "500Mi"
251
+ imagefs.available: "2Gi"
252
+ ```
253
+
254
+ In this example, if the `nodefs.available` signal meets the eviction threshold, the kubelet reclaims the resource until the signal reaches the threshold of 1GiB, and then continues to reclaim the minimum amount of 500MiB, until the available nodefs storage value reaches 1.5GiB.
255
+
256
+ Similarly, the kubelet tries to reclaim the `imagefs` resource until the `imagefs.available` value reaches `102Gi`, representing 102 GiB of available container image storage. If the amount of storage that the kubelet could reclaim is less than 2GiB, the kubelet doesn't reclaim anything.
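
The arithmetic in this example can be checked with a short Python sketch. The helper `reclaim_target` is a hypothetical name; the values come from the configuration above. Note that 1Gi plus 500Mi is 1524MiB, which the text rounds to 1.5GiB.

```python
MI = 2**20
GI = 2**30


def reclaim_target(threshold_bytes: int, minimum_reclaim_bytes: int) -> int:
    """Availability the kubelet aims for: the threshold plus the minimum reclaim."""
    return threshold_bytes + minimum_reclaim_bytes


# evictionHard and evictionMinimumReclaim values from the example configuration.
nodefs_target = reclaim_target(1 * GI, 500 * MI)
imagefs_target = reclaim_target(100 * GI, 2 * GI)

print(f"nodefs target:  {nodefs_target // MI}Mi")
print(f"imagefs target: {imagefs_target // GI}Gi")
```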
257
+
258
+ The default `eviction-minimum-reclaim` is `0` for all resources.
259
+
260
+ ## Node out of memory behavior
261
+
262
+ If the node experiences an *out of memory* (OOM) event prior to the kubelet being able to reclaim memory, the node depends on the [oom\_killer](https://lwn.net/Articles/391222/) to respond.
263
+
264
+ The kubelet sets an `oom_score_adj` value for each container based on the QoS for the pod.
265
+
266
+ | Quality of Service | `oom_score_adj` |
267
+ | --- | --- |
268
+ | `Guaranteed` | -997 |
269
+ | `BestEffort` | 1000 |
270
+ | `Burstable` | *min(max(2, 1000 - (1000 × memoryRequestBytes) / machineMemoryCapacityBytes), 999)* |
271
+
272
+ > [!info] Note:
273
+ > The kubelet also sets an `oom_score_adj` value of `-997` for any containers in Pods that have `system-node-critical` [Priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#pod-priority "Pod Priority indicates the importance of a Pod relative to other Pods.").
274
+
275
+ If the kubelet can't reclaim memory before a node experiences OOM, the `oom_killer` calculates an `oom_score` based on the percentage of memory it's using on the node, and then adds the `oom_score_adj` to get an effective `oom_score` for each container. It then kills the container with the highest score.
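
The table above translates into a small function. This is a sketch, not the kubelet's source: the function name and parameters are hypothetical, and the use of integer division for the `Burstable` case is an assumption.

```python
def oom_score_adj(qos: str, memory_request_bytes: int = 0,
                  machine_memory_capacity_bytes: int = 1) -> int:
    """Return the oom_score_adj value from the table for a pod's QoS class."""
    if qos == "Guaranteed":
        return -997
    if qos == "BestEffort":
        return 1000
    if qos == "Burstable":
        # Assumption: integer division, so the result is a whole number.
        adj = 1000 - (1000 * memory_request_bytes) // machine_memory_capacity_bytes
        return min(max(2, adj), 999)
    raise ValueError(f"unknown QoS class: {qos!r}")


GI = 2**30
# A Burstable pod requesting 1GiB on a 10GiB node.
print(oom_score_adj("Burstable", 1 * GI, 10 * GI))
```

The clamping keeps every `Burstable` value strictly between the `Guaranteed` and `BestEffort` values, so a larger memory request always makes a container a less likely OOM-kill target than a `BestEffort` one.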
276
+
277
+ This means that containers in low QoS pods that consume a large amount of memory relative to their scheduling requests are killed first.
278
+
279
+ Unlike pod eviction, if a container is OOM killed, the kubelet can restart it based on its `restartPolicy`.
280
+
281
+ ## Good practices
282
+
283
+ The following sections describe good practice for eviction configuration.
284
+
285
+ ### Schedulable resources and eviction policies
286
+
287
+ When you configure the kubelet with an eviction policy, you should make sure that the scheduler does not schedule pods that would trigger eviction by immediately inducing memory pressure.
288
+
289
+ Consider the following scenario:
290
+
291
+ - Node memory capacity: 10GiB
292
+ - Operator wants to reserve 10% of memory capacity for system daemons (kernel, `kubelet`, etc.)
293
+ - Operator wants to evict Pods at 95% memory utilization to reduce incidence of system OOM.
294
+
295
+ For this to work, the kubelet is launched as follows:
296
+
297
+ ```none
298
+ --eviction-hard=memory.available<500Mi
299
+ --system-reserved=memory=1.5Gi
300
+ ```
301
+
302
+ In this configuration, the `--system-reserved` flag reserves 1.5GiB of memory for the system, which is `10% of the total memory + the eviction threshold amount`.
303
+
304
+ The node can reach the eviction threshold if a pod is using more than its request, or if the system is using more than 1GiB of memory, which makes the `memory.available` signal fall below 500MiB and triggers the threshold.
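
The numbers in this scenario can be verified with a few lines of Python. The variable names are illustrative only; the values come from the scenario above. The sum is 1524MiB, which the flag rounds to `1.5Gi`.

```python
MI = 2**20
GI = 2**30

capacity = 10 * GI
daemon_reserve = capacity // 10   # 10% of capacity for system daemons
eviction_hard = 500 * MI          # memory.available<500Mi

# --system-reserved covers the daemon reserve plus the eviction threshold.
system_reserved = daemon_reserve + eviction_hard

# Eviction starts once usage leaves less than 500Mi available.
utilization_at_eviction = (capacity - eviction_hard) / capacity

print(f"system-reserved: {system_reserved // MI}Mi")
print(f"eviction at about {utilization_at_eviction:.1%} utilization")
```

The second value confirms that a 500Mi hard threshold on a 10GiB node corresponds to roughly the 95% utilization target in the scenario.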
305
+
306
+ ### DaemonSets and node-pressure eviction
307
+
308
+ Pod priority is a major factor in making eviction decisions. If you do not want the kubelet to evict pods that belong to a DaemonSet, give those pods a high enough priority by specifying a suitable `priorityClassName` in the pod spec. You can also use a lower priority, or the default, to only allow pods from that DaemonSet to run when there are enough resources.
309
+
310
+ ## Known issues
311
+
312
+ The following sections describe known issues related to out of resource handling.
313
+
314
+ ### kubelet may not observe memory pressure right away
315
+
316
+ By default, the kubelet polls cAdvisor to collect memory usage stats at a regular interval. If memory usage increases rapidly within that window, the kubelet may not observe `MemoryPressure` fast enough, and the OOM killer will still be invoked.
317
+
318
+ You can use the `--kernel-memcg-notification` flag to enable the `memcg` notification API on the kubelet to get notified immediately when a threshold is crossed.
319
+
320
+ If you are not trying to achieve extreme utilization, but a sensible measure of overcommit, a viable workaround for this issue is to use the `--kube-reserved` and `--system-reserved` flags to allocate memory for the system.
321
+
322
+ ### active\_file memory is not considered as available memory
323
+
324
+ On Linux, the kernel tracks the number of bytes of file-backed memory on the active least recently used (LRU) list as the `active_file` statistic. The kubelet treats `active_file` memory areas as not reclaimable. For workloads that make intensive use of block-backed local storage, including ephemeral local storage, kernel-level caches of file and block data mean that many recently accessed cache pages are likely to be counted as `active_file`. If enough of these kernel block buffers are on the active LRU list, the kubelet is liable to observe this as high resource use and taint the node as experiencing memory pressure, triggering pod eviction.
325
+
326
+ For more details, see [https://github.com/kubernetes/kubernetes/issues/43916](https://github.com/kubernetes/kubernetes/issues/43916)
327
+
328
+ You can work around that behavior by setting the memory limit and memory request the same for containers likely to perform intensive I/O activity. You will need to estimate or measure an optimal memory limit value for that container.
329
+
330
+ ## What's next
331
+
332
+ - Learn about [API-initiated Eviction](https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/)
333
+ - Learn about [Pod Priority and Preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/)
334
+ - Learn about [PodDisruptionBudgets](https://kubernetes.io/docs/tasks/run-application/configure-pdb/)
335
+ - Learn about [Quality of Service](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/) (QoS)
336
+ - Check out the [Eviction API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.35/#create-eviction-pod-v1-core)
337
+
338
+
339
+ Last modified September 19, 2025 at 9:38 PM PST: [fix: typos (a5d40c68e0)](https://github.com/kubernetes/website/commit/a5d40c68e0dda7c44cff5c6331747b502eede79a)
data/k8s_docs/k8s_pod_security_admission.md ADDED
@@ -0,0 +1,93 @@
1
+ An overview of the Pod Security Admission Controller, which can enforce the Pod Security Standards.
2
+
3
+ FEATURE STATE: `Kubernetes v1.25 [stable]`
4
+
5
+ The Kubernetes [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) define different isolation levels for Pods. These standards let you define how you want to restrict the behavior of pods in a clear, consistent fashion.
6
+
7
+ Kubernetes offers a built-in *Pod Security* [admission controller](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/ "A piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object.") to enforce the Pod Security Standards. Pod security restrictions are applied at the [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces "An abstraction used by Kubernetes to support isolation of groups of resources within a single cluster.") level when pods are created.
8
+
9
+ ### Built-in Pod Security admission enforcement
10
+
11
+ This page is part of the documentation for Kubernetes v1.35. If you are running a different version of Kubernetes, consult the documentation for that release.
12
+
13
+ ## Pod Security levels
14
+
15
+ Pod Security admission places requirements on a Pod's [Security Context](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) and other related fields according to the three levels defined by the [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/): `privileged`, `baseline`, and `restricted`. Refer to the [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) page for an in-depth look at those requirements.
16
+
17
+ ## Pod Security Admission labels for namespaces
18
+
19
+ Once the feature is enabled or the webhook is installed, you can configure namespaces to define the admission control mode you want to use for pod security in each namespace. Kubernetes defines a set of [labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels "Tags objects with identifying attributes that are meaningful and relevant to users.") that you can set to define which of the predefined Pod Security Standard levels you want to use for a namespace. The label you select defines what action the [control plane](https://kubernetes.io/docs/reference/glossary/?all=true#term-control-plane "The container orchestration layer that exposes the API and interfaces to define, deploy, and manage the lifecycle of containers.") takes if a potential violation is detected:
20
+
21
+ | Mode | Description |
22
+ | --- | --- |
23
+ | **enforce** | Policy violations will cause the pod to be rejected. |
24
+ | **audit** | Policy violations will trigger the addition of an audit annotation to the event recorded in the [audit log](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/), but are otherwise allowed. |
25
+ | **warn** | Policy violations will trigger a user-facing warning, but are otherwise allowed. |
26
+
27
+ A namespace can configure any or all modes, or even set a different level for different modes.
28
+
29
+ For each mode, there are two labels that determine the policy used:
30
+
31
+ ```yaml
32
+ # The per-mode level label indicates which policy level to apply for the mode.
33
+ #
34
+ # MODE must be one of `enforce`, `audit`, or `warn`.
35
+ # LEVEL must be one of `privileged`, `baseline`, or `restricted`.
36
+ pod-security.kubernetes.io/<MODE>: <LEVEL>
37
+
38
+ # Optional: per-mode version label that can be used to pin the policy to the
39
+ # version that shipped with a given Kubernetes minor version (for example v1.35).
40
+ #
41
+ # MODE must be one of `enforce`, `audit`, or `warn`.
42
+ # VERSION must be a valid Kubernetes minor version, or `latest`.
43
+ pod-security.kubernetes.io/<MODE>-version: <VERSION>
44
+ ```
45
+
46
+ Check out [Enforce Pod Security Standards with Namespace Labels](https://kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-namespace-labels/) to see example usage.
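
As a small illustration of the label scheme, the sketch below builds the namespace labels for a chosen set of modes. The function `pod_security_labels` is a hypothetical helper, not part of any Kubernetes client library; only the label keys, modes, and levels come from the text above, and version strings are not validated here.

```python
MODES = ("enforce", "audit", "warn")
LEVELS = ("privileged", "baseline", "restricted")


def pod_security_labels(levels: dict, versions: dict = None) -> dict:
    """Build Pod Security Admission namespace labels from mode -> level maps."""
    labels = {}
    for mode, level in levels.items():
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        labels[f"pod-security.kubernetes.io/{mode}"] = level
    for mode, version in (versions or {}).items():
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        # Optional per-mode version pin, for example "v1.35" or "latest".
        labels[f"pod-security.kubernetes.io/{mode}-version"] = version
    return labels


# Enforce `baseline`, but warn about anything that would fail `restricted`.
print(pod_security_labels(
    {"enforce": "baseline", "warn": "restricted"},
    {"enforce": "v1.35"},
))
```

Setting a stricter level for `warn` than for `enforce`, as in this usage, is one way a namespace can set a different level for different modes.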
47
+
48
+ ## Workload resources and Pod templates
49
+
50
+ Pods are often created indirectly, by creating a [workload object](https://kubernetes.io/docs/concepts/workloads/controllers/) such as a [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ "Manages a replicated application on your cluster.") or [Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/ "A finite or batch task that runs to completion."). The workload object defines a *Pod template* and a [controller](https://kubernetes.io/docs/concepts/architecture/controller/ "A control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state.") for the workload resource creates Pods based on that template. To help catch violations early, both the audit and warning modes are applied to the workload resources. However, enforce mode is **not** applied to workload resources, only to the resulting pod objects.
51
+
52
+ ## Exemptions
53
+
54
+ You can define *exemptions* from pod security enforcement in order to allow the creation of pods that would have otherwise been prohibited due to the policy associated with a given namespace. Exemptions can be statically configured in the [Admission Controller configuration](https://kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/#configure-the-admission-controller).
55
+
56
+ Exemptions must be explicitly enumerated. Requests meeting exemption criteria are *ignored* by the Admission Controller (all `enforce`, `audit` and `warn` behaviors are skipped). Exemption dimensions include:
57
+
58
+ - **Usernames:** requests from users with an exempt authenticated (or impersonated) username are ignored.
59
+ - **RuntimeClassNames:** pods and [workload resources](#workload-resources-and-pod-templates) specifying an exempt runtime class name are ignored.
60
+ - **Namespaces:** pods and [workload resources](#workload-resources-and-pod-templates) in an exempt namespace are ignored.
61
+
62
+ > [!caution] Caution:
63
+ > Most pods are created by a controller in response to a [workload resource](#workload-resources-and-pod-templates), meaning that exempting an end user will only exempt them from enforcement when creating pods directly, but not when creating a workload resource. Controller service accounts (such as `system:serviceaccount:kube-system:replicaset-controller`) should generally not be exempted, as doing so would implicitly exempt any user that can create the corresponding workload resource.
64
+
65
+ Updates to the following pod fields are exempt from policy checks, meaning that if a pod update request only changes these fields, it will not be denied even if the pod is in violation of the current policy level:
66
+
67
+ - Any metadata updates **except** changes to the seccomp or AppArmor annotations:
68
+ - `seccomp.security.alpha.kubernetes.io/pod` (deprecated)
69
+ - `container.seccomp.security.alpha.kubernetes.io/*` (deprecated)
70
+ - `container.apparmor.security.beta.kubernetes.io/*` (deprecated)
71
+ - Valid updates to `.spec.activeDeadlineSeconds`
72
+ - Valid updates to `.spec.tolerations`
73
+
74
+ ## Metrics
75
+
76
+ Here are the Prometheus metrics exposed by kube-apiserver:
77
+
78
+ - `pod_security_errors_total`: This metric indicates the number of errors preventing normal evaluation. Non-fatal errors may result in the latest restricted profile being used for enforcement.
79
+ - `pod_security_evaluations_total`: This metric indicates the number of policy evaluations that have occurred, not counting ignored or exempt requests during exporting.
80
+ - `pod_security_exemptions_total`: This metric indicates the number of exempt requests, not counting ignored or out of scope requests.
81
+
82
+ ## What's next
83
+
84
+ - [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
85
+ - [Enforcing Pod Security Standards](https://kubernetes.io/docs/setup/best-practices/enforcing-pod-security-standards/)
86
+ - [Enforce Pod Security Standards by Configuring the Built-in Admission Controller](https://kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/)
87
+ - [Enforce Pod Security Standards with Namespace Labels](https://kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-namespace-labels/)
88
+
89
+ If you are running an older version of Kubernetes and want to upgrade to a version of Kubernetes that does not include PodSecurityPolicies, read [migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller](https://kubernetes.io/docs/tasks/configure-pod-container/migrate-from-psp/).
90
+
91
+
92
+
93
+ Last modified March 07, 2024 at 4:54 PM PST: [AppArmor v1.30 docs update (4f11f83a45)](https://github.com/kubernetes/website/commit/4f11f83a451b55d2e79ccd0472058b9f59e562ed)
data/k8s_docs/k8s_pods.md ADDED
@@ -0,0 +1,305 @@
1
+ *Pods* are the smallest deployable units of computing that you can create and manage in Kubernetes.
2
+
3
+ A *Pod* (as in a pod of whales or pea pod) is a group of one or more [containers](https://kubernetes.io/docs/concepts/containers/ "A lightweight and portable executable image that contains software and all of its dependencies."), with shared storage and network resources, and a specification for how to run the containers. A Pod's contents are always co-located and co-scheduled, and run in a shared context. A Pod models an application-specific "logical host": it contains one or more application containers which are relatively tightly coupled. In non-cloud contexts, applications executed on the same physical or virtual machine are analogous to cloud applications executed on the same logical host.
4
+
5
+ As well as application containers, a Pod can contain [init containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/ "One or more initialization containers that must run to completion before any app containers run.") that run during Pod startup. You can also inject [ephemeral containers](https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/ "A type of container that you can temporarily run inside a Pod") for debugging a running Pod.
6
+
7
+ ## What is a Pod?
8
+
9
+ > [!info] Note:
10
+ > You need to install a [container runtime](https://kubernetes.io/docs/setup/production-environment/container-runtimes/) into each node in the cluster so that Pods can run there.
11
+
12
+ The shared context of a Pod is a set of Linux namespaces, cgroups, and potentially other facets of isolation - the same things that isolate a [container](https://kubernetes.io/docs/concepts/containers/ "A lightweight and portable executable image that contains software and all of its dependencies."). Within a Pod's context, the individual applications may have further sub-isolations applied.
13
+
14
+ A Pod is similar to a set of containers with shared namespaces and shared filesystem volumes.
15
+
16
+ Pods in a Kubernetes cluster are used in two main ways:
17
+
18
+ - **Pods that run a single container**. The "one-container-per-Pod" model is the most common Kubernetes use case; in this case, you can think of a Pod as a wrapper around a single container; Kubernetes manages Pods rather than managing the containers directly.
19
+ - **Pods that run multiple containers that need to work together**. A Pod can encapsulate an application composed of [multiple co-located containers](#how-pods-manage-multiple-containers) that are tightly coupled and need to share resources. These co-located containers form a single cohesive unit.
20
+   Grouping multiple co-located and co-managed containers in a single Pod is a relatively advanced use case. You should use this pattern only in specific instances in which your containers are tightly coupled.
21
+   You don't need to run multiple containers to provide replication (for resilience or capacity); if you need multiple replicas, see [Workload management](https://kubernetes.io/docs/concepts/workloads/controllers/).
22
+
23
+ ## Using Pods
24
+
25
+ The following is an example of a Pod which consists of a container running the image `nginx:1.14.2`.
26
+
27
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: nginx
+ spec:
+   containers:
+   - name: nginx
+     image: nginx:1.14.2
+     ports:
+     - containerPort: 80
+ ```
39
+
40
+ To create the Pod shown above, run the following command:
41
+
42
+ ```shell
43
+ kubectl apply -f https://k8s.io/examples/pods/simple-pod.yaml
44
+ ```
45
+
46
+ Pods are generally not created directly; instead, they are created using workload resources. See [Working with Pods](#working-with-pods) for more information on how Pods are used with workload resources.
47
+
48
+ ### Workload resources for managing pods
49
+
50
+ Usually you don't need to create Pods directly, even singleton Pods. Instead, create them using workload resources such as [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ "Manages a replicated application on your cluster.") or [Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/ "A finite or batch task that runs to completion."). If your Pods need to track state, consider the [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ "A StatefulSet manages deployment and scaling of a set of Pods, with durable storage and persistent identifiers for each Pod.") resource.
51
+
52
+ Each Pod is meant to run a single instance of a given application. If you want to scale your application horizontally (to provide more overall resources by running more instances), you should use multiple Pods, one for each instance. In Kubernetes, this is typically referred to as *replication*. Replicated Pods are usually created and managed as a group by a workload resource and its [controller](https://kubernetes.io/docs/concepts/architecture/controller/ "A control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state.").
53
+
54
+ See [Pods and controllers](#pods-and-controllers) for more information on how Kubernetes uses workload resources, and their controllers, to implement application scaling and auto-healing.
55
+
56
+ Pods natively provide two kinds of shared resources for their constituent containers: [networking](#pod-networking) and [storage](#pod-storage).
57
+
58
+ ## Working with Pods
59
+
60
+ You'll rarely create individual Pods directly in Kubernetes—even singleton Pods. This is because Pods are designed as relatively ephemeral, disposable entities. When a Pod gets created (directly by you, or indirectly by a [controller](https://kubernetes.io/docs/concepts/architecture/controller/ "A control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state.")), the new Pod is scheduled to run on a [Node](https://kubernetes.io/docs/concepts/architecture/nodes/ "A node is a worker machine in Kubernetes.") in your cluster. The Pod remains on that node until the Pod finishes execution, the Pod object is deleted, the Pod is *evicted* for lack of resources, or the node fails.
61
+
62
+ > [!info] Note:
63
+ > Restarting a container in a Pod should not be confused with restarting a Pod. A Pod is not a process, but an environment for running container(s). A Pod persists until it is deleted.
64
+
65
+ The name of a Pod must be a valid [DNS subdomain](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names) value, but this can produce unexpected results for the Pod hostname. For best compatibility, the name should follow the more restrictive rules for a [DNS label](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names).
66
+
67
+ ### Pod OS
68
+
69
+ FEATURE STATE: `Kubernetes v1.25 [stable]`
70
+
71
+ You should set the `.spec.os.name` field to either `windows` or `linux` to indicate the OS on which you want the pod to run. These two are the only operating systems supported for now by Kubernetes. In the future, this list may be expanded.
72
+
73
+ In Kubernetes v1.35, the value of `.spec.os.name` does not affect how the [kube-scheduler](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/ "Control plane component that watches for newly created pods with no assigned node, and selects a node for them to run on.") picks a node for the Pod to run on. In any cluster where there is more than one operating system for running nodes, you should set the [kubernetes.io/os](https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetes-io-os) label correctly on each node, and define pods with a `nodeSelector` based on the operating system label. The kube-scheduler assigns your pod to a node based on other criteria and may or may not succeed in picking a suitable node placement where the node OS is right for the containers in that Pod. The [Pod security standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) also use this field to avoid enforcing policies that aren't relevant to the operating system.
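+
+ As a minimal sketch of the pattern described above, the manifest below sets `.spec.os.name` and also pins the Pod to Linux nodes with a `nodeSelector` on the `kubernetes.io/os` label. The Pod name `os-demo`, the container name, and the image are illustrative assumptions rather than part of the original page.
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: os-demo                # hypothetical name, for illustration only
+ spec:
+   os:
+     name: linux                # declares the OS the Pod is intended for
+   nodeSelector:
+     kubernetes.io/os: linux    # this label selector is what constrains scheduling
+   containers:
+   - name: app                  # hypothetical container name
+     image: nginx:1.14.2
+ ```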
74
+
75
+ ### Pods and controllers
76
+
77
+ You can use workload resources to create and manage multiple Pods for you. A controller for the resource handles replication and rollout and automatic healing in case of Pod failure. For example, if a Node fails, a controller notices that Pods on that Node have stopped working and creates a replacement Pod. The scheduler places the replacement Pod onto a healthy Node.
78
+
79
+ Here are some examples of workload resources that manage one or more Pods:
80
+
81
+ - [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ "Manages a replicated application on your cluster.")
82
+ - [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ "A StatefulSet manages deployment and scaling of a set of Pods, with durable storage and persistent identifiers for each Pod.")
83
+ - [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset "Ensures a copy of a Pod is running across a set of nodes in a cluster.")
84
+
85
+ ### Specifying a Workload reference
86
+
87
+ FEATURE STATE: `Kubernetes v1.35 [alpha]` (disabled by default)
88
+
89
+ By default, Kubernetes schedules every Pod individually. However, some tightly-coupled applications need a group of Pods to be scheduled simultaneously to function correctly.
90
+
91
+ You can link a Pod to a [Workload](https://kubernetes.io/docs/concepts/workloads/workload-api/) object using a [Workload reference](https://kubernetes.io/docs/concepts/workloads/pods/workload-reference/). This tells the `kube-scheduler` that the Pod is part of a specific group, enabling it to make coordinated placement decisions for the entire group at once.
92
+
93
+ ### Pod templates
94
+
95
+ Controllers for [workload](https://kubernetes.io/docs/concepts/workloads/ "A workload is an application running on Kubernetes.") resources create Pods from a *pod template* and manage those Pods on your behalf.
96
+
97
+ PodTemplates are specifications for creating Pods, and are included in workload resources such as [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), [Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/), and [DaemonSets](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/).
98
+
99
+ Each controller for a workload resource uses the `PodTemplate` inside the workload object to make actual Pods. The `PodTemplate` is part of the desired state of whatever workload resource you used to run your app.
100
+
101
+ When you create a Pod, you can include [environment variables](https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/) in the Pod template for the containers that run in the Pod.
102
+
103
+ The sample below is a manifest for a simple Job with a `template` that starts one container. The container in that Pod prints a message then pauses.
104
+
105
+ ```yaml
+ apiVersion: batch/v1
+ kind: Job
+ metadata:
+   name: hello
+ spec:
+   template:
+     # This is the pod template
+     spec:
+       containers:
+       - name: hello
+         image: busybox:1.28
+         command: ['sh', '-c', 'echo "Hello, Kubernetes!" && sleep 3600']
+       restartPolicy: OnFailure
+     # The pod template ends here
+ ```
121
+
122
+ Modifying the pod template or switching to a new pod template has no direct effect on the Pods that already exist. If you change the pod template for a workload resource, that resource needs to create replacement Pods that use the updated template.
123
+
124
+ For example, the StatefulSet controller ensures that the running Pods match the current pod template for each StatefulSet object. If you edit the StatefulSet to change its pod template, the StatefulSet starts to create new Pods based on the updated template. Eventually, all of the old Pods are replaced with new Pods, and the update is complete.
125
+
126
+ Each workload resource implements its own rules for handling changes to the Pod template. If you want to read more about StatefulSet specifically, read [Update strategy](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets) in the StatefulSet Basics tutorial.
127
+
128
+ On Nodes, the [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet "An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.") does not directly observe or manage any of the details around pod templates and updates; those details are abstracted away. That abstraction and separation of concerns simplifies system semantics, and makes it feasible to extend the cluster's behavior without changing existing code.
129
+
130
+ ## Pod update and replacement
131
+
132
+ As mentioned in the previous section, when the Pod template for a workload resource is changed, the controller creates new Pods based on the updated template instead of updating or patching the existing Pods.
133
+
134
+ Kubernetes doesn't prevent you from managing Pods directly. It is possible to update some fields of a running Pod in place. However, Pod update operations like [`patch`](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.35/#patch-pod-v1-core) and [`replace`](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.35/#replace-pod-v1-core) have some limitations:
135
+
136
+ - Most of the metadata about a Pod is immutable. For example, you cannot change the `namespace`, `name`, `uid`, or `creationTimestamp` fields.
137
+ - If the `metadata.deletionTimestamp` is set, no new entry can be added to the `metadata.finalizers` list.
138
+ - Pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds`, `spec.terminationGracePeriodSeconds`, `spec.tolerations` or `spec.schedulingGates`. For `spec.tolerations`, you can only add new entries.
139
+ - When updating the `spec.activeDeadlineSeconds` field, two types of updates are allowed:
140
+   1. setting the unassigned field to a positive number;
141
+   2. updating the field from a positive number to a smaller, non-negative number.
142
+
143
+ ### Pod subresources
144
+
145
+ The above update rules apply to regular pod updates, but other pod fields can be updated through *subresources*.
146
+
147
+ - **Resize:** The `resize` subresource allows container resources (`spec.containers[*].resources`) to be updated. See [Resize Container Resources](https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/) for more details.
148
+ - **Ephemeral Containers:** The `ephemeralContainers` subresource allows [ephemeral containers](https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/ "A type of container that you can temporarily run inside a Pod") to be added to a Pod. See [Ephemeral Containers](https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/) for more details.
149
+ - **Status:** The `status` subresource allows the pod status to be updated. This is typically only used by the Kubelet and other system controllers.
150
+ - **Binding:** The `binding` subresource allows setting the pod's `spec.nodeName` via a `Binding` request. This is typically only used by the [scheduler](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/ "Control plane component that watches for newly created pods with no assigned node, and selects a node for them to run on.").
151
+
152
+ ### Pod generation
153
+
154
+ - The `metadata.generation` field is unique. It will be automatically set by the system such that new pods have a `metadata.generation` of 1, and every update to mutable fields in the pod's spec will increment the `metadata.generation` by 1.
155
+
156
+ FEATURE STATE: `Kubernetes v1.35 [stable]` (enabled by default)
157
+
158
+ - `observedGeneration` is a field that is captured in the `status` section of the Pod object. The Kubelet will set `status.observedGeneration` to track the pod state to the current pod status. The pod's `status.observedGeneration` will reflect the `metadata.generation` of the pod at the point that the pod status is being reported.
159
+
160
+ > [!info] Note:
161
+ > The `status.observedGeneration` field is managed by the kubelet and external controllers should **not** modify this field.
162
+
163
+ Different status fields may either be associated with the `metadata.generation` of the current sync loop, or with the `metadata.generation` of the previous sync loop. The key distinction is whether a change in the `spec` is reflected directly in the `status` or is an indirect result of a running process.
164
+
165
+ #### Direct Status Updates
166
+
167
+ For status fields where the allocated spec is directly reflected, the `observedGeneration` will be associated with the current `metadata.generation` (Generation N).
168
+
169
+ This behavior applies to:
170
+
171
+ - **Resize Status**: The status of a resource resize operation.
172
+ - **Allocated Resources**: The resources allocated to the Pod after a resize.
173
+ - **Ephemeral Containers**: When a new ephemeral container is added, and it is in `Waiting` state.
174
+
175
+ #### Indirect Status Updates
176
+
177
+ For status fields that are an indirect result of running the spec, the `observedGeneration` will be associated with the `metadata.generation` of the previous sync loop (Generation N-1).
178
+
179
+ This behavior applies to:
180
+
181
+ - **Container Image**: The `ContainerStatus.ImageID` reflects the image from the previous generation until the new image is pulled and the container is updated.
182
+ - **Actual Resources**: During an in-progress resize, the actual resources in use still belong to the previous generation's request.
183
+ - **Container state**: During an in-progress resize that requires a container restart, the container state reflects the previous generation's request.
184
+ - **activeDeadlineSeconds** & **terminationGracePeriodSeconds** & **deletionTimestamp**: The effects of these fields on the Pod's status are a result of the previously observed specification.
185
+
186
+ ## Resource sharing and communication
187
+
188
+ Pods enable data sharing and communication among their constituent containers.
189
+
190
+ ### Storage in Pods
191
+
192
+ A Pod can specify a set of shared storage [volumes](https://kubernetes.io/docs/concepts/storage/volumes/ "A directory containing data, accessible to the containers in a pod."). All containers in the Pod can access the shared volumes, allowing those containers to share data. Volumes also allow persistent data in a Pod to survive in case one of the containers within needs to be restarted. See [Storage](https://kubernetes.io/docs/concepts/storage/) for more information on how Kubernetes implements shared storage and makes it available to Pods.
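+
+ The following sketch illustrates the shared-volume idea under some assumptions: the Pod name, container names, and the `emptyDir` volume named `shared-data` are hypothetical choices for illustration. Both containers mount the same volume, so a file written by one is visible to the other.
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: shared-volume-demo     # hypothetical name
+ spec:
+   volumes:
+   - name: shared-data          # one volume, declared once at the Pod level
+     emptyDir: {}
+   containers:
+   - name: writer
+     image: busybox:1.28
+     command: ['sh', '-c', 'echo hello > /data/message && sleep 3600']
+     volumeMounts:
+     - name: shared-data
+       mountPath: /data
+   - name: reader
+     image: busybox:1.28
+     command: ['sh', '-c', 'sleep 5 && cat /data/message && sleep 3600']
+     volumeMounts:
+     - name: shared-data        # same volume, mounted in the second container
+       mountPath: /data
+ ```
+
+ An `emptyDir` volume survives container restarts but is removed when the Pod itself is deleted, so this sketch shows sharing between containers rather than long-term persistence.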
193
+
194
+ ### Pod networking
195
+
196
+ Each Pod is assigned a unique IP address for each address family. Every container in a Pod shares the network namespace, including the IP address and network ports. Inside a Pod (and **only** then), the containers that belong to the Pod can communicate with one another using `localhost`. When containers in a Pod communicate with entities *outside the Pod*, they must coordinate how they use the shared network resources (such as ports). Within a Pod, containers share an IP address and port space, and can find each other via `localhost`. The containers in a Pod can also communicate with each other using standard inter-process communications like SystemV semaphores or POSIX shared memory. Containers in different Pods have distinct IP addresses and can not communicate by OS-level IPC without special configuration. Containers that want to interact with a container running in a different Pod can use IP networking to communicate.
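+
+ As an illustrative sketch (the Pod and container names are assumptions), the manifest below runs a web server and a client in the same Pod. Because both containers share the Pod's network namespace, the client is expected to reach the server at `localhost` on the server's port, without any Service or Pod IP lookup.
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: localhost-demo         # hypothetical name
+ spec:
+   containers:
+   - name: web
+     image: nginx:1.14.2
+     ports:
+     - containerPort: 80
+   - name: client
+     image: busybox:1.28
+     # Waits briefly for nginx to start, then fetches the page over the shared loopback interface.
+     command: ['sh', '-c', 'sleep 5 && wget -qO- http://localhost:80 && sleep 3600']
+ ```
+
+ Since the containers also share the port space, two containers in one Pod cannot both listen on port 80.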
197
+
198
+ Containers within the Pod see the system hostname as being the same as the configured `name` for the Pod. There's more about this in the [networking](https://kubernetes.io/docs/concepts/cluster-administration/networking/) section.
199
+
200
+ ## Pod security settings
201
+
202
+ To set security constraints on Pods and containers, you use the `securityContext` field in the Pod specification. This field gives you granular control over what a Pod or individual containers can do. See [Advanced Pod Configuration](https://kubernetes.io/docs/concepts/workloads/pods/advanced-pod-config/) for more details.
203
+
204
+ For basic security configuration, you should meet the Baseline Pod security standard and run containers as non-root. You can set simple security contexts:
205
+
206
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: security-context-demo
+ spec:
+   securityContext:
+     runAsUser: 1000
+     runAsGroup: 3000
+     fsGroup: 2000
+   containers:
+   - name: sec-ctx-demo
+     image: busybox
+     command: ["sh", "-c", "sleep 1h"]
+ ```
221
+
222
+ For advanced security context configuration including capabilities, seccomp profiles, and detailed security options, see the [security concepts](https://kubernetes.io/docs/concepts/security/) section.
223
+
224
+ - To learn about kernel-level security constraints that you can use, see [Linux kernel security constraints for Pods and containers](https://kubernetes.io/docs/concepts/security/linux-kernel-security-constraints/).
225
+ - To learn more about the Pod security context, see [Configure a Security Context for a Pod or Container](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/).
226
+
227
+ ## Resource requests and limits
228
+
229
+ When you specify a Pod, you can optionally specify how much of each resource a container needs. The most common resources to specify are CPU and memory (RAM).
230
+
231
+ When you specify the resource *request* for containers in a Pod, the kube-scheduler uses this information to decide which node to place the Pod on. When you specify a resource *limit* for a container, the kubelet enforces those limits so that the running container is not allowed to use more of that resource than the limit you set.
232
+
233
+ CPU limits are enforced by CPU throttling. When a container approaches its CPU limit, the kernel restricts its access to CPU. Memory limits are enforced by the kernel with out-of-memory (OOM) kills when a container exceeds its limit.
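+
+ A minimal sketch of how requests and limits appear in a Pod manifest follows. The Pod name, container name, and the specific quantities are illustrative assumptions, not recommendations.
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: resources-demo         # hypothetical name
+ spec:
+   containers:
+   - name: app
+     image: nginx:1.14.2
+     resources:
+       requests:                # used by the kube-scheduler for node placement
+         cpu: 250m
+         memory: 64Mi
+       limits:                  # enforced at runtime on the node
+         cpu: 500m              # exceeding this leads to throttling
+         memory: 128Mi          # exceeding this can lead to an OOM kill
+ ```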
234
+
235
+ > [!info] Note:
236
+ > Setting CPU limits involves a trade-off. CPU limits help prevent noisy neighbor problems where a single workload starves others on the same node. This is especially important in multi-tenant environments. However, CPU limits can cause throttling even when the node has spare CPU capacity, potentially degrading latency-sensitive workload performance. Whether to set CPU limits depends on your environment, workload characteristics, and isolation requirements.
237
+
238
+ For details on resource units, enforcement behavior, and configuration examples, see [Resource Management for Pods and Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/).
239
+
240
+ ## Static Pods
241
+
242
+ *Static Pods* are managed directly by the kubelet daemon on a specific node, without the [API server](https://kubernetes.io/docs/concepts/architecture/#kube-apiserver "Control plane component that serves the Kubernetes API.") observing them. Whereas most Pods are managed by the control plane (for example, a [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ "Manages a replicated application on your cluster.")), for static Pods, the kubelet directly supervises each static Pod (and restarts it if it fails).
243
+
244
+ Static Pods are always bound to one [Kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet "An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.") on a specific node. The main use for static Pods is to run a self-hosted control plane: in other words, using the kubelet to supervise the individual [control plane components](https://kubernetes.io/docs/concepts/architecture/#control-plane-components).
245
+
246
+ The kubelet automatically tries to create a [mirror Pod](https://kubernetes.io/docs/reference/glossary/?all=true#term-mirror-pod "An object in the API server that tracks a static pod on a kubelet.") on the Kubernetes API server for each static Pod. This means that the Pods running on a node are visible on the API server, but cannot be controlled from there. See the guide [Create static Pods](https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/) for more information.
247
+
248
+ > [!info] Note:
249
+ > The `spec` of a static Pod cannot refer to other API objects (e.g., [ServiceAccount](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/ "Provides an identity for processes that run in a Pod."), [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/ "An API object used to store non-confidential data in key-value pairs. Can be consumed as environment variables, command-line arguments, or configuration files in a volume."), [Secret](https://kubernetes.io/docs/concepts/configuration/secret/ "Stores sensitive information, such as passwords, OAuth tokens, and ssh keys."), etc).
250
+
251
+ ## Pods with multiple containers
252
+
253
+ Pods are designed to support multiple cooperating processes (as containers) that form a cohesive unit of service. The containers in a Pod are automatically co-located and co-scheduled on the same physical or virtual machine in the cluster. The containers can share resources and dependencies, communicate with one another, and coordinate when and how they are terminated.
254
+
255
+ Pods in a Kubernetes cluster are used in two main ways:
256
+
257
+ - **Pods that run a single container**. The "one-container-per-Pod" model is the most common Kubernetes use case; in this case, you can think of a Pod as a wrapper around a single container; Kubernetes manages Pods rather than managing the containers directly.
258
+ - **Pods that run multiple containers that need to work together**. A Pod can encapsulate an application composed of multiple co-located containers that are tightly coupled and need to share resources. These co-located containers form a single cohesive unit of service: for example, one container serving data stored in a shared volume to the public, while a separate [sidecar container](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/ "An auxiliary container that stays running throughout the lifecycle of a Pod.") refreshes or updates those files. The Pod wraps these containers, storage resources, and an ephemeral network identity together as a single unit.
259
+
260
+ For example, you might have a container that acts as a web server for files in a shared volume, and a separate [sidecar container](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) that updates those files from a remote source, as in the following diagram:
261
+
262
+ ![Pod creation diagram](https://kubernetes.io/images/docs/pod.svg)
263
+
264
+ Pod creation diagram
265
+
266
+ Some Pods have [init containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/ "One or more initialization containers that must run to completion before any app containers run.") as well as [app containers](https://kubernetes.io/docs/reference/glossary/?all=true#term-app-container "A container used to run part of a workload. Compare with init container."). By default, init containers run and complete before the app containers are started.
267
+
268
+ You can also have [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) that provide auxiliary services to the main application Pod (for example: a service mesh).
269
+
270
+ FEATURE STATE: `Kubernetes v1.33 [stable]` (enabled by default)
271
+
272
+ Enabled by default, the `SidecarContainers` [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/) allows you to specify `restartPolicy: Always` for init containers. Setting the `Always` restart policy ensures that the containers where you set it are treated as *sidecars* that are kept running during the entire lifetime of the Pod. Containers that you explicitly define as sidecar containers start up before the main application Pod and remain running until the Pod is shut down.
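+
+ The sketch below shows one way this can look, assuming a hypothetical log-tailing sidecar; the names, image, and commands are illustrative only. The sidecar is declared under `initContainers` with `restartPolicy: Always`, which is what distinguishes it from a regular init container that must run to completion.
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: sidecar-demo           # hypothetical name
+ spec:
+   volumes:
+   - name: logs
+     emptyDir: {}
+   initContainers:
+   - name: log-tailer           # hypothetical sidecar
+     image: busybox:1.28
+     restartPolicy: Always      # marks this init container as a sidecar
+     command: ['sh', '-c', 'touch /var/log/app.log && tail -F /var/log/app.log']
+     volumeMounts:
+     - name: logs
+       mountPath: /var/log
+   containers:
+   - name: app
+     image: busybox:1.28
+     command: ['sh', '-c', 'while true; do echo working >> /var/log/app.log; sleep 5; done']
+     volumeMounts:
+     - name: logs
+       mountPath: /var/log
+ ```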
273
+
274
+ ## Container probes
275
+
276
+ A *probe* is a diagnostic performed periodically by the kubelet on a container. To perform a diagnostic, the kubelet can invoke different actions:
277
+
278
+ - `ExecAction` (performed with the help of the container runtime)
279
+ - `TCPSocketAction` (checked directly by the kubelet)
280
+ - `HTTPGetAction` (checked directly by the kubelet)
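+
+ As a hedged illustration of two of these actions, the manifest below configures an HTTP GET liveness probe and a TCP socket readiness probe on the same container. The Pod name and the timing values are assumptions chosen for the example.
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: probe-demo             # hypothetical name
+ spec:
+   containers:
+   - name: web
+     image: nginx:1.14.2
+     ports:
+     - containerPort: 80
+     livenessProbe:
+       httpGet:                 # HTTP GET action, checked by the kubelet
+         path: /
+         port: 80
+       initialDelaySeconds: 5
+       periodSeconds: 10
+     readinessProbe:
+       tcpSocket:               # TCP socket action, checked by the kubelet
+         port: 80
+       periodSeconds: 5
+ ```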
281
+
282
+ You can read more about [probes](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes) in the Pod Lifecycle documentation.
283
+
284
+ ## What's next
285
+
286
+ - Learn about the [lifecycle of a Pod](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/).
287
+ - Read about [PodDisruptionBudget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) and how you can use it to manage application availability during disruptions.
288
+ - Pod is a top-level resource in the Kubernetes REST API. The [Pod](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/) object definition describes the object in detail.
289
+ - [The Distributed System Toolkit: Patterns for Composite Containers](https://kubernetes.io/blog/2015/06/the-distributed-system-toolkit-patterns/) explains common layouts for Pods with more than one container.
290
+ - Read about [Pod topology spread constraints](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/)
291
+ - Read [Advanced Pod Configuration](https://kubernetes.io/docs/concepts/workloads/pods/advanced-pod-config/) to learn the topic in detail. That page covers aspects of Pod configuration beyond the essentials, including:
292
+   - PriorityClasses
293
+   - RuntimeClasses
294
+   - advanced ways to configure *scheduling*: the way that Kubernetes decides which node a Pod should run on.
295
+
296
+ To understand the context for why Kubernetes wraps a common Pod API in other resources (such as [StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ "A StatefulSet manages deployment and scaling of a set of Pods, with durable storage and persistent identifiers for each Pod.") or [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ "Manages a replicated application on your cluster.")), you can read about the prior art, including:
297
+
298
+ - [Aurora](https://aurora.apache.org/documentation/latest/reference/configuration/#job-schema)
299
+ - [Borg](https://research.google/pubs/large-scale-cluster-management-at-google-with-borg/)
300
+ - [Marathon](https://github.com/d2iq-archive/marathon)
301
+ - [Omega](https://research.google/pubs/pub41684/)
302
+ - [Tupperware](https://engineering.fb.com/data-center-engineering/tupperware/).
303
+
304
+
305
+ Last modified February 28, 2026 at 10:29 PM PST: [add resource requests and limits trade-off (79b3410c32)](https://github.com/kubernetes/website/commit/79b3410c328e4225eb7a9384ca2a6cb0a3b7c5ce)
data/k8s_docs/k8s_replicaset.md ADDED
@@ -0,0 +1,399 @@
1
+ A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. Usually, you define a Deployment and let that Deployment manage ReplicaSets automatically.
2
+
3
+ A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
4
+
5
+ ## How a ReplicaSet works
6
+
7
+ A ReplicaSet is defined with fields, including a selector that specifies how to identify Pods it can acquire, a number of replicas indicating how many Pods it should be maintaining, and a pod template specifying the data of new Pods it should create to meet the number of replicas criteria. A ReplicaSet then fulfills its purpose by creating and deleting Pods as needed to reach the desired number. When a ReplicaSet needs to create new Pods, it uses its Pod template.
8
+
9
+ A ReplicaSet is linked to its Pods via the Pods' [metadata.ownerReferences](https://kubernetes.io/docs/concepts/architecture/garbage-collection/#owners-dependents) field, which specifies what resource the current object is owned by. All Pods acquired by a ReplicaSet have their owning ReplicaSet's identifying information within their ownerReferences field. It's through this link that the ReplicaSet knows of the state of the Pods it is maintaining and plans accordingly.
10
+
11
+ A ReplicaSet identifies new Pods to acquire by using its selector. If there is a Pod that has no OwnerReference or the OwnerReference is not a [Controller](https://kubernetes.io/docs/concepts/architecture/controller/ "A control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state.") and it matches a ReplicaSet's selector, it will be immediately acquired by said ReplicaSet.
12
+
13
+ ## When to use a ReplicaSet
14
+
15
+ A ReplicaSet ensures that a specified number of pod replicas are running at any given time. However, a Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to Pods along with a lot of other useful features. Therefore, we recommend using Deployments instead of directly using ReplicaSets, unless you require custom update orchestration or don't require updates at all.
16
+
17
+ This means that you may never need to manipulate ReplicaSet objects directly: use a Deployment instead, and define your application in its `spec` section.
18
+
19
+ ## Example
20
+
21
+ ```yaml
+ apiVersion: apps/v1
+ kind: ReplicaSet
+ metadata:
+   name: frontend
+   labels:
+     app: guestbook
+     tier: frontend
+ spec:
+   # modify replicas according to your case
+   replicas: 3
+   selector:
+     matchLabels:
+       tier: frontend
+   template:
+     metadata:
+       labels:
+         tier: frontend
+     spec:
+       containers:
+       - name: php-redis
+         image: us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5
+ ```
44
+
45
+ Saving this manifest into `frontend.yaml` and submitting it to a Kubernetes cluster will create the defined ReplicaSet and the Pods that it manages.
46
+
47
+ ```shell
48
+ kubectl apply -f https://kubernetes.io/examples/controllers/frontend.yaml
49
+ ```
50
+
51
+ You can then get the current ReplicaSets deployed:
52
+
53
+ ```shell
54
+ kubectl get rs
55
+ ```
56
+
57
+ And see the frontend one you created:
58
+
59
+ ```
+ NAME       DESIRED   CURRENT   READY   AGE
+ frontend   3         3         3       6s
+ ```
63
+
64
+ You can also check on the state of the ReplicaSet:
65
+
66
+ ```shell
67
+ kubectl describe rs/frontend
68
+ ```
69
+
70
+ And you will see output similar to:
71
+
72
+ ```
+ Name:         frontend
+ Namespace:    default
+ Selector:     tier=frontend
+ Labels:       app=guestbook
+               tier=frontend
+ Annotations:  <none>
+ Replicas:     3 current / 3 desired
+ Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
+ Pod Template:
+   Labels:  tier=frontend
+   Containers:
+    php-redis:
+     Image:        us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5
+     Port:         <none>
+     Host Port:    <none>
+     Environment:  <none>
+     Mounts:       <none>
+   Volumes:        <none>
+ Events:
+   Type    Reason            Age   From                   Message
+   ----    ------            ----  ----                   -------
+   Normal  SuccessfulCreate  13s   replicaset-controller  Created pod: frontend-gbgfx
+   Normal  SuccessfulCreate  13s   replicaset-controller  Created pod: frontend-rwz57
+   Normal  SuccessfulCreate  13s   replicaset-controller  Created pod: frontend-wkl7w
+ ```
98
+
99
+ And lastly you can check for the Pods brought up:
100
+
101
+ ```shell
102
+ kubectl get pods
103
+ ```
104
+
105
+ You should see Pod information similar to:
106
+
107
+ ```
+ NAME             READY   STATUS    RESTARTS   AGE
+ frontend-gbgfx   1/1     Running   0          10m
+ frontend-rwz57   1/1     Running   0          10m
+ frontend-wkl7w   1/1     Running   0          10m
+ ```
113
+
114
+ You can also verify that the owner reference of these pods is set to the frontend ReplicaSet. To do this, get the yaml of one of the Pods running:
115
+
116
+ ```shell
117
+ kubectl get pods frontend-gbgfx -o yaml
118
+ ```
119
+
120
+ The output will look similar to this, with the frontend ReplicaSet's info set in the metadata's ownerReferences field:
121
+
122
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   creationTimestamp: "2024-02-28T22:30:44Z"
+   generateName: frontend-
+   labels:
+     tier: frontend
+   name: frontend-gbgfx
+   namespace: default
+   ownerReferences:
+   - apiVersion: apps/v1
+     blockOwnerDeletion: true
+     controller: true
+     kind: ReplicaSet
+     name: frontend
+     uid: e129deca-f864-481b-bb16-b27abfd92292
+ ...
+ ```
141
+
142
+ ## Non-Template Pod acquisitions
143
+
144
+ While you can create bare Pods with no problems, it is strongly recommended to make sure that the bare Pods do not have labels which match the selector of one of your ReplicaSets. This is because a ReplicaSet is not limited to owning Pods specified by its template; it can acquire other Pods in the manner specified in the previous sections.
145
+
146
+ Take the previous frontend ReplicaSet example, and the Pods specified in the following manifest:
147
+
148
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: pod1
+   labels:
+     tier: frontend
+ spec:
+   containers:
+   - name: hello1
+     image: gcr.io/google-samples/hello-app:2.0
+
+ ---
+
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: pod2
+   labels:
+     tier: frontend
+ spec:
+   containers:
+   - name: hello2
+     image: gcr.io/google-samples/hello-app:1.0
+ ```
173
+
174
+ As those Pods do not have a Controller (or any object) as their owner reference and match the selector of the frontend ReplicaSet, they will immediately be acquired by it.
175
+
176
+ Suppose you create the Pods after the frontend ReplicaSet has been deployed and has set up its initial Pod replicas to fulfill its replica count requirement:
177
+
178
+ ```shell
179
+ kubectl apply -f https://kubernetes.io/examples/pods/pod-rs.yaml
180
+ ```
181
+
182
+ The new Pods will be acquired by the ReplicaSet, and then immediately terminated as the ReplicaSet would be over its desired count.
183
+
184
+ Fetching the Pods:
185
+
186
+ ```shell
187
+ kubectl get pods
188
+ ```
189
+
190
+ The output shows that the new Pods are either already terminated, or in the process of being terminated:
191
+
192
+ ```
+ NAME             READY   STATUS        RESTARTS   AGE
+ frontend-b2zdv   1/1     Running       0          10m
+ frontend-vcmts   1/1     Running       0          10m
+ frontend-wtsmm   1/1     Running       0          10m
+ pod1             0/1     Terminating   0          1s
+ pod2             0/1     Terminating   0          1s
+ ```
200
+
201
+ If you create the Pods first:
202
+
203
+ ```shell
204
+ kubectl apply -f https://kubernetes.io/examples/pods/pod-rs.yaml
205
+ ```
206
+
207
+ And then create the ReplicaSet:
208
+
209
+ ```shell
210
+ kubectl apply -f https://kubernetes.io/examples/controllers/frontend.yaml
211
+ ```
212
+
213
+ You will see that the ReplicaSet has acquired the existing Pods and has created only as many new Pods from its template as needed to reach its desired count. You can confirm this by fetching the Pods:
+
+ ```shell
+ kubectl get pods
+ ```
+
+ The output is similar to:
220
+
221
+ ```
+ NAME             READY   STATUS    RESTARTS   AGE
+ frontend-hmmj2   1/1     Running   0          9s
+ pod1             1/1     Running   0          36s
+ pod2             1/1     Running   0          36s
+ ```
227
+
228
+ In this manner, a ReplicaSet can own a non-homogeneous set of Pods.
229
+
230
+ ## Writing a ReplicaSet manifest
231
+
232
+ As with all other Kubernetes API objects, a ReplicaSet needs the `apiVersion`, `kind`, and `metadata` fields. For ReplicaSets, the `kind` is always `ReplicaSet`.
233
+
234
+ When the control plane creates new Pods for a ReplicaSet, the `.metadata.name` of the ReplicaSet is part of the basis for naming those Pods. The name of a ReplicaSet must be a valid [DNS subdomain](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names) value, but this can produce unexpected results for the Pod hostnames. For best compatibility, the name should follow the more restrictive rules for a [DNS label](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names).
235
+
236
+ A ReplicaSet also needs a [`.spec` section](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).
237
+
238
+ ### Pod Template
239
+
240
+ The `.spec.template` is a [pod template](https://kubernetes.io/docs/concepts/workloads/pods/#pod-templates) which is also required to have labels in place. In our `frontend.yaml` example we had one label: `tier: frontend`. Be careful not to overlap with the selectors of other controllers, lest they try to adopt this Pod.
241
+
242
+ For the template's [restart policy](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy) field, `.spec.template.spec.restartPolicy`, the only allowed value is `Always`, which is the default.
243
+
244
+ ### Pod Selector
245
+
246
+ The `.spec.selector` field is a [label selector](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/). As discussed [earlier](#how-a-replicaset-works) these are the labels used to identify potential Pods to acquire. In our `frontend.yaml` example, the selector was:
247
+
248
+ ```yaml
+ matchLabels:
+   tier: frontend
+ ```
252
+
253
+ In the ReplicaSet, `.spec.template.metadata.labels` must match `.spec.selector`, or the ReplicaSet will be rejected by the API.
254
+
255
+ > [!info] Note:
256
+ > For 2 ReplicaSets specifying the same `.spec.selector` but different `.spec.template.metadata.labels` and `.spec.template.spec` fields, each ReplicaSet ignores the Pods created by the other ReplicaSet.
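+
+ The selector does not have to be an equality-based `matchLabels` map. As a minimal sketch, assuming the same `tier: frontend` label as in `frontend.yaml`, the fragment below expresses an equivalent selector with a set-based `matchExpressions` requirement. The Pod template labels still have to satisfy the selector:
+
+ ```yaml
+ spec:
+   selector:
+     matchExpressions:
+       # Matches Pods whose "tier" label is one of the listed values.
+       - key: tier
+         operator: In
+         values:
+           - frontend
+   template:
+     metadata:
+       labels:
+         tier: frontend
+ ```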
257
+
258
+ ### Replicas
259
+
260
+ You can specify how many Pods should run concurrently by setting `.spec.replicas`. The ReplicaSet will create/delete its Pods to match this number.
261
+
262
+ If you do not specify `.spec.replicas`, then it defaults to 1.
263
+
264
+ ## Working with ReplicaSets
265
+
266
+ ### Deleting a ReplicaSet and its Pods
267
+
268
+ To delete a ReplicaSet and all of its Pods, use [`kubectl delete`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#delete). The [Garbage collector](https://kubernetes.io/docs/concepts/architecture/garbage-collection/) automatically deletes all of the dependent Pods by default.
269
+
270
+ When using the REST API or the `client-go` library, you must set `propagationPolicy` to `Background` or `Foreground` in the delete options. With `curl`, pass the delete options as the request body using `-d`. For example:
271
+
272
+ ```shell
273
+ kubectl proxy --port=8080
274
+ curl -X DELETE 'localhost:8080/apis/apps/v1/namespaces/default/replicasets/frontend' \
275
+ -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
276
+ -H "Content-Type: application/json"
277
+ ```
278
+
279
+ ### Deleting just a ReplicaSet
280
+
281
+ You can delete a ReplicaSet without affecting any of its Pods using [`kubectl delete`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#delete) with the `--cascade=orphan` option. When using the REST API or the `client-go` library, you must set `propagationPolicy` to `Orphan`. For example:
282
+
283
+ ```shell
284
+ kubectl proxy --port=8080
285
+ curl -X DELETE 'localhost:8080/apis/apps/v1/namespaces/default/replicasets/frontend' \
286
+ -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
287
+ -H "Content-Type: application/json"
288
+ ```
289
+
290
+ Once the original is deleted, you can create a new ReplicaSet to replace it. As long as the old and new `.spec.selector` are the same, then the new one will adopt the old Pods. However, it will not make any effort to make existing Pods match a new, different pod template. To update Pods to a new spec in a controlled way, use a [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#creating-a-deployment), as ReplicaSets do not support a rolling update directly.
291
+
292
+ ### Terminating Pods
293
+
294
+ FEATURE STATE: `Kubernetes v1.35 [beta]` (enabled by default)
295
+
296
+ You can enable this feature by setting the `DeploymentReplicaSetTerminatingReplicas` [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/) on the [API server](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/) and on the [kube-controller-manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/).
297
+
298
+ Pods that become terminating due to deletion or scale down may take a long time to terminate, and may consume additional resources during that period. As a result, the total number of all pods can temporarily exceed `.spec.replicas`. Terminating pods can be tracked using the `.status.terminatingReplicas` field of the ReplicaSet.
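+
+ As an illustration only, the status of a ReplicaSet might look like the fragment below while two Pods are still shutting down after a scale down. The numbers are hypothetical and were not captured from a real cluster; `terminatingReplicas` is the field described above, and the other entries are standard ReplicaSet status fields:
+
+ ```yaml
+ status:
+   replicas: 3              # non-terminating Pods observed by the controller
+   readyReplicas: 3
+   availableReplicas: 3
+   terminatingReplicas: 2   # Pods that are still terminating
+ ```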
299
+
300
+ ### Isolating Pods from a ReplicaSet
301
+
302
+ You can remove Pods from a ReplicaSet by changing their labels. This technique may be used to remove Pods from service for debugging, data recovery, etc. Pods that are removed in this way will be replaced automatically (assuming that the number of replicas is not also changed).
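+
+ For example, assuming the `frontend` ReplicaSet from earlier and a hypothetical replacement label value `debug`, editing a Pod so that its metadata looks like the fragment below takes it out of the ReplicaSet's selector. The ReplicaSet then creates a new Pod to restore its replica count, while the relabeled Pod keeps running for inspection:
+
+ ```yaml
+ metadata:
+   name: frontend-gbgfx
+   labels:
+     # Was "tier: frontend"; the new value no longer matches the selector.
+     tier: debug
+ ```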
303
+
304
+ ### Scaling a ReplicaSet
305
+
306
+ A ReplicaSet can be easily scaled up or down by simply updating the `.spec.replicas` field. The ReplicaSet controller ensures that a desired number of Pods with a matching label selector are available and operational.
307
+
308
+ When scaling down, the ReplicaSet controller chooses which pods to delete by sorting the available pods to prioritize scaling down pods based on the following general algorithm:
309
+
310
+ 1. Pending (and unschedulable) pods are scaled down first
311
+ 2. If `controller.kubernetes.io/pod-deletion-cost` annotation is set, then the pod with the lower value will come first.
312
+ 3. Pods on nodes with more replicas come before pods on nodes with fewer replicas.
313
+ 4. If the pods' creation times differ, the pod that was created more recently comes before the older pod (the creation times are bucketed on an integer log scale).
314
+
315
+ If all of the above match, then selection is random.
316
+
317
+ ### Pod deletion cost
318
+
319
+ FEATURE STATE: `Kubernetes v1.22 [beta]`
320
+
321
+ Using the [`controller.kubernetes.io/pod-deletion-cost`](https://kubernetes.io/docs/reference/labels-annotations-taints/#pod-deletion-cost) annotation, users can set a preference regarding which pods to remove first when downscaling a ReplicaSet.
322
+
323
+ The annotation should be set on the pod, the range is \[-2147483648, 2147483647\]. It represents the cost of deleting a pod compared to other pods belonging to the same ReplicaSet. Pods with lower deletion cost are preferred to be deleted before pods with higher deletion cost.
324
+
325
+ The implicit value for this annotation for pods that don't set it is 0; negative values are permitted. Invalid values will be rejected by the API server.
326
+
327
+ This feature is beta and enabled by default. You can disable it using the [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/) `PodDeletionCost` in both kube-apiserver and kube-controller-manager.
328
+
329
+ > [!info] Note:
330
+ > - This is honored on a best-effort basis, so it does not offer any guarantees on pod deletion order.
331
+ > - Users should avoid updating the annotation frequently, such as updating it based on a metric value, because doing so will generate a significant number of pod updates on the apiserver.
332
+
333
+ #### Example Use Case
334
+
335
+ The different pods of an application could have different utilization levels. On scale down, the application may prefer to remove the pods with lower utilization. To avoid frequently updating the pods, the application should update `controller.kubernetes.io/pod-deletion-cost` once before issuing a scale down (setting the annotation to a value proportional to pod utilization level). This works if the application itself controls the down scaling; for example, the driver pod of a Spark deployment.
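+
+ A minimal sketch of that pattern, assuming two Pods of the `frontend` ReplicaSet and hypothetical costs derived from their utilization. These are metadata fragments rather than complete manifests, and the annotation value is quoted because annotation values are strings:
+
+ ```yaml
+ # Lightly used Pod: lower cost, so it is preferred for removal on scale down.
+ metadata:
+   name: frontend-gbgfx
+   annotations:
+     controller.kubernetes.io/pod-deletion-cost: "10"
+ ---
+ # Busy Pod: higher cost, so it is kept if possible.
+ metadata:
+   name: frontend-rwz57
+   annotations:
+     controller.kubernetes.io/pod-deletion-cost: "80"
+ ```
+
+ Because the annotation is honored on a best-effort basis, this expresses a preference and not a guaranteed deletion order.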
336
+
337
+ ### ReplicaSet as a Horizontal Pod Autoscaler Target
338
+
339
+ A ReplicaSet can also be a target for [Horizontal Pod Autoscalers (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/). That is, a ReplicaSet can be auto-scaled by an HPA. Here is an example HPA targeting the ReplicaSet we created in the previous example.
340
+
341
+ ```yaml
+ apiVersion: autoscaling/v1
+ kind: HorizontalPodAutoscaler
+ metadata:
+   name: frontend-scaler
+ spec:
+   scaleTargetRef:
+     apiVersion: apps/v1
+     kind: ReplicaSet
+     name: frontend
+   minReplicas: 3
+   maxReplicas: 10
+   targetCPUUtilizationPercentage: 50
+ ```
355
+
356
+ Saving this manifest into `hpa-rs.yaml` and submitting it to a Kubernetes cluster should create the defined HPA that autoscales the target ReplicaSet depending on the CPU usage of the replicated Pods.
357
+
358
+ ```shell
359
+ kubectl apply -f https://k8s.io/examples/controllers/hpa-rs.yaml
360
+ ```
361
+
362
+ Alternatively, you can use the `kubectl autoscale` command to accomplish the same thing (and it's easier!):
363
+
364
+ ```shell
365
+ kubectl autoscale rs frontend --max=10 --min=3 --cpu=50%
366
+ ```
367
+
368
+ ## Alternatives to ReplicaSet
369
+
370
+ ### Deployment (recommended)
371
+
372
+ [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) is an object which can own ReplicaSets and update them and their Pods via declarative, server-side rolling updates. While ReplicaSets can be used independently, today they're mainly used by Deployments as a mechanism to orchestrate Pod creation, deletion and updates. When you use Deployments you don't have to worry about managing the ReplicaSets that they create. Deployments own and manage their ReplicaSets. As such, it is recommended to use Deployments when you want ReplicaSets.
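+
+ As a sketch, the `frontend` ReplicaSet from the example above could be expressed as a Deployment by changing `kind` and keeping the same selector and Pod template. The Deployment then creates and manages the underlying ReplicaSet for you:
+
+ ```yaml
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: frontend
+   labels:
+     app: guestbook
+     tier: frontend
+ spec:
+   replicas: 3
+   selector:
+     matchLabels:
+       tier: frontend
+   template:
+     metadata:
+       labels:
+         tier: frontend
+     spec:
+       containers:
+       - name: php-redis
+         image: us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5
+ ```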
373
+
374
+ ### Bare Pods
375
+
376
+ Unlike the case where a user directly created Pods, a ReplicaSet replaces Pods that are deleted or terminated for any reason, for example after a node failure or during disruptive node maintenance such as a kernel upgrade. For this reason, we recommend that you use a ReplicaSet even if your application requires only a single Pod. Think of it similarly to a process supervisor, only it supervises multiple Pods across multiple nodes instead of individual processes on a single node. A ReplicaSet delegates local container restarts to an agent on the node, such as the kubelet.
377
+
378
+ ### Job
379
+
380
+ Use a [`Job`](https://kubernetes.io/docs/concepts/workloads/controllers/job/) instead of a ReplicaSet for Pods that are expected to terminate on their own (that is, batch jobs).
381
+
382
+ ### DaemonSet
383
+
384
+ Use a [`DaemonSet`](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/) instead of a ReplicaSet for Pods that provide a machine-level function, such as machine monitoring or machine logging. These Pods have a lifetime that is tied to a machine lifetime: the Pod needs to be running on the machine before other Pods start, and is safe to terminate when the machine is otherwise ready to be rebooted or shut down.
385
+
386
+ ### ReplicationController
387
+
388
+ ReplicaSets are the successors to [ReplicationControllers](https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/). The two serve the same purpose, and behave similarly, except that a ReplicationController does not support set-based selector requirements as described in the [labels user guide](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors). As such, ReplicaSets are preferred over ReplicationControllers.
389
+
390
+ ## What's next
391
+
392
+ - Learn about [Pods](https://kubernetes.io/docs/concepts/workloads/pods/).
393
+ - Learn about [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/).
394
+ - [Run a Stateless Application Using a Deployment](https://kubernetes.io/docs/tasks/run-application/run-stateless-application-deployment/), which relies on ReplicaSets to work.
395
+ - `ReplicaSet` is a top-level resource in the Kubernetes REST API. Read the [ReplicaSet](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/replica-set-v1/) object definition to understand the API for replica sets.
396
+ - Read about [PodDisruptionBudget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) and how you can use it to manage application availability during disruptions.
397
+
398
+
399
+ Last modified September 26, 2025 at 6:20 PM PST: [Fix HPA CLI example in ReplicaSet doc (55add008ed)](https://github.com/kubernetes/website/commit/55add008edd6efd03de533257d4cf79628f58103)
data/k8s_docs/k8s_secret.md ADDED
@@ -0,0 +1,549 @@
1
+ A Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key. Such information might otherwise be put in a [Pod](https://kubernetes.io/docs/concepts/workloads/pods/ "A Pod represents a set of running containers in your cluster.") specification or in a [container image](https://kubernetes.io/docs/reference/glossary/?all=true#term-image "Stored instance of a container that holds a set of software needed to run an application."). Using a Secret means that you don't need to include confidential data in your application code.
2
+
3
+ Because Secrets can be created independently of the Pods that use them, there is less risk of the Secret (and its data) being exposed during the workflow of creating, viewing, and editing Pods. Kubernetes, and applications that run in your cluster, can also take additional precautions with Secrets, such as avoiding writing sensitive data to nonvolatile storage.
4
+
5
+ Secrets are similar to [ConfigMaps](https://kubernetes.io/docs/concepts/configuration/configmap/ "An API object used to store non-confidential data in key-value pairs. Can be consumed as environment variables, command-line arguments, or configuration files in a volume.") but are specifically intended to hold confidential data.
6
+
7
+ > [!caution] Caution:
8
+ > Kubernetes Secrets are, by default, stored unencrypted in the API server's underlying data store (etcd). Anyone with API access can retrieve or modify a Secret, and so can anyone with access to etcd. Additionally, anyone who is authorized to create a Pod in a namespace can use that access to read any Secret in that namespace; this includes indirect access such as the ability to create a Deployment.
9
+ >
10
+ > In order to safely use Secrets, take at least the following steps:
11
+ >
12
+ > 1. [Enable Encryption at Rest](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/) for Secrets.
13
+ > 2. [Enable or configure RBAC rules](https://kubernetes.io/docs/reference/access-authn-authz/authorization/) with least-privilege access to Secrets.
14
+ > 3. Restrict Secret access to specific containers.
15
+ > 4. [Consider using external Secret store providers](https://secrets-store-csi-driver.sigs.k8s.io/concepts.html#provider-for-the-secrets-store-csi-driver).
16
+ >
17
+ > For more guidelines to manage and improve the security of your Secrets, refer to [Good practices for Kubernetes Secrets](https://kubernetes.io/docs/concepts/security/secrets-good-practices/).
18
+
19
+ See [Information security for Secrets](#information-security-for-secrets) for more details.
20
+
21
+ ## Uses for Secrets
22
+
23
+ You can use Secrets for purposes such as the following:
24
+
25
+ - [Set environment variables for a container](https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#define-container-environment-variables-using-secret-data).
26
+ - [Provide credentials such as SSH keys or passwords to Pods](https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#provide-prod-test-creds).
27
+ - [Allow the kubelet to pull container images from private registries](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/).
28
+
29
+ The Kubernetes control plane also uses Secrets; for example, [bootstrap token Secrets](#bootstrap-token-secrets) are a mechanism to help automate node registration.
30
+
31
+ ### Use case: dotfiles in a secret volume
32
+
33
+ You can make your data "hidden" by defining a key that begins with a dot. This key represents a dotfile or "hidden" file. For example, when the following Secret is mounted into a volume, `secret-volume`, the volume will contain a single file, called `.secret-file`, and the `dotfile-test-container` will have this file present at the path `/etc/secret-volume/.secret-file`.
34
+
35
+ > [!info] Note:
36
+ > Files beginning with dot characters are hidden from the output of `ls -l`; you must use `ls -la` to see them when listing directory contents.
37
+
38
+ ```yaml
+ apiVersion: v1
+ kind: Secret
+ metadata:
+   name: dotfile-secret
+ data:
+   .secret-file: dmFsdWUtMg0KDQo=
+ ---
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: secret-dotfiles-pod
+ spec:
+   volumes:
+     - name: secret-volume
+       secret:
+         secretName: dotfile-secret
+   containers:
+     - name: dotfile-test-container
+       image: registry.k8s.io/busybox
+       command:
+         - ls
+         - "-l"
+         - "/etc/secret-volume"
+       volumeMounts:
+         - name: secret-volume
+           readOnly: true
+           mountPath: "/etc/secret-volume"
+ ```
67
+
68
+ ### Use case: Secret visible to one container in a Pod
69
+
70
+ Consider a program that needs to handle HTTP requests, do some complex business logic, and then sign some messages with an HMAC. Because it has complex application logic, there might be an unnoticed remote file reading exploit in the server, which could expose the private key to an attacker.
71
+
72
+ This could be divided into two processes in two containers: a frontend container which handles user interaction and business logic, but which cannot see the private key; and a signer container that can see the private key, and responds to simple signing requests from the frontend (for example, over localhost networking).
73
+
74
+ With this partitioned approach, an attacker now has to trick the application server into doing something rather arbitrary, which may be harder than getting it to read a file.
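+
+ The partitioned layout can be sketched as a single Pod in which only one container mounts the Secret volume. This is an illustrative manifest, not one taken from the Kubernetes examples: the Pod, Secret, and container names, the image references, and the mount path are all hypothetical.
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: signing-app
+ spec:
+   volumes:
+     - name: signing-key
+       secret:
+         secretName: hmac-signing-key
+   containers:
+     # Handles user interaction and business logic; mounts no Secret volume.
+     - name: frontend
+       image: registry.example/frontend:1.0
+     # Only this container can read the key material.
+     - name: signer
+       image: registry.example/signer:1.0
+       volumeMounts:
+         - name: signing-key
+           readOnly: true
+           mountPath: "/etc/signing-key"
+ ```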
75
+
76
+ ### Alternatives to Secrets
77
+
78
+ Rather than using a Secret to protect confidential data, you can pick from alternatives.
79
+
80
+ Here are some of your options:
81
+
82
+ - If your cloud-native component needs to authenticate to another application that you know is running within the same Kubernetes cluster, you can use a [ServiceAccount](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#service-account-tokens) and its tokens to identify your client.
83
+ - There are third-party tools that you can run, either within or outside your cluster, that manage sensitive data. For example, a service that Pods access over HTTPS, that reveals a Secret if the client correctly authenticates (for example, with a ServiceAccount token).
84
+ - For authentication, you can implement a custom signer for X.509 certificates, and use [CertificateSigningRequests](https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/) to let that custom signer issue certificates to Pods that need them.
85
+ - You can use a [device plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/) to expose node-local encryption hardware to a specific Pod. For example, you can schedule trusted Pods onto nodes that provide a Trusted Platform Module, configured out-of-band.
86
+
87
+ You can also combine two or more of those options, including the option to use Secret objects themselves.
88
+
89
+ For example: implement (or deploy) an [operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ "A specialized controller used to manage a custom resource") that fetches short-lived session tokens from an external service, and then creates Secrets based on those short-lived session tokens. Pods running in your cluster can make use of the session tokens, and the operator ensures they are valid. This separation means that you can run Pods that are unaware of the exact mechanisms for issuing and refreshing those session tokens.
90
+
91
+ ## Types of Secret
92
+
93
+ When creating a Secret, you can specify its type using the `type` field of the [Secret](https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/secret-v1/) resource, or certain equivalent `kubectl` command line flags (if available). The Secret type is used to facilitate programmatic handling of the Secret data.
94
+
95
+ Kubernetes provides several built-in types for some common usage scenarios. These types vary in terms of the validations performed and the constraints Kubernetes imposes on them.
96
+
97
+ | Built-in Type | Usage |
98
+ | --- | --- |
99
+ | `Opaque` | arbitrary user-defined data |
100
+ | `kubernetes.io/service-account-token` | ServiceAccount token |
101
+ | `kubernetes.io/dockercfg` | serialized `~/.dockercfg` file |
102
+ | `kubernetes.io/dockerconfigjson` | serialized `~/.docker/config.json` file |
103
+ | `kubernetes.io/basic-auth` | credentials for basic authentication |
104
+ | `kubernetes.io/ssh-auth` | credentials for SSH authentication |
105
+ | `kubernetes.io/tls` | data for a TLS client or server |
106
+ | `bootstrap.kubernetes.io/token` | bootstrap token data |
107
+
108
+ You can define and use your own Secret type by assigning a non-empty string as the `type` value for a Secret object (an empty string is treated as an `Opaque` type).
109
+
110
+ Kubernetes doesn't impose any constraints on the type name. However, if you are using one of the built-in types, you must meet all the requirements defined for that type.
111
+
112
+ If you are defining a type of Secret that's for public use, follow the convention and structure the Secret type to have your domain name before the name, separated by a `/`. For example: `cloud-hosting.example.net/cloud-api-credentials`.
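+
+ A minimal sketch of a Secret that uses such a custom type. The Secret name, the `api-token` key, and its placeholder value are hypothetical; only the `type` string comes from the convention described above:
+
+ ```yaml
+ apiVersion: v1
+ kind: Secret
+ metadata:
+   name: cloud-api-credentials
+ type: cloud-hosting.example.net/cloud-api-credentials
+ stringData:
+   # Hypothetical key and placeholder value.
+   api-token: replace-with-real-token
+ ```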
113
+
114
+ ### Opaque Secrets
115
+
116
+ `Opaque` is the default Secret type if you don't explicitly specify a type in a Secret manifest. When you create a Secret using `kubectl`, you must use the `generic` subcommand to indicate an `Opaque` Secret type. For example, the following command creates an empty Secret of type `Opaque`:
117
+
118
+ ```shell
119
+ kubectl create secret generic empty-secret
120
+ kubectl get secret empty-secret
121
+ ```
122
+
123
+ The output looks like:
124
+
125
+ ```
126
+ NAME           TYPE     DATA   AGE
127
+ empty-secret   Opaque   0      2m6s
128
+ ```
129
+
130
+ The `DATA` column shows the number of data items stored in the Secret. In this case, `0` means you have created an empty Secret.
131
+
132
+ ### ServiceAccount token Secrets
133
+
134
+ A `kubernetes.io/service-account-token` type of Secret is used to store a token credential that identifies a [ServiceAccount](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/ "Provides an identity for processes that run in a Pod."). This is a legacy mechanism that provides long-lived ServiceAccount credentials to Pods.
135
+
136
+ In Kubernetes v1.22 and later, the recommended approach is to obtain a short-lived, automatically rotating ServiceAccount token by using the [`TokenRequest`](https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-request-v1/) API instead. You can get these short-lived tokens using the following methods:
137
+
138
+ - Call the `TokenRequest` API either directly or by using an API client like `kubectl`. For example, you can use the [`kubectl create token`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-) command.
139
+ - Request a mounted token in a [projected volume](https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume) in your Pod manifest. Kubernetes creates the token and mounts it in the Pod. The token is automatically invalidated when the Pod that it's mounted in is deleted. For details, see [Launch a Pod using service account token projection](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#launch-a-pod-using-service-account-token-projection).
140
+
141
+ > [!info] Note:
142
+ > You should only create a ServiceAccount token Secret if you can't use the `TokenRequest` API to obtain a token, and the security exposure of persisting a non-expiring token credential in a readable API object is acceptable to you. For instructions, see [Manually create a long-lived API token for a ServiceAccount](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#manually-create-an-api-token-for-a-serviceaccount).
143
+
144
+ When using this Secret type, you need to ensure that the `kubernetes.io/service-account.name` annotation is set to an existing ServiceAccount name. If you are creating both the ServiceAccount and the Secret objects, you should create the ServiceAccount object first.
145
+
146
+ After the Secret is created, a Kubernetes [controller](https://kubernetes.io/docs/concepts/architecture/controller/ "A control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state.") fills in some other fields such as the `kubernetes.io/service-account.uid` annotation, and the `token` key in the `data` field, which is populated with an authentication token.
147
+
148
+ The following example configuration declares a ServiceAccount token Secret:
149
+
150
+ ```yaml
151
+ apiVersion: v1
152
+ kind: Secret
153
+ metadata:
154
+   name: secret-sa-sample
154
+   annotations:
155
+     kubernetes.io/service-account.name: "sa-name"
156
+ type: kubernetes.io/service-account-token
157
+ data:
158
+   extra: YmFyCg==
160
+ ```
161
+
162
+ After creating the Secret, wait for Kubernetes to populate the `token` key in the `data` field.
163
+
164
+ See the [ServiceAccount](https://kubernetes.io/docs/concepts/security/service-accounts/) documentation for more information on how ServiceAccounts work. You can also check the `automountServiceAccountToken` field and the `serviceAccountName` field of the [`Pod`](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.35/#pod-v1-core) for information on referencing ServiceAccount credentials from within Pods.
165
+
166
+ ### Docker config Secrets
167
+
168
+ If you are creating a Secret to store credentials for accessing a container image registry, you must use one of the following `type` values for that Secret:
169
+
170
+ - `kubernetes.io/dockercfg`: stores a serialized `~/.dockercfg`, which is the legacy format for configuring the Docker command line. The Secret `data` field contains a `.dockercfg` key whose value is the base64-encoded content of a `~/.dockercfg` file.
171
+ - `kubernetes.io/dockerconfigjson`: stores serialized JSON that follows the same format rules as the `~/.docker/config.json` file, which is the newer format that replaces `~/.dockercfg`. The Secret `data` field must contain a `.dockerconfigjson` key whose value is the base64-encoded content of a `~/.docker/config.json` file.
172
+
173
+ Below is an example for a `kubernetes.io/dockercfg` type of Secret:
174
+
175
+ ```yaml
176
+ apiVersion: v1
177
+ kind: Secret
178
+ metadata:
179
+   name: secret-dockercfg
179
+ type: kubernetes.io/dockercfg
180
+ data:
181
+   .dockercfg: |
182
+     eyJhdXRocyI6eyJodHRwczovL2V4YW1wbGUvdjEvIjp7ImF1dGgiOiJvcGVuc2VzYW1lIn19fQo=
184
+ ```
185
+
186
+ > [!info] Note:
187
+ > If you do not want to perform the base64 encoding, you can choose to use the `stringData` field instead.
188
+
189
+ When you create Docker config Secrets using a manifest, the API server checks whether the expected key exists in the `data` field, and it verifies that the value provided can be parsed as valid JSON. The API server doesn't validate whether the JSON actually is a Docker config file.
190
+
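The check described above can be approximated in a few lines of Python. This is only a sketch of the stated behavior (key present, value parses as JSON), not the API server's actual implementation; the function name `passes_manifest_check` is hypothetical, and the sample payloads are made up for illustration.

```python
import base64
import json


def passes_manifest_check(data: dict, key: str = ".dockerconfigjson") -> bool:
    """Approximate the described check: the key exists and its value decodes to valid JSON."""
    if key not in data:
        return False
    try:
        # binascii.Error, UnicodeDecodeError and json.JSONDecodeError all subclass ValueError.
        json.loads(base64.b64decode(data[key], validate=True))
    except ValueError:
        return False
    return True


# Hypothetical payloads: a plausible Docker config, arbitrary JSON, and non-JSON text.
docker_like = {".dockerconfigjson": base64.b64encode(b'{"auths": {}}').decode()}
arbitrary_json = {".dockerconfigjson": base64.b64encode(b'{"anything": 1}').decode()}
not_json = {".dockerconfigjson": base64.b64encode(b"not json").decode()}

print(passes_manifest_check(docker_like))     # True
print(passes_manifest_check(arbitrary_json))  # True: valid JSON, even though it is not a Docker config
print(passes_manifest_check(not_json))        # False
```

The second case illustrates the caveat in the text: any well-formed JSON document satisfies this kind of check.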
191
+ You can also use `kubectl` to create a Secret for accessing a container registry, such as when you don't have a Docker configuration file:
192
+
193
+ ```shell
194
+ kubectl create secret docker-registry secret-tiger-docker \
195
+ --docker-email=tiger@acme.example \
196
+ --docker-username=tiger \
197
+ --docker-password=pass1234 \
198
+ --docker-server=my-registry.example:5000
199
+ ```
200
+
201
+ This command creates a Secret of type `kubernetes.io/dockerconfigjson`.
202
+
203
+ Retrieve the `.data.dockerconfigjson` field from that new Secret and decode the data:
204
+
205
+ ```shell
206
+ kubectl get secret secret-tiger-docker -o jsonpath='{.data.*}' | base64 -d
207
+ ```
208
+
209
+ The output is equivalent to the following JSON document (which is also a valid Docker configuration file):
210
+
211
+ ```json
212
+ {
213
+   "auths": {
214
+     "my-registry.example:5000": {
215
+       "username": "tiger",
216
+       "password": "pass1234",
217
+       "email": "tiger@acme.example",
218
+       "auth": "dGlnZXI6cGFzczEyMzQ="
219
+     }
220
+   }
221
+ }
222
+ ```
223
+
224
+ > [!caution] Caution:
225
+ > The `auth` value there is base64 encoded; it is obscured but not secret. Anyone who can read that Secret can learn the registry access bearer token.
226
+ >
227
+ > It is suggested to use [credential providers](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-credential-provider/) to dynamically and securely provide pull secrets on-demand.
228
+
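To see why the `auth` value is obscured rather than secret, the following Python sketch rebuilds it from the username and password used in the `kubectl` example and then decodes it again. It uses only the standard library; the helper name `registry_auth` is hypothetical.

```python
import base64


def registry_auth(username: str, password: str) -> str:
    """Return base64("username:password"), the form of the `auth` field shown above."""
    return base64.b64encode(f"{username}:{password}".encode()).decode()


encoded = registry_auth("tiger", "pass1234")
print(encoded)                             # dGlnZXI6cGFzczEyMzQ=
print(base64.b64decode(encoded).decode())  # tiger:pass1234
```

Decoding needs no key, so read access to the Secret is equivalent to knowing the registry credentials.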
229
+ ### Basic authentication Secret
230
+
231
+ The `kubernetes.io/basic-auth` type is provided for storing credentials needed for basic authentication. When using this Secret type, the `data` field of the Secret must contain one of the following two keys:
232
+
233
+ - `username`: the user name for authentication
234
+ - `password`: the password or token for authentication
235
+
236
+ Both values for the above two keys are base64 encoded strings. You can alternatively provide the clear text content using the `stringData` field in the Secret manifest.
237
+
238
+ The following manifest is an example of a basic authentication Secret:
239
+
240
+ ```yaml
241
+ apiVersion: v1
242
+ kind: Secret
243
+ metadata:
244
+   name: secret-basic-auth
244
+ type: kubernetes.io/basic-auth
245
+ stringData:
246
+   username: admin # required field for kubernetes.io/basic-auth
247
+   password: t0p-Secret # required field for kubernetes.io/basic-auth
249
+ ```
250
+
251
+ > [!info] Note:
252
+ > The `stringData` field for a Secret does not work well with server-side apply.
253
+
254
+ The basic authentication Secret type is provided only for convenience. You can create an `Opaque` type for credentials used for basic authentication. However, using the defined and public Secret type (`kubernetes.io/basic-auth`) helps other people to understand the purpose of your Secret, and sets a convention for what key names to expect.
255
+
256
+ ### SSH authentication Secrets
257
+
258
+ The built-in type `kubernetes.io/ssh-auth` is provided for storing data used in SSH authentication. When using this Secret type, you must specify an `ssh-privatekey` key-value pair in the `data` (or `stringData`) field as the SSH credential to use.
259
+
260
+ The following manifest is an example of a Secret used for SSH public/private key authentication:
261
+
262
+ ```yaml
263
+ apiVersion: v1
264
+ kind: Secret
265
+ metadata:
266
+   name: secret-ssh-auth
266
+ type: kubernetes.io/ssh-auth
267
+ data:
268
+   # the data is abbreviated in this example
269
+   ssh-privatekey: |
270
+     UG91cmluZzYlRW1vdGljb24lU2N1YmE=
272
+ ```
273
+
274
+ The SSH authentication Secret type is provided only for convenience. You can create an `Opaque` type for credentials used for SSH authentication. However, using the defined and public Secret type (`kubernetes.io/ssh-auth`) helps other people to understand the purpose of your Secret, and sets a convention for what key names to expect. The Kubernetes API verifies that the required keys are set for a Secret of this type.
275
+
276
+ > [!caution] Caution:
277
+ > SSH private keys do not establish trusted communication between an SSH client and host server on their own. A secondary means of establishing trust is needed to mitigate "man in the middle" attacks, such as a `known_hosts` file added to a ConfigMap.
278
+
279
+ ### TLS Secrets
280
+
281
+ The `kubernetes.io/tls` Secret type is for storing a certificate and its associated key that are typically used for TLS.
282
+
283
+ One common use for TLS Secrets is to configure encryption in transit for an [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/), but you can also use it with other resources or directly in your workload. When using this type of Secret, the `tls.key` and the `tls.crt` key must be provided in the `data` (or `stringData`) field of the Secret configuration, although the API server doesn't actually validate the values for each key.
284
+
285
+ As an alternative to using `stringData`, you can use the `data` field to provide the base64 encoded certificate and private key. For details, see [Constraints on Secret names and data](#restriction-names-data).
286
+
287
+ The following YAML contains an example config for a TLS Secret:
288
+
289
+ ```yaml
290
+ apiVersion: v1
291
+ kind: Secret
292
+ metadata:
293
+   name: secret-tls
293
+ type: kubernetes.io/tls
294
+ data:
295
+   # values are base64 encoded, which obscures them but does NOT provide
296
+   # any useful level of confidentiality
297
+   # Replace the following values with your own base64-encoded certificate and key.
298
+   tls.crt: "REPLACE_WITH_BASE64_CERT"
299
+   tls.key: "REPLACE_WITH_BASE64_KEY"
301
+ ```
302
+
303
+ The TLS Secret type is provided only for convenience. You can create an `Opaque` type for credentials used for TLS authentication. However, using the defined and public Secret type (`kubernetes.io/tls`) helps ensure the consistency of Secret format in your project. The API server verifies if the required keys are set for a Secret of this type.
304
+
305
+ To create a TLS Secret using `kubectl`, use the `tls` subcommand:
306
+
307
+ ```shell
308
+ kubectl create secret tls my-tls-secret \
309
+ --cert=path/to/cert/file \
310
+ --key=path/to/key/file
311
+ ```
312
+
313
+ The public/private key pair must exist beforehand. The public key certificate for `--cert` must be PEM encoded and must match the given private key for `--key`.
314
+
315
+ ### Bootstrap token Secrets
316
+
317
+ The `bootstrap.kubernetes.io/token` Secret type is for tokens used during the node bootstrap process. It stores tokens used to sign well-known ConfigMaps.
318
+
319
+ A bootstrap token Secret is usually created in the `kube-system` namespace and named in the form `bootstrap-token-<token-id>` where `<token-id>` is a 6 character string of the token ID.
320
+
321
+ As a Kubernetes manifest, a bootstrap token Secret might look like the following:
322
+
323
+ ```yaml
324
+ apiVersion: v1
325
+ kind: Secret
326
+ metadata:
327
+   name: bootstrap-token-5emitj
327
+   namespace: kube-system
328
+ type: bootstrap.kubernetes.io/token
329
+ data:
330
+   auth-extra-groups: c3lzdGVtOmJvb3RzdHJhcHBlcnM6a3ViZWFkbTpkZWZhdWx0LW5vZGUtdG9rZW4=
331
+   expiration: MjAyMC0wOS0xM1QwNDozOToxMFo=
332
+   token-id: NWVtaXRq
333
+   token-secret: a3E0Z2lodnN6emduMXAwcg==
334
+   usage-bootstrap-authentication: dHJ1ZQ==
335
+   usage-bootstrap-signing: dHJ1ZQ==
337
+ ```
338
+
339
+ A bootstrap token Secret has the following keys specified under `data`:
340
+
341
+ - `token-id`: A random 6 character string as the token identifier. Required.
342
+ - `token-secret`: A random 16 character string as the actual token secret. Required.
343
+ - `description`: A human-readable string that describes what the token is used for. Optional.
344
+ - `expiration`: An absolute UTC time using [RFC3339](https://datatracker.ietf.org/doc/html/rfc3339) specifying when the token expires. Optional.
345
+ - `usage-bootstrap-<usage>`: A boolean flag indicating additional usage for the bootstrap token.
346
+ - `auth-extra-groups`: A comma-separated list of group names that will be authenticated as in addition to the `system:bootstrappers` group.
347
+
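The naming and length rules above can be checked mechanically. The sketch below decodes the `token-id` and `token-secret` values from the example manifest and verifies them against the Secret name; the function `check_bootstrap_secret` is a hypothetical helper, not part of any Kubernetes tooling, and it covers only the rules stated in this section.

```python
import base64


def check_bootstrap_secret(name: str, data: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    decoded = {key: base64.b64decode(value).decode() for key, value in data.items()}
    token_id = decoded.get("token-id", "")
    token_secret = decoded.get("token-secret", "")
    if len(token_id) != 6:
        problems.append("token-id must be 6 characters")
    if len(token_secret) != 16:
        problems.append("token-secret must be 16 characters")
    if name != f"bootstrap-token-{token_id}":
        problems.append("name must be bootstrap-token-<token-id>")
    return problems


# Values copied from the example manifest above.
example = {"token-id": "NWVtaXRq", "token-secret": "a3E0Z2lodnN6emduMXAwcg=="}
print(check_bootstrap_secret("bootstrap-token-5emitj", example))  # []
print(check_bootstrap_secret("bootstrap-token-abcdef", example))
```

The second call reports a single problem because the name no longer matches the decoded token ID.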
348
+ You can alternatively provide the values in the `stringData` field of the Secret without base64 encoding them:
349
+
350
+ ```yaml
351
+ apiVersion: v1
352
+ kind: Secret
353
+ metadata:
354
+   # Note how the Secret is named
354
+   name: bootstrap-token-5emitj
355
+   # A bootstrap token Secret usually resides in the kube-system namespace
356
+   namespace: kube-system
357
+ type: bootstrap.kubernetes.io/token
358
+ stringData:
359
+   auth-extra-groups: "system:bootstrappers:kubeadm:default-node-token"
360
+   expiration: "2020-09-13T04:39:10Z"
361
+   # This token ID is used in the name
362
+   token-id: "5emitj"
363
+   token-secret: "kq4gihvszzgn1p0r"
364
+   # This token can be used for authentication
365
+   usage-bootstrap-authentication: "true"
366
+   # and it can be used for signing
367
+   usage-bootstrap-signing: "true"
369
+ ```
370
+
371
+ > [!info] Note:
372
+ > The `stringData` field for a Secret does not work well with server-side apply.
373
+
374
+ ## Working with Secrets
375
+
376
+ ### Creating a Secret
377
+
378
+ There are several options to create a Secret:
379
+
380
+ - [Use `kubectl`](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kubectl/)
381
+ - [Use a configuration file](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-config-file/)
382
+ - [Use the Kustomize tool](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kustomize/)
383
+
384
+ #### Constraints on Secret names and data
385
+
386
+ The name of a Secret object must be a valid [DNS subdomain name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names).
387
+
388
+ You can specify the `data` and/or the `stringData` field when creating a configuration file for a Secret. The `data` and the `stringData` fields are optional. The values for all keys in the `data` field have to be base64-encoded strings. If the conversion to base64 string is not desirable, you can choose to specify the `stringData` field instead, which accepts arbitrary strings as values.
389
+
390
+ The keys of `data` and `stringData` must consist of alphanumeric characters, `-`, `_` or `.`. All key-value pairs in the `stringData` field are internally merged into the `data` field. If a key appears in both the `data` and the `stringData` field, the value specified in the `stringData` field takes precedence.
391
+
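The merge rule can be pictured with a short Python sketch. It assumes the behavior exactly as described above (allowed key characters, `stringData` encoded and merged into `data`, `stringData` winning on conflict); the function name `effective_data` and the sample values are hypothetical, and this is not how the API server is implemented.

```python
import base64
import re

# Keys may contain only alphanumeric characters, '-', '_' or '.'.
KEY_PATTERN = re.compile(r"[-._a-zA-Z0-9]+")


def effective_data(data: dict, string_data: dict) -> dict:
    """Merge stringData into data; stringData values are base64 encoded and take precedence."""
    merged = dict(data)
    for key, value in string_data.items():
        merged[key] = base64.b64encode(value.encode()).decode()
    for key in merged:
        if not KEY_PATTERN.fullmatch(key):
            raise ValueError(f"invalid key: {key!r}")
    return merged


result = effective_data(
    data={"username": "YWRtaW4=", "password": "b2xk"},  # "admin" and "old"
    string_data={"password": "t0p-Secret"},
)
print(base64.b64decode(result["password"]).decode())  # t0p-Secret
```

The `password` key appears in both fields, so the `stringData` value replaces the one from `data`, while `username` is left untouched.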
392
+ #### Size limit
393
+
394
+ Individual Secrets are limited to 1MiB in size. This is to discourage creation of very large Secrets that could exhaust the API server and kubelet memory. However, creation of many smaller Secrets could also exhaust memory. You can use a [resource quota](https://kubernetes.io/docs/concepts/policy/resource-quotas/) to limit the number of Secrets (or other resources) in a namespace.
395
+
396
+ ### Editing a Secret
397
+
398
+ You can edit an existing Secret unless it is [immutable](#secret-immutable). To edit a Secret, use one of the following methods:
399
+
400
+ - [Use `kubectl`](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kubectl/#edit-secret)
401
+ - [Use a configuration file](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-config-file/#edit-secret)
402
+
403
+ You can also edit the data in a Secret using the [Kustomize tool](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kustomize/#edit-secret). However, this method creates a new `Secret` object with the edited data.
404
+
405
+ Depending on how you created the Secret, as well as how the Secret is used in your Pods, updates to existing `Secret` objects are propagated automatically to Pods that use the data. For more information, refer to the [Using Secrets as files from a Pod](#using-secrets-as-files-from-a-pod) section.
406
+
407
+ ### Using a Secret
408
+
409
+ Secrets can be mounted as data volumes or exposed as [environment variables](https://kubernetes.io/docs/concepts/containers/container-environment/ "Container environment variables are name=value pairs that provide useful information into containers running in a Pod.") to be used by a container in a Pod. Secrets can also be used by other parts of the system, without being directly exposed to the Pod. For example, Secrets can hold credentials that other parts of the system should use to interact with external systems on your behalf.
410
+
411
+ Secret volume sources are validated to ensure that the specified object reference actually points to an object of type Secret. Therefore, a Secret needs to be created before any Pods that depend on it.
412
+
413
+ If the Secret cannot be fetched (perhaps because it does not exist, or due to a temporary lack of connection to the API server) the kubelet periodically retries running that Pod. The kubelet also reports an Event for that Pod, including details of the problem fetching the Secret.
414
+
415
+ #### Optional Secrets
416
+
417
+ When you reference a Secret in a Pod, you can mark the Secret as *optional*, such as in the following example. If an optional Secret doesn't exist, Kubernetes ignores it.
418
+
419
+ ```yaml
420
+ apiVersion: v1
421
+ kind: Pod
422
+ metadata:
423
+   name: mypod
423
+ spec:
424
+   containers:
425
+   - name: mypod
426
+     image: redis
427
+     volumeMounts:
428
+     - name: foo
429
+       mountPath: "/etc/foo"
430
+       readOnly: true
431
+   volumes:
432
+   - name: foo
433
+     secret:
434
+       secretName: mysecret
435
+       optional: true
437
+ ```
438
+
439
+ By default, Secrets are required. None of a Pod's containers will start until all non-optional Secrets are available.
440
+
441
+ If a Pod references a specific key in a non-optional Secret and that Secret does exist, but is missing the named key, the Pod fails during startup.
442
+
443
+ ### Using Secrets as files from a Pod
444
+
445
+ If you want to access data from a Secret in a Pod, one way to do that is to have Kubernetes make the value of that Secret available as a file inside the filesystem of one or more of the Pod's containers.
446
+
447
+ For instructions, refer to [Create a Pod that has access to the secret data through a Volume](https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#create-a-pod-that-has-access-to-the-secret-data-through-a-volume).
448
+
449
+ When a volume contains data from a Secret, and that Secret is updated, Kubernetes tracks this and updates the data in the volume, using an eventually-consistent approach.
450
+
451
+ > [!info] Note:
452
+ > A container using a Secret as a [subPath](https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath) volume mount does not receive automated Secret updates.
453
+
454
+ The kubelet keeps a cache of the current keys and values for the Secrets that are used in volumes for pods on that node. You can configure the way that the kubelet detects changes from the cached values. The `configMapAndSecretChangeDetectionStrategy` field in the [kubelet configuration](https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/) controls which strategy the kubelet uses. The default strategy is `Watch`.
455
+
456
+ Updates to Secrets can be either propagated by an API watch mechanism (the default), based on a cache with a defined time-to-live, or polled from the cluster API server on each kubelet synchronisation loop.
457
+
458
+ As a result, the total delay from the moment when the Secret is updated to the moment when new keys are projected to the Pod can be as long as the kubelet sync period + cache propagation delay, where the cache propagation delay depends on the chosen cache type (following the same order listed in the previous paragraph, these are: watch propagation delay, the configured cache TTL, or zero for direct polling).
459
+
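As a back-of-the-envelope illustration of that formula, the sketch below adds the kubelet sync period to the propagation delay of the chosen strategy. The strategy names `Watch`, `Cache` and `Get` are the values accepted by the kubelet's `configMapAndSecretChangeDetectionStrategy` field; the function name and every number used here are hypothetical and should be replaced with values from your own cluster configuration.

```python
def worst_case_delay(strategy: str, sync_period_s: float,
                     watch_delay_s: float = 0.0, cache_ttl_s: float = 0.0) -> float:
    """Upper bound in seconds: kubelet sync period + cache propagation delay."""
    propagation = {
        "Watch": watch_delay_s,  # watch propagation delay
        "Cache": cache_ttl_s,    # configured cache TTL
        "Get": 0.0,              # direct polling adds no cache delay
    }
    return sync_period_s + propagation[strategy]


# Illustrative numbers only.
print(worst_case_delay("Watch", sync_period_s=60, watch_delay_s=1))
print(worst_case_delay("Cache", sync_period_s=60, cache_ttl_s=60))
print(worst_case_delay("Get", sync_period_s=60))
```

With these made-up inputs, the TTL-based cache roughly doubles the worst case compared with direct polling, which is the trade-off the paragraph above describes.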
460
+ ### Using Secrets as environment variables
461
+
462
+ To use a Secret in an [environment variable](https://kubernetes.io/docs/concepts/containers/container-environment/ "Container environment variables are name=value pairs that provide useful information into containers running in a Pod.") in a Pod:
463
+
464
+ 1. For each container in your Pod specification, add an environment variable for each Secret key that you want to use to the `env[].valueFrom.secretKeyRef` field.
465
+ 2. Modify your image and/or command line so that the program looks for values in the specified environment variables.
466
+
467
+ For instructions, refer to [Define container environment variables using Secret data](https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#define-container-environment-variables-using-secret-data).
468
+
469
+ It's important to note that the range of characters allowed for environment variable names in pods is [restricted](https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/#using-environment-variables-inside-of-your-config). If any keys do not meet the rules, those keys are not made available to your container, though the Pod is allowed to start.
470
+
471
+ ### Container image pull Secrets
472
+
473
+ If you want to fetch container images from a private repository, you need a way for the kubelet on each node to authenticate to that repository. You can configure *image pull Secrets* to make this possible. These Secrets are configured at the Pod level.
474
+
475
+ #### Using imagePullSecrets
476
+
477
+ The `imagePullSecrets` field is a list of references to Secrets in the same namespace. You can use an `imagePullSecrets` to pass a Secret that contains a Docker (or other) image registry password to the kubelet. The kubelet uses this information to pull a private image on behalf of your Pod. See the [PodSpec API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.35/#podspec-v1-core) for more information about the `imagePullSecrets` field.
478
+
479
+ ##### Manually specifying an imagePullSecret
480
+
481
+ You can learn how to specify `imagePullSecrets` from the [container images](https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod) documentation.
482
+
483
+ ##### Arranging for imagePullSecrets to be automatically attached
484
+
485
+ You can manually create `imagePullSecrets`, and reference these from a ServiceAccount. Any Pods that explicitly use that ServiceAccount, or that use it by default, will get their `imagePullSecrets` field set to that of the service account. See [Add ImagePullSecrets to a service account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#add-imagepullsecrets-to-a-service-account) for a detailed explanation of that process.
486
+
487
+ ### Using Secrets with static Pods
488
+
489
+ You cannot use ConfigMaps or Secrets with [static Pods](https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/ "A pod managed directly by the kubelet daemon on a specific node.").
490
+
491
+ ## Immutable Secrets
492
+
493
+ FEATURE STATE: `Kubernetes v1.21 [stable]`
494
+
495
+ Kubernetes lets you mark specific Secrets (and ConfigMaps) as *immutable*. Preventing changes to the data of an existing Secret has the following benefits:
496
+
497
+ - protects you from accidental (or unwanted) updates that could cause application outages
498
+ - for clusters that extensively use Secrets (at least tens of thousands of unique Secret to Pod mounts), switching to immutable Secrets improves the performance of your cluster by significantly reducing load on kube-apiserver. The kubelet does not need to maintain a watch on any Secrets that are marked as immutable.
499
+
500
+ ### Marking a Secret as immutable
501
+
502
+ You can create an immutable Secret by setting the `immutable` field to `true`. For example,
503
+
504
+ ```yaml
505
+ apiVersion: v1
506
+ kind: Secret
507
+ metadata: ...
508
+ data: ...
509
+ immutable: true
510
+ ```
511
+
512
+ You can also update any existing mutable Secret to make it immutable.
513
+
514
+ > [!info] Note:
515
+ > Once a Secret or ConfigMap is marked as immutable, it is *not* possible to revert this change nor to mutate the contents of the `data` field. You can only delete and recreate the Secret. Existing Pods maintain a mount point to the deleted Secret - it is recommended to recreate these pods.
516
+
517
+ ## Information security for Secrets
518
+
519
+ Although ConfigMap and Secret work similarly, Kubernetes applies some additional protection for Secret objects.
520
+
521
+ Secrets often hold values that span a spectrum of importance, many of which can cause escalations within Kubernetes (e.g. service account tokens) and to external systems. Even if an individual app can reason about the power of the Secrets it expects to interact with, other apps within the same namespace can render those assumptions invalid.
522
+
523
+ Authorization configuration affects how Secret data can be accessed within a namespace. For example, granting **list** or **watch** permissions on Secrets allows a subject to read all Secret data in that namespace, not only the Secrets explicitly referenced by its Pods. Restrict access to the minimum set of permissions required for a workload to function, and avoid granting broad roles such as `cluster-admin` unless required for administrative purposes.
524
+
525
+ Also see the [Authorization documentation](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).
526
+
527
+ A Secret is only sent to a node if a Pod on that node requires it. For mounting Secrets into Pods, the kubelet stores a copy of the data into a `tmpfs` so that the confidential data is not written to durable storage. Once the Pod that depends on the Secret is deleted, the kubelet deletes its local copy of the confidential data from the Secret.
528
+
529
+ There may be several containers in a Pod. By default, containers you define only have access to the default ServiceAccount and its related Secret. You must explicitly define environment variables or map a volume into a container in order to provide access to any other Secret.
530
+
531
+ There may be Secrets for several Pods on the same node. However, only the Secrets that a Pod requests are potentially visible within its containers. Therefore, one Pod does not have access to the Secrets of another Pod.
532
+
533
+ ### Configure least-privilege access to Secrets
534
+
535
+ To enhance the security measures around Secrets, use separate namespaces to isolate access to mounted secrets.
536
+
537
+ > [!danger] Warning:
538
+ > Any containers that run with `privileged: true` on a node can access all Secrets used on that node.
539
+
540
+ ## What's next
541
+
542
+ - For guidelines to manage and improve the security of your Secrets, refer to [Good practices for Kubernetes Secrets](https://kubernetes.io/docs/concepts/security/secrets-good-practices/).
543
+ - Learn how to [manage Secrets using `kubectl`](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kubectl/)
544
+ - Learn how to [manage Secrets using config file](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-config-file/)
545
+ - Learn how to [manage Secrets using kustomize](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kustomize/)
546
+ - Read the [API reference](https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/secret-v1/) for `Secret`
547
+
548
+
549
+ Last modified March 17, 2026 at 1:33 AM PST: [Improve security clarification for Kubernetes Secrets (#54644) (8af7916eb8)](https://github.com/kubernetes/website/commit/8af7916eb81024c5da7a9b4c4477db18e5fffda2)