The magic of mutating K8s webhook controllers and how they can lead to unexpected behaviours

Shahe Islam
2 min read · Jun 20, 2021

A subject of recent interest has been implementing Open Policy Agent (OPA) Rego policies within Kubernetes clusters using the Gatekeeper admission controller, specifically for enforcing resource limits and requests. However, what was thought to be a Rego rule issue turned out to be a modification made by a mutating admission controller before the object reached validation. A better understanding of this mechanism allowed redundant code to be removed from the policies.

Gatekeeper Policies

Building out a global policy for a Kubernetes cluster is a relatively simple process. Rules are defined in a ConstraintTemplate custom resource, which implements the specific logic in Rego. A Constraint then applies the rule to specific API groups and kinds, e.g. globally to Pods but not Deployments, or vice versa. Once both resources are deployed, whenever a request creates a resource of a kind matched by the Constraint, Gatekeeper, which acts as a validating admission controller, will either accept or deny the creation of that resource.
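To make the division of labour concrete, here is a minimal Python sketch of the two halves: matching an object by API group and kind, and evaluating a rule against it. It is not Rego and not how Gatekeeper is implemented. The constraint kind `K8sRequiredResources`, the helper names, and the simplified dictionary shapes are all assumptions made for illustration, and wildcard matching is deliberately left out.

```python
# Hypothetical, simplified stand-in for a Constraint: it only selects kinds.
constraint = {
    "kind": "K8sRequiredResources",  # hypothetical constraint kind
    "match": {"kinds": [{"apiGroups": [""], "kinds": ["Pod"]}]},
}


def matches(constraint, obj):
    """Return True if the object's API group and kind are selected by the constraint."""
    api_version = obj.get("apiVersion", "")
    # Core-group objects use a bare version such as "v1"; others use "group/version".
    group = api_version.split("/")[0] if "/" in api_version else ""
    for selector in constraint["match"]["kinds"]:
        if group in selector["apiGroups"] and obj.get("kind") in selector["kinds"]:
            return True
    return False


def violations(obj):
    """Collect a message for each container missing a CPU/memory request or limit."""
    messages = []
    for container in obj.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for resource in ("cpu", "memory"):
                if resource not in resources.get(section, {}):
                    # section[:-1] turns "requests" into "request", "limits" into "limit"
                    messages.append(
                        f"container {container['name']} has no {resource} {section[:-1]}"
                    )
    return messages


def review(constraint, obj):
    """Deny only when the constraint selects the object and the rule reports violations."""
    found = violations(obj) if matches(constraint, obj) else []
    return {"allowed": not found, "violations": found}
```

With this constraint, a Pod is checked by the rule while a Deployment (API group `apps`) is never evaluated at all, which is the "Pods but not Deployments" case described above.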

[Embedded example: ConstraintTemplate for an undefined-resources violation]
[Embedded example: Constraint]

The Kubernetes API server Pipeline

For those new to the Kubernetes API request pipeline (authentication, authorization, then admission), an understanding of what occurs at each stage can be very useful. For more detail, please review the following:

https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/

Mutating admission controllers

This is where the issue with requests and limits comes in. When a resource is applied with limits but no requests, a mutating admission controller modifies the object, setting each request to the value of the corresponding limit. Because of the way the Kubernetes API server pipeline is ordered, this happens before the resource reaches validating webhooks such as Gatekeeper.

Validation of resources occurs after mutation; hence the original resource submission is never seen by OPA.

Gatekeeper sees the modified resource and finds a request value present, so no violation is raised. In this instance, writing Rego logic to raise a violation for a missing request value is pointless for any container that already defines limits, unless one were to disable the mutating step that fills in the request values (beyond the scope of this article).
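The ordering effect can be reproduced in a few lines of Python. This is a sketch under stated assumptions, not the API server's actual code: `default_requests_from_limits` is a hypothetical stand-in for the mutation step, `missing_requests` is a hypothetical stand-in for the validation rule, and the Pod is reduced to a bare dictionary.

```python
import copy


def default_requests_from_limits(pod):
    """Mutation step (stand-in): copy each limit into requests when the request is missing."""
    mutated = copy.deepcopy(pod)  # leave the submitted object untouched
    for container in mutated["spec"]["containers"]:
        resources = container.setdefault("resources", {})
        limits = resources.get("limits", {})
        requests = resources.setdefault("requests", {})
        for name, value in limits.items():
            requests.setdefault(name, value)  # never overwrite an explicit request
    return mutated


def missing_requests(pod):
    """Validation step (stand-in): list containers that lack a CPU or memory request."""
    problems = []
    for container in pod["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        for name in ("cpu", "memory"):
            if name not in requests:
                problems.append(f"{container['name']}: no {name} request")
    return problems


def admit(pod):
    """Run mutation first, then validation, mirroring the order described above."""
    mutated = default_requests_from_limits(pod)
    return mutated, missing_requests(mutated)


# A submission with limits only, as in the scenario from the article.
submitted = {
    "spec": {
        "containers": [
            {"name": "app", "resources": {"limits": {"cpu": "500m", "memory": "128Mi"}}}
        ]
    }
}

print(missing_requests(submitted))  # rule applied to the raw submission
stored, problems = admit(submitted)
print(problems)                     # rule applied after mutation
```

Run against the raw submission, the rule reports two missing requests; run after the mutation step, it reports none, because the validator only ever sees the mutated object.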

Tip: When a rule has not raised a violation even though the submitted resource should have triggered it, describe the resource in Kubernetes after creation. It may be that some internal admission controller magic is the cause, rather than an engineering oversight.
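One way to act on this tip is to compare the manifest you submitted with the object the cluster stored (for example, the output of `kubectl get pod <name> -o json`). The helper below is a minimal sketch with a hypothetical name, `added_fields`, and both objects are shown as inline dictionaries rather than being read from a cluster. On a real object it would also list server-populated fields such as status and metadata, so expect some noise.

```python
def added_fields(submitted, stored, path=""):
    """Return dotted paths present in the stored object but absent from the submission."""
    added = []
    if isinstance(stored, dict) and isinstance(submitted, dict):
        for key, value in stored.items():
            child = f"{path}.{key}" if path else key
            if key not in submitted:
                added.append(child)
            else:
                added.extend(added_fields(submitted[key], value, child))
    elif isinstance(stored, list) and isinstance(submitted, list):
        # Compare list items by position; adequate for a containers list in a sketch.
        for index, (old, new) in enumerate(zip(submitted, stored)):
            added.extend(added_fields(old, new, f"{path}[{index}]"))
    return added


# What was applied: limits only.
submitted = {
    "spec": {"containers": [{"name": "app", "resources": {"limits": {"cpu": "500m"}}}]}
}
# What came back from the cluster in this illustrative case: requests filled in.
stored = {
    "spec": {
        "containers": [
            {
                "name": "app",
                "resources": {"limits": {"cpu": "500m"}, "requests": {"cpu": "500m"}},
            }
        ]
    }
}

print(added_fields(submitted, stored))
```

Here the only reported path is `spec.containers[0].resources.requests`, which points directly at the field that was added after submission.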

To learn more about OPA and Rego policies, please check out: https://www.openpolicyagent.org/
