Kubernetes
Gremlin allows targeting objects within your Kubernetes clusters. After selecting a cluster, you can filter the visible set of objects by selecting a namespace. Select any of your Deployments, ReplicaSets, StatefulSets, DaemonSets, or Pods. When one object is selected, all child objects will also be targeted. For example, when selecting a DaemonSet, all of the pods within it will be selected.
Only parent Kubernetes objects are available to target. Pods will be listed only if they don't belong to a Set or Deployment.
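To preview from the command line which objects are parents and which pods stand alone, a sketch along these lines may help. It assumes `kubectl` is already configured for the cluster. The function name `list_targetable` is hypothetical, and the OWNER column simply prints the kind of each pod's first owner reference, so pods without a controller show `<none>`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gremlin): list the workload objects in a
# namespace, then show which pods are owned by a controller.
list_targetable() {
  ns="$1"
  # Workload kinds named in the text above.
  kubectl get deployments,replicasets,statefulsets,daemonsets --namespace "$ns"
  # Pods with no owner reference print <none> in the OWNER column.
  kubectl get pods --namespace "$ns" \
    -o custom-columns='NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'
}
```

Calling `list_targetable default` prints both listings for the `default` namespace.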
Installation
The simplest way to install Gremlin on Kubernetes is with Helm. Check out Gremlin's Helm Chart Repository for full documentation and usage.
    helm repo add gremlin https://helm.gremlin.com/
    kubectl create namespace gremlin
    helm install gremlin gremlin/gremlin --namespace gremlin \
      --set gremlin.hostPID=true \
      --set gremlin.container.driver=docker-runc \
      --set gremlin.secret.managed=true \
      --set gremlin.secret.type=secret \
      --set gremlin.secret.teamID=$GREMLIN_TEAM_ID \
      --set gremlin.secret.clusterID=$GREMLIN_CLUSTER_ID \
      --set gremlin.secret.teamSecret=$GREMLIN_TEAM_SECRET
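The install command reads three shell variables, and an unset one would silently pass an empty value to Helm. The following is a minimal sketch of a pre-flight check; the function name `require_gremlin_env` is hypothetical, and only the variable names come from the command above.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report any of the three variables used by
# the helm install command that are unset or empty.
require_gremlin_env() {
  missing=0
  [ -n "${GREMLIN_TEAM_ID:-}" ] || { echo "GREMLIN_TEAM_ID is not set" >&2; missing=1; }
  [ -n "${GREMLIN_CLUSTER_ID:-}" ] || { echo "GREMLIN_CLUSTER_ID is not set" >&2; missing=1; }
  [ -n "${GREMLIN_TEAM_SECRET:-}" ] || { echo "GREMLIN_TEAM_SECRET is not set" >&2; missing=1; }
  # Returns 0 when all three are set, 1 otherwise.
  return "$missing"
}
```

One way to use it is `require_gremlin_env && helm install ...`, so the install runs only when all three values are present.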
Some environments require more configuration. The resources below can help you find the best configuration for your environment.
Cri-O and Containerd
As of version 2.16.0, you can now install Gremlin on Kubernetes running Cri-O or Containerd. Follow this guide to get started.
OpenShift
As of version 2.16.0, you can now install Gremlin on OpenShift 3 and OpenShift 4 running Cri-O or Containerd.
Install Manually
If the above sections are not what you're looking for, follow this guide to install Gremlin manually, from nothing but YAML files and a text editor.
Other Considerations
Enabling Gremlin on the Kubernetes Master
Most Kubernetes deployments configure master nodes with the node-role.kubernetes.io/master:NoSchedule taint. You can run the following command to see if any of your nodes have this taint:
    kubectl get no -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

    NAME      TAINTS
    kube-01   [map[effect:NoSchedule key:node-role.kubernetes.io/master]]
    kube-02   <none>
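On larger clusters it can be easier to filter that listing than to read it. The sketch below assumes the two-column output format shown above; the function name `master_tainted_nodes` is hypothetical. It matches on the taint key only and does not inspect the effect.

```shell
#!/bin/sh
# Hypothetical filter: read the custom-columns output on stdin and print the
# names of nodes whose TAINTS column mentions the master taint key.
master_tainted_nodes() {
  # NR > 1 skips the header row; index() is a plain substring match.
  awk 'NR > 1 && index($0, "node-role.kubernetes.io/master") { print $1 }'
}
```

It is meant to be used as `kubectl get no -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints | master_tainted_nodes`.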
If you wish to install Gremlin on a Kubernetes master that has been tainted, add a tolerations section to the PodSpec of the Gremlin Client Manifest.
    tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
You will need to reapply the Gremlin client manifest after making this change.
Proxy Configuration
Both Gremlin and Chao can be configured to use a proxy for outgoing HTTP traffic. The conventional https_proxy and no_proxy variables can be passed as environment variables for this purpose.
Proxy Certificate Authorities
When proxies support HTTPS communication, or are otherwise configured with a TLS certificate, it can be necessary to configure Gremlin to trust the proxy's certificate authority. This is done by passing the SSL_CERT_FILE environment variable, where the value is a path on the file system to a PEM-encoded file containing the certificate authority's certificate.
Configuring Gremlin
    - name: https_proxy
      value: http://proxy.local:3128
    # Pass SSL_CERT_FILE when the proxy requires trusting a TLS certificate
    - name: SSL_CERT_FILE
      value: /etc/gremlin/ssl/proxy-ca.pem
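The file at /etc/gremlin/ssl/proxy-ca.pem has to reach the container somehow. One option, sketched here, is a ConfigMap named `proxy-ca`, which is the name used in the commented-out volume of the `gremlin-check` Job example in the Troubleshooting section. The function name and the choice of the `gremlin` namespace are assumptions, and the ConfigMap still needs to be mounted at /etc/gremlin/ssl in the pod spec.

```shell
#!/bin/sh
# Hypothetical helper: store a local PEM file in a ConfigMap under the key
# proxy-ca.pem, so that mounting the ConfigMap at /etc/gremlin/ssl yields
# /etc/gremlin/ssl/proxy-ca.pem inside the container.
create_proxy_ca_configmap() {
  pem_path="$1"
  kubectl create configmap proxy-ca \
    --namespace gremlin \
    --from-file=proxy-ca.pem="$pem_path"
}
```

For example, `create_proxy_ca_configmap ./my-proxy-ca.pem`, where the local file name is a placeholder.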
Configuring Chao
Because the Gremlin Kubernetes Client (Chao) communicates with the local Kubernetes API server in addition to the internet, it's important to bypass internet proxies for traffic bound for the API server.
    - name: https_proxy
      value: http://proxy.local:3128
    - name: no_proxy
      value: $(KUBERNETES_SERVICE_HOST):$(KUBERNETES_SERVICE_PORT)
    # Pass SSL_CERT_FILE when the proxy requires trusting a TLS certificate
    - name: SSL_CERT_FILE
      value: /etc/gremlin/ssl/proxy-ca.pem
    # Pass SSL_CERT_DIR when SSL_CERT_FILE contains only the proxy certificate. This will ensure Chao trusts api.gremlin.com
    # The value of SSL_CERT_DIR varies depending on the operating system on which the cluster hosts run
    # See https://www.gremlin.com/docs/infrastructure-layer/kubernetes/#ssl_cert_dir
    - name: SSL_CERT_DIR
      value: /etc/ssl/
SSL_CERT_DIR
Supplying SSL_CERT_DIR ensures Chao is still configured with the necessary certificate authorities to trust api.gremlin.com. However, it is not needed for most Gremlin installations because Chao will trust Gremlin servers by default. This variable is only required for Chao deployments when both of the following conditions are true:
- Chao is configured with https_proxy and this proxy is configured to accept TLS connections
- Chao is also configured with SSL_CERT_FILE, and the file it points to contains only the certificate authority for the https proxy
The value of SSL_CERT_DIR should point to the root of the certificate authority directory for the operating system on which Chao runs.
Path | OS |
---|---|
/etc/ssl/certs/ | Debian/Ubuntu |
/etc/pki/tls/ | Fedora/RHEL 6/OpenELEC |
/etc/ssl/ | OpenSUSE / Alpine Linux |
/etc/pki/ca-trust/extracted/pem/ | CentOS/RHEL 7 |
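The table can be expressed as a small lookup, for instance when templating the Chao environment for several clusters. This is only a sketch of the table above: the function name is hypothetical, the identifiers follow the `ID` values commonly found in /etc/os-release, and treating every CentOS version like the "CentOS/RHEL 7" row is an assumption.

```shell
#!/bin/sh
# Hypothetical lookup: map an OS identifier (and, for RHEL, a major version)
# to the SSL_CERT_DIR value listed in the table above.
ssl_cert_dir_for() {
  os_id="$1"
  major="${2:-}"
  case "$os_id" in
    debian|ubuntu)    echo /etc/ssl/certs/ ;;
    fedora|openelec)  echo /etc/pki/tls/ ;;
    opensuse*|alpine) echo /etc/ssl/ ;;
    # Assumption: the table lists CentOS without a version.
    centos)           echo /etc/pki/ca-trust/extracted/pem/ ;;
    rhel)
      case "$major" in
        6) echo /etc/pki/tls/ ;;
        7) echo /etc/pki/ca-trust/extracted/pem/ ;;
        *) return 1 ;;  # version not covered by the table
      esac ;;
    *) return 1 ;;      # OS not covered by the table
  esac
}
```

A non-zero return status signals that the table has no entry, in which case the path needs to be determined by hand.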
Using a PodSecurityPolicy
Gremlin does not support running within the restricted PodSecurityPolicy (PSP) that is configured by default on clusters that enable such policies. You can install a gremlin PodSecurityPolicy to grant chao and gremlin everything they need, and nothing they don't need.
When installing Gremlin with Helm, you can supply --set gremlin.podSecurity.podSecurityPolicy.create=true to install Gremlin's custom pod security policies. Check out Gremlin's Helm Chart Repository for full documentation and usage.
Without Helm, you can download Gremlin's PSP files and install them with kubectl:
    mkdir gremlin-psp
    wget -P gremlin-psp/ https://k8s.gremlin.com/resources/psp/v1/chao-psp.yaml
    wget -P gremlin-psp/ https://k8s.gremlin.com/resources/psp/v1/gremlin-psp.yaml
    kubectl create -f gremlin-psp/
Using a Custom Seccomp Policy
All Gremlin behavior works under Docker's default seccomp policy. However, some environments default to more restrictive seccomp profiles, which can prevent Gremlin from working.
Gremlin has a custom seccomp profile which can be automatically installed when you install with Helm and pass --set gremlin.podSecurity.seccomp.enabled=true. Check out Gremlin's Helm Chart Repository for full documentation and usage.
You can also download this seccomp policy in order to install it manually.
    mkdir gremlin-psp
    wget -P gremlin-psp/ https://k8s.gremlin.com/resources/psp/v1/chao-psp.yaml
    wget -P gremlin-psp/ https://k8s.gremlin.com/resources/psp/v1/gremlin-psp.yaml
    kubectl create -f gremlin-psp/
Gremlin Container Drivers
Gremlin currently has 4 different drivers for integrating with the underlying container runtime powering Kubernetes:
Driver | Requirements and file access | More info |
---|---|---|
docker | | No support for systemd cgroup driver |
docker-runc | | Recommended for the Docker runtime |
crio-runc | | Used with the Cri-O container runtime |
containerd-runc | | Used with the Containerd container runtime |
Gremlin automatically chooses one of the above container drivers when the associated requirements are met. Users installing with Helm can automatically provide all requirements by declaring the intended container driver with:
    --set gremlin.container.driver=$driver
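For an existing installation, the driver can be switched with a Helm upgrade. The sketch below is an illustration under assumptions: the release name `gremlin`, chart `gremlin/gremlin`, and namespace `gremlin` are taken from the install command earlier on this page, the function name is hypothetical, and `--reuse-values` is used so the other settings from the original install are kept.

```shell
#!/bin/sh
# Hypothetical helper: accept only the four driver names listed in the table
# above, then upgrade the existing release with the chosen driver.
set_gremlin_driver() {
  driver="$1"
  case "$driver" in
    docker|docker-runc|crio-runc|containerd-runc) ;;
    *) echo "unknown container driver: $driver" >&2; return 1 ;;
  esac
  helm upgrade gremlin gremlin/gremlin \
    --namespace gremlin \
    --reuse-values \
    --set gremlin.container.driver="$driver"
}
```

Rejecting unknown names up front keeps a typo from reaching the cluster.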
Verify your Installation
Finally, check that Gremlin is installed properly:
    kubectl get pods -n gremlin
This should list one Gremlin agent per node (a physical or virtual machine in your cluster), plus one pod for Chao.
Example
    kubectl get pods -n gremlin

    NAME                    READY   STATUS    RESTARTS   AGE
    chao-78bbc7cbf6-9hn7q   1/1     Running   0          5d20h
    gremlin-9r4t7           1/1     Running   0          5d20h
    gremlin-bwmtz           1/1     Running   1          126d
    gremlin-bx6dn           1/1     Running   0          5d20h
Pending Pods
If any pods are pending, your installation is incomplete. Contact your cluster administrator to debug why Gremlin cannot run on those nodes.
    kubectl get pods -n gremlin

    NAME                    READY   STATUS    RESTARTS   AGE
    chao-78bbc7cbf6-9hn7q   1/1     Running   0          5d20h
    gremlin-c25ld           0/1     Pending   0          112d
    gremlin-n5gt7           0/1     Pending   0          112d
    gremlin-zn4kq           1/1     Running   0          126d
Attacks cannot run against applications on nodes where Gremlin is not running; those attacks will error out. If Chao is not running, you will be unable to target the cluster primitives at all.
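To spot problem pods without scanning the listing by eye, the output can be filtered on its STATUS column. This is a minimal sketch that assumes the default five-column layout shown above; the function name `not_running_pods` is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read `kubectl get pods` output on stdin and print the
# name of every pod whose STATUS column is anything other than Running.
not_running_pods() {
  # NR > 1 skips the header; NF >= 3 skips blank or truncated lines.
  awk 'NR > 1 && NF >= 3 && $3 != "Running" { print $1 }'
}
```

Used as `kubectl get pods -n gremlin | not_running_pods`, it prints nothing when every pod is running.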
Selecting Containers
For state and resource attack types, you can choose to target one, all, or specific containers within a selected pod. Once targets have been selected, all state and resource attack types will present this configuration. Selecting 'any' will target a single container within each pod at runtime. If you've selected more than one target (e.g. a Deployment), you can select from a list of common containers across all of these targets.
Running an attack
Once you select the Kubernetes objects to be targeted, select and configure your desired Gremlin attack. When the attack is run, the underlying containers within the objects selected will be impacted.
Containers share resources with their hosts. Running resource attacks on Kubernetes objects will impact the hosts where the targeted containers are running, including the host's full set of containers.
Namespace Access Control
With the Kubernetes client installed on your cluster, you can share individual namespaces with other Gremlin teams. Once it is installed, head to the Clients section to view all of the clusters installed across your company.
By sharing individual namespaces to teams across your company, you can provide access for users to run attacks only on relevant services while also limiting access to the hosts or nodes themselves.
Managing access control on your cluster requires an updated Kubernetes Client, released no earlier than July 6th, 2020.
Managing Cluster Access
As the Team Manager on a team where a Kubernetes cluster is installed or as a Company Manager, you can click the gear icon to manage access. On the cluster view, to share a namespace with a team, use the search box to filter down the list of available teams. Then use the search box on the team row and click on the namespace you'd like to share. Use the options menu to share all of the namespaces.
To remove a team's access to a namespace, click on the x on the blue namespace bubbles. Using the options menu you can also remove all namespaces at once.
Requesting Namespace Access
As a member of a different team of your company, you can view the list of clusters installed across your company. To request access to a namespace on one of these clusters not installed on your team, click the Request Access button. You can then check off the namespaces you'd like access to, or you can use the select all switch.
You can also request access to a namespace within a cluster when creating an attack. Once you've selected a cluster, the drop down list of namespaces will have an option to request access.
Approving Access Requests
As a Team Manager on a team where a cluster is installed, you'll receive an email when a user in your company has requested access to a namespace. Open the view of the cluster where you can approve or deny the request.
Troubleshooting
Run Chao in Debug Mode
Chao supports the GODEBUG environment variable, which can be used to enable debug features such as verbose logging of HTTP activity. You can enable verbose HTTP logs by adding the following variable to the environment section of the Chao deployment.
NOTE: Verbose logging prints sensitive information like HTTP request and response bodies. This configuration is intended to be a troubleshooting measure only, and should be removed when unused.
    - name: GODEBUG
      value: http2debug=2
Chao's logs will now contain verbose logs for http requests.
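Instead of editing the manifest by hand, the variable can be toggled with `kubectl set env`. The sketch below assumes the Chao Deployment is named `chao` and lives in the `gremlin` namespace, which is inferred from the pod names shown earlier and may differ in your installation; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for toggling verbose HTTP logging on Chao.
# Assumption: the Deployment is called "chao" in the "gremlin" namespace.
chao_debug_on() {
  kubectl set env deployment/chao --namespace gremlin GODEBUG=http2debug=2
}

chao_debug_off() {
  # A trailing dash removes the variable from the Deployment.
  kubectl set env deployment/chao --namespace gremlin GODEBUG-
}
```

Pairing the two makes it easier to remember to remove the variable once troubleshooting is finished, as the note above advises.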
Run Gremlin Checks
Gremlin's check subcommand can be run on Kubernetes clusters in order to troubleshoot common issues with configuration or compatibility with the environment. The following is an example Job that can be run to get gremlin check output:
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gremlin-check
      namespace: gremlin
      labels:
        k8s-app: gremlin
        version: v1
    spec:
      template:
        metadata:
          labels:
            app.kubernetes.io/name: gremlin-check
        spec:
          restartPolicy: Never
          containers:
          - name: gremlin
            image: gremlin/gremlin
            # You can also pass subcommands (like `proxy` to check only proxy information)
            args: [ "check" ]
            env:
              # # Pass the same environment you would pass to the Gremlin DaemonSet, including secrets, and proxy information
              - name: GREMLIN_TEAM_PRIVATE_KEY_OR_FILE
                value: file:///var/lib/gremlin/cert/gremlin.key
              - name: GREMLIN_TEAM_CERTIFICATE_OR_FILE
                value: file:///var/lib/gremlin/cert/gremlin.cert
              - name: GREMLIN_IDENTIFIER
                valueFrom:
                  fieldRef:
                    fieldPath: spec.nodeName
              # # Example proxy configuration
              # - name: https_proxy
              #   value: http://my-proxy:3128
              # - name: SSL_CERT_FILE
              #   value: /etc/gremlin/ssl/proxy-ca.pem
              # - name: GREMLIN_TEAM_ID
              #   value: my-team-id
            volumeMounts:
              - name: docker-sock
                mountPath: /var/run/docker.sock
              - name: gremlin-state
                mountPath: /var/lib/gremlin
              - name: gremlin-logs
                mountPath: /var/log/gremlin
              - name: gremlin-cert
                mountPath: /var/lib/gremlin/cert
                readOnly: true
              # # Example proxy configuration
              # - name: proxy-ca
              #   mountPath: /etc/gremlin/ssl
          volumes:
            - name: docker-sock
              hostPath:
                path: /var/run/docker.sock
            - name: gremlin-state
              hostPath:
                path: /var/lib/gremlin
            - name: gremlin-logs
              hostPath:
                path: /var/log/gremlin
            - name: gremlin-cert
              secret:
                secretName: gremlin-secret
            # # Example proxy configuration
            # - name: proxy-ca
            #   configMap:
            #     name: proxy-ca
      backoffLimit: 4
Once deployed, you can get the output of gremlin check by pulling the logs of the Pod associated with the Job:
    kubectl logs --follow \
      --namespace gremlin \
      $(kubectl get pods --namespace gremlin --selector=job-name=gremlin-check --output=jsonpath='{.items[*].metadata.name}')

    proxy
    ====================================================
    https_proxy : http://proxy.local:3128
    http_proxy : (unset)
    SSL_CERT_FILE : /etc/gremlin/ssl/proxy-ca.pem
    Service Ping : OK
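When the check runs as part of a script, the log can be tested for the `Service Ping` line rather than read manually. This sketch assumes the line format shown in the sample output above and has not been verified against other Gremlin versions; the function name `service_ping_ok` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read `gremlin check` output on stdin and succeed only
# if it contains a "Service Ping : OK" line (spacing around the colon may vary).
service_ping_ok() {
  grep -Eq '^Service Ping[[:space:]]*:[[:space:]]*OK[[:space:]]*$'
}
```

Piping the `kubectl logs` command above into `service_ping_ok` gives an exit status of 0 when the ping succeeded and non-zero otherwise.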