Following on from PodSecurityPolicy is Dead, Long Live...?, this tutorial covers the practical use of a new tool from the Kubernetes Node Special Interest Group (SIG Node).
The Linux kernel (the same marvel that brings us containers) provides a few facilities for bridging the last mile in security management: limiting what running processes are actually able to do. Adopting them should be seen as one of the most impactful changes you can make to disrupt the Cyber Kill Chain in your organisation.
The three technologies (seccomp, AppArmor and SELinux) are best used in a microservice architecture, where each service handles only a small, discrete task and can be effectively limited to doing only that.
They would likely be largely ineffective applied to a monolithic application with broad capabilities, where limiting it to its ‘business-as-usual’ behaviour wouldn’t really rule much, if anything, out.
seccomp (short for secure computing mode) is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it can only make a limited set of system calls. Should it attempt any other system call, the kernel will either just log the attempt or terminate the process.
Source: https://en.wikipedia.org/wiki/Seccomp
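As a quick taste of how that surfaces in Kubernetes, a pod can opt into the container runtime's default seccomp profile straight from its spec; a minimal sketch (the pod name and image are just placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: seccomp-default-demo
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault # use the runtime's built-in default seccomp profile
  containers:
  - name: app
    image: nginx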
AppArmor ("Application Armor") is a Linux kernel security module that allows the system administrator to restrict programs' capabilities with per-program profiles. Profiles can allow capabilities like network access, raw socket access, and the permission to read, write, or execute files on matching paths. AppArmor supplements the traditional Unix discretionary access control (DAC) model by providing mandatory access control (MAC).
Source: https://en.wikipedia.org/wiki/AppArmor
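For comparison, at the time of writing Kubernetes applies AppArmor profiles per container via an annotation; a minimal sketch (the profile name is a placeholder and must already be loaded on the node):

apiVersion: v1
kind: Pod
metadata:
  name: apparmor-demo
  annotations:
    # format is container.apparmor.security.beta.kubernetes.io/<container-name>
    container.apparmor.security.beta.kubernetes.io/app: localhost/my-apparmor-profile
spec:
  containers:
  - name: app
    image: nginx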
SELinux is a set of kernel modifications and user-space tools that have been added to various Linux distributions. Its architecture strives to separate enforcement of security decisions from the security policy, and streamlines the amount of software involved with security policy enforcement.
Source: https://en.wikipedia.org/wiki/Security-Enhanced_Linux
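And SELinux labels can be assigned through the securityContext; a minimal sketch (the level value is purely illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: selinux-demo
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456" # MCS categories applied to the pod's processes
  containers:
  - name: app
    image: nginx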
However, managing them is not easy, so unsurprisingly lots of commercial products have entered the space with all sorts of buzzwords like ‘artificial intelligence’ and ‘machine learning’.
These commercial offerings are great and can simplify implementation, but it’s worth understanding how things work under the hood and deciding how much control you’re willing to relinquish to an algorithm.
Relatively recently, a Kubernetes special interest group has developed the Kubernetes Security Profiles Operator, which works to expose the power of seccomp, SELinux and AppArmor to end users.
The technologies are not mutually exclusive, and I would encourage combining them, but for the sake of this article I’ll be focusing on seccomp, since at the time of writing it is the best supported by the Security Profiles Operator and it has been cited as mitigating some recent high-profile vulnerabilities, e.g. the Polkit ‘PwnKit’ vulnerability (CVE-2021-4034).
In short, to inform your technology choices: seccomp restricts which system calls a process can make, AppArmor restricts a program's access to files, capabilities and the network via per-program profiles, and SELinux provides system-wide mandatory access control with policy separated from enforcement.
This is a tutorial on how to use the Security Profiles Operator to record seccomp profiles from running workloads and then enforce them across your cluster.
I'm going to demo this using Docker Desktop and also Podman machine on a Mac. You can follow the same steps if you are on a Linux or Windows machine.
Things might be a lot easier for you if you're using a Linux machine with auditd/syslog enabled, but since the VM used by Docker Desktop (LinuxKit) or podman-machine (Fedora CoreOS) doesn't ship with either of those running, we'll have to run our own.
This assumes you're using Docker (including Docker Desktop) or Podman; podman machine requires a few tweaks, which I've added as comments suffixed with PODMAN ONLY and PODMAN MACHINE ONLY where necessary; you'll just need to uncomment those lines.
We’re going to use KiND to run a local Kubernetes cluster.
You need to force /proc to be mounted through to the nodes; if you have multiple nodes you'll need to add the extraMounts section to each node (a multi-node variant is shown after the single-node config below).
# export KIND_EXPERIMENTAL_PROVIDER=podman # PODMAN ONLY
# podman machine init --cpus=4 --memory=8096 # PODMAN MACHINE ONLY
# podman machine start # PODMAN MACHINE ONLY
# podman system connection default podman-machine-default-root # PODMAN MACHINE ONLY
kind create cluster --config - << EOF
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
name: kind
networking:
  # apiServerAddress: "0.0.0.0" # PODMAN ONLY
nodes:
- role: control-plane
  image: kindest/node:v1.23.3
  extraMounts:
  - hostPath: /proc
    containerPath: /hostproc
EOF
# sed -i '' 's/https:\/\/:/https:\/\/localhost:/g' ~/.kube/config # PODMAN ONLY
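As mentioned above, if you run more than one node the extraMounts stanza has to be repeated for every node; a hedged sketch of a two-node variant:

kind create cluster --config - << EOF
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
name: kind
nodes:
- role: control-plane
  image: kindest/node:v1.23.3
  extraMounts:
  - hostPath: /proc
    containerPath: /hostproc
- role: worker
  image: kindest/node:v1.23.3
  extraMounts:
  - hostPath: /proc
    containerPath: /hostproc
EOF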
Mounting /proc through is important since it allows us to match the process IDs from the kernel-level audit logs through to the namespaced process IDs within the KiND namespaced cgroup.
Podman machine and Docker Desktop use a VM that doesn't ship with syslog or auditd, which you'll need in order to write the logs for the log enricher to collect, so we'll deploy our own as a DaemonSet across the cluster. You may be able to skip this step if you're using a Linux workstation or podman-machine, which can use the eBPF recorder instead of log enrichment.
kubectl apply -k github.com/chrisns/syslog-auditd
kubectl --namespace kube-system wait --for condition=ready pods -l name=syslog
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.yaml
kubectl --namespace cert-manager wait --for condition=ready pod -l app.kubernetes.io/instance=cert-manager
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/security-profiles-operator/main/deploy/operator.yaml
kubectl --namespace security-profiles-operator wait --for condition=ready pod -l name=spod
kubectl --namespace security-profiles-operator patch spod spod --type=merge -p '{"spec":{"hostProcVolumePath":"/hostproc"}}'
kubectl --namespace security-profiles-operator patch spod spod --type=merge -p '{"spec":{"enableLogEnricher":true}}' # DOCKER DESKTOP ONLY
# kubectl --namespace security-profiles-operator patch spod spod --type=merge -p '{"spec":{"enableBpfRecorder":true}}' # PODMAN / LINUX HOST ONLY
kubectl --namespace security-profiles-operator wait --for condition=ready pod -l name=spod
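If you want to sanity-check the installation before moving on, listing the operator's pods is enough; everything should be Running:

kubectl --namespace security-profiles-operator get pods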
$ kubectl apply -f https://raw.githubusercontent.com/appvia/security-profiles-operator-demo/main/demo-recorder.yaml
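I won't reproduce demo-recorder.yaml here, but the ProfileRecording it sets up looks roughly like this (a hedged sketch; the actual file may differ in detail):

apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ProfileRecording
metadata:
  name: demo-recorder
spec:
  kind: SeccompProfile
  recorder: logs # use "bpf" if you enabled the eBPF recorder instead
  podSelector:
    matchLabels:
      app: demo # matches the app=demo label we give the pods below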
$ kubectl run my-pod --image=nginx --labels app=demo && kubectl wait --for condition=ready --timeout=-1s pod my-pod && kubectl delete pod my-pod
pod/my-pod created
pod/my-pod condition met
pod "my-pod" deleted
$ kubectl run --rm -it my-pod --image=alpine --labels app=demo -- sh
If you don't see a command prompt, try pressing enter.
/ # ls
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
/ # exit
Session ended, resume using 'kubectl attach my-pod -c my-pod -i -t' command when the pod is running
pod "my-pod" delete
You'll now have a profile that's ready to use (note it is only aggregated and created when the pod exits).
We can check what that looks like and export it to keep it in our version control. Running kubectl neat get sp demo-recorder-my-pod -o yaml should give you a YAML that looks like the following:
I'm using kubectl-neat to make the output less verbose.
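If you don't already have it, kubectl-neat is available as a krew plugin (assuming you use krew; release binaries are also on its GitHub page):

kubectl krew install neat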
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  labels:
    spo.x-k8s.io/profile-id: SeccompProfile-demo-recorder-my-pod
  name: demo-recorder-my-pod
  namespace: default
spec:
  architectures:
  - SCMP_ARCH_AARCH64
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
  - action: SCMP_ACT_ALLOW
    names:
    - brk
    - capget
    - capset
    - chdir
    - clone
    - close
    - epoll_ctl
    - execve
    - exit_group
    - fchown
    - fcntl
    - fstat
    - fstatfs
    - futex
    - getcwd
    - getdents64
    - geteuid
    - getpgid
    - getpid
    - getppid
    - getuid
    - ioctl
    - lseek
    - madvise
    - mmap
    - mprotect
    - munmap
    - nanosleep
    - newfstatat
    - openat
    - ppoll
    - prctl
    - read
    - rt_sigaction
    - rt_sigprocmask
    - rt_sigreturn
    - set_tid_address
    - setgid
    - setgroups
    - setpgid
    - setuid
    - wait4
    - write
    - writev
As a shorthand, we're going to use --overrides to force some extra settings into the pod spec.
$ kubectl run --rm -ti my-pod --image=alpine --overrides='{ "spec": {"securityContext": {"seccompProfile": {"type": "Localhost", "localhostProfile": "operator/default/demo-recorder-my-pod.json"}}}}' -- sh
/ # ls
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
/ # exit
Session ended, resume using 'kubectl attach my-pod -c my-pod -i -t' command when the pod is running
pod "my-pod" deleted
Ok, so we've not broken anything.
$ kubectl run --rm -ti my-pod --image=alpine -- sh
If you don't see a command prompt, try pressing enter.
/ # mkdir foo
/ # touch bar
/ # rm /etc/alpine-release
/ # ping -c 1 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=0 ttl=37 time=20.657 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 20.657/20.657/20.657 ms
/ # nslookup google.com
Server: 10.96.0.10
Address: 10.96.0.10:53
Non-authoritative answer:
Name: google.com
Address: 142.250.187.238
Non-authoritative answer:
Name: google.com
Address: 2a00:1450:4009:81f::200e
/ # wget -q 1.1.1.1
/ # exit
Session ended, resume using 'kubectl attach my-pod -c my-pod -i -t' command when the pod is running
pod "my-pod" delete
$ kubectl run --rm -ti my-pod --image=alpine --overrides='{ "spec": {"securityContext": {"seccompProfile": {"type": "Localhost", "localhostProfile": "operator/default/demo-recorder-my-pod.json"}}}}' -- sh
/ # mkdir foo
mkdir: can't create directory 'foo': Operation not permitted
/ # touch bar
touch: bar: Operation not permitted
/ # rm /etc/alpine-release
rm: remove '/etc/alpine-release'? y
rm: can't remove '/etc/alpine-release': Operation not permitted
/ # ping -c 1 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
ping: permission denied (are you root?)
/ # nslookup google.com
nslookup: socket(AF_INET,2,0): Operation not permitted
/ # wget -q 1.1.1.1
wget: socket(AF_INET,1,0): Operation not permitted
Cool, so we're pretty trapped. But this is quite a contrived example, so let's try something a bit more real.
For this exercise we'll deploy WordPress, which needs MySQL/MariaDB, and we'll also throw in phpMyAdmin for 'fun'.
First let's deploy our recorder.
kubectl apply -f https://raw.githubusercontent.com/appvia/security-profiles-operator-demo/main/wordpress-recorder.yaml
Now let’s deploy our apps
kubectl apply -k github.com/appvia/wordpress-kustomization-demo
kubectl wait --for condition=ready pod mysql-0
Now let's go to the WordPress GUI and check it's working:
kubectl port-forward svc/wordpress 8080:http
Open a browser to http://localhost:8080.
It doesn't really matter what config you give it; you're not going to keep this installation. You can imagine that if this were your app, you might run your end-to-end tests right now.
Do some other things now, like create a blog post, upload images etc.
Now let's try phpMyAdmin.
kubectl port-forward svc/phpmyadmin 8081:http
And go to it in the browser at http://localhost:8081/?db=mydb&table=wp_posts, which proves it's all talking to one another. You can click around and do some other things like uploading/downloading a file etc. if you like.
Now let's delete our pods, collect our profiles and stop recording:
kubectl delete -k github.com/appvia/wordpress-kustomization-demo
kubectl delete -f https://raw.githubusercontent.com/appvia/security-profiles-operator-demo/main/wordpress-recorder.yaml
kubectl neat get sp wordpress-mysql -o yaml > mysql-seccomp-profile.yaml
kubectl neat get sp wordpress-phpmyadmin-0 -o yaml > phpmyadmin-seccomp-profile.yaml
kubectl neat get sp wordpress-wordpress-0 -o yaml > wordpress-seccomp-profile.yaml
Now we've got our profiles, we can either update our deployment code to include the seccomp profile alongside our infra code, or, if this isn't in your control (perhaps it's a public Helm chart you've got no influence over), you can use a binding provided by the Security Profiles Operator instead.
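If you do control the manifests, the first option is just a securityContext entry in the pod template pointing at the recorded profile; a minimal sketch for the MariaDB pod spec (assuming the profile was recorded into the default namespace as above):

# inside the pod template of your StatefulSet/Deployment
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: operator/default/wordpress-mysql.json

If instead it's not yours to change, bind the profiles to the images: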
kubectl apply -f - << EOF
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ProfileBinding
metadata:
  name: wordpress-wordpress
spec:
  profileRef:
    kind: SeccompProfile
    name: wordpress-wordpress-0
  image: wordpress:5.8.2-php7.4-apache
---
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ProfileBinding
metadata:
  name: wordpress-phpmyadmin
spec:
  profileRef:
    kind: SeccompProfile
    name: wordpress-phpmyadmin-0
  image: phpmyadmin:5.1.1-apache
---
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ProfileBinding
metadata:
  name: wordpress-mysql
spec:
  profileRef:
    kind: SeccompProfile
    name: wordpress-mysql
  image: mariadb:10.6.5-focal
EOF
If we look at the pods you'll find that the Security Profiles Operator has mutated the pod specs and injected something like:
securityContext:
  seccompProfile:
    localhostProfile: operator/default/wordpress-mysql.json
    type: Localhost
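One quick way to confirm the mutation for yourself:

# should show type: Localhost and an operator/default/... path for each pod
kubectl get pods -o yaml | grep -B 1 -A 2 seccompProfile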
Now let's check it's all enforcing as we expect:
$ kubectl exec -ti deploy/phpmyadmin -- sh
# touch f
touch: setting times of 'f': Operation not permitted
# su
su: write error: Operation not permitted
You may find, to your disappointment as I did, that many community (and commercial) products often dance like no one is watching and require quite liberal access to kernel syscalls, but we can at least now monitor what they can do and be aware when the required permissions change.
It's worth noting that some recent CVEs may have been mitigated by even the default seccomp profile.
If you've followed along with this tutorial you should now be all set to start capturing seccomp profiles for your workloads, and you have all the tools you need to work that into your continuous integration and deployment pipelines so you can enforce them in your clusters.
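As a minimal sketch of what that pipeline step might look like, assuming the profiles exported earlier are committed alongside the rest of your infrastructure code:

# apply the recorded profiles first so the bindings/localhost paths exist
kubectl apply -f mysql-seccomp-profile.yaml \
  -f phpmyadmin-seccomp-profile.yaml \
  -f wordpress-seccomp-profile.yaml
# then roll out the application, which now runs confined by them
kubectl apply -k github.com/appvia/wordpress-kustomization-demo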