Accessing Local Data from inside Kind!
Following on from the recent kind pvc post, in this post we will explore how to bring up a kind cluster and use it to access data that you have locally on your machine via Persistent Volume Claims.
This gives us the ability to model pretty interesting deployments of applications that require access to a data pool!
Let’s get to it!
Summary
For this article I am going to use a txt file of a book so that we can do some simple word counting.
For our book we are going to use The Project Gutenberg EBook of Pride and Prejudice, by Jane Austen.
We are going to create a multi node kind cluster and access that txt file from pods running in our cluster!
Let’s make a directory locally that we will use to store our data
$ mkdir -p data/pride-and-prejudice
$ cd data/pride-and-prejudice/
$ curl -LO https://www.gutenberg.org/files/1342/1342-0.txt
$ wc -w 1342-0.txt
124707 1342-0.txt
Now for a kind config that mounts our data into our worker nodes!
kind-data.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraMounts:
  - hostPath: ./data
    containerPath: /tmp/data
- role: worker
  extraMounts:
  - hostPath: ./data
    containerPath: /tmp/data
Let’s bring up the cluster!
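With the config above saved as kind-data.yaml, bringing it up is one command (this assumes you already have kind installed; see The Setup below):

$ kind create cluster --config kind-data.yaml
$ kubectl get nodes

You should see one control-plane node and two workers, and each worker has our ./data directory mounted at /tmp/data.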
Access Models
There are a couple of different ways we can provide access to this data! In Kubernetes we have the ability to configure the pod with access to hostPath:
$ kubectl explain pod.spec.volumes.hostpath
KIND:     Pod
VERSION:  v1

RESOURCE: hostPath <Object>

DESCRIPTION:
     HostPath represents a pre-existing file or directory on the host machine
     that is directly exposed to the container. This is generally used for
     system agents or other privileged things that are allowed to see the host
     machine. Most containers will NOT need this. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#hostpath

     Represents a host path mapped into a pod. Host path volumes do not support
     ownership management or SELinux relabeling.

FIELDS:
   path <string> -required-
     Path of the directory on the host. If the path is a symlink, it will
     follow the link to the real path. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#hostpath

   type <string>
     Type for HostPath Volume Defaults to "" More info:
     https://kubernetes.io/docs/concepts/storage/volumes#hostpath
For LOTS of good reasons this pattern is not a good one. Allowing hostPath as a volume for pods amounts to giving complete access to the underlying node. A malicious or curious user of the cluster could mount /var/run/docker.sock into their pod and gain the ability to completely take over the underlying node. Since most nodes host workloads from many different applications, this can compromise the security of your cluster pretty significantly!
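To make that concrete, here is a sketch of the volume stanza a pod could carry to do exactly that (illustrative only; please don't deploy this anywhere you care about):

volumes:
- name: dockersock
  hostPath:
    path: /var/run/docker.sock
    type: Socket

Anything in that pod that can speak the Docker API now effectively owns the node.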
All that said, we will demonstrate how this works.
The other model is to provide access to the underlying hostPath as a defined Persistent Volume. This is a better move because the person defining the pv has to do so at the cluster level, which requires elevated permissions.
Quick reminder here that persistent volumes are defined at cluster scope but persistent volume claims are namespaced!
If you are ever wondering what resources are namespaced and what aren’t, check this out!
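You can also ask the API server directly:

$ kubectl api-resources --namespaced=false   # cluster scoped, includes persistentvolumes
$ kubectl api-resources --namespaced=true    # namespaced, includes persistentvolumeclaims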
So TL;DR: do this with Persistent Volumes, not with hostPath!
The Setup!
I assume that you have already set up kind and all that comes with that.
I’ve made all the resources used in the following demonstrations available in a gist.
You can fetch them with
$ git clone https://gist.github.com/mauilion/c40b161822598e5b1720d3b34487fb82 pvc-books
And follow along!
hostPath
In this demo we will:
- configure a deployment to use hostPath (sketched below)
- bring up a pod and play with the data!
- show why hostpath is crazy town!
- cleanup
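As a taste of that first step, here is a minimal sketch of what such a deployment could look like. The real manifests live in the gist; the resource names, image, and file name here are my own illustrations.

deploy-hostpath.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: book-reader
spec:
  replicas: 1
  selector:
    matchLabels:
      app: book-reader
  template:
    metadata:
      labels:
        app: book-reader
    spec:
      containers:
      - name: reader
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: books
          mountPath: /books
      volumes:
      - name: books
        hostPath:
          path: /tmp/data/pride-and-prejudice
          type: Directory

Once it’s running you can exec in and count words straight off your laptop’s disk:

$ kubectl exec deploy/book-reader -- wc -w /books/1342-0.txt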
Persistent Volumes
In this demo we will:
- define a Persistent Volume (sketched below)
- configure a deployment and a persistent volume claim
- bring up the deployment and play with the data!
- cleanup
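To show the shape of these resources, here is a sketch of the pv and pvc pair. The gist’s pv.yaml and friends are the source of truth; the names and sizes here are illustrative.

pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: book-pv
spec:
  capacity:
    storage: 100Mi
  accessModes:
  - ReadOnlyMany
  storageClassName: manual
  hostPath:
    path: /tmp/data/pride-and-prejudice
    type: Directory

pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: book-pvc
spec:
  accessModes:
  - ReadOnlyMany
  storageClassName: manual
  resources:
    requests:
      storage: 100Mi

The deployment then mounts the volume via a persistentVolumeClaim stanza instead of a hostPath one, so the pod never chooses (or even sees) the node path.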
Persistent Volume Tricks!
Ever wondered how to ensure that a specific Persistent Volume will connect to a specific Persistent Volume Claim?
One of the most foolproof ways is to populate the claimRef with information that indicates where the pvc will be created.
We do this in our example pv.yaml.
This way, if you have multiple pvs that you are “restoring” or “loading into a cluster”, you can have some control over which pvc will attach to which pv.
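Building on the pv sketch above (names still illustrative), the stanza is just a few lines in the pv’s spec:

spec:
  claimRef:
    namespace: default
    name: book-pvc

A pv carrying that claimRef will wait for, and only bind to, a pvc named book-pvc in the default namespace; Kubernetes fills in the rest of the reference at bind time.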
In Closing
Giving a consumer hostPath access via a Persistent Volume is a much saner way to provide that access!
- They can’t arbitrarily change the path to something else.
- Only someone with cluster-level permissions can define a Persistent Volume.
Thanks for checking this out! I hope that it was helpful. If you have questions or ideas about something you’d like to see a post on hit me up on twitter!