When I first encountered Kubernetes, its networking model seemed sufficiently advanced to look like magic. The problem is: I hate magic, and I hope you do too.
In this blog post, I hope to ruin your esoteric expectations about Kubernetes Networking.
This is the brief guide to Kubernetes networking I wish I had when I started writing Ergomake.
We'll start this post by explaining what an Ergomake environment is and what happens within our Kubernetes cluster when you run ergomake up.
Then, I'll go through each of those steps manually and explain everything that Kubernetes itself does when you create pods and services.
If you want to follow along, I'd recommend creating your own two-node Minikube cluster and giving it quite a bit of memory, just in case.
minikube start --nodes=2 --memory=max
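If you want to confirm the cluster came up as expected, listing the nodes is a quick sanity check; you should see both minikube and minikube-m02 reported as Ready after a minute or so.
# Quick sanity check: both nodes should be Ready
$ kubectl get nodes
$ minikube status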
What is an Ergomake environment?
At Ergomake, we take your docker-compose.yml files, run them on "the cloud", and make everything look like localhost.
Imagine you have the following docker-compose.yml file, for example.
version: '3.7'
services:
elasticsearch:
image: elasticsearch:7.17.0
ports:
- '9200'
environment:
- discovery.type=single-node
- 'ES_JAVA_OPTS=-Xms512m -Xmx512m'
kibana:
image: kibana:7.17.0
environment:
# Kibana talks to Elasticsearch using its service name
- 'ELASTICSEARCH_URL=http://elasticsearch:9200'
ports:
- '5601:5601'
When you run ergomake up within that file's folder, we'll run Kibana and Elasticsearch on our infrastructure and bind all "exposed" ports to localhost.
After that, you'll be able to access Kibana at localhost:5601, and you'll see that it talks to Elasticsearch using its service name, not an IP, as shown by the ELASTICSEARCH_URL environment variable above.
$ ergomake up
✓ Found no existing remote environments.
✓ Environment ready. Remote services are now bound to localhost.
[ Press CTRL+C to terminate the environment ]
# On another shell
$ curl http://localhost:5601 -u elastic:changeme
{
"name": "0460be043a49",
"uuid": "c63a6b0b-928d-44e4-97e7-234d59fc72cc",
"version": {
"number": "7.17.0",
"build_hash": "60a9838d21b6420bbdb5a4d07099111b74c68ceb",
"build_number": 46534,
"build_snapshot": false
},
[ ... ]
}
What happens in Kubernetes when you run ergomake up?
When you run ergomake up, our CLI reads your compose file and sends it to our back-end, which we call kitchen. Once the file's contents reach the kitchen, they're parsed and transformed into a custom resource of a kind we define through a Custom Resource Definition: an ErgomakeEnv.
That ErgomakeEnv represents all services within your compose file, which images they use, and what ports they expose, among other things.
After generating an ErgomakeEnv, our back-end, kitchen, "applies" that ErgomakeEnv to the cluster.
Once an ErgomakeEnv is "applied," it triggers a Kubernetes Operator of our own, which we call dishwasher.
The dishwasher is a piece of software that transforms an ErgomakeEnv into Kubernetes resources like pods and services and ensures that environments are always running smoothly.
Now you know how Ergomake turns your docker-compose.yml file into Kubernetes resources.
Replaying steps manually
In this section, we'll manually apply each of those Kubernetes resources, essentially replaying Ergomake's steps. That way, you'll learn how these Kubernetes resources talk to each other at a high level.
Whenever a pod gets created, Kubernetes assigns an IP to that pod. Once your pod is up, it can talk to any other pods in your cluster unless you've explicitly configured your cluster for that not to be possible.
If you create a pod for Kibana and another for Elasticsearch, for example, Kibana will be able to talk to Elasticsearch using the IP assigned to the elasticsearch pod.
Let's go ahead and try that ourselves. First, we'll create a pod for Kibana and another for Elasticsearch.
apiVersion: v1
kind: Pod
metadata:
name: elasticsearch
spec:
containers:
- image: elasticsearch:7.17.0
name: elasticsearch
env:
- name: 'discovery.type'
value: 'single-node'
---
apiVersion: v1
kind: Pod
metadata:
name: kibana
spec:
containers:
- image: kibana:7.17.0
name: kibana
After deploying those with kubectl apply -f ./example.yml, get the pods' IPs with kubectl get pods -o wide.
$ kubectl get pods -o wide
NAMESPACE NAME READY STATUS IP NODE
default elasticsearch 1/1 Running 10.244.1.22 minikube-m02
default kibana 1/1 Running 10.244.1.23 minikube-m02
With Elasticsearch's IP, get a shell within Kibana's container and try to curl Elasticsearch using that IP and the default port, 9200.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ curl 10.244.1.22:9200
{
"name" : "elasticsearch",
[ ... ]
"tagline" : "You Know, for Search"
}
Although pods can talk to each other using IPs, there are two main problems with that approach.
The first problem is that you don't know which IP a pod will receive until you actually deploy it.
Imagine you wanted to configure that Kibana instance to connect to Elasticsearch. In that case, you'd have to create the Elasticsearch pod first, get its IP, and only then deploy Kibana with ELASTICSEARCH_URL set to that IP. In other words, you wouldn't be able to deploy both pods simultaneously because there'd be no way to tell Kibana Elasticsearch's IP in advance.
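To make that concrete, here's a sketch of the manual dance you'd be forced into (the elasticsearch-pod.yml filename is hypothetical, and the IP printed would be whatever your cluster happened to assign):
# Deploy Elasticsearch alone first and wait for it to come up...
$ kubectl apply -f ./elasticsearch-pod.yml
$ kubectl wait --for=condition=Ready pod/elasticsearch
# ...then read the IP Kubernetes picked for it...
$ kubectl get pod elasticsearch -o jsonpath='{.status.podIP}'
10.244.1.22
# ...and only now could you write a Kibana manifest with that IP baked
# into ELASTICSEARCH_URL and deploy it.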
The second problem with using IPs is that they may change.
For example, if you have to change environment variables for your elasticsearch pod, you'll have to deploy a new pod. When you do that, Kubernetes will assign an IP to that pod again, and the new IP is not guaranteed to be the same as before (spoiler: it won't be). The same thing happens when a deployment recreates replicas as it scales up or down and pods get rescheduled onto different nodes.
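You can see that for yourself with a quick experiment: delete the elasticsearch pod and recreate it from the same manifest, and it will almost certainly come back with a different IP.
$ kubectl delete pod elasticsearch
$ kubectl apply -f ./example.yml
# Compare the IP column with the one you got before the deletion
$ kubectl get pod elasticsearch -o wide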
To solve these problems, we can use a service. With a service, you can reference a particular set of pods using a stable name instead of an IP that may change.
In our example, we will create a service called elasticsearch, just like Ergomake would. Then, we will use that service name in Kibana's ELASTICSEARCH_URL.
For that service to work, it must include a selector indicating which pods it routes traffic to, and a definition of which service ports map to which ports within those pods.
apiVersion: v1
kind: Service
metadata:
name: elasticsearch
spec:
selector:
app.kubernetes.io/name: elasticsearch
ports:
- protocol: TCP
port: 9200
targetPort: 9200
---
apiVersion: v1
kind: Pod
metadata:
name: elasticsearch
# Don't forget to add matching labels to the pod too!
labels:
app.kubernetes.io/name: elasticsearch
spec:
containers:
- image: elasticsearch:7.17.0
name: elasticsearch
env:
- name: 'discovery.type'
value: 'single-node'
After applying that file, try getting a shell within Kibana again. From there, curl the elasticsearch pod's port 9200 using the service's name instead of the pod's IP.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ curl elasticsearch:9200
{
"name" : "elasticsearch",
[ ... ]
"tagline" : "You Know, for Search"
}
Now, you can also set ELASTICSEARCH_URL for the kibana pod to connect to Elasticsearch through the elasticsearch service.
apiVersion: v1
kind: Pod
metadata:
name: kibana
spec:
containers:
- image: kibana:7.17.0
name: kibana
env:
- name: ELASTICSEARCH_URL
value: elasticsearch:9200
Once you apply all these changes to your cluster, you'll see that Kibana successfully connects to Elasticsearch using its hostname.
$ kubectl logs kibana -f
[ ... ]
{"type":"log","@timestamp":"2023-03-13T18:31:09+00:00","tags":["info","status"],"pid":7,"message":"Kibana is now available (was degraded)"}
Now that you know how these resources talk to each other at a high level, we'll dig deeper into Kubernetes to make it less magical.
What happens when Kibana sends requests to elasticsearch?
In this section, you'll learn how a pod can talk to another using a service name.
Regardless of where your application runs, it must resolve hostnames into IPs before sending requests. For example, when you send a request to google.com, you must resolve google.com into an IP and then send the request there.
In our previous example, the same thing happened when we sent a request to elasticsearch from within Kibana.
Before it could send the request, the sender had to "translate" elasticsearch into an IP. Only then was it able to send the request.
You can see that DNS lookup by sending a verbose request (-vvvv) with curl from the Kibana pod.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ curl elasticsearch:9200 -vvvv
* Trying 10.100.34.205:9200...
* TCP_NODELAY set
* Connected to elasticsearch (10.100.34.205) port 9200 (#0)
> GET / HTTP/1.1
> Host: elasticsearch:9200
> User-Agent: curl/7.68.0
> Accept: */*
[ ... ]
As shown above, the request to elasticsearch was sent to the IP 10.100.34.205, which is the elasticsearch service's IP.
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP 10.100.34.205 <none> 9200/TCP 3h35m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3h49m
As you would expect, sending a request to the service's IP yields the same result as sending a request to elasticsearch.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ curl 10.100.34.205:9200
{
"name" : "elasticsearch",
[ ... ]
"tagline" : "You Know, for Search"
}
That's the IP to which Kibana sends requests whenever it needs to reach Elasticsearch.
Now, two questions remain: who turns elasticsearch into the service's IP, and how do they know which IP it should be?
Question 1: Who turns elasticsearch into an IP?
On Linux systems, the /etc/resolv.conf file determines which server DNS lookup requests are sent to.
If you look at the contents of that file within our Kibana container, you'll see that it's sending DNS requests to 10.96.0.10.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
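You can trigger that lookup directly from within the pod, without curl, assuming the image ships getent (most glibc-based images do; nslookup or dig would also work if installed).
$ kubectl exec --stdin --tty kibana -- /bin/sh
# Resolves the name through the nameserver from /etc/resolv.conf; it should
# print the elasticsearch service's cluster IP (10.100.34.205 in this walkthrough)
$ getent hosts elasticsearch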
That 10.96.0.10 address belongs to a service called kube-dns, which lives in the kube-system namespace, so you usually don't see it.
$ kubectl get services -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 4h20m
That service, in turn, points to the coredns pod, which runs the actual DNS server: CoreDNS. That pod is also in the kube-system namespace, so you don't usually see it either.
$ kubectl get pods -n kube-system
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-565d847f94-xkmcx 1/1 Running 0 122m
[ ... ]
It's that coredns pod that resolves the elasticsearch name into the service's IP.
Question 2: How does CoreDNS know the IP for elasticsearch?
Among the pods in kube-system you don't usually see, there's kube-controller-manager.
$ kubectl get pods -n kube-system
NAME READY STATUS IP NODE
kube-controller-manager-minikube 1/1 Running 192.168.49.2 minikube
[ ... ]
The kube-controller-manager pod watches the cluster's desired state and takes action so that the desired state becomes the cluster's actual state.
When you create a service, for example, the kube-controller-manager will break it down into further resources called Endpoints and EndpointSlices.
$ kubectl get endpoints
NAME ENDPOINTS AGE
elasticsearch 10.244.1.4:9200 4h56m
kubernetes 192.168.49.2:8443 5h10m
$ kubectl get endpointslices
NAME ADDRESSTYPE PORTS ENDPOINTS AGE
elasticsearch-zdjcx IPv4 9200 10.244.1.4 4h56m
kubernetes IPv4 8443 192.168.49.2 5h10m
CoreDNS watches these resources, together with the services themselves, and uses them to resolve DNS queries. Whenever it gets a query for a service name, it responds with that service's IP.
If you look at its configuration, which is just a ConfigMap within kube-system, you'll see a reference to the CoreDNS Kubernetes plugin.
$ kubectl get configmap coredns -n kube-system -o yaml
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
# Here's the important part
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
# ...
}
That plugin turns CoreDNS into a "cluster-aware" DNS server. Otherwise, it'd be a DNS server like any other.
How do services "forward" requests to pods?
In this section, you'll learn how requests to a service get redirected to a particular pod.
By now, perspicacious readers may have noticed that the IP for the elasticsearch service does not match the IP of the Elasticsearch pod. That service IP is also not bound to any pod or virtual machine.
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP 10.100.34.205 <none> 9200/TCP 26h
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 26h
$ kubectl get pods -o wide
NAME READY STATUS IP NODE
elasticsearch 1/1 Running 10.244.1.4 minikube-m02
kibana 1/1 Running 10.244.1.5 minikube-m02
In that case, how can requests to the service's IP reach that service's pods?
Requests reach a service's pods because their packets are never actually delivered to the service's IP. Instead, the node rewrites packets addressed to the service's IP and readdresses them to a pod's IP.
Nodes rewrite packet addresses using a program called iptables. That program allows administrators to configure rules that determine how network packets are treated.
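To give you a feel for what such a rule looks like, here's a hand-written DNAT rule in the same spirit as the ones kube-proxy generates, using the service and pod IPs from this post. It's illustrative only; don't add it to a real node, since kube-proxy already maintains the equivalent rules.
# Rewrite the destination of TCP packets addressed to the service's IP
# (10.100.34.205:9200) so they're delivered to the pod's IP instead
$ sudo iptables -t nat -A OUTPUT -p tcp -d 10.100.34.205 --dport 9200 \
    -j DNAT --to-destination 10.244.1.4:9200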
If you want to see some of the real rules on your node, you can SSH into the Minikube node running these pods with minikube ssh -n minikube-m02 and then list all the iptables rules with iptables-save. Alternatively, you can filter only the rules containing "elasticsearch" with iptables-save | grep elasticsearch.
You don't have to worry about understanding all the rules below. All you need to understand is that they (and a couple of others) get the packets addressed to the correct place: the pod.
docker@minikube-m02:~$ sudo iptables-save | grep elasticsearch
-A KUBE-SEP-EGMOGNCDKABRF3JZ -s 10.244.1.4/32 -m comment --comment "default/elasticsearch" -j KUBE-MARK-MASQ
-A KUBE-SEP-EGMOGNCDKABRF3JZ -p tcp -m comment --comment "default/elasticsearch" -m tcp -j DNAT --to-destination 10.244.1.4:9200
-A KUBE-SERVICES -d 10.100.34.205/32 -p tcp -m comment --comment "default/elasticsearch cluster IP" -m tcp --dport 9200 -j KUBE-SVC-JKSFFZ7OSH2DB73R
-A KUBE-SVC-JKSFFZ7OSH2DB73R ! -s 10.244.0.0/16 -d 10.100.34.205/32 -p tcp -m comment --comment "default/elasticsearch cluster IP" -m tcp --dport 9200 -j KUBE-MARK-MASQ
-A KUBE-SVC-JKSFFZ7OSH2DB73R -m comment --comment "default/elasticsearch -> 10.244.1.4:9200" -j KUBE-SEP-EGMOGNCDKABRF3JZ
Who creates these iptables rules?
Remember our friend kube-controller-manager? When that fellow creates Endpoints and EndpointSlices, a pod called kube-proxy reads those resources and creates the iptables rules that redirect packets from a service's IP to that service's pods.
The kube-proxy pod runs on every node because it's spawned through a DaemonSet. That way, Kubernetes ensures each node has a kube-proxy to update that node's iptables rules.
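You can see that DaemonSet and its pods yourself. They live in the kube-system namespace, and on most clusters the pods carry the conventional k8s-app=kube-proxy label.
# The DaemonSet guarantees one kube-proxy pod per node
$ kubectl get daemonset kube-proxy -n kube-system
# One pod per node, each keeping that node's iptables rules up to date
$ kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide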
As a note, when a service targets multiple pods, such as the pods of a deployment with numerous replicas, kube-proxy will create iptables rules that load balance the traffic between them. Those iptables rules take care of randomly assigning traffic to pods.
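If you're curious, that's verifiable too: each backend pod gets its own KUBE-SEP-* chain, and the service's KUBE-SVC-* chain jumps to them through iptables' statistic module, which picks a backend at random. A rough sketch of what you'd see with several replicas behind the service (chain names are hashes and will differ on your node):
docker@minikube-m02:~$ sudo iptables-save | grep KUBE-SVC-JKSFFZ7OSH2DB73R
# With, say, two backends you'd see jumps along these lines:
#   -A KUBE-SVC-... -m statistic --mode random --probability 0.5 -j KUBE-SEP-aaa
#   -A KUBE-SVC-... -j KUBE-SEP-bbb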
Putting it all together
Whenever you create a pod, it gets assigned an IP.
Any two pods in your cluster can talk to each other using their IP addresses.
The problem with using IP addresses for pods to talk to each other is that these IPs may change as pods get deleted and recreated.
For pods to consistently address each other correctly, you can use a Service.
When you create a service using kubectl, the Kubernetes apiserver will save its data, and another pod called kube-controller-manager will wake up and break that service down into two further resources: Endpoints and EndpointSlices.
CoreDNS will use those resources to know how to turn a service name into a service IP. Additionally, each node's kube-proxy pod will update that node's iptables rules. Those iptables rules cause requests to the service's IP to get addressed to the service's pods.
Finally, when a pod makes a request, it will do a DNS query to CoreDNS to get the service's IP. Then, when it sends packets to that IP, the iptables rules created by kube-proxy will cause the packets to get addressed to an actual pod's IP.
A few more notes
I've intentionally skipped a few details to avoid confusing the reader.
Among those details is how a pod gets assigned an IP and how iptables rules work.
I also haven't touched on CNI plugin implementations, like Kindnet.
A tour through container networking itself would also be helpful for most readers.
Finally, if you want to learn more about CoreDNS itself, this talk is a great start.
Wanna chat?
We're a two-people startup, and we love talking to interesting people.
If you'd like to chat, you can book a slot with me here.
I'd love to discuss Kubernetes, command-line interfaces, ephemeral environments, or what we're building at Ergomake.
Alternatively, you can send me a tweet or DM @thewizardlucas or an email at lucas.costa@getergomake.com.