When I first encountered Kubernetes, its networking model seemed sufficiently advanced to look like magic. The problem is: I hate magic, and I hope you do too.
In this blog post, I hope to ruin your esoteric expectations about Kubernetes Networking.
This is the brief guide to Kubernetes networking I wish I had when I started writing Ergomake.
We'll start this post by explaining what an Ergomake environment is and what happens within our Kubernetes cluster when you run ergomake up.
Then, I'll go through each of those steps manually and explain everything that Kubernetes itself does when you create pods and services.
If you want to follow along, I'd recommend creating your own two-node Minikube cluster and giving it quite a bit of memory, just in case.
minikube start --nodes=2 --memory=max
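If you want to confirm the cluster came up as expected, listing the nodes is a quick sanity check; you should see both minikube and minikube-m02 reported as Ready after a minute or so.
# Quick sanity check: both nodes should be Ready
$ kubectl get nodes
$ minikube status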
What is an Ergomake environment?
At Ergomake, we take your docker-compose.yml files, run them on "the cloud", and make everything look like localhost.
Imagine you have the following docker-compose.yml file, for example.
version: '3.7'
services:
elasticsearch:
image: elasticsearch:7.17.0
ports:
- '9200'
environment:
- discovery.type=single-node
- 'ES_JAVA_OPTS=-Xms512m -Xmx512m'
kibana:
image: kibana:7.17.0
environment:
# Kibana talks to Elasticsearch using its service name
- 'ELASTICSEARCH_URL=http://elasticsearch:9200'
ports:
- '5601:5601'
When you run ergomake up within that file's folder, we'll run Kibana and Elasticsearch on our infrastructure and bind all "exposed" ports to localhost.
After that, you'll be able to access Kibana at localhost:5601, and you'll see that it talks to Elasticsearch using its service name, not an IP, as shown by the ELASTICSEARCH_URL environment variable above.
$ ergomake up
✓ Found no existing remote environments.
✓ Environment ready. Remote services are now bound to localhost.
[ Press CTRL+C to terminate the environment ]
# On another shell
$ curl http://localhost:5601 -u elastic:changeme
{
"name": "0460be043a49",
"uuid": "c63a6b0b-928d-44e4-97e7-234d59fc72cc",
"version": {
"number": "7.17.0",
"build_hash": "60a9838d21b6420bbdb5a4d07099111b74c68ceb",
"build_number": 46534,
"build_snapshot": false
},
[ ... ]
}
What happens in Kubernetes when you run ergomake up?
When you run ergomake up, our CLI reads your compose file and sends it to our back-end, which we call kitchen. Once the file's contents reach the kitchen, they're parsed and transformed into a custom resource of a kind we define through a Custom Resource Definition: an ErgomakeEnv.
That ErgomakeEnv represents all services within your compose file, which images they use, and what ports they expose, among other things.
After generating an ErgomakeEnv, our back-end, kitchen, "applies" that ErgomakeEnv to the cluster.
Once an ErgomakeEnv is "applied," it triggers a Kubernetes Operator of our own, which we call dishwasher.
The dishwasher is a piece of software that transforms an ErgomakeEnv into Kubernetes resources like pods and services and ensures that environments are always running smoothly.
Now you know how Ergomake turns your docker-compose.yml file into Kubernetes resources.
Replaying steps manually
In this section, we'll manually apply each of those Kubernetes resources, essentially replaying Ergomake's steps. That way, you'll learn how these Kubernetes resources talk to each other at a high level.
Whenever a pod gets created, Kubernetes assigns an IP to that pod. Once your pod is up, it can talk to any other pods in your cluster unless you've explicitly configured your cluster for that not to be possible.
If you create a pod for Kibana and another for Elasticsearch, for example, Kibana will be able to talk to Elasticsearch using the IP assigned to the elasticsearch pod.
Let's go ahead and try that ourselves. First, we'll create a pod for Kibana and another for Elasticsearch.
apiVersion: v1
kind: Pod
metadata:
name: elasticsearch
spec:
containers:
- image: elasticsearch:7.17.0
name: elasticsearch
env:
- name: 'discovery.type'
value: 'single-node'
---
apiVersion: v1
kind: Pod
metadata:
name: kibana
spec:
containers:
- image: kibana:7.17.0
name: kibana
After deploying those with kubectl apply -f ./example.yml, get the pods' IPs with kubectl get pods -o wide.
$ kubectl get pods -o wide
NAMESPACE NAME READY STATUS IP NODE
default elasticsearch 1/1 Running 10.244.1.22 minikube-m02
default kibana 1/1 Running 10.244.1.23 minikube-m02
With Elasticsearch's IP, get a shell within Kibana's container and try to curl Elasticsearch using that IP and the default port, 9200.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ curl 10.244.1.22:9200
{
"name" : "elasticsearch",
[ ... ]
"tagline" : "You Know, for Search"
}
Although pods can talk to each other using IPs, there are two main problems with that approach.
The first problem is that you don't know which IP a pod will receive until you actually deploy it.
Imagine you wanted to configure that Kibana instance to connect to Elasticsearch. In that case, you'd have to create the Elasticsearch pod first, get its IP, and only then deploy Kibana with ELASTICSEARCH_URL set to that IP. In other words, you wouldn't be able to deploy both pods simultaneously because there'd be no way to tell Kibana Elasticsearch's IP in advance.
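To make that concrete, here's a sketch of the manual dance you'd be forced into (the elasticsearch-pod.yml filename is hypothetical, and the IP printed would be whatever your cluster happened to assign):
# Deploy Elasticsearch alone first and wait for it to come up...
$ kubectl apply -f ./elasticsearch-pod.yml
$ kubectl wait --for=condition=Ready pod/elasticsearch
# ...then read the IP Kubernetes picked for it...
$ kubectl get pod elasticsearch -o jsonpath='{.status.podIP}'
10.244.1.22
# ...and only now could you write a Kibana manifest with that IP baked
# into ELASTICSEARCH_URL and deploy it.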
The second problem with using IPs is that they may change.
For example, if you have to change environment variables for your elasticsearch pod, you'll have to deploy a new pod. When you do that, Kubernetes will assign an IP to that pod again, and the new IP is not guaranteed to be the same as before (spoiler: it won't be). The same thing happens when a deployment recreates replicas as it scales up or down and pods get rescheduled onto different nodes.
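You can see that for yourself with a quick experiment: delete the elasticsearch pod and recreate it from the same manifest, and it will almost certainly come back with a different IP.
$ kubectl delete pod elasticsearch
$ kubectl apply -f ./example.yml
# Compare the IP column with the one you got before the deletion
$ kubectl get pod elasticsearch -o wide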
To solve these problems, we can use a service. With a service, you can reference a particular set of pods using a stable name instead of an IP that may change.
In our example, we will create a service called elasticsearch, just like Ergomake would. Then, we will use that service name in Kibana's ELASTICSEARCH_URL.
For that service to work, it must include a selector indicating which pods it routes traffic to, and a definition of which service ports map to which ports within those pods.
apiVersion: v1
kind: Service
metadata:
name: elasticsearch
spec:
selector:
app.kubernetes.io/name: elasticsearch
ports:
- protocol: TCP
port: 9200
targetPort: 9200
---
apiVersion: v1
kind: Pod
metadata:
name: elasticsearch
# Don't forget to add matching labels to the pod too!
labels:
app.kubernetes.io/name: elasticsearch
spec:
containers:
- image: elasticsearch:7.17.0
name: elasticsearch
env:
- name: 'discovery.type'
value: 'single-node'
After applying that file, try getting a shell within Kibana again. From there, curl the elasticsearch pod's port 9200 using the service's name instead of the pod's IP.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ curl elasticsearch:9200
{
"name" : "elasticsearch",
[ ... ]
"tagline" : "You Know, for Search"
}
Now, you can also set ELASTICSEARCH_URL for the kibana pod to connect to Elasticsearch through the elasticsearch service.
apiVersion: v1
kind: Pod
metadata:
name: kibana
spec:
containers:
- image: kibana:7.17.0
name: kibana
env:
- name: ELASTICSEARCH_URL
value: elasticsearch:9200
Once you apply all these changes to your cluster, you'll see that Kibana successfully connects to Elasticsearch using its hostname.
$ kubectl logs kibana -f
[ ... ]
{"type":"log","@timestamp":"2023-03-13T18:31:09+00:00","tags":["info","status"],"pid":7,"message":"Kibana is now available (was degraded)"}
Now that you know how these resources talk to each other at a high level, we'll dig deeper into Kubernetes to make it less magical.
What happens when Kibana sends requests to elasticsearch?
In this section, you'll learn how a pod can talk to another using a service name.
Regardless of where your application runs, it must resolve hostnames into IPs before sending requests. For example, when you send a request to google.com, you must resolve google.com into an IP and then send the request there.
In our previous example, the same thing happened when we sent a request to elasticsearch from within Kibana.
Before it could send the request, the sender had to "translate" elasticsearch into an IP. Only then was it able to send the request.
You can see that DNS lookup by sending a verbose request (-vvvv) with curl from the Kibana pod.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ curl elasticsearch:9200 -vvvv
* Trying 10.100.34.205:9200...
* TCP_NODELAY set
* Connected to elasticsearch (10.100.34.205) port 9200 (#0)
> GET / HTTP/1.1
> Host: elasticsearch:9200
> User-Agent: curl/7.68.0
> Accept: */*
[ ... ]
As shown above, the request to elasticsearch was sent to the IP 10.100.34.205, which is the elasticsearch service's IP.
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP 10.100.34.205 <none> 9200/TCP 3h35m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3h49m
As you would expect, sending a request to the service's IP yields the same result as sending a request to elasticsearch.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ curl 10.100.34.205:9200
{
"name" : "elasticsearch",
[ ... ]
"tagline" : "You Know, for Search"
}
That's the IP to which Kibana sends requests whenever it needs to reach Elasticsearch.
Now, two questions remain: who turns elasticsearch into the service's IP, and how do they know which IP it should be?
Question 1: Who turns elasticsearch into an IP?
On Linux systems, the /etc/resolv.conf file determines which server DNS lookup requests are sent to.
If you look at the contents of that file within our Kibana container, you'll see that it's sending DNS requests to 10.96.0.10.
$ kubectl exec --stdin --tty kibana -- /bin/sh
$ cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
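You can trigger that lookup directly from within the pod, without curl, assuming the image ships getent (most glibc-based images do; nslookup or dig would also work if installed).
$ kubectl exec --stdin --tty kibana -- /bin/sh
# Resolves the name through the nameserver from /etc/resolv.conf; it should
# print the elasticsearch service's cluster IP (10.100.34.205 in this walkthrough)
$ getent hosts elasticsearch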
That 10.96.0.10 address belongs to a service called kube-dns, which lives in the kube-system namespace, so you usually don't see it.
$ kubectl get services -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 4h20m
That service, in turn, points to the coredns pod, which runs the actual DNS server: CoreDNS. That pod is also in the kube-system namespace, so you don't usually see it either.
$ kubectl get pods -n kube-system
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-565d847f94-xkmcx 1/1 Running 0 122m
[ ... ]
It's that coredns pod that resolves the elasticsearch name into the service's IP.
Question 2: How does CoreDNS know the IP for elasticsearch?
Among the pods in kube-system you don't usually see, there's kube-controller-manager.
$ kubectl get pods -n kube-system
NAME READY STATUS IP NODE
kube-controller-manager-minikube 1/1 Running 192.168.49.2 minikube
[ ... ]
The kube-controller-manager pod watches the cluster's desired state and takes action so that the desired state becomes the cluster's actual state.
When you create a service, for example, the kube-controller-manager will break it down into further resources called Endpoints and EndpointSlices.
$ kubectl get endpoints
NAME ENDPOINTS AGE
elasticsearch 10.244.1.4:9200 4h56m
kubernetes 192.168.49.2:8443 5h10m
$ kubectl get endpointslices
NAME ADDRESSTYPE PORTS ENDPOINTS AGE
elasticsearch-zdjcx IPv4 9200 10.244.1.4 4h56m
kubernetes IPv4 8443 192.168.49.2 5h10m
CoreDNS watches these resources, together with the services themselves, and uses them to resolve DNS queries. Whenever it gets a query for a service name, it responds with that service's IP.
If you look at its configuration, which is just a ConfigMap within kube-system, you'll see a reference to the CoreDNS Kubernetes plugin.
$ kubectl get configmap coredns -n kube-system -o yaml
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
# Here's the important part
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
# ...
}
That plugin turns CoreDNS into a "cluster-aware" DNS server. Otherwise, it'd be a DNS server like any other.
How do services "forward" requests to pods?
In this section, you'll learn how requests to a service get redirected to a particular pod.
By now, perspicacious readers may have noticed that the IP for the elasticsearch service does not match the IP of the Elasticsearch pod. That service IP is also not bound to any pod or virtual machine.
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP 10.100.34.205 <none> 9200/TCP 26h
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 26h
$ kubectl get pods -o wide
NAME READY STATUS IP NODE
elasticsearch 1/1 Running 10.244.1.4 minikube-m02
kibana 1/1 Running 10.244.1.5 minikube-m02
In that case, how can requests to the service's IP reach that service's pods?
Requests reach a service's pods because their packets are never actually delivered to the service's IP. Instead, the node rewrites packets addressed to the service's IP and readdresses them to a pod's IP.
Nodes rewrite packet addresses using a program called iptables. That program allows administrators to configure rules that determine how network packets are treated.
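To give you a feel for what such a rule looks like, here's a hand-written DNAT rule in the same spirit as the ones kube-proxy generates, using the service and pod IPs from this post. It's illustrative only; don't add it to a real node, since kube-proxy already maintains the equivalent rules.
# Rewrite the destination of TCP packets addressed to the service's IP
# (10.100.34.205:9200) so they're delivered to the pod's IP instead
$ sudo iptables -t nat -A OUTPUT -p tcp -d 10.100.34.205 --dport 9200 \
    -j DNAT --to-destination 10.244.1.4:9200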
If you want to see some of the real rules on your node, you can SSH into the Minikube node running these pods with minikube ssh -n minikube-m02 and then list all the iptables rules with iptables-save. Alternatively, you can filter only the rules containing "elasticsearch" with iptables-save | grep elasticsearch.
You don't have to worry about understanding all the rules below. All you need to understand is that they (and a couple of others) get the packets addressed to the correct place: the pod.
docker@minikube-m02:~$ sudo iptables-save | grep elasticsearch
-A KUBE-SEP-EGMOGNCDKABRF3JZ -s 10.244.1.4/32 -m comment --comment "default/elasticsearch" -j KUBE-MARK-MASQ
-A KUBE-SEP-EGMOGNCDKABRF3JZ -p tcp -m comment --comment "default/elasticsearch" -m tcp -j DNAT --to-destination 10.244.1.4:9200
-A KUBE-SERVICES -d 10.100.34.205/32 -p tcp -m comment --comment "default/elasticsearch cluster IP" -m tcp --dport 9200 -j KUBE-SVC-JKSFFZ7OSH2DB73R
-A KUBE-SVC-JKSFFZ7OSH2DB73R ! -s 10.244.0.0/16 -d 10.100.34.205/32 -p tcp -m comment --comment "default/elasticsearch cluster IP" -m tcp --dport 9200 -j KUBE-MARK-MASQ
-A KUBE-SVC-JKSFFZ7OSH2DB73R -m comment --comment "default/elasticsearch -> 10.244.1.4:9200" -j KUBE-SEP-EGMOGNCDKABRF3JZ
Who creates these iptables rules?
Remember our friend kube-controller-manager? When that fellow creates Endpoints and EndpointSlices, a pod called kube-proxy reads those resources and creates the iptables rules that redirect packets from a service's IP to that service's pods.
The kube-proxy pod runs on every node because it's spawned through a DaemonSet. That way, Kubernetes ensures each node has a kube-proxy to update that node's iptables rules.
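You can see that DaemonSet and its pods yourself. They live in the kube-system namespace, and on most clusters the pods carry the conventional k8s-app=kube-proxy label.
# The DaemonSet guarantees one kube-proxy pod per node
$ kubectl get daemonset kube-proxy -n kube-system
# One pod per node, each keeping that node's iptables rules up to date
$ kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide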
As a note, when a service targets multiple pods, such as the pods of a deployment with numerous replicas, kube-proxy will create iptables rules that load balance the traffic between them. Those iptables rules take care of randomly assigning traffic to pods.
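If you're curious, that's verifiable too: each backend pod gets its own KUBE-SEP-* chain, and the service's KUBE-SVC-* chain jumps to them through iptables' statistic module, which picks a backend at random. A rough sketch of what you'd see with several replicas behind the service (chain names are hashes and will differ on your node):
docker@minikube-m02:~$ sudo iptables-save | grep KUBE-SVC-JKSFFZ7OSH2DB73R
# With, say, two backends you'd see jumps along these lines:
#   -A KUBE-SVC-... -m statistic --mode random --probability 0.5 -j KUBE-SEP-aaa
#   -A KUBE-SVC-... -j KUBE-SEP-bbb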
Putting it all together
Whenever you create a pod, it gets assigned an IP.
Any two pods in your cluster can talk to each other using their IP addresses.
The problem with using IP addresses for pods to talk to each other is that these IPs may change as pods get deleted and recreated.
For pods to consistently address each other correctly, you can use a Service.
When you create a service using kubectl, the Kubernetes apiserver will save its data, and another pod called kube-controller-manager will wake up and break that service down into two further resources: Endpoints and EndpointSlices.
CoreDNS will use those resources to know how to turn a service name into a service IP. Additionally, each node's kube-proxy pod will update that node's iptables rules. Those iptables rules cause requests to the service's IP to get addressed to the service's pods.
Finally, when a pod makes a request, it will do a DNS query to CoreDNS to get the service's IP. Then, when it sends packets to that IP, the iptables rules created by kube-proxy will cause the packets to get addressed to an actual pod's IP.
A few more notes
I've intentionally skipped a few details to avoid confusing the reader.
Among those details is how a pod gets assigned an IP and how iptables rules work.
I also haven't touched on CNI plugin implementations, like Kindnet.
A tour through container networking itself would also be helpful for most readers.
Finally, if you want to learn more about CoreDNS itself, this talk is a great start.
Wanna chat?
We're a two-people startup, and we love talking to interesting people.
If you'd like to chat, you can book a slot with me here.
I'd love to discuss Kubernetes, command-line interfaces, ephemeral environments, or what we're building at Ergomake.
Alternatively, you can send me a tweet or DM @thewizardlucas or an email at lucas.costa@getergomake.com.