I've noticed a few people running into DNS issues when attempting to install Kubernetes from scratch on VirtualBox.

As you may already know, VirtualBox's default NAT network hands any virtual machine with a NAT adapter an IP address from the 10.0.2.0/24 CIDR. It also pushes a DHCP option that points /etc/resolv.conf at a nameserver, typically something like 10.0.2.3.

From Ubuntu 16.04:

ubuntu@worker1:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.0.2.3

To keep things simple in a Kubernetes environment, you may want to have each worker node's pod network on a unique /24. This would result in each worker node's network bridge getting assigned something like 10.200.0.1/24, 10.200.1.1/24, 10.200.2.1/24, and so on. If using kubenet, the out-of-the-box Kubernetes networking plugin, this can be achieved by setting the --cluster-cidr flag on the kube-controller-manager daemon to 10.200.0.0/16.

Here's an example of a commonly used kube-controller-manager systemd unit from Kelsey Hightower's Kubernetes The Hard Way:

cat > kube-controller-manager.service <<EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/usr/bin/kube-controller-manager \\
  --address=0.0.0.0 \\
  --allocate-node-cidrs=true \\
  --cluster-cidr=10.200.0.0/16 \\
  --cluster-name=kubernetes \\
  --cluster-signing-cert-file="/var/lib/kubernetes/ca.pem" \\
  --cluster-signing-key-file="/var/lib/kubernetes/ca-key.pem" \\
  --leader-elect=true \\
  --master=http://${INTERNAL_IP}:8080 \\
  --root-ca-file=/var/lib/kubernetes/ca.pem \\
  --service-account-private-key-file=/var/lib/kubernetes/ca-key.pem \\
  --service-cluster-ip-range=10.32.0.0/16 \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
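Once the controller manager is up with --allocate-node-cidrs=true, each node object should be handed its own /24 out of 10.200.0.0/16 in spec.podCIDR. A quick way to check (the column labels are just my own):

kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR

Each worker should report a distinct subnet such as 10.200.0.0/24, 10.200.1.0/24, and so on.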

After creating your DNS deployment in the kube-system namespace, resolution of service names inside the cluster may work fine:

ubuntu@worker1:~$ kubectl exec -it mypod /bin/sh
/ # nslookup kubernetes
Server:    10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.32.0.1 kubernetes.default.svc.cluster.local

But resolution of anything outside the cluster doesn't work:

/ # nslookup kubernetes.io
Server:    10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.io'

If we check out the upstream kube-dns deployment manifest, we see that the dnsmasq container does not have the --resolv-file flag set. That flag would let us point dnsmasq at a customized resolv.conf. Without it, the dnsmasq container uses the /etc/resolv.conf of the node the pod runs on for all non-cluster queries, which in our VirtualBox NAT setup means the 10.0.2.3 nameserver.
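For reference, pointing dnsmasq at a custom upstream list would mean adding something along these lines to the dnsmasq container's args in the kube-dns deployment and mounting that file into the container. This is only a sketch; the file path is made up and the surrounding args in the real manifest differ:

args:
  # ...existing dnsmasq args...
  - --resolv-file=/etc/resolv.conf.custom  # hypothetical path holding your own upstream nameservers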

Rather than play around with pointing dnsmasq at a different DNS server like 8.8.8.8, let's do some tcpdumping on the worker node to see if we can catch the packets attempting to hit the VirtualBox DNS:

ubuntu@worker1:~$ sudo tcpdump -n "dst host 10.0.2.3"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
02:29:54.364528 IP 10.200.0.3.15612 > 10.0.2.3.53: 28501+ AAAA? kubernetes.io. (31)
02:29:54.364629 IP 10.200.0.3.15612 > 10.0.2.3.53: 28501+ AAAA? kubernetes.io. (31)
02:29:59.370219 IP 10.200.0.3.22452 > 10.0.2.3.53: 53929+ AAAA? kubernetes.io. (31)
02:29:59.370413 IP 10.200.0.3.22452 > 10.0.2.3.53: 53929+ AAAA? kubernetes.io. (31)

It looks like the source IP address is still the original pod IP.

Checking iptables NAT table, we can see the following in the POSTROUTING chain:

ubuntu@worker1:~$ sudo iptables -t nat -vxnL POSTROUTING

Chain POSTROUTING (policy ACCEPT 8 packets, 480 bytes)
    pkts      bytes target     prot opt in     out     source               destination
      861    57670 KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
      49     2940 MASQUERADE  all  --  *      *       0.0.0.0/0           !10.0.0.0/8           /* kubenet: SNAT for outbound traffic from cluster */ ADDRTYPE match dst-type !LOCAL

It looks like traffic destined for anything within 10.0.0.0/8 will not be source-NATed!

Where did this 10.0.0.0/8 network come from? As you can see here, the kubelet defaults --non-masquerade-cidr to 10.0.0.0/8 when the flag is not specified. The intention is to preserve the pod's source IP for pod-to-pod communication within the cluster. In my dev environment I have manually generated host routes, but you may be using flannel or another pod networking solution. Either way, this is a problem for us, since the VirtualBox DNS at 10.0.2.3 falls inside the 10.0.0.0/8 RFC 1918 range.
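You can confirm this on a worker by looking at the kubelet command line and at the rule kubenet installed (the grep patterns are just a convenience):

# if --non-masquerade-cidr is absent from the kubelet flags, the 10.0.0.0/8 default applies
ps -ef | grep kubele[t]

# the kubenet SNAT rule shows up in the nat POSTROUTING chain, as above
sudo iptables -t nat -S POSTROUTING | grep kubenet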

We could certainly work around this by changing our cluster-cidr to something like 192.168.0.0/16, or by manually modifying the iptables rules on every worker node, as sketched below. Let's try something easier.
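For the record, the manual iptables approach would be a one-off rule on every worker along these lines; it won't survive a reboot, and the 10.0.2.3 destination is specific to the VirtualBox NAT DNS:

# SNAT only the pod traffic that is headed for the VirtualBox DNS
sudo iptables -t nat -I POSTROUTING 1 -s 10.200.0.0/16 -d 10.0.2.3/32 -j MASQUERADE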

The ip-masq-agent

Enter the ip-masq-agent!

In Kubernetes 1.7 there is new code that lets you effectively disable the kubelet's automatic MASQUERADE rule by setting --non-masquerade-cidr to 0.0.0.0/0, i.e. --non-masquerade-cidr=0.0.0.0/0.
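On the worker side that looks roughly like the excerpt below in the kubelet unit; the file path and the other flags depend on how your nodes are set up:

# /etc/systemd/system/kubelet.service (excerpt)
ExecStart=/usr/bin/kubelet \
  --network-plugin=kubenet \
  --non-masquerade-cidr=0.0.0.0/0 \
  --v=2

Follow that with a systemctl daemon-reload and a kubelet restart. With the kubelet out of the masquerade business, the ip-masq-agent gets to decide what does and does not get SNATed.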

I will create a ConfigMap that says I do not want to SNAT any traffic destined for the cluster; in my case that means the cluster-cidr and service-cluster-ip-range values from earlier, 10.200.0.0/16 and 10.32.0.0/16. This is pretty cool because the kubelet's --non-masquerade-cidr is limited to a single CIDR. The template for the ConfigMap is available here. Here is mine:

nonMasqueradeCIDRs:
  - 10.200.0.0/16
  - 10.32.0.0/16
masqLinkLocal: false
resyncInterval: 60s
  • masqLinkLocal determines whether to masquerade traffic to the link-local range 169.254.0.0/16. It defaults to false.
  • resyncInterval sets how often the agent reloads its config from disk.
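The agent looks for this data under the config key of a ConfigMap named ip-masq-agent in the kube-system namespace. Assuming the YAML above is saved to a local file named config, create it with:

kubectl create configmap ip-masq-agent --from-file=config --namespace=kube-system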

Deploy the ip-masq-agent DaemonSet

kubectl create -f https://raw.githubusercontent.com/kubernetes-incubator/ip-masq-agent/v2.0.1/ip-masq-agent.yaml

We should see a brand new POSTROUTING rule that points to a new chain called IP-MASQ-AGENT:

ubuntu@worker1:~$ sudo iptables -t nat -vxnL POSTROUTING
Chain POSTROUTING (policy ACCEPT 4 packets, 240 bytes)
  pkts      bytes target     prot opt in     out     source               destination
 11235   674191 KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
 10067   604182 IP-MASQ-AGENT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* ip-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom IP-MASQ-AGENT chain */ ADDRTYPE match dst-type !LOCAL

Let's look at the new chain:

ubuntu@worker1:~$ sudo iptables -t nat -vxnL IP-MASQ-AGENT
Chain IP-MASQ-AGENT (1 references)
    pkts      bytes target     prot opt in     out     source               destination
       0        0 RETURN     all  --  *      *       0.0.0.0/0            169.254.0.0/16       /* ip-masq-agent: cluster-local traffic should not be subject to MASQUERADE */ ADDRTYPE match dst-type !LOCAL
      12      720 RETURN     all  --  *      *       0.0.0.0/0            10.200.0.0/16        /* ip-masq-agent: cluster-local traffic should not be subject to MASQUERADE */ ADDRTYPE match dst-type !LOCAL
       0        0 RETURN     all  --  *      *       0.0.0.0/0            10.32.0.0/16         /* ip-masq-agent: cluster-local traffic should not be subject to MASQUERADE */ ADDRTYPE match dst-type !LOCAL
       0        0 MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* ip-masq-agent: outbound traffic should be subject to MASQUERADE (this match must come after cluster-local CIDR matches) */ ADDRTYPE match dst-type !LOCAL

Our non-masquerade rules have been applied!

Let's check DNS resolution:

ubuntu@worker1:~$ kubectl exec -it mypod /bin/sh
/ # nslookup kubernetes.io
Server:    10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.io
Address 1: 23.236.58.218 218.58.236.23.bc.googleusercontent.com
/ #

It works!

Let's check out tcpdump again:

ubuntu@worker1:~$ sudo tcpdump -i enp0s3 dst host 10.0.2.3
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp0s3, link-type EN10MB (Ethernet), capture size 262144 bytes
12:30:35.522668 IP 10.0.2.15.65039 > 10.0.2.3.domain: 36319+ AAAA? google.com. (28)
12:30:35.522781 IP 10.0.2.15.65039 > 10.0.2.3.domain: 36319+ AAAA? google.com. (28)
12:30:35.522928 IP 10.0.2.15.63090 > 10.0.2.3.domain: 3284+ AAAA? google.com. (28)
12:30:35.523002 IP 10.0.2.15 > 10.0.2.3: ICMP 10.0.2.15 udp port 65039 unreachable, length 64
12:30:35.523099 IP 10.0.2.15.12876 > 10.0.2.3.domain: 44566+ AAAA? google.com. (28)
12:30:35.523476 IP 10.0.2.15.57223 > 10.0.2.3.domain: 37209+ PTR? 3.2.0.10.in-addr.arpa. (39)
12:30:35.523639 IP 10.0.2.15.40473 > 10.0.2.3.domain: 10359+ AAAA? google.com. (28)

Looks like SNAT is now working: queries to the VirtualBox DNS leave with the node's 10.0.2.15 address as their source instead of the pod IP.

If you are interested in diving deeper into customizing upstream DNS servers and/or private DNS zones, check out this recent Kubernetes blog article.


