Packet flow in OpenShift SDN

KCD Chennai - May 20 '22 - Dev Community

Author: Red Hat Inc. (Platinum Sponsor, KCD Chennai 2022)

OpenShift Container Platform uses Software Defined Networking (SDN) as the default Container Network Interface (CNI) provider. OpenShift SDN sets up the cluster networking that enables communication between pods. It does so by configuring an overlay network using Open vSwitch (OVS).

In every node, an OVS bridge (br0) is created. Whenever any packet reaches the OVS bridge, the packet is checked against the flow rules in the flow-tables. After processing the packet through the flow-tables, all the actions corresponding to the matching flow rules are applied to the packet.
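The match-and-apply behaviour described above can be sketched in a few lines of Python. This is only a simplified model and not how OVS stores flows internally: the `FlowRule` class, the field names, the sample rules, and the "drop when nothing matches" default are all assumptions made for illustration, and a real pipeline chains several tables rather than using one.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    priority: int
    match: dict  # field name -> required value; an empty dict matches everything
    actions: list = field(default_factory=list)

    def matches(self, packet: dict) -> bool:
        # A rule matches when every field it names has the required value.
        return all(packet.get(key) == value for key, value in self.match.items())


def lookup(table: list, packet: dict) -> list:
    """Return the actions of the highest-priority rule that matches the packet."""
    candidates = [rule for rule in table if rule.matches(packet)]
    if not candidates:
        return ["drop"]  # assumed default for this sketch
    return max(candidates, key=lambda rule: rule.priority).actions


# Hypothetical flow table for a bridge with one local pod at 10.128.0.5.
table = [
    FlowRule(100, {"ip_dst": "10.128.0.5"}, ["output:veth1"]),
    FlowRule(50, {"in_port": "veth0"}, ["output:vxlan0"]),
    FlowRule(0, {}, ["drop"]),
]
```

Calling `lookup(table, {"in_port": "veth0", "ip_dst": "10.128.0.5"})` selects the priority 100 rule even though the priority 50 rule also matches, which is the point of ordering rules by priority.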

An internal interface (tun0) is configured on each node and is connected to the OVS bridge via a port. A route is added in the node’s routing table for sending all traffic destined for the pod network CIDR to the tun0 interface. The tun0 will send it to the OVS bridge for processing the packet through the flow tables.
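To make the routing step concrete, here is a minimal sketch of a longest-prefix-match lookup that sends pod-network traffic to tun0. The CIDR `10.128.0.0/14` and the default route via `eth0` are assumptions chosen for the example; the real values depend on how the cluster is configured.

```python
import ipaddress

# Hypothetical routing table: (destination network, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),      # default route
    (ipaddress.ip_network("10.128.0.0/14"), "tun0"),  # assumed pod network CIDR
]


def pick_interface(dst: str) -> str:
    """Choose the outgoing interface using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matching = [(net, dev) for net, dev in ROUTES if addr in net]
    # The most specific route (largest prefix length) wins.
    _, dev = max(matching, key=lambda entry: entry[0].prefixlen)
    return dev
```

With these routes, `pick_interface("10.129.2.15")` returns `"tun0"` because the address falls inside the assumed pod network, while an address outside it falls through to the default route.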

OpenShift SDN uses Virtual Extensible LAN (VXLAN) tunnels to set up an overlay network between all the nodes in a cluster. The VXLAN tunnels carry traffic from one node to another, which is what enables communication between pods on different nodes. On every node, a VXLAN interface (vxlan0) is configured and is also connected to the OVS bridge via a port. Whenever a packet needs to be forwarded from one node to another, the bridge (br0) on the first node sends the packet to its vxlan0 interface. The vxlan0 interface then encapsulates the packet and sends it to the other node. When the packet is received by the vxlan0 interface on the other node, that interface first removes the encapsulation and then sends the packet to the bridge (br0) for further processing.
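As an illustration of what encapsulation means here, the sketch below builds and strips the 8-byte VXLAN header defined in RFC 7348. It is deliberately incomplete: a real vxlan0 interface also wraps the result in outer Ethernet, IP, and UDP headers, which are omitted. The function names and the VNI value used in the usage note are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend a VXLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags in the top byte, followed by 24 reserved bits.
    # Word 1: VNI in the top 24 bits, followed by 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(payload: bytes) -> tuple:
    """Strip the VXLAN header and return (vni, inner_frame)."""
    if len(payload) < 8:
        raise ValueError("payload is shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return word1 >> 8, payload[8:]
```

A round trip such as `vxlan_decapsulate(vxlan_encapsulate(42, frame))` gives back the VNI and the untouched inner frame, mirroring what the sending and receiving vxlan0 interfaces do.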

Every pod that is provisioned on a node is configured with a virtual ethernet pair, one interface in the pod’s network namespace and the other interface in the node’s network stack. The ethernet interface that is configured in the node’s network stack is connected to the OVS bridge via a port.

A pod can take part in three types of communication: 1) pod to pod communication, 2) pod to service communication, and 3) pod to external host communication. The packet flow from source to destination in each of these scenarios is described in the following sections.

Pod to pod communication

In pod to pod communication, a pod tries to communicate with another pod directly. For example, pod A wants to communicate with pod B. In this case, communicating directly means that pod A uses either pod B’s DNS name or pod B’s IP address. The two pods may be on the same node or on different nodes. In the following sections we will see how both scenarios are handled in OpenShift SDN.

In the same node

Fig. 1 shows how pod to pod communication takes place when both the pods belong to the same node.

Fig. 1: Pod to pod communication in the same node.

Pod A sends a packet to its own ethernet interface (eth0) with pod B’s address as the destination. From eth0, the packet is sent to the corresponding virtual ethernet interface (veth0) in the node’s network stack. Since veth0 is connected to the OVS bridge (br0), the packet is forwarded to the bridge. When the bridge receives the packet, it is checked for matches in the flow-tables and the corresponding actions are applied to it. As the packet is destined for pod B on the same node, the bridge forwards the traffic to the virtual ethernet interface (veth1) in the node’s network stack corresponding to pod B’s ethernet interface (eth0).

In different nodes

Fig. 2 shows how pod to pod communication takes place when the communicating pods belong to different nodes.

Fig. 2: Pod to pod communication in different nodes.

Pod A sends a packet to its own ethernet interface (eth0) with pod B’s address as the destination. From eth0, the packet is sent to the corresponding virtual ethernet interface (veth0) in the node’s network stack. As veth0 is connected to the OVS bridge (br0), the packet is forwarded to the bridge. The actions of the matching flows in the flow-tables are executed and, because the packet is destined for a pod on a different node, the bridge forwards the traffic to the Virtual Extensible LAN interface (vxlan0) of the node.

The vxlan0 interface encapsulates the packet and sends the traffic from node 1 to node 2 through the VXLAN tunnel. Once the packet is received by the vxlan0 interface on node 2, the encapsulation is removed. Then, the vxlan0 interface forwards the packet to the bridge (br0) of node 2. The packet is checked for matches in the flow-tables. As the packet is destined for the local pod B, the corresponding flows match and the packet is forwarded to the virtual ethernet interface (veth0) in the node’s network stack corresponding to pod B’s ethernet interface (eth0).
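Both pod to pod cases can be summarised as a hop-by-hop trace. The sketch below is a toy model under stated assumptions: the node names, the per-node pod subnets (a /23 each), the pod IPs, and the veth port names are all hypothetical, and the bridge decision is reduced to "local veth port if the destination is on this node, otherwise vxlan0".

```python
import ipaddress

# Hypothetical cluster layout: each node owns one pod subnet and knows
# which local veth port leads to which pod IP.
NODES = {
    "node1": {
        "subnet": ipaddress.ip_network("10.128.0.0/23"),
        "ports": {"10.128.0.5": "veth0", "10.128.0.6": "veth1"},
    },
    "node2": {
        "subnet": ipaddress.ip_network("10.129.0.0/23"),
        "ports": {"10.129.0.7": "veth0"},
    },
}


def node_for(pod_ip: str) -> str:
    """Find the node whose pod subnet contains the given pod IP."""
    addr = ipaddress.ip_address(pod_ip)
    for name, node in NODES.items():
        if addr in node["subnet"]:
            return name
    raise LookupError(f"{pod_ip} is not in any pod subnet")


def trace(src_ip: str, dst_ip: str) -> list:
    """List the interfaces a packet crosses between two pods."""
    src_node, dst_node = node_for(src_ip), node_for(dst_ip)
    hops = [f"{src_node}:{NODES[src_node]['ports'][src_ip]}", f"{src_node}:br0"]
    if src_node != dst_node:
        # Remote destination: cross the VXLAN tunnel to the other bridge.
        hops += [f"{src_node}:vxlan0", f"{dst_node}:vxlan0", f"{dst_node}:br0"]
    hops.append(f"{dst_node}:{NODES[dst_node]['ports'][dst_ip]}")
    return hops
```

Tracing two pods on node1 yields three hops (veth0, br0, veth1), while tracing from node1 to node2 adds the two vxlan0 interfaces and the second bridge in between.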

Pod to Pod communication via service

In this scenario, a pod tries to communicate with a set of pods by using the DNS name or IP address of a service. For example, pod A wants to communicate with a set of pods that are represented by service S. Let’s say that packets sent by pod A to service S get redirected to pod B. Pods A and B may be on the same node or on different nodes. In the following sections we will see how both scenarios are handled in OpenShift SDN.

In the same node

Fig. 3 shows how pod to pod communication via service takes place when both the pods belong to the same node.

Fig. 3: Pod to service communication in the same node.

Pod A sends a packet to its own ethernet interface (eth0) with service S’s address as the destination. As eth0’s corresponding virtual ethernet interface (veth0) in the node’s network stack is connected to the OVS bridge (br0), the packet is forwarded to the bridge. The packet is checked for matches in the flow-tables and the corresponding actions are executed. As the packet is destined for service S, it is forwarded to the internal interface (tun0). In the node’s network stack, the iptables rules are processed and a backend pod (pod B) is chosen for service S. The destination of the packet is then rewritten from service S to pod B. Because the packet is now destined for a pod, it is routed back to tun0, which in turn forwards it to the bridge (br0). The packet is matched against the flow rules in the flow-tables and is forwarded to the virtual ethernet interface (veth1) in the node’s network stack corresponding to pod B’s ethernet interface (eth0).
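The iptables step, where a backend pod is chosen and the destination is rewritten, can be sketched as a small destination NAT function. This is an assumption-laden model, not the actual iptables rules: the service IP, the backend list, and the dictionary packet format are hypothetical, and backend selection is reduced to a uniform random choice.

```python
import random

# Hypothetical service table: service IP -> backend pod IPs.
SERVICES = {"172.30.0.10": ["10.128.0.6", "10.129.0.7"]}


def dnat(packet: dict, rng: random.Random) -> dict:
    """Rewrite a service destination to one of its backend pods."""
    backends = SERVICES.get(packet["dst"])
    if backends is None:
        return packet  # not a service address, leave the packet unchanged
    # Only the destination changes; the source stays pod A's address.
    return {**packet, "dst": rng.choice(backends)}
```

After `dnat` runs, the packet carries a pod IP as its destination, so the rest of the path is the ordinary pod to pod flow, on the same node or across the VXLAN tunnel depending on where the chosen backend lives.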

In different nodes

Fig. 4 shows how pod to pod communication via service takes place when the communicating pods belong to different nodes.

Fig. 4: Pod to service communication in different nodes.

Pod A sends a packet to its own ethernet interface (eth0) with service S’s address as the destination. As eth0’s corresponding virtual ethernet interface (veth0) in the node’s network stack is connected to the OVS bridge (br0), the packet is forwarded to the bridge. The packet is checked for matches in the flow-tables and the corresponding actions are executed. As the packet is destined for service S, it is forwarded to the internal interface (tun0). In the node’s network stack, the iptables rules are processed and a backend pod (pod B) is chosen for service S. The destination of the packet is then rewritten from service S to pod B. Because the packet is now destined for a pod, it is routed back to tun0, which in turn forwards it to the bridge (br0). As pod B is on a different node, the corresponding flow rules match and the bridge forwards the traffic to the Virtual Extensible LAN interface (vxlan0).

The vxlan0 interface encapsulates the packet and sends the traffic from node 1 to node 2 through the VXLAN tunnel. Once the packet is received by the vxlan0 interface on node 2, the encapsulation is removed. Then, the vxlan0 interface forwards the packet to the bridge (br0) of node 2. The packet is checked for matches in the flow-tables. As the packet is destined for the local pod B, the corresponding flow rules match and the packet is forwarded to the virtual ethernet interface (veth0) in the node’s network stack corresponding to pod B’s ethernet interface (eth0).

Pod to external host communication

In this case, a pod tries to communicate with an external host. Traffic can be sent from a pod to an external host in different ways: 1) through special pods called Egress Router pods, 2) through the same node using the node IP, and 3) through Egress IP (either on same node or different node). In the following sections we will see how all the scenarios are handled in OpenShift SDN.

Through Egress Router

Fig. 5 shows how pod to external host communication takes place via an Egress Router pod.

Fig. 5: Pod to external host communication through Egress Router.

The Egress Router pod is configured with an additional MACVLAN interface (macvlan0) in the node’s network stack. The macvlan0 is assigned an IP address and any traffic sent out through the interface is NAT-ed using the assigned IP address.

Pod A sends a packet with the Egress Router pod’s address as the destination. In this example the Egress Router pod is provisioned on a different node, but it can also be on the same node. In both cases, the packet flow from pod A to the Egress Router pod happens in the same way as explained above in the pod to pod communication section. Once the packet is received by the Egress Router pod, the pod sends it to the external host (based on how the Egress Router pod is configured). The source address of the packet is then the IP address of the macvlan0 interface.

Through the same node

Fig. 6 shows how pod to external host communication takes place through the same node using the node IP.

Fig. 6: Pod to external host communication through the same node.

Whenever a pod tries to send a packet to an external host, the default method is to send the packet out from the same node. When pod A sends a packet destined for the external host, the packet is matched against the flow rules after it reaches the OVS bridge (br0). As the destination is an external host, the packet is sent to the internal interface (tun0) in the node’s network stack. Then, the packet is sent out through the eth0 interface after being NAT-ed using the node IP assigned to the eth0 interface.
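The NAT step can be illustrated with a small source NAT model that also remembers how to translate replies back. It is a sketch under assumptions and not how the kernel's connection tracking is implemented: the `Masquerade` class, the packet dictionaries, the example addresses, and the simple incrementing port allocation are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Masquerade:
    node_ip: str
    next_port: int = 40000  # arbitrary starting port for this sketch
    # Maps the allocated node port to the original (pod IP, pod port).
    table: dict = field(default_factory=dict)

    def outbound(self, packet: dict) -> dict:
        """Rewrite the source of a pod packet to the node IP."""
        port = self.next_port
        self.next_port += 1
        self.table[port] = (packet["src"], packet["sport"])
        return {**packet, "src": self.node_ip, "sport": port}

    def inbound(self, packet: dict) -> dict:
        """Rewrite the destination of a reply back to the original pod."""
        pod_ip, pod_port = self.table[packet["dport"]]
        return {**packet, "dst": pod_ip, "dport": pod_port}
```

The external host only ever sees the node IP as the source, and the stored mapping is what lets the reply find its way back to pod A.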

Through Egress IPs

Fig. 7 shows how pod to external host communication takes place through an Egress IP.

Fig. 7: Pod to external host communication through Egress IP.

As mentioned above, the default method of sending a packet, destined for an external host, is through the same node using the Node IP. However, a namespace can be configured with Egress IP(s) for sending packets to external hosts. The Egress IPs will be provisioned on the ethernet interfaces of the nodes in addition to the Node IP. When a pod which belongs to a namespace configured with Egress IP(s) sends a packet destined for an external host, then the packet would first be sent to the node on which the chosen Egress IP is provisioned. If the chosen Egress IP is provisioned on the same node then it would be similar to sending the packet using the Node IP, except the NAT-ed packet would have the chosen Egress IP as the source address. If the chosen Egress IP is provisioned on another node, then for sending the packet from one node to another, the VXLAN tunnel is used. Once the packet reaches the OVS bridge (br0) on the node with the chosen Egress IP, the packet is matched against the flow rules. The packet is then sent to the tun0 interface which, in turn, sends the packet to the eth0 interface to be sent to the external host. Before the packet is forwarded to the external host, it is NAT-ed using the chosen Egress IP.
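The decision described above, which node the packet leaves from and which source address it carries, can be sketched as a lookup. All names and addresses below are hypothetical, and the model assumes one Egress IP per namespace, hosted on exactly one node.

```python
# Hypothetical configuration for illustration only.
NODE_IPS = {"node1": "192.0.2.10", "node2": "192.0.2.11"}
NAMESPACE_EGRESS_IP = {"project-a": "192.0.2.100"}  # namespace -> Egress IP
EGRESS_IP_HOST = {"192.0.2.100": "node2"}           # Egress IP -> hosting node


def egress_plan(namespace: str, pod_node: str) -> dict:
    """Decide the exit node, the NAT source IP, and whether VXLAN is needed."""
    egress_ip = NAMESPACE_EGRESS_IP.get(namespace)
    if egress_ip is None:
        # Default behaviour: leave from the pod's own node using the node IP.
        return {"exit_node": pod_node, "source_ip": NODE_IPS[pod_node], "via_vxlan": False}
    exit_node = EGRESS_IP_HOST[egress_ip]
    return {
        "exit_node": exit_node,
        "source_ip": egress_ip,
        # The tunnel is only used when the Egress IP lives on another node.
        "via_vxlan": exit_node != pod_node,
    }
```

For a pod in `project-a` running on node1, the plan is to cross the VXLAN tunnel to node2 and leave with `192.0.2.100` as the source address; a pod in a namespace without an Egress IP simply leaves from its own node with the node IP.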
