At SDNCon this past week, I recreated my Kiwi Pycon Ryu Example by combining Docker and Open vSwitch. The process was fairly simple, just requiring some custom Dockerfiles and a bit of network scripting, but one element struck me as a kludge: the need to delete and recreate the Open vSwitch before starting the example containers, in order to predict which switches port numbers they were on. So I wanted to find a way to determine which switch port a new Docker container instance had been connected to.
A Docker container that is given a network interface will, by default, end up with an interface with a randomly allocated MAC address and a randomly allocated IP address. The scripts used by my Docker and Open vSwitch example also used a randomly allocated MAC address (assigned by the veth network interface), but statically assigned IP addresses to match the ones in the original example. As a result there may not be any predictable (in advance) network information that, eg, an SDN controller like Ryu can look for to identify a specific container connection.
By inspection it appears that a new connection to an Open vSwitch bridge will currently be allocated the next sequentially available OpenFlow port number for that bridge, starting with the first OpenFlow port number, 1. This is the kludge that my previous post relied on to have predictable port numbers. But firstly, that is almost certainly an internal implementation detail that could change at any time; and secondly, if a container is stopped and started again then new network interfaces will be allocated for the new container instance, which will get new port numbers. So guessing the OpenFlow port number for a given container depends not only on the order in which the containers are started, and the assumption of sequential port numbers, but also on knowing how often the containers have been restarted since the Open vSwitch was created. That is hardly a recipe for reliable prediction (my previous example worked around this by deleting the Open vSwitch bridge and recreating it immediately before starting the example containers -- to reset the port counter).
It is also worth noting that when the container is started then stopped, the container end of the veth link disappears, which takes the host end of the link down; but it appears the Open vSwitch database retains the other end of the veth link, even though Linux has removed the interface. It is possible that if that stale interface were manually removed from the Open vSwitch database, Open vSwitch would reuse that old port number at some later point -- which would make the OpenFlow port numbers even less predictable than the apparent current implementation of "sequentially incrementing".
My aim was to find a way to start with a Docker container identifier and end up with the OpenFlow port number in use by the connection of that Docker container to a specific Open vSwitch bridge, so that port number could then be used to address the container's connection by the OpenFlow controller managing that Open vSwitch. After some investigation, and asking an Open vSwitch developer, I found there is a way to map from a running Docker container's network interface through to an OpenFlow port number on the Open vSwitch bridge -- but it also relies on Open vSwitch internal implementation detail. I wanted to document this path as, even with the reliance on internal implementation detail, it seems more reliable than "try to start things in a predictable order and guess the OpenFlow port numbers".
Given that we use veth links to connect the Docker container to the Open vSwitch bridge, the overall approach is:
1. Translate the Docker container identifier to a Linux network namespace.
2. Use that network namespace to get the ethernet interfaces in that Linux network namespace (and hence in that Docker container).
3. From the ethernet interface in the network namespace (container), get the "peer" veth SNMP ifIndex of the network interface that is outside the container (ie, the external end of the veth link that actually got connected to the Open vSwitch bridge).
4. Scan the network interfaces in the host machine to find the name that matches that SNMP ifIndex.
5. Ask Open vSwitch's internal tool (ovs-appctl) to tell us which OpenFlow port that host interface is connected to.
I have written this up into a script which automates these steps, but I wanted to detail them below for ease of reference. (See the end of this post for script usage information.)
Because of the use of an internal tool, this approach could break at any time -- but AFAICT at present Open vSwitch provides no non-internal means to perform this useful mapping from a container interface to an OpenFlow port. It seems to be assumed that the OpenFlow controller will identify connected devices via something other than the port number, but if the MAC address and IP address are also randomly allocated then there is no obvious network-visible attribute to use to locate the container connection.
Docker container to Linux network namespace
Suppose we have a container running with the name firewall_ext; the process looks like:
GUEST_NAME="firewall_ext"
# Find the Linux container device mountpoint
CGROUP_MOUNT=$(grep -w devices /proc/mounts | awk '{ print $2; }')
# Translate the Docker container name into a Docker Container ID
CONTAINER_SHORTID=$(docker ps -a |
awk 'substr($0,139) ~ '"/${GUEST_NAME}"'/ { print $1;}')
# eg, ab4e7d1d591a
# Find that container ID in the devices mountpoint
CONTAINER=$(find "${CGROUP_MOUNT}" -name "${CONTAINER_SHORTID}*")
#
# eg, /sys/fs/cgroup/devices/docker/ab4e7d1d591a16fe5f87702a61a15555accb02fb624f2eb84ff027741529454d
# Turn that container ID mount location into a Network namespace ID
NETNS=$(head -n 1 "${CONTAINER}/tasks")
#
# eg, 11082
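A shorter route to the same namespace ID is to ask Docker directly for the PID of the container's first process with docker inspect, rather than walking the devices cgroup mountpoint; a sketch (the helper function name is my own):

```shell
# A sketch of an alternative route to the network namespace ID: ask
# Docker for the PID of the container's first process directly via
# "docker inspect", instead of walking the devices cgroup mountpoint.
# That PID is the same value the "head -n 1 .../tasks" step finds.
container_netns() {
    GUEST_NAME="$1"
    docker inspect --format '{{.State.Pid}}' "${GUEST_NAME}"
}

NETNS=$(container_netns "firewall_ext")
# eg, 11082
```

This does assume the container name resolves for docker inspect; the cgroup-walking version above works from a partial container ID as well.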
(This process is based on the one used in, eg, pipework and ovswork.sh.)
Getting ethernet interfaces in Linux network namespace
The ip netns exec command allows running ip commands as if they were inside the network namespace. (See, eg, Scott Lowe's post about this feature.) Given this extremely helpful command, it is pretty simple to make a list of all the ethernet network interfaces in the container:
CONTAINER_ETH=$(sudo ip netns exec "${NETNS}" ip link show |
grep -B 1 link/ether | grep '^[0-9]' |
cut -f 2 -d : | sed 's/ //g;')
# eg, eth0
taken from full output like:
ewen@docker:~$ sudo ip netns exec 11082 ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
27: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 9a:69:12:bc:3d:1d brd ff:ff:ff:ff:ff:ff
ewen@docker:~$
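A similar pipeline can extract the MAC address of each ethernet interface, which is also useful information to hand to an OpenFlow controller (as the docker-ovs-port script described later does). A minimal sketch, run here over a captured copy of the sample output above rather than a live namespace:

```shell
# Parse "ip link show" output into "interface,MAC" pairs, skipping the
# loopback interface.  LINK_OUTPUT stands in for the output of the live
# command "sudo ip netns exec ${NETNS} ip link show".
LINK_OUTPUT='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
27: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 9a:69:12:bc:3d:1d brd ff:ff:ff:ff:ff:ff'

# Remember the interface name from each "N: name:" header line, then
# emit it with the MAC from the following "link/ether" line.
IF_MACS=$(echo "${LINK_OUTPUT}" |
    awk '/^[0-9]+:/    { iface=$2; sub(":$", "", iface); }
         /link\/ether/ { print iface "," $2; }')
# eg, eth0,9a:69:12:bc:3d:1d
```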
Find the SNMP ifIndex of the host end of the veth link
For each of those ethernet interfaces we found, we can use ethtool -S to find the other end of the veth link:
HOST_IF_ID=$(sudo ip netns exec "${NETNS}" ethtool -S "${GUEST_IF}" |
awk '/peer_ifindex:/ { print $2; }')
# eg, 28
taken from full output like:
ewen@docker:~$ sudo ip netns exec 11082 ethtool -S eth0
NIC statistics:
peer_ifindex: 28
ewen@docker:~$
Find the interface names from the host SNMP ifIndex values
Conveniently ip link show shows the SNMP ifIndex values, so we can simply scan that output for the ifIndex value we want:
HOST_IF=$(ip link show | awk "/^${HOST_IF_ID}:/"' { print $2; }' |
cut -f 1 -d :)
# eg, vethp11082eth0
taken from (partial) output like (trimmed for width and length):
ewen@docker:~$ ip link show | tail -8 | cut -c 1-62
24: vethp10565eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
link/ether c6:03:3b:74:20:e5 brd ff:ff:ff:ff:ff:ff
26: vethp10565eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
link/ether 16:a7:2e:27:67:e5 brd ff:ff:ff:ff:ff:ff
28: vethp11082eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
link/ether ca:f8:49:a3:a5:f9 brd ff:ff:ff:ff:ff:ff
30: vethp11240eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
link/ether 56:fe:1b:09:d4:80 brd ff:ff:ff:ff:ff:ff
ewen@docker:~$
Finding where the veth host end connects to Open vSwitch
(ETA, 2014-10-15: Also see update at end of this post for other, possibly more maintainable, ways to fetch this information.)
Open vSwitch maintains two port numbers for a given interface connected to a given Open vSwitch:
A Linux port number, which is global to all Open vSwitch managed interfaces on the host
An OpenFlow port number, which is local to a specific Open vSwitch bridge
Multiple Open vSwitch bridges can each have an OpenFlow port number 1, relating to different Linux interfaces -- but those interfaces will have different Linux port numbers.
The Open vSwitch ovs-dpctl tool is able to show the global (Linux) port numbers that Open vSwitch is tracking for given interfaces:
ewen@docker:~$ ovs-dpctl show
system@ovs-system:
lookups: hit:61 missed:67 lost:0
flows: 0
port 0: ovs-system (internal)
port 1: kiwipycon (internal)
port 2: vandervecken (internal)
port 3: vlanswitch (internal)
port 4: vethp10565eth0
port 5: vethp10565eth1
port 6: vethp11082eth0
port 7: vethp11240eth0
ewen@docker:~$
(Note that most of the ovs-dpctl show output is inexplicably indented by one tab -- and it really is a tab, not spaces -- for no particularly obvious reason; I've translated the tabs to spaces in this blog post to ensure the visual alignment remains. But if parsing this output, beware that it is literally a tab character. Unfortunately many of the Open vSwitch tools have this "non-trivial to parse" output format.)
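The tab indenting does not actually get in the way of field-based parsing; a minimal sketch that pulls a Linux port number out of captured ovs-dpctl show output (shown here with spaces standing in for the literal tabs, exactly as in the listing above; a live run would pipe from sudo ovs-dpctl show instead):

```shell
# Map a Linux interface name to its global (Linux) port number from
# "ovs-dpctl show" output.  DPCTL_OUTPUT stands in for the live command
# output; awk's default field splitting copes with tab or space indents.
DPCTL_OUTPUT='system@ovs-system:
        port 0: ovs-system (internal)
        port 6: vethp11082eth0'

linux_port() {
    echo "${DPCTL_OUTPUT}" |
        awk -v iface="$1" '$1 == "port" && $3 == iface {
            sub(":$", "", $2); print $2; }'
}
# eg, linux_port vethp11082eth0 outputs 6
```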
After talking with an Open vSwitch developer, the only way to get the OpenFlow port number of a specific Linux (veth host) interface on a specific Open vSwitch bridge is to use an Open vSwitch internal tool, ovs-appctl, to run dpif/show (a command not documented in the ovs-appctl manpage, but which is listed in the ovs-vswitchd manpage; unfortunately the ovs-appctl version does not take arguments, at least in the version in Ubuntu Linux 14.04 LTS -- ie, Open vSwitch 2.0.2).
This command outputs the internal Open vSwitch information which maps each interface on an Open vSwitch bridge through to both the OpenFlow port number (which we want) and the Linux port number (shown above):
ewen@docker:~$ sudo ovs-appctl dpif/show
system@ovs-system: hit:0 missed:54
flows: cur: 0, avg: 0, max: 14, life span: 0ms
hourly avg: add rate: 0.381/min, del rate: 0.381/min
overall avg: add rate: 1.000/min, del rate: 1.000/min
kiwipycon: hit:0 missed:7
kiwipycon 65534/1: (internal)
vandervecken: hit:0 missed:7
vandervecken 65534/2: (internal)
vlanswitch: hit:0 missed:40
vethp10565eth0 1/4: (system)
vethp10565eth1 2/5: (system)
vethp11082eth0 3/6: (system)
vethp11240eth0 4/7: (system)
vlanswitch 65534/3: (internal)
ewen@docker:~$
(As above, note the indenting from the tool is a tab character, not spaces, but I have converted it to spaces to preserve the visual indent in this blog post.)
In this output, for each Open vSwitch bridge, there is the Linux interface name (the host end of the veth link in our case), then a pair "OpenFlowPort/LinuxPort" -- ie, the first number is the OpenFlow port number that we want, and the second number is the Linux port number (eg, as returned by ovs-dpctl show), which we do not currently care about.
To parse the ovs-appctl dpif/show information (which does require root privileges to retrieve) we need to extract the section starting with the bridge name that we want, and finishing with the next bridge name (or end of input):
get_ovs_portmap() {
BRIDGE="${1}"
sudo ovs-appctl dpif/show |
MATCH="${BRIDGE}" \
perl -ne 'BEGIN { $in_switch=0; }
if (/$ENV{MATCH}/) { $in_switch=1; next; }
if ($in_switch) {
if (/^(\t| {8})\S/) { $in_switch=0 }
else { print; }
}'
}
(Later versions of Open vSwitch may take an argument to dpif/show to limit output to the section for a specific bridge, as we do with the shell function above.)
Given that output we can scan for the host veth interface name that we care about, and get the OpenFlow port number of the container interface we started with, as it connects to that specific Open vSwitch bridge:
OF_PORT_ID=$(get_ovs_portmap "${OVS_BRIDGE}" |
awk "/^\s*${HOST_IF}/ "'{ print $2; }' | cut -f 1 -d /)
# eg, 3
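The same bridge-section-then-interface extraction can be exercised offline against a captured copy of the dpif/show output; this sketch inlines a cut-down copy of the sample above (spaces standing in for the literal tabs) rather than calling sudo ovs-appctl:

```shell
# Offline sketch of the get_ovs_portmap + awk scan above, combined into
# one function.  DPIF_OUTPUT stands in for "sudo ovs-appctl dpif/show".
DPIF_OUTPUT='system@ovs-system: hit:0 missed:54
        flows: cur: 0, avg: 0, max: 14, life span: 0ms
        kiwipycon: hit:0 missed:7
                kiwipycon 65534/1: (internal)
        vlanswitch: hit:0 missed:40
                vethp11082eth0 3/6: (system)
                vlanswitch 65534/3: (internal)'

ofport_for() {
    BRIDGE="$1"; HOST_IF="$2"
    echo "${DPIF_OUTPUT}" |
        awk -v bridge="${BRIDGE}:" -v iface="${HOST_IF}" '
            # A "bridgename:" field starts the section we want.
            $1 == bridge { in_bridge=1; next; }
            # Another "name: hit:..." line ends the section.
            in_bridge && $1 ~ /:$/ && $2 ~ /^hit:/ { in_bridge=0; }
            # Within the section, print the OpenFlow half of "OF/Linux".
            in_bridge && $1 == iface { split($2, p, "/"); print p[1]; }'
}
# eg, ofport_for vlanswitch vethp11082eth0 outputs 3
```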
docker-ovs-port script
The docker-ovs-port script automates all the above steps, given a container name or identifier and an Open vSwitch bridge name:
ewen@docker:~$ ./docker-ovs-port firewall_ext
Usage: ./docker-ovs-port CONTAINER OVS_BRIDGE
ewen@docker:~$
It outputs three fields (in CSV format) for each ethernet interface of the Docker container:
interface name inside the Docker container (eg, eth0)
OpenFlow port number of that container on the named Open vSwitch bridge (if this field is empty, it is not connected to that OpenFlow bridge)
(for convenience) the MAC address of the ethernet interface in that Docker container (also useful to the OpenFlow controller, and easily obtained with ip netns exec ${NETNS} ip link show)
For instance:
ewen@docker:~$ ./docker-ovs-port firewall_ext vlanswitch
eth0,3,9a:69:12:bc:3d:1d
ewen@docker:~$
or for a container with multiple connections (in this case a "firewall" container, which routes/firewalls traffic between multiple VLANs on the Open vSwitch):
ewen@docker:~$ docker ps | grep trivial_firewall | cut -c 1-65
e7fd7cf5c4cc trivial_firewall:latest "/bin/sh /usr/local
ewen@docker:~$ ./docker-ovs-port e7fd7cf5c4cc vlanswitch
eth0,1,52:09:90:0b:db:f3
eth1,2,e6:0c:ad:58:93:1f
ewen@docker:~$
(here the container does not have a manually assigned name, so we find it by the image that it is running instead, and use that to get the Docker container ID).
ETA, 2014-10-14: An Open vSwitch developer pointed out that since Open vSwitch 2.1 (around 6 months old; newer than what is in Ubuntu 14.04 LTS -- also around 6 months old), there is a way to request that a particular interface be assigned a particular OpenFlow port on the Open vSwitch bridge, and this will be stored in the Open vSwitch database and reapplied on bridge restart if possible (eg, it does not conflict). The ovs-vswitchd.conf.db(5) man page (PDF) (apparently only available as a PDF) has more detail. It appears the syntax is something like:
ovs-vsctl add-port ${OVS_BRIDGE} ${ETH_PORT} -- set interface ${ETH_PORT} ofport_request=10
I cannot easily test this as I do not have Open vSwitch 2.1 running anywhere, but that appears consistent with this example, and with the examples in the FAQ for "How do I configure Quality of Service (QoS)?".
However if the request cannot be satisfied then some other port will be allocated, and the above process will be needed to find it again.
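Since the request is best-effort, it seems worth verifying what was actually assigned; assuming an Open vSwitch with ovs-vsctl get Interface (the helper function and its output strings are my own invention):

```shell
# Compare a requested OpenFlow port against what the bridge actually
# assigned; "ovs-vsctl get Interface NAME ofport" reports -1 when no
# OpenFlow port has been assigned at all.  OVS_VSCTL is overridable so
# the sketch can be exercised without a live Open vSwitch.
OVS_VSCTL="${OVS_VSCTL:-ovs-vsctl}"

verify_ofport() {
    ETH_PORT="$1"; WANTED="$2"
    ACTUAL=$("${OVS_VSCTL}" get Interface "${ETH_PORT}" ofport)
    if [ "${ACTUAL}" = "${WANTED}" ]; then
        echo "ok"
    else
        echo "mismatch: wanted ${WANTED}, got ${ACTUAL}"
    fi
}
```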
ETA, 2014-10-15: Further discussion turns up that this is actually a FAQ in the Open vSwitch FAQ (sadly one cannot link directly to a specific question, because Open vSwitch's online FAQ is just a GitHub view of a text file):
Q: How can I figure out the OpenFlow port number for a given port?
which offers several options:
An OpenFlow OFPT_FEATURES_REQUEST (returning an OFPT_FEATURES_REPLY) includes the OpenFlow port to name mapping, which could then be parsed by an OpenFlow Controller looking for a specific interface name (saving a step if you are already passing the information to an OpenFlow Controller).
ovs-ofctl show ${OVS_BRIDGE} prints the output of OFPT_FEATURES_REPLY in a format that could be parsed, eg:
[....]
1(vethp10840eth0): addr:4a:87:60:b0:be:cc
[....]
2(vethp10840eth1): addr:aa:14:8e:d3:3d:d8
and being a public interface hopefully this would be a little more stable (AFAICT the OpenFlow port number is first, then the Linux interface name inside parentheses, then the MAC address of that host-end interface, which isn't as useful to us).
ovs-vsctl get Interface ${INTERFACE_NAME} ofport (which would need to be combined with sudo ovs-vsctl iface-to-br ${INTERFACE_NAME} to check that the interface is actually on the bridge you want -- otherwise the port number refers to some other OVS bridge...).
ovs-vsctl -- --columns=name,ofport list Interface to get the whole table (two lines per interface), where -1 indicates "no OpenFlow port". (Trickier to parse, and lists all bridges, so probably not ideal.)
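Of those options, the ovs-ofctl show port lines are easy enough to pull apart with a pipeline like the others in this post; a sketch over sample lines in the format quoted above (interface names and MACs are from that example, not re-tested here):

```shell
# Extract "OpenFlowPort,interface,MAC" triples from the port lines of
# "ovs-ofctl show ${OVS_BRIDGE}".  Those lines look like:
#   1(vethp10840eth0): addr:4a:87:60:b0:be:cc
# OFCTL_OUTPUT stands in for the live command output; lines that do not
# start with a numeric port (eg, "LOCAL(...)", capability lines) are
# skipped by the pattern.
OFCTL_OUTPUT=' 1(vethp10840eth0): addr:4a:87:60:b0:be:cc
 2(vethp10840eth1): addr:aa:14:8e:d3:3d:d8'

PORT_MAP=$(echo "${OFCTL_OUTPUT}" |
    sed -n 's/^ *\([0-9][0-9]*\)(\([^)]*\)): addr:\(.*\)$/\1,\2,\3/p')
# eg, 1,vethp10840eth0,4a:87:60:b0:be:cc
```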
Finally, it looks like combining Docker and Open vSwitch is topical, with SocketPlane founded to explore that (started by some OpenDaylight developers).
Also in the same sort of space is Zettio's Weave for linking containers across multiple hosts. (It appears not to be using Open vSwitch, but some sort of home-grown UDP encapsulation.)