A short talk, with code, on creating and using network namespaces from Go.
This post begins with a question Max Weber asked his students in 1917:
“Does it mean that we, today, for instance, everyone sitting in this hall, have a greater knowledge of the conditions of life under which we exist than has an American Indian or a [Khoisan]? Hardly. Unless he is a physicist, one who rides on the streetcar has no idea how the car happened to get into motion. And he does not need to know. He is satisfied that he may ‘count’ on the behavior of the streetcar, and he orients his conduct according to this expectation; but he knows nothing about what it takes to produce such a car so that it can move.”[1]
Modernity, and software in particular, requires us to ‘count’ on a good many under-understood streetcars. And as software engineers, we’re usually just one twist away from having to fix, to misuse, or to rebuild any one of them.
So in this post, the streetcar is Docker or any other Linux container runtime. And the motor we’ll be misusing, somewhere inside it all, is the Linux network namespace.
Code/TLDR: github.com/hblanks/sketches/2020-02-21-netns
Motivation
The author/ex-CDN[2] engineer’s work on the hobbyist control plane has continued, reaching the predictable point where one Go program (the “agent”) must create interfaces with arbitrary addresses and keep those addresses separated by tenant. Because the control plane deploys on Linux, we’ll need to use Linux namespaces to do this – the same technology we use every time we start a Docker container or deploy a Kubernetes pod.
Namespaces are the fundamental primitive for ensuring a container has its own filesystem root, its own process ID space, and (barring things like --net host) its own network addresses and interfaces. But how do you actually work with them? If, instead of riding on the Docker streetcar, you had to create a namespace from your own program, and make it persist, how would you do it?
I began again with a sketch. The goal: a simple tool that would look like this:
$ ./setns -h
Usage: ./build/setns NAMESPACE COMMAND ARG...
Runs COMMAND in the given named network NAMESPACE,
creating the namespace if it doesn't already exist.
Which is to say, something entirely analogous to the ip netns tool offered by iproute2 (ip-netns(8)). But written in Go, since that’s what the control plane is written in.
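A minimal sketch of the argument handling that usage text implies. The helper name parseArgs and the extra name check are assumptions, not taken from the repository; the check is there because the name later becomes part of a path under /var/run/netns.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// parseArgs splits the arguments after argv[0] into a namespace name
// and the command (with its arguments) to run inside that namespace.
// Hypothetical helper: it follows the usage text, plus one extra check,
// because the name is later joined onto a directory path.
func parseArgs(args []string) (string, []string, error) {
	if len(args) < 2 {
		return "", nil, errors.New("expected NAMESPACE COMMAND [ARG...]")
	}
	name := args[0]
	// Reject names that could escape the namespace directory.
	if name == "" || name == "." || name == ".." || strings.ContainsRune(name, '/') {
		return "", nil, fmt.Errorf("invalid namespace name %q", name)
	}
	return name, args[1:], nil
}

func main() {
	name, command, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintf(os.Stderr, "Usage: %s NAMESPACE COMMAND ARG...\n%v\n", os.Args[0], err)
		return
	}
	fmt.Printf("would run %q in namespace %q\n", command, name)
}
```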
Creating namespaces
The first step for the tool, assuming the namespace doesn’t already exist, is to create it. Linux provides two system calls for this, both called (as they must be) from an existing process:
- clone(2) copies your process into a new process in a new namespace, and
- unshare(2) moves your existing process into a new namespace.
clone(2) turns out to be a fascinating streetcar of its own: for Linux, it’s the fundamental primitive for creating light-weight processes, generally known as threads.[3] But because the control plane agent is written in Go, it’s not a good fit: in Go, the runtime alone manages threads, scheduling goroutines on them as it sees fit. It thus won’t work for us to start a new thread every time we want to operate on a different namespace.
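For comparison, Go does expose clone(2)’s namespace flags, though only for child processes: on Linux, os/exec passes syscall.SysProcAttr.Cloneflags through to the clone that creates the child. That suits launching a one-off command in a fresh namespace, but not an agent that must keep switching its own threads between namespaces. A sketch, in which newNetnsCommand is a hypothetical name and starting the command needs CAP_SYS_ADMIN (typically root):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// netnsCloneFlags asks clone(2) to give the child a new network
// namespace, which starts out with only a (down) loopback interface.
const netnsCloneFlags = syscall.CLONE_NEWNET

// newNetnsCommand builds, but does not start, a command whose process
// is cloned into a new network namespace. Nothing else refers to that
// namespace, so it goes away once the last process in it exits.
func newNetnsCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: netnsCloneFlags}
	return cmd
}

func main() {
	// Run as root, this should list only the loopback interface.
	if err := newNetnsCommand("ip", "link").Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run:", err)
	}
}
```

Unlike the named namespaces described later, nothing here persists once the child and its descendants exit.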
In contrast, unshare(2) works within an existing process. A working Go example follows, including the right flags for creating a new network namespace, plus saving off a file descriptor to the original namespace (more on that in a minute):
// Log any new error. For use when closing a file during
// error handling.
func closeFile(f *os.File) {
	if err := f.Close(); err != nil {
		log.Printf("close file error: %v", err)
	}
}

// Unshare into a new namespace, returning the original
// namespace.
func unshare() (*os.File, error) {
	// Namespaces belong to threads: open the calling thread's
	// entry, since /proc/self/ns/net is the main thread's.
	f, err := os.Open("/proc/thread-self/ns/net")
	if err != nil {
		return nil, err
	}
	_, _, e1 := syscall.Syscall(
		syscall.SYS_UNSHARE, syscall.CLONE_NEWNET, 0, 0)
	if e1 != 0 {
		closeFile(f)
		return nil, e1
	}
	return f, nil
}
Joining an existing namespace
If instead the network namespace already exists, the tool needs to join that namespace instead of creating one. For this, Linux provides a slightly different system call, setns(2), which takes two arguments:
- “A file descriptor referring to a namespace,” and
- A flag specifying which type of namespace that descriptor must refer to (network, PID, user, cgroup, mount, etc.), or 0 to accept any type.
A working Go example for changing the network namespace, given an open file, is:
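// Sets namespace to the given open file. SYS_SETNS, the
// setns(2) syscall number, is assumed to be defined
// elsewhere for the target architecture (not shown here).
func setns(f *os.File) error {
	_, _, e1 := syscall.Syscall(
		SYS_SETNS, f.Fd(), syscall.CLONE_NEWNET, 0)
	if e1 != 0 {
		return e1
	}
	return nil
}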
Naming namespaces
So far, we’ve seen how to create namespaces and how to join existing namespaces, assuming we have a file descriptor to that namespace. But how do we get the file descriptor?
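For any namespace attached to a process, it’s very simple: you open the corresponding file under /proc/${PID}/ns, or for your own process, the corresponding file under /proc/self/ns/. (In fact, that’s what we did above to save the original namespace.) One caveat for Go: namespaces actually belong to threads, and /proc/self/ns/ shows those of the process’s main thread. A goroutine that may be running on some other thread should use /proc/thread-self/ns/ (Linux 3.17 and later), which shows the calling thread’s.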
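Opening such a file is enough to join a namespace, but it also helps to check which namespace a file refers to. A sketch, assuming Linux with /proc mounted: namespace files live on the kernel’s nsfs, and two paths refer to the same namespace when their stat(2) device and inode numbers match. That works for /proc entries and for bind mounts like the ones under /var/run/netns; sameNamespace is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// sameNamespace reports whether two namespace files, such as
// /proc/self/ns/net and a bind mount under /var/run/netns, refer to the
// same namespace, by comparing their (device, inode) pairs.
func sameNamespace(a, b string) (bool, error) {
	var sa, sb syscall.Stat_t
	if err := syscall.Stat(a, &sa); err != nil {
		return false, err
	}
	if err := syscall.Stat(b, &sb); err != nil {
		return false, err
	}
	return sa.Dev == sb.Dev && sa.Ino == sb.Ino, nil
}

func main() {
	// The /proc link target also encodes the inode, e.g. "net:[4026531840]".
	if id, err := os.Readlink("/proc/thread-self/ns/net"); err == nil {
		fmt.Println("this thread's network namespace:", id)
	}
	same, err := sameNamespace("/proc/self/ns/net", "/proc/thread-self/ns/net")
	fmt.Println("main thread and this thread share it:", same, err)
}
```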
But if we don’t have a process to refer to, we need to do something more complicated: after we create and enter a namespace, we need to save off that namespace to a different path, so we can refer to it later.
File descriptors aren’t something we generally just “save off”. But sudo strace ip netns add ns0 shows us the way, here with comments added to explicate the trace:
# Create a new, empty file, /var/run/netns/ns0
openat(AT_FDCWD, "/var/run/netns/ns0",
O_RDONLY|O_CREAT|O_EXCL, 000) = 5
close(5) = 0
# Create a new namespace
unshare(CLONE_NEWNET) = 0
# Bind mount the new namespace to /var/run/netns/ns0
mount("/proc/self/ns/net", "/var/run/netns/ns0",
0x562c7c23d9a5, MS_BIND, NULL) = 0
Normally, a namespace is only supposed to last as long as either (1) there’s at least one process in it, or (2) there’s at least one open file descriptor pointing to that namespace. Bind-mounting the namespace’s file, however, lets us persist it even when there are no processes present and no open file descriptors pointing to it.
The working example, then, for not just creating a namespace, but for “naming” it by bind mounting it into /var/run/netns (an arbitrary path, but the same path used by iproute2), is:
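// Enters a new namespace and bind mounts it to
// /var/run/netns/${name}, returning an open file to the
// original namespace.
func createNamespace(name string) (*os.File, error) {
	origF, err := unshare()
	if err != nil {
		return nil, err
	}
	if err := mountNamespaceDir(); err != nil {
		closeFile(origF)
		return nil, err
	}
	nsPath := filepath.Join(nsDir, name)
	// Create an empty placeholder to mount over, as ip does
	// (O_EXCL: fail rather than reuse an existing file).
	f, err := os.OpenFile(nsPath,
		os.O_RDONLY|os.O_CREATE|os.O_EXCL, 0)
	if err != nil {
		closeFile(origF)
		return nil, err
	}
	// Check Close's own error, not the stale one from OpenFile.
	if err := f.Close(); err != nil {
		closeFile(origF)
		return nil, err
	}
	// Bind mount the calling thread's new namespace; the
	// /proc/self entry would name the main thread's instead.
	err = syscall.Mount("/proc/thread-self/ns/net",
		nsPath, "", syscall.MS_BIND, "")
	if err != nil {
		os.Remove(nsPath) // don't leave an unmounted placeholder
		closeFile(origF)
		return nil, err
	}
	return origF, nil
}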
Finally, setNamespace() joins a named network namespace, creating it if necessary:
// Opens the file for a given namespace.
func openNamespace(name string) (*os.File, error) {
	// Use the same directory createNamespace mounts into.
	return os.Open(filepath.Join(nsDir, name))
}

// Sets namespace to /var/run/netns/${name}, creating
// that namespace if necessary.
//
// Returns an open file pointing to the original namespace.
func setNamespace(name string) (*os.File, error) {
	newFile, err := openNamespace(name)
	if os.IsNotExist(err) {
		return createNamespace(name)
	} else if err != nil {
		return nil, err
	}
	// The named namespace's file isn't needed after setns.
	defer closeFile(newFile)
	// Save this thread's current namespace (see unshare()).
	origFile, err := os.Open("/proc/thread-self/ns/net")
	if err != nil {
		return nil, err
	}
	if err := setns(newFile); err != nil {
		closeFile(origFile)
		return nil, err
	}
	return origFile, nil
}
A final wrinkle: LockOSThread()
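One last and Go-specific concern: namespace membership belongs to individual threads (light-weight processes, in kernel terms), so the system calls above all expect to be made from the same thread. But in Go, the scheduler can and does move goroutines transparently from one thread to another. So by default, we have no guarantee that our syscalls will all happen on the same thread, or that other goroutines won’t later be scheduled onto a thread whose namespace we’ve changed.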
Thankfully, runtime.LockOSThread() pins a goroutine to its current thread, and since Go 1.10, a goroutine that exits while still locked takes its thread with it instead of returning it to the scheduler. Thus, the top-level function in our example obtains this lock, joins the namespace, and calls exec(2) to replace our process with the user-supplied command:
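// Executes a command in a given namespace.
func execNamespace(name string, args []string) error {
	// Pin this goroutine to its thread for good. There is
	// deliberately no UnlockOSThread: once setNamespace runs,
	// the thread may be in another namespace and must not be
	// handed back to the scheduler if exec fails.
	runtime.LockOSThread()
	f, err := setNamespace(name)
	if err != nil {
		return err
	}
	defer closeFile(f)
	arg0, err := exec.LookPath(args[0])
	if err != nil {
		return err
	}
	// On success, Exec never returns.
	return syscall.Exec(arg0, args, os.Environ())
}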
So ends this short sketch and discussion of network namespaces, finished during one more winter visit to London, a home for some three years. Working code remains in github.com/hblanks/sketches/2020-02-21-netns, and all comments or corrections are welcome as issues there, by email, or wherever this ends up on lobste.rs.
Errata
- 2020-03-06: when integrating this code, I ran into a bug in createNamespace(), fixed in 3afba981.
Sources
- namespaces(7)
- network_namespaces(7)
- clone(2)
- unshare(2)
- setns(2)
- ip-netns(8)
- Code samples: hblanks/sketches/2020-02-21-netns
[1] Max Weber. “Science as a Vocation,” in From Max Weber: Essays in Sociology. Trans. and ed. by H. H. Gerth and C. Wright Mills. New York: Oxford University Press, 1946, p. 136. (My emphasis.) This is a quote from an altogether extraordinary and prescient essay about what science once meant and was already ceasing to mean by the nadir of the First World War. I first read it for a class audited in 2003; I still think about it today.
[2] I think this is where I’m supposed to say, “It wasn’t a CDN, it was an intelligent edge network.” Comments on how to rebrand the hobbyist control plane similarly, both for better VC pitches and blog posts, are much sought and appreciated. :-)
[3] Top Google results and in-depth articles on clone(2) include Chris Wellons’s Raw Linux Threads Via System Calls and Eli Bendersky’s Launching Linux threads and processes with clone.