Mind the Gap: 2014

Tuesday, August 19, 2014

A simple dependency injection convention for Go

Steve Powell and I are experimenting with a minimal DI convention for Go which simplifies how we construct values of Go interfaces and use mocks of dependencies during testing.

We are building a layered component and need to test the packages in isolation from each other. So we've introduced interfaces. Each interface usually has one exported creation function named something like NewXXX() which constructs a value of the interface. Packages in higher layers depend on interface values from packages in lower layers and so we pass these values into the creation function.

These NewXXX()functions are tied to the implementation of a package, so we wanted some way to organise the way in which these functions are used. The convention makes these functions private, with names like newXXX(), and then provides separate wiring functions which are used by higher layers and test code.

Although we work for SpringSource, we were hesitant to introduce code generation or reflective "magic" to do the wiring as those seemed contrary to Go's design philosophy of keeping things simple (even if a bit more code needs writing). So we made sure the wiring functions are patterned and simple to read and write.

The lowest layers have trivial wiring functions, for example:

func Wire() (Fileutils, error) {

return WireWith()

}

func WireWith() (Fileutils, error) {

return newFileutils()

}

whereas higher layers are somewhat more involved although extremely patterned and the wiring functions simply call lower level wiring functions, for example:

func Wire(depotPath string, rwBaseDir string) (warden.Backend,

error) {

rootfs, err := rootfs.Wire(rwBaseDir)

if err != nil {

return nil, err

}

configBuilder, err := config_builder.Wire()

if err != nil {

return nil, err

}

return WireWith(depotPath, rootfs, configBuilder)

}

func WireWith(depotPath string, rootfs rootfs.RootFS, configBuilder

config_builder.ConfigBuilder) (warden.Backend,

error) {

return newGuardianBackend(depotPath, rootfs, configBuilder)

}

Note: the examples above use standard Go errors, but our code actually uses a special error type described in a previous blog.

For a detailed description of the convention, see this commit log.

Tuesday, July 01, 2014

Scoping the libcontainer API

In order to use libcontainer as the basis for a new Warden backend, it needs upgrading. The requirements for a Container API over and above what libcontainer already provides are:

1. Multiple user processes: It must be possible to run multiple user processes in the same Container, either serially or in parallel. All user processes must run in the Container’s namespaces and control groups (so that they are all subject to the same isolation requirements and resource limits). User processes are peers in the sense that they share a common parent and one may terminate without terminating any others.

2. Container life cycle: It must be possible to create a new Container with no user processes. A Container should continue to exist after user processes have terminated.

3. Dynamic reconfiguration: It must be possible to reconfigure the Container after it has been created. There are some fundamental limitations in what can be reconfigured, but these must be kept to a bare minimum. In particular, it must be possible to reconfigure the Container’s control groups and network settings.

4. Container file system: It must be possible to share a single read-only root file system across multiple Container instances. A Container’s file system must be updatable and any updates made in one Container instance must not be visible to other Container instances.

5. Copying and streaming files: It must be possible to copy files and directories between the host and Container file systems. It must also be possible to stream files from the Container’s file system back to the host’s, for example to tail a log file.

Some of these requirements turn out to be beyond the scope of libcontainer as envisaged by its maintainers. In particular, since libcontainer's only consumer is Docker and Docker deals with file system layers, libcontainer avoids managing the read-write layer necessary to satisfy requirement 4. Requirement 5 is closely related to requirement 4.

Similarly, naming and managing collections of Container instances is deemed to be out of scope for libcontainer. So we envisage building an "ideal" Container API at a higher level than libcontainer and reusing libcontainer to provide the core function for a single Container:

It's conceivable that in the future when libcontainer has more consumers, it could step up to requirements 4 and 5, in which case it may be possible to move some of our higher level components down into libcontainer.

With the scope of libcontainer clarified, we are producing another revision of our API proposal which should be ready tomorrow. The first version was captured in a Google document; the next version will be a pull request.

Thursday, June 19, 2014

Containers in Cloud Foundry: warden meets libcontainer

Pivotal has decided to merge Cloud Foundry's container technology with Docker's. I'm delighted to be working on this project with my friend and colleague Steve Powell. This blog sketches our initial thoughts.

Steve and I have been exploring how to improve CF's Warden container component to make it a suitable base for future maintenance and extension. Warden is functionally rich, but needs some improvements in robustness, diagnostics, testing, and so on. Rewriting Warden was initially attractive, but would be unlikely to gain much traction in the open source community. Reusing Docker's container technology turned out to be the best option.

Now that Docker has split out the libcontainer project, basing a new Warden Linux backend on libcontainer is feasible: CF will benefit from container backend improvements in libcontainer and libcontainer will be strengthened by supporting the CF use case.

Introduction to Warden

Warden defines an external interface in terms of Google protocol buffers and implements a server which receives protocol buffer requests from clients such as Cloud Foundry's Droplet Execution Agent (DEA) component:

Detailed container management is delegated to a warden backend. There are backends for Linux and Windows. The Linux backend creates containers each of which contains a warden shell daemon (wshd) which is responsible for managing applications and performing other functions inside the container. The backend connects to wshd via TCP/IP using a file-based socket. The container is implemented in terms of Linux primitives such as namespaces and control groups.

Introduction to libcontainer

The docker daemon delegates to an execution driver to create and manage containers. The default "native" execution driver is based on the reusable library of container management functions known as libcontainer:

libcontainer implements its containers in terms of similar Linux primitives to those used by Warden: namespaces, control groups, etc.

The Way Forward

We plan to extend libcontainer on two fronts:

functionally - to close the gaps relative to warden, and
non-functionally - to improve robustness, maintainability, and serviceability.

We'll replace the existing Warden Linux backend with a new Warden backend based on the extended libcontainer. We hope that Docker will also be able to exploit the extensions to libcontainer, for instance by exposing new monitoring, management, and diagnostic capabilities.

In simple terms, the new Warden backend will have a layered structure:

Note that the libcontainer API does not yet exist, but is currently being proposed and discussed.

Initially, we plan to use the unextended libcontainer API to launch an agent which will play a similar role to that which wshd did previously:

Eventually, it may be possible to merge the agent functions into libcontainer, which would make the agent's active management functions more easily available to other users of libcontainer, such as Docker:

Tuesday, April 29, 2014

Gathering diagnostic context: an improved error idiom for Go

After discussing Better error handling idioms in Go and casting around a while, nothing turned up which was ideally suited to our project, so a colleague and I implemented the gerror package which captures stack traces when errors are created and enables errors to be identified without relying on the error message content.

So how does it look to a user? The first code to use it is a fileutils package which implements a file copy function in pure Go (no "shelling out" to cp). Let's take a look at a typical piece of error handling:

src, err := os.Open(source)
if err != nil {
return gerror.NewFromError(ErrOpeningSourceDir, err)
}

What does this achieve over and above the normal Go idiom of simply returning err if it is non-nil?

Firstly, it captures a stack trace in the error which appears when the error is logged (see below for an example). This gives the full context of the error which, in a large project, avoids guesswork and saves time locating the source of the error.

Secondly, it associates a "tag", in the form of an error identifier, with the error. Callers can use the tag if they want to check for particular errors programmatically:

gerr := fileutils.Copy(destPath, srcPath)
if gerr.EqualTag(fileutils.ErrOpeningSourceDir) {
...
}

Thirdly, the resultant error conforms to the builtin error interface and so can be returned or passed around wherever an error is expected.

Fourthly, by defining a function's error return type to be gerror.Gerror, the compiler prevents a "vanilla" error being returned accidentally from the function, which is useful when we want to ensure that all errors have stack traces and tags.

So how are error identifier tags defined? It's easy, as this code from fileutils shows:

type ErrorId int

const (
ErrFileNotFound ErrorId = iota
ErrOpeningSourceDir
...
)

Note that this has an advantage over the approach of using variables to refer to specific errors - variables can be overwritten (see this example of issue 7885), whereas constants cannot.

When errors are constructed, the gerror package stores the tag and its type. Both the tag and its type are included in the error string (returned, as usual, by the Error method) and are used when checking for equality in the EqualTag method.

The tag type is logically of the form package.Type which could be ambiguous if two packages had the same name, but the stack trace avoids the ambiguity. For example the following stack trace of a "file not found" error makes it clear that the tag type fileutils.ErrorId refers to the type in the package github.com/cf-guardian/guardian/kernel/fileutils:

0 fileutils.ErrorId: Error caused by: lstat /tmp/fileutils_test-027950024/src.file: no such file or directory
goroutine 8 [running]:
github.com/cf-guardian/guardian/gerror.NewFromError(0xe49a0, 0x0, 0x3484c8, 0xc21000aa80, 0x3484c8, ...)
/Users/gnormington/go/src/github.com/cf-guardian/guardian/gerror/gerror.go:68 +0x8d
github.com/cf-guardian/guardian/kernel/fileutils.fileMode(0xc21000a960, 0x26, 0xc21000a9c0, 0x29, 0x0)
/Users/gnormington/go/src/github.com/cf-guardian/guardian/kernel/fileutils/fileutils.go:196 +0x6b
github.com/cf-guardian/guardian/kernel/fileutils.doCopy(0xc21000a9c0, 0x29, 0xc21000a960, 0x26, 0xc21000a960, ...)
/Users/gnormington/go/src/github.com/cf-guardian/guardian/kernel/fileutils/fileutils.go:68 +0x87
github.com/cf-guardian/guardian/kernel/fileutils.Copy(0xc21000a9c0, 0x29, 0xc21000a960, 0x26, 0x29, ...)
/Users/gnormington/go/src/github.com/cf-guardian/guardian/kernel/fileutils/fileutils.go:61 +0x14f
...

The gerror package is available as open source on github and is licensed under the Apache v2 license. It's really a starting point and others are free to use it "as is" or adapt it to their own needs. If you have an improvement you think we might like, please read our contribution guidelines and send us a pull request.

Friday, April 11, 2014

Better error handling idioms in Go

How often have you seen, or written, Go code like this?

file, err := os.Open("someFile")

if err != nil {

return err

}

Explicit, inline error handling is necessary since Go doesn't have exceptions. The code is sufficient for small programs, even if the error is returned from more than one level of function call. Hopefully at some point the error is logged and it's fairly easy then to guess what caused the error, especially since os.Open returns a failure helpful error string, for example:

"open someFile: No such file or directory"

However, in larger programs, this approach breaks down. There tend to be too many calls which could have returned the error and (an unbounded amount of) work has to be done to isolate the failing call.

We'd like to see a stack trace, so let's add one as soon as we detect the error.

file, err := os.Open("someFile")

if err != nil {

var stack [4096]byte

runtime.Stack(stack[:], false)

log.Printf("%q\n%s\n", err, stack[:])

return err

}

The resultant log looks something like this (at least in the playground):

2009/11/10 23:00:00 "open someFile: No such file or directory"

goroutine 1 [running]:
main.main()
 /tmpfs/gosandbox-xxx/prog.go:15 +0xe0
runtime.main()
 /tmp/sandbox/go/src/pkg/runtime/proc.c:220 +0x1c0
runtime.goexit()
 /tmp/sandbox/go/src/pkg/runtime/proc.c:1394

which is better than simply seeing the error message from os.Open.

Clearly this is too much code to write after each call, but some of the code can be moved into a custom error type. (Also 4K isn't enough to capture deep stack traces which is a shame when there is enough free memory available. Maybe there's room for improvement in the runtime package?)

An important consideration is that of "soft" errors - errors which don't appear to need diagnosing at one level of the stack, but which turn out to be more serious from the perspective of one (or more) of the callers. It will probably be too expensive to capture a stack trace every time an error is detected. But it may be sufficient for the first caller which regards the error as serious to capture the stack trace. The combination of a stack trace of this caller and a reasonably helpful error message may be good enough in most cases of soft errors.

Another consideration is logging of errors. It can be very distracting to see the same error logged over and over again. So it might be necessary to keep state in an error to record whether it has already been logged.

I'm interested to hear what error handling practices are evolving in the Go community. An early blog acknowledges the problem:

It is the error implementation's responsibility to summarize the context.

but doesn't address the difficulty of large codebases where the immediate program context isn't always sufficient to diagnose problems.

Some will argue for adding exceptions to Go, but I think that may be overkill, especially for soft errors. I like explicit error handling as it encourages good recovery logic. However, there may be room for improvement in the way the context of an error can be captured. Let's see what nice idioms are beginning to emerge...

Mind the Gap