Tuesday, April 29, 2014

Gathering diagnostic context: an improved error idiom for Go

After discussing "Better error handling idioms in Go" and casting around for a while, we found nothing ideally suited to our project, so a colleague and I implemented the gerror package. It captures stack traces when errors are created and enables errors to be identified without relying on the content of the error message.

So how does it look to a user? The first code to use it is a fileutils package which implements a file copy function in pure Go (no "shelling out" to cp). Let's take a look at a typical piece of error handling:

    src, err := os.Open(source)
    if err != nil {
        return gerror.NewFromError(ErrOpeningSourceDir, err)
    }

What does this achieve over and above the normal Go idiom of simply returning err if it is non-nil?

Firstly, it captures a stack trace in the error which appears when the error is logged (see below for an example). This gives the full context of the error which, in a large project, avoids guesswork and saves time locating the source of the error.

Secondly, it associates a "tag", in the form of an error identifier, with the error. Callers can use the tag if they want to check for particular errors programmatically:

    gerr := fileutils.Copy(destPath, srcPath)
    if gerr != nil && gerr.EqualTag(fileutils.ErrOpeningSourceDir) {
        ...
    }

Thirdly, the resulting error satisfies the built-in error interface, so it can be returned or passed around wherever an error is expected.

Fourthly, by defining a function's error return type to be gerror.Gerror, the compiler prevents a "vanilla" error being returned accidentally from the function, which is useful when we want to ensure that all errors have stack traces and tags.
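
The third and fourth points can be sketched with a hypothetical Gerror interface (not the real gerror definition). Because it embeds error, any value satisfying it can be used as an error; because it adds an EqualTag method, a function declared to return it cannot accidentally return a plain error. The tagged type and copyFile function are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Gerror is a hypothetical interface in the spirit of gerror.Gerror. It
// embeds error, so every Gerror can be used wherever an error is expected,
// and adds a method that plain errors lack.
type Gerror interface {
	error
	EqualTag(tag interface{}) bool
}

// ErrorId is an example tag type, as in fileutils.
type ErrorId int

const ErrOpeningSourceDir ErrorId = 1

// tagged is a hypothetical implementation of Gerror.
type tagged struct {
	tag   interface{}
	cause error
}

func (t *tagged) Error() string {
	return fmt.Sprintf("%v %T: %v", t.tag, t.tag, t.cause)
}

func (t *tagged) EqualTag(tag interface{}) bool {
	return t.tag == tag
}

// Compile-time check that *tagged satisfies Gerror (and therefore error).
var _ Gerror = (*tagged)(nil)

// copyFile is a hypothetical function whose error result must be a Gerror.
func copyFile(fail bool) Gerror {
	if !fail {
		return nil
	}
	cause := errors.New("no such file")
	// return cause // does not compile: error does not implement Gerror
	return &tagged{tag: ErrOpeningSourceDir, cause: cause}
}

func main() {
	var err error = copyFile(true) // a Gerror is assignable to error
	fmt.Println(err)
}
```

A nil Gerror converts to a nil error, so callers that only know about error can still use the usual err != nil check.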

So how are error identifier tags defined? It's easy, as this code from fileutils shows:

    type ErrorId int

    const (
        ErrFileNotFound ErrorId = iota
        ErrOpeningSourceDir
        ...
    )

Note that this has an advantage over using variables to refer to specific errors: variables can be overwritten (see this example of issue 7885), whereas constants cannot.
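
A small demonstration of the difference, using a hypothetical ErrNotFound sentinel variable rather than any real package's error: once the variable is reassigned, an error returned earlier no longer compares equal to it, even though the messages match. A constant tag such as ErrFileNotFound cannot be reassigned at all; the assignment is left commented out because it would not compile.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound follows the common idiom of a sentinel error held in a
// package-level variable. It is hypothetical, not from a real package.
var ErrNotFound = errors.New("not found")

// ErrorId is a constant tag type in the style of fileutils.ErrorId.
type ErrorId int

const (
	ErrFileNotFound ErrorId = iota
	ErrOpeningSourceDir
)

// lookup is a hypothetical function that reports the sentinel error.
func lookup() error { return ErrNotFound }

// clobber simulates any code in the program reassigning the variable.
func clobber() { ErrNotFound = errors.New("not found") }

func main() {
	err := lookup()
	clobber()
	// Same message, but the identity check callers rely on now fails.
	fmt.Println(err.Error() == ErrNotFound.Error(), err == ErrNotFound) // true false

	// A constant cannot be reassigned; this line would not compile:
	// ErrFileNotFound = ErrOpeningSourceDir
	fmt.Println(ErrFileNotFound != ErrOpeningSourceDir) // true
}
```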

When errors are constructed, the gerror package stores the tag and its type. Both the tag and its type are included in the error string (returned, as usual, by the Error method) and are used when checking for equality in the EqualTag method.

The tag type is logically of the form package.Type, which could be ambiguous if two packages had the same name, but the stack trace resolves the ambiguity. For example, the following stack trace of a "file not found" error makes it clear that the tag type fileutils.ErrorId refers to the type in the package github.com/cf-guardian/guardian/kernel/fileutils:

    0 fileutils.ErrorId: Error caused by: lstat /tmp/fileutils_test-027950024/src.file: no such file or directory
    goroutine 8 [running]:
    github.com/cf-guardian/guardian/gerror.NewFromError(0xe49a0, 0x0, 0x3484c8, 0xc21000aa80, 0x3484c8, ...)
        /Users/gnormington/go/src/github.com/cf-guardian/guardian/gerror/gerror.go:68 +0x8d
    github.com/cf-guardian/guardian/kernel/fileutils.fileMode(0xc21000a960, 0x26, 0xc21000a9c0, 0x29, 0x0)
        /Users/gnormington/go/src/github.com/cf-guardian/guardian/kernel/fileutils/fileutils.go:196 +0x6b
    github.com/cf-guardian/guardian/kernel/fileutils.doCopy(0xc21000a9c0, 0x29, 0xc21000a960, 0x26, 0xc21000a960, ...)
        /Users/gnormington/go/src/github.com/cf-guardian/guardian/kernel/fileutils/fileutils.go:68 +0x87
    github.com/cf-guardian/guardian/kernel/fileutils.Copy(0xc21000a9c0, 0x29, 0xc21000a960, 0x26, 0x29, ...)
        /Users/gnormington/go/src/github.com/cf-guardian/guardian/kernel/fileutils/fileutils.go:61 +0x14f
                            ...

The gerror package is available as open source on github and is licensed under the Apache v2 license. It's really a starting point and others are free to use it "as is" or adapt it to their own needs. If you have an improvement you think we might like, please read our contribution guidelines and send us a pull request.

Friday, April 11, 2014

Better error handling idioms in Go

How often have you seen, or written, Go code like this?

    file, err := os.Open("someFile")
    if err != nil {
        return err
    }

Explicit, inline error handling is necessary since Go doesn't have exceptions. This code is sufficient for small programs, even if the error is returned through more than one level of function call. Hopefully the error is logged at some point, and then it's fairly easy to guess what caused it, especially since os.Open returns a helpful error string, for example:

"open someFile: No such file or directory"

However, in larger programs this approach breaks down. There tend to be too many calls that could have returned the error, and an unbounded amount of work may be needed to isolate the failing call.

We'd like to see a stack trace, so let's add one as soon as we detect the error.

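    file, err := os.Open("someFile")
    if err != nil {
        // Capture up to 4 KB of the current goroutine's stack.
        var stack [4096]byte
        n := runtime.Stack(stack[:], false)
        // Log only the n bytes actually written, not the whole buffer.
        log.Printf("%q\n%s\n", err, stack[:n])
        return err
    }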

The resultant log looks something like this (at least in the playground):

    2009/11/10 23:00:00 "open someFile: No such file or directory"
    goroutine 1 [running]:
    main.main()
        /tmpfs/gosandbox-xxx/prog.go:15 +0xe0
    runtime.main()
        /tmp/sandbox/go/src/pkg/runtime/proc.c:220 +0x1c0
    runtime.goexit()
        /tmp/sandbox/go/src/pkg/runtime/proc.c:1394

This is better than simply seeing the error message from os.Open.

Clearly this is too much code to write after each call, but some of it can be moved into a custom error type. (Also, a 4 KB buffer isn't enough to capture deep stack traces, which is a shame when plenty of free memory is available. Maybe there's room for improvement in the runtime package?)

An important consideration is that of "soft" errors - errors which don't appear to need diagnosing at one level of the stack, but which turn out to be more serious from the perspective of one (or more) of the callers. It will probably be too expensive to capture a stack trace every time an error is detected. But it may be sufficient for the first caller which regards the error as serious to capture the stack trace. The combination of a stack trace of this caller and a reasonably helpful error message may be good enough in most cases of soft errors.

Another consideration is logging of errors. It can be very distracting to see the same error logged over and over again. So it might be necessary to keep state in an error to record whether it has already been logged.

I'm interested to hear what error handling practices are evolving in the Go community. An early blog post acknowledges the problem:

"It is the error implementation's responsibility to summarize the context."

However, it doesn't address the difficulty of large codebases, where the immediate program context isn't always sufficient to diagnose problems.

Some will argue for adding exceptions to Go, but I think that may be overkill, especially for soft errors. I like explicit error handling as it encourages good recovery logic. However, there may be room for improvement in the way the context of an error can be captured. Let's see what nice idioms are beginning to emerge...