Category: Programming

Zero values in Go and Lazy Initialization

Posted on 2018-12-172018-12-25 by Roberto's Chronicles

I’m a big fan of the way Go does zero values, meaning it initializes every variable to a default value. This is in contrast with the way other languages such as, say, C behave. For instance, the printed result of the following C program is unpredictable.

#include <stdio.h>

int main(void) {
    int i;
    printf("%d\n", i);
    return 0;

}

The value of i will be whatever happens to be at the position in memory where the compiler happened to allocate the variable. Contrast this with the equivalent program in Go —

package main

import "fmt"

func main() {
    var i int
    fmt.Println(i)
}

This will always print 0 because i is initialized by the compiler to the default value of an int, which happens to be 0.

This happens for every variable of any type, including our own custom types. What’s even cooler is that this is done recursively, so if you the fields inside a struct will also be themselves initialized.

I strive to make all my zero values useful, but it’s not always that simple. Sometimes you need to use different default values for your fields, or maybe you need to initialize one of those fields. This is especially important when we remember that the zero value of a pointer is nil.

Imagine the following type —

type Foobar struct {
    db *DB
}

func NewFoobar() *Foobar {
    return &Foobar{db: DB.New()}
}

func (f *Foobar) Get(key string) (*Foo, error) {
    foo, err := db.Get(key)
    if err != nil {
        return nil, err
    }
    return foo, nil
}

In the example above, our zero value is no longer useful: we’d cause a runtime error because db will be nil inside the Get() function. We’re forced to call NewFoobar() before using our functions.

But there’s a simple trick to make the Foobar zero value useful again. As it turns out, being lazy sometimes pays off. Our technique is called lazy initialization —

type Foobar struct {
    dbOnce sync.Once
    db *DB
}

// lazy initialize db
func (f *Foobar) lazyInit() {
    f.dbOnce.Do(func() {
        f.db = DB.New()
    })
}

We added a sync.Once to our type. From the Go docs:

Once an object that will perform exactly one action.

The function we pass to sync.Once.Do() is guaranteed to run once and only once, so it is perfect for initializations. Now we can call lazyInit() at the top of our exported function and it will ensure db is initialized —

func (f *Foobar) Get(key string) (*Foo, error) {
    f.lazyInit()

    foo, err := db.Get(key)
    if err != nil {
        return nil, err
    }
    return foo, nil
}

...

var f Foobar
foo, err := f.Get("baz")

We are now free to use our zero value with no additional initialization. I love it.

Of course, it is not always possible to use zero values. For example, our Foobar assumes a magical object DB that can be initialized by itself, but in real life we probably need to connect to an external database, authenticate, etc and then pass the created DB to our Foobar.

Still, using lazy initialization allows us to make a lot of objects’ zero values useful that would otherwise not be.

Playing with Go module proxies

Posted on 2018-08-292019-11-05 by Roberto's Chronicles

(This article has been graciously translated to Russian here. Huge thanks to Akhmad Karimov.)

I wrote a brief introduction to Go modules and in it I talked briefly about Go modules proxies and now that Go 1.11 is out, I thought I’d play a bit these proxies to figure our how they’re supposed to work.

Why

One of the goals of Go modules is to provide reproducible builds and it does a very good job by fetching the correct and expected files from a repository.

But what if the servers are offline? What if the repository simply vanishes?

One way teams deal with these risks is by vendoring the dependencies, which is fine. But Go modules offers another way: the use of a module proxy.

The Download Protocol

When Go modules support is enabled and the go command determines that it needs a module, it first looks at the local cache (under $GOPATH/pkg/mods). If it can’t find the right files there, it then goes ahead and fetches the files from the network (i.e. from a remote repo hosted on Github, Gitlab, etc.)

If we want to control what files go can download, we need to tell it to go through our proxy by setting the GOPROXY environment variable to point to our proxy’s URL. For instance:

export GOPROXY=http://gproxy.mycompany.local:8080

The proxy is nothing but a web server that responds to the module download protocol, which is a very simple API to query and fetch modules. The web server may even serve static files.

A typical scenario would be the go command trying to fetch github.com/pkg/errors:

The first thing go will do is ask the proxy for a list of available versions. It does this by making a GET request to /{module name}/@v/list. The server then responds with a simple list of versions it has available:

v0.8.0
v0.7.1

The go will determine which version it wants to download — the latest unless explicitly told otherwise¹. It will then request information about that given version by issuing a GET request to /{module name}/@v/{module revision} to which the server will reply with a JSON representation of the struct:

type RevInfo struct {
    Version string    // version string
    Name    string    // complete ID in underlying repository
    Short   string    // shortened ID, for use in pseudo-version
    Time    time.Time // commit time
}

So for instance, we might get something like this:

{
    "Version": "v0.8.0",
    "Name": "v0.8.0",
    "Short": "v0.8.0",
    "Time": "2018-08-27T08:54:46.436183-04:00"
}

The go command will then request the module’s go.mod file by making a GET request to /{module name}/@v/{module revision}.mod. The server will simply respond with the contents of the go.mod file (e.g. module github.com/pkg/errors.) This file may list additional dependencies and the cycle restarts for each one.

Finally, the go command will request the actual module by getting /{module name}/@v/{module revision}.zip. The server should respond with a byte blob (application/zip) containing a zip archive with the module files where each file must be prefixed by the full module path and version (e.g. github.com/pkg/errors@v0.8.0/), i.e. the archive should contain:

github.com/pkg/errors@v0.8.0/example_test.go
github.com/pkg/errors@v0.8.0/errors_test.go
github.com/pkg/errors@v0.8.0/LICENSE
...

And not:

errors/example_test.go
errors/errors_test.go
errors/LICENSE
...

This seems like a lot when written like this, but it’s in fact a very simple protocol that simply fetches 3 or 4 files:

The list of versions (only if go does not already know which version it wants)
The module metadata
The go.mod file
The module zip itself

Creating a simple local proxy

To try out the proxy support, let’s create a very basic proxy that will serve static files from a directory. First we create a directory where we will store our in-site copies of our dependencies. Here’s what I have in mine:

$ find . -type f
./github.com/robteix/testmod/@v/v1.0.0.mod
./github.com/robteix/testmod/@v/v1.0.1.mod
./github.com/robteix/testmod/@v/v1.0.1.zip
./github.com/robteix/testmod/@v/v1.0.0.zip
./github.com/robteix/testmod/@v/v1.0.0.info
./github.com/robteix/testmod/@v/v1.0.1.info
./github.com/robteix/testmod/@v/list

These are the files our proxy will serve. You can find these files on Github if you’d like to play along. For the examples below, let’s assume we have a devel directory under our home directory; adapt accordingly.

$ cd $HOME/devel
$ git clone https://github.com/robteix/go-proxy-blog.git

Our proxy server is simple (it could be even simpler, but I wanted to log the requests):

package main

import (
    "flag"
    "log"
    "net/http"
)

func main() {
    addr := flag.String("http", ":8080", "address to bind to")
    flag.Parse()

    dir := "."
    if flag.NArg() > 0 {
        dir = flag.Arg(0)
    }

    log.Printf("Serving files from %s on %s\n", dir, *addr)

    h := handler{http.FileServer(http.Dir(dir))}

    panic(http.ListenAndServe(*addr, h))
}

type handler struct {
    h http.Handler
}

func (h handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    log.Println("New request:", r.URL.Path)
    h.h.ServeHTTP(w, r)
}

Now run the code above:

$ go run proxy.go -http :8080 $HOME/devel/go-proxy-blog
2018/08/29 14:14:31 Serving files from /home/robteix/devel/go-proxy-blog on :8080

$ curl http://localhost:8080/github.com/robteix/testmod/@v/list
v1.0.0
v1.0.1

Leave the proxy running and move to a new terminal. Now let’s create a new test program. we create a new directory $HOME/devel/test and create a file named test.go inside it with the following code:

package main

import (
    "github.com/robteix/testmod"
)

func main() {
    testmod.Hi("world")
}

And now, inside this directory, let’s enable Go modules:

$ go mod init test

And we set the GOPROXY variable:

export GOPROXY=http://localhost:8080

Now let’s try building our new program:

$ go build
go: finding github.com/robteix/testmod v1.0.1
go: downloading github.com/robteix/testmod v1.0.1

And if you check the output from our proxy:

2018/08/29 14:56:14 New request: /github.com/robteix/testmod/@v/list
2018/08/29 14:56:14 New request: /github.com/robteix/testmod/@v/v1.0.1.info
2018/08/29 14:56:14 New request: /github.com/robteix/testmod/@v/v1.0.1.mod
2018/08/29 14:56:14 New request: /github.com/robteix/testmod/@v/v1.0.1.zip

So as long as GOPROXY is set, go will only download files from out proxy. If I go ahead and delete the repository from Github, things will continue to work.

Using a local directory

It is interesting to note that we don’t even need our proxy.go at all. We can set GOPROXY to point to a directory in the filesystem and things will still work as expected:

export GOPROXY=file://home/robteix/devel/go-proxy-blog

If we do a go build now², we’ll see exactly the same thing as with the proxy:

$ go build
go: finding github.com/robteix/testmod v1.0.1
go: downloading github.com/robteix/testmod v1.0.1

Of course, in real life, we probably will prefer to have a company/team proxy server where our dependencies are stored, because a local directory is not really much different from the local cache that go already maintains under $GOPATH/pkg/mod, but still, nice to know that it works.

There is a project called Athens that is building a proxy and that aims — if I don’t misunderstand it — to create a central repository of packages à la npm.

Remember that somepackage and somepackage/v2 are treated as different packages. ↩
That’s not strictly true as now that we’ve already built it once, go has cached the module locally and will not go to the proxy (or the network) at all. You can still force it by deleting $GOPATH/pkg/mod/cache/download/github.com/robteix/testmod/ and $GOPATH/pkg/mod/github.com/robteix/testmod@v1.0.1) ↩

Introduction to Go Modules

Posted on 2018-08-182022-02-10 by Roberto's Chronicles

This post is also available in other languages:

Russian: Введение в модули Go
Uzbek: Go modullariga kirish

The upcoming version 1.11 of the Go programming language will bring experimental support for modules, a new dependency management system for Go. A few days ago, I wrote a quick post about it. Since that post went live, things changed a bit and as we’re now very close to the new release, I thought it would be a good time for another post with a more hands-on approach. So here’s what we’ll do: we’ll create a new package and then we’ll make a few releases to see how that would work.

Creating a Module

So first things first. Let’s create our package. We’ll call it “testmod”. An important detail here: this directory should be outside your $GOPATH because by default, the modules support is disabled inside it. Go modules is a first step in potentially eliminating $GOPATH entirely at some point.

$ mkdir testmod
$ cd testmod

Our package is very simple:

package testmod

import "fmt" 

// Hi returns a friendly greeting
func Hi(name string) string {
   return fmt.Sprintf("Hi, %s", name)
}

The package is done but it is still not a module. Let’s change that.

$ go mod init github.com/robteix/testmod
go: creating new go.mod: module github.com/robteix/testmod

This creates a new file named go.mod in the package directory with the following contents:

module github.com/robteix/testmod

Not a lot here, but this effectively turns our package into a module. We can now push this code to a repository:

$ git init 
$ git add * 
$ git commit -am "First commit" 
$ git push -u origin master

Until now, anyone willing to use this package would go get it:

$ go get github.com/robteix/testmod

And this would fetch the latest code in master. This still works, but we should probably stop doing that now that we have a Better Way™. Fetching master is inherently dangerous as we can never know for sure that the package authors didn’t make change that will break our usage. That’s what modules aims at fixing.

Quick Intro to Module Versioning

Go modules are versioned, and there are some particularities with regards to certain versions. You will need to familiarize yourself with the concepts behind semantic versioning.

More importantly, Go will use repository tags when looking for versions, and some versions are different of others: e.g. versions 2 and greater should have a different import path than versions 0 and 1 (we’ll get to that.) As well, by default Go will fetch the latest tagged version available in a repository. This is an important gotcha as you may be used to working with the master branch. What you need to keep in mind for now is that to make a release of our package, we need to tag our repository with the version. So let’s do that.

Making our first release

Now that our package is ready, we can release it to the world. We do this by using version tags. Let’s release our version 1.0.0:

$ git tag v1.0.0
$ git push --tags

This creates a tag on my Github repository marking the current commit as being the release 1.0.0.

Go doesn’t enforce that in any way, but a good idea is to also create a new branch (“v1”) so that we can push bug fixes to.

$ git checkout -b v1
$ git push -u origin v1

Now we can work on master without having to worry about breaking our release.

Using our module

Now we’re ready to use the module. We’ll create a simple program that will use our new package:

package main

import (
 	"fmt"

 	"github.com/robteix/testmod"
)

func main() {
 	fmt.Println(testmod.Hi("roberto"))
}

Until now, you would do a go get github.com/robteix/testmod to download the package, but with modules, this gets more interesting. First we need to enable modules in our new program.

$ go mod init mod

As you’d expect from what we’ve seen above, this will have created a new go.mod file with the module name in it:

module mod

Things get much more interesting when we try to build our new program:

$ go build
go: finding github.com/robteix/testmod v1.0.0
go: downloading github.com/robteix/testmod v1.0.0

As we can see, the go command automatically goes and fetches the packages imported by the program. If we check our go.mod file, we see that things have changed:

module mod
require github.com/robteix/testmod v1.0.0

And we now have a new file too, named go.sum, which contains hashes of the packages, to ensure that we have the correct version and files.

github.com/robteix/testmod v1.0.0 h1:9EdH0EArQ/rkpss9Tj8gUnwx3w5p0jkzJrd5tRAhxnA=
github.com/robteix/testmod v1.0.0/go.mod h1:UVhi5McON9ZLc5kl5iN2bTXlL6ylcxE9VInV71RrlO8=

Making a bugfix release

Now let’s say we realized a problem with our package: the greeting is missing ponctuation! People are mad because our friendly greeting is not friendly enough. So we’ll fix it and release a new version:

// Hi returns a friendly greeting
func Hi(name string) string {
-       return fmt.Sprintf("Hi, %s", name)
+       return fmt.Sprintf("Hi, %s!", name)
}

We made this change in the v1 branch because it’s not relevant for what we’ll do for v2 later, but in real life, maybe you’d do it in master and then back-port it. Either way, we need to have the fix in our v1 branch and mark it as a new release.

$ git commit -m "Emphasize our friendliness" testmod.go
$ git tag v1.0.1
$ git push --tags origin v1

Updating modules

By default, Go will not update modules without being asked. This is a Good Thing™ as we want predictability in our builds. If Go modules were automatically updated every time a new version came out, we’d be back in the uncivilized age pre-Go1.11. No, we need to tell Go to update a modules for us.

We do this by using our good old friend go get:

run go get -u to use the latest minor or patch releases (i.e. it would update from 1.0.0 to, say, 1.0.1 or, if available, 1.1.0)
run go get -u=patch to use the latest patch releases (i.e., would update to 1.0.1 but not to 1.1.0)
run go get package@version to update to a specific version (say, github.com/robteix/testmod@v1.0.1)

In the list above, there doesn’t seem to be a way to update to the latest major version. There’s a good reason for that, as we’ll see in a bit.

Since our program was using version 1.0.0 of our package and we just created version 1.0.1, any of the following commands will update us to 1.0.1:

$ go get -u
$ go get -u=patch
$ go get github.com/robteix/testmod@v1.0.1

After running, say, go get -u our go.mod is changed to:

module mod
require github.com/robteix/testmod v1.0.1

Major versions

According to semantic version semantics, a major version is different than minors. Major versions can break backwards compatibility. From the point of view of Go modules, a major version is a different package completely. This may sound bizarre at first, but it makes sense: two versions of a library that are not compatible with each other are two different libraries.

Let’s make a major change in our package, shall we? Over time, we realized our API was too simple, too limited for the use cases of our users, so we need to change the Hi() function to take a new parameter for the greeting language:

package testmod

import (
 	"errors"
 	"fmt" 
) 

// Hi returns a friendly greeting in language lang
func Hi(name, lang string) (string, error) {
 	switch lang {
 	case "en":
 		return fmt.Sprintf("Hi, %s!", name), nil
 	case "pt":
 		return fmt.Sprintf("Oi, %s!", name), nil
 	case "es":
 		return fmt.Sprintf("¡Hola, %s!", name), nil
 	case "fr":
 		return fmt.Sprintf("Bonjour, %s!", name), nil
 	default:
 		return "", errors.New("unknown language")
 	}
}

Existing software using our API will break because they (a) don’t pass a language parameter and (b) don’t expect an error return. Our new API is no longer compatible with version 1.x so it’s time to bump the version to 2.0.0.

I mentioned before that some versions have some peculiarities, and this is the case now. Versions 2 and over should change the import path. They are different libraries now.

We do this by appending a new version path to the end of our module name.

module github.com/robteix/testmod/v2

The rest is the same as before, we push it, tag it as v2.0.0 (and optionally create a v2 branch.)

$ git commit testmod.go -m "Change Hi to allow multilang"
$ git checkout -b v2 # optional but recommended
$ echo "module github.com/robteix/testmod/v2" > go.mod
$ git commit go.mod -m "Bump version to v2"
$ git tag v2.0.0
$ git push --tags origin v2 # or master if we don't have a branch

Updating to a major version

Even though we have released a new incompatible version of our library, existing software will not break, because it will continue to use the existing version 1.0.1. go get -u will not get version 2.0.0.

At some point, however, I, as the library user, may want to upgrade to version 2.0.0 because maybe I was one of those users who needed multi-language support.

I do it but modifying my program accordingly:

package main

import (
 	"fmt"
 	"github.com/robteix/testmod/v2" 
)

func main() {
 	g, err := testmod.Hi("Roberto", "pt")
 	if err != nil {
 		panic(err)
 	}
 	fmt.Println(g)
}

And then when I run go build, it will go and fetch version 2.0.0 for me. Notice how even though the import path ends with “v2”, Go will still refer to the module by its proper name (“testmod”).

As I mentioned before, the major version is for all intents and purposes a completely different package. Go modules does not link the two at all. That means we can use two incompatible versions in the same binary:

package main
import (
 	"fmt"
 	"github.com/robteix/testmod"
 	testmodML "github.com/robteix/testmod/v2"
)

func main() {
 	fmt.Println(testmod.Hi("Roberto"))
 	g, err := testmodML.Hi("Roberto", "pt")
 	if err != nil {
 		panic(err)
 	}
 	fmt.Println(g)
}

This eliminates a common problem with dependency management: when dependencies depend on different versions of the same library.

Tidying it up

Going back to the previous version that uses only testmod 2.0.0, if we check the contents of go.mod now, we’ll notice something:

module mod
require github.com/robteix/testmod v1.0.1
require github.com/robteix/testmod/v2 v2.0.0

By default, Go does not remove a dependency from go.mod unless you ask it to. If you have dependencies that you no longer use and want to clean up, you can use the new tidy command:

$ go mod tidy

Now we’re left with only the dependencies that are really being used.

Vendoring

Go modules ignores the vendor/ directory by default. The idea is to eventually do away with vendoring[^0]. But if we still want to add vendored dependencies to our version control, we can still do it:

$ go mod vendor

This will create a vendor/ directory under the root of your project containing the source code for all of your dependencies.

Still, go build will ignore the contents of this directory by default. If you want to build dependencies from the vendor/ directory, you’ll need to ask for it.

$ go build -mod vendor

I expect many developers willing to use vendoring will run go build normally on their development machines and use -mod vendor in their CI.

Again, Go modules is moving away from the idea of vendoring and towards using a Go module proxy for those who don’t want to depend on the upstream version control services directly.

There are ways to guarantee that go will not reach the network at all (e.g. GOPROXY=off) but these are the subject for a future blog post.

Conclusion

This post may seem a bit daunting, but I tried to explain a lot of things together. The reality is that now Go modules is basically transparent. We import package like always in our code and the go command will take care of the rest.

When we build something, the dependencies will be fetched automatically. It also eliminates the need to use $GOPATH which was a roadblock for new Go developers who had trouble understanding why things had to go into a specific directory.

~~Vendoring is (unofficially) being deprecated in favour of using proxies.~~[^0] I may do a separate post about the Go module proxy. (Update: it’s live.) [^0]: I think this came out a bit too strong and people left with the impression that vendoring is being removed right now. It isn’t. Vendoring still works, albeit slightly different than before. There seems to be a desire to replace vendoring with something better, which may or may not be a proxy. But for now this is just it: a desire for a better solution. Vendoring is not going away until a good replacement is found (if ever.)

Playing With Go Modules

Posted on 2018-07-202019-05-17 by Roberto's Chronicles

Update: much of this article has been rendered obsolete by changes made to Go modules since. Check this more recent post that’s up to date.

Had some free time in my hands, so I decided to check out the new Go modules. For those unaware, the next Go release (1.11) will include a new functionality that aims at addressing package management called “modules”. It will still be marked as experimental, so things may very well change a lot.

Here’s how it works. Let’s say we create a new package:

rselbach@wile ~/code $ mkdir foobar
rselbach@wile ~/code $ cd foobar/

Our foobar.go will be very simple:

package foobar

import (
"fmt"
"io"

"github.com/pkg/errors"
)

func WriteGreet(w io.Writer, name string) error {
if _, err := fmt.Fprintln(w, "Hello", name); err != nil {
return errors.Wrapf(err, "Could not greet %s", name)
}

return nil
}

Notice we’re using Dave Cheney’s excellent errors package. Until now, go get would fetch whatever it found on that package’s repository. If Dave ever decided to change something drastic in the package, our code would probably break without us even knowing about it until too late.

That’s where Go modules come into play, as it will help us (1) formalize the dependency and (2) lock a particular version of it to our code. First we need to turn on the support for modules in our package. We do this by using the new go mod command and giving out module a name:

$ go mod -init -module github.com/rselbach/foobar
go: creating new go.mod: module github.com/rselbach/foobar

After that, you will notice that a new file called go.mod will be created in our package root. For now, it only contains the module name we gave it above:

module github.com/rselbach/foobar

But it does more than that, for now the go command is aware that we are in a module-aware package and will behave accordingly. See what happens when we first try to build this:

$ go build
go: finding github.com/pkg/errors v0.8.0
go: downloading github.com/pkg/errors v0.8.0

It sees that we are using an external package and then it goes out, finds the latest version of it, downloads it, and adds it to our go.mod file:

module github.com/rselbach/foobar

require github.com/pkg/errors v0.8.0

From now one, whenever someone uses our package, Go will also download version 0.8.0 of Dave’s errors package. If version 0.9.0 completely breaks compatibility, it will not break our code for it will continue to be compiled with correct version.

If we change go.mod to require version 0.7.0, our next go build will fetch it as well:

$ go build
go: finding github.com/pkg/errors v0.7.0
go: downloading github.com/pkg/errors v0.7.0

If you’re wondering where the packages are, they’re stored under $GOPATH/src/mod:

$ ls -l ~/go/src/mod/github.com/pkg/
total 0
dr-xr-xr-x 13 rselbach staff 416 20 Jul 07:44 errors@v0.7.0
dr-xr-xr-x 14 rselbach staff 448 20 Jul 07:40 errors@v0.8.0

What about vendoring?

Good thing you asked. This is where I don’t like Go modules’ approach. It specifically aims at doing away with vendoring. Versioning is obviously taken care of by the modules functionality itself while disponibility is supposed to be taken care of by something like caching proxies.

I don’t see it. First of all, not everybody is Google. Maintaining cache proxies is extra work that many organizations don’t have the resources for. Also, what about when we’re working from home? Or on the commute? Sure, a VPN solves this, but then it’s one more thing we need to maintain.

That doesn’t mean you can’t vendor with Go modules. It’s simply not the default way it works. Let’s see how it works though. Let’s say we want to vendor our dependencies:

$ go mod -vendor
rselbach@wile ~/code/foobar $ ls -la
total 24
drwxr-xr-x 6 rselbach staff 192 20 Jul 08:01 .
drwxr-xr-x 33 rselbach staff 1056 20 Jul 07:14 ..
-rw-r--r-- 1 rselbach staff 252 20 Jul 07:22 foobar.go
-rw-r--r-- 1 rselbach staff 72 20 Jul 07:43 go.mod
-rw-r--r-- 1 rselbach staff 322 20 Jul 07:44 go.sum
drwxr-xr-x 4 rselbach staff 128 20 Jul 08:01 vendor

Notice how go mod now has created a vendor directory in our module root. Inside it you’ll find the source code for the modules we are using. It also contains a file with a list of packages and versions:

$ cat vendor/modules.txt
# github.com/pkg/errors v0.7.0
github.com/pkg/errors

We can now add this to our repository. But it’s not all done. Surprisingly, when modules support is enabled, the go command will ignore our vendor directory completely. In order to actually use our vendor directory, we need to explicitely tell go build to use it

go build -v -getmode=vendor

This essentially emulates the behaviour of previous version of Go.

An LRU in Go (Part 2)

Posted on 2018-06-152018-08-20 by Roberto's Chronicles

So we created a concurrency-safe LRU in the last post, but it was too slow when used concurrently because of all the locking.

Reducing the amount of time spent waiting on locks is actually not trivial, but not undoable. You can use things like the sync/atomic package to cleverly change pointers back and forth with basically no locking needed. However, our situation is more complicated than that: we use two separate data structures that need to be updated atomically (the list itself and the index.) I don’t know that we can do that with no mutexes.

We can, however, easily lessen the amount of time spent waiting on mutexes by using sharding.

You see, right now we have a single list.List to hold the data and a map for the index. This means that when one goroutine is trying to add, say, {"foo": "bar"}, it has to wait until another finishes updating {"bar" : "baz" } because even though the two keys are not related at all, the lock needs to protect the entire list and index.

A quick and easy way to go around this is to split the data structures into shards. Each shard has its own data structures:

type shard struct {
cap int
len int32

sync.Mutex // protects the index and list
idx map[interface{}]*list.Element // the index for our list
l *list.List // the actual list holding the data
}

And so now, our LRU looks a bit different:

type LRU struct {
    cap int // the max number of items to hold
    nshards int // number of shards
    shardMask int64 // mask used to select correct shard for key
    shards []*shard
}

The methods in LRU are now also much simpler as all they do is call an equivalent method in a given shard:

func (l *LRU) Add(key, val interface{}) {
    l.shard(key).add(key, val)
}

So now it’s time to figure out how to select the correct shard for a given key. In order to prevent two different values of the same key to be stored in two different shards, we need a given key to always return the same shard. We do this by passing a byte representation of the key through the Fowler–Noll–Vo hash function to generate a hash as an int32 number. We do a logical AND between this hash and a shard mask.

The hard part is actually getting a byte representation of a key. It would be trivial if the key was of a fixed type (say, int or string) but we actually use interface{}, which gives us a bit more work to do. The shard() function actually is a large type switch that tries to find the quickest way possible to find the byte representation of any given type.

For instance, if the type is int, we do:

const il = strconv.IntSize / 8
func intBytes(i int) []byte {
    b := make([]byte, il)
    b[0] = byte(i)
    b[1] = byte(i >> 8)
    b[2] = byte(i >> 16)
    b[3] = byte(i >> 24)
    if il == 8 {
        b[4] = byte(i >> 32)
        b[5] = byte(i >> 40)
        b[6] = byte(i >> 48)
        b[7] = byte(i >> 56)
    }
    return b
}

We do similar things to all of the known types. We also check for custom types that provide a String() string method (i.e. implement the Stringer interface.) This should allow for getting the byte representation for the vast majority of types people are likely to want to use as a key. If, however, the type is unknown and is not a Stringer (nor has a Bytes() []byte method), then we fall back to using gob.Encoder, which will work but is very slow to make it realistic usable.

So what does all of this does for us? Now we never lock the entire LRU when doing operations, instead locking only small portions of it when needed, this results in my less time spent waiting for mutexes on average.

We can play around with how many shards we want depending on how much data we store, but we can potentially have thousands of very small shards to provide very fine locking. Of course, since dealing with sharding does have a small overhead, at some point it is not worth adding more.

The following benchmarks were done with 10000 shards and without concurrency —

BenchmarkAdd/mostly_new-4 1000000    1006 ns/op
BenchmarkAdd/mostly_existing-4 10000000  236 ns/op
BenchmarkGet/mostly_found-4 5000000  571 ns/op
BenchmarkGet/mostly_not_found-4 10000000     289 ns/op
BenchmarkRemove/mostly_found-4 3000000   396 ns/op
BenchmarkRemove/mostly_not_found-4 10000000  299 ns/op

And this with 10000 shards with concurrent access —

BenchmarkAddParallel/mostly_new-4 2000000    719 ns/op
BenchmarkAddParallel/mostly_existing-4 10000000  388 ns/op
BenchmarkGetParallel/mostly_found-4 10000000     147 ns/op
BenchmarkGetParallel/mostly_not_found-4 20000000     126 ns/op
BenchmarkRemoveParallel/mostly_found-4 10000000  142 ns/op
BenchmarkRemoveParallel/mostly_not_found-4 20000000  338 ns/op

Still, each shard still locks completely at every access, we might do better than that.

Source code on Github.

An LRU in Go (Part 1)

Posted on 2018-06-132018-08-20 by Roberto's Chronicles

In my job, I often catch myself having to implement a way to keep data in memory so I can use them at a later time. Sometimes this is for caching purposes, sometimes it’s to keep track of what I’ve sent some other service earlier so I can create a diff or something. Reasons are plenty.

I noticed that I keep writing versions of the same thing over and over and when we’re doing that, maybe it’s time to try and create a more generic solution. So let’s start simple, shall we? What we want is to implement a least-recently-used list (LRU) with no fuss.

Naïve LRU

We start with a simple and very naïve implementation (see the full implementation on Github.)

type LRU struct {
    cap int                           // the max number of items to hold
    idx map[interface{}]*list.Element // the index for our list
    l   *list.List                    // the actual list holding the data
}

We don’t have much there, we have a doubly-linked list l to hold the data and a map to serve as an index. In true Go fashion, we create an initializer —

func New(cap int) *LRU {
    l := &LRU{
        cap: cap,
        l:   list.New(),
        idx: make(map[interface{}]*list.Element, cap+1),
    }
    return l
}

This is nice. Package users will initialize the list with something like l := lru.New(1000) and joy will spread thoughout the land. But I strongly believe the empty value of a struct should also be usable, which is not the case here :

var l LRU
l.Add("hello", "world")

This will panic because neither l nor idx are initialized. My go-to solution for this is to always provide a simples unexported initialization function that is run at the beginning of each exported method and makes sure the initialization is done.

func (l *LRU) lazyInit() {
    if l.l == nil {
        l.l = list.New()
        l.idx = make(map[interface{}]*list.Element, l.cap+1)
    }
}

Now, let’s see how we’d add a new item to the list —

func (l *LRU) Add(k, v interface{}) {
    l.lazyInit()

    // first let's see if we already have this key
    if le, ok := l.idx[k]; ok {
        // update the entry and move it to the front
        le.Value.(*entry).val = v
        l.l.MoveToFront(le)
        return
    }
    l.idx[k] = l.l.PushFront(&entry{key: k, val: v})

    if l.cap > 0 && l.l.Len() > l.cap {
        l.removeOldest()
    }
    return
}

We call lazyInit to make sure everything is initialized than we check whether the key already is in our list. If it is, we only need to update the value and move it to the front of the list as it’s now become the most-recently-used.

If, however, the key was not already in our index, then we need to create a new entry and push it to the list. By the way, this is the first time we see entry, which is an unexported type that we actually store in our list.

type entry struct {
    key, val interface{}
}

Finally, we check whether by adding the new element, we just passed the capacity of the list (if cap is less than 1, we treat as there being no limit to capacity, by the way). If we are over capacity, we go ahead and delete the oldest element in the list.

This is a naïve implementation, but it’s also very fast.

BenchmarkAdd/mostly_new-4                2000000            742 ns/op
BenchmarkAdd/mostly_existing-4          10000000            153 ns/op
BenchmarkGet/mostly_found-4             10000000            222 ns/op
BenchmarkGet/mostly_not_found-4         20000000            143 ns/op
BenchmarkRemove/mostly_found-4          10000000            246 ns/op
BenchmarkRemove/mostly_not_found-4      20000000            142 ns/op

Getting and removing elements is in the same ballpark as accessing a built-in Go map. Getting an existing item adds a bit of overhead on top of accessing the map because it also needs to move the element to the front of the list, but that’s a very fast pointer operation so it’s not that bad.

Adding has to check for an existing item as well as adding/moving the list element, so it’s relatively slower, but still good enough for my immediate needs.

There is a big problem though. This implementation is not at all safe for concurrent use. Sure, package users could protect it with a sync.Mutex but we may want to hide the complexity from them.

The (once again naïve) solution is to protect the list and map accesses with a sync.Mutex of our own.

Concurrent-safe Naïve LRU

Not much needs to be changed to make our naïve implementation safe for concurrent use (full implementation on Github.). We add a mutex —

type LRU struct {
    cap int // the max number of items to hold

    sync.Mutex                               // protects the idem and list
    idx        map[interface{}]*list.Element // the index for our list
    l          *list.List                    // the actual list holding the data
}

And at the top of every exported entrypoint we do something like —

func (l *LRU) Add(k, v interface{}) {
    l.Lock()
    defer l.Unlock()
    l.lazyInit()

Et voilà, we’re safe for concurrent use. But let’s see what the benchmarks tell us.

BenchmarkAdd/mostly_new-4            2000000           819 ns/op
BenchmarkAdd/mostly_existing-4      10000000           200 ns/op
BenchmarkGet/mostly_found-4         10000000           277 ns/op
BenchmarkGet/mostly_not_found-4     10000000           219 ns/op
BenchmarkRemove/mostly_found-4       5000000           325 ns/op
BenchmarkRemove/mostly_not_found-4  10000000           220 ns/op

Overall the code is slightly slower due to the (for now minimal) overhead of the mutex operations. However, although we are safe for concurrent use, we’re not really taking advantage of it, the entire code is completely sequential. An Add will prevent anyone from getting an item. Even a call to Len() locks everyone out.

This will be clear if we modify the tests to run in parallel.

BenchmarkAddParallel/mostly_new-4            1000000          1040 ns/op
BenchmarkAddParallel/mostly_existing-4       5000000           433 ns/op
BenchmarkGetParallel/mostly_found-4          5000000           293 ns/op
BenchmarkGetParallel/mostly_not_found-4     10000000           289 ns/op
BenchmarkRemoveParallel/mostly_found-4       5000000           305 ns/op
BenchmarkRemoveParallel/mostly_not_found-4  10000000           291 ns/op

On average, the calls will take longer because most of the times they’ll have to wait for another operation to finish and free the mutex.

In the next post, we’ll try and cut on the locking considerably.

How to use FileServer with Gorilla’s Subrouter

Posted on 2017-11-292018-08-20 by Roberto's Chronicles

I’ve just spent much more time than I ever wanted to get this right, so here’s how I did it for future reference.

I have a function that returns an http.Handler, kind of like this:

func Handler(prefix string) http.Handler {
    r := mux.NewRouter().PathPrefix(prefix).Subrouter()
    r.HandleFunc("/foo/", fooHandler).Methods("GET")
    r.HandleFunc("/bar/", barHandler).Methods("GET")
    return r
}

The prefix could be something like “/api/” or “/ui” or whatever, so this http.Handler will serve /api/foo and /api/bar, for example. So far so good.

Now, I also want to serve some static files under the prefix’s “root” (again, something like “/api/”.) My initial thinking was something like this:

r.Handle("/", http.StripPrefix(prefix, http.Dir("./somedir")))

It works fine for anything under the root of “./somedir” but it failed with a 404 error for anything under different subdirectories. I found some answers online, but they never seemed to use a subrouter and thus didn’t work for me at all.

I finally figured it out:

r.PathPrefix("/").Handler(http.StripPrefix(prefix, http.FileServer(http.Dir("./somedir"))))

It wasn’t obvious to me at all.

Returns in Go and C#

Posted on 2017-09-152018-08-20 by Roberto's Chronicles

Whenever someone posts anything related to the Go programming language on Hacker News, it never takes long before someone complains about error handling. I find it interesting because it is exactly one of things I like the most about Go.

I don’t want to specifically talk about error handling though. I want to talk about a feature that is intrinsically tied to it in Go: the ability of functions to return multiple values

For instance, in Go it is common and idiomatic to write functions like this —

func Divide(a, b float64) (float64, error) {
    if b == 0 {
        return 0.0, errors.New("divide by zero")
    }
    return a / b, nil
}

So the caller would do:

result, err := Divide(x, y)
if err != nil {
    // do error handling...
}

Some people deplore this. I absolutely love it. I find it so much clearer than, for instance, what we often have to do in C#. You see, C# didn’t have multiple returns (until very recently; see below) so you ended up with a few options.

First, you can simple throw exceptions.

public SomeObject GetObjectById(int id) {
    if (!SomeObjectRepo.Has(id))
        throw new ArgumentOutOfRangeException(nameof(id));
    // ...
}
...
try
{
    var obj = GetObjectById(1);
    // do something with obj
}
catch (ArgumentOutOfRangeException ex)
{
    //  error handling
}

I find the flow difficult to read. Particularly because variables are scoped within the try-catch so often you need to first declare something above the try and then test it after the catch.

A second option is to return null:

public SomeObject GetObjectById(int id)
{
    if (!SomeObjectRepo.Has(id))
        return null;

    // go get the object
}
...
var obj = GetObjectById(1);
if (obj == null) 
{
    // do error handling
}

This looks closer to what I like but it still has some serious downsides. You don’t get any error information. What made it fail? I don’t know. As well, this doesn’t work for non-nullable types. A method returning a, say, int cannot return null. Sure, you could return int? instead of int and then test for .HasValue but that’s cumbersome and artificial.

A third option is the use of a generic return type. Something like —

public class Result<T>
{
    public T Value {get;protected set;}
    public Exception Exception {get; protected set;}

    public bool IsError => Exception != null;

    public Result() : this(default(T)) {}
    public Result(T value)
    {
        Value = value;
    }

    public static Result<T> MakeError(Exception exception)
    {
        return new Result<T>
        {
            Value = default(T),
            Exception = exception
        };
    }
}

You could then use this to return values like —

public Result<int> Divide(int a, int b)
{
    if (b == 0)
    {
        return Result<int>.MakeError(new DivideByZeroException());
    }

    return new Result<int>(a / b);
}
...
var res = Divide(8, 4);
if (res.IsError)
{
    // do error handling, e.g.
    throw res.Exception;
}
// do something with res.Value (2)

This works, but it looks artificial. You need to create instances of Result<T> all around all the time. It is not that bad if your codebase uses this throughout and it becomes automatic for all programmers envolved. When it’s an exception to the rule, it is horrible.

A very similar solution is to return something like Tuple<T1, T2, ...> —

public Tuple<int,Exception> Divide(int a, int b)
{
    if (b == 0)
        return new Tuple<int,Exception>(0, new DivideByZeroException());
    return new Tuple<int,Exception>(a/b, null);
}
...
var res = Divide(1, 2);
if (res.Item2 != null) // Item2 is the exception
{
    // do error handling
}
// do something with res.Item1

Same principle. It’s ugly and artificial, but it will come back to us.

The way the C# authors found to work around this problem is the idiomatic try-pattern, which consists in creating non-exception-throwing versions of methods. For example, if we go back to the first C# example above (GetObjectById()), we could create a second method like so —

public bool TryGetObjectById(int id, out SomeObject result) {
    try 
    {
        result = GetObjectById(id);
        return true;
    }
    catch
    {
        result = default(SomeObject);
        return false;
    }
}
...
SomeObject result;
if (!TryGetObjectById(1, out result))
{
    // do error handling
}
// do something with result

Note that ever since C# 7.0 you can declare the out variable directly inside the method call as such —

if (!TryGetObjectById(1, out var result))

Which spares you of declaring your out variables arguably at the expense of clarity.

This method is idiomatic and found everywhere in the .NET Framework. I actually like it but it still has the problem of losing important information, namely what caused the method to fail: all you get is true or false.

In C# 7.0, the language authors came up with a new solution: they added syntactic sugar to the language to make the tuple solution a bit more appealing —

public (int, Exception) Divide(int a, int b)
{
    if (b == 0)
        return (0, new DivideByZeroException());

    return (a / b, null);
}
...
var (res, err) = Divide(1, 2);
if (err != null) 
{
    // do error handling
}

Suddenly this becomes very familiar to a Go programmer. In the background, this is using a tuple. In fact, you can check that this is so by using the method above like this —

var res = Divide(1, 2);
if (res.Item2 != null)
    // do error handling
// use res.Item1

You will see that res is of type System.ValueTuple. Also, if you create a library in C# 7.0 and then try to use it with a program in older versions of C#, you will see that the exposed type of the method is a tuple. This is actually nice because it means this big language change is backwards compatible.

All that said, I haven’t seen many uses of the new tuple returns in C# code in the wild. Maybe it’s just early (C# 7.0 has been out for only a few months.) Or maybe the try-pattern is simply way too ingrained in the way of doing things in C#. It’s more idiomatic.

I sure prefer the new (Go-like) way.

Casting objects and boxing

Posted on 2017-03-302018-08-20 by Roberto's Chronicles

I’m back from a trip to a customer.

How was it?

Okay. I got more snow that I expected on the way there, so the drive wasn’t much fun. Then again, a part of the trip goes through a beautiful forest that was worth everything else.

Cool!

Also, while showing the customer a new feature, the app crashed.

Typical. Blame it on Murphy!

That’s what I did at first. Then I blamed it on the developer. And then I finally went looking at the C# code to find out why it happened.

What was it?

It turned out to be a rather common but not obvious mistake. See the code below and tell me what is the value of each of doubleF, doubleI, and doubleO.

float f = 0.0;
int i = (int)f;
object o = f;

var doubleF = (double) f;
var doubleI = (double) i;
var doubleO = (double) o;

I’m sensing a catch here, but I’ll bite. They’re all cast from the same original variable f so I’m guessing they’d all end up 0.0…?

You would, wouldn’t you? But you’re wrong.

Waaat?

The final line in that code will throw an InvalidCastException at you — and crash your app if you don’t catch it, as was the case in our app.

Wait what? How? How come you can’t cast 0.0 to double?

Well, you can. For instance, this works perfectly —

var d = (double) 0.0f;

But this doesn’t —

object f = 0.0f;
var d = (double) d;

It makes no sense!

Actually it does. The problem is taking object to mean “anything.” Which incidentally it does, just not the way most people think. You see, object is a type representing Object, which is a class other types inherit from but not all. You can store anything as object because Object boxes whatever object you put in it. It stores the value internally but the compiler doesn’t know what type is stored there.

No no no! I know for a fact that you can too check what type is stored in an object

You’re right, you can. For instance —

object o = /* something */
Console.WriteLine(o.GetType());

This will print the type of whatever you put in the variable o. But this is at run time: the compiler doesn’t know.

That’s why we using casting. If we know for a fact that variable o will contain a, say, int, we can help the compiler and tell it about it with a cast. Remember, when you cast something, you are telling the compiler what type will be stored in the variable. The compiler can’t be held responsible if you lie to it.

Let’s get back at the original problem —

object o = 1.0f;
var d = (double) o;

You told the compiler that o will be a double, but it isn’t. Remember a double is a shorthand for the struct Double as float is for Single. And guess what? A Double is not a Single. When you stored a float in the variable o of type object, the float value was boxed inside an object of type Object. When you cast, the compiler has to unbox whatever was inside o and guess what, the value stored in o is of a different structure, with different methods and storage, than what you told it it was. You could convert between them, but they are not the same.

So the compiler expects an object of type Double but it has a Single and things fail miserably.

But you just said that we can convert between them! Why don’t the compiler does it?

It could. But think of how this would work out in real life. Remember the compiler doesn’t know what will be inside o so it needs to test what the value is. It would need to test if the type is a, say, string. If it is, then convert string to Double. If it isn’t then check if it is a Int32. Then a Int64. Then a DateTime. The number of possibilities is enormous and the compiler would have to generate all this code every time it needs it finds a cast. This would be a lot of code. It would be so much code in fact that you’d be mad not to put it all in separate methods. It would also be slow so the compiler won’t do this by default.

That’s why we have the Convert class, which in turn depends on types implementing the IConvertible interface. Whenever you want to convert a value of TypeA to TypeB, you can use this conversion methods. You can do —

object o = 1.0f;
var d = Convert.ToDouble(o);

The compiler authors had to make a decision: either they’d generate lots of slow code to test for the type and convert the value, or they’d leave the decision for the programmer who can call Convert.ToSomething when needed.

And they chose the former.

Exactly. I believe it was reasonable. If you know something will be of a given type at run time, you can still cast it. Otherwise, you should convert it.

New stuff coming in C# 7.0

Posted on 2016-08-302018-08-20 by Roberto's Chronicles

Mads Torgersen wrote a blog post highlighting what’s new in C# 7.0:

C# 7.0 adds a number of new features and brings a focus on data consumption, code simplification and performance.

The changes all seem to be in line with the recent trends of borrowing syntax sugar from other languages to C#. Nothing wrong with that: copy what’s good and shed what’s bad.

One of the changes is related to out variables. These are the C# way to deal with not being able to return multiple values (see below for good news on that front). It’s basically the same as passing by reference in, say, C. For instance:

int myOutVar;
changeMyOutVar(out myOutVar);

You could have the value of myOutVar set inside changeMyOutVar. Simple. What is changing in C# 7.0 is that you would no longer need to predeclare myOutVar:

changeMyOutVar(out int myOutVar);

The scope of the new variable will be the enclosing block. I have to say it: I don’t like it. It feels obfuscated to me. The variable doesn’t look like it should be in that scope. Compare with this popular Go idiom:

if err := DoSomething(); err != nil {
    return err;
}

The variable err is created inside the if and its scope is there as well. I know a lot of people who hate this for the same reason I don’t like the way the new out variables are to be created in C#. I find it much more clear in Go though.

The feature I absolutely loved to read about is tuples. Error handling in .NET is often done with exceptions, which I find clunky and cumbersome. With tuples, we might be able to move to something more Go-like:

(User, bool) FindUser(string username) {
    var found = _userList.Find(u => u.Username == username);
    if (found == null)
        return null, false;
    return found, true;
}

So we could do something like:

var (user, ok) = FindUser("someusername");
if (!ok) {
    // user not found, deal with this
}

Check his post for more features.