Playing with Go module proxies

(This article has been graciously translated to Russian here. Huge thanks to Akhmad Karimov.)

I wrote a brief introduction to Go modules and in it I talked briefly about Go modules proxies and now that Go 1.11 is out, I thought I’d play a bit these proxies to figure our how they’re supposed to work.

Why

One of the goals of Go modules is to provide reproducible builds and it does a very good job by fetching the correct and expected files from a repository.

But what if the servers are offline? What if the repository simply vanishes?

One way teams deal with these risks is by vendoring the dependencies, which is fine. But Go modules offers another way: the use of a module proxy.

The Download Protocol

When Go modules support is enabled and the go command determines that it needs a module, it first looks at the local cache (under $GOPATH/pkg/mods). If it can’t find the right files there, it then goes ahead and fetches the files from the network (i.e. from a remote repo hosted on Github, Gitlab, etc.)

If we want to control what files go can download, we need to tell it to go through our proxy by setting the GOPROXY environment variable to point to our proxy’s URL. For instance:

export GOPROXY=http://gproxy.mycompany.local:8080

The proxy is nothing but a web server that responds to the module download protocol, which is a very simple API to query and fetch modules. The web server may even serve static files.

A typical scenario would be the go command trying to fetch github.com/pkg/errors:

The first thing go will do is ask the proxy for a list of available versions. It does this by making a GET request to /{module name}/@v/list. The server then responds with a simple list of versions it has available:

v0.8.0
v0.7.1

The go will determine which version it wants to download — the latest unless explicitly told otherwise1. It will then request information about that given version by issuing a GET request to /{module name}/@v/{module revision} to which the server will reply with a JSON representation of the struct:

type RevInfo struct {
    Version string    // version string
    Name    string    // complete ID in underlying repository
    Short   string    // shortened ID, for use in pseudo-version
    Time    time.Time // commit time
}

So for instance, we might get something like this:

{
    "Version": "v0.8.0",
    "Name": "v0.8.0",
    "Short": "v0.8.0",
    "Time": "2018-08-27T08:54:46.436183-04:00"
}

The go command will then request the module’s go.mod file by making a GET request to /{module name}/@v/{module revision}.mod. The server will simply respond with the contents of the go.mod file (e.g. module github.com/pkg/errors.) This file may list additional dependencies and the cycle restarts for each one.

Finally, the go command will request the actual module by getting /{module name}/@v/{module revision}.zip. The server should respond with a byte blob (application/zip) containing a zip archive with the module files where each file must be prefixed by the full module path and version (e.g. github.com/pkg/errors@v0.8.0/), i.e. the archive should contain:

github.com/pkg/errors@v0.8.0/example_test.go
github.com/pkg/errors@v0.8.0/errors_test.go
github.com/pkg/errors@v0.8.0/LICENSE
...

And not:

errors/example_test.go
errors/errors_test.go
errors/LICENSE
...

This seems like a lot when written like this, but it’s in fact a very simple protocol that simply fetches 3 or 4 files:

  1. The list of versions (only if go does not already know which version it wants)
  2. The module metadata
  3. The go.mod file
  4. The module zip itself

Creating a simple local proxy

To try out the proxy support, let’s create a very basic proxy that will serve static files from a directory. First we create a directory where we will store our in-site copies of our dependencies. Here’s what I have in mine:

$ find . -type f
./github.com/robteix/testmod/@v/v1.0.0.mod
./github.com/robteix/testmod/@v/v1.0.1.mod
./github.com/robteix/testmod/@v/v1.0.1.zip
./github.com/robteix/testmod/@v/v1.0.0.zip
./github.com/robteix/testmod/@v/v1.0.0.info
./github.com/robteix/testmod/@v/v1.0.1.info
./github.com/robteix/testmod/@v/list

 

These are the files our proxy will serve. You can find these files on Github if you’d like to play along. For the examples below, let’s assume we have a devel directory under our home directory; adapt accordingly.

$ cd $HOME/devel
$ git clone https://github.com/robteix/go-proxy-blog.git

Our proxy server is simple (it could be even simpler, but I wanted to log the requests):

package main

import (
    "flag"
    "log"
    "net/http"
)

func main() {
    addr := flag.String("http", ":8080", "address to bind to")
    flag.Parse()

    dir := "."
    if flag.NArg() > 0 {
        dir = flag.Arg(0)
    }

    log.Printf("Serving files from %s on %s\n", dir, *addr)

    h := handler{http.FileServer(http.Dir(dir))}

    panic(http.ListenAndServe(*addr, h))
}

type handler struct {
    h http.Handler
}

func (h handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    log.Println("New request:", r.URL.Path)
    h.h.ServeHTTP(w, r)
}

Now run the code above:

$ go run proxy.go -http :8080 $HOME/devel/go-proxy-blog
2018/08/29 14:14:31 Serving files from /home/robteix/devel/go-proxy-blog on :8080

$ curl http://localhost:8080/github.com/robteix/testmod/@v/list
v1.0.0
v1.0.1

Leave the proxy running and move to a new terminal. Now let’s create a new test program. we create a new directory $HOME/devel/test and create a file named test.go inside it with the following code:

package main

import (
    "github.com/robteix/testmod"
)

func main() {
    testmod.Hi("world")
}

And now, inside this directory, let’s enable Go modules:

$ go mod init test

And we set the GOPROXY variable:

export GOPROXY=http://localhost:8080

Now let’s try building our new program:

$ go build
go: finding github.com/robteix/testmod v1.0.1
go: downloading github.com/robteix/testmod v1.0.1

And if you check the output from our proxy:

2018/08/29 14:56:14 New request: /github.com/robteix/testmod/@v/list
2018/08/29 14:56:14 New request: /github.com/robteix/testmod/@v/v1.0.1.info
2018/08/29 14:56:14 New request: /github.com/robteix/testmod/@v/v1.0.1.mod
2018/08/29 14:56:14 New request: /github.com/robteix/testmod/@v/v1.0.1.zip

So as long as GOPROXY is set, go will only download files from out proxy. If I go ahead and delete the repository from Github, things will continue to work.

Using a local directory

It is interesting to note that we don’t even need our proxy.go at all. We can set GOPROXY to point to a directory in the filesystem and things will still work as expected:

export GOPROXY=file://home/robteix/devel/go-proxy-blog

If we do a go build now2, we’ll see exactly the same thing as with the proxy:

$ go build
go: finding github.com/robteix/testmod v1.0.1
go: downloading github.com/robteix/testmod v1.0.1

Of course, in real life, we probably will prefer to have a company/team proxy server where our dependencies are stored, because a local directory is not really much different from the local cache that go already maintains under $GOPATH/pkg/mod, but still, nice to know that it works.

There is a project called Athens that is building a proxy and that aims — if I don’t misunderstand it — to create a central repository of packages à la npm.

  • Remember that somepackage and somepackage/v2 are treated as different packages. 
  • That’s not strictly true as now that we’ve already built it once, go has cached the module locally and will not go to the proxy (or the network) at all. You can still force it by deleting $GOPATH/pkg/mod/cache/download/github.com/robteix/testmod/ and $GOPATH/pkg/mod/github.com/robteix/testmod@v1.0.1