The Roberto Selbach Chronicles



Roberto Selbach is a curious gopher.

An LRU in Go (Part 1)

In my job, I often catch myself having to implement a way to keep data in memory so I can use it at a later time. Sometimes this is for caching purposes; sometimes it’s to keep track of what I’ve sent some other service earlier so I can create a diff or something. Reasons abound.

I noticed that I keep writing versions of the same thing over and over, and when that happens, maybe it’s time to try and create a more generic solution. So let’s start simple, shall we? What we want is to implement a least-recently-used (LRU) list with no fuss.

Naïve LRU

We start with a simple and very naïve implementation (see the full implementation on Github).

type LRU struct {
    cap int                           // the max number of items to hold
    idx map[interface{}]*list.Element // the index for our list
    l   *list.List                    // the actual list holding the data
}

We don’t have much there: a doubly-linked list l to hold the data and a map to serve as an index. In true Go fashion, we create an initializer —

func New(cap int) *LRU {
    l := &LRU{
        cap: cap,
        l:   list.New(),
        idx: make(map[interface{}]*list.Element, cap+1),
    }
    return l
}

This is nice. Package users will initialize the list with something like l := lru.New(1000) and joy will spread throughout the land. But I strongly believe the empty value of a struct should also be usable, which is not the case here:

var l LRU
l.Add("hello", "world")

This will panic because neither l nor idx is initialized. My go-to solution for this is to always provide a simple unexported initialization function that is run at the beginning of each exported method and makes sure the initialization is done.

func (l *LRU) lazyInit() {
    if l.l == nil {
        l.l = list.New()
        l.idx = make(map[interface{}]*list.Element, l.cap+1)
    }
}
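
With lazyInit called at the top of every exported method, the earlier zero-value example no longer panics —

var l LRU
l.Add("hello", "world") // lazyInit sets up the map and list on first use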

Now, let’s see how we’d add a new item to the list —

func (l *LRU) Add(k, v interface{}) {
    l.lazyInit()

    // first let's see if we already have this key
    if le, ok := l.idx[k]; ok {
        // update the entry and move it to the front
        le.Value.(*entry).val = v
        l.l.MoveToFront(le)
        return
    }
    l.idx[k] = l.l.PushFront(&entry{key: k, val: v})

    if l.cap > 0 && l.l.Len() > l.cap {
        l.removeOldest()
    }
}

We call lazyInit to make sure everything is initialized, then we check whether the key is already in our list. If it is, we only need to update the value and move the entry to the front of the list, as it’s now become the most-recently-used.

If, however, the key was not already in our index, then we need to create a new entry and push it to the list. By the way, this is the first time we see entry, which is an unexported type that we actually store in our list.

type entry struct {
    key, val interface{}
}

Finally, we check whether, by adding the new element, we went over the capacity of the list (if cap is less than 1, we treat it as there being no capacity limit, by the way). If we are over capacity, we go ahead and delete the oldest element in the list.
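
The removeOldest helper isn’t shown here; a minimal sketch of it, assuming (as our code guarantees, since we always push and move entries to the front) that the least-recently-used entry sits at the back of the list —

func (l *LRU) removeOldest() {
    le := l.l.Back() // the least-recently-used entry lives at the back
    if le == nil {
        return
    }
    l.l.Remove(le)
    delete(l.idx, le.Value.(*entry).key)
}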

This is a naïve implementation, but it’s also very fast.

BenchmarkAdd/mostly_new-4                2000000            742 ns/op
BenchmarkAdd/mostly_existing-4          10000000            153 ns/op
BenchmarkGet/mostly_found-4             10000000            222 ns/op
BenchmarkGet/mostly_not_found-4         20000000            143 ns/op
BenchmarkRemove/mostly_found-4          10000000            246 ns/op
BenchmarkRemove/mostly_not_found-4      20000000            142 ns/op

Getting and removing elements are in the same ballpark as accessing a built-in Go map. Getting an existing item adds a bit of overhead on top of accessing the map because it also needs to move the element to the front of the list, but that’s a very fast pointer operation, so it’s not that bad.
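
For reference, Get in this naïve version looks something like this (a sketch; the real thing is on Github) —

func (l *LRU) Get(k interface{}) (interface{}, bool) {
    l.lazyInit()
    le, ok := l.idx[k]
    if !ok {
        return nil, false
    }
    l.l.MoveToFront(le) // it just became the most-recently-used
    return le.Value.(*entry).val, true
}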

Adding has to check for an existing item as well as adding/moving the list element, so it’s relatively slower, but still good enough for my immediate needs.

There is a big problem though. This implementation is not at all safe for concurrent use. Sure, package users could protect it with a sync.Mutex but we may want to hide the complexity from them.

The (once again naïve) solution is to protect the list and map accesses with a sync.Mutex of our own.

Concurrent-safe Naïve LRU

Not much needs to be changed to make our naïve implementation safe for concurrent use (full implementation on Github). We add a mutex —

type LRU struct {
    cap int // the max number of items to hold

    sync.Mutex                               // protects the idx map and list
    idx        map[interface{}]*list.Element // the index for our list
    l          *list.List                    // the actual list holding the data
}

And at the top of every exported entrypoint we do something like —

func (l *LRU) Add(k, v interface{}) {
    l.Lock()
    defer l.Unlock()
    l.lazyInit()

Et voilà, we’re safe for concurrent use. But let’s see what the benchmarks tell us.

BenchmarkAdd/mostly_new-4            2000000           819 ns/op
BenchmarkAdd/mostly_existing-4      10000000           200 ns/op
BenchmarkGet/mostly_found-4         10000000           277 ns/op
BenchmarkGet/mostly_not_found-4     10000000           219 ns/op
BenchmarkRemove/mostly_found-4       5000000           325 ns/op
BenchmarkRemove/mostly_not_found-4  10000000           220 ns/op

Overall the code is slightly slower due to the (for now minimal) overhead of the mutex operations. However, although we are safe for concurrent use, we’re not really taking advantage of it: the entire code is completely sequential. An Add will prevent anyone from getting an item. Even a call to Len() locks everyone out.
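
To illustrate, even a read-only Len has to grab the lock in this version; it would look something like —

func (l *LRU) Len() int {
    l.Lock()
    defer l.Unlock()
    l.lazyInit()
    return l.l.Len()
}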

This will be clear if we modify the tests to run in parallel.

BenchmarkAddParallel/mostly_new-4            1000000          1040 ns/op
BenchmarkAddParallel/mostly_existing-4       5000000           433 ns/op
BenchmarkGetParallel/mostly_found-4          5000000           293 ns/op
BenchmarkGetParallel/mostly_not_found-4     10000000           289 ns/op
BenchmarkRemoveParallel/mostly_found-4       5000000           305 ns/op
BenchmarkRemoveParallel/mostly_not_found-4  10000000           291 ns/op

On average, the calls will take longer because most of the time they’ll have to wait for another operation to finish and release the mutex.
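
For reference, the parallel versions of the benchmarks use the testing package’s RunParallel; a sketch of one of them, in the package’s _test.go file (the setup values and names are illustrative) —

func BenchmarkGetParallel(b *testing.B) {
    l := New(1000)
    l.Add("hello", "world")
    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            l.Get("hello") // every goroutine contends for the same mutex
        }
    })
}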

In the next post, we’ll try to cut down on the locking considerably.

How to use FileServer with Gorilla’s Subrouter

I’ve just spent much more time than I ever wanted to get this right, so here’s how I did it for future reference.

I have a function that returns an http.Handler, kind of like this:

func Handler(prefix string) http.Handler {
    r := mux.NewRouter().PathPrefix(prefix).Subrouter()
    r.HandleFunc("/foo/", fooHandler).Methods("GET")
    r.HandleFunc("/bar/", barHandler).Methods("GET")
    return r
}

The prefix could be something like “/api/” or “/ui” or whatever, so this http.Handler will serve /api/foo and /api/bar, for example. So far so good.

Now, I also want to serve some static files under the prefix’s “root” (again, something like “/api/”.) My initial thinking was something like this:

r.Handle("/", http.StripPrefix(prefix, http.Dir("./somedir")))

It worked fine for anything under the root of “./somedir” but failed with a 404 error for anything under different subdirectories. I found some answers online, but they never seemed to use a subrouter and thus didn’t work for me at all.

I finally figured it out:

r.PathPrefix("/").Handler(http.StripPrefix(prefix, http.FileServer(http.Dir("./somedir"))))

It wasn’t obvious to me at all.
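
For future reference, here’s a minimal, self-contained sketch of how the pieces fit together — the handler bodies, the port, and ./somedir are placeholders, and I’m using “/api” (no trailing slash) as the prefix —

package main

import (
    "log"
    "net/http"

    "github.com/gorilla/mux"
)

func fooHandler(w http.ResponseWriter, r *http.Request) { w.Write([]byte("foo\n")) }
func barHandler(w http.ResponseWriter, r *http.Request) { w.Write([]byte("bar\n")) }

func Handler(prefix string) http.Handler {
    r := mux.NewRouter().PathPrefix(prefix).Subrouter()
    r.HandleFunc("/foo/", fooHandler).Methods("GET")
    r.HandleFunc("/bar/", barHandler).Methods("GET")
    // the catch-all goes last: gorilla tries routes in the
    // order they were registered
    r.PathPrefix("/").Handler(http.StripPrefix(prefix, http.FileServer(http.Dir("./somedir"))))
    return r
}

func main() {
    http.Handle("/api/", Handler("/api"))
    log.Fatal(http.ListenAndServe(":8080", nil))
}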

SciTech links (November 4 2017)

China wants to be the first country to build a practical space-based solar power station. Space-based solar power would presumably be much more sustainable and clean than fossil fuels and more efficient than the current sustainable energy sources we have on Earth. There are many problems still to be solved before these power plants can exist, so China’s expectations are more wishful thinking than grounded plans.

Astronomers have found a massive gas-giant that is astonishingly 23.9% as massive as the star it orbits. For comparison, Jupiter is the largest planet in our solar system but its mass is only 0.09% of that of the Sun.

An interesting article about an accident involving a Minuteman I nuclear missile 53 years ago. A short-circuit caused one of the retrorockets to fire and as a result the cone containing the nuclear warhead detached and fell into the silo. Ultimately the warhead casing was intact and nothing more serious happened, but the story reminded me of the Damascus Incident in which a fuel leak on the Titan caused a major accident. Eric Schlosser’s book, Command and Control: Nuclear Weapons, the Damascus Accident, and the Illusion of Safety, is a fantastic tale of this and other “broken arrow” incidents.

Science and tech links (October 21, 2017)

Rodney Brooks writes about the 7 Deadly Sins of AI Predictions. The text articulates a lot of my personal doubts about much of what I see in the news regarding AI. Some people seemed to read this article as a rebuttal of AI itself, which I find puzzling, as I did not read it that way at all. If anything, Brooks seems to believe AI will be much bigger than we can possibly imagine. He does talk about predictions being flawed. I can’t argue with that.

The Canadian Space Agency is worried about Canada’s vulnerability to solar flares and the like. As a first step, it has published a request for proposals to run studies on the effects of space weather events on Canadian infrastructure (PDF). The study won’t focus only on cataclysmic scenarios but also on more-routine effects of space weather, like the solar wind, flares that affect radio frequencies, radiation storms that send particles towards us, and geomagnetic storms that can disrupt satellite systems and other technologies. If you know me, you’ll know a solar flare frying the electrical grid is my nightmare scenario.

After six years in service, the first Chinese space station, Tiangong 1, is set to fall back to Earth sometime in the next few months. It’s still not clear exactly when, but it should happen late this year or early in 2018. I’d love to get my hands on a piece of that. As a space exploration geek, I find it incredibly sad that the Chinese space program is so secretive. I’d love to know more.

Caddy and the Importance of Licenses

I haven’t commented on the recent brouhaha caused by Caddy‘s decision to offer commercial licenses, so I’ll do it briefly here before moving on to the important part.

I am fine with it. I don’t love it, but it’s fine. The Caddy authors have every right to try to profit from their work. Best of luck to them, they deserve it. Do I think they mangled the announcement? Yes. Do I think the amount of vitriol out there was justified? No. But again, it’s fine.

But I want to talk about something else and I’ll use this episode to illustrate it. Matt Holt published his thoughts on the experience in The Realities of Being a FOSS Maintainer. It’s a nice read, but there is something there that I think we should not overlook.

Midway through Matt’s post, he clarifies the situation with their build server, which they removed (most likely made private) from Github.

To clarify, the Caddy build server was once open source, but we closed it up in the interest of focusing the technical attention of our community and our limited development resources (mostly time) on Caddy itself. The build server is not generalizable, and only exists to serve the Caddy project. As such, we’re taking it under our wings to develop and maintain it as needed. If you find some old source code still online, be aware that no license file was added to the code, and we have not granted others any license to use it.

This highlights the importance of checking the license of “FOSS” software. Being open source means something. It doesn’t just mean “hosted on Github.” Just because you find a piece of code on Github, it doesn’t mean you can freely use it. It sucks, but as the above paragraph shows, it matters.

What Matt is saying here is that although his build server was open source, it no longer is and if you have the code, you were never granted any license to use it. This cannot be, of course. Either it was never open source to begin with, or you were granted a license to use it. Which one is it?

Since Matt makes it clear that “no license file was added to the code,” that means it’s the former: it was never open source, no matter what he says now. Whether intentionally or not, people were misled into thinking it was.

People would find the code on Github and assume it was open source. That’s why checking the license is important. A project without a license is not open source and you are at risk.

I’ve seen small projects on Github before with no license information at all. It always made me uncomfortable. Now I see I was right.

I want to make clear this is not about Caddy or Matt. Again, I’m fine with their decision. My points are general:

  • Properly license your open source software.
  • Check the license of software you use.

If you don’t, this will come back to bite you.

Returns in Go and C#

Whenever someone posts anything related to the Go programming language on Hacker News, it never takes long before someone complains about error handling. I find it interesting because it is exactly one of the things I like the most about Go.

I don’t want to specifically talk about error handling though. I want to talk about a feature that is intrinsically tied to it in Go: the ability of functions to return multiple values.

For instance, in Go it is common and idiomatic to write functions like this —

func Divide(a, b float64) (float64, error) {
    if b == 0 {
        return 0.0, errors.New("divide by zero")
    }
    return a / b, nil
}

So the caller would do:

result, err := Divide(x, y)
if err != nil {
    // do error handling...
}

Some people deplore this. I absolutely love it. I find it so much clearer than, for instance, what we often have to do in C#. You see, C# didn’t have multiple returns (until very recently; see below) so you ended up with a few options.

First, you can simply throw exceptions.

public SomeObject GetObjectById(int id) {
    if (!SomeObjectRepo.Has(id))
        throw new ArgumentOutOfRangeException(nameof(id));
    // ...
}
...
try
{
    var obj = GetObjectById(1);
    // do something with obj
}
catch (ArgumentOutOfRangeException ex)
{
    //  error handling
}

I find the flow difficult to read. Particularly because variables are scoped within the try-catch, you often need to first declare something above the try and then test it after the catch.

A second option is to return null:

public SomeObject GetObjectById(int id)
{
    if (!SomeObjectRepo.Has(id))
        return null;

    // go get the object
}
...
var obj = GetObjectById(1);
if (obj == null) 
{
    // do error handling
}

This looks closer to what I like, but it still has some serious downsides. You don’t get any error information. What made it fail? I don’t know. Also, this doesn’t work for non-nullable types. A method returning, say, an int cannot return null. Sure, you could return int? instead of int and then test for .HasValue, but that’s cumbersome and artificial.

A third option is the use of a generic return type. Something like —

public class Result<T>
{
    public T Value {get;protected set;}
    public Exception Exception {get; protected set;}

    public bool IsError => Exception != null;

    public Result() : this(default(T)) {}
    public Result(T value)
    {
        Value = value;
    }

    public static Result<T> MakeError(Exception exception)
    {
        return new Result<T>
        {
            Value = default(T),
            Exception = exception
        };
    }
}

You could then use this to return values like —

public Result<int> Divide(int a, int b)
{
    if (b == 0)
    {
        return Result<int>.MakeError(new DivideByZeroException());
    }

    return new Result<int>(a / b);
}
...
var res = Divide(8, 4);
if (res.IsError)
{
    // do error handling, e.g.
    throw res.Exception;
}
// do something with res.Value (2)

This works, but it looks artificial. You need to create instances of Result<T> all around, all the time. It is not that bad if your codebase uses this throughout and it becomes automatic for all programmers involved. When it’s an exception to the rule, it is horrible.

A very similar solution is to return something like Tuple<T1, T2, ...> —

public Tuple<int,Exception> Divide(int a, int b)
{
    if (b == 0)
        return new Tuple<int,Exception>(0, new DivideByZeroException());
    return new Tuple<int,Exception>(a/b, null);
}
...
var res = Divide(1, 2);
if (res.Item2 != null) // Item2 is the exception
{
    // do error handling
}
// do something with res.Item1

Same principle. It’s ugly and artificial, but it will come back to us.

The way the C# authors found to work around this problem is the idiomatic try-pattern, which consists of creating non-exception-throwing versions of methods. For example, if we go back to the first C# example above (GetObjectById()), we could create a second method like so —

public bool TryGetObjectById(int id, out SomeObject result) {
    try 
    {
        result = GetObjectById(id);
        return true;
    }
    catch
    {
        result = default(SomeObject);
        return false;
    }
}
...
SomeObject result;
if (!TryGetObjectById(1, out result))
{
    // do error handling
}
// do something with result

Note that ever since C# 7.0 you can declare the out variable directly inside the method call, like so —

if (!TryGetObjectById(1, out var result))

This spares you from declaring your out variables up front, arguably at the expense of clarity.

This pattern is idiomatic and found everywhere in the .NET Framework. I actually like it, but it still has the problem of losing important information, namely what caused the method to fail: all you get is true or false.

In C# 7.0, the language authors came up with a new solution: they added syntactic sugar to the language to make the tuple solution a bit more appealing —

public (int, Exception) Divide(int a, int b)
{
    if (b == 0)
        return (0, new DivideByZeroException());

    return (a / b, null);
}
...
var (res, err) = Divide(1, 2);
if (err != null) 
{
    // do error handling
}

Suddenly this becomes very familiar to a Go programmer. In the background, this is using a tuple. In fact, you can check that this is so by using the method above like this —

var res = Divide(1, 2);
if (res.Item2 != null)
    // do error handling
// use res.Item1

You will see that res is of type System.ValueTuple. Also, if you create a library in C# 7.0 and then try to use it with a program in older versions of C#, you will see that the exposed type of the method is a tuple. This is actually nice because it means this big language change is backwards compatible.

All that said, I haven’t seen many uses of the new tuple returns in C# code in the wild. Maybe it’s just early (C# 7.0 has been out for only a few months). Or maybe the try-pattern is simply way too ingrained in the way of doing things in C#. It’s more idiomatic.

I sure prefer the new (Go-like) way.

Casting objects and boxing

I’m back from a trip to a customer.

How was it?

Okay. I got more snow than I expected on the way there, so the drive wasn’t much fun. Then again, part of the trip goes through a beautiful forest that was worth everything else.

Cool!

Also, while showing the customer a new feature, the app crashed.

Typical. Blame it on Murphy!

That’s what I did at first. Then I blamed it on the developer. And then I finally went looking at the C# code to find out why it happened.

What was it?

It turned out to be a rather common but not obvious mistake. See the code below and tell me what the values of doubleF, doubleI, and doubleO are.

float f = 0.0f;
int i = (int)f;
object o = f;

var doubleF = (double) f;
var doubleI = (double) i;
var doubleO = (double) o;

I’m sensing a catch here, but I’ll bite. They’re all cast from the same original variable f so I’m guessing they’d all end up 0.0…?

You would, wouldn’t you? But you’re wrong.

Waaat?

The final line in that code will throw an InvalidCastException at you — and crash your app if you don’t catch it, as was the case in our app.

Wait what? How? How come you can’t cast 0.0 to double?

Well, you can. For instance, this works perfectly —

var d = (double) 0.0f;

But this doesn’t —

object f = 0.0f;
var d = (double) f;

It makes no sense!

Actually it does. The problem is taking object to mean “anything.” Which, incidentally, it does, just not the way most people think. You see, object is an alias for the class Object, which every other type ultimately derives from. You can store anything as object because the value you assign gets boxed (in the case of value types like float) into an Object. The value is stored internally, but the compiler doesn’t know what type is in there.

No no no! I know for a fact that you can too check what type is stored in an object!

You’re right, you can. For instance —

object o = /* something */;
Console.WriteLine(o.GetType());

This will print the type of whatever you put in the variable o. But this is at run time: the compiler doesn’t know.

That’s why we use casting. If we know for a fact that the variable o will contain, say, an int, we can help the compiler by telling it so with a cast. Remember, when you cast something, you are telling the compiler what type will be stored in the variable. The compiler can’t be held responsible if you lie to it.

Let’s get back to the original problem —

object o = 1.0f;
var d = (double) o;

You told the compiler that o holds a double, but it doesn’t. Remember, double is shorthand for the struct Double, just as float is for Single. And guess what? A Double is not a Single. When you stored a float in the variable o of type object, the float value was boxed inside an Object. When you cast, the runtime has to unbox whatever is inside o, and the value stored in o is of a different structure, with different methods and storage, than what you told it it was. You could convert between them, but they are not the same.

So the unboxing expects a Double but finds a Single, and things fail miserably at run time.

But you just said that we can convert between them! Why doesn’t the compiler do it?

It could. But think of how this would work out in real life. Remember, the compiler doesn’t know what will be inside o, so it would need to test what the value is. It would need to check whether the type is, say, a string. If it is, then convert the string to a Double. If it isn’t, then check whether it is an Int32. Then an Int64. Then a DateTime. The number of possibilities is enormous, and the compiler would have to generate all this code every time it finds a cast. That would be a lot of code. So much code, in fact, that you’d be mad not to put it all in separate methods. It would also be slow, so the compiler doesn’t do it by default.

That’s why we have the Convert class, which in turn depends on types implementing the IConvertible interface. Whenever you want to convert a value of TypeA to TypeB, you can use these conversion methods. You can do —

object o = 1.0f;
var d = Convert.ToDouble(o);

The compiler authors had to make a decision: either they’d generate lots of slow code to test for the type and convert the value, or they’d leave the decision to the programmer, who can call Convert.ToSomething when needed.

And they chose the latter.

Exactly. I believe it was reasonable. If you know something will be of a given type at run time, you can still cast it. Otherwise, you should convert it.

Retreating back into my bubble

A few months back I decided to try and burst out of my bubble, so I started following some public figures from all sides on Facebook and Twitter. On Facebook this is particularly weird because you’re forced to like the page. So it tells the world “Roberto likes Mrs. Public-Figure”, which is sometimes undesirable.

Still, I wanted to see what both sides of the political spectrum were saying. Also, I consider myself a centrist so I expected to agree with everyone on at least something.

Anyway, my town suffered a terrorist attack a couple of days ago and as soon as the identity of the suspect became known, the media started drawing conclusions based on who he “liked” on Facebook. That got me thinking: someone will eventually go through my social media and conclude I believe in X because I “like” Y on FB, even though I may only “like” Y because I want to be informed and not because I necessarily agree with them.

There are also reports that US border agents now require people to provide social network credentials so that their political leanings can be assessed. Regardless of my personal opinions about this, the fact is that someone with access to my FB account could quickly conclude that I lean this way or that because of the pages I like, even though I liked them only to stay informed.

I then realized that it’s time to give up on my bubble-bursting experiment. I “unliked” pretty much every public figure on Facebook.