Locking Performance for Concurrent Updates to an In-Memory Cache in Go

Here is the situation: an application maintains an in-memory cache. The cache is accessed by two threads (goroutines) concurrently. The first one reads from the cache constantly: it is very busy. The other goroutine updates the cache only rarely, maybe a few times an hour. What synchronization or locking methods are available, and how do they compare?

To make matters definite, let’s assume that the cache is simply a map, of arbitrary key and value types; for simplicity, we will assume both to be of type int.

There are only two goroutines: the reader, which accesses the cache constantly and without interruption, and the writer, which accesses the cache very rarely by comparison. Furthermore, we will assume that the writer will always replace the entire cache with a new version: there is no need for element-by-element updates.

What is an appropriate locking strategy, given this particular access pattern?
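
All of the code snippets below call a helper, newData(), that constructs the replacement map. Its exact contents don't matter for the locking discussion; a minimal stand-in (the downloadable program may build its map differently) could look like this:

// newData builds a fresh map to serve as the new cache contents.
// The size and values here are arbitrary placeholders.
func newData() map[int]int {
	m := make(map[int]int, 1000)
	for i := 0; i < 1000; i++ {
		m[i] = i * i
	}
	return m
}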

Approach 1: No Synchronization

One approach is to dispense with synchronization altogether! Because the writer accesses the shared resource so rarely, conflicts are going to be rare as well: only if a read happens at exactly the moment that a write occurs. It is less clear what happens when there is a conflict: the reader may read garbage data, or possibly be unable to access the cache at all and panic. Given that a conflict occurs at most once for about every 30 billion reads, this may be an acceptable failure rate. (In particular when combined with an automated restart/recovery facility in the case of a panic.)

The advantages of this approach are the extreme simplicity of the code, and the fact that no synchronization slows down the reads. The disadvantages include the possibility of occasionally corrupted data or system crashes, and the undeniable shame of simply doing it wrong.

type Cache1 struct {
	data map[int]int
}

// Update replaces the entire map with a freshly built one.
// There is no synchronization: this write races with Read().
func (c *Cache1) Update() {
	tmp := newData()
	c.data = tmp
}

// Read serves the lookup from whatever map the reader happens to see.
func (c *Cache1) Read(key int) (int, bool) {
	v, ok := c.data[key]
	return v, ok
}
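
As an aside, this is exactly the kind of access pattern that Go's race detector is designed to catch: running the benchmark (or any test that exercises both goroutines) with the -race flag, for example via go test -race, should report the conflicting accesses to c.data.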

Approach 2: Shared Cache and Mutex

The standard textbook approach is to protect the shared resource with a mutex. The problem is that now each read must lock the mutex, perform the read, and release the mutex. Given that with a probability of 0.99999999999 ($= 1-10^{-11}$) or so the lock will be uncontested, the reader is extremely unlikely to wait, yet we would still expect a performance hit from acquiring and releasing the lock so frequently.

Note that an RWMutex does not help in this particular scenario: with an RWMutex, multiple readers can hold the read lock simultaneously, but only a single goroutine can hold the write lock. Because we only have a single reader, this optimization buys us nothing here. (For illustration, a sketch of the RWMutex variant appears after the Cache2 code below.)

type Cache2 struct {
	mtx  sync.Mutex
	data map[int]int
}

// Update builds the new map outside the critical section and
// only holds the lock for the brief moment of swapping it in.
func (c *Cache2) Update() {
	tmp := newData()

	c.mtx.Lock()
	c.data = tmp
	c.mtx.Unlock()
}

// Read acquires and releases the mutex for every single lookup.
func (c *Cache2) Read(key int) (int, bool) {
	c.mtx.Lock()
	defer c.mtx.Unlock()

	v, ok := c.data[key]
	return v, ok
}
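
For illustration, here is what the RWMutex variant mentioned above would look like; only Read() changes, taking the shared read lock instead of the exclusive one. (This is a sketch with a type name of my own choosing, not one of the benchmarked versions.)

type Cache2R struct {
	mtx  sync.RWMutex
	data map[int]int
}

// Update takes the exclusive (write) lock to swap in the new map.
func (c *Cache2R) Update() {
	tmp := newData()

	c.mtx.Lock()
	c.data = tmp
	c.mtx.Unlock()
}

// Read takes the shared (read) lock; multiple readers could hold it at once.
func (c *Cache2R) Read(key int) (int, bool) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()

	v, ok := c.data[key]
	return v, ok
}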

Approach 3: Copied Cache and Channel

Whenever we have many more reads than writes, we should consider an approach based on optimistic locking. Our third solution is based on this idea: what if we could give the reader its own copy of the cache (that is, a copy that will never be written to), and then signal to the reader when a new version of the cache is available? In this case, there is no need to lock the shared resource (because at any moment it is guaranteed to be accessed by only one goroutine) — but now there is a need to send a signal safely from one goroutine (the writer) to another (the reader).

Fortunately, Go offers us a tool to do exactly that: channels. Upon some further reflection, we also realize that instead of sending a “signal” over the channel, we might as well send the freshly populated new version of the cache itself!

type Cache3 struct {
	ch   chan map[int]int
	data map[int]int // the reader's private copy
}

// Update hands the freshly built map off to the reader via the channel.
func (c *Cache3) Update() {
	tmp := newData()
	c.ch <- tmp
}

// Read first checks, without blocking, whether a new version has arrived,
// then serves the lookup from its private copy.
func (c *Cache3) Read(key int) (int, bool) {
	select {
	case c.data = <-c.ch:
	default:
	}

	v, ok := c.data[key]
	return v, ok
}
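
One detail not shown above: the channel, and the reader's initial copy of the data, must be set up before the two goroutines start. A minimal constructor might look like the following (the name NewCache3 and the one-element buffer are my own choices; an unbuffered channel would also work, in which case Update() simply blocks until the reader's next Read()).

// NewCache3 creates the cache with its channel and an initial map.
func NewCache3() *Cache3 {
	return &Cache3{
		// A buffer of one lets the writer drop off a new version
		// without having to wait for the reader's next Read().
		ch:   make(chan map[int]int, 1),
		data: newData(),
	}
}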

The principal mechanism that makes this work is the select/default block in the Read() function. As a reminder: a pure select statement (without a default clause) blocks until an event occurs on one of the enclosed channels. But with a default clause, the select statement never blocks: if no channel is ready, the default branch runs immediately.

That’s exactly the behavior we want for our scenario.

Benchmarks

How do the different versions perform? How much overhead does the second version incur, compared to the first, due to the need to constantly acquire and release the mutex? And, is the third version, using the channel, actually any faster than the second?

I will not reproduce the full program here (it is available for download below), but only show the results. (Taken on my own, rather dated hardware.)

Algorithm     Nanoseconds per Read    Slowdown Factor
No Locking    34                      1.0
Mutex         48                      1.4
Channel       37                      1.1

The results are pretty clear: the frequent locking and unlocking incurs a rather painful penalty. By comparison, the overhead associated with the channel is significantly smaller.

Discussion

The third solution (using a channel) has a certain conceptual elegance, in particular the notion that it avoids a shared resource, but instead hands off ownership of the cache from writer to reader. Its main drawback is that it relies on the specific behavior of the select/default block — but the reality is that it’s precisely the select/default idiom that makes channels useful! (Channels that block are relatively unwieldy objects, by comparison.)

It is interesting to observe that whatever synchronization Go does under the covers to enable channel semantics is indeed significantly faster than explicitly locking and releasing a mutex. (I wasn’t sure about this point in particular, until I ran the benchmark. In fact, I was prepared to find the channel to be slower than the bare mutex, but I was wrong about that.)

Other Options

Are there other solutions? Are there other correct solutions?

One approach I briefly considered is based on the second solution, using an explicit mutex, but instead of acquiring and releasing the mutex for each read operation, the program could acquire the mutex, hold it for, say, 1000 reads, and only release it after completing them. The problem with ad-hoc fixes of this kind is that they easily lead to nasty edge cases. For instance, what if the read traffic stops, for whatever reason, before 1000 reads are complete? Now the cache won’t be updated, and will be badly out of date when the traffic picks up again. This means that we now need not only to keep track of the number of reads, but also to implement a timeout to release the mutex — let’s just stop here, ok?

Another question is whether there is a faster (and easier) way to signal to the reader that a new version of the cache is available. For instance, the reader could examine a version number of the cache, compare it with the version it is currently using, and obtain the new version of the cache if needed. (That is the standard form of optimistic locking.) The problem is that access to this version number must itself be synchronized, either with a mutex or with a channel, so we are back to where we were. (There may be ways to achieve this effect using the sync/atomic package, but I am not eager to try.)
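
For the curious, here is roughly what such a variant might look like, using atomic.Value from the sync/atomic package to publish the current map. (This is a sketch under my own assumptions; it is not part of the benchmark, and I have not measured it.)

type Cache4 struct {
	v atomic.Value // always holds a map[int]int
}

// Update builds the new map and publishes it atomically.
func (c *Cache4) Update() {
	tmp := newData()
	c.v.Store(tmp)
}

// Read serves the lookup from whatever map is currently published.
// The comma-ok assertion also covers the case where nothing has been
// stored yet; the lookup then runs against a nil map, which is safe.
func (c *Cache4) Read(key int) (int, bool) {
	m, _ := c.v.Load().(map[int]int)
	v, ok := m[key]
	return v, ok
}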

Finally, I did not contemplate the scenario where there is more than one reader. That is a different problem, with different solutions.

Resources

My complete benchmarking program can be found here: