Locking Performance for Concurrent Updates to an In-Memory Cache in Go
Here is the situation: an application maintains an in-memory cache. The cache is accessed by two threads (goroutines) concurrently. The first one constantly tries to read from the cache: very busy. The other goroutine updates the cache, relatively rarely (maybe a few times an hour). What are some synchronization/locking methods and how do they compare?
To make matters definite, let’s assume that the cache is simply a map, of arbitrary key and value types; for simplicity, we will assume both to be of type int.
There are only two goroutines: the reader, which accesses the cache constantly and without interruption, and the writer, which accesses the cache very rarely by comparison. Furthermore, we will assume that the writer will always replace the entire cache with a new version: there is no need for element-by-element updates.
What is an appropriate locking strategy, given this particular access pattern?
Approach 1: No Synchronization
One approach is to dispense with synchronization altogether! Because the writer accesses the shared resource so rarely, conflicts are going to be rare as well: only if a read happens at exactly the moment that a write occurs. It is less clear what happens when there is a conflict: the reader may read garbage data, or possibly be unable to access the cache at all and panic. Given that a conflict occurs at most once for about every 30 billion reads, this may be an acceptable failure rate. (In particular when combined with an automated restart/recovery facility in the case of a panic.)
The advantages of this approach are the extreme simplicity of the code, and the fact that no synchronization slows down the reads. The disadvantages include the possibility of occasionally corrupted data or system crashes, and the undeniable shame of simply doing it wrong.
// Cache1: no synchronization at all.
type Cache1 struct {
    data map[int]int
}

// Update replaces the entire map with a freshly built one.
func (c *Cache1) Update() {
    tmp := newData()
    c.data = tmp // unsynchronized write: races with Read()
}

func (c *Cache1) Read(key int) (int, bool) {
    v, ok := c.data[key]
    return v, ok
}
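For concreteness, the access pattern described above might be wired up roughly like this. (A sketch only: the newData() helper is referenced but not shown in these snippets, so the definition below is a placeholder, as are the loop bodies and the 20-minute interval.)

// Placeholder for the helper used by Update(); the real program builds
// the actual cache contents here.
func newData() map[int]int {
    return map[int]int{1: 1, 2: 4, 3: 9}
}

func main() {
    c := &Cache1{data: newData()}

    // The writer: a few updates per hour. (Uses the standard "time" package.)
    go func() {
        for {
            time.Sleep(20 * time.Minute)
            c.Update()
        }
    }()

    // The reader: reads constantly and without interruption.
    for {
        _, _ = c.Read(2)
    }
}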
Approach 2: Shared Cache and Mutex
The standard textbook approach is to protect the shared resource with a mutex. The problem is that now each read must lock the mutex, perform the read, and release the mutex. Given that with a probability of 0.99999999999 ($= 1-10^{-11}$) or so the lock will be uncontested, the reader is extremely unlikely to wait, yet we would still expect a performance hit from acquiring and releasing the lock so frequently.
Note that a RWMutex does not help in this particular scenario: using a RWMutex, multiple readers can hold a read lock at the same time, but only a single goroutine can hold a write lock. Because we only have a single reader, this optimization does not help us in our specific situation.
// Cache2: the shared map is protected by a mutex.
type Cache2 struct {
    mtx  sync.Mutex
    data map[int]int
}

func (c *Cache2) Update() {
    tmp := newData() // build the new map outside the critical section
    c.mtx.Lock()
    c.data = tmp
    c.mtx.Unlock()
}

func (c *Cache2) Read(key int) (int, bool) {
    c.mtx.Lock()
    defer c.mtx.Unlock()
    v, ok := c.data[key]
    return v, ok
}
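For comparison, this is roughly what the RWMutex variant dismissed above would look like. (A sketch; it is not one of the benchmarked versions.)

// Hypothetical RWMutex variant: readers take the shared lock, the writer
// takes the exclusive lock. With only a single reader there is never a
// second reader to share the RLock with, so it buys nothing here.
type Cache2R struct {
    mtx  sync.RWMutex
    data map[int]int
}

func (c *Cache2R) Update() {
    tmp := newData()
    c.mtx.Lock() // exclusive (write) lock
    c.data = tmp
    c.mtx.Unlock()
}

func (c *Cache2R) Read(key int) (int, bool) {
    c.mtx.RLock() // shared (read) lock
    defer c.mtx.RUnlock()
    v, ok := c.data[key]
    return v, ok
}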
Approach 3: Copied Cache and Channel
Whenever we have many more reads than writes, we should consider an approach based on optimistic locking. Our third solution is based on this idea: what if we could give the reader its own copy of the cache (that is, a copy that will never be written to), and then signal to the reader when a new version of the cache is available? In this case, there is no need to lock the shared resource (because at any moment it is guaranteed to be accessed by only one goroutine), but now there is a need to send a signal safely from one goroutine (the writer) to another (the reader).
Fortunately, Go offers us a tool to do exactly that: channels. Upon some further reflection, we also realize that instead of sending a “signal” over the channel, we might as well send the freshly populated new version of the cache itself!
// Cache3: the writer hands a complete, freshly built map to the reader
// over a channel; data is the reader's private copy.
type Cache3 struct {
    ch   chan map[int]int
    data map[int]int
}

func (c *Cache3) Update() {
    tmp := newData()
    c.ch <- tmp // hand the new version over to the reader
}

func (c *Cache3) Read(key int) (int, bool) {
    select {
    case c.data = <-c.ch: // a new version is available: switch to it
    default: // otherwise, carry on without blocking
    }
    v, ok := c.data[key]
    return v, ok
}
The principal mechanism that makes this work is the select/default block in the Read() function. As a reminder: a pure select statement (without a default clause) will block until an event occurs on one of the enclosed channels. But with a default clause, the select statement will never block, even if no event is available on any of the channels. That’s exactly the behavior we want for our scenario.
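One detail the snippet above leaves out is how the channel is created. A minimal constructor might look like this (a sketch; the choice of channel capacity below is my assumption, not something the code above fixes):

// Possible constructor (not shown in the snippets above). With an
// unbuffered channel, Update() would block until the reader's next Read()
// picks up the new map; a capacity of 1 lets the writer return immediately.
func NewCache3() *Cache3 {
    return &Cache3{
        ch:   make(chan map[int]int, 1),
        data: newData(),
    }
}

Either choice works with the select/default in Read(); the difference is only whether the rare Update() call has to wait for the reader.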
Benchmarks
How do the different versions perform? How much overhead does the second version incur, compared to the first, due to the need to constantly acquire and release the mutex? And, is the third version, using the channel, actually any faster than the second?
I will not reproduce the full program here (it is available for download below), but only show the results. (Taken on my own, rather dated hardware.)
| Algorithm  | Nanoseconds per Read | Slowdown Factor |
|------------|----------------------|-----------------|
| No Locking | 34                   | 1.0             |
| Mutex      | 48                   | 1.4             |
| Channel    | 37                   | 1.1             |
The results are pretty clear: the frequent locking and unlocking incurs a rather painful penalty. By comparison, the overhead associated with the channel is significantly smaller.
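For reference, per-read numbers like these can be obtained with Go's standard testing package. A minimal benchmark for the mutex version might look like the following (a sketch, not the actual benchmarking program linked below; a companion goroutine calling Update() occasionally would be needed to reproduce the scenario exactly):

// Hypothetical benchmark: measures the cost of Read() on a pre-populated
// Cache2. Run with: go test -bench=.
func BenchmarkMutexRead(b *testing.B) {
    c := &Cache2{data: newData()}
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        c.Read(i % 100)
    }
}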
Discussion
The third solution (using a channel) has a certain conceptual elegance, in particular the notion that it avoids a shared resource and instead hands off ownership of the cache from writer to reader. Its main drawback is that it relies on the specific behavior of the select/default block. But the reality is that it’s precisely the select/default idiom that makes channels useful! (Channels that block are relatively unwieldy objects, by comparison.)
It is interesting to observe that whatever synchronization Go does under the covers to enable channel semantics is indeed significantly faster than explicitly locking and releasing a mutex. (I wasn’t sure about this point in particular, until I ran the benchmark. In fact, I was prepared to find the channel to be slower than the bare mutex, but I was wrong about that.)
Other Options
Are there other solutions? Are there other correct solutions?
One approach I briefly considered is based on the second solution, using an explicit mutex, but instead of acquiring and releasing the mutex for each read operation, the program could acquire and hold the mutex for, say, 1000 reads, and only release it after completing them. The problem with this kind of ad-hoc fix is that it easily leads to nasty edge cases. For instance, what if the read traffic stops, for whatever reason, before 1000 reads are complete? Now the cache won’t be updated, and will be badly out of date when the traffic picks up again. This means that we now need to keep track not only of the number of reads, but also implement a timeout to release the mutex. Let’s just stop here, ok?
Another question is whether there is a faster (and easier) way to signal to the reader that a new version of the cache is available. For instance, the reader could examine a version number of the cache, compare it with the version it is currently using, and somehow obtain the new version of the cache, if needed. (That is the standard form of optimistic locking.) The problem is that access to this version number must be synchronized, either using a mutex or using a channel, so we are back to where we were. (There may be ways to achieve this effect using methods in the sync/atomic package, but I am not eager to try.)
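For completeness, here is roughly what one sync/atomic route might look like: instead of a version number, the writer publishes the map itself through an atomic pointer. (A sketch I have not benchmarked; atomic.Pointer requires Go 1.19 or later.)

// Hypothetical variant using sync/atomic: Update() stores a pointer to the
// new map atomically, and Read() loads the current pointer on every call.
type Cache4 struct {
    data atomic.Pointer[map[int]int]
}

func (c *Cache4) Update() {
    tmp := newData()
    c.data.Store(&tmp)
}

func (c *Cache4) Read(key int) (int, bool) {
    m := c.data.Load()
    if m == nil { // no data published yet
        return 0, false
    }
    v, ok := (*m)[key]
    return v, ok
}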
Finally, I did not contemplate the scenario where there is more than one reader. That is a different problem, with different solutions.
Resources
My complete benchmarking program can be found here: