Caching done right!

Development time!

By: Kees Hessels, Jan 2024.

 

Let's have a chat about caching from a developer's perspective.

As developers we often need compound objects (an object built out of multiple other objects) to be cached; a product with price and content information, for example. Caching becomes even more important now that many companies are moving to cloud-based database services with relatively high latency. Although caching has often been treated as an afterthought (it used to be something we did when the product went live), one should consider the scope of a new project and perhaps BEGIN with caching as the foundation. Having the right caching integrated into your project from the start not only speeds up processing and improves customer perception, it can also substantially simplify development while shortening project development time.

Let me explain: (2 minute read)

I have seen quite a few caching implementations, and to be honest, they have usually bothered me to some extent. I have seen developers use SQL statements as keys (yes, I literally mean the whole cleaned-up SQL string being used as a key...), and sometimes we see database objects stored in cache under an id that is created and maintained by the application: if the id no longer existed (because the cached object was purposely expired), they would query the db and store the new object in cache. Sometimes objects are stored in cache with an expiration time, which means that when many objects expire at once, the db can still go down under the flood of new requests. Never have I actually seen a caching mechanism that I thought was developed the "right way" from the ground up, NOT relying on expiration of objects!

Last year I finished a distributed POS project for a large US retail company that was built for a main office and multiple brick-and-mortar outlets in various countries. The outlets used intelligent shopping carts that updated the client's app with real-time information on all products in their cart(s). This project forced me to profoundly rethink the requirements of caching.

Generic caching pattern

So let us write down some requirements:

  • The caching mechanism MUST be separated from all other logic. By this I mean that a developer who works with products, for example, should not have to deal with the cache objects for those products. The reason is simple: a product can be modified by different devs, and each dev can have a slightly different approach. This leads to ambiguity, and ambiguity leads to bugs.
  • You need to be able to rely on cached objects. By this I mean that if a product, for example, is cached, then that object is current and valid. If the object is not in memory, IT DOES NOT EXIST!
  • You need to be able to use these cached objects inside your controllers for non-persistent operations.
  • We cannot rely on every (or any) id being present in the cached object (for example, we do not want to filter objects being sent to the clients).
  • Many brick-and-mortar companies require products to be distributed. For example: a POS can have a different product list for each outlet. This means each outlet can have its own unique cache, and we want to limit data flows to its own products only. So the caching mechanism needs to be able to handle this.
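To make the first two requirements concrete, here is a minimal sketch of a cache layer that hides caching from domain code and keeps a separate cache per outlet. All names here (CacheStore, getProduct, the outlet ids) are illustrative, not from an actual implementation:

```javascript
// Minimal sketch: one cache per outlet, and domain code that never
// manages cache entries itself.
class CacheStore {
  constructor() {
    this.byOutlet = new Map(); // outletId -> Map(cacheKey -> object)
  }
  get(outletId, key) {
    const outlet = this.byOutlet.get(outletId);
    return outlet ? outlet.get(key) : undefined; // absent => does not exist
  }
  set(outletId, key, value) {
    if (!this.byOutlet.has(outletId)) this.byOutlet.set(outletId, new Map());
    this.byOutlet.get(outletId).set(key, value);
  }
  delete(outletId, key) {
    const outlet = this.byOutlet.get(outletId);
    if (outlet) outlet.delete(key);
  }
}

// Domain code only asks for the object; per requirement two, a miss is
// treated as "the product does not exist", never as "go query the db here".
function getProduct(cache, outletId, productId) {
  return cache.get(outletId, `product:${productId}`);
}
```

The point of the sketch is the division of responsibility: only the caching process writes to the store, so controllers can trust whatever they read from it.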

There are several ways to go about this. One is to use replication to maintain an in-memory database. Although quite elegant, it uses a lot of processing and often far more space than intended. The other way involves a mechanism that, although a bit more complex, is very well suited to distributed environments. More importantly, although it is a somewhat elaborate setup, it is a VERY clean mechanism.

TLDR: The key principle is that instead of making the application responsible for updating your cache objects, you let your database drive the management of your cached objects.

Let your database drive the updating of your caches

We do this by using the replication journal (the binlog) of the database server. We can connect to the binlog and tell it which events we want to subscribe to. For example: we can subscribe to updates of the product content table or the product prices table. In order to cache a compound object, we need to construct a signature array for that object. A signature is basically an array of table\id key values that constitute the cached object, together with the cache key of the cached object.
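As an illustration, a signature for one cached compound product might look like the object below. The table names and the shape of the structure are made up for the example; the idea is only that each signature ties a cache key to the table\id rows it was built from, and that inverting the list gives a fast lookup from a row-change event to the affected cache keys:

```javascript
// Hypothetical signature: the cache key plus the table/id pairs whose
// rows the compound object was built from.
const signature = {
  cacheKey: 'product:42',
  rows: [
    { table: 'product_content', id: 42 },
    { table: 'product_prices', id: 42 },
  ],
};

// Invert a list of signatures into an index:
// "table:id" -> Set of cache keys that must be rebuilt when that row changes.
function buildIndex(signatures) {
  const index = new Map();
  for (const sig of signatures) {
    for (const row of sig.rows) {
      const key = `${row.table}:${row.id}`;
      if (!index.has(key)) index.set(key, new Set());
      index.get(key).add(sig.cacheKey);
    }
  }
  return index;
}
```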

The cool thing is, when done well, you can just update a price and watch the change appear instantaneously, like magic.

We store these signatures in a list inside a long-running process (NodeJS\pm2, for example). In this process we also log in to the database as a replication user and subscribe to the replication journal\binlog tables we previously found in the signatures list. Now, whenever a CRUD operation occurs on the database, we receive an event with the table\id in question. This enables us to do a lookup in the signatures list and determine which cached object needs to be updated.
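The dispatch step described above can be sketched as follows. The event-handler wiring to an actual binlog client (for MySQL there are libraries such as zongji) is left out; everything here, from the signatures list to `rebuild`, is an illustrative assumption, not a real client API:

```javascript
// Two example signatures held by the long-running process.
const signatures = [
  { cacheKey: 'product:42', rows: [{ table: 'product_content', id: 42 },
                                   { table: 'product_prices', id: 42 }] },
  { cacheKey: 'product:43', rows: [{ table: 'product_prices', id: 43 }] },
];

// Given a (table, id) pair from a binlog row event, find every cached
// compound object that was built from that row.
function affectedCacheKeys(signatures, table, id) {
  return signatures
    .filter((sig) => sig.rows.some((r) => r.table === table && r.id === id))
    .map((sig) => sig.cacheKey);
}

// This is what the binlog event handler would call. rebuild(key) would
// re-query the database for the compound object and overwrite the cache
// entry, so readers only ever see current, valid objects.
function onRowChange(table, id, rebuild) {
  for (const key of affectedCacheKeys(signatures, table, id)) {
    rebuild(key);
  }
}
```

Note that a price update touching `product_prices` id 42 rebuilds only `product:42`, which is exactly the "update a price and watch it change" behaviour described above.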

There are some other parts of course, like the startup population of cached objects, but if you have read this far, you are smart enough to work out that part as well. It is not rocket science. In part two, we will have a look at the implementation side of things.
