Let's have a chat about caching from a developer's perspective.
As developers we often need compound objects (an object built out of multiple other objects) to be cached: a product with price and content information, for example. Caching becomes even more important now that many companies are moving to cloud-based database services with relatively high latency. Although caching was often treated as an afterthought (it used to be something we did when the product went live), one should consider the scope of a new project and perhaps BEGIN with caching as the foundation of the project. Having the right caching integrated into your project from the start not only speeds up processing and improves customer perception, it can also simplify development considerably while shortening project development time.
Let me explain: (2 minute read)
I have seen quite a few caching implementations, and to be honest, they have usually bothered me to some extent. I have seen developers use SQL statements as keys (yes, I literally mean the whole cleaned-up SQL string being used as a key...), and sometimes we see database objects stored in cache under an id that is created and maintained by the application: if the id no longer existed (because the cached object was purposely expired), they would query the db and store the new object in cache. Sometimes objects are stored in cache with an expiration time, which meant that when many objects expired at once, the db still went down under the flood of new requests. Never have I actually seen a caching mechanism that I thought was developed the "right way" from the ground up, NOT relying on expiration of objects!
Last year I finished a distributed POS project for a large US retail company that was built for a main office and multiple brick-and-mortar outlets in various countries. The outlets utilized intelligent shopping carts that updated the client's app with real-time information about all products in their cart(s). This project forced me to profoundly rethink the requirements of caching.
So let us write down some requirements:

- Compound objects must be cacheable as a single unit.
- Cached objects must never go stale: a change to any underlying row should be reflected in the cache immediately.
- No expiration times, so the database is never hit by a stampede of refresh requests.
- The mechanism must work in a distributed environment.
There are several ways to go about this. One of them is to use replication to maintain an in-memory database. Although quite elegant, it uses a lot of processing and often far more space than intended. The other way involves a mechanism that, although a bit more complex, is very well suited to distributed environments. More importantly, although it is a somewhat elaborate setup, it is a VERY clean mechanism.
We do this by using the replication journal (the binlog) of the database server. We can connect to the binlog and specify which events we want to subscribe to. For example, we can subscribe to updates of the product content table or the product prices table. In order to cache a compound object, we need to construct a signature array for that object. A signature is basically an array of table\id key values that constitute the cached object, together with the cache key of the cached object.
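To make this concrete, here is a minimal sketch in plain JavaScript of what such a signature might look like for a product built from one content row and one price row. The object shape, table names, and the `makeSignature` helper are illustrative assumptions, not part of any specific library:

```javascript
// A signature ties a cache key to the table\id pairs that the
// compound object is built from. (Shape is an illustrative assumption.)
function makeSignature(cacheKey, rows) {
  // rows: array of { table, id } pairs that constitute the object
  return { cacheKey, rows };
}

// Compound "product 42" is built from a content row and a price row.
const signature = makeSignature('product:42', [
  { table: 'product_content', id: 42 },
  { table: 'product_prices', id: 42 },
]);

console.log(signature.cacheKey);    // the key the compound object is cached under
console.log(signature.rows.length); // number of underlying rows: 2
```

The point of the structure is that it can be inverted: from any table\id pair you can find every cache key that depends on it.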
The cool thing is, when done well, you can just update a price and watch it change instantaneously, like magic.
We store these signatures in a list inside a long-running process (NodeJS\pm2, for example). In this process we also log into the database as a replication user and subscribe to the replication journal\binlog for the tables we previously found in the signatures list. Now, whenever a CRUD operation occurs on the database, we receive an event with the table\id in question. This enables us to do a lookup in the signatures list and determine which cached objects need to be updated.
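That lookup step can be sketched in a few lines of plain JavaScript. The signature shape and function names are illustrative assumptions, and the actual binlog subscription (which for MySQL could be done with a library such as zongji) is omitted here; the sketch only shows the event-to-cache-key resolution:

```javascript
// Signatures held by the long-running process: each maps the table\id
// rows of a compound object to its cache key. (Illustrative data.)
const signatures = [
  { cacheKey: 'product:42', rows: [{ table: 'product_content', id: 42 },
                                   { table: 'product_prices',  id: 42 }] },
  { cacheKey: 'product:43', rows: [{ table: 'product_content', id: 43 },
                                   { table: 'product_prices',  id: 43 }] },
];

// Given a binlog event for one row, find every cached object that
// contains that row and therefore needs to be rebuilt.
function affectedCacheKeys(event, sigs) {
  return sigs
    .filter(sig => sig.rows.some(r => r.table === event.table && r.id === event.id))
    .map(sig => sig.cacheKey);
}

// A price update on row 42 touches only the product:42 compound object.
console.log(affectedCacheKeys({ table: 'product_prices', id: 42 }, signatures));
// → [ 'product:42' ]
```

A linear scan is fine for a sketch; in practice you would index the signatures by `table:id` so each binlog event resolves in constant time.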
There are some other parts of course, like startup population of the cached objects, but if you have read this far, you are smart enough to figure that part out as well. It is not rocket science. In part two, we will have a look at the implementation side of things.