There and back again, Redis.
Foreword
In recent months, I have journeyed deep with Redis, wielding it for ends both worthy and ill-suited. Through these trials, a secret knowledge has fallen to me—a burden I did not seek. I have seen its brilliance misspent and its true power left wanting. This, then, is the tale of that hard-won wisdom, a guide for others, that they might find the true path and not wander as I have under the weight of such a tool.
Intro
Let this tale be a map for your own journeys. It seeks to answer the questions of where and when one should call upon the aid of Redis. We shall begin by exploring the lay of the land, covering its fundamental principles and the varied forms of its craft. From there, we will chart the more advanced paths, some yet left to be fully discovered.
Furthermore, I will provide guidance on how to provision Redis for your own ventures, with practical examples drawn from real-world quests to light the way.
What is Redis?
Many assume that Redis is only useful for caching data as key-value pairs. That is far from true. As explained later, Redis offers many interesting data structures that would be hard to deploy or store anywhere else. It's important to mention that Redis runs in memory and is therefore not persistent by default. However, you can configure Redis to use backups and to be persistent and fault tolerant. Redis is highly scalable, which is especially useful when you need a high volume of connections, inserts and reads that would exceed the limits of a standard database. In my testing, a simple 10-shard managed Redis cluster on GCP was more than happy to handle over 500k commands per second on the lowest machine settings.
Data types
Let's start with the data types, which are the single most important thing in Redis and are usually to blame for both the good and the bad decisions in your project. I've sorted them into two buckets: the primitives and the interesting ones.
The primitives
Strings (ref) (commands)
The string type is the one you'll need in about 90% of use cases. Strings are the underlying data type in Redis, but they can also hold numbers, booleans and stringified JSON values. The basic commands for strings are GET (link), SET (link) and DEL (link); I don't think these need explaining. You'll sometimes use strings with an expiration (TTL), which you can set directly in the SET command with the EX argument, in seconds. One thing I'd like to point out is validation when storing and retrieving stringified JSON values. If you're the one saving the data into Redis and the data is validated/typed before insertion, it's not strictly necessary to validate the result. However, if another service writes it, I would highly recommend doing a schema check. In any case, it's better to be safe than sorry.
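The schema check can be as small as a type guard. This is only a sketch: `CachedUser` is a made-up shape for illustration, and in a real project a schema library would probably replace the hand-written guard. The `raw` argument stands for whatever GET returned.

```typescript
// Hypothetical shape of a value that another service stored as stringified JSON.
interface CachedUser {
  id: string
  logins: number
}

// Type guard: checks the parsed value at runtime instead of trusting a cast.
function isCachedUser(value: unknown): value is CachedUser {
  if (typeof value !== 'object' || value === null) return false
  const candidate = value as Record<string, unknown>
  return typeof candidate.id === 'string' && typeof candidate.logins === 'number'
}

// `raw` is the reply of GET: a string, or null when the key does not exist.
function parseCachedUser(raw: string | null): CachedUser | null {
  if (raw === null) return null
  try {
    const parsed: unknown = JSON.parse(raw)
    return isCachedUser(parsed) ? parsed : null
  } catch {
    return null // the stored value was not valid JSON at all
  }
}
```

Treating a malformed value the same as a cache miss is a design choice; depending on the use case you may prefer to log or throw instead.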
Hashes (ref) (commands)
Hashes are a collection of field-value pairs. Similar to Map<string, string | number> or Record<string, string | number> in JS. I recommend using Hashes for values that are grouped together under a similar key and share the same data type, but of course the possibilities are endless.
A few examples:
- Storing user analytics grouped by user, with a key like app:analytics:user:USER_ID and fields like logins, page views and actions. This is great because you can increment individual fields using the HINCRBY command in O(1).
- Storing time series (chart) data without wandering into the lands of the time series data type, with a key looking like app:statistics:…options and fields looking like 2025_22_05 and 2025_23_05. It's especially great for this because you can retrieve all the values at once using HGETALL, set individual fields using HSET, or set an expiration for individual fields using HEXPIRE.
- And many more; my imagination is limited.
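A small sketch of the user analytics case. The helper names are mine, not part of any library; the only thing taken from Redis itself is that HGETALL replies with every value as a string, so counters have to be converted back to numbers.

```typescript
// Key layout from the first example; the user ID is substituted per user.
function analyticsKey(userId: string): string {
  return `app:analytics:user:${userId}`
}

// HGETALL returns all fields as strings, so numeric counters need converting.
function toCounters(hash: Record<string, string>): Record<string, number> {
  const counters: Record<string, number> = {}
  for (const [field, value] of Object.entries(hash)) {
    const parsed = Number(value)
    if (Number.isFinite(parsed)) counters[field] = parsed // skip non-numeric fields
  }
  return counters
}
```

Assuming an ioredis client named `redis`, usage would look roughly like `await redis.hincrby(analyticsKey(id), 'logins', 1)` to count a login and `toCounters(await redis.hgetall(analyticsKey(id)))` to read everything back.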
List (ref) (commands)
Lists in Redis are linked lists, not arrays as many might think. I personally wouldn't use them the way you'd use arrays in JavaScript: that doesn't take advantage of the constant-time access and inserts at the head and tail (LPOP, LPUSH, RPOP, RPUSH), and accessing items by index is not constant time. Lists are a perfect choice for implementing custom queues, whether LIFO, FIFO or both.
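To make the FIFO/LIFO distinction concrete, here is a toy in-memory model of a single list. It is not a Redis client and only mirrors the push/pop semantics of the four commands; the real commands take a key as their first argument.

```typescript
// Toy stand-in for one Redis list; only the ordering semantics are modelled.
class ToyList {
  private items: string[] = []
  lpush(value: string): number { this.items.unshift(value); return this.items.length }
  rpush(value: string): number { this.items.push(value); return this.items.length }
  lpop(): string | null { return this.items.shift() ?? null }
  rpop(): string | null { return this.items.pop() ?? null }
}

// FIFO queue: push on one side, pop from the other.
const fifo = new ToyList()
fifo.lpush('job-1')
fifo.lpush('job-2')
const firstOut = fifo.rpop() // 'job-1', the oldest item

// LIFO stack: push and pop on the same side.
const lifo = new ToyList()
lifo.lpush('job-1')
lifo.lpush('job-2')
const lastIn = lifo.lpop() // 'job-2', the newest item
```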
Set (ref) (commands)
Sets, as in JavaScript and other languages, are unordered collections of unique elements. They can be used for categorization, rate limiting, entity relationship management, blacklisting or leaderboards (sorted sets). Most set commands run in O(1) (even SCARD, which returns the cardinality, i.e. the size of a set), except for SMEMBERS, which returns all of the members. Sets are powerful thanks to their union, intersection and difference operations. Sets can be used to track analytics data, but that takes a lot of memory when storing thousands or millions of items (see HyperLogLog and Bloom/Cuckoo filters below).
The Interesting ones
Streams (ref) (commands)
A stream in Redis is basically an append-only log with a few features built on top. Streams are great for storing events, notifications, messages between different services or very specific queues. Each item in a stream has its own auto-generated ID, which is basically a millisecond timestamp with a sequence number added on (meaning you can insert practically unlimited items within the same millisecond). Streams are made to store objects. However, for ease of use you can use a single field-value pair, with the field named data (for example) and the value being stringified JSON. You can trim the size of a stream at any time using the XTRIM command, which is especially useful when inserting new items using XADD while keeping the stream at a fixed size. You can "subscribe" to a stream using the XREAD command: you specify the ID to start from (0 at first), which retrieves all the new records, and then call XREAD again with the ID of the last element you received. That way you can achieve real-time functionality when handling logs and events. For reading specific parts of a stream you can use XRANGE and XREVRANGE with start and end ID boundaries. You can use streams to handle communication between services while keeping track of the history (slightly different from Pub/sub).
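A sketch of how the IDs behave, since the XREAD loop depends on them. The function names are my own; the ID format (millisecond timestamp, a dash, then a sequence number) is the one described above. I use plain numbers for simplicity, which is fine for illustration even though the sequence part is a 64-bit value in Redis.

```typescript
// A stream ID has the form "<millisecond timestamp>-<sequence>", e.g. "1716500000000-0".
interface ParsedStreamId {
  ms: number
  seq: number
}

function parseStreamId(id: string): ParsedStreamId {
  const [ms, seq = '0'] = id.split('-') // a bare "0" is treated as "0-0"
  return { ms: Number(ms), seq: Number(seq) }
}

// Negative when a is older than b, positive when newer, 0 when equal.
function compareStreamIds(a: string, b: string): number {
  const left = parseStreamId(a)
  const right = parseStreamId(b)
  return left.ms - right.ms || left.seq - right.seq
}

// Cursor for the next XREAD: the ID of the last entry received,
// or the previous cursor when nothing new arrived.
function nextCursor(receivedIds: string[], previousCursor: string): string {
  return receivedIds.length > 0 ? receivedIds[receivedIds.length - 1] : previousCursor
}
```

Because entries come back in ID order, keeping only the last ID is enough to resume where the previous read stopped.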
Hyperloglog (ref) (commands) (spec)
Imagine you want to use a set to keep track of unique items (visitors on a page, device IDs, …) but only need the cardinality of the set, not the individual items. Once you have thousands of these items or more, sets can become pretty memory hungry, and that's where HyperLogLog comes in. HyperLogLog's only purpose is to tell you how many unique items it has seen, with a standard error of 0.81%. HyperLogLog is basically a large table with many, many columns into which it puts the hashes of the items using some boring algorithm (quite interesting actually, but you shouldn't care ;)). The best part is that its memory usage doesn't grow beyond 12KB, making Redis memory usage predictable. It exposes only 3 commands: PFADD, which adds an item; PFCOUNT, which returns the approximate cardinality in O(1), with the 0.81% standard error mentioned above; and PFMERGE, which merges two or more HyperLogLog structures into one (note that when using Redis in cluster mode, the HyperLogLog structures have to be in the same shard, which you can force using hash tags …more on that later).
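For the curious, both numbers fall out of simple arithmetic. The sketch below assumes the register layout Redis uses for its dense HyperLogLog representation (16384 registers of 6 bits each) and the usual HyperLogLog error estimate of 1.04 / sqrt(registers); `expectedRange` is just a helper I made up to show what the error means in practice.

```typescript
const HLL_REGISTERS = 16384
const BITS_PER_REGISTER = 6

// 16384 * 6 bits = 98304 bits = 12288 bytes, the 12KB ceiling.
const hllBytes = (HLL_REGISTERS * BITS_PER_REGISTER) / 8

// 1.04 / sqrt(16384) = 1.04 / 128 = 0.008125, i.e. about 0.81%.
const hllStandardError = 1.04 / Math.sqrt(HLL_REGISTERS)

// Rough band of one standard error around a true count.
function expectedRange(trueCount: number): [number, number] {
  const delta = trueCount * hllStandardError
  return [Math.floor(trueCount - delta), Math.ceil(trueCount + delta)]
}
```

For a million unique visitors, one standard error is roughly 8,000 in either direction. It is a standard error rather than a hard limit, so individual estimates can occasionally land outside that band.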
Bloom and Cuckoo filters
Bloom (ref) (commands)
Cuckoo (ref) (commands)
Bloom and Cuckoo filters are in a way similar to HyperLogLog, but instead of giving you the cardinality (size) of a collection, they check whether an item is present in a collection. For a better developer experience and less debugging, I suggest using these (and HyperLogLog) only once you've reached the limits of a set in your scenario, either performance-wise or pricing/memory-wise. I'm not going into the depths of Bloom and Cuckoo filters here, but I'll mention a few interesting things. You can set the error rate and expected size of these filters using their own settings, described in the docs.
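To get a feeling for how error rate and expected size trade off against memory, here is the textbook sizing formula for a classic Bloom filter. Take it as an estimate only: the function name is mine, and the memory a Redis Bloom filter actually reserves may differ from the theoretical optimum.

```typescript
// Textbook sizing for a classic Bloom filter:
//   bits   = -n * ln(p) / (ln 2)^2
//   hashes = (bits / n) * ln 2
function bloomSizing(expectedItems: number, errorRate: number) {
  const bits = Math.ceil((-expectedItems * Math.log(errorRate)) / (Math.LN2 * Math.LN2))
  const hashes = Math.max(1, Math.round((bits / expectedItems) * Math.LN2))
  return { bits, hashes, kibibytes: bits / 8 / 1024 }
}

// One million items at a 1% false-positive rate: about 9.6 million bits and 7 hash functions.
const millionItemsAtOnePercent = bloomSizing(1_000_000, 0.01)
```

That works out to a bit over 1 MiB, compared with storing a million full members in a set. Tightening the error rate by a factor of ten costs roughly 4.8 extra bits per item.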
Timeseries (ref) (commands)
As the name suggests, the time series type is used to store time series data, which is especially useful for monitoring and analytics. Note that it is not included in every Redis instance by default; depending on the license and deployment, you might have to check first or deploy it yourself (ref). I'd recommend using Redis time series only if you have high-volume inserts that would make a traditional database struggle.
Redis has many other data structures which to me don't seem too useful but may be perfect for you and your use case. Check the data type section in the docs for yourself and see if you'll find your match (ref)
Pub/sub (ref) (commands)
Pub/sub is great for messaging or sharing events across services or horizontally scaled apps (Cloud Run, …). Pub/sub is organized into channels: publishers publish messages, and one, many or no subscribers can listen to them and react accordingly. Publishers shouldn't care whether a subscriber is listening; it's basically fire and forget. You wouldn't want to use Pub/sub for critical information, since you cannot guarantee that someone is listening; for that you should use streams, as mentioned before.
I've usually used Pub/sub for real-time notifications and chat systems in horizontally scaled applications, paired with NestJS's Event Emitter 2 (ref). For example, say you want to send a real-time notification to other users when someone creates a post. One user creates the post and a Redis publisher sends an event. Each user has an open SSE or WS connection to the backend (each user can be connected to a different instance of the application). Each instance is subscribed to the channel, and when an event arrives it checks which of its connected users should receive the message; using Event Emitter and leveraging rxjs functionality, it then sends the message over SSE or WS to all of those users. This process usually takes less than a second, making it great for chat systems and notifications.
Code example:
// This code is from a NestJS app, slightly simplified into a standalone form
// If you'd like to see the whole implementation feel free to ask
// Assumed packages: ioredis, eventemitter2, @nestjs/common, rxjs
import Redis from 'ioredis'
import { EventEmitter2 } from 'eventemitter2'
import { Controller, Param, Query, Sse } from '@nestjs/common'
import { fromEvent } from 'rxjs'
import { map } from 'rxjs/operators'

interface Message {
  content: string;
  chatId: string;
}

// Service
// Two connections, because a connection in subscriber mode cannot publish
const publisher = new Redis()
const subscriber = new Redis()
const eventEmitter = new EventEmitter2()

// Each instance has its own subscriber, so the result is the same with 1 or 1000 instances of your backend
subscriber.subscribe('chat')
subscriber.on('message', (_channel, event) => {
  const message: Message = JSON.parse(event);
  // Event emitter shares the message across the server-sent event streams open on the SAME instance
  eventEmitter.emit(`sse.chat.${message.chatId}`, message)
})

async function sendMessage(message: Message, userId: string): Promise<void> {
  // Handle authorization and storing the message
  await db.insert(...)
  // Publish the event to every instance subscribed to the channel
  await publisher.publish('chat', JSON.stringify(message))
}

// Controller
@Controller()
class ChatController {
  @Sse('/sse/:chatId')
  async chatSse(@Param('chatId') chatId: string, @Query('token') token: string) {
    // Handle your authentication using token from query
    // Authorize user to listen to specified chat id
    return fromEvent(eventEmitter, `sse.chat.${chatId}`).pipe(
      map((payload) => ({ data: payload }))
    )
  }
}
Concepts
Pipelines (ref)
Pipelines are useful when you have a lot of commands to send at once (I don't mean 10, I mean 10k). You can batch commands into a pipeline and send all of them in one network trip. The commands themselves will not execute faster, but the total time spent on round trips (RTT, round-trip time) will be lower since you make fewer network requests. You don't need pipelines if you only have a few commands, since Redis is pretty fast by itself, even when running on a different machine in the same data center.
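A back-of-the-envelope sketch of why this matters. Both helpers are my own and deliberately ignore server-side execution time, so the numbers only describe time spent waiting on the network.

```typescript
// Split a long list of commands into batches, one pipeline per batch.
function chunkCommands<T>(commands: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new RangeError('batchSize must be at least 1')
  const batches: T[][] = []
  for (let i = 0; i < commands.length; i += batchSize) {
    batches.push(commands.slice(i, i + batchSize))
  }
  return batches
}

// Network wait only: one round trip per batch (batchSize 1 means no pipelining).
function networkWaitMs(commandCount: number, rttMs: number, batchSize = 1): number {
  return Math.ceil(commandCount / batchSize) * rttMs
}

const oneByOne = networkWaitMs(10_000, 1) // 10000 ms spent on round trips
const pipelined = networkWaitMs(10_000, 1, 1_000) // 10 ms spent on round trips
```

Batching in chunks rather than sending one giant pipeline keeps the request and the buffered replies at a manageable size; the batch size of 1,000 here is an arbitrary example, not a recommendation.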
Transactions (ref)
Transactions in Redis are not the same as in traditional SQL databases: there is no rollback. If a command is rejected while the transaction is being queued (for example because of a wrong number of arguments), the whole transaction is discarded and nothing is executed. If a command fails during execution (for example because a key holds a value of the wrong type), the other commands still run and their effects are kept.
Scripts (ref)
Scripts are great if you need to perform a logical operation on lots of keys in Redis, where commands depend on the results of other commands. If you can write your operation as a Lua script (with a few limitations), you can "upload" the script to the Redis server, where it gets executed and a result is returned. If you execute one script often, you don't need to send it every time: you can send it to the Redis server once using SCRIPT LOAD, which returns a SHA1 digest that you can then call using EVALSHA instead of the usual EVAL, where you need to send the script as well. Note that when using Redis in cluster mode, a script can only access keys stored on the same shard, which you can force using hash tags.
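As an illustration, here is a small script and the argument layout EVAL expects (script, number of keys, the keys, then the remaining arguments). The script, key name and helper are examples of mine rather than anything from a real project.

```typescript
// Lua: delete KEYS[1] only if it still holds the value passed in ARGV[1].
const deleteIfEquals = `
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
end
return 0
`

// EVAL script numkeys key [key ...] arg [arg ...]
function evalArguments(script: string, keys: string[], args: string[]): string[] {
  return [script, String(keys.length), ...keys, ...args]
}

// Hypothetical key and token, purely for illustration.
const releaseLock = evalArguments(deleteIfEquals, ['{order:42}:lock'], ['owner-token'])
```

The read and the delete happen inside one script, so no other client can change the key in between. For EVALSHA the layout is the same, with the SHA1 digest returned by SCRIPT LOAD in place of the script text.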
Cluster
If your Redis grows in size or in the number of commands you need to execute per second, Redis clusters are a great way to scale. Redis distributes keys across up to about 1000 nodes using a predictable CRC16 hash of the key, which maps every key to one of 16384 hash slots. You can also add replicas for each node in a cluster, just as you can in a single-instance setup, which results in a highly available and highly scalable setup. I personally wouldn't use Redis cluster unless I absolutely had to because of application limits, since it becomes harder to use some of the data structures mentioned above, and when you push a cluster to its limits not every write is guaranteed to be stored: some can be lost even with persistence (you won't run into this in 99.99% of scenarios, but it's possible). Before going all the way with Redis cluster, think twice about whether you can improve the way you store and retrieve your data, possibly lowering both memory usage and throughput.
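This is also where hash tags come from. Below is a sketch of the slot calculation as I understand it from the cluster specification: CRC16 (the XMODEM variant) of the key, modulo 16384, where only the part between the first `{` and the next `}` is hashed if that part is non-empty. The function names are my own.

```typescript
// CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, processed MSB first.
function crc16(bytes: Uint8Array): number {
  let crc = 0
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i] << 8
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc & 0x8000) !== 0 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff
    }
  }
  return crc
}

function hashSlot(key: string): number {
  let hashed = key
  const open = key.indexOf('{')
  if (open !== -1) {
    const close = key.indexOf('}', open + 1)
    // Only a non-empty tag counts; "{}" falls back to hashing the whole key.
    if (close > open + 1) hashed = key.slice(open + 1, close)
  }
  return crc16(new TextEncoder().encode(hashed)) % 16384
}

// Keys sharing a tag land in the same slot, so multi-key commands and scripts can use them together.
const sameSlot = hashSlot('{user:1}:visits') === hashSlot('{user:1}:devices')
```

The flip side is that a popular tag concentrates all of its keys on one shard, so tags are best kept as narrow as the use case allows.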
Key eviction
Redis runs in memory, which makes it prone to running out of memory. That can happen either when your data exceeds the available memory or when you have high throughput with lots of connections, which take a lot of memory as well. When Redis runs out of memory it evicts keys (depending on the configured eviction policy), resulting in data loss. You can check the number of evicted keys using the INFO command. You probably won't run into this issue if you monitor your Redis and scale it accordingly, but be aware of the possibility.
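INFO replies with plain text made of `field:value` lines grouped under `# Section` headers, so pulling out the eviction counter takes a few lines. The parser below is a sketch of mine, and the sample reply is a made-up fragment rather than real output.

```typescript
// Returns the raw value of one field from an INFO reply, or undefined if absent.
function parseInfoField(info: string, field: string): string | undefined {
  for (const line of info.split(/\r?\n/)) {
    if (line.startsWith('#')) continue // section header such as "# Stats"
    const separator = line.indexOf(':')
    if (separator === -1) continue // blank or malformed line
    if (line.slice(0, separator) === field) return line.slice(separator + 1)
  }
  return undefined
}

// Made-up fragment in the shape of the Stats section.
const sampleInfo = '# Stats\r\nexpired_keys:12\r\nevicted_keys:34\r\n'
const evictedKeys = Number(parseInfoField(sampleInfo, 'evicted_keys') ?? 0)
```

Since `evicted_keys` is a running total, it is the growth between two readings that signals a problem, not the absolute value.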
Summary
Redis is a great tool and no one can take that away from me. It's great for caching results, handling high throughput and insert buffers, and it is one of the simplest tools for implementing job queues and messaging between microservices. It has interesting data types which standard databases do not have and which you can leverage in your app. It's definitely more expensive to run than traditional databases since it runs in memory (RAM), which is expensive in data centers and even more expensive when running in managed mode with persistence and backups. I would still highly suggest running Redis in managed mode just for the simplicity and ease of use; the added cost is usually outweighed by the fuckups you avoid compared with deploying Redis yourself.
My take on redis
Redis is great but based on my not so great experience with redis in the past year or so I have some things to point out.
Benefits:
- Redis is fast - you can achieve sub millisecond responses when redis runs on the same machine as the apps or few millisecond responses when it's on the same network in a data center
- Redis is schema less (unless you use redis query engine), making it great for prototyping and not worrying about migrations and getting things done in record time.
- Stupidly easy to deploy. When using Redis just as a cache, you can deploy a Redis Docker container in a matter of seconds, and not having to worry about authentication or other setup makes it really easy to use (assuming it only runs on your network)
Drawbacks:
- I wouldn't choose Redis as a primary persistent database. Redis doesn't have any schema (unless you use Redis query engine), which can become a pain in the ass
- Redis is slow … once you need to access a lot of keys to get a result that could be fetched with a single SQL statement in an SQL database.
- Redis is not persistent by its nature. Yes, you can configure persistence, but under high memory usage or throughput keys may be evicted, and it's really important to keep that in mind.
- Do not use Redis for something you can do in a standard database. The DX is usually much better in standard databases and it's also less expensive.
Redis Insight
If you're using Redis for anything other than a simple cache, I suggest deploying Redis Insight (ref), which is a great tool for seeing what's actually stored in Redis and for monitoring it.
Alternatives (OSS and CSS)
In recent years Redis (as in the company) made some weird decisions regarding the licensing of Redis and the separation of Redis OSS. As a result, a few tools have arisen that aim to be drop-in replacements for Redis. I'm only going to mention 2 of them.
Valkey (ref) is a fully open source drop-in replacement for Redis which tries to keep up with new Redis features and can be deployed on GCP as a managed service, just like Redis. I use Valkey for local development and have never had any issues.
Dragonfly (ref) is also a drop-in alternative to Redis. It's designed to be even more scalable than Redis, choosing a multi-threaded approach instead of Redis's single-threaded one. I haven't used Dragonfly myself, but when comparing Dragonfly to a single Redis instance, the benchmarks that Dragonfly provides (25x in QPS) are true; however, when comparing to Redis in cluster mode, the results are usually neck and neck (ref)