We were running a Redis server on one of our machines to serve one of our microservices. It ran happily for three months. Then one day it suddenly stopped responding because the machine ran out of disk space. Initially, Redis was used by only one service, but over the months many other services had started using the same instance. When Redis went down, almost all of our components came to a standstill.
Redis had grown into a critical component of our system, so we wanted it to be highly available and fault tolerant. We started exploring HA options and found two ways to go:
- Since our whole infrastructure runs on AWS, we could use ElastiCache.
- We could set up our own HA cluster on EC2 machines.
The workload on our Redis is minimal in terms of CPU, network, and memory, with occasional bursts of load. We decided to set up our own Redis servers because the minimum cost of running a Redis HA cluster on ElastiCache is $130.
For HA we chose Redis Sentinel, which is very light on resources and easy to configure. Our current setup looks like this:
- 1 master (us-east-1a) <- 1 slave replicating from it (us-east-1b)
- 3 Sentinel servers, one in each of 3 availability zones
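Why three Sentinels spread over three zones? Below is a minimal Python model (not Sentinel's actual code) of the rule described in the Sentinel documentation: a failover needs `quorum` Sentinels to agree that the master is down, and a leader for the failover must then be elected by a majority of all Sentinels (or by `quorum` votes, if that is larger). The Sentinel names and the quorum of 2, i.e. the last argument of `sentinel monitor <master-name> <ip> <port> <quorum>`, are assumptions for illustration. The sketch checks whether losing any single availability zone still leaves enough Sentinels to fail over.

```python
# Hypothetical Sentinel names; one Sentinel per availability zone, as in the setup above.
SENTINELS = {
    "sentinel-1": "us-east-1a",
    "sentinel-2": "us-east-1b",
    "sentinel-3": "us-east-1c",
}
QUORUM = 2  # assumed <quorum> argument of `sentinel monitor`


def majority(total: int) -> int:
    """Smallest strict majority of `total` Sentinels."""
    return total // 2 + 1


def can_fail_over(reachable: int, total: int, quorum: int) -> bool:
    """Simplified model: `quorum` Sentinels must agree the master is down, and
    a leader needs max(quorum, majority) votes before it may run the failover."""
    return reachable >= max(quorum, majority(total))


def survives_any_single_az_outage(sentinels: dict, quorum: int) -> bool:
    """True if losing any one availability zone still permits a failover."""
    total = len(sentinels)
    for lost_az in set(sentinels.values()):
        # Count the Sentinels that live outside the zone we just "lost".
        reachable = sum(1 for az in sentinels.values() if az != lost_az)
        if not can_fail_over(reachable, total, quorum):
            return False
    return True


if __name__ == "__main__":
    # Three Sentinels in three zones: losing any one zone leaves two.
    print(survives_any_single_az_outage(SENTINELS, QUORUM))
    # Two Sentinels in two zones: losing one zone leaves a single Sentinel.
    two_zones = {"sentinel-1": "us-east-1a", "sentinel-2": "us-east-1b"}
    print(survives_any_single_az_outage(two_zones, 1))
```

With three Sentinels, losing any one zone leaves two, which satisfies both the quorum and the majority. With only two Sentinels, losing one zone leaves a single Sentinel, which can never form a majority of two, so no failover happens. The model deliberately ignores subtler cases, such as a Sentinel that is alive but partitioned from its peers.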
After the setup, when we killed the Redis master, the slave was not promoted automatically. After many frustrating hours we found out that we had disabled the SLAVEOF command in the Redis server config. Sentinel promotes a slave by sending it SLAVEOF NO ONE, so with the command disabled it had no way of promoting the slave to master. Once we fixed that, Sentinel promotion worked like a charm.
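A pre-flight check along these lines can catch the problem before the first failover test. It is a minimal Python sketch that assumes the command was disabled with the standard `rename-command SLAVEOF ""` directive in redis.conf; our exact config line is not shown here. It scans the config text for `rename-command` lines that hide SLAVEOF or its Redis 5+ alias REPLICAOF. The function name and the example config are hypothetical.

```python
import shlex

# SLAVEOF is what Sentinel sends during failover; REPLICAOF (Redis 5+) is its
# alias, so a config that hides either one is worth flagging.
FAILOVER_COMMANDS = {"SLAVEOF", "REPLICAOF"}


def find_blocked_failover_commands(conf_text: str) -> dict[str, str]:
    """Return {command: new_name} for failover commands hidden by rename-command.

    An empty new_name means the command is disabled outright.
    """
    blocked = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and every directive other than rename-command.
        if not line or line.startswith("#"):
            continue
        if not line.lower().startswith("rename-command"):
            continue
        parts = shlex.split(line)  # turns "" into an empty string argument
        if len(parts) == 3 and parts[0].lower() == "rename-command":
            command, new_name = parts[1].upper(), parts[2]
            if command in FAILOVER_COMMANDS:
                blocked[command] = new_name
    return blocked


# Hypothetical redis.conf excerpt.
EXAMPLE_CONF = """
# hardening: hide dangerous commands
rename-command FLUSHALL ""
rename-command SLAVEOF ""
"""

if __name__ == "__main__":
    for cmd, new_name in find_blocked_failover_commands(EXAMPLE_CONF).items():
        if new_name == "":
            print(f"{cmd} is disabled; Sentinel cannot promote this replica")
        else:
            print(f"{cmd} is renamed to {new_name!r}; Sentinel needs a matching rename")
```

A command that is only renamed, rather than disabled, can still be used by Sentinel if sentinel.conf declares the same mapping with `sentinel rename-command <master-name> SLAVEOF <new-name>`. A command disabled with `""` cannot be mapped at all, which is why promotion failed outright.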