With our new infrastructure based on Kubernetes, we want to standardize everything.
We are trying to have only 2 databases: MongoDB for RocketChat and PostgreSQL for all the rest (ok, maybe some MySQL if needed, we'll see). And for files, we exclusively use S3-compatible Object Storage, which is already supported by Discourse, RocketChat and Nextcloud.
So for completeness, I'd say we have 2 types of data to take care of:
- Databases
- Object Storage (S3 compatible) buckets
For an Internet service provider like us, we need 3 distinct things:
- High Availability (HA): at any moment in time, if a disk or a server fails, we are still able to serve our users. This should be transparent to the user.
- Point-in-time recovery (PITR): in case of a mistake, we are able to go back one day or one week and restore data as it was some time ago, to recover from that mistake. Ideally the user could self-serve here.
- Disaster Recovery (DR): something really bad happened, and we want to recover from that too. Typically it involves keeping a copy in a remote place, some hundreds of kilometers away.
What could happen so that we would lose your data, and what is the probability of those events?
- an admin computer gets hacked
- an admin makes a mistake and deletes data
- there is a bug in some software and data is lost
- we lose 3 disks at the same time
- the datacenter explodes
Whatever you do, there is always a risk. I think the highest probability here is that we make a mistake ourselves. We have to find ways to mitigate that risk, but it can happen.
We use Ceph in our Kubernetes cluster. Let's have a look at the plan for each kind of data.
The tooling around databases is now amazing.
We started to use wal-g and we'll generalize it to all our databases.
It can send an encrypted stream of data to a remote S3-compatible Object Storage bucket, and ship the diffs (based on the binlog, oplog or WAL) at whatever interval you want.
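As a rough sketch, here is what that setup can look like for PostgreSQL. The bucket name, key path and timestamps below are hypothetical placeholders, and the exact environment variables should be checked against the wal-g documentation for your storage and encryption choices:

```shell
# Hypothetical wal-g setup for PostgreSQL (bucket and key path are placeholders).
export WALG_S3_PREFIX="s3://db-backups/postgres"   # remote S3-compatible bucket
export WALG_PGP_KEY_PATH="/etc/walg/backup.key"    # client-side encryption key

# In postgresql.conf, ship every WAL segment as it is produced:
#   archive_mode = on
#   archive_command = 'wal-g wal-push %p'

# Take a full base backup (run periodically, e.g. nightly):
wal-g backup-push "$PGDATA"

# Point-in-time restore: fetch a base backup, then replay WAL up to a target:
wal-g backup-fetch "$PGDATA" LATEST
#   restore_command = 'wal-g wal-fetch %f %p'
#   recovery_target_time = '2024-01-01 03:00:00'   # hypothetical target
```

The same pattern applies to MongoDB and MySQL, with the oplog or binlog playing the role of the WAL.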
We replicate directly on NVMe 3 times at the database level (PG replicas and so on), so we cover HA.
Moreover, there is not that much data in databases; most data will be on Object Storage.
In conclusion, it is easy to cover the 3 cases of data conservation for databases, and we are really happy there.
We have bucket versioning, so we cover PITR.
Our object storage technology is based on the Ceph RADOS Gateway, and the data is replicated 3 times, so we cover HA.
And we have a nightly copy to a remote site (Helsinki), so we cover DR.
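Since the RADOS Gateway speaks the S3 API, versioning can be enabled with any standard S3 client. A minimal sketch with the AWS CLI, assuming a hypothetical bucket name and endpoint:

```shell
# Turn on versioning for a bucket (bucket name and endpoint are placeholders):
aws s3api put-bucket-versioning \
    --endpoint-url https://s3.example.org \
    --bucket user-data \
    --versioning-configuration Status=Enabled

# Later, list the historical versions of an object to pick one to restore:
aws s3api list-object-versions \
    --endpoint-url https://s3.example.org \
    --bucket user-data \
    --prefix path/to/object
```

Once versioning is on, an accidental delete or overwrite just creates a new version, and the old one can be fetched back.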
What about use cases? So let's just discuss 3 things:
- Chat (RocketChat): we think it is not that much of a big deal if you lose your cat emojis on the chat.
- Forum (Discourse): this might be a bit more problematic if you lose images here.
- Files (Nextcloud): this is probably already a copy of data that you have on your laptop or elsewhere.
We have disaster recovery. But every bit you write on our servers is already replicated 3 times on the Ceph cluster. If you add disaster recovery, it is 6 times. So we need to double the number of CPUs/RAM/disks/SSDs and so on.
The main question now is: what is the durability of data with and without disaster recovery?
This is hard to calculate, and I don't have a PhD in statistics. But I could say that roughly:
- 99.99% durability without disaster recovery
- 99.99999% with disaster recovery
- 99.9999999999% if the disaster recovery is in your hand
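To make the reasoning behind those orders of magnitude explicit: if the copies fail independently, the probability of losing an object is the product of the per-copy loss probabilities. The numbers below are assumptions chosen to reproduce the figures above, not measured values:

```python
# Back-of-envelope durability model. All probabilities are assumed, not
# measured, and sites are treated as failing independently, so the chance
# of total loss is the product of the per-site loss probabilities.
p_primary = 1e-4   # assumed yearly chance the primary cluster loses an object
p_remote  = 1e-3   # assumed yearly chance the remote DR site loses its copy
p_user    = 1e-5   # assumed yearly chance you lose your own local copy

def durability(*loss_probs: float) -> float:
    """Durability = 1 - probability that every single copy is lost."""
    p_all_lost = 1.0
    for p in loss_probs:
        p_all_lost *= p
    return 1.0 - p_all_lost

print(f"{durability(p_primary):.4%}")                     # primary cluster only
print(f"{durability(p_primary, p_remote):.7%}")           # + remote DR site
print(f"{durability(p_primary, p_remote, p_user):.12%}")  # + your own copy
```

Each extra independent copy multiplies the loss probability down by its own (small) failure probability, which is why adding a copy you control yourself gains so many nines.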
99.99% is quite high already, but I promise that if we do hosting for the next 50 years, what we plan for will happen. And if you are in the 0.01%, it is 100% data loss for you. The same goes for each percentage.
This is what is called extreme risk. Even Google lost data (which was strangely hard to find on Google…).
So this post is here to warn you about this risk, and to present possible solutions to overcome it.
And here is the strategy for our different users:
- by default, you get disaster recovery.
- but we still recommend that you set it up on your side as well. This is free for everybody, and we'll even explain how to do it if you need! This is an extra measure, but we think it is the best one.
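To give an idea of how simple a self-hosted copy can be, here is a hypothetical sketch using rclone (the remote name, bucket and local path are placeholders you would set up yourself with `rclone config`):

```shell
# Keep your own local copy of your bucket (names and paths are placeholders).
# "mycloud" is an rclone remote pointing at the provider's S3 endpoint.
rclone sync mycloud:user-data /backups/user-data --checksum

# Run it from cron, e.g. nightly, to keep the copy fresh:
#   0 3 * * * rclone sync mycloud:user-data /backups/user-data --checksum
```

With a copy in your own hands, losing your data would require the primary cluster, the remote site and your own disk to all fail together.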