Moving forward, to avoid a situation like this again, we will:
- Keep making backups, and back up even more than before (system SSH keys, Mastodon media, etc.)
- Set up alerts for old drives so we can replace them safely
- Replace old drives on all servers
- Never again use Minio for something with as much content as Mastodon; instead, use a cloud provider to reduce drive strain
If we have extended unexpected maintenance again, we will endeavour to put up an updates page.
Aurieh and I learned a lot from this event. Hopefully, it never happens again.
@dean Thank you for the maintenance page. It was refreshing to see an honest log of developments unfold, unlike many other status pages and post-mortems I've seen.
@dean Thanks to both you and Aurieh for your hard work. Hardware reliability (esp. around the time some, most, or all hardware starts failing from age) can be a real nightmare! Fortunately, from what I saw in the replies and the local timeline, you also have a very understanding community. Thank you both, again, for the hard work and the transparency!