uwu.social is back online after our scheduled HDD replacement. There were a few bumps in the road here and there, so it took 6.5 hours instead of the 4-5 we planned for.
Hetzner initially put a 16k hour drive in the server instead of a 0 hour drive, so we had to wait for them to swap drives twice.
The bootloader was on the removed disk, so we weren't able to boot the server until we got KVM access and figured out the cause.
Next on our agenda is moving all Mastodon media to a cloud storage provider to reduce strain on the disks in the future and to free up some space. We will probably start copying media in the background shortly. The main copy operation will not require downtime, but the final switch from local storage to B2 will require some more planned downtime in the future.
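For the curious: Mastodon supports S3-compatible object storage out of the box, and B2 exposes an S3-compatible API, so the cutover is mostly env config. A rough sketch (bucket, endpoint, region, and keys here are placeholders, not our real values):

```
# .env.production -- placeholder values, adjust for your bucket
S3_ENABLED=true
S3_BUCKET=example-media-bucket
S3_ENDPOINT=https://s3.us-west-000.backblazeb2.com
S3_REGION=us-west-000
AWS_ACCESS_KEY_ID=<key id>
AWS_SECRET_ACCESS_KEY=<application key>
```

The downtime comes from the last step: media has to stop changing on local disk while the final sync runs and these settings flip over.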
Thank you for your understanding in regards to the downtime we've been having recently.
Moving forwards, to avoid a situation like this again we will:
- Keep making backups, and backing up even more things than before (system SSH keys, Mastodon media, etc.)
- Set up alerts for old drives so we can replace them safely
- Replace drives on all servers if they are old
- Never use Minio again for something with as much content as Mastodon, instead using a cloud provider to mitigate drive strain
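The drive-age alerts from the list above could be as simple as a cron job parsing `smartctl -A` output for the Power_On_Hours attribute. A minimal sketch, assuming the usual smartmontools attribute table layout (the 30k-hour threshold is an arbitrary example, not our actual policy):

```python
import subprocess

def power_on_hours(smartctl_output: str) -> int:
    # Power_On_Hours raw value is the last whitespace-separated field.
    # Note: some drives report fancier raw formats like "16234h+32m",
    # which this naive parse doesn't handle.
    for line in smartctl_output.splitlines():
        if "Power_On_Hours" in line:
            return int(line.split()[-1])
    raise ValueError("Power_On_Hours attribute not found")

def too_old(smartctl_output: str, max_hours: int = 30_000) -> bool:
    return power_on_hours(smartctl_output) >= max_hours

def check_drive(device: str) -> bool:
    # e.g. check_drive("/dev/sda"); requires smartmontools installed
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    return too_old(out)
```

Pointing something like this at every drive and wiring `too_old` into whatever alerting you already have would have flagged the 16k-hour drive long before it became a problem.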
Next time we have extended unexpected maintenance, we will endeavour to put an updates page up.
Aurieh and I learned a lot from this event. Hopefully, it never happens again.
We're sorry about the downtime that occurred yesterday. One of the drives in uwu.social's server failed. We were running RAID-1, but because the software RAID was misconfigured, we ended up with filesystem corruption under high I/O.
All data was recovered successfully. The RAID array has been rebuilt with a new (0 hour) drive. One of our concerns was that the remaining drive in the mirror would fail during the rebuild, but that didn't happen, which we're grateful for. We will be taking that hard drive out of service within the next month (which will require scheduled downtime).
It seems like the drive died under the huge amount of strain imposed on it by Mastodon and Minio during the old media removal process. We will move to Backblaze B2 before we run out of space again (which will require more scheduled downtime).
Recently I made a shitty DDR pad to make up for the fact that the local arcade sold their ITG cabinet.
The sensors are made out of tin foil lol. Eventually I'll make an improved version with real sensors (weight sensors).
Haven't made a bar for it yet, using an unused bedside table temporarily (it sucks since it's the wrong height and position)
This happened after updating to the newest version, so we were convinced it was broken code but no one else was reporting similar issues.
For debugging, we undid our patches and added tracing to sidekiq and mastodon, which led us to believe mastodon was filling memory because of elasticsearch failures. We'll reapply the patches shortly.
We've been having a lot of trouble with mastodon recently (filling memory and swap), which we traced back to elasticsearch freaking out about there not being enough space on the storage drive.
We cleared off some old backups/unnecessary data from the disk, so everything should be fine now. Sorry about the recent downtime.
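Worth noting for anyone hitting the same thing: elasticsearch starts refusing writes once its data disk crosses its flood-stage watermark (95% used by default), so alerting well before that saves a lot of pain. A minimal disk-usage guard, assuming a typical data path (the path and the 85% threshold are example values):

```python
import shutil

def usage_percent(total: int, used: int) -> float:
    return 100.0 * used / total

def should_alert(total: int, used: int, threshold: float = 85.0) -> bool:
    # fire well below elasticsearch's default 95% flood-stage watermark
    return usage_percent(total, used) >= threshold

def check_path(path: str = "/var/lib/elasticsearch") -> bool:
    # path is an assumption; point it at your actual ES data directory
    du = shutil.disk_usage(path)
    return should_alert(du.total, du.used)
```

Anything that pages you at 85% leaves enough headroom to clear old backups before elasticsearch (and then mastodon) starts misbehaving.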
mastodon moderation tools suck. performing moderation actions takes way longer than it should (it loads for a while), and for some reason disabled accounts can still post? I disabled 2 accounts a week ago and they somehow both posted today...
and to top it all off, after I suspended both of the accounts, the posts from one were still available on local timeline until I manually deleted the post.
I switched to fastmail today to try it out, they added labels in beta which was the last thing I needed to switch. gonna stay on the trial for a month but it seems pretty good so far.
the android app felt a bit sluggish but it's whatever. I can use a different mail app if it's a problem, plus I don't use my phone for viewing my mail very often.
and since I couldn't get alsa to work, I have to attach my microphone to the VM to be able to call people from inside it, which sucks but it's whatever
doom eternal runs good, so all is well
looking glass is working but the quality is ass, not sure why. maybe it's a wayland thing who knows. for now, I'm just using the GPU directly by switching inputs on my monitors, works fine. steam streaming works well too
setting up alsa audio with QEMU was a fucking nightmare, so I opted for scream, which is a windows virtual sound driver that just sends the pcm data over the network. multicast didn't work, so I had to change it to be unicast with my host's IP but it works great with very low latency
you'd expect the scream shit to be a pain in the ass and slow as hell, but I guess there's hardly any latency anyways since it's virtio net so whatever
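for reference, the wire format is dead simple, which is probably why latency is so low. a rough sketch of decoding one scream UDP packet, based on my reading of scream's protocol notes (5 header bytes, then raw PCM) — double-check against the scream repo before relying on this:

```python
def parse_header(packet: bytes) -> dict:
    # byte 0: sampling rate -- high bit picks the 44100 vs 48000 base,
    #         low 7 bits are a multiplier (per my reading of the docs)
    # byte 1: sample width in bits; byte 2: channel count
    # bytes 3-4: channel mask (ignored here); rest is raw PCM
    rate_byte, bits, channels = packet[0], packet[1], packet[2]
    base = 44100 if rate_byte & 0x80 else 48000
    return {
        "rate": base * (rate_byte & 0x7F),
        "bits": bits,
        "channels": channels,
        "pcm": packet[5:],
    }
```

a unicast receiver is then just a UDP socket bound to the scream port feeding `pcm` into your audio stack, which is why there's so little machinery to add latency.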