nwipe 0.24 (based
4 is done; everything is now served off the local filesystem. Over the next few days we'll be optimizing delivery so files are served directly from the file server rather than over NFS through nginx
3 is done and 4 is started now
We added a captcha to uwu.social signups using Friendly Captcha, which doesn't violate privacy + the JS code is open source. The captcha is just a simple proof of work and doesn't store cookies or have secretive algorithms like recaptcha
If people don't want to solve a captcha to sign up (maybe because they have noscript or something) we'll give people invite links over email if they ask. Users can't create invites for other people in case there are still bots
Bot signups were getting annoying and Mastodon doesn't have any captcha support
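For anyone curious what "a simple proof of work" means in practice, here's a minimal sketch of the general idea. This is NOT Friendly Captcha's actual algorithm or puzzle format; the function names, the use of SHA-256, and the difficulty value are all our own assumptions for illustration. The client has to burn some CPU finding a nonce, and the server can check it with a single hash.

```python
import hashlib
import itertools


def _digest_value(challenge: bytes, nonce: int) -> int:
    """Hash the challenge together with the nonce and return it as an integer."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def solve_puzzle(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce whose hash has `difficulty_bits` leading zero bits.

    Hypothetical scheme for illustration only, not Friendly Captcha's.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        if _digest_value(challenge, nonce) < target:
            return nonce


def verify_solution(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap server-side check: one hash, no stored state about the user."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))


if __name__ == "__main__":
    challenge = b"signup-challenge"  # made-up value; a real one would be random per visitor
    nonce = solve_puzzle(challenge, difficulty_bits=12)
    print(nonce, verify_solution(challenge, nonce, 12))
```

Each extra difficulty bit roughly doubles the work for the solver while verification stays at one hash, which is what makes it annoying for bots doing bulk signups.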
Registrations are disabled on the main site, with a link to a separate page that requires you to solve a captcha. Once you solve it and submit, you'll get redirected to a newly generated single-use invite link. Mastodon API doesn't have invite link capability so we have to scrape csrf tokens and shit which sucks though
The code's not pretty, but we don't care as long as it works
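A rough sketch of the token scraping part, for the curious. This isn't our actual code. It assumes the page carries the standard Rails `<meta name="csrf-token">` tag, and the class and function names are made up for the example. It only pulls the token out of HTML you've already fetched; logging in and submitting the invite form are left out.

```python
from html.parser import HTMLParser
from typing import Optional


class CsrfTokenParser(HTMLParser):
    """Collects the content of <meta name="csrf-token" content="..."> if present."""

    def __init__(self) -> None:
        super().__init__()
        self.token: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <meta ... />.
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("name") == "csrf-token":
            self.token = attr_map.get("content")


def extract_csrf_token(html: str) -> Optional[str]:
    """Return the CSRF token found in the page, or None if there isn't one."""
    parser = CsrfTokenParser()
    parser.feed(html)
    return parser.token


if __name__ == "__main__":
    page = '<html><head><meta name="csrf-token" content="abc123" /></head></html>'
    print(extract_csrf_token(page))
```

The token then has to be sent back along with the session cookie when submitting the form, which is the part that makes this fragile compared to a real API.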
We're gonna attempt migrating uwu.social mastodon media from minio to local storage over NFS again today. Our battle plan is as follows:
1. origin.files.uwu.social nginx block will be changed to serve from local first, then try minio if 404
2. delete all remote media older than 7 days again to reduce the amount of stuff we have to copy
3. switch mastodon to use local storage rather than S3 storage (requires small downtime)
4. migrate data from minio to the local directory in the background over the next few weeks
We've already done 1 and started 2. Once that finishes we'll do 3 and then kick off 4 in the background
I think the only downside of this method is that media uploaded in the 7 days before we turn on local storage can't actually be deleted until its files have finished moving over
We've tested switching to local storage by changing the .env on a testing server and it seemed to work without issue
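Step 1 of the plan (local first, then minio if 404) is done in nginx for us, but the lookup logic itself is simple enough to sketch. This is just an illustration of the idea, not our config; the function name, directory, and fallback URL are all placeholders.

```python
from pathlib import Path
from typing import Tuple


def resolve_media(relative_path: str, local_root: Path, fallback_base_url: str) -> Tuple[str, str]:
    """Decide where a media file should be served from.

    Returns ("local", filesystem path) if the file has already been migrated,
    otherwise ("fallback", URL on the old object store).
    """
    cleaned = relative_path.lstrip("/")
    candidate = local_root / cleaned
    if candidate.is_file():
        return ("local", str(candidate))
    # Not migrated yet: fall back to the old store instead of returning a 404.
    return ("fallback", fallback_base_url.rstrip("/") + "/" + cleaned)


if __name__ == "__main__":
    # Placeholder values, not our real paths or hostnames.
    print(resolve_media("media_attachments/files/a.png", Path("/srv/media"), "http://minio.internal/bucket"))
```

Because already-migrated files always win, step 4 can copy things over in any order and requests keep working the whole time.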
uwu.social media migration (from Minio to plain FS on a new server) was called off. Aurieh and I are both very tired so we decided it would be best to stop for now and try again some other time when we're not tired and have a better battle plan.
Copy performance from Minio (or directly from Minio's data dir) is extremely slow because stat calls take ages for some reason. We'll look into the performance issues with the FS for next time and explore ways of listing files without using stat (i.e. via Mastodon's DB).
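A rough sketch of what "listing files via the DB" could look like: instead of walking directories, build each object path from the row's ID and filename. This assumes a Paperclip-style layout (`media_attachments/files/<id split into 3-digit groups>/<style>/<filename>`) and that the ID and filename come out of a database export. We haven't verified either against our data yet, so treat the layout and all the names here as assumptions.

```python
import re


def id_partition(record_id: int) -> str:
    """Split a zero-padded ID into 3-digit groups, e.g. 1 -> 000/000/001.

    Assumed to mirror the Paperclip-style partitioning; check against a real file first.
    """
    padded = "%09d" % record_id
    return "/".join(re.findall(r"\d{3}", padded))


def media_path(record_id: int, file_name: str, style: str = "original") -> str:
    """Build the expected object path for one attachment without touching the disk."""
    return f"media_attachments/files/{id_partition(record_id)}/{style}/{file_name}"


if __name__ == "__main__":
    # Made-up rows standing in for a database export of (id, file name).
    rows = [(1, "cat.png"), (105678901234567890, "dog.jpg")]
    for record_id, file_name in rows:
        print(media_path(record_id, file_name))
```

The resulting list can be fed straight to whatever does the copying, so the slow directory listing never has to happen.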
We're sorry for severely underestimating how long this would take. I hope being offline didn't cause too much inconvenience.
""""""""""""""""""""""""high performance"""""""""""""""""""""""""""""""""""""""""" ™️
41 eur a pop for a <1000 hour drive
😂 😂 😂 😂
uwu.social is back online after our scheduled HDD replacement. There were a few roadbumps here and there, so it took 6.5 hours instead of the 4-5 we planned for.
Hetzner put a 16k hour drive in the server instead of a 0 hour drive at first, so we had to wait for them to replace it twice.
The bootloader was on the removed disk, so we weren't able to boot the server until we got KVM and figured out the cause.
Next on our agenda is moving all Mastodon media to a cloud storage provider to reduce strain on the disks in the future and to free up some space. We will probably start copying media in the background shortly. The main copy operation will not require downtime, but when we do the switch from local to B2 it will require some more planned downtime in the future.
Thank you for your understanding in regards to the downtime we've been having recently.
Moving forwards, to avoid a situation like this again we will:
- Keep making backups, and back up even more things than before (system SSH keys, Mastodon media, etc.)
- Set up alerts for old drives so we can replace them safely
- Replace drives on all servers if they are old
- Never use Minio again for something with as much content as Mastodon, instead using a cloud provider to mitigate drive strain
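For the old-drive alerts, something along these lines is what we have in mind. It's a sketch, not what's deployed: it assumes you've already captured the attribute table text from `smartctl -A` for each drive, that the drive reports a `Power_On_Hours` attribute with the raw value in the last column, and the threshold is a number we made up.

```python
import re
from typing import Dict, List, Optional


def power_on_hours(smartctl_output: str) -> Optional[int]:
    """Pull the raw Power_On_Hours value out of SMART attribute table text."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Power_On_Hours":
            # Some drives report values like "16021h+33m"; keep the leading hours.
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def drives_needing_replacement(hours_by_device: Dict[str, int], max_hours: int) -> List[str]:
    """Return the devices whose power-on hours have reached the threshold."""
    return sorted(dev for dev, hours in hours_by_device.items() if hours >= max_hours)


if __name__ == "__main__":
    sample = "  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       16021"
    hours = {"/dev/sda": power_on_hours(sample) or 0, "/dev/sdb": 0}
    print(drives_needing_replacement(hours, max_hours=15000))  # threshold is arbitrary
```

Hooking the result up to email or a chat webhook is the easy part; picking a sensible threshold per drive model is the bit we still need to figure out.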
Next time we have extended unexpected maintenance, we will endeavour to put an updates page up again.
Aurieh and I learned a lot from this event. Hopefully, it never happens again.
We're sorry about the downtime that occurred yesterday. One of the drives in uwu.social's server failed. We were running in RAID-1, but because the software RAID was misconfigured we encountered corrupted filesystems due to high I/O.
All data was recovered successfully. The RAID array has been rebuilt with a new (0 hour) drive. One of our concerns was that the remaining drive in the replica would fail during the rebuild but that didn't happen, which we're grateful for. We will be putting that hard drive out of service within the next month (which will require scheduled downtime).
It seems like the drive died under the huge amount of strain imposed on it by Mastodon and Minio during the old media removal process. We will move to Backblaze B2 before running it again (which will require more scheduled downtime).
admin of the uwu.social