Yesterday evening @phiresky@lemmy.world did some SQL troubleshooting with some of the lemmy.world admins. After that, phiresky submitted some PRs to GitHub.
@cetra3@lemmy.ml created a docker image containing 3 PRs: Disable retry queue, Get follower Inbox Fix, Admin Index Fix.
We started using this image, and saw a big drop in CPU usage and disk load.
We saw thousands of errors per minute in the nginx log for old clients trying to access the websockets (which were removed in 0.18), so we added a return 404 in nginx conf for /api/v3/ws.
We updated lemmy-ui from RC7 to RC10, which fixed a lot, including the issue with replying to DMs.
We found that the many 502 errors were caused by an issue in Lemmy/markdown-it.actix or whatever, causing nginx to temporarily mark an upstream as dead. As a workaround we can either 1.) use only 1 container or 2.) set ~~proxy_next_upstream timeout;~~ max_fails=5 in nginx.
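
A minimal sketch of what the `/api/v3/ws` rule mentioned above could look like in nginx. Only the path and the 404 come from the description; the surrounding `server` block is assumed to exist already.

```nginx
# Sketch only: goes inside the existing server { ... } block for the site.
# Old clients still request the websocket endpoint that was removed in 0.18;
# answering directly in nginx keeps those requests away from the Lemmy backend.
location /api/v3/ws {
    return 404;
}
```

Because nginx answers these requests itself, they never reach a Lemmy container.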

Currently we're running with 1 lemmy container, so the 502 errors are completely gone so far, and because of the fixes in the Lemmy code everything seems to be running smoothly. If needed we could spin up a second lemmy container using the ~~proxy_next_upstream timeout;~~ max_fails=5 workaround, but for now it seems to hold with 1.

Thanks to @phiresky@lemmy.world, @cetra3@lemmy.ml, @stanford@discuss.as200950.com, @db0@lemmy.dbzer0.com, @jelloeater85@lemmy.world, @TragicNotCute@lemmy.world for their help!

And not to forget, thanks to @nutomic@lemmy.ml and @dessalines@lemmy.ml for their continuing hard work on Lemmy!

And thank you all for your patience, we'll keep working on it!

Oh, and as a bonus, an image (thanks Phiresky!) of the change in bandwidth after implementing the new Lemmy docker image with the PRs.

Edit: So as soon as the US folks wake up (hi!) we seem to need the second Lemmy container for performance. So that's now started, and I noticed the proxy_next_upstream timeout setting didn't work (or I didn't set it properly), so I used max_fails=5 for each upstream, which does actually work.
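
For reference, a minimal sketch of an upstream block with max_fails=5 set on each server. The upstream name, container hostnames and port are assumptions for illustration, not the actual lemmy.world config.

```nginx
# Sketch only: the upstream name, hostnames and port are placeholders.
upstream lemmy {
    # max_fails=5: nginx only marks a server as unavailable after 5 failed
    # attempts within fail_timeout (default 10s), instead of after the default 1.
    server lemmy-1:8536 max_fails=5;
    server lemmy-2:8536 max_fails=5;
}

server {
    # ... listen, server_name, TLS settings ...
    location / {
        proxy_pass http://lemmy;
    }
}
```

With the default of max_fails=1, a single failed request is enough for nginx to take a container out of rotation for a while, which would fit the temporary "dead upstream" behaviour described above.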

(page 2) 50 comments

[–] MetricExpansion@lemmy.world 24 points 1 year ago (1 children)

I'm very curious: does a single Lemmy instance have the ability to horizontally scale to multiple machines? You can only get so big of a machine. You did mention a second container, so that would suggest that the Lemmy software is able to do so, but I'm curious if I'm reading that right.

[–] DoomBot5@lemmy.world 20 points 1 year ago (5 children)

A single instance, no. You run multiple instances on multiple machines, then put a frontend (nginx in this case) to distribute the traffic among them.

[–] KSPAtlas@sopuli.xyz 23 points 1 year ago (1 children)

Shouldn't the correct HTTP status code for a removed API be 410? 404 indicates the resource wasn't found or doesn't exist, while 410 indicates a resource that has been removed.

[–] Hupf@feddit.de 14 points 1 year ago (2 children)

Or 418 for the wrong API being used :^)

[–] dyslexicdainbroner@lemmy.world 22 points 1 year ago

How great is it to be a part of history in the making -

This is Web 3 in its fomenting -

Headlines ~5yrs:

The ending of Web 2 was unceremonious and just ugly. u/spez and moron@musk watched as their social media networks signaled the end of Web 2 and slowly dissolved. Blu bird’s value disintegrated and Reddit’s hopes for IPO did likewise. Twitter and Reddit dissolved into odorous flatulence as centralization fell apart to the world’s benefit. Decentralized/federated social media such as Mastodon and Lemmy made their convoluted progress and led Web 3’s development and growth…

This is how history is made, it’s ugly and convoluted but comes out sweeet…

[–] Kodiack@lemmy.world 22 points 1 year ago* (last edited 1 year ago) (1 children)

Awesome work - things seem to be running much more smoothly today.

Do you have anything behind a CDN by chance? Looking at the lemmy.world IPs, the server appears to be hosted in Europe, and web traffic goes directly there? IPv4 appears to resolve to a Finland-based address, and IPv6 appears to resolve to a Germany-based address.

If you put the site behind a CDN, it should significantly reduce your bandwidth requirements and greatly drop the number of requests that need to hit the origin server. CDNs would also make content load faster for people in other parts of the world. I'm in New Zealand, for example, and I'm seeing 300-350 ms latency to lemmy.world currently. If static content such as images could be served via CDN, that would make for a much snappier browsing experience.

[–] ruud@lemmy.world 13 points 1 year ago (3 children)

Yes that's one of the things on our To Do list

[–] pathief@lemmy.world 21 points 1 year ago* (last edited 1 year ago) (8 children)

Is it safe to use 2FA yet?

[–] httperror418@lemmy.world 20 points 1 year ago (1 children)

Whilst I'm aware that too many users on one instance can be a bad thing for the wider Fediverse, I think it is a great thing at the moment in terms of how well people are banding together to fix the issues being encountered from such a surge in users.

The issues being found on lemmy.world result in better lemmy instances for everyone and improve the whole Fediverse of lemmy instances.

I'm very impressed with how well things are being debugged under pressure, well done to all those involved 👏

[–] shotgun_crab@lemmy.world 18 points 1 year ago

You guys are absolute legends, thanks for the update!

[–] Contravariant@lemmy.world 18 points 1 year ago

Hey I can upvote now!

[–] lwuy9v5@lemmy.world 18 points 1 year ago (1 children)

That's so awesome! Look at that GRAPH!

I'd volunteer to be a technical troubleshooter - very familiar with docker/javascript/SQL, not super familiar with rust - but I'm sure yall also have an abundance of nerds to lend a hand.

[–] cani@lemmy.world 17 points 1 year ago

I just love the transparency you guys are coming forward with. It's absolutely awesome! Thank you for that and for all the work you put in. It means a lot to me that you folks are taking the time to keep us updated. Much love!

[–] DelvianSeek@lemmy.world 16 points 1 year ago

You guys are absolutely amazing. So many thanks to you @Ruud and the entire admin/troubleshooting team! Thank you.

[–] CIA_chatbot@lemmy.world 16 points 1 year ago* (last edited 1 year ago) (4 children)

It blows my mind with the amount of traffic you guys must be getting that you are only running one container and not running in a k8s cluster with multiple pods (or similar container orchestration system)

Edit: misread that a second was coming up, but still crazy that this doesn’t take some multi node cluster with multiple pods. Fucking awesome

[–] Puzzlehead@lemmy.world 15 points 1 year ago

smoooooooooth! Keep up the good work!

[–] nuzzlerat@lemmy.world 13 points 1 year ago (1 children)

Is it weird that I’m always excited to read the update posts?

[–] InfiniteVariables@lemmy.world 13 points 1 year ago

Wow it is smooth as butter now. Great job ruud and team!

[–] Anti_Weeb_Penguin@lemmy.world 12 points 1 year ago (1 children)

Installed Jerboa again and it feels smoother than Reddit itself, great job!

[–] EvilCartyen@lemmy.world 12 points 1 year ago (1 children)

Things have been super smooth lately, thanks for all the work!

[–] Zrob@lemmy.world 11 points 1 year ago (5 children)

Awesome work. Any way other devs can contribute?

[–] MiddleWeigh@lemmy.world 11 points 1 year ago* (last edited 1 year ago)

I took a SM break for a few days, and it's running noticeably better today...I think. (:

Thanks a bunch for floating us degenerates.

[–] StringyCheese@mastodon.social 11 points 1 year ago (1 children)

@ruud crazy impressive

[–] Datzevo@lemmy.world 11 points 1 year ago

You know, there's something about dealing with the lagginess of the past few days that makes me appreciate how fast and responsive things are after the update. It's nice to see the community grow, and it makes the experience at Lemmy feel authentic.

[–] WolfhoundRO@lemmy.world 11 points 1 year ago (1 children)

Really great job, guys! I know from my experience in SRE that this kind of debugging, monitoring and fixing can be a real pain, so you have all my appreciation. I'm even determined to donate on Patreon if it's available.

[–] Tygr@lemmy.world 11 points 1 year ago

It felt like I’d jinx us all if I commented but THANK YOU! This has been a wonderful experience today. Absolutely loving it and knew you just needed some time to work out the kinks that happen with fast growth.

[–] bathorygod@lemmy.world 10 points 1 year ago

You guys did an awesome job.
