this post was submitted on 15 Jun 2023
178 points (94.1% liked)

Programming


My first experience with Lemmy was thinking that the UI was beautiful, and lemmy.ml (the first instance I looked at) was asking people not to join because they already had 1500 users and were struggling to scale.

1500 users just doesn't seem like much; it seems like the type of load you could handle with a Raspberry Pi in a dusty corner.

Are the Lemmy servers struggling to scale because of the federation process / protocols?

Maybe I underestimate how much compute goes into hosting user-generated content? Users generate very little text, but uploading pictures takes more space. Users are generating millions of bytes of content and it's overloading computers that can handle billions of bytes with ease. What happened? Am I missing something here?

Or maybe the code is just inefficient?

Which brings me to the title's question: Does Lemmy benefit from using Rust? None of the problems I can imagine are related to code execution speed.

If the federation process and protocols are inefficient, then everything is being built on sand. Popular protocols are hard to change. How often does the HTTP protocol change? Almost never. The language used for the code doesn't matter in this case.

If the code is just inefficient, well, inefficient Rust is probably slower than efficient Python or JavaScript. Could the complexity of Rust have pushed the devs towards a simpler but less efficient solution that ends up being slower than garbage collected languages? I'm sure this has happened before, but I don't know anything about the Lemmy code.

Or, again, maybe I'm just underestimating the amount of compute required to support 1500 users sharing a little bit of text and a few images?

[–] knoland@kbin.social 4 points 1 year ago (1 children)

The real benefit as I see it for using rust for backends is memory safety.

[–] loren@sh.itjust.works 5 points 1 year ago (2 children)

All the major languages for web backends are memory safe. Java, C#, etc

[–] C8H10N4O2@kbin.social 5 points 1 year ago (1 children)

These are garbage-collected languages and come with the overhead of such a process. Rust has no GC process; it relies on ownership and borrowing rules checked at compile time to decide when memory is freed, with reference counting (Rc/Arc) available as an opt-in.
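To make that concrete: in Rust, most values are freed when their owner goes out of scope, which the compiler works out statically, and reference counting only comes into play when you explicitly reach for Rc (or Arc). A minimal sketch of both, where the Tracked type and the function names are made up for illustration:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that records when it is dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Plain ownership: the value is freed at the end of its scope.
/// No counter and no collector are involved.
fn ownership_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _owned = Tracked { name: "owned", log: &log };
        log.borrow_mut().push("inner scope ends".to_string());
    } // `_owned` is dropped here; the compiler inserted the drop call.
    log.borrow_mut().push("after inner scope".to_string());
    log.into_inner()
}

/// Opt-in reference counting: only values wrapped in `Rc` carry a counter.
fn rc_demo() -> (usize, usize) {
    let first = Rc::new(String::from("shared"));
    let second = Rc::clone(&first); // counter goes from 1 to 2
    let while_shared = Rc::strong_count(&first);
    drop(second); // counter goes back to 1
    (while_shared, Rc::strong_count(&first))
}

fn main() {
    println!("{:?}", ownership_demo());
    println!("{:?}", rc_demo());
}
```

The first function never touches a counter at runtime; the second pays for one only because it asked for shared ownership.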

[–] eddythompson@kbin.social 13 points 1 year ago (3 children)

"GC overhead" only matters for extreme realtime applications, like emulators, games, drivers, simulators, etc. A 10 ms (or even a 100 ms) pause in request processing isn't even going to be noticed when your network, database and disk IO are literally orders of magnitude higher. Use Rust for web services if you like the language, are comfortable with it, etc. Don't use it because you think it'll give you "more performance" or "reduce GC overhead".

Java, C#, Python, Node, or even PHP as languages will never be your web backend bottleneck. Performance tuning for large-scale web services is entirely architectural: what caches you keep, how you organize your data, how many network operations one user interaction translates to, stateful vs. stateless components, etc.

[–] C8H10N4O2@kbin.social 3 points 1 year ago

You're entirely correct, I was just commenting that Rust is also very memory safe in response to the first statement. As much as Rust interests me, I'm also in agreement that the problem it solves as a language isn't really a concern with modern web development.

[–] clawlor@programming.dev 2 points 1 year ago (2 children)

+1, exactly this.

As an aside, "stop the world" GC pauses can affect web server performance in interesting ways. Some web application servers have a perf profile where throughput drops off a cliff as the server approaches max memory load. This is fine, so long as you know what's happening, and can tune your auto scaling to spin up new servers before you start to hit that threshold. This likely wouldn't be a reason to not use a particular lang / server, except at the most massive scales.

[–] dragontamer@lemmy.world 3 points 1 year ago (2 children)

Meta: Hmmm... replying to kbin.social users appears to be bugged from my instance (lemmy.world).

I'm replying to you instead. It doesn't change the meaning of my post at least, but we're definitely experiencing some bugs / growing pains with regards to Lemmy (and particularly lemmy.world).


GC overhead is mostly memory-based too, not CPU-based.

Because modern C++ (and Rust) is almost entirely based around refcount++ and refcount-- (and if refcount==0 then call destructor), the CPU-usage of such calls is surprisingly high in a multithreaded environment. That refcount++ and refcount-- needs to be synchronized between threads (atomics + memory barriers, or lock/unlock), which is slower than people expect.
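For the Rust side, a rough sketch of where those synchronized increments and decrements actually show up: cloning and dropping an Arc that is handed to other threads. The function name and the "shared config" value are made up for illustration; plain owned values and Rc don't go through this path.

```rust
use std::sync::Arc;
use std::thread;

/// Hands one shared value to `workers` threads.
/// Returns the summed string lengths and the final strong count.
fn shared_across_threads(workers: usize) -> (usize, usize) {
    let config = Arc::new(String::from("shared config"));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Atomic increment of the shared counter.
            let local = Arc::clone(&config);
            // `local` is dropped when the closure returns,
            // which is an atomic decrement.
            thread::spawn(move || local.len())
        })
        .collect();

    let total_len: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();

    // Every worker has been joined, so only `config` itself is left.
    (total_len, Arc::strong_count(&config))
}

fn main() {
    let (total_len, remaining) = shared_across_threads(4);
    println!("total length = {total_len}, remaining owners = {remaining}");
}
```

Each worker costs one atomic increment and one atomic decrement here, no matter how much work it does with the value in between.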

Even then, C malloc/free isn't really cheap either. It's just that in C we can do tricks like struct Foo { char endOfStructTrick[0]; } and allocate malloc(sizeof(struct Foo) + 255), or whatever the size of the end-of-struct string is, to collate mallocs and frees together and otherwise abuse memory layouts for faster code.

If you don't use such tricks, I don't think that C's malloc/free is much faster than GC.


Furthermore, fragmentation is worse in C's malloc/free land (many GCs can compact and fix fragmentation issues). Once we take fragmentation into account, the memory advantage diminishes.

Still, C and C++ almost always seem to use less memory than Java and other GC languages, so the memory savings are substantial. But CPU-power savings? I don't think that's a major concern. Maybe it's just that CPUs are so much faster today than before that it's memory we practically care about.

[–] balder1993@programming.dev 1 points 1 year ago

I remember some old papers comparing Android's runtime (which is garbage collected) with iOS (reference counted), in which Android was more efficient with plenty of memory, but less efficient with less available memory.

[–] valpackett@lemmy.blahaj.zone 0 points 1 year ago (1 children)

That refcount++ and refcount-- needs to be synchronized between threads

Only for things that you specifically want shared between threads – namely this (synchronized refcount) is an std::sync::Arc. What you want to share really depends on the app; in database-backed web services it's quite common to have pretty much zero state shared across threads. Multithreaded environment doesn't imply sharing!
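A small sketch of what I mean, assuming a made-up handle_request function standing in for a real request handler: each thread owns its data outright, and any reference counting inside it uses Rc, whose counter is a plain non-atomic integer. Rc isn't Send, so the compiler won't let one cross a thread boundary by accident.

```rust
use std::rc::Rc;
use std::thread;

/// Hypothetical per-request work. Everything here lives on one thread.
fn handle_request(id: u32) -> (u32, usize) {
    let body = Rc::new(format!("request {id}"));
    // Non-atomic increment: `Rc` never leaves this thread.
    let alias = Rc::clone(&body);
    (id, Rc::strong_count(&alias))
}

/// Runs each request on its own thread with no state shared between them.
fn serve(ids: Vec<u32>) -> Vec<(u32, usize)> {
    let handles: Vec<_> = ids
        .into_iter()
        // `id` is moved into the thread; nothing is shared, so no `Arc`.
        .map(|id| thread::spawn(move || handle_request(id)))
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", serve(vec![1, 2, 3]));
}
```

Multiple threads, several refcounts, and not a single atomic operation on any of them.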

[–] dragontamer@lemmy.world -1 points 1 year ago

The refcount absolutely is shared state across threads.

If Thread#1 thinks the refcount is 5, but Thread#2 thinks the refcount is 0, you've got problems.

[–] dragontamer@lemmy.world 1 points 1 year ago (1 children)

This is a test message. Lemmy isn't making me do the posts I want right now... does this one work?

[–] dragontamer@lemmy.world 1 points 1 year ago

this is a test post 2

[–] SomeGuyNamedMy@vlemmy.net -3 points 1 year ago (1 children)

From my understanding, Rust is not fully memory safe like garbage-collected languages, because you have to use smart pointers for self-referential data structures.

[–] sekhat@lemmy.temporus.me 1 points 1 year ago* (last edited 1 year ago)

Eh, if by smart pointer you mean Pin, it's not really a smart pointer. It's just a struct that holds onto a particular reference kind. What it holds onto can be a smart pointer, or a mutable reference. Either way, once done, the constraints of the language's ownership and borrowing mean the item that has been Pinned can't be moved.

An item being unable to be moved is pretty important for self referential structures of course, since to self reference, you generally refer to something by some form of pointer inside yourself. If you are able to be moved, your own root address changes and thus the address of anything inside you would be different, which would invalidate your self references.

Pin was quite a clever realization.

Unfortunately, not all the considerations you need to be aware of when using Pin can be enforced by the type system, usually around when you need to Unpin something. If you get that wrong, you might end up in a place that causes Undefined Behavior. That is why the general advice is: once you've Pinned something, it should stay Pinned.
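For anyone curious what this looks like in practice, here is a rough sketch of a self-referential struct pinned on the heap, loosely following the pattern shown in the std::pin documentation. SelfRef and its methods are names I made up, and the unsafe blocks are exactly the part the type system can't check for you.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr;

/// Hypothetical self-referential type: `data_ptr` points at `data`.
struct SelfRef {
    data: String,
    data_ptr: *const String,
    // Opts out of `Unpin`, so safe code cannot move a pinned `SelfRef`.
    _pin: PhantomPinned,
}

impl SelfRef {
    fn new(text: &str) -> Pin<Box<SelfRef>> {
        let mut boxed = Box::pin(SelfRef {
            data: text.to_string(),
            data_ptr: ptr::null(),
            _pin: PhantomPinned,
        });
        let self_ptr: *const String = &boxed.data;
        // SAFETY: we only write a field; the value stays in its heap slot.
        unsafe {
            boxed.as_mut().get_unchecked_mut().data_ptr = self_ptr;
        }
        boxed
    }

    fn data_via_ptr<'a>(self: Pin<&'a Self>) -> &'a str {
        // SAFETY: `data_ptr` points at `self.data`, which cannot move
        // while the value is pinned.
        unsafe { (*self.data_ptr).as_str() }
    }
}

/// Reads through the self-pointer, then moves the handle and checks that
/// the self-pointer still points at the `data` field.
fn pinned_roundtrip(text: &str) -> (String, bool) {
    let pinned = SelfRef::new(text);
    let via_ptr = pinned.as_ref().data_via_ptr().to_string();

    // Moving the `Pin<Box<_>>` moves only the pointer, not the heap value.
    let moved_handle = pinned;
    let still_valid = ptr::eq(moved_handle.data_ptr, &moved_handle.data);
    (via_ptr, still_valid)
}

fn main() {
    println!("{:?}", pinned_roundtrip("hello"));
}
```

The self-pointer survives moving the handle because the struct itself never leaves its heap allocation.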