I discovered yesterday evening that Lemmy.ml is blocking all inbound ActivityPub requests from /kbin instances. Specifically, a 403 'access denied' is returned when the user agent contains "kbinBot" anywhere in the string (the match appears to be case-insensitive, since "notKbinBot" is rejected as well). This has been causing a cascade of federation failures for many server owners, flooding the message queue with transport errors.
This doesn't appear to be a mistake; it has been done very deliberately, only on Lemmy.ml. Lemmy.world and other large instances do not exhibit the same behavior. It also isn't a side effect of the bug introduced in Lemmy 0.18. You can observe this by running the following in a terminal:
> curl -I --user-agent "kbinBot v0.1" https://lemmy.world/u/test
HTTP/2 200
[...]
> curl -I --user-agent "kbinBot v0.1" https://lemmy.ml/u/test
HTTP/2 403
[...]
> curl -I --user-agent "notKbinBot v0.1" https://lemmy.ml/u/test
HTTP/2 403
[...]
> curl -I --user-agent "placeholder-user-agent" https://lemmy.ml/u/test
HTTP/2 200
[...]
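For anyone who wants to repeat the check against several instances at once, here is a minimal Python sketch of the same probe. The names `head_status`, `probe` and `matches_observed_rule` are made up for illustration, and the substring rule is only my reading of the responses above, not anything confirmed by the Lemmy.ml admins.

```python
import urllib.error
import urllib.request


def head_status(url, user_agent, timeout=10):
    """Send a HEAD request with the given User-Agent and return the status code."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return err.code


def probe(url, user_agents, fetch=head_status):
    """Map each user agent to the status code returned for the URL."""
    return {ua: fetch(url, ua) for ua in user_agents}


def matches_observed_rule(user_agent):
    """Assumed rule: case-insensitive substring match on 'kbinbot'.

    This is inferred from the curl output only; the real filter is unknown.
    """
    return "kbinbot" in user_agent.lower()
```

Calling `probe("https://lemmy.ml/u/test", ["kbinBot v0.1", "notKbinBot v0.1", "placeholder-user-agent"])` issues the same three requests as the curl commands. The `fetch` parameter exists so the logic can be exercised without touching the network.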
Additional evidence of this not being a Lemmy 0.18 bug:
- This occurs when making web requests to any location on the Lemmy.ml webserver, not just ActivityPub endpoints.
- Go to https://fedidb.org/software/lemmy and pick an instance running 0.18.0. Perform the above commands, replacing the URL for Lemmy.ml with that particular instance's address.
If this continues, my instance may need to defederate from Lemmy.ml. This is especially problematic because Lemmy.ml continues to federate information outbound to other kbin instances while refusing to allow inbound communication from them.
Spoofing the user agent is less than ideal, and doesn't respect Lemmy.ml's potential wish to not be contacted by /kbin instances. I don't post this to create division between communities, but I do hope that I can draw awareness to what's going on here. Even defederating /kbin instances entirely would be better than arbitrarily denying access one-way. That said, we should all attempt to maintain a good-faith interpretation until otherwise indicated by the Lemmy developers. It's possible that this is a firewall misconfiguration or some other webserver-related bug.
Relevant comment from me (#354 - [BUG] Critical errors/failed messages during messenger:consume)
Edits:
- Yes, people have already tried reaching out to the Lemmy instance admins in their Matrix room with no answer.
- Someone has posed a question on Lemmy.ml about the block here: https://lemmy.ml/post/1563840
It's possible that this is a consequence of the latest Lemmy update, in which a lot has changed. I have noted that kbin has some issues with request signatures when communicating with certain instances. I will try to check it first thing tomorrow morning.
Sorry to go off topic here, but it seems that no messages are coming in and out from kbin.social.
See, e.g., https://kbin.dk/m/kbinMeta@kbin.social/t/3054/Lemmy-ml-is-blocking-all-requests-from-kbin-Instances and https://kbin.melroy.org/m/kbinMeta@kbin.social/t/2560/Lemmy-ml-is-blocking-all-requests-from-kbin-Instances
Those links are working fine. Kbin is federating well with those instances. A bit of latency is normal.
No upvotes, comments or boosts go through
It takes time. I just set up my own instance and sent a comment from there. It took 40 minutes to arrive on kbin.social, and upvotes and replies have not made it back to my instance yet, some 45-60 minutes after they happened.
It takes time
Why?
Because when you post here from kbin.social, any other instance with kbinMeta@kbin.social will get a copy of that, and vice versa. But each side is exchanging posts from multiple magazines with multiple other instances. It's also balancing resource usage for people visiting the site.
Also new instances are gradually fetching the back-catalog of posts for various magazines (and communities on lemmy). So all of this leads to a delay.
Anecdotally the delay is quite short this morning. Yesterday it reached up to 2 hours from my view at least.
Case in point, it took less than a minute for this to reach my instance (this is me, posting from my instance... Maybe I should have used another username... The one with a picture is the instance me).
I don't understand why the delay is so high, though.
I just downloaded several GB in a few seconds. What is stalling the process when only a few bits of information are exchanged? That seems unnatural.
Am I severely underestimating the bandwidth load?
Well, there's a bit to unpack here. When you download something in seconds, that's your whole connection pulling from a server that is on a super fast connection.
Most people running instances are on a much more modest combination of hardware and connection, and even the bigger ones (kbin.social etc.) are not going to have a connection as fast as dedicated download CDNs can offer. I would expect they probably have a gigabit, and at most 10 Gbit. That's shared between everyone on the site downloading cat pictures, posting, refreshing AND the federation of all the new content to and from other instances.
But that's really not the problem here at all. Far more of a load is the processing of the incoming and outgoing messages to and from many instances. This takes CPU load (and to a lesser extent memory), and this is shared between the message queue for these inter-instance messages, the web server and database.
When you look at how the fediverse of kbin and lemmy is laid out, you will see there's a handful of larger instances with most of the popular magazines/communities. This means that they're doing the lion's share of this processing. Couple that with the fact that the population has exploded over the last month or so, and even these larger instances might be struggling with hardware and/or infrastructure layout (maybe running everything on one box and needing to split the load, for example). That's speculation though.
At any rate, for whatever reason (maybe the lemmy problems backing up message queues with errors) things were MUCH slower last night.
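To make the "errors backing up the queue" idea a bit more concrete, here is a toy model in Python. It is only a sketch under made-up assumptions: one worker handling a fixed number of messages per tick, and failed deliveries being put back at the end of the queue up to a retry limit. It is not how kbin's actual message consumer schedules retries, and `ticks_to_drain` and its parameters are hypothetical.

```python
from collections import deque


def ticks_to_drain(messages, per_tick, failing, max_retries):
    """Count ticks until the queue is empty.

    messages:    number of messages queued at the start
    per_tick:    messages the worker can process in one tick
    failing:     how many of those messages always fail (e.g. a 403 from the target)
    max_retries: retries allowed per failing message before it is dropped
    """
    # Each entry is (always_fails, attempts_so_far); failing messages come first.
    queue = deque((i < failing, 0) for i in range(messages))
    ticks = 0
    while queue:
        ticks += 1
        for _ in range(min(per_tick, len(queue))):
            always_fails, attempts = queue.popleft()
            if always_fails and attempts < max_retries:
                # Delivery failed: put it back at the end for another attempt.
                queue.append((True, attempts + 1))
    return ticks
```

With 10 messages and 5 per tick, a healthy queue drains in 2 ticks. If 5 of those messages keep failing and each gets 3 retries, the same queue needs 5 ticks, so the healthy messages behind them wait longer even though nothing about the bandwidth changed.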
Still, after 5 hours, the upvotes are behind. I hope AP will scale well in the future. So far this doesn't look great.