Reddit Refugee RIP

  • 3 Posts
  • 26 Comments
Joined 1 year ago
Cake day: June 12th, 2023





  • Yup, you nailed it. For additional context, Ruud is running an almost identical server for his mastodon.world instance, which has 160k users. Relatively speaking, these are large, performant, and expensive servers. They can absolutely handle the current user influx we are getting from the Reddit exodus. Our hands are tied by software limitations, unfortunately. I can confidently tell you we are constantly in communication about ways we can improve the user experience with the tools that we do have access to. For instance, this status page was recently spun up, which you can access anytime you think there might be server issues to help confirm that what you are seeing is recognized at a server level. Things like that.

    All that being said, for users who are looking for a smoother experience right now, I can recommend lemm.ee as a solid home as well. Their admin Sunaurus has been very active and helpful throughout this process and handles his instance very professionally. He is essentially another Ruud (though Ruud is the best! ;)). Just something to keep in mind going forward, as I can’t make any promises about the time frames for these issues being resolved. Hopefully once we hear back from the Lemmy devs we can start expediting a resolution. They have a lot on their plates right now though, haha, so we will see. Cheers!


  • FYI, this is due to a confluence of issues.

    • We are the largest instance with the highest active user count - and by a good margin.
    • We are dealing with an immature software base that was not designed to handle this load. For example, the way the ActivityPub federation queues are handled is not conducive to high-volume requests. Failed messages stay in the queue for 60 seconds before they are retried once, and if that attempt fails, they sit in the queue for one hour before another retry is attempted. These queued messages sit in memory the whole time. It’s not great, and there isn’t much we can currently do to change this, other than to manually defederate from ‘dead’ servers in order to drop the number of items stuck in the queue that are never going to get a response. Not an elegant solution by any means, and one we will go back and address when future tools are in place, but we have seen significant improvement because of this.
    • We have attempted to contact the Lemmy devs for some insight/assistance with this, but have not heard back yet. Much of this is in their hands.
    • We were able to confirm with other instance admins that our federation messages (from lemmy.world) were received by their instances at lemm.ee and discuss.as200950.com. As such, we do know that federation is working at least to some degree, but it is obviously still in need of some work. As mentioned above, we have reached out to the Lemmy devs, who are the instance owners of Lemmy.ml, to collaborate. I cannot confirm if they are receiving our federation messages at this time. Hopefully in coming Lemmy releases this becomes easier to analyze without needing direct server access to both instances’ servers.

    As you can see, we are trying to juggle several different parameters here to provide the best experience we can with the tools we have at our disposal. You may consider raising an issue on the Lemmy devs’ GitHub about this to give them more visibility from affected users.
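
    To make the queue behavior in the second bullet concrete, here is a rough sketch of that retry schedule. It is not Lemmy's actual code (Lemmy is not written in Python); the class and function names are made up for illustration, and what happens to a message after the one-hour retry is an assumption. Only the 60 second and one hour delays come from the description above.

```python
from dataclasses import dataclass

# Delays taken from the description above: 60 s before the first retry,
# then one hour before the next. Everything else here is hypothetical.
RETRY_DELAYS_SECONDS = [60, 3600]


@dataclass
class QueuedMessage:
    target_instance: str
    failed_attempts: int = 0  # delivery attempts that have failed so far


def next_retry_delay(message):
    """Return seconds to wait before the next retry, or None if no retry is scheduled."""
    # One failed attempt -> wait 60 s; two failed attempts -> wait one hour.
    index = message.failed_attempts - 1
    if 0 <= index < len(RETRY_DELAYS_SECONDS):
        return RETRY_DELAYS_SECONDS[index]
    # Zero failures needs no retry. Assumption: after the last retry
    # fails, the message is dropped.
    return None


def seconds_held_if_never_delivered():
    """Minimum time an undeliverable message occupies memory while waiting on retries."""
    return sum(RETRY_DELAYS_SECONDS)


def drop_dead_instances(queue, dead_instances):
    """Model manual defederation: discard queued messages aimed at dead servers."""
    return [m for m in queue if m.target_instance not in dead_instances]


queue = [
    QueuedMessage("lemm.ee"),
    QueuedMessage("dead.example", failed_attempts=1),
    QueuedMessage("dead.example", failed_attempts=2),
]
remaining = drop_dead_instances(queue, {"dead.example"})
print(seconds_held_if_never_delivered())  # 3660
print(len(remaining))  # 1
```

    Under these assumptions, every message aimed at a dead server sits in memory for at least 3660 seconds of waiting, which is why dropping those servers shrinks the queue so noticeably.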





  • Technically speaking, yes, a portion of our issues are due to having the largest user base of any Lemmy instance. So in theory, if half of our users dispersed to other instances, we would likely see some performance improvement here. However, lemmy.world is intended to be an accessible instance for the general population. The server that is running lemmy.world is spec’d to handle much more than this user load. We are running up against code-level issues that we may or may not be able to get around with our internal configurations. This is just part of developing software in an environment where you go from a few thousand users total to hundreds of thousands in the space of a few weeks. There is no directive to have users create accounts on new instances, though if you are looking for an immediate performance improvement, that may be your best option currently. That is up to you to decide :)








  • PSA from Admin Team: The update completed roughly two hours ago. Since that time, the Admin team (and other site admins) have been working on the noted performance issues. We believe we have found a solution, but we still need time to test this out. You may still see brief outages and differences in performance while we test different configurations. We are trying to avoid rolling back.

    While I know this can be frustrating - especially today - please keep in mind we have a team of volunteer techies (from around the globe!) collaborating on this issue. It is an inspiring situation. Also keep in mind that lemmy.world is quite a bit larger (and more active) than any other instance. As such, we are a bit of a ‘test instance’ with regard to high-volume requests. This is just part of the growing pains. We appreciate your understanding.

    @[email protected] will provide a debrief once we have completed testing.