Hi everyone,

Ever since I switched to Arch about two months ago, most applications segfault multiple times a day. There doesn’t seem to be any pattern to the crashes; sometimes they even happen while the system is idling (e.g. while I’m reading a news article).

Things I’ve tried without any luck so far:

  • Running Firefox in safe mode without any extensions
  • Switching from the regular kernel to the LTS kernel
  • Disabling hardware acceleration in Firefox
  • Changing RAM speed and timings
  • Running Memtest (it completed without errors)
  • Replacing the entire RAM with a new certified kit
  • Using only a single RAM slot
  • Applying Ryzen fixes (iommu=soft, limiting C-states)
  • Using only a single CPU core (maxcpus=1)
  • Downgrading the Nvidia driver to 535xx
  • Using Nouveau instead of the nvidia driver
  • Using Openbox instead of KDE
  • Disabling zswap and THP
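
In case anyone wants to double-check that settings like these actually took effect after a reboot, here is a minimal Python sketch (an illustration, not something taken from my setup). It assumes the usual Linux locations /proc/cmdline, /sys/kernel/mm/transparent_hugepage/enabled and /sys/module/zswap/parameters/enabled; the function names are made up for this example.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {parameter: value}.

    Bare flags such as 'quiet' map to None. Assumption: no quoted values
    containing spaces, which this simple split would break apart.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def active_choice(sysfs_text: str) -> str:
    """Return the bracketed option from text like 'always [madvise] never'."""
    for word in sysfs_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return sysfs_text.strip()


def read_or_none(path: str) -> Optional[str]:
    """Read a small text file, or return None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def report() -> None:
    """Print the boot parameters and memory settings mentioned in the list."""
    params = parse_cmdline(read_or_none("/proc/cmdline") or "")
    for name in ("iommu", "maxcpus"):
        print(f"{name} = {params.get(name, '(not set)')}")
    thp = read_or_none("/sys/kernel/mm/transparent_hugepage/enabled")
    print("THP:", active_choice(thp) if thp is not None else "(unavailable)")
    zswap = read_or_none("/sys/module/zswap/parameters/enabled")
    print("zswap enabled:", zswap if zswap is not None else "(unavailable)")


if __name__ == "__main__":
    report()
```

If a parameter shows up as "(not set)", the bootloader entry probably was not regenerated or a different entry was booted.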

Here’s the full journalctl output from a day on which both Spotify and Firefox crashed at the end, within a few seconds of each other:

https://pastebin.com/BH0LMnD9
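
One way to look for a pattern in a log like this is to tally the kernel's segfault messages by process and by the object the fault happened in. The sketch below is only an illustration: the regular expression assumes journalctl's default output format (with a "kernel: " prefix) and the usual x86 message shape "NAME[PID]: segfault at ADDR ip IP sp SP error CODE in OBJECT[...]", and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Assumed message shape (x86, journalctl default output):
#   ... kernel: NAME[PID]: segfault at ADDR ip IP sp SP error CODE in OBJECT[...]
# The trailing " in OBJECT[" part is treated as optional.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<comm>.+?)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)"
    r" ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[)?"
)


def parse_segfault(line: str) -> Optional[Tuple[str, str]]:
    """Return (process name, faulting object) for a segfault line, else None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    return match.group("comm"), match.group("obj") or "(unknown)"


def tally_segfaults(lines: Iterable[str]) -> Counter:
    """Count segfaults per (process name, faulting object) pair."""
    counts: Counter = Counter()
    for line in lines:
        hit = parse_segfault(line)
        if hit is not None:
            counts[hit] += 1
    return counts
```

Feeding it the lines of a saved log, for example tally_segfaults(open("journal.txt")) with journal.txt being a hypothetical dump of the journal, and printing the result of .most_common() shows whether the crashes cluster in one library or are scattered across unrelated programs.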

Some more info about my system:

  • Ryzen 5 3600X
  • MSI B450M PRO-VDH Max
  • 32GB RAM @ 3200MHz
  • Geforce RTX 2070 SUPER (using nvidia-dkms)
  • Plasma 5.27.10 on X11

I’m pretty sure that it’s not hardware related, because I’ve booted up a Debian 12 live image where everything ran for several hours without a crash. But it seems to be Arch related, as I also booted up a fresh EndeavourOS live image (so basically Arch), where applications also randomly segfaulted. Any idea why everything works fine on Debian but not on Arch? Debian uses the 6.1 kernel, which I already tried, so that’s not it.

Let me know if you need any more information that might help solve this issue. Thanks!

Edit [solved]: It looks like disabling PBO in the UEFI/BIOS did the trick. The strange thing is that after enabling it again, it still hasn’t crashed. Someone suspected that the motherboard’s default/training settings were faulty, so I guess this was a very rare case. That’s probably why it took so long to find a solution. Thanks everyone for helping me out!

  • DefederateLemmyMl@feddit.nl
    11 months ago

    I’m pretty sure that it’s not hardware related

    Random segfaulting is not something that “just happens” because of an OS misconfiguration. And if the same problem happens on Arch as well as on a clean EndeavourOS live image, that convinces me that it is in fact hardware related somehow. As you have already replaced the RAM, my guess is a CPU or motherboard issue.

    Zen2/B450 is a widely used and well supported configuration on Linux that you normally shouldn’t have issues with, but Zen2 CPUs are rather notorious for having fragile memory controllers, and some AGESA firmware releases have been dodgy enough to cause issues on certain CPUs. I used to have a 3600X myself that started crashing at idle around a particular firmware release of my motherboard, and it was fixed by a subsequent release.

    BTW the fact that it doesn’t happen on Debian doesn’t necessarily mean that Arch is the culprit. It could just be that Debian is not triggering the fault because of different, perhaps more conservative, compiler optimizations.

    As a last-ditch effort, you could try resetting all of your UEFI (BIOS) settings to their defaults, preferably by pulling the CMOS battery.

    BTW, is it only GUI applications that are segfaulting? Or other programs as well? Do you have an old spare GPU you can test with?

    • NoisyFlake@lemm.eeOP
      11 months ago

      I already did a UEFI reset, but that didn’t help. As far as I can tell, it’s only GUI applications; I haven’t seen a segfault in anything else so far. Unfortunately I don’t have any other GPU right now.

      It seems that a solution was found though (at least for now; it hasn’t crashed for a few hours) here: https://lemm.ee/comment/8161085

      • DefederateLemmyMl@feddit.nl
        11 months ago

        Glad to hear that disabling PBO helped, but it does indicate that something may not be entirely healthy with your CPU (or with the way the motherboard is driving it; that can’t be excluded either).