[plug] AMD Machine Check errors

Brad Campbell brad at fnarfbargle.com
Fri Oct 24 20:40:37 AWST 2025


On 24/10/25 20:33, Chris McCormick wrote:
> On 24/10/2025 20:07, Brad Campbell wrote:
>> I've replaced the CPU, the RAM and the PSU.
> 
> 
> This may be a stupid suggestion, but years ago we had a server that was randomly rebooting. After a lot of debugging we finally figured out it was the actual power cord that wasn't plugged all the way into the PSU. Absolute face-palm moment. You said you changed the PSU, but did you change the power cord? Are you sure the wall outlet doesn't have issues? Long shot I know.
> 

G'day Chris,

Thanks, but done that. It was plugged into a powerboard that was plugged into a PDU that was plugged into an ATS that was plugged into a UPS which was plugged into the wall.
I bypassed most of that and plugged it directly into the ATS, and then into the UPS, as a test.

The clincher is that the reboots were *always* logged (to a remote UDP netconsole receiver).
On top of that there are several SED drives in the machine, so after any form of power cycle they come up locked. That never happened.
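
For anyone wanting to capture the same sort of evidence, here is a minimal sketch of a receiver for netconsole output. It is not the receiver I actually run; the port (6666, the usual netconsole default), the log file name and the function names are all assumptions for illustration.

```python
import socket
from datetime import datetime, timezone


def format_record(payload: bytes, sender: str, now: datetime) -> str:
    """Turn one netconsole datagram into a timestamped log line."""
    # Kernel messages are normally ASCII; replace anything undecodable
    # rather than dropping the datagram.
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    return f"{now.isoformat()} {sender} {text}"


def serve(bind_addr: str = "0.0.0.0", port: int = 6666,
          log_path: str = "netconsole.log") -> None:
    """Listen for UDP datagrams and append each one to log_path.

    Port and path are assumed values; match them to the netconsole
    target configured on the machine being monitored.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    with open(log_path, "a", encoding="utf-8") as log:
        while True:
            payload, (host, _src_port) = sock.recvfrom(65535)
            log.write(format_record(payload, host,
                                    datetime.now(timezone.utc)) + "\n")
            # Flush every line so the last messages before a crash
            # are on disk on the receiving side.
            log.flush()


if __name__ == "__main__":
    serve()
```

The point of the exercise is that a kernel-initiated reboot or panic gets a chance to send its last messages over the wire, whereas a genuine power loss leaves nothing in the log.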

Most of the faults with the 3950X were panics in the idle handler. Occasionally I got a straight segfault.

I'm almost positive it's a glitch with the motherboard or BIOS, but it's interesting that the 5950X logs MCEs rather than causing a panic.
My next move is to upgrade to an X570 board. I'd upgrade to a new AM5 platform, but new board/processor/RAM starts to get $$$ and what I have mostly works.

