/var/log/mcelog contains the entry below. The machine runs Ubuntu 16.04, which reports the mcelog version as 128+dfsg-1.
Hardware event. This is not a software error.
MCE 0
CPU 1 BANK 8 TSC 235983e523450
MISC 2000000a6646 ADDR 93e6e4300
TIME 1603741601 Mon Oct 26 20:46:41 2020
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 46
Memory DIMM ID of error: 2
Memory channel ID of error: 2
Memory ECC syndrome: 2000
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 20 SOCKETID 1
CPUID Vendor Intel Family 6 Model 44
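To show where the decoded numbers come from, here is a minimal sketch that extracts the same fields from the raw STATUS and MISC values in the log above. The STATUS bit positions follow the architectural IA32_MCi_STATUS layout as I understand it. The MISC layout is model-specific and is an assumption on my part: I picked the positions because they reproduce the values mcelog printed. The helper names (`bits`, `decode_status`, `decode_misc`) are made up for illustration and are not part of mcelog.

```python
# Raw register values copied from the log entry above.
STATUS = 0x8C0000400001009F
MISC = 0x2000000A6646


def bits(value, lo, hi):
    """Return the unsigned integer stored in bits lo..hi (inclusive)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_status(status):
    # Bit positions as in the architectural IA32_MCi_STATUS register.
    return {
        "valid": bool(bits(status, 63, 63)),
        "uncorrected": bool(bits(status, 61, 61)),
        "misc_valid": bool(bits(status, 59, 59)),
        "addr_valid": bool(bits(status, 58, 58)),
        "corrected_count": bits(status, 38, 52),
        "mca_error_code": bits(status, 0, 15),
    }


def decode_misc(misc):
    # ASSUMPTION: model-specific layout, chosen only because it
    # reproduces the RTId, DIMM, channel and syndrome values in the log.
    return {
        "rtid": bits(misc, 0, 7),
        "dimm": bits(misc, 16, 17),
        "channel": bits(misc, 18, 19),
        "syndrome": bits(misc, 32, 63),
    }


if __name__ == "__main__":
    for name, value in {**decode_status(STATUS), **decode_misc(MISC)}.items():
        print(f"{name}: {value!r}")
```

If this reading is right, the DIMM and channel IDs are raw two-bit register fields. That alone does not tell me how they map to the labels printed on the motherboard.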
I swapped the DIMM in memory bank 8 at CPU 1, as labeled on the motherboard, but the same kind of error has been reported again at the same location (CPU 1 BANK 8). Before swapping DIMMs around at random, I would like to know which numbering mcelog uses: does it start from zero or from one (presumably the same way for all kinds of objects)?
For comparison, dmidecode uses labels like PROC {1,2} DIMM {1..9}, which makes numbering from one the obvious candidate. However, I have seen examples of mcelog counting CPUs from zero. lshw mixes both numberings: it lists cpu:{0,1} in slot: Proc {1,2}, and memory:{0,1} with bank:{0..8} as physical id: {0..9} in PROC {1,2} DIMM {1..9}. Finally, the numbering could even depend on the kind of machine and on how its BIOS reports to the kernel.
I have not found an answer to this question anywhere, and I doubt that digging through the code would get me any further. Can anybody answer it authoritatively? Thanks in advance for your consideration!