Found the mce error on dmesg. But mcelog didn't catch it and /var/log/mcelog is empty,
[root@test ~]#dmesg -T |grep mce
[Tue Apr 21 16:02:26 2020] mce: Using 22 MCE banks
[Sat May 1 08:56:53 2021] mce: [Hardware Error]: Machine check events logged
[root@test ~]# mcelog --client
[root@test ~]# cat /var/log/mcelog
[root@test ~]#
[root@test ~]# cat /etc/mcelog/mcelog.conf
config file for mcelog
For further options, see the mcelog manpage and documentation
by default, disable extended error logging on newer Intel processors
#syslog = yes
logfile = /var/log/mcelog
no-imc-log = yes
Filter out known broken events by default
filter = yes
don't log memory errors individually
#filter-memory-errors = yes
output in undecoded raw format to be easier machine readable
#raw = yes
[server]
An upstream bug prevents this from being disabled
Only allow root to connect by default
client-user = root
Path to socket client uses to connect
socket-path = /var/run/mcelog-client
[dimm]
Enable DIMM-tracking
dimm-tracking-enabled = yes
Disable DIMM DMI pre-population unless supported on your system
dmi-prepopulate = no
execute these triggers when the rate of corrected or uncorrected
errors per DIMM exceeds the threshold
uc-error-trigger = dimm-error-trigger
uc-error-threshold = 1 / 24h
ce-error-trigger = dimm-error-trigger
ce-error-threshold = 10 / 24h
[socket]
Memory error accounting per socket
socket-tracing-enabled = yes
mem-uc-error-threshold = 100 / 24h
mem-ce-error-trigger = socket-memory-error-trigger
mem-ce-error-threshold = 100 / 24h
mem-ce-error-log = yes
[cache]
Attempt to off-line CPUs causing cache errors
cache-threshold-trigger = cache-error-trigger
cache-threshold-log = yes
[page]
Try to soft-offline a 4K page if it exceeds the threshold
memory-ce-threshold = 10 / 24h
memory-ce-trigger = page-error-trigger
memory-ce-log = yes
memory-ce-action = soft
[trigger]
Maximum number of running triggers
children-max = 2
directory = /etc/mcelog/triggers
[root@test ~]#
Found the mce error on dmesg. But mcelog didn't catch it and /var/log/mcelog is empty,
[root@test ~]#dmesg -T |grep mce
[Tue Apr 21 16:02:26 2020] mce: Using 22 MCE banks
[Sat May 1 08:56:53 2021] mce: [Hardware Error]: Machine check events logged
[root@test ~]# mcelog --client
[root@test ~]# cat /var/log/mcelog
[root@test ~]#
[root@test ~]# cat /etc/mcelog/mcelog.conf
config file for mcelog
For further options, see the mcelog manpage and documentation
by default, disable extended error logging on newer Intel processors
#syslog = yes
logfile = /var/log/mcelog
no-imc-log = yes
Filter out known broken events by default
filter = yes
don't log memory errors individually
#filter-memory-errors = yes
output in undecoded raw format to be easier machine readable
#raw = yes
[server]
An upstream bug prevents this from being disabled
Only allow root to connect by default
client-user = root
Path to socket client uses to connect
socket-path = /var/run/mcelog-client
[dimm]
Enable DIMM-tracking
dimm-tracking-enabled = yes
Disable DIMM DMI pre-population unless supported on your system
dmi-prepopulate = no
execute these triggers when the rate of corrected or uncorrected
errors per DIMM exceeds the threshold
uc-error-trigger = dimm-error-trigger
uc-error-threshold = 1 / 24h
ce-error-trigger = dimm-error-trigger
ce-error-threshold = 10 / 24h
[socket]
Memory error accounting per socket
socket-tracing-enabled = yes
mem-uc-error-threshold = 100 / 24h
mem-ce-error-trigger = socket-memory-error-trigger
mem-ce-error-threshold = 100 / 24h
mem-ce-error-log = yes
[cache]
Attempt to off-line CPUs causing cache errors
cache-threshold-trigger = cache-error-trigger
cache-threshold-log = yes
[page]
Try to soft-offline a 4K page if it exceeds the threshold
memory-ce-threshold = 10 / 24h
memory-ce-trigger = page-error-trigger
memory-ce-log = yes
memory-ce-action = soft
[trigger]
Maximum number of running triggers
children-max = 2
directory = /etc/mcelog/triggers
[root@test ~]#