mcelog didn't catch the MCE memory error #91

@mysnoopy

I found an MCE error in dmesg, but mcelog didn't catch it: mcelog --client prints nothing and /var/log/mcelog is empty.

[root@test ~]# dmesg -T | grep mce
[Tue Apr 21 16:02:26 2020] mce: Using 22 MCE banks
[Sat May 1 08:56:53 2021] mce: [Hardware Error]: Machine check events logged

[root@test ~]# mcelog --client
[root@test ~]# cat /var/log/mcelog
[root@test ~]#

[root@test ~]# cat /etc/mcelog/mcelog.conf

# config file for mcelog
# For further options, see the mcelog manpage and documentation

# by default, disable extended error logging on newer Intel processors
#syslog = yes
logfile = /var/log/mcelog
no-imc-log = yes

# Filter out known broken events by default
filter = yes

# don't log memory errors individually
#filter-memory-errors = yes

# output in undecoded raw format to be easier machine readable
#raw = yes

[server]
# An upstream bug prevents this from being disabled
# Only allow root to connect by default
client-user = root
# Path to socket client uses to connect
socket-path = /var/run/mcelog-client

[dimm]
# Enable DIMM-tracking
dimm-tracking-enabled = yes
# Disable DIMM DMI pre-population unless supported on your system
dmi-prepopulate = no
# execute these triggers when the rate of corrected or uncorrected
# errors per DIMM exceeds the threshold
uc-error-trigger = dimm-error-trigger
uc-error-threshold = 1 / 24h
ce-error-trigger = dimm-error-trigger
ce-error-threshold = 10 / 24h

[socket]
# Memory error accounting per socket
socket-tracing-enabled = yes
mem-uc-error-threshold = 100 / 24h
mem-ce-error-trigger = socket-memory-error-trigger
mem-ce-error-threshold = 100 / 24h
mem-ce-error-log = yes

[cache]
# Attempt to off-line CPUs causing cache errors
cache-threshold-trigger = cache-error-trigger
cache-threshold-log = yes

[page]
# Try to soft-offline a 4K page if it exceeds the threshold
memory-ce-threshold = 10 / 24h
memory-ce-trigger = page-error-trigger
memory-ce-log = yes
memory-ce-action = soft

[trigger]
# Maximum number of running triggers
children-max = 2
directory = /etc/mcelog/triggers
[root@test ~]#
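
To make it easier to see which options are actually in effect in a config like the one above (for example, that logfile is set while syslog is commented out), the active lines can be filtered out of the file. This is only a rough sketch: the helper names active_settings and setting_value are hypothetical, and the default path is simply the one used in this report.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mcelog) for inspecting an mcelog-style config.

# Print only the active lines: skip blank lines and lines whose first
# non-space character is '#'.
active_settings() {
    conf="${1:-/etc/mcelog/mcelog.conf}"
    grep -Ev '^[[:space:]]*(#|$)' "$conf"
}

# Print the value assigned to one key (e.g. "logfile"), or nothing if the
# key is absent or commented out.
setting_value() {
    key="$1"
    conf="${2:-/etc/mcelog/mcelog.conf}"
    active_settings "$conf" |
        sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p"
}
```

For example, setting_value logfile would print the log path the daemon is configured to write to, which can then be compared against the file that turned out to be empty.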
