Introduce vmcore creation notification to kdump
Motivation
==========

People may forget to re-verify that kdump still works, with the result
that no vmcore is generated after a real system crash. That defeats the
purpose of kdump.

Rechecking kdump is highly recommended after any system modification,
such as:

a. after kernel patching or a full yum update, which might break something
   kdump depends on, for example by introducing a new bug.
b. after any hardware-level change, such as storage, networking, or a
   firmware upgrade.
c. after deploying any new application, for example one that involves
   third-party modules.

Although these checks are beyond the scope of kdump itself, a simple
vmcore creation status notification is good to have for now.

Design
======

When (re)starting, kdump currently checks whether any related files,
filesystems, or drivers have been modified to decide whether the initrd
should be rebuilt. A rebuild indicates such a modification, which means
kdump needs to be rechecked, so a rebuild clears the vmcore creation status
recorded in $VMCORE_CREATION_STATUS.

The vmcore creation check happens at "kdumpctl (re)start/status" and
reports the creation status (success or fail) to the user. A "success"
status indicates that a vmcore has previously been generated successfully in
the current environment, so a vmcore is more likely to be generated when a
real crash happens. A "fail" status indicates that no vmcore was generated
previously, or that a vmcore creation attempt failed, in the current
environment. Users should check the 2nd kernel log or kexec-dmesg.log for
the cause of the failure.

$VMCORE_CREATION_STATUS records the vmcore creation status of the current
environment. The format is:

   success 1718682002

This means a vmcore was generated successfully at this timestamp (seconds
since the Unix epoch) in the current environment.
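
The status line can be read back with a few lines of shell. The sketch
below is illustrative and is not the code added by this patch:
describe_status is a hypothetical helper name, the demo uses a temporary
file rather than the real status file, and GNU date is assumed for the
"-d @<epoch>" conversion.

```shell
#!/bin/sh
# Hypothetical helper, not part of kdumpctl: summarize a status file.
describe_status() {
    _file=$1

    # A missing or empty file means nothing has been recorded for the
    # current environment.
    if [ ! -s "$_file" ]; then
        echo "no vmcore creation recorded"
        return 0
    fi

    # The file holds one line: "<success|fail> <epoch seconds>".
    read -r _status _timestamp < "$_file"
    echo "$_status at $(date -u -d "@$_timestamp" '+%Y-%m-%d %H:%M:%S') UTC"
}

# Demo against a throwaway file instead of the real status file.
demo_file=$(mktemp)
echo "success 1718682002" > "$demo_file"
describe_status "$demo_file"    # success at 2024-06-18 03:40:02 UTC
rm -f "$demo_file"
```

The demo prints the time in UTC so the output does not depend on the local
time zone; kdumpctl itself prints the date in the system's local format.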

Usage
=====

[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: No vmcore creation test performed!

[root@localhost ~]# kdumpctl test

[root@localhost ~]# kdumpctl status
kdump: Kdump is operational
kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024

[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024

The notification for kdumpctl (re)start/status can be disabled by
setting VMCORE_CREATION_NOTIFICATION="" in /etc/sysconfig/kdump.
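
The patch only prints the notice when the variable compares equal to "yes",
ignoring case, so any other value (including the empty string) suppresses
it. A minimal sketch of that gate follows; notification_enabled is a
hypothetical name, and it is written in POSIX sh with tr, whereas the patch
itself relies on bash's ${VAR,,} expansion.

```shell
#!/bin/sh
# Hypothetical helper mirroring the gate in check_vmcore_creation_status():
# succeed only when the value is "yes", compared case-insensitively.
notification_enabled() {
    case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
    yes) return 0 ;;
    *) return 1 ;;
    esac
}

# Walk through a few candidate values for VMCORE_CREATION_NOTIFICATION.
for value in "yes" "YES" "" "no"; do
    if notification_enabled "$value"; then
        echo "'$value': notification shown"
    else
        echo "'$value': notification suppressed"
    fi
done
```

Only "yes" and "YES" report the notification as shown in this run.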

===

v3 -> v2:
Always mount the device holding
$VMCORE_CREATION_STATUS (/var/crash/vmcore-creation.status) in the
2nd kernel, in case /var is on a separate device from the rootfs.

v4 -> v3:
Add "kdumpctl test" as the entry point for performing the kdump test.

v5 -> v4:
Fix the mounting failure issue in fadump.

v6 -> v5:
Add new argument as customized mount point for add_mount/to_mount.

v7 -> v6:
a. Code refactoring based on Philipp's suggestion.
b. Only mount $VMCORE_CREATION_STATUS(/var/crash/vmcore-creation.status)'s
   device when needed.
c. Add "--force" option for "kdumpctl test", to support the automated test
   scripts QE may run.
d. Add check in "kdumpctl test" that $VMCORE_CREATION_STATUS can only be on
   a local drive.

v8 -> v7:
a. Rebased the patch on top of upstream commit e2b8463.
b. Code refactoring based on Philipp's suggestion.
c. Updated the "test" entry of kdumpctl.8.

Signed-off-by: Tao Liu <[email protected]>
liutgnu authored and coiby committed Sep 29, 2024
1 parent a627ee9 commit 88525eb
Showing 7 changed files with 139 additions and 3 deletions.
11 changes: 11 additions & 0 deletions dracut/99kdumpbase/kdump.sh
@@ -308,11 +308,22 @@ do_final_action() {
}

do_dump() {
    if [ -d /vmcorestatus ]; then
        _vmcore_creation_status="/vmcorestatus/$VMCORE_CREATION_STATUS"
    else
        _vmcore_creation_status="/sysroot/$VMCORE_CREATION_STATUS"
    fi

    set_vmcore_creation_status 'clear' "$_vmcore_creation_status"

    eval "$DUMP_INSTRUCTION"
    _ret=$?

    if [ $_ret -ne 0 ]; then
        set_vmcore_creation_status 'fail' "$_vmcore_creation_status"
        derror "saving vmcore failed"
    else
        set_vmcore_creation_status 'success' "$_vmcore_creation_status"
    fi

    return $_ret
1 change: 1 addition & 0 deletions dracut/99kdumpbase/module-setup.sh
@@ -1095,6 +1095,7 @@ install() {
    inst "/usr/bin/printf" "/sbin/printf"
    inst "/usr/bin/logger" "/sbin/logger"
    inst "/usr/bin/chmod" "/sbin/chmod"
    inst "/usr/bin/dirname" "/sbin/dirname"
    inst "/lib/kdump/kdump-lib-initramfs.sh" "/lib/kdump-lib-initramfs.sh"
    inst "/lib/kdump/kdump-logger.sh" "/lib/kdump-logger.sh"
    inst "$moddir/kdump.sh" "/usr/bin/kdump.sh"
4 changes: 4 additions & 0 deletions gen-kdump-sysconfig.sh
@@ -47,6 +47,10 @@ KDUMP_IMG="vmlinuz"
#What is the images extension. Relocatable kernels don't have one
KDUMP_IMG_EXT=""
# Enable vmcore creation notification by default, disable by setting
# VMCORE_CREATION_NOTIFICATION=""
VMCORE_CREATION_NOTIFICATION="yes"
# Logging is controlled by following variables in the first kernel:
# - @var KDUMP_STDLOGLVL - logging level to standard error (console output)
# - @var KDUMP_SYSLOGLVL - logging level to syslog (by logger command)
33 changes: 33 additions & 0 deletions kdump-lib-initramfs.sh
@@ -9,6 +9,7 @@ KDUMP_CONFIG_FILE="/etc/kdump.conf"
FENCE_KDUMP_CONFIG_FILE="/etc/sysconfig/fence_kdump"
FENCE_KDUMP_SEND="/usr/libexec/fence_kdump_send"
LVM_CONF="/etc/lvm/lvm.conf"
VMCORE_CREATION_STATUS="/var/crash/vmcore-creation.status"

# Read kdump config in well formated style
kdump_read_conf()
@@ -171,3 +172,35 @@ kdump_get_ip_route_field()
{
    echo "$1" | sed -n -e "s/^.*\<$2\>\s\+\(\S\+\).*$/\1/p"
}

# $1: success/fail/clear
# $2: status_file
set_vmcore_creation_status()
{
    _status=$1
    _status_file=$2
    _dir=$(dirname "$_status_file")

    [ -d "$_dir" ] || mkdir -p "$_dir"

    _mnt_op=$(get_mount_info OPTIONS target "$_dir" -f)
    case $_mnt_op in
    ro*)
        dinfo "remounting the vmcore status target in rw mode."
        mount -o remount,rw "$(findmnt -n -o TARGET --target "$_dir")"
        ;;
    esac

    case "$_status" in
    success | fail)
        dinfo "saving vmcore status file to $_status_file"
        echo "$_status $(date +%s)" > "$_status_file"
        ;;
    clear)
        rm -f "$_status_file"
        ;;
    *)
        return
    esac
    sync -f "$_dir"
}
67 changes: 66 additions & 1 deletion kdumpctl
@@ -185,6 +185,8 @@ rebuild_initrd()
    else
        rebuild_kdump_initrd
    fi

    set_vmcore_creation_status 'clear' "$VMCORE_CREATION_STATUS"
}

#$1: the files to be checked with IFS=' '
@@ -1043,6 +1045,7 @@ start()
    start_dump || return

    dinfo "Starting kdump: [OK]"
    check_vmcore_creation_status
    return 0
}

@@ -1653,6 +1656,63 @@ _should_reset_crashkernel() {
    [[ $(kdump_get_conf_val auto_reset_crashkernel) != no ]] && systemctl is-enabled kdump &> /dev/null
}

check_vmcore_creation_status()
{
    local _status _timestamp _status_date

    [[ ${VMCORE_CREATION_NOTIFICATION,,} == "yes" ]] || return

    if [[ ! -s $VMCORE_CREATION_STATUS ]]; then
        dwarn "Notice: No vmcore creation test performed!"
        return
    fi

    read -r _status _timestamp < "$VMCORE_CREATION_STATUS"
    _status_date="$(date -d "@$_timestamp")"

    if [[ "$_status" == "success" ]]; then
        dinfo "Notice: Last successful vmcore creation on $_status_date"
    else
        dwarn "Notice: Last NOT successful vmcore creation on $_status_date"
    fi
}

kdump_test()
{
    local _dir input

    if ! is_kernel_loaded "$DEFAULT_DUMP_MODE"; then
        derror "Kdump needs to be operational before the test."
        exit 1
    fi

    _dir=$(dirname "$VMCORE_CREATION_STATUS")
    if ! [[ -d "$_dir" ]]; then
        derror "Vmcore status dir $_dir does not exist."
        exit 1
    fi

    if ! lsblk "$(get_mount_info SOURCE target "$_dir")" > /dev/null; then
        derror "$VMCORE_CREATION_STATUS must be on a local drive"
        exit 1
    fi

    if [[ "$1" != "--force" ]]; then
        read -r -p "DANGER!!! Will perform a kdump test by crashing the system, proceed? (y/N): " input
        case $input in
        [Yy] )
            dinfo "Start kdump test..."
            ;;
        * )
            dinfo "Operation cancelled."
            exit 0
            ;;
        esac
    fi
    set_vmcore_creation_status 'clear' "$VMCORE_CREATION_STATUS"
    echo c > /proc/sysrq-trigger
}

main()
{
    # Determine if the dump mode is kdump or fadump
@@ -1684,6 +1744,7 @@ main()
            EXIT_CODE=3
            ;;
        esac
        check_vmcore_creation_status
        exit $EXIT_CODE
        ;;
    reload)
@@ -1728,8 +1789,12 @@ main()
            reset_crashkernel_for_installed_kernel "$2"
        fi
        ;;
    test)
        shift
        kdump_test "$@"
        ;;
    *)
-        dinfo $"Usage: $0 {estimate|start|stop|status|restart|reload|rebuild|reset-crashkernel|propagate|showmem}"
+        dinfo $"Usage: $0 {estimate|start|stop|status|restart|reload|rebuild|reset-crashkernel|propagate|showmem|test}"
        exit 1
        ;;
    esac
10 changes: 10 additions & 0 deletions kdumpctl.8
@@ -70,6 +70,16 @@ Note: The memory requirements for kdump varies heavily depending on the
used hardware and system configuration. Thus the recommended
crashkernel might not work for your specific setup. Please test if
kdump works after resetting the crashkernel value.
.TP
.I test [--force]
Test kdump by actually triggering a system crash and dump, and check whether
a vmcore can really be generated with the current configuration and
environment. After the system reboots back to normal, check the test result
with "kdumpctl status".

If the optional parameter [--force] is provided, there is no interactive
confirmation before the system crash is triggered. Although dangerous, this
option is meant for automated testing.

.SH "SEE ALSO"
.BR kdump.conf (5),
16 changes: 14 additions & 2 deletions mkdumprd
@@ -73,9 +73,10 @@ has_dracut_module()
# caller should ensure $1 is valid and mounted in 1st kernel
to_mount()
{
-    local _target=$1 _fstype=$2 _options=$3 _sed_cmd _new_mntpoint _pdev
+    local _target=$1 _fstype=$2 _options=$3 _new_mntpoint=$4
+    local _sed_cmd _pdev

-    _new_mntpoint=$(get_kdump_mntpoint_from_target "$_target")
+    _new_mntpoint="${_new_mntpoint:-$(get_kdump_mntpoint_from_target "$_target")}"
    _fstype="${_fstype:-$(get_fs_type_from_target "$_target")}"
    _options="${_options:-$(get_mntopt_from_target "$_target")}"
    _options="${_options:-defaults}"
@@ -429,4 +430,15 @@ if ! is_fadump_capable; then
    fi
fi

status_target=$(get_target_from_path "$(dirname "$VMCORE_CREATION_STATUS")")

if [[ $(get_root_fs_device) != "$status_target" ]]; then
    new_mntpoint=$(echo /vmcorestatus/$(get_mntpoint_from_target "$status_target") \
        | tr -s "/")
    add_mount "$status_target" "" "" "$new_mntpoint"
elif ! is_fadump_capable && \
    ! [[ ${dracut_args[*]} == *"$(kdump_get_persistent_dev "$status_target")"* ]]; then
    add_mount "$status_target"
fi

dracut "${dracut_args[@]}" "$@"
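
The path handling spread across mkdumprd and do_dump() can be summarized
with a small sketch. It is a simplified illustration under assumptions, not
the patch's code: status_mntpoint and status_path are hypothetical helpers,
the first-kernel mount point is passed in by hand instead of being looked
up, and the paths are squeezed with tr only for display (do_dump() itself
joins them without squeezing).

```shell
#!/bin/sh
VMCORE_CREATION_STATUS="/var/crash/vmcore-creation.status"

# Hypothetical helper: where a status device mounted at $1 in the first
# kernel would be mounted inside the kdump initramfs.
status_mntpoint() {
    # tr -s "/" squeezes the doubled slash left by joining the two paths.
    echo "/vmcorestatus/$1" | tr -s "/"
}

# Hypothetical helper: the status path the dump step would write to,
# depending on whether the dedicated /vmcorestatus mount exists
# ($1 = yes/no).
status_path() {
    if [ "$1" = "yes" ]; then
        echo "/vmcorestatus/$VMCORE_CREATION_STATUS" | tr -s "/"
    else
        echo "/sysroot/$VMCORE_CREATION_STATUS" | tr -s "/"
    fi
}

status_mntpoint "/var"    # /vmcorestatus/var
status_path yes           # /vmcorestatus/var/crash/vmcore-creation.status
status_path no            # /sysroot/var/crash/vmcore-creation.status
```

When /var lives on its own device, that device is mounted under
/vmcorestatus and the status file is written there; otherwise the file is
reached through the root filesystem under /sysroot.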
