Add support for basic features on CentOS 8
* Enable PowerTools Repo so *-devel packages can be installed with DNF
* Install Python3 as system python for CentOS 8
* Do not enforce the kernel_devel version, because no kernel_devel package matching the kernel release version can be found
* Install iptables
* Enable EPEL repo by default

## Fix nfs logic for CentOS 8
* Fix nfs logic in base_install by calling nfs::server4 recipe and providing correct idmap service name, nfs-idmapd
* Workaround to only run nfs::server instead of nfs::server4 for CentOS 8 due to issue: sous-chefs/nfs#116

## EBS
* Modify the logic that maps an EBS device to its volume id. Specifically, ec2_dev_2_volid.py and parallelcluster-ebsnvme-id are modified for CentOS 8 to use nvme-cli to retrieve the volume id for a device, following this guide: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
* parallelcluster-ebsnvme-id needs to accept the options -v/-b/-u to output volume id and block device information when called from ec2_dev_2_volid.py and attachVolume.py
* Modify the CentOS 8 specific parallelcluster-ebsnvme-id to output the correct info for the option specified
* The CentOS 8 specific ec2_dev_2_volid.py is no longer needed and is removed, as the new parallelcluster-ebsnvme-id script accepts the -v option to output the volume id
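
The lookup described in this section can be sketched in Python. This is only an illustration under stated assumptions, not code from this commit: the function names `parse_volume_id` and `parse_device_name` are hypothetical, and `SAMPLE` contains just the two lines of `nvme id-ctrl -v` output that the script comments quote, so real output will contain many more fields.

```python
import re


def parse_volume_id(id_ctrl_output):
    """Return the EBS volume id (vol-...) found in `nvme id-ctrl -v` text, or None."""
    match = re.search(r"^sn\s+:\s*(\S+)", id_ctrl_output, re.MULTILINE)
    if match is None:
        return None
    # The serial number carries no dash, so vol0123... becomes vol-0123...
    return re.sub(r"^vol(?!-)", "vol-", match.group(1))


def parse_device_name(id_ctrl_output, udev=False):
    """Return the block device name from the vendor-specific dump line, or None."""
    match = re.search(r'^0000:.+"([/\w]+)\.*"\s*$', id_ctrl_output, re.MULTILINE)
    if match is None:
        return None
    name = match.group(1)
    # Mirror the -u option: strip the /dev/ prefix for use in udev rules.
    if udev and name.startswith("/dev/"):
        name = name[len("/dev/"):]
    return name


# Hypothetical sample: only the two lines quoted in the script comments.
SAMPLE = (
    "sn : vol01234567890abcdef\n"
    '0000: 2f 64 65 76 2f 73 64 6a 20 20 20 20 20 20 20 20 "/dev/sdj..."\n'
)

if __name__ == "__main__":
    print(parse_volume_id(SAMPLE))
    print(parse_device_name(SAMPLE, udev=True))
```

Keeping the parsing separate from the `nvme` invocation means the regular expressions can be checked against captured text without root access or an NVMe device.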

## DNS configuration

* Configure DNS settings for CentOS 8. dhclient is not enabled by default there, so a modified NetworkManager config is provided; after that, the same logic as CentOS 7 can be used
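
One way to check that a given NetworkManager.conf selects dhclient is sketched below. It assumes Python's `configparser` is close enough to NetworkManager's keyfile format for a file this simple, which may not hold for every configuration; the function name `dhcp_client` and the sample text are hypothetical and only mirror the `[main]` section added in this commit.

```python
import configparser


def dhcp_client(conf_text):
    """Return the value of `dhcp` under [main], or None if it is not set."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    # fallback also covers the case where the [main] section is missing
    return parser.get("main", "dhcp", fallback=None)


# Hypothetical sample mirroring the [main] section shipped in this commit.
SAMPLE_CONF = """\
[main]
plugins = ifcfg-rh,
dhcp = dhclient

[logging]
#level=TRACE
#domains=ALL
"""

if __name__ == "__main__":
    print(dhcp_client(SAMPLE_CONF))
```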

Signed-off-by: Rex <[email protected]>
rexcsn authored and enrico-usai committed Nov 3, 2020
1 parent 268b17c commit b8753a3
Showing 9 changed files with 195 additions and 18 deletions.
8 changes: 8 additions & 0 deletions attributes/default.rb
@@ -206,6 +206,14 @@
libical-devel postgresql-devel postgresql-server sendmail libxml2-devel libglvnd-devel mdadm python python-pip
libssh2-devel libgcrypt-devel libevent-devel glibc-static bind-utils]
end
if node['platform_version'].to_i == 8
# Install python3 instead of unversioned python
default['cfncluster']['base_packages'].delete('python')
default['cfncluster']['base_packages'].delete('python-pip')
# Install iptables used in configure-pat.sh
# Install nvme-cli package used to retrieve info about EBS volumes in parallelcluster-ebsnvme-id
# concat appends the individual package names; push would append one nested array
default['cfncluster']['base_packages'].concat(%w[python3 python3-pip iptables nvme-cli])
end
if node['platform_version'].to_i >= 8
# gdisk required for FSx
# environment-modules required for IntelMPI
51 changes: 51 additions & 0 deletions files/centos-8/NetworkManager.conf
@@ -0,0 +1,51 @@
# Configuration file for NetworkManager.
#
# See "man 5 NetworkManager.conf" for details.
#
# The directories /usr/lib/NetworkManager/conf.d/ and /run/NetworkManager/conf.d/
# can contain additional configuration snippets installed by packages. These files are
# read before NetworkManager.conf and have thus lowest priority.
# The directory /etc/NetworkManager/conf.d/ can contain additional configuration
# snippets. Those snippets are merged last and overwrite the settings from this main
# file.
#
# The files within one conf.d/ directory are read in asciibetical order.
#
# If /etc/NetworkManager/conf.d/ contains a file with the same name as
# /usr/lib/NetworkManager/conf.d/, the latter file is shadowed and thus ignored.
# Hence, to disable loading a file from /usr/lib/NetworkManager/conf.d/ you can
# put an empty file to /etc with the same name. The same applies with respect
# to the directory /run/NetworkManager/conf.d where files in /run shadow
# /usr/lib and are themselves shadowed by files under /etc.
#
# If two files define the same key, the one that is read afterwards will overwrite
# the previous one.

[main]
plugins = ifcfg-rh,
dhcp = dhclient


[logging]
# When debugging NetworkManager, enabling debug logging is of great help.
#
# Logfiles contain no passwords and little sensitive information. But please
# check before posting the file online. You can also personally hand over the
# logfile to a NM developer to treat it confidential. Meet us on #nm on freenode.
# Please post full logfiles except minimal modifications of private data.
#
# You can also change the log-level at runtime via
# $ nmcli general logging level TRACE domains ALL
# However, usually it's cleaner to enable debug logging
# in the configuration and restart NetworkManager so that
# debug logging is enabled from the start.
#
# You will find the logfiles in syslog, for example via
# $ journalctl -u NetworkManager
#
# Note that debug logging of NetworkManager can be quite verbose. Some messages
# might be rate-limited by the logging daemon (see RateLimitIntervalSec, RateLimitBurst
# in man journald.conf). Please disable rate-limiting before collecting debug logs.
#
#level=TRACE
#domains=ALL
90 changes: 90 additions & 0 deletions files/centos-8/parallelcluster-ebsnvme-id
@@ -0,0 +1,90 @@
#!/bin/bash

# Copyright (C) 2020 Amazon.com, Inc. or its affiliates.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
# OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the
# License.

# Usage:
# Read EBS device information using nvme-cli and provide information about the volume.
display_help() {
echo "Usage: $0 [options] {device_name}" >&2
echo
echo " -v, --volume Return volume-id"
echo " -b, --block-dev Return block device mapping"
echo " -u, --udev Output data in format suitable for udev rules, e.g. /dev/sdb -> sdb"
echo " -h, --help Print usage info"
echo
}

print_all=1
print_volume=0
print_block_device_mapping=0
print_udev_format=0

# Parse arguments
for i in "$@"
do
case $i in
-h|--help)
display_help
exit 0
;;
-v|--volume)
print_volume=1
print_all=0
shift
;;
-b|--block-dev)
print_block_device_mapping=1
print_all=0
shift
;;
-u|--udev)
print_udev_format=1
print_all=0
shift
;;
*)
;;
esac
done

# Check if device argument is provided
if [[ "$#" -ne 1 ]]; then
display_help
exit 1
fi

if [[ $print_all -eq 1 || $print_volume -eq 1 ]]; then
# Sample volume info from nvme-cli:
# sn : vol01234567890abcdef
# See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
# Insert '-' after 'vol' so that output looks like vol-067f083a4f6xxxxx
vol_id=$(sudo nvme id-ctrl -v "${1}" | grep -oP "sn\s+:\s\K(.+)" | sed 's/^vol/&-/')
echo "Volume ID: ${vol_id}"
fi

if [[ $print_all -eq 1 || $print_block_device_mapping -eq 1 || $print_udev_format -eq 1 ]]; then
# Sample device name info from nvme-cli:
# 0000: 2f 64 65 76 2f 73 64 6a 20 20 20 20 20 20 20 20 "/dev/sdj..."
# See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
if [[ $print_udev_format -eq 1 ]]; then
# Strip /dev/ prefix if -u option is specified
device_name=$(sudo nvme id-ctrl -v "${1}" | grep -oP '^0000:.+"\K([\/\w]+)(?=\.*"$)' | sed "s/^\/dev\///")
else
device_name=$(sudo nvme id-ctrl -v "${1}" | grep -oP '^0000:.+"\K([\/\w]+)(?=\.*"$)')
fi

echo "${device_name}"
fi
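
The hex bytes on the `0000:` sample line in the script comments can be decoded directly, which is a quick way to confirm what the quoted preview at the end of that line should read. The sketch below is illustrative only: the helper name `decode_vendor_line` is hypothetical, and it assumes the line layout shown in the comment (an offset, a colon, space-separated hex bytes, then an optional quoted preview).

```python
def decode_vendor_line(line):
    """Decode the hex bytes of one `0000: ...` dump line into text."""
    # Drop the offset before the first colon and the quoted preview, if any.
    hex_part = line.split(":", 1)[1].split('"', 1)[0]
    raw = bytes.fromhex("".join(hex_part.split()))
    # The field is padded with spaces (0x20); strip padding and any NUL bytes.
    return raw.decode("ascii").rstrip(" \x00")


if __name__ == "__main__":
    sample = '0000: 2f 64 65 76 2f 73 64 6a 20 20 20 20 20 20 20 20 "/dev/sdj..."'
    print(decode_vendor_line(sample))
```

For the sample bytes `2f 64 65 76 2f 73 64 6a` this yields `/dev/sdj`.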
2 changes: 1 addition & 1 deletion files/default/attachVolume.py
@@ -11,7 +11,7 @@
def convert_dev(dev):
# Translate the device name as provided by the OS to the one used by EC2
# FIXME This approach could be broken in some OS variants, see
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
if '/nvme' in dev:
return '/dev/' + os.popen('sudo /usr/local/sbin/parallelcluster-ebsnvme-id -u -b ' + dev).read().strip()
elif '/hd' in dev:
10 changes: 8 additions & 2 deletions recipes/_update_packages.rb
@@ -19,8 +19,14 @@
# not CentOS6
case node['platform_family']
when 'rhel', 'amazon'
execute 'yum-update' do
command "yum -y update && package-cleanup -y --oldkernels --count=1"
if node['platform'] == 'centos' && node['platform_version'].to_i == 8
execute 'dnf-update' do
command "dnf -y update"
end
else
execute 'yum-update' do
command "yum -y update && package-cleanup -y --oldkernels --count=1"
end
end
when 'debian'
apt_update
1 change: 0 additions & 1 deletion recipes/base_config.rb
@@ -38,7 +38,6 @@
# EFA runtime configuration
include_recipe "aws-parallelcluster::efa_config"

# case node['cfncluster']['cfn_node_type']
case node['cfncluster']['cfn_node_type']
when 'MasterServer'
include_recipe 'aws-parallelcluster::_master_base_config'
37 changes: 24 additions & 13 deletions recipes/base_install.rb
@@ -26,12 +26,18 @@
include_recipe "yum-epel"
end


unless node['platform_version'].to_i < 7
execute 'yum-config-manager_skip_if_unavail' do
command "yum-config-manager --setopt=\*.skip_if_unavailable=1 --save"
end
end
if node['platform'] == 'centos' && node['platform_version'].to_i == 8
# Enable PowerTools Repo so *-devel packages can be installed with DNF
# Enable EPEL repos
execute 'dnf enable powertools and EPEL repos' do
command "dnf config-manager --set-enabled PowerTools && dnf install -y epel-release"
end
end

if node['platform'] == 'redhat'
execute 'yum-config-manager-rhel' do
@@ -66,18 +72,19 @@
end
end

case node['platform_family']
when 'rhel', 'amazon'
yum_package node['cfncluster']['kernel_devel_pkg']['name'] do
version node['cfncluster']['kernel_devel_pkg']['version']
retries 3
retry_delay 5
end
when 'debian'
apt_package node['cfncluster']['kernel_generic_pkg'] do
retries 3
retry_delay 5
package "install kernel packages" do
case node['platform_family']
when 'rhel', 'amazon'
package_name node['cfncluster']['kernel_devel_pkg']['name']
if node['platform'] == 'centos' && node['platform_version'].to_i < 8
# Enforce the kernel_devel version only on CentOS < 8: on CentOS 8 no kernel_devel package matching the kernel release version can be found
version node['cfncluster']['kernel_devel_pkg']['version']
end
when 'debian'
package_name node['cfncluster']['kernel_generic_pkg']
end
retries 3
retry_delay 5
end

bash "install awscli" do
@@ -86,7 +93,7 @@
set -e
curl --retry 5 --retry-delay 5 "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
unzip awscli-bundle.zip
./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
#{node['cfncluster']['cookbook_virtualenv_path']}/bin/python awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
CLI
not_if { ::File.exist?("/usr/local/bin/aws") }
end
@@ -120,6 +127,10 @@
# FIXME: https://github.com/atomic-penguin/cookbook-nfs/issues/93
include_recipe "nfs::server"
end
if node['platform'] == 'centos' && node['platform_version'].to_i == 8
# Workaround for issue: https://github.com/atomic-penguin/cookbook-nfs/issues/116
node.force_override['nfs']['service']['idmap'] = 'nfs-idmapd'
end
include_recipe "nfs::server4"

# Put configure-pat.sh onto the host
12 changes: 12 additions & 0 deletions recipes/dns_config.rb
@@ -54,6 +54,18 @@
line "append domain-name \" #{node['cfncluster']['cfn_dns_domain']}\";"
end
end

if platform?('centos') && node['platform_version'].to_i == 8
# On CentOS8 dhclient is not enabled by default
# Put pcluster version of NetworkManager.conf in place
# dhcp = dhclient needs to be added under [main] section to enable dhclient
cookbook_file 'NetworkManager.conf' do
path '/etc/NetworkManager/NetworkManager.conf'
user 'root'
group 'root'
mode '0644'
end
end
end
restart_network_service
end
2 changes: 1 addition & 1 deletion recipes/ganglia_config.rb
@@ -62,7 +62,7 @@

# For ComputeFleet and MasterServer

if node['platform_family'] == 'rhel' && node['platform_version'].to_i == 7 || node['platform'] == 'amazon' && node['platform_version'].to_i == 2
if node['platform'] == 'centos' && node['platform_version'].to_i >= 7 || node['platform'] == 'amazon' && node['platform_version'].to_i == 2
# Fix circular dependency multi-user.target -> cloud-init-> gmond -> multi-user.target
# gmond is started by chef during cloud-init, but gmond service is configured to start after multi-user.target
# which doesn't start until cloud-init run is finished. So gmond service is stuck into starting, which keep