Incorrect nfs idmap service name for CentOS 8 #116

Open
rexcsn opened this issue Sep 30, 2020 · 0 comments
rexcsn commented Sep 30, 2020

Hi,

When running the `nfs::server4` recipe on CentOS 8, I encountered this error:

    * service[idmap] action start
    [2020-09-30T18:08:57+00:00] INFO: Processing service[idmap] action start (nfs::_idmap line 29)

    
    ================================================================================
    Error executing action `start` on resource 'service[idmap]'
    ================================================================================
    
    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    Expected process to exit with [0], but received '5'
    ---- Begin output of /bin/systemctl --system start nfs-idmap ----
    STDOUT: 
    STDERR: Failed to start nfs-idmap.service: Unit nfs-idmap.service not found.
    ---- End output of /bin/systemctl --system start nfs-idmap ----
    Ran /bin/systemctl --system start nfs-idmap returned 5

I believe this is because the idmap service name set in the cookbook attributes is incorrect for CentOS 8.
I think the correct service name should be `nfs-idmapd` instead of `nfs-idmap`:

    $ sudo systemctl status nfs-idmapd
    ● nfs-idmapd.service - NFSv4 ID-name mapping service
       Loaded: loaded (/usr/lib/systemd/system/nfs-idmapd.service; static; vendor preset: disabled)
       Active: active (running) since Wed 2020-09-30 18:08:57 UTC; 1h 37min ago
      Process: 97264 ExecStart=/usr/sbin/rpc.idmapd (code=exited, status=0/SUCCESS)
     Main PID: 97266 (rpc.idmapd)
        Tasks: 1 (limit: 47436)
       Memory: 844.0K
       CGroup: /system.slice/nfs-idmapd.service
               └─97266 /usr/sbin/rpc.idmapd

    $ sudo systemctl status nfs-idmap
    Unit nfs-idmap.service could not be found.

This seems to be related only to the `nfs::_idmap` recipe, as running `nfs::server` alone works without a problem.

Thank you!
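
The mismatch reported above can be sketched as a small platform check. This is a minimal, hypothetical illustration in plain Ruby, not the cookbook's actual code: the helper name `idmap_service_name` and the version cutoff are assumptions, and only the two unit names (`nfs-idmap`, `nfs-idmapd`) come from the report.

```ruby
# Hypothetical helper (not part of the nfs cookbook): pick the systemd unit
# name for the NFSv4 ID mapping daemon.
# Assumption: CentOS/RHEL 8 and later expose only `nfs-idmapd`, as observed in
# the report; every other case falls back to the name the recipe tried to start.
def idmap_service_name(platform, platform_version)
  major = platform_version.to_s.split('.').first.to_i
  if %w[centos redhat].include?(platform) && major >= 8
    'nfs-idmapd'
  else
    'nfs-idmap'
  end
end

puts idmap_service_name('centos', '8.2.2004')
puts idmap_service_name('centos', '7.8.2003')
```

In a wrapper cookbook, a value chosen this way could presumably be assigned to whichever attribute the `nfs::_idmap` recipe reads for its service name before `nfs::server4` is included; the exact attribute key should be checked in the cookbook's attributes file.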

rexcsn added a commit to rexcsn/aws-parallelcluster-cookbook that referenced this issue Sep 30, 2020
* Enable PowerTools Repo so *-devel packages can be installed with DNF
* Install Python3 as system python for CentOS 8
* Remove obsolete logic of running nfs::server recipe when Ubuntu, nfs::server4 already includes nfs::server recipe
* Workaround to only run nfs::server instead of nfs::server4 for CentOS 8 due to issue: sous-chefs/nfs#116
* Comment out other install recipes included in base_install that will be addressed by separate PRs

Signed-off-by: Rex <[email protected]>
rexcsn added a commit to rexcsn/aws-parallelcluster-cookbook that referenced this issue Sep 30, 2020
* Enable PowerTools Repo so *-devel packages can be installed with DNF
* Install Python3 as system python for CentOS 8
* Remove obsolete logic of running nfs::server recipe when Ubuntu, nfs::server4 already includes nfs::server recipe
* Workaround to only run nfs::server instead of nfs::server4 for CentOS 8 due to issue: sous-chefs/nfs#116
* Do not enforce kernel_devel version because kernel_devel package with same version as kernel release version cannot be found
* Comment out other install recipes included in base_install that will be addressed by separate PRs

Signed-off-by: Rex <[email protected]>
rexcsn added a commit to rexcsn/aws-parallelcluster-cookbook that referenced this issue Oct 5, 2020
rexcsn added a commit to rexcsn/aws-parallelcluster-cookbook that referenced this issue Oct 6, 2020
enrico-usai pushed a commit to aws/aws-parallelcluster-cookbook that referenced this issue Oct 7, 2020
enrico-usai pushed a commit to enrico-usai/aws-parallelcluster-cookbook that referenced this issue Oct 27, 2020
enrico-usai pushed a commit to enrico-usai/aws-parallelcluster-cookbook that referenced this issue Nov 2, 2020
enrico-usai pushed a commit to enrico-usai/aws-parallelcluster-cookbook that referenced this issue Nov 3, 2020
enrico-usai pushed a commit to enrico-usai/aws-parallelcluster-cookbook that referenced this issue Nov 3, 2020
* Enable PowerTools Repo so *-devel packages can be installed with DNF
* Install Python3 as system python for CentOS 8
* Do not enforce kernel_devel version because kernel_devel package with same version as kernel release version cannot be found
* Install iptables
* Enable EPEL repo by default

## Fix nfs logic for CentOS 8
* Fix nfs logic in base_install by calling nfs::server4 recipe and providing correct idmap service name, nfs-idmapd
* Workaround to only run nfs::server instead of nfs::server4 for CentOS 8 due to issue: sous-chefs/nfs#116

## EBS
* Modify logic to get EBS device to volume id mapping. Specifically ec2_dev_2_volid.py and parallelcluster-ebsnvme-id are modified for CentOS 8 to use nvme-cli to retrieve volume id for a device following this guide: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
* parallelcluster-ebsnvme-id needs to accept the options -v/-b/-u to output volume id and block device information when called from ec2_dev_2_volid.py and attachVolume.py
* Modify centos8 specific parallelcluster-ebsnvme-id to output correct info based on option specified
* Centos8 specific ec2_dev_2_volid.py no longer needed and removed, as new parallelcluster-ebsnvme-id script will accept -v option to output volume id

## DNS configuration

* Configure DNS settings for CentOS 8. Note dhclient is not enabled by default, so need to provide modified NetworkManager config. Afterwards same logic as CentOS 7 can be used

Signed-off-by: Rex <[email protected]>
enrico-usai added a commit to enrico-usai/aws-parallelcluster-cookbook that referenced this issue Nov 3, 2020
* Enable PowerTools Repo so *-devel packages can be installed with DNF
* Install Python3 as system python for CentOS 8
* Do not enforce kernel_devel version because kernel_devel package with same version as kernel release version cannot be found
* Install iptables
* Enable EPEL repo by default

## IntelMPI

The `environment-modules` package installation automatically creates
the `/usr/share/Modules/` folder, required by the intel_mpi recipe.

References:
* https://forums.centos.org/viewtopic.php?t=74035

## NFS

* Fix nfs logic in base_install by calling nfs::server4 recipe and providing correct idmap service name, nfs-idmapd
* Workaround to only run nfs::server instead of nfs::server4 for CentOS 8 due to issue: sous-chefs/nfs#116

## EBS

* Modify logic to get EBS device to volume id mapping. Specifically ec2_dev_2_volid.py and parallelcluster-ebsnvme-id are modified for CentOS 8 to use nvme-cli to retrieve volume id for a device following this guide: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
* parallelcluster-ebsnvme-id needs to accept the options -v/-b/-u to output volume id and block device information when called from ec2_dev_2_volid.py and attachVolume.py
* Modify centos8 specific parallelcluster-ebsnvme-id to output correct info based on option specified
* Centos8 specific ec2_dev_2_volid.py no longer needed and removed, as new parallelcluster-ebsnvme-id script will accept -v option to output volume id

## DNS configuration

* Configure DNS settings for CentOS 8. Note dhclient is not enabled by default, so need to provide modified NetworkManager config. Afterwards same logic as CentOS 7 can be used

## Torque

I'm using some compilation flags, like we're already doing for Ubuntu 18 and Amazon Linux 2.

`c++03` is the 1998 ISO C++ standard plus the 2003 technical corrigendum and some additional defect reports.
Note: The compilation succeeded even without the `c++03` flag. I'm adding it for consistency.
Source: https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html

`-fpermissive` downgrades some diagnostics about nonconformant code from errors to warnings.
Source: https://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/C_002b_002b-Dialect-Options.html#index-fpermissive-140

## FSx

* Use `package` in place of `yum_package` to support `dnf`.
* Added `gdisk` package, required by `update initramfs` action, called by `kernel_module 'lnet'` resource.
* Use `['platform_version'].to_f`  to compare minor version of the OS in place of `['platform_version'].split('.')[1].to_i`
  to be compatible with multiple major OS values.
* Explicitly added `x86_64` at the end of the `base_url` parameter, like we have for CentOS 7.
* Improved check for CentOS 7.5 and 7.6.
* Use `package` in place of `apt_package` to be aligned to other OSes.

References:
* Lustre installation guide: https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html
* gdisk issue: https://www.spinics.net/lists/centos-devel/msg18766.html
* package resource: https://docs.chef.io/resources/package/

## EFA

Mark CentOS 8 as unsupported OS for EFA. Supported AMIs are: Amazon Linux, Amazon Linux 2,
RHEL 7.6, RHEL 7.7, RHEL 7.8, CentOS 7, Ubuntu 16.04, and Ubuntu 18.04.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-working-with.html

Signed-off-by: Enrico Usai <[email protected]>
Signed-off-by: Rex <[email protected]>
enrico-usai pushed a commit to enrico-usai/aws-parallelcluster-cookbook that referenced this issue Nov 4, 2020
# Packer and kitchen

* Create packer file for CentOS8 with similar logic to CentOS7
* Using `dnf` instead of `yum` as package manager everywhere
* Add kitchen tests for CentOS 8
* Install `python3` version of `aws-cfn-bootstrap` scripts to support CentOS8

# Basic features

* Enable `PowerTools` repo so `*-devel` packages can be installed with DNF
* Install Python3 as system python for CentOS 8
* Do not enforce `kernel_devel` version because `kernel_devel` package with same version as kernel release version cannot be found
* Install `iptables`
* Enable `EPEL` repo by default

# IntelMPI

The `environment-modules` package installation automatically creates
the `/usr/share/Modules/` folder, required by the intel_mpi recipe.

References:
* https://forums.centos.org/viewtopic.php?t=74035

# NFS

* Fix nfs logic in `base_install` by calling `nfs::server4` recipe and providing correct idmap service name, `nfs-idmapd`
* Workaround to only run `nfs::server` instead of `nfs::server4` for CentOS 8 due to issue: sous-chefs/nfs#116

# EBS

* Modify logic to get EBS device to volume id mapping.
  Specifically `ec2_dev_2_volid.py` and `parallelcluster-ebsnvme-id` are modified for CentOS 8
  to use `nvme-cli` to retrieve volume id for a device following this guide:
  https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
* `parallelcluster-ebsnvme-id` needs to accept the options `-v/-b/-u` to output volume id and block device
  information when called from `ec2_dev_2_volid.py` and `attachVolume.py`
* Centos8 specific `ec2_dev_2_volid.py` is no longer needed, as the new `parallelcluster-ebsnvme-id` script will accept the `-v` option to output the volume id
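
As a rough illustration of the `nvme-cli` approach, the sketch below extracts a volume id from `nvme id-ctrl` style output. It is written in Ruby for illustration only and is not the code of the scripts named above; the helper name `volume_id_from_id_ctrl` and the sample text are hypothetical. It assumes, following the linked AWS guide, that the controller serial number (`sn`) carries the EBS volume id without the hyphen after `vol`.

```ruby
# Hypothetical helper: extract the EBS volume id from `nvme id-ctrl` output.
def volume_id_from_id_ctrl(output)
  # Find the serial number line, e.g. "sn      : vol01234567890abcdef".
  line = output.each_line.find { |l| l =~ /^\s*sn\s*:/ }
  return nil unless line

  serial = line.split(':', 2).last.strip
  return nil unless serial.start_with?('vol')

  # Insert the hyphen if the serial is reported as "vol0123..." rather than "vol-0123...".
  serial.start_with?('vol-') ? serial : "vol-#{serial[3..-1]}"
end

# Made-up sample resembling the relevant lines of `nvme id-ctrl` output.
sample = <<~OUT
  NVME Identify Controller:
  vid     : 0x1d0f
  ssvid   : 0x1d0f
  sn      : vol01234567890abcdef
  mn      : Amazon Elastic Block Store
OUT

puts volume_id_from_id_ctrl(sample)
```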

# DNS

* Configure DNS settings for CentOS 8. Note dhclient is not enabled by default, so need to provide modified NetworkManager config. Afterwards same logic as CentOS 7 can be used

# Torque

I'm using some compilation flags, like we're already doing for Ubuntu 18 and Amazon Linux 2.

* `c++03` is the 1998 ISO C++ standard plus the 2003 technical corrigendum and some additional defect reports.
  Note: The compilation succeeded even without the `c++03` flag. I'm adding it for consistency.
  Source: https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
* `-fpermissive` downgrades some diagnostics about nonconformant code from errors to warnings.
  Source: https://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/C_002b_002b-Dialect-Options.html#index-fpermissive-140

# FSx

* Use `package` in place of `yum_package` to support `dnf`.
* Added `gdisk` package, required by `update initramfs` action, called by `kernel_module 'lnet'` resource.
* Use `['platform_version'].to_f`  to compare minor version of the OS in place of `['platform_version'].split('.')[1].to_i`
  to be compatible with multiple major OS values.
* Explicitly added `x86_64` at the end of the `base_url` parameter, like we have for CentOS 7.
* Improved check for CentOS 7.5 and 7.6.
* Use `package` in place of `apt_package` to be aligned to other OSes.
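
The difference between the two version checks can be seen in plain Ruby. This is a hedged sketch: the helper names and version strings are illustrative, and the 7.5 threshold is only an example borrowed from the CentOS 7.5/7.6 check mentioned above, not the recipe's actual condition.

```ruby
# Old style: look only at the minor component, ignoring the major version.
def minor_at_least_old?(platform_version, minor)
  platform_version.split('.')[1].to_i >= minor
end

# New style: String#to_f parses the leading "major.minor" part as a float,
# so the major version takes part in the comparison.
def version_at_least?(platform_version, threshold)
  platform_version.to_f >= threshold
end

%w[7.5.1804 7.6.1810 8.2.2004].each do |v|
  puts "#{v}: old=#{minor_at_least_old?(v, 5)} new=#{version_at_least?(v, 7.5)}"
end
```

The float comparison has its own limit: a two-digit minor version such as `7.10` parses as `7.1`, so `Gem::Version` may be a safer choice wherever that can occur.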

References:
* Lustre installation guide: https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html
* gdisk issue: https://www.spinics.net/lists/centos-devel/msg18766.html
* package resource: https://docs.chef.io/resources/package/

# NICE DCV

* Disable Wayland, the default display server for GNOME sessions on CentOS 8, because it is not supported by DCV
* Add default value for dcv_port
* Add Centos8 to pcluster_dcv_connect.sh

## SELinux notes
Default SELinux policies in RHEL8 can lead to failures in `xdm` processes using NVIDIA drivers/libraries,
thus for example the `gnome-shell` and the `dcv` system agent (being children of `gdm`) can be impacted.
By default SELinux is disabled so it doesn't impact our installation.
If SELinux is re-enabled, custom policies must be defined to grant `xdm_t` processes the permissions they need.

## Tests
* Verified gdm configuration file
* Started a session and listed the active sessions

## References
DCV guide: https://docs.aws.amazon.com/dcv/latest/adminguide/setting-up-installing-linux-server.html

# CloudWatch logging

* Use RedHat ARM cloudwatch agent, since there is no official Centos8 ARM agent available

# Intel HPC

* Prepare recipe to install Intel HPC packages on CentOS 8
* Install Intel PSXE 2020, which does not require yum4 to be used
* Set `keepcache=True` so downloaded packages are not removed after successful installation of any package
* Add retries to Intel install
* The recipes are ready but IntelHPC is not officially supported
  because Centos 8 is not supported by Intel(R) Cluster Checker 2019 Update 9 (build 20200609).
* The tool is looking for `libstdc++.so.5` and is unable to detect `libstdc++.so.6` present in Centos8

The current status is: the packages can be installed and the
recipes are ready to be used but cluster checker doesn't support CentOS 8.

## References
* https://software.intel.com/content/www/us/en/develop/tools/parallel-studio-xe/choose-download/free-trial-cluster-linux.html
* https://software.intel.com/content/www/us/en/develop/documentation/cluster-checker-user-guide/top/installation.html

# RAID

Use version-1 superblock format for RAID on Centos8.

Linux RAID reserves a bit of space (called a superblock) on each component device.
This space holds metadata about the RAID device and allows correct assembly of the array.

The Linux kernel RAID subsystem recognizes version-0.90 and version-1 Superblock formats.

Old Linux kernels can only autodetect arrays with superblock version 0.90.
The older version-0.90 format was the default until 2009, but it has several limitations
that restrict its use on large arrays or arrays with many component devices.
The newer version-1 format is the default as of mdadm v3.1.1. More specifically, 1.2 is used as of mdadm v3.1.2.

The default value for the `metadata` property of `mdadm` chef resource is `0.90`
and it causes failures on Centos8. We're changing this value to `1.2`.
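
To make the metadata choice concrete, the sketch below builds an `mdadm --create` argument list with an explicit superblock version. It is plain Ruby for illustration, not the Chef `mdadm` resource itself; the helper name, device paths, and RAID level are hypothetical placeholders.

```ruby
# Hypothetical helper: build the argument list for creating an array with an
# explicit superblock (metadata) version instead of relying on a default.
def mdadm_create_args(raid_device, devices, level:, metadata: '1.2')
  [
    'mdadm', '--create', raid_device,
    "--level=#{level}",
    "--metadata=#{metadata}",
    "--raid-devices=#{devices.length}",
    *devices
  ]
end

# Placeholder devices; nothing is executed, the command is only printed.
args = mdadm_create_args('/dev/md0', %w[/dev/xvdb /dev/xvdc], level: 0)
puts args.join(' ')
```

In a Chef recipe the same choice is expressed by setting the `metadata` property of the `mdadm` resource to `'1.2'`.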

## References
* https://raid.wiki.kernel.org/index.php/Superblock
* https://raid.wiki.kernel.org/index.php/RAID_superblock_formats
* https://man7.org/linux/man-pages/man4/md.4.html
* https://docs.chef.io/resources/mdadm/

# SGE

## SGE code patches
1. Patch to the source code of SGE to support OpenSSL 1.1 that is replacing OpenSSL 1.0 in CentOS8
1. Patch for the TCSH 3rd party library included in SGE to support newer versions of glibc.
1. Patch for gmake 3rd party library to build with newer versions of automake

Source:
* OpenSSL: https://copr-dist-git.fedorainfracloud.org/cgit/loveshack/SGE/gridengine.git/tree/?h=epel8
* TCSH: https://copr-dist-git.fedorainfracloud.org/cgit/loveshack/SGE/gridengine.git/tree/sge-tcsh.patch?h=epel8
* Qmake: https://copr-dist-git.fedorainfracloud.org/cgit/loveshack/SGE/gridengine.git/tree/sge-qmake.patch?h=epel8

## Compilation flags
I'm installing libtirpc and libtirpc-devel libraries and using compilation flags because the default path
is `/usr/include/` instead of `/usr/local/include`

To pass these compilation flags to `aimk` it's required to use `SGE_INPUT_CFLAGS` and `SGE_INPUT_LDFLAGS`
as described in Aimk documentation: https://arc.liv.ac.uk/trac/SGE/browser/sge/source/README.aimk

# EFA

Mark CentOS 8 as unsupported OS for EFA. Supported AMIs are: Amazon Linux, Amazon Linux 2,
RHEL 7.6, RHEL 7.7, RHEL 7.8, CentOS 7, Ubuntu 16.04, and Ubuntu 18.04.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-working-with.html

Signed-off-by: Enrico Usai <[email protected]>
Signed-off-by: Rex <[email protected]>
enrico-usai added a commit to enrico-usai/aws-parallelcluster-cookbook that referenced this issue Nov 4, 2020
* Enable `PowerTools` repo so `*-devel` packages can be installed with DNF
* Install Python3 as system python for CentOS 8
* Do not enforce `kernel_devel` version because `kernel_devel` package with same version as kernel release version cannot be found
* Install `iptables`
* Enable `EPEL` repo by default

# IntelMPI

The `environment-modules` package installation automatically creates
the `/usr/share/Modules/` folder, required by the `intel_mpi` recipe.

References:
* https://forums.centos.org/viewtopic.php?t=74035

# NFS

* Fix nfs logic in base_install by calling `nfs::server4` recipe and providing correct idmap service name, `nfs-idmapd`
* Workaround to only run `nfs::server` instead of `nfs::server4` for CentOS 8 due to issue: sous-chefs/nfs#116

# EBS

* Modify logic to get EBS device to volume id mapping.
  Specifically ec2_dev_2_volid.py and parallelcluster-ebsnvme-id are modified for CentOS 8 to use nvme-cli to retrieve volume id for a device following this guide:
  https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
* `parallelcluster-ebsnvme-id` needs to accept the options `-v/-b/-u` to output volume id and block device
  information when called from `ec2_dev_2_volid.py` and `attachVolume.py`
* Centos8 specific `ec2_dev_2_volid.py` is no longer needed, as the new `parallelcluster-ebsnvme-id` script will accept the `-v` option to output the volume id

# DNS

* Configure DNS settings for CentOS 8.
  Note `dhclient` is not enabled by default, so need to provide modified NetworkManager config. Afterwards same logic as CentOS 7 can be used

# Torque

I'm using some compilation flags, like we're already doing for Ubuntu 18 and Amazon Linux 2.
* `c++03` is the 1998 ISO C++ standard plus the 2003 technical corrigendum and some additional defect reports.
  Note: The compilation succeeded even without the `c++03` flag. I'm adding it for consistency.
  Source: https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
* `-fpermissive` downgrades some diagnostics about nonconformant code from errors to warnings.
  Source: https://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/C_002b_002b-Dialect-Options.html#index-fpermissive-140

# FSx

* Use `package` in place of `yum_package` to support `dnf`.
* Added `gdisk` package, required by `update initramfs` action, called by `kernel_module 'lnet'` resource.
* Use `['platform_version'].to_f`  to compare minor version of the OS in place of `['platform_version'].split('.')[1].to_i`
  to be compatible with multiple major OS values.
* Explicitly added `x86_64` at the end of the `base_url` parameter, like we have for CentOS 7.
* Improved check for CentOS 7.5 and 7.6.
* Use `package` in place of `apt_package` to be aligned to other OSes.

References:
* Lustre installation guide: https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html
* gdisk issue: https://www.spinics.net/lists/centos-devel/msg18766.html
* package resource: https://docs.chef.io/resources/package/

# EFA

Mark CentOS 8 as unsupported OS for EFA. Supported AMIs are: Amazon Linux, Amazon Linux 2,
RHEL 7.6, RHEL 7.7, RHEL 7.8, CentOS 7, Ubuntu 16.04, and Ubuntu 18.04.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-working-with.html

# CloudWatch

* Use RedHat ARM cloudwatch agent, since there is no official Centos8 ARM agent available

Signed-off-by: Enrico Usai <[email protected]>
Signed-off-by: Rex <[email protected]>
enrico-usai added a commit to aws/aws-parallelcluster-cookbook that referenced this issue Nov 4, 2020