diff --git a/src/contents.rst b/src/contents.rst
index be77cdf..b89f8ad 100644
--- a/src/contents.rst
+++ b/src/contents.rst
@@ -93,6 +93,7 @@ Table of Contents
  Basics
  Backend
  SSSD Errors
+ Performance Tuning in SSSD
  Log Analyzer
  Fleet Commander
  SUDO
diff --git a/src/troubleshooting/performance.rst b/src/troubleshooting/performance.rst
new file mode 100644
index 0000000..1797a8b
--- /dev/null
+++ b/src/troubleshooting/performance.rst
@@ -0,0 +1,52 @@
+Performance tuning SSSD
+#######################
+
+Slow id lookup
+**************
+It has been observed that ``id`` lookups become slow when the LDAP/AD user is a member of many groups, for example 300 or more. The ``id`` command is heavy because it does a lot under the hood.
+Behind the scenes, running ``id $user`` triggers the following steps:
+
+1. Get user information: ``getpwnam()`` for the user.
+
+2. Get primary group information: ``getgrgid()`` for the primary group of the user.
+
+3. Get the list of groups: ``getgrouplist()`` for the user.
+
+4. Get group information for each group returned in step 3: ``getgrgid()`` for every GID returned by the ``getgrouplist()`` call.
+
+We need to identify which of these four steps is actually slow. To collect detailed information, add ``debug_level = 9`` to the ``[domain/$domain]`` section of ``/etc/sssd/sssd.conf`` and restart ``sssd``. Step 4 is often where SSSD spends most of its time, because the most data-intensive operation is downloading the groups together with their members. This behavior is enabled by default and can be turned off by setting ``ignore_group_members = true``.
+Usually, the initial interest is in which groups a user is a member of (``id aduser@ad_domain``) rather than which members a specific group contains (``getent group adgroup@ad_domain``). Setting the ``ignore_group_members`` option to ``true`` makes all groups appear empty, so only information about the group objects themselves is downloaded and not their members, which provides a significant performance boost. Note that ``id aduser@ad_domain`` still returns all the correct groups.
+
+- Pros: ``getgrnam()``/``getgrgid()`` calls are significantly faster.
+- Cons: ``getgrnam()``/``getgrgid()`` calls return only the group information, not the members.
+
+**WARNING! If the compat tree is used, DO NOT SET ignore_group_members = true because it breaks the compatibility logic.**
+
+If the lookup is still slow after disabling group member retrieval, check the logs to see how long the ``Initgroups`` call takes by grepping for the ``CR`` number of that id lookup request. In the example below, ``sssd_nss`` takes about 1 second to process the user's group membership, and only 39 groups are associated with the user. If the count is large, for example 300-400, and ``ignore_group_members`` is not set to ``true``, the id lookup is expected to take some time with a cold cache.
+
+.. code-block:: sssd-log
+
+    $ grep 'CR #3:' /var/log/sssd/sssd_nss.log
+    (2023-06-08 12:21:31): [nss] [cache_req_set_plugin] (0x2000): CR #3: Setting "Initgroups by name" plugin
+    (2023-06-08 12:21:31): [nss] [cache_req_send] (0x0400): CR #3: New request 'Initgroups by name'
+    (2023-06-08 12:21:31): [nss] [cache_req_process_input] (0x0400): CR #3: Parsing input name [roy]
+    (2023-06-08 12:21:31): [nss] [cache_req_set_name] (0x0400): CR #3: Setting name [roy]
+    (2023-06-08 12:21:31): [nss] [cache_req_select_domains] (0x0400): CR #3: Performing a multi-domain search
+    (2023-06-08 12:21:31): [nss] [cache_req_search_domains] (0x0400): CR #3: Search will check the cache and check the data provider
+    (2023-06-08 12:21:31): [nss] [cache_req_set_domain] (0x0400): CR #3: Using domain [redhat.com]
+    (2023-06-08 12:21:31): [nss] [cache_req_prepare_domain_data] (0x0400): CR #3: Preparing input data for domain [redhat.com] rules
+    (2023-06-08 12:21:31): [nss] [cache_req_search_send] (0x0400): CR #3: Looking up roy@redhat.com
+    (2023-06-08 12:21:31): [nss] [cache_req_search_ncache] (0x0400): CR #3: Checking negative cache for [roy@redhat.com]
+    (2023-06-08 12:21:31): [nss] [cache_req_search_ncache] (0x0400): CR #3: [roy@redhat.com] is not present in negative cache
+    (2023-06-08 12:21:31): [nss] [cache_req_search_cache] (0x0400): CR #3: Looking up [roy@redhat.com] in cache
+    (2023-06-08 12:21:31): [nss] [cache_req_search_send] (0x0400): CR #3: Object found, but needs to be refreshed.
+    (2023-06-08 12:21:31): [nss] [cache_req_search_dp] (0x0400): CR #3: Looking up [roy@redhat.com] in data provider
+    (2023-06-08 12:21:32): [nss] [cache_req_search_cache] (0x0400): CR #3: Looking up [roy@redhat.com] in cache
+    (2023-06-08 12:21:32): [nss] [cache_req_search_ncache_filter] (0x0400): CR #3: This request type does not support filtering result by negative cache
+    (2023-06-08 12:21:32): [nss] [cache_req_search_done] (0x0400): CR #3: Returning updated object [roy@redhat.com]
+    (2023-06-08 12:21:32): [nss] [cache_req_create_and_add_result] (0x0400): CR #3: Found 39 entries in domain redhat.com <---------
+    (2023-06-08 12:21:32): [nss] [cache_req_done] (0x0400): CR #3: Finished: Success
+
+The above troubleshooting can be done more easily with ``sssctl analyze request list`` and ``sssctl analyze request show ``. For more details, please refer to :doc:`Log Analyzer `.
+
+Moreover, if the ``Initgroups`` call is slow, check ``sssd_nss.log`` and ``sssd_$domain.log`` to see where the time is spent. In ``sssd_$domain.log``, start from ``[BE_USER]``, which shows whether SSSD can retrieve the user. After that, SSSD processes the groups of that user, which is visible in the ``[Initgroups]`` call in ``sssd_$domain.log``. This call confirms whether the user lookup and membership processing completed properly, and shows where the lookup takes time or whether there is a failure.
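To put a number on how long a single request took, the timestamps of its first and last ``CR #N`` lines can be compared. The following is only a minimal sketch in Python, not an SSSD tool: the helper name ``request_duration`` is hypothetical, and it assumes the second-resolution ``(YYYY-MM-DD HH:MM:SS):`` prefix shown in the log excerpt above, so lines with a different timestamp layout are skipped.

```python
import re
from datetime import datetime

# Assumption: each line starts with "(YYYY-MM-DD HH:MM:SS):" and carries a
# "CR #N:" tag, as in the sssd_nss.log excerpt above.
LINE_RE = re.compile(
    r"^\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\):.*\bCR #(\d+):"
)


def request_duration(lines, cr_number):
    """Return the seconds between the first and last line of one CR.

    Hypothetical helper for illustration. Returns None when no line
    belongs to the given CR number.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and int(match.group(2)) == cr_number:
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            )
    if not stamps:
        return None
    return (max(stamps) - min(stamps)).total_seconds()


# Two lines taken from the excerpt above.
sample = [
    "(2023-06-08 12:21:31): [nss] [cache_req_send] (0x0400): "
    "CR #3: New request 'Initgroups by name'",
    "(2023-06-08 12:21:32): [nss] [cache_req_done] (0x0400): "
    "CR #3: Finished: Success",
]
print(request_duration(sample, 3))  # 1.0
```

For a real log, an open file object can be passed instead of the sample list, for example ``request_duration(open("/var/log/sssd/sssd_nss.log"), 3)``. Because the timestamps only have second resolution, the result is a rough estimate.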