Skip to content

Commit

Permalink
TROUBLESHOOTING: Performance tuning in sssd
Browse files Browse the repository at this point in the history
This page will describe the performance related issues in sssd and how
to troubleshoot those issues.
  • Loading branch information
Roy214 committed Aug 24, 2023
1 parent bb6c855 commit 9ca9cbb
Show file tree
Hide file tree
Showing 2 changed files with 53 additions and 0 deletions.
1 change: 1 addition & 0 deletions src/contents.rst
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ Table of Contents
Basics <troubleshooting/basics>
Backend <troubleshooting/backend>
SSSD Errors <troubleshooting/errors>
Performance Tuning in SSSD <troubleshooting/performance>
Log Analyzer <troubleshooting/analyzer>
Fleet Commander <troubleshooting/fleet-commander>
SUDO <troubleshooting/sudo>
Expand Down
52 changes: 52 additions & 0 deletions src/troubleshooting/performance.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
Performance tuning SSSD
#######################

Slow id lookup
**************
This has been noticed id lookup become slow if the LDAP/AD user is a member of large groups say for example user is a member of 300+ groups. ``id`` is very heavy. ``id`` does a lot under its hood.
Behind the scenes, when the ``id $user`` command is executed it triggers the following:

- Get user information - getpwnam() for the user

- Get primary group information - getgrgid() for the primary group of the user

- Get list of groups - getgrouplist() for the user

- Get group information for each group returned from step 3 - getgrid() for all GIDs returned from getgrouplist() call.

We need to identify out of the above 4 steps which step is actually slow. In order to collect detailed infromation we need to add ``debug_level = 9`` under the ``[$domain]`` section of the ``/etc/sssd/sssd.conf`` followed by a ``sssd`` restart. We often noticed step 4 is the step where sssd takes most of its time because the most data-intensive operation is downloading the groups including their members and by default this feature is enabled we can turn this off by setting ``ignore_group_members = true``.
Usually, we are interested in what groups a user is a member of (id aduser@ad_domain) as the initial step rather than what members do specific groups include (getent group adgroup@ad_domain). Setting the ignore_group_members option to True makes all groups appear as empty, thus downloading only information about the group objects themselves and not their members, providing a significant performance boost. Please note that id aduser@ad_domain would still return all the correct groups.

- Pros: getgrnam/getgrgid calls are significantly faster.
- Cons: getgrnam/getgrgid calls only return the group information, not the members

**WARNING! If the compat tree is used, DO NOT SET ignore_group_members = true because it breaks the compatibility logic.**

If after disbaling the group_members call still the look is slow in that case we can get into the logs and verify how long the ``Initgroups`` call is taking this can be done by grepping the ``CR`` no. of that id lookup request. In this example here ``sssd_nss`` is taking ``1 sec`` to process the user group membership here we have only 39 groups associated with the user if the count is large say for example 300-400 and the ``ignore_group_members`` is not to set true then this is expected the id lookup will take some time with the cold cache.

.. code-block:: sssd-log
$ grep 'CR #3\:' /var/log/sssd/sssd_nss.log
(2023-06-08 12:21:31): [nss] [cache_req_set_plugin] (0x2000): CR #3: Setting "Initgroups by name" plugin
(2023-06-08 12:21:31): [nss] [cache_req_send] (0x0400): CR #3: New request 'Initgroups by name'
(2023-06-08 12:21:31): [nss] [cache_req_process_input] (0x0400): CR #3: Parsing input name [roy]
(2023-06-08 12:21:31): [nss] [cache_req_set_name] (0x0400): CR #3: Setting name [roy]
(2023-06-08 12:21:31): [nss] [cache_req_select_domains] (0x0400): CR #3: Performing a multi-domain search
(2023-06-08 12:21:31): [nss] [cache_req_search_domains] (0x0400): CR #3: Search will check the cache and check the data provider
(2023-06-08 12:21:31): [nss] [cache_req_set_domain] (0x0400): CR #3: Using domain [redhat.com]
(2023-06-08 12:21:31): [nss] [cache_req_prepare_domain_data] (0x0400): CR #3: Preparing input data for domain [redhat.com] rules
(2023-06-08 12:21:31): [nss] [cache_req_search_send] (0x0400): CR #3: Looking up [email protected]
(2023-06-08 12:21:31): [nss] [cache_req_search_ncache] (0x0400): CR #3: Checking negative cache for [[email protected]]
(2023-06-08 12:21:31): [nss] [cache_req_search_ncache] (0x0400): CR #3: [[email protected]] is not present in negative cache
(2023-06-08 12:21:31): [nss] [cache_req_search_cache] (0x0400): CR #3: Looking up [[email protected]] in cache
(2023-06-08 12:21:31): [nss] [cache_req_search_send] (0x0400): CR #3: Object found, but needs to be refreshed.
(2023-06-08 12:21:31): [nss] [cache_req_search_dp] (0x0400): CR #3: Looking up [[email protected]] in data provider
(2023-06-08 12:21:32): [nss] [cache_req_search_cache] (0x0400): CR #3: Looking up [[email protected]] in cache
(2023-06-08 12:21:32): [nss] [cache_req_search_ncache_filter] (0x0400): CR #3: This request type does not support filtering result by negative cache
(2023-06-08 12:21:32): [nss] [cache_req_search_done] (0x0400): CR #3: Returning updated object [[email protected]]
(2023-06-08 12:21:32): [nss] [cache_req_create_and_add_result] (0x0400): CR #3: Found 39 entries in domain redhat.com <---------
(2023-06-08 12:21:32): [nss] [cache_req_done] (0x0400): CR #3: Finished: Success
The above troubleshooting can be done easily with ``sssctl analyze request list`` and ``sssctl analyze request show <request_no>``. For more details, please refer to :doc:`Log Analyzer <analyzer>`.

Moreover, if the initgroup call is taking time we could try to check in ``sssd_nss.log`` and ``sssd_$domain.log`` where it's spending time. In ``sssd_$domain_.log`` start chaecking from ``[BE_USER]`` it will show you if sssd can retrieve the user after that sssd will start processing groups of the respective user which can be seen in ``[Initgroups]`` call in ``sssd_$domain.log``.By looking at this call you will be able to confirm if the user lookup and processing of membership is done properly. Here you will able see where the lookup is taking time or if there is any failure.

0 comments on commit 9ca9cbb

Please sign in to comment.