Backport request: sdap: eliminate O(N²) loop in sdap_add_incomplete_groups() → sssd-2-9-4, fix for denial of service for big caches #8716

@karlg100

Summary

Please backport upstream commit f91c7bb ("sdap: eliminate O(N²) loop in
sdap_add_incomplete_groups()") to the sssd-2-9-4 branch. This fix addresses a
significant performance regression visible on RHEL 8 deployments running sssd 2.9.4
in environments with large numbers of LDAP groups.

I believe this is a major contributing factor to the WATCHDOG death spiral that denies
service to users once the cache reaches a certain size. New writes to the cache on login
cause long delays, which trigger the WATCHDOG to terminate the process. The process then
relaunches and the cycle repeats, so the user is never able to log in.


Problem

sdap_add_incomplete_groups() uses a two-pass design:

  1. Walk sysdb_groupnames[], check each against sysdb, build a missing[] list.
  2. For each missing group, scan the entire ldap_groups[] array again to find matching LDAP attributes.

When the cache is cold (e.g., after a service restart or cache flush), every group is
missing, so the inner scan runs N times over an N-element array — O(N²) complexity.
In production environments with hundreds or thousands of groups per user this causes
id / getent lookups to stall noticeably and can cause timeouts that propagate to
PAM/SSH authentication.
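
To make the cost concrete, here is a minimal sketch of the two-pass pattern described above. It is not the sssd code: struct ldap_group, the cached[] flag array, and two_pass_comparisons() are hypothetical stand-ins, and the sysdb check is reduced to a boolean per name. It only counts how many name comparisons the inner scan performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an LDAP group entry; not the sssd type. */
struct ldap_group {
    const char *name;
};

/*
 * Model of the two-pass design. cached[i] says whether groupnames[i]
 * is already in the cache (a stand-in for the sysdb check).
 * Returns the number of name comparisons done by the inner scan.
 */
size_t two_pass_comparisons(const char *const *groupnames,
                            const bool *cached,
                            const struct ldap_group *ldap_groups,
                            size_t n)
{
    size_t comparisons = 0;
    size_t num_missing = 0;
    const char **missing;

    if (n == 0) {
        return 0;
    }
    assert(groupnames != NULL && cached != NULL && ldap_groups != NULL);

    missing = malloc(n * sizeof(*missing));
    if (missing == NULL) {
        return 0;
    }

    /* Pass 1: collect the names that are absent from the cache. */
    for (size_t i = 0; i < n; i++) {
        if (!cached[i]) {
            missing[num_missing++] = groupnames[i];
        }
    }

    /* Pass 2: for every missing name, rescan the LDAP entries. */
    for (size_t i = 0; i < num_missing; i++) {
        for (size_t j = 0; j < n; j++) {
            comparisons++;
            if (strcmp(missing[i], ldap_groups[j].name) == 0) {
                break; /* found the matching LDAP entry */
            }
        }
    }

    free(missing);
    return comparisons;
}
```

In this model, a cold cache with N groups in matching order costs 1 + 2 + ... + N = N(N+1)/2 comparisons, so 1,000 groups already mean about 500,000 comparisons, while a fully warm cache costs none.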

The upstream commit message also notes a secondary effect: the repeated
sdap_get_group_primary_name() calls inside the inner loop caused excess talloc
allocations that thrashed ldb/tdb cache pages, slowing the subsequent
sysdb_update_members() call in sdap_initgr_common_store() as well.


Upstream fix

Commit f91c7bb (merged 2026-02-16, authored by Alexey Tikhonov, reviewed by
Justin Stephenson and Sumit Bose) replaces the two-pass design with a single O(N)
loop that iterates ldap_groups[] directly: for each entry, check sysdb and, if
missing, create the incomplete entry immediately. The sysdb_groupnames parameter is
removed as it is no longer needed.
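
The shape of the single-pass loop can be sketched as follows. This illustrates the idea only and is not the upstream patch: struct toy_cache, toy_cache_has(), toy_cache_add(), and add_incomplete_single_pass() are hypothetical names, and the toy cache stands in for sysdb. Its linear lookup is a simplification; the lookups counter is what shows one cache check per LDAP entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_MAX 16

/* Hypothetical stand-in for an LDAP group entry; not the sssd type. */
struct ldap_group {
    const char *name;
};

/* Hypothetical toy cache standing in for sysdb. */
struct toy_cache {
    const char *names[TOY_CACHE_MAX];
    size_t count;
    size_t lookups; /* number of cache checks performed */
};

static bool toy_cache_has(struct toy_cache *c, const char *name)
{
    c->lookups++;
    for (size_t i = 0; i < c->count; i++) {
        if (strcmp(c->names[i], name) == 0) {
            return true;
        }
    }
    return false;
}

static int toy_cache_add(struct toy_cache *c, const char *name)
{
    if (c->count >= TOY_CACHE_MAX) {
        return -1;
    }
    c->names[c->count++] = name;
    return 0;
}

/*
 * Single pass over the LDAP entries: check the cache once per entry
 * and create the missing entry immediately. No missing[] list is
 * built and the array is never rescanned.
 */
int add_incomplete_single_pass(struct toy_cache *cache,
                               const struct ldap_group *ldap_groups,
                               size_t n, size_t *added)
{
    assert(cache != NULL && added != NULL);
    *added = 0;

    for (size_t i = 0; i < n; i++) {
        if (toy_cache_has(cache, ldap_groups[i].name)) {
            continue; /* already cached, nothing to do */
        }
        if (toy_cache_add(cache, ldap_groups[i].name) != 0) {
            return -1; /* toy cache is full */
        }
        (*added)++;
    }
    return 0;
}
```

Because each entry is handled where it is found, the LDAP attributes are already at hand when the incomplete entry is created, which is why no separate name list is needed in this sketch.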

Files changed:

| File | Change |
| --- | --- |
| src/providers/ldap/sdap_async_groups.c | Remove sysdb_groupnamelist call-site |
| src/providers/ldap/sdap_async_initgroups.c | Restructure main loop |
| src/providers/ldap/sdap_async_private.h | Drop sysdb_groupnames from prototype |

Why sssd-2-9-4 specifically

RHEL 8 ships sssd 2.9.4 and will not receive newer minor versions. Without a
backport to the sssd-2-9-4 branch, this fix cannot reach RHEL 8 users through a
normal errata update. The fix is a pure algorithmic improvement with no new
dependencies or behavior changes — it is a low-risk candidate for backport.


Proof-of-concept backport

I have already applied this patch to sssd-2-9-4 locally. The cherry-pick required
a manual conflict resolution in sdap_async_initgroups.c (the 2-9-4 branch carries a
slightly different structure in the if (use_id_mapping) block), but the resolved
result is semantically identical to the upstream version. I am working on testing it on my
RHEL 8 test system by running the same 10k lookups and measuring performance. If the team
prefers a different backport, I am happy to test that instead.


References

  • Upstream commit: f91c7bbc38e41eeb31f2132acc7263bd4ac9d47c
  • Target branch: sssd-2-9-4
