Syncing latest changes from upstream master for drbd by df-build-team · Pull Request #3 · red-hat-storage/drbd

df-build-team · 2026-04-29T08:08:21Z

PR containing the latest commits from upstream master branch

Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>

DRBD used a custom mechanism to mark netlink attributes as "mandatory": bit 14 of nla_type was repurposed as DRBD_GENLA_F_MANDATORY. Attributes sent from userspace that had this bit present and that were unknown to the kernel would lead to an error. Since Linux 5.2 commit ef6243acb478 ("genetlink: optionally validate strictly/dumps"), the generic netlink layer rejects unknown top-level attributes when strict validation is enabled. DRBD never opted out of strict validation, so unknown top-level attributes are already rejected by the netlink core. The mandatory flag mechanism was required for nested attributes, because these are parsed liberally, silently dropping attributes unknown to the kernel. This prepares for the move to a new YNL-based family, which will use the now-default strict parsing. The current family is not expected to gain any new (mandatory) attributes, which makes this change safe. Old userspace that still sets bit 14 is unaffected: nla_type() strips it before __nla_validate_parse() performs attribute validation, so the bit never reaches DRBD. Remove all references to the mandatory flag in DRBD. Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

Some of these are *ancient*, the oldest ones (debian/*) have not been touched since 2004! Some of the rpm-macro-fixes turn 16 this year, and they were only useful on very old distributions. The redhat/suse filelists were superseded by a spec macro in 2023. The kernel-compat tests and patches are unused, they just slipped through the usual unused checks somehow. Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

Remove verbose boilerplates from leading comments in all files. Only the SPDX-License-Identifier and a copyright notice stay. The copyright year is the year when the work was originally created, so we take the file creation date from git for that. Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

A user running DRBD 9.3.1 in 8.4-compatibility mode on a 6 TiB volume hit "ASSERTION bitmap->bm_pages FAILED in __bm_op" after enabling d_bitmap on a live filesystem. The volume had been attached with d_bitmap=no (no peers). About 86 s into ext4 traffic, "drbdsetup disk-options --d_bitmap=yes" reached drbd_adm_disk_opts(), which did: device->bitmap = drbd_bm_alloc(...); /* publishes pointer */ err = drbd_bm_resize(device, ..., true); /* allocates bm_pages */ drbd_bm_alloc() returns a struct with bm_pages == NULL; bm_pages is wired up only inside the spin-locked block of drbd_bm_resize(). Concurrent IO in __bm_op() observed device->bitmap != NULL with bitmap->bm_pages == NULL and tripped the assertion. Fix this by keeping the new bitmap in a local until drbd_bm_resize() has populated bm_pages, then publishing it via smp_store_release(). To make that possible, thread an explicit struct drbd_bitmap * through drbd_bm_resize() and the inner helpers it calls (____bm_op, __bm_op, bm_op, ___bm_op, bm_realloc_pages, bm_count_bits, _drbd_bm_lock / _drbd_bm_unlock) instead of re-reading device->bitmap. Public wrappers keep their device-only signatures and fetch device->bitmap once. The intent is to make publication of device->bitmap atomic with respect to readers and to codify the invariant "device->bitmap != NULL implies bm_pages != NULL", so the reasonable assertion in __bm_op() can stay. The other two drbd_bm_resize() callers (drbd_determine_dev_size() and drbd_adm_attach()) operate on an already-published bitmap and just pass device->bitmap through the new parameter; behaviour there is unchanged. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

drbd_adm_attach() has the same shape as the disk-options bug fixed in the previous commit: drbd_bm_alloc() returns a struct with bm_pages == NULL, and the pointer is published into device->bitmap immediately, long before drbd_bm_resize() (called either at the on-disk-bitmap-read step or, when there is no prior history, inside drbd_determine_dev_size()) wires up bm_pages. In practice this window is quiescent in the attach path -- the device is still D_ATTACHING, no peer is connected, and no IO is reaching the bitmap -- so the assertion in __bm_op() has not been observed there. But the pattern is fragile; defend against future regressions by applying the same fix shape. Fix this by keeping the new bitmap in a local until drbd_bm_resize() has populated bm_pages. Mirroring drbd_adm_disk_opts(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

Oops, turns out this was still referenced in our build system, so SUSE builds are broken now. Partial revert of d107002 (drbd: housekeeping, remove some unused files). Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

The approach using `override EXTRA_CFLAGS +=` does not work on some modern kernels. For instance 7.0.0-14-generic from Ubuntu Resolute / 26.04. This is a partial backport of commit 4d2b2ac ("drbd: add fault injection and debugfs for bio chaining"). Signed-off-by: Joel Colledge <joel.colledge@linbit.com>

Today an arbitrary number of volumes can resync simultaneously, slicing disk and network bandwidth thinly across all of them. On a host with many volumes sharing the same physical disks or network link, total time-to-in-sync suffers. Introduce a runtime-changeable module parameter max_parallel_resyncs that caps how many volumes may have an unpaused resync (or verify) at any given time. The default of 0 preserves existing behaviour (unlimited). Counting is per-volume (struct drbd_device): multiple peer_devices of the same volume that resync in parallel still occupy a single slot. L_SYNC_S/T and L_VERIFY_S/T count toward the cap; their L_PAUSED_* counterparts do not. When admitting paused volumes, prefer volumes whose resource already has a running resync. That way we finish all volumes of one resource before starting volumes of the next, instead of spreading the resync of every resource thinly across the cluster. Reuse the existing resync-pause mechanism by adding a new pause-reason flag resync_susp_max_parallel[] alongside the existing user / peer / dependency / other_c flags. The flag plumbs through the state-change snapshot, the resync_suspended() aggregator that maps L_SYNC_* to L_PAUSED_SYNC_*, the change-detection notifier path, and the per- peer_device suspension-reason printer (shows "max_parallel,"). Re-evaluation runs in three places: on every state change that finishes a resync (right next to the existing resume_next_sg() call), on every write to /sys/module/drbd/parameters/max_parallel_resyncs, and inline in drbd_start_resync() so the very first transition lands in L_PAUSED_SYNC_* instead of flicking through L_SYNC_* first. The flag is a local admission-control decision; it is not propagated to peers, so the wire protocol is unchanged. Each node makes its own admission decision based on its own max_parallel_resyncs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

This lets drbdsetup status display the parallel-resync suspension explicitly, alongside the other resync_susp_* reasons. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

The compat patches are concatenated into a single .compat.cocci before spatch processes them, so metavariable names declared in one rule leak into the parser state for all subsequent rules. This produced a flood of "previously declared as a metavariable, is used as an identifier" and "should X be a metavariable?" warnings whenever a later rule used a name (bdev, device, info, skb, order) that an earlier rule had bound. Declare these names with 'symbol' (or, for the handle->bdev struct field accesses where 'symbol' does not silence the warning, a regex-constrained 'identifier bdev =~ "^bdev$"') so spatch knows they are literal C identifiers in this rule. Also drop the unused 'e' and 'lim' metavariables from queue_limits_features and add the missing 'identifier device' to the first rule of queue_flag_discard. No functional change to the generated patches. Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

Upstream Linux v6.12 commit aaed5c7739be ("kbuild: slim down package for building external modules") removed the explicit copy of .config into the linux-headers Debian package built by make bindeb-pkg. With that change, $(objtree)/.config is absent in installed headers for any kernel produced via stock bindeb-pkg, and the date -r / touch -r calls that drive the compat-h freshness check abort the build. include/config/auto.conf is regenerated from .config on every kconfig change, is part of the public kbuild interface, and is shipped by every kernel-headers package (Ubuntu, Debian, RHEL kernel-devel, and stock bindeb-pkg all include it). Use it as the freshness reference instead. Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

Consider the following configuration: * DRBD Nodes A, B, C * Dedicated DRBD proxy nodes P, Q * Direct connection A-B * Proxy connections A-P-Q-C and B-P-Q-C Now there are 2 connections "Q-C" from the proxy inside to DRBD. Prior to this change, if the same port is used on C for each of these connections, then DRBD on C cannot distinguish the incoming TCP connections. More generally, when two paths share the same listener and the same peer IP, the old IP-only lookup always returned the first matching path. Incoming sockets for the second path were then silently misrouted to the first, causing "Peer presented a node_id of X instead of Y" errors and wait-connect timeouts. When searching for the path for an incoming connection, prefer an exact (IP, port) match. A corresponding change to DRBD proxy causes it to bind outgoing inside connections to the configured port. The result is that DRBD can reliably distinguish the incoming connections. Direct DRBD-to-DRBD peers use an ephemeral source port so they continue to match via the IP-only fallback, preserving existing behaviour for all configurations. Signed-off-by: Joel Colledge <joel.colledge@linbit.com>

…ags-y' Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

…nfig/auto.conf, not .config' Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

…rbd_find_path_by_addr()' Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

Today an arbitrary number of volumes can resync simultaneously, slicing disk and network bandwidth thinly across all of them. On a host with many volumes sharing the same physical disks or network link, total time-to-in-sync suffers. Introduce a runtime-changeable module parameter max_parallel_resyncs that caps how many volumes may have an unpaused resync (or verify) at any given time. The default of 0 preserves existing behaviour (unlimited). Counting is per-volume (struct drbd_device): multiple peer_devices of the same volume that resync in parallel still occupy a single slot. L_SYNC_S/T and L_VERIFY_S/T count toward the cap; their L_PAUSED_* counterparts do not. When admitting paused volumes, prefer volumes whose resource already has a running resync. That way we finish all volumes of one resource before starting volumes of the next, instead of spreading the resync of every resource thinly across the cluster. Reuse the existing resync-pause mechanism by adding a new pause-reason flag resync_susp_max_parallel[] alongside the existing user / peer / dependency / other_c flags. The flag plumbs through the state-change snapshot, the resync_suspended() aggregator that maps L_SYNC_* to L_PAUSED_SYNC_*, the change-detection notifier path, and the per- peer_device suspension-reason printer (shows "max_parallel,"). Re-evaluation runs in three places: on every state change that finishes a resync (right next to the existing resume_next_sg() call), on every write to /sys/module/drbd/parameters/max_parallel_resyncs, and inline in drbd_start_resync() so the very first transition lands in L_PAUSED_SYNC_* instead of flicking through L_SYNC_* first. The flag is a local admission-control decision; it is not propagated to peers, so the wire protocol is unchanged. Each node makes its own admission decision based on its own max_parallel_resyncs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

drbd_determine_dev_size() acquired md_buffer first and then the AL transaction lock, while al_write_transaction() (reached from peer requests via drbd_al_begin_io_commit()) acquired them in the opposite order. Running concurrently the two callers can deadlock: the resize thread holds md_buffer and waits for the AL transaction lock; the AL committer holds the AL transaction lock and waits for md_buffer. In the field this manifested as drbdsetup resize hanging indefinitely inside drbd_determine_dev_size() and the corresponding submit worker stuck in drbd_md_get_buffer() inside al_write_transaction(). Fix this by taking the AL transaction lock in drbd_determine_dev_size() before drbd_md_get_buffer(), matching the ordering used everywhere else, and release both on every exit path. Fixes: 08050c7 ("drbd: Allow online change of al-stripes and al-stripe-size") Co-developed-by: Joel Colledge <joel.colledge@linbit.com> Signed-off-by: Joel Colledge <joel.colledge@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

When got_peer_ack() splices a peer_request from connection->peer_requests onto peer_ack_connection->send_oos via drbd_queue_send_out_of_sync(), the EE_ON_RECV_ORDER flag is not cleared. That flag was originally a 1:1 marker for membership in connection->peer_requests, which is protected by connection->peer_reqs_lock; send_oos is a different list, protected by send_oos_lock. drbd_send_oos_from() then iterates send_oos in parallel from multiple sender threads (one per oos_node_id), each clearing its bit in send_oos_pending. The last sender that drives send_oos_pending to zero calls drbd_free_peer_req() outside send_oos_lock. drbd_free_peer_req() sees EE_ON_RECV_ORDER set, takes peer_reqs_lock, and list_del's a node that is actually linked on send_oos. Concurrent operations on send_oos under send_oos_lock (other senders, or list_splice_tail() in drbd_queue_send_out_of_sync()) then corrupt the list: list_del corruption. next->prev should be ..., but was ... kernel BUG at lib/list_debug.c:65! ? drbd_free_peer_req+0xe2/0x1b0 [drbd] drbd_send_out_of_sync_wf+0x1be/0x260 [drbd] drbd_sender+0x166/0x510 [drbd] Fix this by removing the peer_request from send_oos and clearing EE_ON_RECV_ORDER in drbd_send_oos_from(), under send_oos_lock, before calling drbd_free_peer_req(). Call drbd_send_oos_next_req() before the list_del so the iteration anchor stays on the list while the next position is computed. Fixes: 31f14b9 ("drbd: send P_OUT_OF_SYNC after P_PEER_ACK in parallel") Co-developed-by: Joel Colledge <joel.colledge@linbit.com> Signed-off-by: Joel Colledge <joel.colledge@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

Peer_requests on peer_req->recv_order can be linked on four different lists: connection->peer_requests, connection->peer_reads, and peer_device->resync_requests, and connection->send_oos. EE_ON_RECV_ORDER carried the membership marker for all four. Introduce EE_ON_SEND_OOS to specifically mark membership on send_oos. EE_ON_RECV_ORDER keeps its meaning, now narrowed to membership on a list protected by connection->peer_reqs_lock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

This reverts commit 624e5e1. This approach causes problems when a connection is lost and then quickly re-established. The first TCP connection is still in TIME_WAIT or a similar state, preventing the connect of the new connection. As a result, this feature brings no real benefit. Remove it. Signed-off-by: Joel Colledge <joel.colledge@linbit.com>

…in parallel'

…resize and AL transactions'

…eer_req from send_oos list'

…ip with EE_ON_SEND_OOS'

…tch in drbd_find_path_by_addr()"'

DRBD assigns incoming connections to paths in two steps. First, the accepted socket is matched to a listener by (local_addr, local_port); listeners are shared per resource across all paths with those values (find_listener). Second, drbd_find_path_by_addr() walks the listener's waiters list and picks the path whose peer_addr matches the incoming peer_addr by IP only -- peer_addr_port is unusable because the remote side uses an ephemeral source port on incoming connections. Therefore, if two paths in distinct connections of a resource share (my_addr_IP, my_addr_port, peer_addr_IP), both become waiters on the same listener and the receiver cannot decide which connection an incoming socket belongs to. The existing check_path_usable() rejects exact byte-equal duplicates globally but explicitly allows same- resource partial-match duplicates regardless of connection. Fix this by rejecting such paths at adm_add_path() time, using a new helper drbd_path_conflicts_by_listener() that captures the routing condition above. Signed-off-by: Joel Colledge <joel.colledge@linbit.com>

dtt_wait_for_connect() slept on listener->wait, where listener is the struct dtt_listener pinned only by path->listener. While sleeping in wait_event_interruptible_timeout(), a concurrent del-path -> dtt_remove_path() -> drbd_put_listener() may drop the last reference on the listener and free it. The on-stack wait_queue_entry is then dangling on a freed wait_queue_head_t, and finish_wait() crashes: kernel BUG at lib/list_debug.c:51! RIP: 0010:__list_del_entry_valid.cold+0x31/0x47 Call Trace: finish_wait+0x42/0x90 dtt_wait_for_connect.constprop.0+0x407/0x580 [drbd_transport_tcp] dtt_connect+0x28e/0xdc3 [drbd_transport_tcp] conn_connect+0xc6/0x760 [drbd] drbd_receiver+0x14/0x60 [drbd] Fix this by getting an explicit reference on the listener in dtt_wait_for_connect(). Fixes: dfa4fa2 ("drbd_transport_tcp: Fix connect time") Co-developed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Co-developed-by: Joel Colledge <joel.colledge@linbit.com> Signed-off-by: Joel Colledge <joel.colledge@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

TCP was the odd one out among the three transports: lb-tcp and rdma both place their connect-side wait primitive in the per-transport struct (dtl_transport->connected_paths_change, dtr_transport->connected) and fan out from the listener via a work_struct or rdma_cm callback. Only TCP's listener owned a wait_queue_head_t that the receiver slept on directly. Move wait from struct dtt_listener to struct drbd_tcp_transport, which has connection lifetime. This aligns the three transports and removes the thundering-herd wake-up on a shared listener: previously every receiver on every path under the listener woke and re-ran dtt_wait_connect_cond(); with the per-transport waitqueue the producer can wake only the target path's transport. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

…onnect() to fix use-after-free'

…p_transport'

…-master Signed-off-by: DF Build Team <df-build-team@redhat.com>

rck and others added 6 commits April 28, 2026 14:20

containers: add Ubuntu Resolute

df38d28

Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>

df-build-team force-pushed the sync_us--master branch 7 times, most recently from 1463bf5 to 1653ca6 Compare May 6, 2026 08:03

Philipp-Reisner and others added 5 commits May 6, 2026 11:18

Merge drbd-9.2/df38d2887517 'containers: add Ubuntu Resolute'

81d43a3

drbd: restore filelist-suse

cd43625

Oops, turns out this was still referenced in our build system, so SUSE builds are broken now. Partial revert of d107002 (drbd: housekeeping, remove some unused files). Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

drbd: expose resync_susp_max_parallel to userspace

cc4589e

This lets drbdsetup status display the parallel-resync suspension explicitly, alongside the other resync_susp_* reasons. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

df-build-team force-pushed the sync_us--master branch 2 times, most recently from 8927d96 to 97e5365 Compare May 8, 2026 08:03

df-build-team force-pushed the sync_us--master branch 3 times, most recently from f2f7062 to d70f033 Compare May 11, 2026 08:11

df-build-team force-pushed the sync_us--master branch from d70f033 to fd323c6 Compare May 12, 2026 08:07

JoelColledge and others added 4 commits May 12, 2026 14:06

Merge drbd-9.2/7ed3bf75d58f 'build: enable fault injection using ccfl…

fa8b654

…ags-y' Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

Merge drbd-9.2/d8892467521c 'build: stamp compat.h against include/co…

490c6ed

…nfig/auto.conf, not .config' Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

Merge drbd-9.2/624e5e1585fc 'drbd: prefer exact (IP, port) match in d…

3e289fb

…rbd_find_path_by_addr()' Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

debian: point to new drbd-dev mailing list

2c5cf10

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

df-build-team force-pushed the sync_us--master branch 2 times, most recently from 5ea3711 to 61a155c Compare May 14, 2026 08:12

df-build-team force-pushed the sync_us--master branch 3 times, most recently from 08c1dd9 to 84e15bd Compare May 17, 2026 08:06

df-build-team force-pushed the sync_us--master branch from 84e15bd to e05b07e Compare May 18, 2026 08:04

Philipp-Reisner and others added 10 commits May 18, 2026 12:01

Merge drbd-9.2/2c5cf1050a0a 'debian: point to new drbd-dev mailing list'

0a724c2

Merge drbd-9.2/0d2f376a7d72 'drbd: limit number of volumes resyncing …

b440781

…in parallel'

Merge drbd-9.2/1432b01ffcc1 'drbd: fix AB-BA deadlock between online …

5c318c3

…resize and AL transactions'

Merge drbd-9.2/d7eea74d67e5 'drbd: fix list corruption when freeing p…

f29be52

…eer_req from send_oos list'

Merge drbd-9.2/198d4ebdaa12 'drbd: distinguish send_oos list membersh…

9733fff

…ip with EE_ON_SEND_OOS'

Merge drbd-9.2/f1d6d44889b7 'Revert "drbd: prefer exact (IP, port) ma…

0b05e43

…tch in drbd_find_path_by_addr()"'

df-build-team force-pushed the sync_us--master branch 2 times, most recently from 0944a02 to 4e3ae8c Compare May 20, 2026 08:03

Philipp-Reisner and others added 5 commits May 20, 2026 12:29

Merge drbd-9.2/aad3ba1b37aa 'drbd: pin listener across dtt_wait_for_c…

ead3619

…onnect() to fix use-after-free'

Merge drbd-9.2/36b894b77d23 'drbd: move TCP listener wait queue to tc…

cf3dedd

…p_transport'

Merge commit 'cf3dedd09e1cc5becea4fcb5b847ba4b3ea79e94' into sync_us-…

bd57542

…-master Signed-off-by: DF Build Team <df-build-team@redhat.com>

df-build-team force-pushed the sync_us--master branch from 4e3ae8c to bd57542 Compare May 21, 2026 08:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Syncing latest changes from upstream master for drbd#3

Syncing latest changes from upstream master for drbd#3
df-build-team wants to merge 35 commits into
masterfrom
sync_us--master

df-build-team commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

df-build-team commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants