mon,mds: make standby-replay state more capable #4
Draft
ifed01 wants to merge 980 commits into
cmake/rgw: WITH_RADOSGW_POSIX depends on WITH_RADOSGW_DBSTORE Reviewed-by: Kefu Chai <tchaikov@gmail.com>
Fix single backticks to double backticks to properly end the inline preformatted formatting. This fixes the formatting overflowing until the next occurrence of double backticks, as seen in the rendered docs at URL: https://docs.ceph.com/en/latest/radosgw/config-ref/#confval-rgw_scheduler_type
Add full stops that seemed to be missing in the desc attribute. Use the singular word "value" in the desc attribute when there's only one possible other value. Remove unnecessary "the".
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Expose rbd_default_clone_format option which has a fairly comprehensive description (much more verbose than most other options, anyway). This should help with understanding the difference between clone v1 and v2. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
doc/rbd/rbd-config-ref: add clone settings section Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
…dosgw src/common: Fix text formatting in options/rgw.yaml.in
* Introduce qa/clusters/crimson
4 deployment clusters (1/2/3/4 nodes), with the same options as classic.
* Symlink all cluster dirs to the common dir above.
For now, keep using only 1/2; we could add 3/4 later on.
* Move to "crimson cpu num" instead of specifying a
"crimson cpu set".
- We expect users to mostly use this option for deploying
clusters, so use it as the testing default.
* remove "crimson bluestore cpu set" which is responsible for
cpu pinning exclusiveness in seastar/alien cores.
* ignore "for optimal performance" cluster warning now that we
no longer pin cpus for testing.
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
This is a preparation step for adding tests for new backend options (e.g. caching type).
* Move crimson's objectstore yamls from qa/config to qa/objectstore/crimson.
* Use the entire qa/objectstore/crimson where possible instead of symlinking each backend definition.
See:
```
├── objectstore_tool
│   ├── clusters
│   ├── crimson-supported-all-distro -> .qa/distros/crimson-supported-all-distro/
│   ├── deploy
│   ├── objectstore
│   └── tasks
├── perf
│   ├── clusters
│   ├── crimson-supported-all-distro -> .qa/distros/crimson-supported-all-distro/
│   ├── deploy
│   ├── objectstore -> .qa/objectstore/crimson
│   ├── settings
│   └── workloads
├── rbd
│   ├── clusters
│   ├── crimson-supported-all-distro -> .qa/distros/crimson-supported-all-distro/
│   ├── deploy
│   ├── objectstore -> .qa/objectstore/crimson
│   └── tasks
├── singleton
│   ├── all
│   ├── crimson-supported-all-distro -> .qa/distros/crimson-supported-all-distro/
│   └── objectstore -> .qa/objectstore/crimson
├── thrash
│   ├── 0-size-min-size-overrides
│   ├── 1-pg-log-overrides
│   ├── 2-recovery-overrides
│   ├── clusters
│   ├── crimson-supported-all-distro -> .qa/distros/crimson-supported-all-distro/
│   ├── deploy
│   ├── objectstore
│   ├── thrashers
│   └── workloads
```
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
The directories which symlink to the common crimson objectstore will now also use 2q/lru randomly. Fixes: https://tracker.ceph.com/issues/72302 Signed-off-by: Matan Breizman <mbreizma@redhat.com>
as rbm is not yet supported with the tool. Disable it properly. Signed-off-by: Matan Breizman <mbreizma@redhat.com>
When scheduling jobs with --sha1 instead of -c, the ceph-ci branch used is 'main'. However, ceph-ci doesn't actually have a main branch, so use the ceph.git main branch instead.
```
Command failed on smithi116 with status 8: "wget -q -O /home/ubuntu/cephtest/admin_socket_client.0/objecter_requests -- 'http://git.ceph.com/?p=ceph-ci.git;a=blob_plain;f=src/test/admin_socket/objecter_requests;hb=main' && chmod u=rx -- /home/ubuntu/cephtest/admin_socket_client.0/objecter_requests"
```
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Bluestore already runs thrash/default Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Now that https://tracker.ceph.com/issues/67446 is merged, we should be able to start testing Seastore similarly to the `crimson-rados/thrash` suite, which uses ceph_test_rados and rados bench. crimson-rados-experimental is a copy of crimson-rados thrash with only objectstore changes. Once the experimental suite is ready, we could add seastore to crimson-rados/thrash and remove the crimson-rados/thrash_seastore_* variants. See: https://tracker.ceph.com/issues/71237 Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Fixes: https://tracker.ceph.com/issues/73766 Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
Fixes: https://tracker.ceph.com/issues/73769 Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
…realm-zonegroup mgr/dashboard: Carbonize Administration module > Create Realm/Zone group/zone Reviewed-by: Afreen Misbah <afreen@ibm.com> Reviewed-by: Nizamudeen A <nia@redhat.com>
…ert-msg mgr/dashboard: fix upgrade's cluster alerts popover Reviewed-by: Afreen Misbah <afreen@ibm.com>
crimson-rados/perf/clusters used fixed-2 though only a single node was used. To preserve the current behavior:
* move to the correct fixed-1 definition
* introduce the ignorelist yaml as previously included in perf/clusters
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
…processing on construction. These variables are initialized in the s3select/CSV flow, and no local valgrind run has discovered any issue related to them. However, valgrind reports produced by teuthology sometimes point to an UninitCondition warning in run_s3select_on_csv. Signed-off-by: galsalomon66 <gal.salomon@gmail.com>
Fixed a race condition in the Inotify class where the ev_loop() thread and caller threads (add_watch/remove_watch) were accessing the wd_callback_map and wd_remove_map hash maps without synchronization. This caused a segfault during hash table operations when one thread was reading from the map while another was modifying it, leading to iterator invalidation and memory corruption.
Backtrace from the crash:
Frame 5: file::listing::Inotify::ev_loop()+0x190
Frame 4: ankerl::unordered_dense::v3_1_0::detail::table::find()
Crash: Memory access violation during WatchRecord lookup
The fix adds:
- A mutex (map_mutex) to protect both hash maps
- Lock guards in add_watch() and remove_watch() during map modifications
- A lock guard in ev_loop() with proper copying of watch record data, to avoid holding the lock during callbacks and to prevent use-after-free
See https://jenkins.ceph.com/job/ceph-pull-requests/169774/testReport/junit/projectroot.src.test/rgw/unittest_rgw_posix_driver/
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
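The locking pattern described in this fix can be sketched as follows. This is a minimal sketch, not the actual `file::listing::Inotify` code: `WatchRegistry`, `WatchRecord`, `add()`, `remove()` and `dispatch()` are hypothetical names, a `std::unordered_map` stands in for the ankerl map, and the separate `wd_remove_map` is omitted. The point it shows is that the record is copied while the mutex is held and the callback runs only after the lock is released.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a watch record: the watched path and the
// user callback invoked when an event arrives for the watch descriptor.
struct WatchRecord {
  std::string path;
  std::function<void(const std::string&, uint32_t)> callback;
};

class WatchRegistry {
 public:
  // Called from caller threads (the role of add_watch()).
  void add(int wd, WatchRecord rec) {
    std::lock_guard<std::mutex> lock(map_mutex_);
    wd_callback_map_[wd] = std::move(rec);
  }

  // Called from caller threads (the role of remove_watch()).
  void remove(int wd) {
    std::lock_guard<std::mutex> lock(map_mutex_);
    wd_callback_map_.erase(wd);
  }

  // Called from the event-loop thread (the role of ev_loop()).
  // Returns false if the watch was already removed.
  bool dispatch(int wd, uint32_t mask) {
    std::optional<WatchRecord> rec;
    {
      // Hold the lock only for the lookup and the copy, so a concurrent
      // remove() cannot invalidate the entry while it is being read.
      std::lock_guard<std::mutex> lock(map_mutex_);
      auto it = wd_callback_map_.find(wd);
      if (it == wd_callback_map_.end()) {
        return false;
      }
      rec = it->second;  // copy, never a reference into the map
    }
    // The callback runs without the lock: it may call add()/remove()
    // without deadlocking and never touches map-owned memory.
    rec->callback(rec->path, mask);
    return true;
  }

 private:
  std::mutex map_mutex_;
  std::unordered_map<int, WatchRecord> wd_callback_map_;
};
```

Copying the record (including its `std::function`, which may allocate) on every event is the price paid for a callback that can safely re-enter the registry.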
…eanup
* In case of prefix per source, this would prevent leaking this object.
* In case of a shared prefix, it would prevent data loss when other source buckets try to commit an already committed temporary object.
* When updating the "last committed" attribute, the object must exist. This is so that a commit without rollover (in case of cleanup) won't recreate the deleted object.
* Some refactoring of try-catch code to have less nesting.
Fixes: https://tracker.ceph.com/issues/73675
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
The scrubber calls PG::_scan_rollback_obs() to clean up obsolete rollback objects. This function may queue a transaction to delete such objects. The commit modifies the scrubber, so that no rescheduling of the scrub is mandated if no transaction was queued. Fixes: https://tracker.ceph.com/issues/73773 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
qa/tasks: make the cephadm and vstart_runner tasks aware of watchdog
The ModeCollector class is used to collect values of some type 'key', each associated with some object identified by an 'ID'. The collector reports the 'mode' value: the value associated with the largest number of distinct IDs.
The results structure returned by the collector specifies one of three possible mode_status_t values:
- no_mode_value - no clear victory for any value
- mode_value - we have a winner, but it has less than half of the samples
- authorative_value - more than half of the samples are of the same value
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
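A minimal sketch of the mode computation described above. It assumes that each ID contributes one value (a repeated report for the same ID is ignored), that a tie for the top count means "no clear victory", and that a winner holding exactly half of the samples is reported as `mode_value`. `ModeCollectorSketch`, `ModeResult`, `insert()` and `find_mode()` are hypothetical names, not the Ceph ModeCollector interface; only the three `mode_status_t` values come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// The three statuses named in the commit message.
enum class mode_status_t { no_mode_value, mode_value, authorative_value };

template <typename Key>
struct ModeResult {
  mode_status_t status;
  std::optional<Key> key;  // empty when status == no_mode_value
  std::size_t count;       // distinct IDs that reported the top value
  std::size_t total;       // distinct IDs overall
};

template <typename ID, typename Key>
class ModeCollectorSketch {
 public:
  // Assumption: each ID counts once; emplace() ignores a repeated ID.
  void insert(const ID& id, const Key& key) { samples_.emplace(id, key); }

  ModeResult<Key> find_mode() const {
    std::map<Key, std::size_t> counts;
    for (const auto& kv : samples_) {
      ++counts[kv.second];
    }
    const std::size_t total = samples_.size();
    std::optional<Key> best;
    std::size_t best_count = 0;
    bool tie = false;
    for (const auto& [key, n] : counts) {
      if (n > best_count) {
        best = key;
        best_count = n;
        tie = false;
      } else if (n == best_count) {
        tie = true;  // another value is equally popular
      }
    }
    if (!best || tie) {
      return {mode_status_t::no_mode_value, std::nullopt, best_count, total};
    }
    // A strict majority of distinct IDs makes the value authoritative.
    const auto status = (2 * best_count > total)
                            ? mode_status_t::authorative_value
                            : mode_status_t::mode_value;
    return {status, best, best_count, total};
  }

 private:
  std::map<ID, Key> samples_;
};
```

Counting per distinct ID (rather than per raw sample) is what keeps a single chatty reporter from skewing the result.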
…nces mgr, mon, osdc: pass complex parameters by rvalue reference Reviewed-by: Adam Emerson <aemerson@redhat.com> Reviewed-By: J. Eric Ivancich <ivancich@redhat.com>
osd/scrub: scanning the rollbacks not mandating a reschedule Reviewed-by: Samuel Just <sjust@redhat.com>
The entire subsuite is pinned by the centos_latest.yaml symlink, so the stanza in memcheck.yaml is redundant. Removing it allows experimenting with other distros simply by varying the symlink target. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Co-authored-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
common: ModeCollector: locating the value of the mode Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
that were left out by mistake in the previous commit. (the previous commit: 3efcdbf) Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
The can_serve_replica_read() function is called by replica to determine whether there are any uncommitted writes. If such writes exist, then the system will reject the IO to avoid the risk of reading data from a write which may yet be rolled back. The same code is going to be useful for EC direct reads. The string_view code is not expensive. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
This was not necessary prior to direct reads, but is essential when the client needs to know which shard the read came from. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
This allows a backend to expose how an object offset/length translates to an offset/length on a particular shard. For Replica, this is trivial. For EC, this means looking up the start and end offsets, then translating this to shard address space. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
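To make the EC case concrete, here is a sketch under a deliberately simplified layout: `k` data shards, a fixed `chunk_size`, round-robin chunk placement, no parity shards and no shard remapping. `EcLayout`, `project()` and `to_shard_extent()` are hypothetical; Ceph's real translation goes through its own stripe-layout code. As the commit message describes, the start and end offsets are translated separately, and with `k == 1` the mapping degenerates to the identity, matching the trivial replica case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified EC layout: each stripe holds one chunk of
// chunk_size bytes per data shard, laid out round-robin.
struct EcLayout {
  uint64_t k;           // number of data shards
  uint64_t chunk_size;  // bytes per chunk
  uint64_t stripe_width() const { return k * chunk_size; }
};

struct ShardExtent {
  uint64_t offset;  // offset in the shard's address space
  uint64_t length;  // 0 if the object range does not touch this shard
};

// Number of bytes stored on `shard` at object offsets strictly below
// obj_off, i.e. the shard-local offset of the first byte at or after it.
inline uint64_t project(const EcLayout& l, uint64_t shard, uint64_t obj_off) {
  const uint64_t stripe = obj_off / l.stripe_width();
  const uint64_t within = obj_off % l.stripe_width();
  const uint64_t chunk = within / l.chunk_size;  // shard holding obj_off
  uint64_t in_stripe = 0;
  if (chunk == shard) {
    in_stripe = within % l.chunk_size;  // partway into this shard's chunk
  } else if (chunk > shard) {
    in_stripe = l.chunk_size;  // this shard's whole chunk precedes obj_off
  }  // chunk < shard: nothing of this stripe on `shard` precedes obj_off
  return stripe * l.chunk_size + in_stripe;
}

// Translate the object range [off, off + len) to the contiguous range it
// occupies on one data shard by projecting both ends independently.
inline ShardExtent to_shard_extent(const EcLayout& l, uint64_t shard,
                                   uint64_t off, uint64_t len) {
  const uint64_t begin = project(l, shard, off);
  const uint64_t end = project(l, shard, off + len);
  return {begin, end - begin};
}
```

For example, with `k = 2` and 4-byte chunks, object bytes 2..11 occupy shard-local bytes 2..7 on shard 0 and 0..3 on shard 1.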
Sparse reads for EC are simple to implement, as the code is essentially identical to that of replica, with some address translation. When doing a direct read in EC, by definition only a single OSD is involved. As such, we can do the more performant sync read rather than an async read. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
This function is necessary for balanced reads and as such is required for EC too. Rename the function to make sense, given this change of purpose, but the functionality does not change. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
For direct read failures, the locking is such that we cannot immediately send a new IO without deadlocking. This new interface allows an op to be sent as an asio post. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
This parameter is not used by the _calc_target code. It is being removed just to clean up the code, as we are making some changes to _calc_target in later stages of the split io PR. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
The functionality is not altered by this commit. In the future we want to post-process split-ios after recombining the read data. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
…r shard This will eventually be used by SplitIo to direct ops to the correct OSD. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
…it op When splitting ops, certain additional sub-ops (e.g. get xattr) can simply be passed through to the child op. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
This will provide the ability for Objecter to split up certain ops and distribute them to the OSDs directly if that provides a performance advantage. This is experimental code and is switched off unless the magic pool flags are enabled. These magic pool flags were pushed in an earlier commit in the same PR. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
…02511 doc: Fix Sphinx warning about theme option
…e-202511 doc/releases: Fix Sphinx warning in tentacle.rst
doc: Fix Sphinx warnings
…ephfs doc/cephfs: Small improvements in fscrypt.rst
cmake: disable WITH_BREAKPAD on power arch Reviewed-by: Kefu Chai <k.chai@proxmox.com>
debian: include rgw-gap-list manpage and rgw-policy-check in ceph-common Reviewed-by: J. Eric Ivancich <ivancich@redhat.com> Reviewed-by: Matan Breizman <mbreizma@ibm.com>
…tial_fix librbd: rbd_aio_write_with_crc32c store CRC32C with initial value -1 to match msgr2 validation Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
EC Direct Reads: First PR, background work Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com> Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Comment on lines +2448 to +2455
```cpp
int followables = mds_map.is_followable(rank);
if (followables < 2 ) {
  dout(1) << " setting mds." << info->global_id
          << " to follow mds rank " << rank << dendl;
  fsmap.assign_standby_replay(info->global_id, fs.get_fscid(), rank);
  do_propose = true;
  changed = true;
  break;
  //break;
```
Is it about having at least 2 standby-replay mds?
Author
Ideally this should be configurable, but for now this allows up to 2 MDS daemons in standby-replay mode.
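If the limit were made configurable as suggested, the check could look roughly like the sketch below. `pick_rank_to_follow()` and `max_standby_replay_per_rank` are hypothetical (no such option appears in this PR), and a plain map of per-rank follower counts stands in for the value the reviewed hunk obtains from `mds_map.is_followable(rank)`. Passing 2 corresponds to the hard-coded `followables < 2` check.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Hypothetical: return the first rank that still has room for another
// standby-replay follower. `followers` maps each active rank to the number
// of standby-replay daemons already following it.
inline std::optional<int> pick_rank_to_follow(
    const std::map<int, unsigned>& followers,
    unsigned max_standby_replay_per_rank) {
  for (const auto& [rank, count] : followers) {
    if (count < max_standby_replay_per_rank) {
      // The caller would then assign the daemon to this rank and propose.
      return rank;
    }
  }
  return std::nullopt;  // every rank already has enough followers
}
```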
sajibreadd-croit pushed a commit that referenced this pull request on Feb 11, 2026
…yed static"
```
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: AddressSanitizer:DEADLYSIGNAL
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: =================================================================
Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: ==3==ERROR: AddressSanitizer: stack-overflow on address 0x7b512f6c8dd8 (pc 0x0000046e7a72 bp 0x7b512de7c900 sp 0x7b512f6c8dd8 T0)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #0 0x0000046e7a72 in get_global_options() (/usr/bin/ceph-osd-crimson+0x46e7a72) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #1 0x0000046e540e in build_options() (/usr/bin/ceph-osd-crimson+0x46e540e) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #2 0x0000033b7949 in get_ceph_options() (/usr/bin/ceph-osd-crimson+0x33b7949) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a)
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #3 0x000003440540 in md_config_t::md_config_t(ConfigValues&, ConfigTracker const&, bool) (/usr/bin/ceph-osd-crimson+0x3440540) (BuildId: 2a860>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #4 0x0000046856a8 in crimson::common::ConfigProxy::ConfigProxy(EntityName const&, std::basic_string_view<char, std::char_traits<char> >) (/usr>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #5 0x000000eb6cb5 in seastar::shared_ptr_count_for<crimson::common::ConfigProxy>::shared_ptr_count_for<EntityName&, std::__cxx11::basic_string>
..
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #40 0x000000ed6434 in seastar::future<int> seastar::futurize<int>::apply<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}::ope>
Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: #41 0x000000ed672b in seastar::async<crimson::osd::_get_early_config(int, char const**)::{lambda()#1}>(seast>
```
This reverts commit 1ab0a8c.
Fixes: https://tracker.ceph.com/issues/74481
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
This allows up to 2 standby-replay daemons and permits CephFS scrubbing on a standby-replay MDS.
To fully benefit from this PR, one can use the 'ceph mds freeze' command to "freeze" a specific MDS. This keeps that MDS in standby-replay mode permanently while letting the other daemons cycle through their states as designed. Hence the file system keeps functioning properly with an additional standby-replay daemon, which in turn gains the ability to monitor the file system in parallel.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.
Checklist
Show available Jenkins commands
- jenkins test classic perf
- jenkins test crimson perf
- jenkins test signed
- jenkins test make check
- jenkins test make check arm64
- jenkins test submodules
- jenkins test dashboard
- jenkins test dashboard cephadm
- jenkins test api
- jenkins test docs
- jenkins test ceph-volume all
- jenkins test windows
- jenkins test rook e2e
You must only issue one Jenkins command per comment. Jenkins does not understand comments with more than one command.