Skip to content

Conversation

@nrwahl2
Copy link
Contributor

@nrwahl2 nrwahl2 commented Jan 3, 2026

And some other assorted changes that came up along the way. This is part of a series of based refactors -- see #4011.

@nrwahl2 nrwahl2 requested a review from clumens January 3, 2026 07:03
halib_PROGRAMS = pacemaker-based

noinst_HEADERS = based_io.h \
based_operation.h \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are tab-aligned. GitHub just isn't showing it correctly.

@nrwahl2 nrwahl2 force-pushed the nrwahl2-based_first branch from 3db2f3e to a78443c Compare January 4, 2026 02:11
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Jan 4, 2026

Fixed a few little things and added one commit on top to drop some log messages.

@clumens FYI, even though this is 50 commits, it should be a pretty quick PR to review.

@nrwahl2 nrwahl2 force-pushed the nrwahl2-based_first branch 2 times, most recently from 96036fa to 3e19008 Compare January 5, 2026 19:33
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Jan 5, 2026

The first 22 commits (at time of writing) are from #4000. I proactively rebased on those to resolve conflicts. No other changes, except for a commit to include action_relation_internal.h at the correct position.

/* A nonzero exit code will cause further writes to be disabled. Use _exit()
* because exit() could affect the parent adversely.
*
* @TODO How could it affect the parent adversely?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to 9be1f96accf:

9be1f96accf cib/io.c     (Andrew Beekhof    2011-09-02 09:52:25 +1000 652)         /* exit() could potentially affect the parent by closing things it shouldn't
9be1f96accf cib/io.c     (Andrew Beekhof    2011-09-02 09:52:25 +1000 653)          * Use _exit instead
9be1f96accf cib/io.c     (Andrew Beekhof    2011-09-02 09:52:25 +1000 654)          */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can clarify my comment, but I still don't see how _exit() is safer than exit(). From the exit(2) man page:

NOTES
       For a discussion on the effects of an exit, the transmission of exit 
status, zombie processes, signals sent, and so on, see exit(3).

       The function _exit() is like exit(3), but does not call any functions 
registered with atexit(3) or on_exit(3). Open stdio(3) streams are not flushed. 
On the other hand, _exit() does close open file descriptors, and this may cause 
an unknown delay, waiting for pending output to finish. If the delay is 
undesired, it may be useful to call functions like tcflush(3) before calling 
_exit().  Whether any pending I/O is canceled, and which pending I/O may be 
canceled upon _exit(), is implementation-dependent.

   C library/kernel differences
       The text above in DESCRIPTION describes the traditional effect of 
_exit(), which is to terminate a process, and these are the semantics specified 
by POSIX.1 and implemented by the C library wrapper function.  On modern 
systems, this means termination of all threads in the process.

       By contrast with the C library wrapper function, the raw Linux _exit() 
system call terminates only the calling thread, and actions such as reparenting 
child processes or sending SIGCHLD to the parent process are performed only if 
this is the last thread in the thread group.

       Up to glibc 2.3, the _exit() wrapper function invoked the kernel system 
call of the same name. Since glibc 2.3, the wrapper function invokes 
exit_group(2), in order to terminate all of the threads in a process.

Neither Pacemaker nor libqb calls atexit() or on_exit().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have been a little more clear. The code history gives us a reason why it could affect the parent, but doesn't explain what we're actually worried about could be closed. There are a handful of other places that call _exit but I imagine those are all just copy & paste or refactoring with not much thought given to why it's being used.

The only caller that gives a good reason is in lib/common/mock.c, but that's not going to be a concern here. I also imagine the caller in daemons/based/based_io.c would be pretty easy to test. In summary, I think we could probably just s/_exit/exit/, give it a cts-lab test, and hopefully get rid of these mysterious comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...give it a cts-lab test...

Do you mean give it a new test, or do you mean run cts-lab? If you mean a new test, what do you have in mind for this?

@clumens
Copy link
Contributor

clumens commented Jan 5, 2026

I didn't notice if this happened over time or not... after so much change to based_io.c, it would be worth running it through include-what-you-use and cleaning up the include block at the top.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
It was added in 2007 by commit f3ec7ec. No motivation was given, the
file doesn't get installed, and it appears that nothing has ever used
it.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
@nrwahl2 nrwahl2 force-pushed the nrwahl2-based_first branch from 3e19008 to 2893d98 Compare January 5, 2026 22:11
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Jan 5, 2026

Rebased on main, no other changes except "Include action_relation_internal.h at right spot" commit.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Jan 5, 2026

I didn't notice if this happened over time or not... after so much change to based_io.c, it would be worth running it through include-what-you-use and cleaning up the include block at the top.

Agreed. I need to set up IWYU. I've had a tab open for it for a few days.

nrwahl2 added 16 commits January 6, 2026 13:35
Also rename cib_writer to write_trigger and create a new based_io.h.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
No code changes aside from moving two function definitions to a position
above based_io_init() and dropping an extern declaration.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
No other code changes, except moving a variable declaration. The p
argument is always NULL, though that was not the case when this function
was created.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also, drop the return statement after calling _exit(), and rename
exit_rc to rc.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And drop comments that seem redundant.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
But not much.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
And add Doxygen, and call strerror(errno) since we know it's a system
errno and we don't have to worry about a Pacemaker return code.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
It's accessible via the mainloop_child_t object, now that we're exposing
its definition internally.

We could use the existing mainloop_child_pid() function. However,
nothing is using that yet, so I would rather deprecate it in the future.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also use bool instead of gboolean.

The --disk-writes command line option seems unnecessary, but we can deal
with that later. But see T227.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
nrwahl2 added 28 commits January 6, 2026 14:58
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This has nothing to do with reading the CIB from a file, so it should
not be part of based_read_cib().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
* Rename function to archive_unusable_file().
* Add Doxygen.
* Improve log messages.
* Rename new_fd variable to fd.
* Allow fd == 0. This should never happen in practice, but it's odd to
  check (fd < 0) for error and (fd > 0) for success when 0 is a valid
  file descriptor.
* Close fd sooner on success. mkstemp() is the safe and idiomatic way to
  generate a temp file name, because it avoids race conditions that can
  occur when generating a file name and then creating a file. However,
  we don't need the file to be open, and we don't need the fd except for
  the purpose of closing the file. By separating the (fd < 0) case from
  the (rename(...) < 0) case, we can be sure that (fd >= 0) when we call
  close().
* Rename old and new variables to old_file and new_file, respectively. I
  suspect that the use of "new" may be problematic when compiling as
  C++.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Previously, in case of a digest mismatch, we renamed the CIB file and
the digest file in cib_root to new unique names generated by mkstemp().
Without checking the error logs, the file names did not make it clear
that the two files were related, or which one was the CIB file and which
one was the digest file.

Now, we create a new subdirectory of cib_root using mkdtemp(). Then we
move both the CIB file and the digest file to the new subdirectory. They
keep the base names "cib.xml" and "cib.xml.sig".

It's not necessary for archive_on_digest_mismatch() to take arguments,
but this avoids the need to allocate redundant strings for the old
paths.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also add Doxygen and rename to backup_cib_filter().

The trace/debug messages don't seem especially useful, and we recently
got rid of similar ones that were commented out of schema_filter().

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also add Doxygen, rename to compare_backup_cibs(), and create a helper
function.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Note that the return value is guaranteed not to be NULL, because the
createEmptyCib() return value is guaranteed not to be NULL.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
As long as the_cib is a global variable, this function is unnecessary.
Also, it was too complicated -- it can be a one-liner, as shown here.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
There are two callers. The argument in startCib() is clearly non-NULL.

The argument in cib_process_command() is unsurprisingly less clear.
result_cib gets set by cib_perform_op(), and we call activateCibXml()
only if we're performing an operation that modifies the CIB, rc is
pcmk_rc_ok, and (result_cib != the_cib). result_cib should always be
non-NULL in that case. And if it ever is NULL, then this is something
I'd like to find out about via an assertion log.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Also include stddef.h for NULL, and rename cib_op_functions to
op_functions so that it doesn't look like a public libcib symbol.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
This is just some miscellaneous cleanup. For the most part, we use
xmlNode * instead of xmlNodePtr nowadays, because "const xmlNodePtr"
doesn't work. Also drop some unused includes and add some that were
missing in based_transaction.{c,h}.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
It's a simple wrapper around pcmk__xml_patchset_versions but requires
seven arguments.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
There's a message near the end of cib_process_request() that tells us
almost exactly the same things. The only thing that seems to be missing
is the pre-operation CIB versions. I don't think that's a big loss.

The only caller of based_diff_notify() is cib_process_command(). The
message in cib_process_request() gets logged shortly after the sole call
to cib_process_command(). So any time the based_diff_notify() message
would have been logged, the remaining one in cib_process_request() will
be too.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Using include-what-you-use.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Using include-what-you-use.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Using include-what-you-use.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Using include-what-you-use.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Using include-what-you-use.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Using include-what-you-use.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Using include-what-you-use.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
Using include-what-you-use.

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
@nrwahl2 nrwahl2 force-pushed the nrwahl2-based_first branch from 4299753 to 37e0c9e Compare January 6, 2026 22:58
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Jan 6, 2026

Updated:

  • Gave more detail in "TODO" within write_cib_async(), about _exit().
  • Added commits on top to typedef the mainloop child callback and to clean up includes in based.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants