
Conversation

@roystgnr (Member)

This should accommodate users who need to exit suddenly (without manually unwinding a stack) and quietly (without distracting from their own exit output) while still cleaning up (so they don't get segfaults when dangling references to our custom output buffers interact with iostreams cleanup later).

We don't currently have libMesh infrastructure for testing output of
lots of executable runs, so this will have to do until we get to MOOSE
CI.
@loganharbour (Member) left a comment


This looks good to me


void libmesh_terminate_handler()
{
  bool print_debug_info = true;
@loganharbour (Member)


Suggested change:
- bool print_debug_info = true;
+ bool print_libmesh_info = true;

I don't know the best name here, but this feels a little better to me. We're still printing the std_ex.what(), so it feels like we're still printing some debug info. And this boolean also controls more things than strictly debug info, such as perf log output. Anyway, this isn't a big deal.

@roystgnr (Member, Author)


We're actually not printing std_ex.what() if that gets set to off: TerminationException doesn't inherit from std::exception and doesn't even implement what().

"debug info" is a kind of bad name for something that includes the perf log, though, yeah. If this was a public-facing identifier or if this wasn't blocking a huge update I'd change it, but now that Logan reports it (plus a tiny MOOSE patch to use it) works for his segfault issues, I want to get this merged ASAP.

@jwpeterson (Member)

If I understand correctly, the idea here is to create an unlikely-to-be-caught kind of exception so that the terminate handler is reached. But isn't it still implementation-defined whether the stack is unwound or not for an uncaught exception? I'm not against the change; just curious whether there's any way to guarantee the stack won't be unwound.

@moosebuild

Job Coverage, step Generate coverage on 1693412 wanted to post the following:

Coverage

         b252b8   #4369 169341
         Total    Total         +/-      New
Rate     65.31%   65.31%        -0.01%   0.00%
Hits     77568    77571         +3       0
Misses   41196    41208         +12      20

Diff coverage report

Full coverage report

Warnings

  • New line coverage rate 0.00% is less than the suggested 90.0%

This comment will be updated on new commits.

@roystgnr (Member, Author)

It is implementation-defined whether the stack will be unwound before an uncaught exception hits the terminate handler. In the case of unwinding ... damn, that's actually a problem (albeit a theoretical and pre-existing problem), isn't it? We'll hit all the FooFinalize() APIs instead of getting to MPI_Abort(), and if we only threw the uncaught exception on a proper subset of ranks then we'll hang in MPI_Finalize() on those ranks while we're in MPI_Whatever() on the rest.

Maybe we really just need to expose the terminate handler (or rather a terminate function with parameters that the handler function can call with default values)? But even that might not work! One process calls MPI_Abort() first, then the rest get killed by the MPI implementation, somehow (it's not specified by the MPI standard!). So even if the process which called MPI_Abort() does its cleanup first, we may still end up with other processes hitting iostreams cleanup (via static destructors or some C++ magic) without hitting our cleanup_stream_buffers() first, at which point those ranks could segfault.

Maybe we could put critical cleanup into an MPI error handler callback? Hand that to MPI_Comm_set_errhandler() and then MPI_Abort() is supposed to call our handler, I think on every process.

@roystgnr (Member, Author)

> Hand that to MPI_Comm_set_errhandler() and then MPI_Abort() is supposed to call our handler, I think on every process.

No, I think I'm misunderstanding the docs. That error handler isn't for when an abort hits you, it's for when any MPI function associated with that communicator (including MPI_Abort) is about to return an error.

So I'm again out of ideas. OpenMPI says it aborts processes via SIGTERM, so we could set a signal handler for that. With MPICH it looks like it might also be SIGTERM, but it depends on the process manager.

Maybe what we ought to be doing here is using dumb pointers to manage our _foo_syncd_thread_buffer objects? If what's going wrong is that we're hitting global destructors but not hitting cleanup, then the unique_ptrs we use are deleting those buffers without disconnecting them from the output streams first, which of course makes any subsequent output stream usage undefined behavior: a dangling pointer dereference. But if we were using dumb pointers and manually deleting their targets after swapping them out, then we'd only ever delete after swapping. We'd now have a memory leak we might have to tell valgrind to ignore, but better that than a segfault.
