Skip to content

Fix potential nullptr-dereference in perfdata writer stats functions#10844

Open
jschmidt-icinga wants to merge 1 commit into
masterfrom
perfdata-writer-fix-stats-func
Open

Fix potential nullptr-dereference in perfdata writer stats functions#10844
jschmidt-icinga wants to merge 1 commit into
masterfrom
perfdata-writer-fix-stats-func

Conversation

@jschmidt-icinga
Copy link
Copy Markdown
Contributor

@jschmidt-icinga jschmidt-icinga commented May 12, 2026

This fixes a crash in GelfWriter, GraphiteWriter and OpenTsdbWriter that occurs when the writer's stats functions are called, for example by the icinga check plugin, before the writer has been initialized.

Fixes #10842.

@cla-bot cla-bot Bot added the cla/signed label May 12, 2026
@jschmidt-icinga jschmidt-icinga added bug Something isn't working and removed cla/signed labels May 12, 2026
@jschmidt-icinga jschmidt-icinga added this to the 2.16.1 milestone May 12, 2026
@jschmidt-icinga jschmidt-icinga added the backport-to-support/2.16 PRs with this label will automatically be backported to the v2.16 support branch. label May 12, 2026
Comment thread lib/perfdata/gelfwriter.cpp Outdated
Comment on lines +49 to +52
for (const GelfWriter::Ptr& gelfwriter : ConfigType::GetObjectsByType<GelfWriter>()) {
size_t workQueueItems = gelfwriter->m_WorkQueue.GetLength();
double workQueueItemRate = gelfwriter->m_WorkQueue.GetTaskCount(60) / 60.0;
auto connection = gelfwriter->m_Connection;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it even safe to access that member in that way here? If that pointer can chance concurrently, there should probably be some locking protecting the access.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the queries to be guarded with !obj->IsPaused() instead. I think this should be safe because once the object is no longer paused, it will always have a connection object.

But it occurs to me that this might need an additional ObjectLock, since in the unlikely event that the thread that gathers the stats is preempted after IsPaused(), but before m_Connection is dereferenced, and we then have an entire Pause/Resume cycle, there could still be a race.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the queries to be guarded with !obj->IsPaused() instead. I think this should be safe because once the object is no longer paused, it will always have a connection object.

Doesn't this rely on how the current implementation only initializes m_Connection in Resume(), but doesn't reset it in Pause()? That's somewhat inconsistent and doesn't sound like something I'd want to rely on (it's just an "remove inconsistency" fix away from bringing back an issue).

The PerfdataWriterConnection constructor doesn't already establish a connection by itself, does it? So one option could be to initialize the connection earlier, so that it is guaranteed to exist without any additional check (though that would need another look to ensure that there really isn't yet another race condition).

In Icinga DB, there's a pretty similar situation. Have a look at how it's done there (short version: it stores two copies of the pointer, one for fast internal access without explicit locking (Icinga DB makes sure that version is only ever accessed from one thread/strand) and one for external access with locking):

RedisConnection::Ptr m_Rcon;
// m_RconLocked contains a copy of the value in m_Rcon where all accesses are guarded by a mutex to
// allow safe concurrent access like from the icingadb check command. It's a copy to still allow fast access
// without additional synchronization to m_Rcon within the IcingaDB feature itself.
Locked<RedisConnection::Ptr> m_RconLocked;

@julianbrost julianbrost added the core/crash Shouldn't happen, requires attention label May 12, 2026
@jschmidt-icinga jschmidt-icinga force-pushed the perfdata-writer-fix-stats-func branch from 6a01308 to d010b58 Compare May 13, 2026 14:13
@cla-bot cla-bot Bot added the cla/signed label May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-to-support/2.16 PRs with this label will automatically be backported to the v2.16 support branch. bug Something isn't working cla/signed core/crash Shouldn't happen, requires attention

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2.16.0 icinga2 constantly crashes on secondary master with enabled graphite feature

2 participants