Make osq runner responsive to registration updates#2007
Make osq runner responsive to registration updates#2007zackattack01 wants to merge 23 commits intomainfrom
Conversation
a1d2603 to
01e4063
Compare
01e4063 to
2d74749
Compare
pkg/osquery/runtime/runner.go
Outdated
|
|
||
| func (r *Runner) Run() error { | ||
| for { | ||
| // if our instances ever exit unexpectedly, return immediately |
There was a problem hiding this comment.
I think this comment comment isn't accurate? I think runRegisteredInstances only returns if a shutdown was requested. And it only returns an error if a) shutdown was requested and b) we were trying to restart one or more instances during that time and c) hadn't successfully restarted one of them yet.
Either way, why would we return the error here instead of checking rerunRequired first?
There was a problem hiding this comment.
that's a good point thank you. we'd probably want to rerun regardless if required, I will update that comment and get this fixed up!
…n reload is required
…unner_registration_ids
RebeccaMahany
left a comment
There was a problem hiding this comment.
This makes sense to me! Really nice.
| return nil | ||
| } | ||
|
|
||
| r.slogger.Log(context.TODO(), slog.LevelDebug, |
There was a problem hiding this comment.
Nit -- I think reasonable to log this at the info level (so that we ship this log to the cloud)
There was a problem hiding this comment.
oh good call, will do!
| } | ||
|
|
||
| func (r *Runner) Interrupt(_ error) { | ||
| if r.interrupted.Load() { |
There was a problem hiding this comment.
Do you think it'd be useful to also do r.rerunRequired.Store(false) here? I'm thinking about the case where UpdateRegistrationIDs is called and then Interrupt is called while the shutdown/restart is ongoing. Could maybe add a test case for this as well?
There was a problem hiding this comment.
ohh yeah that seems wise. will do both!
| opts []OsqueryInstanceOption // global options applying to all osquery instances | ||
| shutdown chan struct{} | ||
| shutdown chan struct{} // buffered shutdown channel for to enable shutting down to restart or exit | ||
| rerunRequired atomic.Bool |
There was a problem hiding this comment.
Maybe add a quick comment here? // if set, the runner will restart all instances after Shutdown is called, instead of exiting or similar
Co-authored-by: Rebecca Mahany-Horton <rebeccamahany@gmail.com>
|
|
||
| type Runner struct { | ||
| registrationIds []string // we expect to run one instance per registration ID | ||
| regIDLock sync.Mutex // locks access to registrationIds |
There was a problem hiding this comment.
I discovered that if we are locking for a single value, there is an atmoic.Value or atmoic.Pointer that can be used, like:
var registrationIds atomic.Value
registrationIds.Store([]string{"a", "b", "c"})
theIds := registrationIds.Load().([]string)or
var registrationIds atomic.Pointer[[]string]
registrationIds.Store(&[]string{"a", "b", "c"})
theIds := registrationIds.Load()the internets generally say that atmoics are faster ... but I doubt that it makes any noticeable difference for us, also pointer to an array feels weird and I'm not sure of the implications
feel free to not act on this, I don't know if it's any better
| for range r.instances { | ||
| r.shutdown <- struct{}{} | ||
| } |
There was a problem hiding this comment.
what about giving each instance it's own shutdown channel to avoid the buffered channel?
| for range r.instances { | |
| r.shutdown <- struct{}{} | |
| } | |
| for instance := range r.instances { | |
| close(instance.shutdown) | |
| } |
There was a problem hiding this comment.
oh yeah i'm gonna give this a try, ty! I'm actually seeing some interesting behavior while trying to write a test for Interrupt during restart per Becca's comment here and I wonder if this wouldn't clear things up a little. the behavior I'm seeing in the test is that the interrupt can end up ignored if timed correctly and I think It's related to the buffered channel use so far
OsqRunnerinterface and which encompasses the previousInstanceQuerierinterface and adds our newRegistrationChangeHandlerrequirementsUpdateRegistrationIDsmethod to the runner which will detect any changes and restart the instances accordinglyTo allow for a graceful restart of all known instances after registration IDs has been updated, a few tricky changes were required that I'd like more eyes on. context:
Shutdownmethod, we currently close the shutdown channel to signal our shutdown intent. This works well but isn't compatible with needing to read from that channel again when the runner would stay up (e.g. updated registration ID restarts). resetting the channel did not feel correct and introduces data races due to the way we need to thread/use wait groups hereinstance.Exited()above that)r.shutdownto hold that message- the reason this worked as expected withclose(r.shutdown)is because that channel will read a zero value and fire immediately whenever it is read next, even if the select wasn't open before the close was calledIf anyone has alternate suggestions please let me know!