Conversation
jcollins-axway
left a comment
There was a problem hiding this comment.
Looks good from a high level, needs pounded testing wise
|
copilots review Code Review: APIGOV-31191 vs mainSummary
Findings
Positives
Recommendations
|
|
Not a bad review at all |
|
caching changes require an extra amount of testing as its usually to blame for our duplicate service issues |
|
Give me time to look at this closer |
| } | ||
|
|
||
| func (es *EventSync) RebuildCache() { | ||
| // SDB - NOTE : Do we need to pause jobs. |
There was a problem hiding this comment.
Dang! @vivekschauhan @jcollins-axway I put this note in a very long time ago ;).
I was looking at another race condition down stream, but decided to see who called RebuildCache().
So RebuildCache is reachable by WithOnClientStop (poller) and With EventSyncError (stream) while the event listener go routine is alive, well technically alive.
In those paths the race is meh, maybe not a big deal because event flow has stopped like a harvester error or when the stream is not yet producing. But PublishingLock still doesn't coordinate with the listener. Maybe we should take a look at this again?
There was a problem hiding this comment.
lets put something on the backlog to take a look @sbolosan
There was a problem hiding this comment.
should we eat the error?
At least log the error/warn and then if requirement is to rebuild, then rebuild?
There was a problem hiding this comment.
maybe something like
timeToRebuild, err := a.shouldRebuildCache()
if err != nil {
a.logger.WithError(err).Warn("unable to determine cache rebuild state, triggering rebuild")
timeToRebuild = true
}
if timeToRebuild && a.rebuildCache != nil {
a.rebuildCache.RebuildCache()
}
There was a problem hiding this comment.
I think we can skip the error logging outside shouldRebuildCache since we already log it inside the method. Returning true on error inside the method would not require checking the return error and explicitly putting timeToRebuild on true
There was a problem hiding this comment.
I know this is old and not part of your changes, but can we do a safe guard here
strVal, ok := value.(string)
if !ok {
logger.Warn("cacheUpdateTime is not a string, triggering rebuild")
return true, nil
}
There was a problem hiding this comment.
Do we need this defensive code since in pkg/agent/evensync.go, line 162 that value is put in agent details and will always be a string? I don't think there are permissions to alter the agent details subresource from cli and I don't think anyone would do that, right?
There was a problem hiding this comment.
Technically you're right. But one extra ok check costs nothing and eliminates an entire class of runtime panic, no? I'm just saying x-agent-details is a map deserialized from JSON and JSON deseralization into interface{} has no type gurantees. It'll panic if future code changes forgets to stringify it. Up to you.
| kinds[management.ManagedApplicationGVK().Kind] = struct{}{} | ||
| kinds[management.AccessRequestGVK().Kind] = struct{}{} | ||
| case config.ComplianceAgent: | ||
| kinds[management.ComplianceRuntimeResultGVK().Kind] = struct{}{} |
There was a problem hiding this comment.
any way to use the watch topic that we create to determine these types?
There was a problem hiding this comment.
the watch topic includes kinds that are not persisted in cache (subscribed for event processing only), so deriving the validated set purely from the watch topic would cause false-positive validation failures for those kinds. The per-agent-type whitelist is intentional.
There was a problem hiding this comment.
The agent creates the watch topic on start, the logic that creates that has all the kinds, that is what I refer to. Not necessarily pulling the watch topic itself.
Cache validation on start up and reconnect