Skip to content

Cache update#6

Open
mlech-reef wants to merge 4 commits into
masterfrom
cache-upd
Open

Cache update#6
mlech-reef wants to merge 4 commits into
masterfrom
cache-upd

Conversation

@mlech-reef
Copy link
Copy Markdown
Contributor

  • Default cache duration set to 12 minutes
  • Removed 5 min upper limit for clean interval
  • Try to get pubkey from pylon in case of the cached one doesn't match

Comment thread internal/auth/auth.go

// loadExpectedPublicKey fetches expected ed25519 public key for given organization (hotkey) from Pylon
// Uses cache to avoid repeated requests for the same hotkey
// Note: orgName should already be sanitized by the caller
Copy link
Copy Markdown

@jakub-zytka-reef jakub-zytka-reef Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could enforce this instead of adding a comment. (Yes, I'm a type system junkie. No, I'm not well versed in golang, so please bear with me...)

Let the orgName be of type SanitizedOrgName, and the only producer of said type be sanitizeOrgName
Or actually, the type constructor should sanitize and that's all.

WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO I don't want to create another abstraction and custom type for something that is just a string and is purely internal in auth.go and not publicly available outside. Smells like Java/OOP

However, to address your concern and still satisfy procedural go code, I can just move sanitization to the extractOrganizationName and the problem is solved

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm eager to learn.

Please tell me, what in type SanitzedOrgName string smells you of Java, and how does this construct relates to OOP.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not, other than it is really redundant IMHO. I was more referring to creating another struct with a new New method, just to be able to sanitize it there

ListenAddr: ":8080", // default HTTP port
PylonEndpoint: "http://pylon:8000", // default Pylon endpoint
CacheDurationMins: 15, // default cache duration in minutes
CacheDurationMins: 12, // default cache duration in minutes
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make a constant.
it's easy to grep the constant name. not necessarily so with a specific value

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no struct constant in go. Moving every single value to the global scope will not make it easier to grep, but it produces more redundant code. I am not sure if I understood your comment, but I can do one of below if you want :D Doesn't really matter imo

You want something line this:
image

... or this:
image

?

BTW. Te current state is very common pattern in go.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you wrote a test in which you have checks like:

if config.CacheDurationMins != 12 {

and there are multiple comparisons like that.
Some of them even explain the semantics of the line:

// Should fall back to default
	if config.CacheDurationMins != 12 {

So what are you testing in these tests?
If you are testing that the default value is 12 then such test is useless. Their only purpose is to make the next person change multiple places instead of one when they will be changing the default value.
So such tests should be removed.

Then, if you are testing that in some circumstances the default config should be used, then you should be testing that the default value is used, not 12.
Instead of that you hardcoded 12 and added a comment.
Am I supposed to grep for 12 and find out where does it come from?

Just make const DefaultCacheDurationMins 12 somewhere by the config and use it there...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. I am testing if the default value is 12. This is only thing I care - it must be 12, period.

If you start asserting the code that you test towards other part of the code you test - you test nothing. This is violation of any good testing practices. The test code must be the source of truth. You can not convince me otherwise. If you refer to interaction-based assertion, which looks like you do, for simple values it doesn't make any sense, as you could set 12 in two places and your test is still passing - you can not verify an interaction between DefaultCacheDurationMins and CacheDurationMins like in case of methods/functions.

If someone come and change that 12 to something else, the test will fail and he/she will think twice updating the test. Nothing needs to be searched, but if you really want, you can grep for CacheDurationMins\: not 12. I don't understand what is the difference between greping for CacheDurationMins\: and greping for DefaultCacheDurationMins\: in terms of DX.

My question is still valid. I can move the values to the global constants like in (1), but usually you wrap it in one

const (
	...
)

however it is not required by the gofmt, so I can do it your way. But I am not going to assert towards DefaultCacheDurationMins. If I add it, I will actually assert both towards 12.

Comment thread internal/auth/auth.go
if cachedKey, found := a.cache.Get(sanitizedOrgName); found {
log.Printf("Cache hit for organization '%s'", sanitizedOrgName)
expectedPub = cachedKey

Copy link
Copy Markdown

@jakub-zytka-reef jakub-zytka-reef Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this code difficult to follow.

I think this would be better:

if cachedKey, found := a.cache.Get(sanitizedOrgName); found {
		log cache hit
		if err := a.validatePublicKey(cert, cachedKey); err == nil {
		   log auth granted
		   w.writeOK("Access granted ...)
		   return
		} else {
           log invalid cache
           a.cache.Invalidate(sanitizedOrgName)
        }
}

expectedPub, err := a.loadExpectedPublicKey(sanitizedOrgName)
if err != nil {
				log.Printf("Failed to load expected public key for organization '%s': %v", sanitizedOrgName, err)
				a.writeForbidden(w, "Access denied: Organization not authorized")
				return
}

// Store fresh key in cache
a.cache.Set(sanitizedOrgName, expectedPub)

if err := a.validatePublicKey(cert, expectedPub); err != nil {
				log.Printf("Certificate validation failed for organization '%s': %v", orgName, err)
				a.writeForbidden(w, "Access denied: Certificate validation failed")
				return
} else {
    log.Printf("Certificate validation successful for organization '%s'", orgName)
	a.writeOK(w, "Access granted")
}

notice how much shorter the code is, and how there's almost no nested ifs. Also the context doesn't jump around - in the original code when the cached key is correctly validated you first need to jump 20 lines from if err := a.validatePublicKey(cert, cachedKey); err != nil to the end of the if, then ~25 lines again down to see what happens when you leave the big if
You loose some context in the log lines (but not in the logs in their entirety), but get much simpler code. I take this tradeoff 10 out of 10 times.

Btw, there is tight coupling of auth logic with response writing, that manifests by

a.writeForbidden(...)
return

were the code written like

return CERTIFICATE_VALIDATION_FAILED

and the caller would use the writer to actually write the decision you could test each aspect separately, and the code wouldn't need much changes if there was another writer used in the future. Now, the logic of deciding whether a cert is good relies on http.ResponseWriter. There is no need for that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kindly disagree.

Current code is much more readable:

  1. Happy path only ends in one place - at the end of the file. In nested places, you return only in case of unhappy path
  2. You have clear separation of two main paths: (a) cache hit in if, (b) cache miss in else. So quick look at the code and you immediately know what is going on. I don't mind have some code duplication, if I can have better logs (maybe it is devops inside me)
  3. This is not large if...else. Most of the lines are error handling via values, which is go thing and you simply must get used to that

I am happy to discuss it in more details ;)

Copy link
Copy Markdown

@jakub-zytka-reef jakub-zytka-reef Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy path only ends in one place - at the end of the file. In nested places, you return only in case of unhappy path

This is one of the reasons why the code is needlessly complex.
There isn't one happy path. There are multiple:

  1. cache hit, certificate valid
  2. cache hit, certificate invalid, fresh key valid
  3. cache miss, fresh key valid

They were made convoluted so that they end up together in one line.

So let's consider the complexity of happy paths. I want to check if we don't OK in some case that we shouldn't. That's a reasonable ask, isn't it?

Since I read the code from top to bottom, I don't know you made the decision to return OK as the very last thing in the function.

And so to achieve my goal I need to go through the whole 50 lines of code back and forth and check every set of possible conditions to see what is happening there.

So it goes like this:
read lines 178-181. I need to push "cache was hit" to the context.
then I read 182-184. I need to push "public key not validated" to the context, to make it "cache was hit, public key not validated"
then I read 188-202. I push and pop from the context because there are ifs.
I need to remember that if a set of 3 conditions evaluates to some specific values the access will be forbidden.
In 203/204 I need to scan back to 178 to check what the closing bracket are about. I could pop from the context stack, but by that time I read 20 lines of code, made 4 operations on the context, and have to remember 2 specific cases in which 3 ifs evaluate to specific values, and that then the access is forbidden. That's a lot, so I don't want to make a mistake. I scan back, refresh my memory on the context.
So far I scanned ~50 lines of code already.

in 204 there's an else. I scan down, because for the time being I don't care for happens in the other branch. I could go there, but then the amount of information that I need to remember grows even larger.
So I scan to 226 to learn that now the access is granted.
This means that the if's I read earlier must exhaust all the cases where the access should be forbidden. Does it? I need to scan back to 183-202 part and find the answer to a different question than before:
"do, under the conditions from line 178 and 182 conditions from 189 and 198 exhaust all the possibilities where we should deny access".

I've scanned the same code multiple times and I haven't even touched the else block.

Let's compare that to my proposal:
it has a flat structure, each part is independent. There is no reason to scan back in any place.
The first part:

if cachedKey, found := a.cache.Get(sanitizedOrgName); found {
		log cache hit
		if err := a.validatePublicKey(cert, cachedKey); err == nil {
		   log auth granted
		   w.writeOK("Access granted ...)
		   return
		} else {
           log invalid cache
           a.cache.Invalidate(sanitizedOrgName)
        }
}

has ~10 lines and simple semantics:
if there's a correct key in cache - allow access; if not the cache will not have a key.

After reading that part and deeming that semantics is satisfied you can forget about it's contents entirely. The ONLY thing you need to remember is that "if the control progressed after the block then there is no key in the cache"; all scenarios with the key in the cache have been dealt with (correctly).
So far we dealt with "cache hit" scenario. There were ~10 lines and 2 ifs in total.

The second part is also dead simple - load the key to the cache. If it cannot be loaded forbid access:

expectedPub, err := a.loadExpectedPublicKey(sanitizedOrgName)
if err != nil {
				log.Printf("Failed to load expected public key for organization '%s': %v", sanitizedOrgName, err)
				a.writeForbidden(w, "Access denied: Organization not authorized")
				return
}

// Store fresh key in cache
a.cache.Set(sanitizedOrgName, expectedPub)

Again, once you are satisfied this semantics is achieved you can forget about the details of that part. With 1 if you reach a point in code where "there is a fresh public key in the cache". All other scenarios have been dealt with correctly, so no need to concern yourself with that.

And so there's the final part:

if err := a.validatePublicKey(cert, expectedPub); err != nil {
				log.Printf("Certificate validation failed for organization '%s': %v", orgName, err)
				a.writeForbidden(w, "Access denied: Certificate validation failed")
				return
} else {
    log.Printf("Certificate validation successful for organization '%s'", orgName)
	a.writeOK(w, "Access granted")
}

Again, it's dead simple and you can tell the "fresh key in the cache" is handled correctly.
There were 4 ifs in total, the max nesting was 2. You never had to have more than ~10 lines of code in the context + 1 line uncoditional state description. Regardless of whether you were looking for happy paths or negative paths you had to do the same amount of work or the order of function length.

In your solution there are 6 ifs, with max nesting 3. It is necessary to keep much more in the context (say, ~20 lines + scanning over lines that are not interesting), keeping an intricate state descriptions like the cache was hit, the key was invalid, it was unable to load key or it was possible to load the key but they key was not valid.

I encourage you to measure code complexity of the proposed solutions using existing tools...

You have clear separation of two main paths: (a) cache hit in if, (b) cache miss in else. So quick look at the code and you immediately know what is going on. I don't mind have some code duplication, if I can have better logs (maybe it is devops inside me)

This forces you have a state that is a cross product of different conditions, making the code more complex and more difficult to test, as the conditions are independent (e.g. the key is valid or not regardless of if it was loaded from cache or subtensor).
You can have equally good logs in my solution, but since it's linear you might need to read more than 1 line of log. This is a very good tradeoff, because simpler code means less bugs means less log reading in the first place. Reading 2 lines instead of 1 is near zero cost once you actually start reading the logs.

This is not large if...else.

It is needlessly larger than it needs to be by every metric I know of.
That said, I'm open to any new perspective.

Most of the lines are error handling via values, which is go thing and you simply must get used to that

this is completely irrelevant.
FWIW, you handled errors by writing to a http response rather than by returning values, which introduces unnecessary coupling to http, but OK, let's keep the changes to minimum and let's hope there won't be any other use for the auth mechanism and so we won't need to refactor that in the future.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't want to delve into this discussion. Clearly, we have different opinion. Just 2 things I want to share.

  1. More code != more complexity. And I am not talking about cyclomatic complexity, like mccabe which is mostly useless. I m talking about developer experience. As a experienced developer (I think), I like my code better than yours. I would prefer to read my code than yours. I am more than sure, I wold understand my code faster than yours. You haven't gave me any solid arguments that I should change my mind, other than some subjective opinions. So even if prefer your "style", I would probably keep this one as the logging is better - which is a way way more important IMHO than some theoretical divagation about how developers read the code.
  2. Coupling to http is obvious as this is http handler with predefined (by the go standard library) function signature. Abstracting prematurely is a cancer caused by the few influencing books (Clean Code and others). Maybe it works for Java, I don't know.

So please close this PR or like described in the handbook:

If there is a change that is up to taste, the reviewer should refrain from suggesting it. If there is no concrete gain from the change (and maybe others prefer the way of the author, so we don't really know which is better), then it's better not to spend the energy on a discussion and just accept the code as is. We call it “the author's preference”.

The author of the PR must take all review comments into consideration, but as a senior engineer, they are expected to make a decision on how to address them.

let's merge it if you don't see any bugs and you can change it in any way you want after that, signing the commits with your name ;)

No hard feelings from my side. I love heat discussions about coding :) Hopefully you feel the same way,

Copy link
Copy Markdown

@jakub-zytka-reef jakub-zytka-reef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider the comments I left ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants