Hi, thank you for releasing the HGM paper and code.
I have a clarification question about the final agent selection strategy. In Section 3.3, the paper defines the best-belief agent using the epsilon percentile of the utility posterior, and later states:
"In all experiments, we employ HGM with an exploration-exploitation scheduler B / b, where b is the remaining budget, epsilon = 1, and alpha = 0.6."
I am a bit confused about the interpretation of epsilon = 1.
If epsilon = 1 is interpreted as the 100th percentile of a Beta posterior, then that percentile is the upper end of the support, i.e. 1, for every agent regardless of its observed results, so it would not distinguish between different agents.
Did you intend epsilon = 1 to mean the 1st percentile of the posterior, i.e. epsilon = 0.01 in probability notation?
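To make the concern concrete, here is a minimal sketch of the two readings. It is not taken from the HGM code: the agent names and success/failure counts are made up, and I am assuming a uniform Beta(1, 1) prior so that each posterior is Beta(1 + successes, 1 + failures) with integer parameters. The helpers `beta_cdf` and `beta_quantile` are my own, written with the standard library only.

```python
from math import comb


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integer a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_quantile(q, a, b, iters=60):
    """Quantile of Beta(a, b) at probability q, found by bisection on the CDF."""
    if q <= 0.0:
        return 0.0  # lower end of the support
    if q >= 1.0:
        return 1.0  # upper end of the support, whatever a and b are
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical agents: (successes, failures). These numbers are illustrative only.
agents = {"agent_a": (8, 2), "agent_b": (3, 7)}


def posterior_quantiles(q):
    # Assumed uniform Beta(1, 1) prior, so the posterior is Beta(1 + s, 1 + f).
    return {name: beta_quantile(q, 1 + s, 1 + f) for name, (s, f) in agents.items()}


top = posterior_quantiles(1.0)   # reading 1: epsilon = 1 as the 100th percentile
low = posterior_quantiles(0.01)  # reading 2: epsilon = 1 as the 1st percentile

print("100th percentile:", top)
print("1st percentile:  ", low)
```

Under the first reading both agents get exactly the same value, so the selection would be arbitrary. Under the second reading the agent with the stronger record gets the higher value, which is what I would expect a best-belief criterion to do.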
Thanks for the clarification!