The formula on this line seems to have a problem:
sum_{k=1}^{K} target_prob_k * (logits_k - log sum_{k=1}^K exp(logits_k)) - const
= (sum_{k=1}^{K} target_prob_k * logits_k) - log sum_{k=1}^K exp(logits_k) - const
https://github.com/NVIDIA/NeMo-Aligner/blob/main/nemo_aligner/utils/distributed.py#L622
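
A quick numerical check of the identity quoted above may help pin down when it holds. This is only a sketch in plain Python, not the NeMo-Aligner code: the helper names (`logsumexp`, `lhs`, `rhs`) and the example values are made up for illustration, and `const` is left out because it appears on both sides. Expanding the left-hand side gives `sum_k target_prob_k * logits_k - (sum_k target_prob_k) * log sum_k exp(logits_k)`, so the two sides should agree only when the target probabilities sum to 1. I have not verified whether that is the case at the linked line.

```python
import math

def logsumexp(logits):
    # Numerically stable log(sum(exp(logits))).
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))

def lhs(target_probs, logits):
    # sum_k p_k * (z_k - logsumexp(z))
    lse = logsumexp(logits)
    return sum(p * (z - lse) for p, z in zip(target_probs, logits))

def rhs(target_probs, logits):
    # (sum_k p_k * z_k) - logsumexp(z)
    return sum(p * z for p, z in zip(target_probs, logits)) - logsumexp(logits)

# Hypothetical example values, chosen only for illustration.
logits = [2.0, 0.5, -1.0, 0.1]
normalized = [0.5, 0.3, 0.15, 0.05]   # sums to 1
unnormalized = [0.5, 0.3, 0.0, 0.0]   # sums to 0.8, e.g. a truncated distribution

gap_normalized = lhs(normalized, logits) - rhs(normalized, logits)
gap_unnormalized = lhs(unnormalized, logits) - rhs(unnormalized, logits)

print("gap with normalized targets:  ", gap_normalized)
print("gap with unnormalized targets:", gap_unnormalized)
print("(1 - sum p) * logsumexp:      ",
      (1 - sum(unnormalized)) * logsumexp(logits))
```

With normalized targets the gap is zero up to floating-point error. With targets that do not sum to 1, the two sides differ by `(1 - sum_k target_prob_k) * log sum_k exp(logits_k)`.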