Percentage of correct segments is currently computed by also checking the overlap between predicted and annotated silence durations between words. However, the Mauch and Jamendo datasets have no annotated word offsets, so checking silences makes no sense for them.
As a workaround I had to convert my predictions so that each word's offset equals the onset time of the next word, which makes the metric come out correctly in this case, but it feels wrong to throw away the model's predicted offsets just for this (see the sketch below).
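
For reference, this is roughly the conversion I'm doing now. It's a minimal sketch assuming predictions are a list of `(word, onset, offset)` tuples; the function name and data layout are illustrative, not from the actual code:

```python
def fill_offsets_with_next_onset(predictions):
    """Replace each word's predicted offset with the next word's onset,
    so no silence segments remain between consecutive words."""
    adjusted = []
    for i, (word, onset, offset) in enumerate(predictions):
        if i + 1 < len(predictions):
            # Drop the model's offset and extend the word to the next onset.
            next_onset = predictions[i + 1][1]
            adjusted.append((word, onset, next_onset))
        else:
            # Keep the model's predicted offset for the final word.
            adjusted.append((word, onset, offset))
    return adjusted
```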
It would be better to have a flag in the evaluation code that optionally enables or disables the silence-segment check, so you don't have to modify the predictions you feed in. Something along the lines of the sketch below.
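
Here is a simplified, illustrative sketch of how such a flag could be wired in. The function name, signature, and the overlap computation are assumptions for the sake of the example, not the repository's actual implementation:

```python
def percentage_of_correct_segments(pred, ref, check_silence=True):
    """Total overlap duration between predicted and reference segments,
    divided by the total reference duration (simplified sketch).

    pred, ref: ordered lists of (onset, offset) word intervals.
    check_silence: when False (e.g. for Mauch/Jamendo, where word offsets
        are not annotated), the silence gaps between words are skipped.
    """
    def with_gaps(words, include_silence):
        segs = []
        for i, seg in enumerate(words):
            segs.append(seg)
            if include_silence and i + 1 < len(words):
                # Gap between this word's offset and the next word's onset.
                segs.append((seg[1], words[i + 1][0]))
        return segs

    pred_segs = with_gaps(pred, check_silence)
    ref_segs = with_gaps(ref, check_silence)
    total = sum(b - a for a, b in ref_segs)
    overlap = sum(
        max(0.0, min(p[1], r[1]) - max(p[0], r[0]))
        for p, r in zip(pred_segs, ref_segs)
    )
    return overlap / total if total > 0 else 0.0
```

With `check_silence=False`, the predicted offsets can be passed through unchanged for Mauch and Jamendo, and the silence segments simply never enter the computation.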