Bug:
Running the Remote Homology task with a new tokenizer crashes at Epoch 0 with CUDA error: device-side assert triggered (label index out of bounds).
Reason:
In src/dataset/base.py, the cache_all_tokenized() method calls additional_label_filtering_for_TAPE_homo() when loading from cache, but not when creating a new cache. This causes unfiltered labels (0-1195) to be used instead of filtered labels (0-44).
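The asymmetry between the two cache paths can be reproduced in a small, self-contained sketch. Everything here is hypothetical: ToyDataset, filter_labels(), cache_all(), and the keep set are stand-ins for illustration, not the code in src/dataset/base.py, and the sketch assumes the filter drops unknown labels and remaps the rest to a dense range.

```python
import os
import pickle
import tempfile


class ToyDataset:
    """Hypothetical stand-in for the dataset class (not the real implementation)."""

    def __init__(self, raw_labels, keep):
        self.raw_labels = raw_labels
        self.keep = keep  # label ids that survive filtering (assumed behaviour)
        self.data = None

    def filter_labels(self):
        # Stand-in for additional_label_filtering_for_TAPE_homo():
        # drop labels not in `keep` and remap the rest to 0..len(keep)-1.
        remap = {old: new for new, old in enumerate(sorted(self.keep))}
        self.data = [remap[y] for y in self.data if y in remap]

    def cache_all(self, cache_file, fix=False):
        if os.path.exists(cache_file):
            with open(cache_file, "rb") as f:
                self.data = pickle.load(f)
            self.filter_labels()  # filter is applied on the load path
        else:
            self.data = list(self.raw_labels)
            with open(cache_file, "wb") as f:
                pickle.dump(self.data, f)
            if fix:
                self.filter_labels()  # the call that is missing on the create path


def run(fix):
    """Return the labels seen by the first run (creates cache) and the second (loads it)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.pkl")
        first = ToyDataset([0, 7, 1195, 3], keep={0, 3, 7})
        first.cache_all(path, fix=fix)
        second = ToyDataset([0, 7, 1195, 3], keep={0, 3, 7})
        second.cache_all(path, fix=fix)
        return first.data, second.data
```

Without the fix, the first run keeps the raw label 1195 while the second run sees only remapped labels, which matches the observation that re-running the script avoids the crash.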
Solution:
Add the filter call after creating the cache in cache_all_tokenized():
    if flag:
        torch.save(self.data, cache_file_name)
        self.additional_label_filtering_for_TAPE_homo(tokenizer_name)  # Add this line
Workaround: Re-run the script. The second run loads from cache and applies the filter correctly.
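Because a device-side assert hides which label was out of range, a cheap check on the CPU before training can surface the problem with a readable message. The sketch below is only an illustration: check_label_range() is a hypothetical helper, and num_classes=45 is inferred from the filtered label range (0-44) stated above.

```python
def check_label_range(labels, num_classes):
    """Raise a readable error if any label falls outside [0, num_classes)."""
    bad = sorted({y for y in labels if not 0 <= y < num_classes})
    if bad:
        raise ValueError(
            f"{len(bad)} distinct label(s) outside [0, {num_classes}): "
            f"min={bad[0]}, max={bad[-1]}"
        )


# Example: filtered labels pass silently, unfiltered labels raise ValueError.
check_label_range([0, 12, 44], num_classes=45)
```

Calling it on the labels right after cache_all_tokenized() returns would fail fast on the first run instead of crashing inside the CUDA kernel.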