Skip to content

[Bug] TAPE Remote Homology crashes on first run with new tokenizers #10

@yxliu-TAMU

Description

@yxliu-TAMU

Bug:

Running the Remote Homology task with a new tokenizer crashes at Epoch 0 with CUDA error: device-side assert triggered (label index out of bounds).

Reason:

In src/dataset/base.py, the cache_all_tokenized() method calls additional_label_filtering_for_TAPE_homo() when loading from cache, but not when creating a new cache. This causes unfiltered labels (0-1195) to be used instead of filtered labels (0-44).

Solution:

Add the filter call after creating the cache in cache_all_tokenized():

if flag:
    torch.save(self.data, cache_file_name)
self.additional_label_filtering_for_TAPE_homo(tokenizer_name)  # Add this line

Workaround: Re-run the script. The second run loads from cache and applies the filter correctly.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions