About training time

I am trying to reproduce the tokenizer training. I am using the same code with the ImageNet-1K training set (about 1.28 million samples). On a single A100 GPU with a batch size of 16, the estimated time for one epoch is 260 hours.

This is very different from the paper, which states that VFMTok takes 1.5 days to train on 16 NVIDIA H800 GPUs. Could you share your training setup (for example, batch size and number of epochs)?
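To make the gap concrete, here is a rough back-of-envelope calculation. It assumes the rounded 1.28M sample count and one batch per step, and it ignores the speed difference between A100 and H800 as well as the paper's (unknown to me) epoch count:

```python
# Back-of-envelope comparison of the two training-time figures in this issue.
# Assumptions: 1.28M samples (rounded), one batch consumed per step.
# NOT modeled: A100 vs H800 speed difference, number of epochs used in the paper.

NUM_SAMPLES = 1_280_000   # ImageNet-1K training set, rounded
BATCH_SIZE = 16           # batch size on the single A100
HOURS_PER_EPOCH = 260     # per-epoch estimate shown during training

# Steps per epoch on one GPU: each step consumes one batch.
iters_per_epoch = NUM_SAMPLES // BATCH_SIZE
# Implied wall-clock time per training step, in seconds.
sec_per_iter = HOURS_PER_EPOCH * 3600 / iters_per_epoch

# Total compute budget reported in the paper: 16 GPUs for 1.5 days.
PAPER_GPUS = 16
PAPER_DAYS = 1.5
paper_gpu_hours = PAPER_GPUS * PAPER_DAYS * 24

# How many of my single-A100 epochs would fit into that budget.
epochs_in_budget = paper_gpu_hours / HOURS_PER_EPOCH

print(f"iterations per epoch: {iters_per_epoch}")
print(f"seconds per iteration: {sec_per_iter:.1f}")
print(f"paper budget: {paper_gpu_hours:.0f} GPU-hours")
print(f"single-A100 epochs in that budget: {epochs_in_budget:.2f}")
```

So my setup implies roughly 11.7 seconds per step, while the paper's whole budget of 576 GPU-hours corresponds to only about two of my estimated epochs.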