Initial thoughts on how we want to set up the FineMath dataset. These are up for discussion.
https://huggingface.co/datasets/HuggingFaceTB/finemath
Questions
- Do we want to pre-tokenize? If so, which tokenizer? Will this be a problem if we want to train different models?
- Do we want to keep both the tokenized and non-tokenized versions? I think this makes sense.
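
One possible way to handle both questions, as a rough sketch rather than a decision: keep the raw text and add one token-ID column per tokenizer, so a different model can get its own column later without touching the original data. Everything named here is an assumption for illustration: the `text` field name, the `input_ids_<name>` column naming, the `add_token_columns` helper, and `toy_encode`, which is a byte-level stand-in and not a real tokenizer.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Encoder = Callable[[str], List[int]]


def toy_encode(text: str) -> List[int]:
    """Stand-in tokenizer (hypothetical): one ID per UTF-8 byte."""
    return list(text.encode("utf-8"))


def add_token_columns(
    records: Iterable[Record],
    encoders: Dict[str, Encoder],
    text_key: str = "text",  # assumed name of the raw-text field
) -> Iterator[Record]:
    """Yield each record with one extra token-ID column per tokenizer.

    The raw text is kept, so another tokenizer can be added later
    without losing the original data.
    """
    for record in records:
        out = dict(record)  # copy, so the input record is not mutated
        for name, encode in encoders.items():
            out[f"input_ids_{name}"] = encode(str(record[text_key]))
        yield out


# Two made-up rows standing in for dataset records.
rows = [{"text": "2 + 2 = 4"}, {"text": "x^2"}]
tokenized = list(add_token_columns(rows, {"bytes": toy_encode}))
```

Any callable that maps a string to a list of integer IDs could be passed in place of `toy_encode`. The trade-off is storage: each tokenizer adds another column roughly the size of the text itself.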