I am training a classification model using AWS SageMaker with TensorFlow. My training dataset is huge and is split across 4 folders in the same S3 bucket.
I defined the input channels like this:

    inputs = {
        'train1': folder1,
        'train2': folder2,
        'train3': folder3,
        'train4': folder4,
        'valid': folder
    }
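I pass these channel names (`ids`) into my main training code and then read the data in Pipe mode like this:

    from sagemaker_tensorflow import PipeModeDataset

    all_data = []
    if mode == 'train':
        for id in ids:
            # One dataset per input channel; the keyword argument is record_format
            data = PipeModeDataset(channel=id, record_format='TFRecord')
            # ... parse the data here ...
            all_data.append(data)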
I then treat all_data as my complete training data, apply augmentation to it, and pass it to the training script.
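To make that step concrete, here is a minimal sketch of one way a list of datasets such as all_data could be merged into a single tf.data.Dataset before augmentation. It assumes TensorFlow 2.x, uses small in-memory datasets as stand-ins for the PipeModeDataset objects, and the helper name combine_datasets is hypothetical. Whether plain concatenation is appropriate for Pipe mode channels is part of what I am asking.

```python
import tensorflow as tf


def combine_datasets(datasets):
    """Concatenate a list of tf.data.Dataset objects into one dataset.

    Hypothetical helper: elements are yielded channel by channel,
    in the order the datasets appear in the list.
    """
    combined = datasets[0]
    for ds in datasets[1:]:
        combined = combined.concatenate(ds)
    return combined


# Stand-ins for the four per-channel datasets (assumption: the real
# entries would be parsed PipeModeDataset objects, one per channel).
all_data = [
    tf.data.Dataset.from_tensor_slices([i * 10, i * 10 + 1])
    for i in range(4)
]

dataset = combine_datasets(all_data)

# Materialize the elements to inspect the resulting order.
values = [int(x) for x in dataset.as_numpy_iterator()]
print(values)
```

With real data, augmentation, shuffling, and batching would then be applied to the single combined dataset rather than to the Python list.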
When I do this, I get a data-related error, and sometimes training hangs.
What I want to know is the correct way to use multiple channels in a single training job with Pipe mode.
Thanks