This repository was archived by the owner on Apr 19, 2026. It is now read-only.
Hi!
I compared Keras fit time for dataset and experimental_distribute_dataset from your great notebook using the latest TF version. It turned out that the distributed dataset adds no speedup. Are you sure that your distributed input pipeline is well optimized for TPU? Why don't you use other optimizations like these:
def input_fn(batch_size):
  """> 2000 images/sec"""
  files = tf.data.Dataset.list_files(FLAGS.data_dir)

  def tfrecord_dataset(filename):
    buffer_size = 8 * 1024 * 1024  # 8 MiB per file
    return tf.data.TFRecordDataset(filename, buffer_size=buffer_size)

  dataset = files.apply(tf.contrib.data.parallel_interleave(
      tfrecord_dataset, cycle_length=32, sloppy=True))
  dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(10000, NUM_EPOCHS))
  dataset = dataset.apply(tf.contrib.data.map_and_batch(
      parser_fn, batch_size, num_parallel_calls=4))
  return dataset.prefetch(4)
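To make the intent of parallel_interleave and batching concrete without a TF dependency, here is a minimal pure-Python sketch of the same idea: records from several open "files" are interleaved round-robin (up to cycle_length sources at a time), then grouped into fixed-size batches. The helper names and the toy data are mine, not from the notebook or the TF API.

```python
def interleave(sources, cycle_length=2):
    """Round-robin records from up to `cycle_length` open sources at a time,
    opening the next pending source as each one is exhausted. This mimics
    the deterministic behavior of tf.data's parallel_interleave."""
    active = [iter(s) for s in sources[:cycle_length]]
    pending = list(sources[cycle_length:])
    while active:
        for it in list(active):
            try:
                yield next(it)
            except StopIteration:
                active.remove(it)
                if pending:
                    active.append(iter(pending.pop(0)))

def batch(records, batch_size):
    """Group records into fixed-size batches, dropping any remainder
    (like drop_remainder=True in tf.data.Dataset.batch)."""
    buf = []
    for r in records:
        buf.append(r)
        if len(buf) == batch_size:
            yield buf
            buf = []

# Three toy "files" with different lengths.
files = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
batches = list(batch(interleave(files, cycle_length=2), 2))
print(batches)  # [['a1', 'b1'], ['a2', 'b2'], ['a3', 'c1']]
```

Note how records from the first two files are alternated, and the third file only starts once one of them is exhausted; in the real pipeline this overlaps I/O across shards so no single slow file stalls the input.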
if FLAGS.use_tpu:
  # When using TPU, wrap the optimizer with CrossShardOptimizer, which
  # handles synchronization details between different TPU cores.
  optimizer = tpu_optimizer.CrossShardOptimizer(optimizer)