This repository was archived by the owner on Apr 19, 2026. It is now read-only.
Hi!
I compared Keras fit time for dataset and experimental_distribute_dataset from your great notebook using the latest TF version. It turned out that the distributed dataset adds no speedup. Are you sure that your distributed input pipeline is well optimized for TPU? Why don't you use other optimizations like these:
def input_fn(batch_size):
  """> 2000 images/sec"""
  files = tf.data.Dataset.list_files(FLAGS.data_dir)

  def tfrecord_dataset(filename):
    buffer_size = 8 * 1024 * 1024  # 8 MiB per file
    return tf.data.TFRecordDataset(filename, buffer_size=buffer_size)

  dataset = files.apply(tf.contrib.data.parallel_interleave(
      tfrecord_dataset, cycle_length=32, sloppy=True))
  dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(10000, NUM_EPOCHS))
  dataset = dataset.apply(tf.contrib.data.map_and_batch(
      parser_fn, batch_size, num_parallel_calls=4))
  return dataset.prefetch(4)
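To make the intent of parallel_interleave and batching concrete without a TF dependency, here is a minimal pure-Python sketch of the same idea: records from several open "files" are interleaved round-robin (up to cycle_length sources at a time), then grouped into fixed-size batches. The helper names and the toy data are mine, not from the notebook or the TF API.

```python
def interleave(sources, cycle_length=2):
    """Round-robin records from up to `cycle_length` open sources at a time,
    opening the next pending source as each one is exhausted. This mimics
    the deterministic behavior of tf.data's parallel_interleave."""
    active = [iter(s) for s in sources[:cycle_length]]
    pending = list(sources[cycle_length:])
    while active:
        for it in list(active):
            try:
                yield next(it)
            except StopIteration:
                active.remove(it)
                if pending:
                    active.append(iter(pending.pop(0)))

def batch(records, batch_size):
    """Group records into fixed-size batches, dropping any remainder
    (like drop_remainder=True in tf.data.Dataset.batch)."""
    buf = []
    for r in records:
        buf.append(r)
        if len(buf) == batch_size:
            yield buf
            buf = []

# Three toy "files" with different lengths.
files = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
batches = list(batch(interleave(files, cycle_length=2), 2))
print(batches)  # [['a1', 'b1'], ['a2', 'b2'], ['a3', 'c1']]
```

Note how records from the first two files are alternated, and the third file only starts once one of them is exhausted; in the real pipeline this overlaps I/O across shards so no single slow file stalls the input.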
if FLAGS.use_tpu:
  # When using TPU, wrap the optimizer with CrossShardOptimizer, which
  # handles synchronization details between different TPU cores.
  optimizer = tpu_optimizer.CrossShardOptimizer(optimizer)