initial prefetch for simple single chunked dim#161
initial prefetch for simple single chunked dim#161ljstrnadiii wants to merge 2 commits intoxarray-contrib:mainfrom
Conversation
|
In another attempt to simplify an example and profile transferring data from multiple workers to a single worker (where an ml tasks would iterate over batches) I have created this example: |
|
@jhamman @maxrjones this is sort of the approach am considering developing. I think 2gbps should be fine, but I was able to get 8+gbps with https://github.com/NVlabs/tensorcom using basic k8s pods and a manifest, which uses msgpack with pyzmq. I am trying to avoid using that and stick with the dask mechanics, but I am tempted to mock up a quick profile script of using zmq to bypass dask entirely, but within dask tasks. This all might not belong in xbatcher, but I wanted to put it out there to get ay feedback people might have. |
|
Here is an example of using the prefetch generator with tf.data.Dataset |
|
Can the test at the bottom be wrapped as a function? I'm guessing it's not supposed to run for everyone. |
@cmdupuis3 I am not sure I understand what you are asking by wrapped as a function. Do you mean be able to submit to dask? The BatchGenerator should be available on this branch if you check it out and install in editable mode. |
|
Actually I think I was confused. I read |

POC Prefetch Generator:
This is a draft pr to articulate one possible approach to "prefetching" dask arrays or xarray arrays with dask.
The goals were to simultaneously:
as_completedI also tried one approach using a Queue on the workers. This felt weird and found myself reinventing features that dask already has.
Results
Using helm to deploy a cluster on kubernetes with 8 workers (4cpu and 16gb each and relatively standard network configurations), I am able to see:
What Next?
No clue. I would like to investigate
note: