parallel transfer of files, full directory caching at first access and thread/daemon for backend sync! #10

@hradec

I have written a cache filesystem as well (hradecFS, here on GitHub). Although mine works well, I'm having a lot of trouble with thread locking (debugging multi-threaded code on a FUSE filesystem has been a nightmare), on top of a major design flaw that I recently figured out and that will require a lot of re-coding.

Your mcachefs code seems to use a similar idea to mine (assuming the backend never changes, and transferring full-size files once they're opened), and I really like how well it behaves, especially with search paths such as PYTHONPATH.

I just loved the journal idea... so nice and well implemented! And your mcachefs has a transfer queue, which I never got around to writing for hradecFS.

I'm seriously considering putting hradecFS aside for now and implementing a few concepts from it in mcachefs, for example:


1. parallel transfer of files in the queue - This one was huge for hradecFS, especially when there are large files on the backend and everything else in the queue has to wait for them to finish. Parallel transfer not only makes the filesystem more responsive, but also makes better use of bandwidth, especially over WAN. After I got parallel transfer working, responsiveness improved tremendously.

2. full directory caching at first directory access - Every time a file was accessed in a given directory, hradecFS would cache the whole directory listing, since it was querying the backend directory anyway. With the full listing cached, any subsequent lookup in the same directory needs no backend query to check for the file. We already know whether the file exists there or not! This is a must for search paths like PYTHONPATH, LD_LIBRARY_PATH and PATH. I'm not sure if mcachefs already does this, since I haven't examined all the code yet. But if it does, great!! (After using it for a bit, it does feel like it does, considering how fast it traverses PYTHONPATH in my tests...)

3. secondary maintenance thread (or a service daemon) to deal with backend updates to cached data - This one was at the end of my list for hradecFS, and I never got there. The idea was to have something in the background checking the backend for changes and syncing the cache accordingly. This way hradecFS would still "see" changes in the backend, with a small delay after they happen, without any impact on responsiveness.
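
To make idea 1 a bit more concrete, here is a rough Python sketch of draining a transfer queue with several workers instead of one. It is not taken from mcachefs or hradecFS; `transfer_one`, `transfer_parallel`, the `.partial` suffix and the default of 4 workers are all hypothetical names and values chosen for illustration.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def transfer_one(backend_root, cache_root, rel_path):
    """Copy one whole file from the backend into the cache (hypothetical helper)."""
    src = Path(backend_root) / rel_path
    dst = Path(cache_root) / rel_path
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Copy under a temporary name, then rename, so a reader of the cache
    # never sees a half-transferred file under its final name.
    tmp = dst.with_name(dst.name + ".partial")
    shutil.copyfile(src, tmp)
    tmp.replace(dst)
    return rel_path


def transfer_parallel(backend_root, cache_root, queue, workers=4):
    """Transfer every queued path using a pool of worker threads.

    Returns the paths in the order they finished, so a small file queued
    behind a large one can complete first.
    """
    done = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(transfer_one, backend_root, cache_root, rel_path)
            for rel_path in queue
        ]
        for future in as_completed(futures):
            done.append(future.result())  # re-raises any copy error
    return done
```

Threads are enough here because the work is I/O bound; the real question for a FUSE filesystem is how the workers share state with the request handlers, which this sketch deliberately leaves out.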

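Idea 2 can be sketched the same way. This is only an illustration under my own assumptions, not how mcachefs or hradecFS actually store metadata: `DirListingCache`, `exists` and the `backend_queries` counter are hypothetical, and, like the "backend never changes" assumption above, nothing here ever invalidates a cached listing.

```python
import os
import threading


class DirListingCache:
    """Cache whole directory listings on first access (illustrative only)."""

    def __init__(self, backend_root):
        self.backend_root = backend_root
        self._listings = {}          # relative dir -> frozenset of entry names
        self._lock = threading.Lock()
        self.backend_queries = 0     # counts real backend directory reads

    def _listing(self, rel_dir):
        # The lock is held during the backend read to keep the sketch simple;
        # a real implementation would likely lock per directory instead.
        with self._lock:
            names = self._listings.get(rel_dir)
            if names is None:
                self.backend_queries += 1
                try:
                    full = os.path.join(self.backend_root, rel_dir)
                    names = frozenset(os.listdir(full))
                except (FileNotFoundError, NotADirectoryError):
                    names = frozenset()  # remember the miss as well
                self._listings[rel_dir] = names
            return names

    def exists(self, rel_path):
        """Answer an existence check from the cached listing of the parent."""
        rel_dir, name = os.path.split(rel_path)
        return name in self._listing(rel_dir)
```

The point for search paths is the negative answer: once a directory has been listed, every later lookup of a missing module or library in it is answered without touching the backend.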

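For idea 3, a minimal sketch of the background checker might look like the following. Again this is hypothetical: `BackendSyncer`, the mtime comparison and the 30 second default interval are my own assumptions for illustration, and a real version would re-transfer or drop stale entries instead of just collecting their names.

```python
import os
import threading


class BackendSyncer:
    """Periodically compare backend mtimes with those seen at cache time."""

    def __init__(self, backend_root, cached_mtimes, interval=30.0):
        self.backend_root = backend_root
        self.cached_mtimes = cached_mtimes  # rel path -> st_mtime_ns when cached
        self.interval = interval
        self.stale = set()                  # paths whose backend copy changed
        self._lock = threading.Lock()
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def check_once(self):
        """Run one pass and return the paths found changed or deleted."""
        with self._lock:
            snapshot = dict(self.cached_mtimes)
        changed = set()
        for rel_path, seen in snapshot.items():
            try:
                full = os.path.join(self.backend_root, rel_path)
                current = os.stat(full).st_mtime_ns
            except FileNotFoundError:
                current = None  # deleted on the backend
            if current != seen:
                changed.add(rel_path)
        with self._lock:
            self.stale |= changed
        return changed

    def _run(self):
        # Event.wait returns False on timeout, True once stop() sets the flag.
        while not self._stop_event.wait(self.interval):
            self.check_once()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()
```

All backend access happens on the maintenance thread, so the request path only ever reads the `stale` set under a short lock and never waits on the backend.
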
I'm forking your code right now to start working on parallel file transfers. I would love to know your thoughts about it, and also about the other 2 ideas.

And I would also love to hear about your future plans for mcachefs!

Last but not least, please let me know about any quirks regarding threading in your code, especially problems you faced and bugs you ran into during development... again, threading has been my nightmare with FUSE!! Anything that you feel like sharing would be appreciated!

Anyhow, great work and thanks for sharing it!!

amazing!
cheers...
-H