Interested in GSoC 2026 Project 6: Interface for post-simulation analysis crawling of WESTPA simulations. #5268

kunjsinha · 2026-03-01T14:41:23Z

Hi MDAnalysis team (and mentors @jeremyleung521 and @ltchong),

I'm Kunj, a CS student looking to work on Project 6 for GSoC 2026. I have built similar CLI-based apps before and have a little experience with multiprocessing as well. My main language is Python, and I am familiar with the Git/GitHub development workflow.

I have already:

Installed WESTPA and run the basic NaCl tutorial to generate west.h5 and the traj_segs files
Explored the HDF5 structure with h5py (and read the official organization wiki)
Gone through the w_crawl docs and Tutorial 7.5
Looked at the MDAnalysis parsers/readers to see how a WESTPA one could fit

I plan to apply for this project and have a few targeted questions below to make sure I understand the exact scope before writing my proposal. I'm happy to discuss or share initial ideas.

The project says “New MDAnalysis/MDAKit parser”. Should this be added as a core MDAnalysis reader (like the existing topology/trajectory parsers), or is an MDAKit preferred? Also, since this is a WESTPA collaboration project, would any parts ideally live in the westpa/westpa repo (like the Project 5 dashboard discussion)?
The description mentions that “Code that translates the topology in the HDF5 Framework to that of MDAnalysis has already been written and included in the source code of v2022.13.” How complete is it, and is it meant to be the foundation for the new parser?
Since w_crawl already does parallel analysis over segments and the skills explicitly list multiprocessing, I’m thinking the CLI tool could automatically parallelize simple MDAnalysis calls across iterations using multiprocessing (with a fallback serial mode). Is this the direction you have in mind, or would you prefer the parser itself to be lightweight and let users handle parallelism on top of the Universe?
From the HDF5 wiki, west.h5 contains metadata and links, but actual coordinates live in separate traj_segs/iter_XXXXXX.h5 files (and topology isn’t stored directly). How should the parser build a full MDAnalysis Universe? For example, should it treat segments as separate trajectories, support iteration/segment selection, automatically include weights from seg_index, or still require the user to provide a reference topology file?
To get familiar and demonstrate engagement, are there any beginner-friendly issues or small tasks in the MDAnalysis or WESTPA repos related to HDF5 readers or WESTPA support that I could pick up first?

Really looking forward to your thoughts!

Best Regards,
Kunj Sinha

jeremyleung521 (Collaborator) · 2026-03-02T17:42:29Z

Hi Kunj. Welcome.

The project says “New MDAnalysis/MDAKit parser”. Should this be added as a core MDAnalysis reader (like the existing topology/trajectory parsers), or is an MDAKit preferred? Also, since this is a WESTPA collaboration project, would any parts ideally live in the westpa/westpa repo (like the Project 5 dashboard discussion)?

I think MDAKit or westpa/westpa would probably work best, with the latter being a little bit easier on the maintenance side.

The description mentions that “Code that translates the topology in the HDF5 Framework to that of MDAnalysis has already been written and included in the source code of v2022.13.” How complete is it, and is it meant to be the foundation for the new parser?

Well, the PR for the source code change is linked here. The code was written so that one could read any supported trajectory format with MDAnalysis and save it to the HDF5 Framework. How important or useful that is for the reverse direction, I'll leave up to you. It's not mandatory to reuse that code.

Since w_crawl already does parallel analysis over segments and the skills explicitly list multiprocessing, I’m thinking the CLI tool could automatically parallelize simple MDAnalysis calls across iterations using multiprocessing (with a fallback serial mode). Is this the direction you have in mind, or would you prefer the parser itself to be lightweight and let users handle parallelism on top of the Universe?

I would probably have MDA do the parallelization (if possible) rather than run MDA multiple times. As for why, I won't give away the answer; I'll let you think about it.
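One possible reading of this advice, sketched with only the standard library: a single driver splits the frame range into contiguous chunks, hands them to one worker pool, and merges the results in order, instead of launching the whole analysis several times. The names `split_frames`, `analyze_chunk`, and `run` are hypothetical, and the per-frame "observable" is a placeholder. This is not MDAnalysis's own API, and not necessarily the design the mentors have in mind.

```python
from concurrent.futures import ProcessPoolExecutor


def split_frames(n_frames, n_workers):
    """Split range(n_frames) into contiguous, near-equal chunks (hypothetical helper)."""
    n_workers = max(1, min(n_workers, n_frames))
    base, extra = divmod(n_frames, n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one additional frame each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def analyze_chunk(frames):
    """Placeholder per-frame observable; a real tool would read coordinates here."""
    return [float(i) ** 2 for i in frames]


def run(n_frames, n_workers=1):
    """Analyze all frames once, serially or through a single process pool."""
    chunks = split_frames(n_frames, n_workers)
    if n_workers <= 1:
        # Serial fallback: same code path, no pool.
        parts = [analyze_chunk(c) for c in chunks]
    else:
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            # map() preserves chunk order, so results stay in frame order.
            parts = list(pool.map(analyze_chunk, chunks))
    return [value for part in parts for value in part]


if __name__ == "__main__":
    print(run(10, n_workers=3))
```

With this shape, setup work (opening files, building the topology) happens once in the driver, and the serial mode is simply the `n_workers=1` case of the same function.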

From the HDF5 wiki, west.h5 contains metadata and links, but actual coordinates live in separate traj_segs/iter_XXXXXX.h5 files (and topology isn’t stored directly). How should the parser build a full MDAnalysis Universe? For example, should it treat segments as separate trajectories, support iteration/segment selection, automatically include weights from seg_index, or still require the user to provide a reference topology file?

Those logistics are something for you to plan and think over (and include in the pre-proposal). Overall, we want to feed in the west.h5 (and the iter_XXXX.h5 files) and get the auxdata/XXXX dataset within west.h5, as w_crawl does, but without needing to write your own copy/analysis/read functions the way w_crawl requires. The topology is already in the HDF5 file in XML form.
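As a rough illustration of the file layout discussed here, the helpers below map an iteration number to the places such a tool would read from and write to. The `/iterations/iter_XXXXXXXX/auxdata/<name>` path follows WESTPA's usual west.h5 layout; the six-digit per-iteration file name follows the `iter_XXXXXX.h5` pattern quoted above and should be checked against the actual run. All function names and the `rmsd` dataset name are hypothetical.

```python
def iter_group(n_iter):
    """HDF5 group for one iteration inside west.h5 (8-digit zero padding)."""
    return f"/iterations/iter_{n_iter:08d}"


def auxdata_dataset(n_iter, name):
    """Dataset path where a per-iteration analysis result would be stored."""
    return f"{iter_group(n_iter)}/auxdata/{name}"


def iter_traj_file(n_iter, traj_dir="traj_segs"):
    """Per-iteration trajectory file.

    Assumption: 6-digit padding, matching "iter_XXXXXX.h5" as quoted in the thread.
    """
    return f"{traj_dir}/iter_{n_iter:06d}.h5"


def plan_crawl(first_iter, last_iter, name):
    """List (input file, output dataset) pairs for an inclusive iteration range."""
    return [
        (iter_traj_file(n), auxdata_dataset(n, name))
        for n in range(first_iter, last_iter + 1)
    ]


if __name__ == "__main__":
    for src, dst in plan_crawl(1, 3, "rmsd"):
        print(f"{src} -> west.h5:{dst}")
```

A plan like this only names inputs and outputs; reading coordinates and writing the dataset would be done with an HDF5 library on top of it.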

To get familiar and demonstrate engagement, are there any beginner-friendly issues or small tasks in the MDAnalysis or WESTPA repos related to HDF5 readers or WESTPA support that I could pick up first?

Unfortunately, on the WESTPA end we're quite small, so we don't have a bunch of "good-first-issues" lined up, but the discussion in westpa/westpa#321 might spark some ideas. The tutorials suite and westpa-test-system repos should provide enough examples for you to play around with.

kunjsinha (Author) · 2026-03-04T15:05:17Z

Thank you for taking the time to answer each of my questions!
I'll address each point you made (parser placement, reuse of existing code, construction of the Universe, access to auxdata, etc.) in my pre-proposal.
Please let me know if there are any other points you would like highlighted in my pre-proposal.
I'm really excited to have this opportunity and am grateful for the feedback.

kunjsinha (Author) · 2026-03-27T11:36:54Z

Hey @jeremyleung521,
I've just submitted my GSoC proposal draft through the GSoC portal as a PDF with a Google Doc link inside. Could you please review it?

Thank you,
Kunj Sinha

1 reply

jeremyleung521 Mar 28, 2026
Collaborator

Thanks for following the instructions on the MDA GSoC FAQ to a T! I just made some comments on the Google Doc, so I hope that helps!
