Initial Molecule Choices: My Few Thoughts #13
Hey everyone,
I hope you're all not too overwhelmed by all the recent changes to the GitHub repo! Since the end of the semester I've had a lot of extra time to focus on this work and brainstorm with @mrshirts, and I have a few thoughts on which molecules we might want to start with for the parameterization process. I'd like to preface this discussion with a request: please take some time to look through what I've changed in my last few merges to the main repo. Pay close attention to the 'Initial Molecule Choices' directory, as there is a lot of new information there which is hopefully explained in a succinct and accessible manner. With that, please read on with an open (yet critical) mind.
My thoughts on potential choices fall into two groups, depending on what type of diversity in molecular structure we'd like to begin with. I'll split them into those excluding aromatic bonds (set "XAr") and those including aromatic bonds (set "Ar"). Given the wide spread of data across properties for the highest-ranked compounds (those with the highest number of total data points) in 'allcomp_counts_interesting.csv' in the 'Initial Molecule Choices' directory, it should be sufficient to start with as few as 5 molecules for either set.
Set "XAr": water, ethanol, 1-butanol, heptane, methyl tert-butyl ether
-Why?
-Significant data coverage across all individual species (see 'allcomp_counts_interesting.csv')
-Fair data coverage for mixture combinations (see 'mix_counts_interesting.csv')
Set "Ar": water, 1-butanol, heptane, methyl tert-butyl ether, toluene
-For the same reasons as the "XAr" set
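To make the mixture-coverage point easier to check, here is a minimal sketch that lists the pairs one would look up in 'mix_counts_interesting.csv' for each set. It assumes the mixtures of interest are binary, which I haven't stated above, and the helper name `binary_pairs` is just made up for illustration. It does not read the CSV, since the column layout isn't described here.

```python
from itertools import combinations

# The two proposed sets, exactly as listed above.
XAR = ["water", "ethanol", "1-butanol", "heptane", "methyl tert-butyl ether"]
AR = ["water", "1-butanol", "heptane", "methyl tert-butyl ether", "toluene"]


def binary_pairs(molecules):
    """Return every unordered pair of distinct molecules.

    Each pair is sorted alphabetically so the same mixture always has the
    same key, regardless of the order the molecules were listed in.
    Assumption: only binary mixtures are of interest.
    """
    return [tuple(sorted(pair)) for pair in combinations(molecules, 2)]


xar_pairs = binary_pairs(XAR)
ar_pairs = binary_pairs(AR)

# Pairs that appear in both sets (built only from the four shared molecules).
shared_pairs = sorted(set(xar_pairs) & set(ar_pairs))

for label, pairs in (("XAr", xar_pairs), ("Ar", ar_pairs)):
    print(label, len(pairs), "binary pairs")
print("shared between sets:", len(shared_pairs))
```

The two sets differ only by swapping ethanol for toluene, so most of the mixture lookups are the same for both.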
Note that for some properties, data exist for only a small range of molecules (or combinations of molecules). Therefore, it is not possible to cover every property of interest with the suggested sets above. One way to remedy this, depending on how many molecules we would be willing to start with, would be to go through the top pure solvents and mixtures per property and use all molecules in that list. It is very likely that the diversity remains high even then.
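The per-property alternative could be sketched roughly as follows. This is only an illustration under assumptions: the function name `top_molecules_per_property`, the `(property, components, count)` row shape, the property names, and all the counts in `rows` are placeholders I made up, not values from the CSV files in the repo.

```python
from collections import defaultdict


def top_molecules_per_property(rows, top_n):
    """Collect every molecule found in the top_n entries of each property.

    rows: iterable of (property_name, components, count), where components
    is a tuple of molecule names (one name for a pure solvent, two or more
    for a mixture). This row shape is an assumption for illustration.
    """
    by_property = defaultdict(list)
    for prop, components, count in rows:
        by_property[prop].append((count, components))

    selected = set()
    for entries in by_property.values():
        # Rank entries for this property by data-point count, highest first.
        entries.sort(key=lambda entry: entry[0], reverse=True)
        for _, components in entries[:top_n]:
            selected.update(components)
    return selected


# Placeholder rows: property names and counts are NOT real data.
rows = [
    ("density", ("water",), 100),
    ("density", ("water", "ethanol"), 40),
    ("density", ("heptane",), 10),
    ("heat capacity", ("toluene",), 30),
    ("heat capacity", ("heptane", "toluene"), 5),
]

selected = top_molecules_per_property(rows, top_n=2)
print(sorted(selected))
```

Raising `top_n` grows the starting set, so it is the knob for how many molecules we would be willing to start with.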