ClusteringError:
SMIRKSifier was not able to create SMIRKS for the provided
clusters with 5 layers. Try increasing the number of layers
or changing your clusters
I suppose this results from not finding matches with reference types, but this is just a guess. Is there a way to build a SMARTS list using SMIRKSifier that discriminates the two different CC single bonds? If not, what would be a good place in the code to start implementing that?
Here is an example that illustrates the above (see Buten.zip for the RDKit molecules, attached as JSON):
from chemper.mol_toolkits import mol_toolkit
from chemper.smirksify import SMIRKSifier, print_smirks
from rdkit import Chem
with open("./cis-Buten.json", "r") as fopen:
    cis_buten = Chem.JSONToMols(fopen.read())[0]
with open("./trans-Buten.json", "r") as fopen:
    trans_buten = Chem.JSONToMols(fopen.read())[0]
CC_single_different = [
    ('cc_single1', [[(0, 1), (2, 3)], []]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
                   [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
    ('cc_single2', [[], [(0, 1), (2, 3)]]),
]
CC_single_same = [
    ('cc_single', [[(0, 1), (2, 3)], [(0, 1), (2, 3)]]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
                   [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
]
molecules = [cis_buten, trans_buten]
### The following works nicely.
bond_smirksifier = SMIRKSifier(
    molecules,
    CC_single_same,
    max_layers=5,
    strict_smirks=True,
    verbose=False,
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)
### The following will not work.
bond_smirksifier = SMIRKSifier(
    molecules,
    CC_single_different,
    max_layers=5,
    strict_smirks=True,
    verbose=False,
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)
I am trying to use chemper.smirksify.SMIRKSifier to build a list of SMARTS patterns from clusters for cis- and trans-butene. Based on my clustering, the CC single bonds should be discriminated between cis- and trans-butene, i.e. the CC single bonds are in the same cluster within a molecule but in different clusters between the two molecules (see CC_single_different above). When running chemper.smirksify.SMIRKSifier to build the SMARTS list, I get the ClusteringError shown at the top.
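To make the clustering described above easier to check independently of chemper, here is a minimal sketch in plain Python that lists which atom-index tuples end up under different labels in the two molecules. The helper names `labels_by_bond` and `split_across_molecules` are hypothetical and not part of chemper. Comparing tuples across molecules is only meaningful under the assumption that both butene molecules use the same atom ordering, which the identical index tuples in the cluster lists suggest but which I have not verified.

```python
from collections import defaultdict

# Cluster lists copied from the example above: each entry is
# (label, [tuples for molecule 0 (cis), tuples for molecule 1 (trans)]).
CH = [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]
CC_single_different = [
    ("cc_single1", [[(0, 1), (2, 3)], []]),
    ("ch_single", [CH, CH]),
    ("cc_double", [[(1, 2)], [(1, 2)]]),
    ("cc_single2", [[], [(0, 1), (2, 3)]]),
]
CC_single_same = [
    ("cc_single", [[(0, 1), (2, 3)], [(0, 1), (2, 3)]]),
    ("ch_single", [CH, CH]),
    ("cc_double", [[(1, 2)], [(1, 2)]]),
]


def labels_by_bond(cluster_list):
    """Map (molecule index, atom-index tuple) to the labels it is listed under."""
    table = defaultdict(list)
    for label, per_molecule in cluster_list:
        for mol_idx, tuples in enumerate(per_molecule):
            for atoms in tuples:
                table[(mol_idx, tuple(atoms))].append(label)
    return dict(table)


def split_across_molecules(cluster_list):
    """Return atom-index tuples that carry different labels in different molecules.

    Assumes all molecules share the same atom ordering (hypothetical for this data).
    """
    by_atoms = defaultdict(set)
    for (_mol_idx, atoms), labels in labels_by_bond(cluster_list).items():
        by_atoms[atoms].update(labels)
    return {atoms: sorted(labels) for atoms, labels in by_atoms.items() if len(labels) > 1}


if __name__ == "__main__":
    print("same:", split_across_molecules(CC_single_same))
    print("different:", split_across_molecules(CC_single_different))
```

For CC_single_same the result is empty, while for CC_single_different only the two CC single bonds, (0, 1) and (2, 3), are reported. So the failing input differs from the working one only in asking for those two bonds to be typed differently in cis- and trans-butene.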