not able to smirksify double bonds in different clusters for cis/trans butene #100

@wutobias

I am trying to use chemper.smirksify.SMIRKSifier to build a list of SMARTS patterns from clusters for cis- and trans-butene. In my clustering, the CC single bonds should be discriminated between cis- and trans-butene: within each molecule they share one cluster, but the two molecules use different clusters (see CC_single_different below). When I run chemper.smirksify.SMIRKSifier to build the SMARTS list, I get the following error message:

ClusteringError: 
                      SMIRKSifier was not able to create SMIRKS for the provided
                      clusters with 5 layers. Try increasing the number of layers
                      or changing your clusters

I suppose this happens because no matches with the reference types can be found, but this is just a guess. Is there a way to build a SMARTS list using SMIRKSifier that discriminates the two different CC single bonds? If not, where in the code would be a good place to start implementing that?
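One thing that may be worth checking first is whether the two molecules differ in anything other than double-bond stereochemistry. The sketch below uses only RDKit and builds the isomers from SMILES rather than from the attached JSON files (that substitution is an assumption made to keep the snippet self-contained). It compares the stereo-free SMILES and the stereo label of the double bond. Whether SMIRKSifier's decorators can encode that stereo label is not verified here.

```python
from rdkit import Chem

# Assumption: SMILES stand-ins for the molecules in Buten.zip.
cis = Chem.MolFromSmiles("C/C=C\\C")
trans = Chem.MolFromSmiles("C/C=C/C")


def double_bond_stereo(mol):
    """Return the stereo label of every double bond in the molecule."""
    return [
        str(bond.GetStereo())
        for bond in mol.GetBonds()
        if bond.GetBondType() == Chem.BondType.DOUBLE
    ]


# Canonical SMILES with stereo information stripped.
flat_cis = Chem.MolToSmiles(cis, isomericSmiles=False)
flat_trans = Chem.MolToSmiles(trans, isomericSmiles=False)

print("identical without stereo:", flat_cis == flat_trans)
print("double-bond stereo:", double_bond_stereo(cis), double_bond_stereo(trans))
```

If the stereo-free representations are identical, any pattern built only from element, connectivity, hydrogen count, and bond order would match the CC single bonds of both molecules equally.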

Here is an example that illustrates the above (see Buten.zip for the RDKit molecules, attached as JSON):

from chemper.mol_toolkits import mol_toolkit
from chemper.smirksify import SMIRKSifier, print_smirks
from rdkit import Chem

with open("./cis-Buten.json", "r") as fopen:
    cis_buten = Chem.JSONToMols(
        fopen.read()
    )[0]
with open("./trans-Buten.json", "r") as fopen:
    trans_buten = Chem.JSONToMols(
        fopen.read()
    )[0]


CC_single_different = [ 
    ('cc_single1', [[(0, 1), (2, 3)], []]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
            [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]
           ]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
    ('cc_single2', [[], [(0, 1), (2, 3)]])
]

CC_single_same = [ 
    ('cc_single', [[(0, 1), (2, 3)], [(0, 1), (2, 3)]]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
           [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
]

molecules = [cis_buten, trans_buten]

### The following works nicely.
bond_smirksifier = SMIRKSifier(
    molecules, 
    CC_single_same, 
    max_layers=5,
    strict_smirks=True,
    verbose=False
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)

### The following fails with the ClusteringError shown above.
bond_smirksifier = SMIRKSifier(
    molecules, 
    CC_single_different, 
    max_layers=5,
    strict_smirks=True,
    verbose=False
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)
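To rule out a malformed cluster list as the cause, the layout used above (a label followed by one list of atom-index tuples per molecule) can be sanity-checked in plain Python. This is a minimal sketch. The helpers `check_clusters` and `labels_by_bond` are hypothetical names introduced here and are not part of ChemPer.

```python
CC_single_different = [
    ('cc_single1', [[(0, 1), (2, 3)], []]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
                   [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
    ('cc_single2', [[], [(0, 1), (2, 3)]]),
]


def canonical(atoms):
    """Treat an index tuple and its reverse as the same bond (or angle, torsion)."""
    atoms = tuple(atoms)
    return min(atoms, atoms[::-1])


def labels_by_bond(clusters, mol_idx):
    """Map each index tuple of one molecule to its cluster label."""
    return {
        canonical(atoms): label
        for label, per_molecule in clusters
        for atoms in per_molecule[mol_idx]
    }


def check_clusters(clusters, n_molecules):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    seen = [{} for _ in range(n_molecules)]
    for label, per_molecule in clusters:
        if len(per_molecule) != n_molecules:
            problems.append(f"{label}: expected {n_molecules} lists, got {len(per_molecule)}")
            continue
        for mol_idx, atom_tuples in enumerate(per_molecule):
            for atoms in atom_tuples:
                key = canonical(atoms)
                if key in seen[mol_idx]:
                    problems.append(
                        f"molecule {mol_idx}: {key} is in both "
                        f"{seen[mol_idx][key]} and {label}"
                    )
                else:
                    seen[mol_idx][key] = label
    return problems


print(check_clusters(CC_single_different, n_molecules=2))
cis_labels = labels_by_bond(CC_single_different, 0)
trans_labels = labels_by_bond(CC_single_different, 1)
print(cis_labels[(0, 1)], trans_labels[(0, 1)])
```

For CC_single_different the check reports no structural problems, and the same bond (0, 1) carries a different label in each molecule, which is exactly the distinction the SMIRKS would have to express.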
