not able to smirksify double bonds in different clusters for cis/trans butene

I am trying to use `chemper.smirksify.SMIRKSifier` to build a list of SMARTS patterns from clusters for cis- and trans-butene. In my clustering, the CC single bonds should be distinguished between cis- and trans-butene: within each molecule they share one cluster, but the two molecules place them in different clusters (see `CC_single_different` below). When I run `chemper.smirksify.SMIRKSifier` to build the SMARTS list, I get the following error message:

```
ClusteringError: 
                      SMIRKSifier was not able to create SMIRKS for the provided
                      clusters with 5 layers. Try increasing the number of layers
                      or changing your clusters
```

I suppose this happens because no match with the reference types is found, but this is just a guess. Is there a way to build a SMARTS list using `SMIRKSifier` that discriminates between the two different CC single bonds? If not, where in the code would be a good place to start implementing that?

Here is an example that illustrates the problem (the RDKit molecules are attached as JSON in Buten.zip):

```
from chemper.mol_toolkits import mol_toolkit
from chemper.smirksify import SMIRKSifier, print_smirks
from rdkit import Chem

with open("./cis-Buten.json", "r") as fopen:
    cis_buten = Chem.JSONToMols(
        fopen.read()
    )[0]
with open("./trans-Buten.json", "r") as fopen:
    trans_buten = Chem.JSONToMols(
        fopen.read()
    )[0]


CC_single_different = [ 
    ('cc_single1', [[(0, 1), (2, 3)], []]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
            [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]
           ]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
    ('cc_single2', [[], [(0, 1), (2, 3)]])
]

CC_single_same = [ 
    ('cc_single', [[(0, 1), (2, 3)], [(0, 1), (2, 3)]]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
           [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
]

molecules = [cis_buten, trans_buten]

### The following works nicely.
bond_smirksifier = SMIRKSifier(
    molecules, 
    CC_single_same, 
    max_layers=5,
    strict_smirks=True,
    verbose=False
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)

### The following fails with the ClusteringError shown above.
bond_smirksifier = SMIRKSifier(
    molecules, 
    CC_single_different, 
    max_layers=5,
    strict_smirks=True,
    verbose=False
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)
```
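
To rule out a mistake in the cluster definitions themselves, the input can be checked independently of chemper. The sketch below assumes the format used above: a list of `(label, [index tuples per molecule])` entries with one inner list per molecule. The helper `map_indices_to_labels` is hypothetical and not part of chemper; it only verifies that every cluster has one entry per molecule and that no bond is assigned to two clusters within the same molecule.

```python
def map_indices_to_labels(clusters, n_molecules):
    """Return one {atom_indices: label} dict per molecule.

    Hypothetical helper (not part of chemper). Raises ValueError if a
    cluster does not have one index list per molecule, or if the same
    atom-index tuple appears in two clusters for the same molecule.
    """
    per_molecule = [dict() for _ in range(n_molecules)]
    for label, indices_per_mol in clusters:
        if len(indices_per_mol) != n_molecules:
            raise ValueError(
                f"cluster {label!r} has {len(indices_per_mol)} entries, "
                f"expected {n_molecules}"
            )
        for mol_idx, index_tuples in enumerate(indices_per_mol):
            for atoms in index_tuples:
                key = tuple(atoms)
                # Treat (i, j) and (j, i) as the same bond.
                canonical = min(key, key[::-1])
                if canonical in per_molecule[mol_idx]:
                    raise ValueError(
                        f"{canonical} in molecule {mol_idx} is in both "
                        f"{per_molecule[mol_idx][canonical]!r} and {label!r}"
                    )
                per_molecule[mol_idx][canonical] = label
    return per_molecule


# Same cluster definition as CC_single_different above.
ch_bonds = [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]
CC_single_different = [
    ("cc_single1", [[(0, 1), (2, 3)], []]),
    ("ch_single", [ch_bonds, ch_bonds]),
    ("cc_double", [[(1, 2)], [(1, 2)]]),
    ("cc_single2", [[], [(0, 1), (2, 3)]]),
]

labels = map_indices_to_labels(CC_single_different, n_molecules=2)
print(labels[0][(0, 1)], labels[1][(0, 1)])
```

The check passes for `CC_single_different`, so the cluster list is at least well-formed: the same atom-index pairs are simply labelled differently in the two molecules.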


not able to smirksify double bonds in different clusters for cis/trans butene #100