4. Isomorphism

(c) 2019, 2020 Dr. Ramil Nugmanov;

(c) 2019 Dr. Timur Madzhidov; Ravil Mukhametgaleev

(c) 2022 Valentina Afonina

For installation instructions, package information, and the tutorial files, see https://github.com/cimm-kzn/CGRtools

NOTE: Run the tutorial cells sequentially from the start. Running cells out of order will lead to unexpected results.

[1]:
import pkg_resources
if pkg_resources.get_distribution('CGRtools').version.split('.')[:2] != ['4', '1']:
    print('WARNING. Tutorial was tested on 4.1 version of CGRtools')
else:
    print('Welcome!')
Welcome!
[2]:
# load data for tutorial
from pickle import load
from traceback import format_exc

with open('molecules.dat', 'rb') as f:
    molecules = load(f) # list of MoleculeContainer objects
with open('reactions.dat', 'rb') as f:
    reactions = load(f) # list of ReactionContainer objects

m2, m3 = molecules[1:3] # molecule
m7 = m3.copy()
m7.standardize()
r1 = reactions[0] # reaction
m5, m6 = r1.reactants[:2]
m8 = m7.substructure([4, 5, 6, 7, 8, 9])
m9 = m6.substructure([5, 6, 7, 8]) # acid
m10 = r1.products[0].copy()

benzene = m3.substructure([4, 5, 6, 7, 8, 9])
cgr1 = m7 ^ m8
carb = m10.substructure([5, 7, 2])

from CGRtools.containers import *

4.1. Molecules Isomorphism

CGRtools has a simple API for substructure and structure isomorphism.

Note that in subgraph isomorphism atoms are matched only if they have the same charge, multiplicity, and isotope.
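The matching rule above can be pictured with a small stand-alone sketch. ToyAtom and atoms_match are hypothetical names used only for illustration; they are not CGRtools classes. The sketch assumes that two atoms are interchangeable only when element, charge, radical state, and isotope all coincide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical stand-in for an atom; not a CGRtools class."""
    element: str
    charge: int = 0
    is_radical: bool = False
    isotope: Optional[int] = None  # None stands for "isotope not specified"


def atoms_match(query: ToyAtom, target: ToyAtom) -> bool:
    """Atoms match only if every compared property is identical."""
    return (query.element, query.charge, query.is_radical, query.isotope) == \
           (target.element, target.charge, target.is_radical, target.isotope)


carbon = ToyAtom('C')
print(atoms_match(carbon, ToyAtom('C')))              # True
print(atoms_match(carbon, ToyAtom('C', charge=-1)))   # False: charge differs
print(atoms_match(carbon, ToyAtom('C', isotope=13)))  # False: isotope differs
```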

[3]:
m7
[3]:
../_images/tutorial_4_isomorphism_4_0.svg
[4]:
m8
[4]:
../_images/tutorial_4_isomorphism_5_0.svg
[5]:
benzene
[5]:
../_images/tutorial_4_isomorphism_6_0.svg
[6]:
# isomorphism operations
print(benzene < m7)  # benzene is a substructure of m7
print(benzene > m7)  # benzene is not a superstructure of m7
print(benzene <= m7) # benzene is a substructure of m7 or the same structure
print(benzene >= m7) # benzene is neither a superstructure of m7 nor the same structure
print(benzene < m8)  # benzene is not a proper substructure of m8: they are equal
print(benzene <= m8) # benzene is a substructure of m8 or the same structure
True
False
True
False
False
True
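As a rough illustration of how comparison operators like these can be layered on top of a mapping search, here is a pure-Python sketch. ToyGraph is a hypothetical stand-in for a molecule container: it ignores bond orders, charges, and isotopes, and it searches by brute force. The size check in the strict operator is an assumption that is consistent with the outputs above; it is not a statement about CGRtools internals.

```python
from itertools import permutations


class ToyGraph:
    """Hypothetical labelled graph; a stand-in for a molecule container."""

    def __init__(self, atoms, bonds):
        self.atoms = dict(atoms)                    # atom number -> element symbol
        self.bonds = {frozenset(b) for b in bonds}  # unordered atom-number pairs

    def __len__(self):
        return len(self.atoms)

    def get_mapping(self, other):
        """Yield every dict that maps self's atoms onto a subgraph of other."""
        if not isinstance(other, ToyGraph):
            raise TypeError('ToyGraph expected')
        own = list(self.atoms)
        for image in permutations(other.atoms, len(own)):
            mapping = dict(zip(own, image))
            same_elements = all(self.atoms[n] == other.atoms[m] for n, m in mapping.items())
            if same_elements and all(frozenset(mapping[n] for n in bond) in other.bonds
                                     for bond in self.bonds):
                yield mapping

    def is_substructure(self, other):
        return next(self.get_mapping(other), None) is not None

    def __le__(self, other):  # substructure or same structure
        return self.is_substructure(other)

    def __lt__(self, other):  # proper substructure: must be strictly smaller
        return len(self) < len(other) and self.is_substructure(other)

    # '>' and '>=' are not defined here: Python falls back to the
    # reflected '<' and '<=' of the right-hand operand.


ring = [(i, i % 6 + 1) for i in range(1, 7)]  # bonds 1-2, 2-3, ..., 6-1
benzene = ToyGraph({i: 'C' for i in range(1, 7)}, ring)
toluene = ToyGraph({i: 'C' for i in range(1, 8)}, ring + [(1, 7)])

print(benzene < toluene)   # True
print(benzene > toluene)   # False
print(benzene <= benzene)  # True
print(benzene < benzene)   # False
```

The brute-force search over permutations is only practical for very small graphs; it is used here to keep the sketch short.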
[7]:
m5
[7]:
../_images/tutorial_4_isomorphism_8_0.svg
[8]:
m6
[8]:
../_images/tutorial_4_isomorphism_9_0.svg

Mappings of a substructure (or an identical structure) onto a structure can be obtained with the substructure.get_mapping(structure) method. The method acts as a generator.

This functionality was developed to bring the atoms of two MoleculeContainers into the same order, which is needed for some reaction-handling tasks. The dictionary returned by this method can be fed directly to the remap method described earlier in the tutorial.

[9]:
next(m5.get_mapping(m6))  # mapping of m5 substructure into m6 superstructure
[9]:
{3: 5, 1: 10, 4: 6}
[10]:
for m in m5.get_mapping(m6):  # iterate over all possible substructure mappings
    print(m)
{3: 5, 1: 10, 4: 6}
{3: 6, 1: 8, 4: 5}
[11]:
next(benzene.get_mapping(m8))  # mapping of benzene into m8, which is also benzene
[11]:
{5: 8, 4: 9, 9: 4, 8: 5, 7: 6, 6: 7}
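To make the generator behaviour and the renumbering step more concrete, the following sketch enumerates all mappings of a small fragment onto a target and then renumbers the fragment with one of them. iter_mappings and remap are hypothetical helper functions written for this illustration (the real remap is a container method); bond orders are ignored, and the atom numbers are invented.

```python
def iter_mappings(q_atoms, q_bonds, t_atoms, t_bonds):
    """Lazily yield dicts that map query atom numbers onto target atom numbers."""
    order = list(q_atoms)
    target_pairs = {frozenset(b) for b in t_bonds}

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        n = order[len(mapping)]  # next query atom to place
        for m in t_atoms:
            if m in mapping.values() or q_atoms[n] != t_atoms[m]:
                continue
            mapping[n] = m
            # every query bond with both ends placed must exist in the target
            if all(frozenset((mapping[a], mapping[b])) in target_pairs
                   for a, b in q_bonds if a in mapping and b in mapping):
                yield from extend(mapping)
            del mapping[n]

    yield from extend({})


def remap(atoms, bonds, mapping):
    """Renumber a toy structure according to an old -> new atom number dict."""
    return ({mapping[n]: element for n, element in atoms.items()},
            [(mapping[a], mapping[b]) for a, b in bonds])


# O-C-O fragment and the heavy-atom skeleton of acetic acid
fragment_atoms, fragment_bonds = {1: 'C', 2: 'O', 3: 'O'}, [(1, 2), (1, 3)]
acid_atoms = {10: 'C', 11: 'C', 12: 'O', 13: 'O'}
acid_bonds = [(10, 11), (11, 12), (11, 13)]

mappings = list(iter_mappings(fragment_atoms, fragment_bonds, acid_atoms, acid_bonds))
for mapping in mappings:  # two mappings: the oxygens can be swapped
    print(mapping)

renumbered_atoms, renumbered_bonds = remap(fragment_atoms, fragment_bonds, mappings[0])
print(renumbered_atoms)   # the fragment now uses the target's atom numbers
```

Because the two oxygens of the fragment are equivalent once bond orders are ignored, the generator yields two mappings, much like the two mappings of m5 into m6 shown above.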

4.2. Reactions

ReactionContainers do not support isomorphism because matching against a whole reaction is ambiguous. However, the individual molecules in a reaction can be matched.

[12]:
try:  # a molecule cannot be matched against a reaction: a TypeError is raised (m6 < r1 returns False instead)
    m6 <= r1
except TypeError:
    print(format_exc())
Traceback (most recent call last):
  File "/tmp/ipykernel_2099236/2041125140.py", line 2, in <module>
    m6 <= r1
  File "/home/valia/miniconda3/envs/cgrtools-master/lib/python3.10/site-packages/CGRtools/algorithms/isomorphism.py", line 52, in __le__
    return self.is_substructure(other)
  File "/home/valia/miniconda3/envs/cgrtools-master/lib/python3.10/site-packages/CGRtools/algorithms/isomorphism.py", line 67, in is_substructure
    next(self.get_mapping(other))
  File "/home/valia/miniconda3/envs/cgrtools-master/lib/python3.10/site-packages/CGRtools/containers/molecule.py", line 451, in get_mapping
    raise TypeError('MoleculeContainer expected')
TypeError: MoleculeContainer expected

[13]:
r1 # see structure in products
[13]:
../_images/tutorial_4_isomorphism_16_0.svg
[14]:
m6 # the substructure used. One can see that they should not match
[14]:
../_images/tutorial_4_isomorphism_17_0.svg
[15]:
any(m6 < m for m in r1.products) # check if any molecule from product side has m6 as substructure
[15]:
True
[16]:
any(m6 <= m for m in r1.reactants) # check if any molecule from reactants side has m6 as substructure or is the same as m6
[16]:
True
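The pattern used in the last two cells (compare the query with each molecule of one side, never with the reaction itself) can be wrapped in a small helper. The sketch below is an assumption-laden illustration: side_contains and ToyReaction are hypothetical names, and the "molecules" are plain frozensets of atom-mapped bond labels, for which Python's built-in subset operators play the role of the substructure operators. A subset test on fixed labels is not a real isomorphism search.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyReaction:
    """Hypothetical reaction record; it defines no comparison operators."""
    reactants: Tuple[frozenset, ...]
    products: Tuple[frozenset, ...]


def side_contains(query, molecules, strict=False):
    """True if any molecule contains query; strict=True excludes equal structures."""
    if strict:
        return any(query < molecule for molecule in molecules)
    return any(query <= molecule for molecule in molecules)


acid = frozenset({'C1-C2', 'C2=O3', 'C2-O4'})
alcohol = frozenset({'C5-C6', 'C6-O7'})
ester = frozenset({'C1-C2', 'C2=O3', 'C2-O7', 'C6-O7', 'C5-C6'})
reaction = ToyReaction(reactants=(acid, alcohol), products=(ester,))
carbonyl = frozenset({'C2=O3'})

try:
    carbonyl <= reaction  # comparing with the whole reaction is not supported
except TypeError as error:
    print('TypeError:', error)

print(side_contains(acid, reaction.reactants))                  # True: acid itself is a reactant
print(side_contains(acid, reaction.reactants, strict=True))     # False: no reactant properly contains it
print(side_contains(carbonyl, reaction.products, strict=True))  # True
```

The helper relies only on the `<` and `<=` operators, so it does not depend on how the molecule objects implement them.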

4.3. CGR

Substructure search is also possible with CGRContainer. The API is the same as for molecules.

A CGR can be matched only against another CGR.

In CGR isomorphism, atoms are equal if they have the same charge, radical state, and isotope number in both the reactant and the product states.

[17]:
cgr1
[17]:
../_images/tutorial_4_isomorphism_21_0.svg
[18]:
cgr_q = cgr1.substructure([10, 4])
cgr_q  # prepare a CGR query with a breaking carbon-nitrogen bond
[18]:
../_images/tutorial_4_isomorphism_22_0.svg
[19]:
cgr_q <= cgr1  # found substructure!
[19]:
True
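The idea of comparing atoms and bonds in two states at once can be sketched without CGRtools. In the toy model below, a dynamic bond is a pair (order before, order after), and a dynamic atom carries its properties for both states. ToyDynamicAtom, compose, and changed_bonds are hypothetical names chosen for this illustration, and the atom numbers are invented; this is not how CGRContainer is implemented.

```python
from typing import NamedTuple, Optional


class ToyDynamicAtom(NamedTuple):
    """Hypothetical atom record holding reactant- and product-state properties."""
    element: str
    isotope: Optional[int]
    charge: int          # charge in the reactant state
    is_radical: bool     # radical flag in the reactant state
    p_charge: int        # charge in the product state
    p_is_radical: bool   # radical flag in the product state


def compose(reactant_bonds, product_bonds):
    """Merge two bond tables into dynamic bonds: pair -> (order before, order after)."""
    pairs = set(reactant_bonds) | set(product_bonds)
    return {pair: (reactant_bonds.get(pair), product_bonds.get(pair)) for pair in pairs}


def changed_bonds(dynamic_bonds):
    """Keep only the bonds whose order differs between the two states."""
    return {pair: orders for pair, orders in dynamic_bonds.items() if orders[0] != orders[1]}


# Toy example: the bond between atoms 1 and 2 exists before and is absent after.
before = {frozenset((1, 2)): 1, frozenset((1, 3)): 1}
after = {frozenset((1, 3)): 1}
dynamic = compose(before, after)
print(changed_bonds(dynamic))  # only the broken 1-2 bond remains

# Atoms are equal only if both states agree (tuple equality compares every field).
unchanged_n = ToyDynamicAtom('N', None, 0, False, 0, False)
charged_n = ToyDynamicAtom('N', None, 0, False, 1, False)
print(unchanged_n == ToyDynamicAtom('N', None, 0, False, 0, False))  # True
print(unchanged_n == charged_n)  # False: the product-state charge differs
```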

4.4. Queries

Queries (QueryContainer, QueryCGRContainer) are special objects which take into account the neighbors and hybridization state of atoms in molecules or CGRs.

Queries can be generated from molecules or CGRs by the substructure method with the as_query argument, or by union with a QueryContainer object.

[20]:
carb_q = m10.substructure([5,7,2], as_query=True)
carb_q # notice that one of the oxygen atoms has 2 neighbors. Only an ester can satisfy this restriction.
[20]:
../_images/tutorial_4_isomorphism_25_0.svg
[21]:
m9 # acid
[21]:
../_images/tutorial_4_isomorphism_26_0.svg
[22]:
m10 # ester
[22]:
../_images/tutorial_4_isomorphism_27_0.svg
[23]:
carb
[23]:
../_images/tutorial_4_isomorphism_28_0.svg

Molecule isomorphism does not take neighbors and hybridization into account.

[24]:
carb < m9 # carb is a molecule, so it fits this molecule as well.
[24]:
True
[25]:
carb < m10 # carb is a substructure of m10
[25]:
True

One needs to convert the molecule into a QueryContainer object. In this case, the number of neighbors and the hybridization will be taken into account during the substructure search.

[26]:
carb_q < m9 # now neighbors and hybridization are taken into account.
[26]:
False

Acid m9 has a hydroxyl group whose oxygen has only one non-hydrogen neighbor. Our query requires an oxygen atom with two neighbors.

[27]:
carb_q < m10 # the ester matches the query.
[27]:
True
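The difference between plain molecule matching and query matching can be reproduced in a few lines of pure Python. The sketch below is a simplified model under stated assumptions: as_query and matches are hypothetical helpers, only heavy atoms are stored, bond orders and hybridization are ignored, and the neighbor count recorded in the query must be matched exactly.

```python
from itertools import permutations


def neighbor_count(bonds, n):
    """Number of bonds in which atom n takes part."""
    return sum(n in bond for bond in bonds)


def as_query(atoms, bonds, selected):
    """Cut a fragment and remember how many neighbors each atom had in the source."""
    q_atoms = {n: (atoms[n], neighbor_count(bonds, n)) for n in selected}
    q_bonds = [b for b in bonds if set(b) <= set(selected)]
    return q_atoms, q_bonds


def matches(q_atoms, q_bonds, atoms, bonds, check_neighbors):
    """Brute-force search for one mapping of the query onto the target."""
    target_pairs = {frozenset(b) for b in bonds}
    own = list(q_atoms)
    for image in permutations(atoms, len(own)):
        mapping = dict(zip(own, image))
        atoms_ok = all(
            q_atoms[n][0] == atoms[m]
            and (not check_neighbors or q_atoms[n][1] == neighbor_count(bonds, m))
            for n, m in mapping.items())
        if atoms_ok and all(frozenset((mapping[a], mapping[b])) in target_pairs
                            for a, b in q_bonds):
            return True
    return False


# Heavy-atom skeletons: acetic acid and methyl acetate (hydrogens are implicit).
acid_atoms = {1: 'C', 2: 'C', 3: 'O', 4: 'O'}
acid_bonds = [(1, 2), (2, 3), (2, 4)]
ester_atoms = {1: 'C', 2: 'C', 3: 'O', 4: 'O', 5: 'C'}
ester_bonds = [(1, 2), (2, 3), (2, 4), (4, 5)]

# The fragment is cut from the ester, so oxygen 4 is recorded with two neighbors.
q_atoms, q_bonds = as_query(ester_atoms, ester_bonds, [2, 3, 4])

print(matches(q_atoms, q_bonds, acid_atoms, acid_bonds, check_neighbors=False))   # True
print(matches(q_atoms, q_bonds, acid_atoms, acid_bonds, check_neighbors=True))    # False
print(matches(q_atoms, q_bonds, ester_atoms, ester_bonds, check_neighbors=True))  # True
```

With the neighbor check switched off, the fragment fits the acid as well, which mirrors the carb < m9 result; with the check on, only the ester satisfies the oxygen atom that needs two neighbors.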