7. Input-output operations
2019, 2020 Dr. Ramil Nugmanov;
2019 Dr. Timur Madzhidov; Ravil Mukhametgaleev
(c) 2022 Valentina Afonina
For CGRtools installation instructions and the tutorial files, see https://github.com/cimm-kzn/CGRtools
NOTE: Run the tutorial cells sequentially from the start. Running cells out of order will lead to unexpected results.
[1]:
import pkg_resources
if pkg_resources.get_distribution('CGRtools').version.split('.')[:2] != ['4', '1']:
    print('WARNING. Tutorial was tested on 4.1 version of CGRtools')
else:
    print('Welcome!')
Welcome!
[2]:
# load data for tutorial
from pickle import load
from traceback import format_exc
with open('reactions.dat', 'rb') as f:
    reactions = load(f) # list of ReactionContainer objects
r1 = reactions[0] # reaction
cgr2 = ~r1 # CGR built from the reaction
The CGRtools.files subpackage contains file reader and writer classes.
7.1. MDL RDF reader
The RDFRead class reads RDF files. An instance of this class is a file-like object that supports iteration, provides a read() method for parsing all data, and works as a context manager.
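To make the three behaviours concrete, here is a minimal sketch of a class with the same shape: iteration, read() and context manager support. RecordReader is a hypothetical toy that splits text on '$$$$' lines; it is not how CGRtools implements its readers.

```python
from io import StringIO


class RecordReader:
    """Hypothetical reader: yields '$$$$'-terminated text records from a file-like object."""

    def __init__(self, file):
        self._file = file
        self._records = self._parse()  # lazy generator over records

    def _parse(self):
        lines = []
        for line in self._file:
            if line.strip() == '$$$$':  # record delimiter
                yield ''.join(lines)
                lines = []
            else:
                lines.append(line)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._records)

    def read(self):
        """Parse everything not yet consumed into a list."""
        return list(self._records)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._file.close()


with RecordReader(StringIO('first\n$$$$\nsecond\n$$$$\nthird\n$$$$\n')) as reader:
    first = next(reader)  # one record on demand
    rest = reader.read()  # the remaining records as a list
```

Because next() and read() share one generator, read() returns only the records that have not been consumed yet, which matches the usage pattern shown in the next cell.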
7.1.1. Read file from disk
[3]:
from CGRtools.files import * # import all available readers and writers
with RDFRead('example.rdf') as f:
    first = next(f) # get first reaction using generator
    data = f.read() # read remaining reactions to list of ReactionContainers
data = []
with RDFRead('example.rdf') as f:
    for r in f: # looping is supported. Useful for large files.
        data.append(r)
with RDFRead('example.rdf') as f:
    data = [r for r in f] # list comprehension. Result is equivalent to f.read()
OOP-style pathlib paths are supported
[4]:
from pathlib import Path
with RDFRead(Path('example.rdf')) as r: # OOP style call
    r = next(r)
Already opened files are supported
The RDF file should be opened in text mode
[5]:
with open('example.rdf') as f, RDFRead(f) as r:
    r = next(r) # get first reaction from the opened file
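The three input kinds shown above (path string, Path object, opened text file) can be normalised in one place. The sketch below shows one way to do it with the standard library only; open_text_source is a hypothetical helper and makes no claim about how CGRtools handles its inputs internally.

```python
from contextlib import contextmanager
from io import StringIO, TextIOBase
from pathlib import Path
from tempfile import TemporaryDirectory


@contextmanager
def open_text_source(source):
    """Hypothetical helper: yield a text file object for a str path, a Path, or an open file."""
    if isinstance(source, (str, Path)):
        with open(source) as f:  # we opened it, so we close it
            yield f
    elif isinstance(source, TextIOBase):
        yield source  # the caller owns the file and closes it
    else:
        raise TypeError('expected a path or a file opened in text mode')


with TemporaryDirectory() as tmp:
    path = Path(tmp) / 'example.txt'
    path.write_text('data\n')

    results = []
    for source in (str(path), path):  # path string and Path object
        with open_text_source(source) as f:
            results.append(f.read())
    with open(path) as opened, open_text_source(opened) as f:  # already opened file
        results.append(f.read())

with StringIO('data\n') as buffer, open_text_source(buffer) as f:  # in-memory text
    results.append(f.read())
```

In this sketch a file is closed only by whoever opened it, which is a design choice of the example rather than documented library behaviour.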
7.1.2. Transparent loading from archives and network
Readers are designed to transparently support any type of data source.
The page http://seafile.cimm.site/f/e3415b9aae354f4bbfc1/?dl=1 returns an RDF file.
Data sources should be file-like objects.
[6]:
from requests import get
from io import StringIO
# get() returns a response for the requested URL; its text attribute holds the body.
# In this example the text is the whole RDF file stored in a single string.
# RDFRead does not support parsing of strings, but one can emulate files with data
# instead of strings by using io.StringIO
with StringIO(get('http://seafile.cimm.site/f/e3415b9aae354f4bbfc1/?dl=1').text) as f, RDFRead(f) as r:
    r = next(r)
    print(r, 'StringIO downloaded from network data')
# Python supports gzipped data. This example shows how to work with compressed
# data directly without decompressing it to disk.
from gzip import open as gzip_open
with gzip_open('example.rdf.gz', 'rt') as f, RDFRead(f) as r:
    r = next(r)
    print(r, 'gzipped file')
# This example shows how to write data directly to a gzipped file
from gzip import open as gzip_open
with gzip_open('example_to_write.rdf.gz', 'wt') as gzf, RDFWrite(gzf) as out_file:
    for reaction in data:
        out_file.write(reaction)
# zip files are also supported out of the box
# zipped files can be opened only in binary mode. io.TextIOWrapper can be used to decode them transparently into text
from zipfile import ZipFile
from io import TextIOWrapper
with ZipFile('example.zip') as z, z.open('example.rdf') as c:
    with TextIOWrapper(c) as f, RDFRead(f) as r:
        r = next(r)
        print(r, 'zip archive')
# tar file reading example
from tarfile import open as tar_open
from io import TextIOWrapper
with tar_open('example.tar.gz') as t:
    c = t.extractfile('example.rdf')
    with TextIOWrapper(c) as f, RDFRead(f) as r:
        r = next(r)
        print(r, 'gzipped tar archive')
C(C(=O)O)(=O)O.C(O)C.C(O)C>>O(CC)C(C(OCC)=O)=O StringIO downloaded from network data
C(C(=O)O)(=O)O.C(O)C.C(O)C>>O(CC)C(C(OCC)=O)=O gzipped file
C(C(=O)O)(=O)O.C(O)C.C(O)C>>O(CC)C(C(OCC)=O)=O zip archive
C(C(=O)O)(=O)O.C(O)C.C(O)C>>O(CC)C(C(OCC)=O)=O gzipped tar archive
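The wrapping techniques used above can be tried without any files on disk. The following sketch builds gzip, zip and tar.gz archives in memory with the standard library and reads each one back as text. The payload and the member name example.txt are placeholders chosen for this illustration; any reader that accepts a text file-like object could consume the resulting streams in the same way.

```python
import gzip
import tarfile
import zipfile
from io import BytesIO, TextIOWrapper

payload = 'line 1\nline 2\n'

# gzip: text modes 'wt' and 'rt' handle encoding and decoding for us
gz_buffer = BytesIO()
with gzip.open(gz_buffer, 'wt') as f:
    f.write(payload)
gz_buffer.seek(0)
with gzip.open(gz_buffer, 'rt') as f:
    from_gzip = f.read()

# zip: members open in binary mode only, so wrap them in TextIOWrapper
zip_buffer = BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as z:
    z.writestr('example.txt', payload)
with zipfile.ZipFile(zip_buffer) as z, z.open('example.txt') as member:
    with TextIOWrapper(member, encoding='utf-8') as f:
        from_zip = f.read()

# tar.gz: extractfile() also returns a binary stream
tar_buffer = BytesIO()
with tarfile.open(fileobj=tar_buffer, mode='w:gz') as t:
    raw = payload.encode('utf-8')
    info = tarfile.TarInfo('example.txt')
    info.size = len(raw)  # the size must be set before adding the member
    t.addfile(info, BytesIO(raw))
tar_buffer.seek(0)
with tarfile.open(fileobj=tar_buffer, mode='r:gz') as t:
    with TextIOWrapper(t.extractfile('example.txt'), encoding='utf-8') as f:
        from_tar = f.read()
```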
7.2. Other Readers
SDFRead - MOL, SDF files reader (versions v2000, v3000 are supported)
MRVRead - ChemAxon MRV files reader (lxml parser is used)
SMILESRead - reader for files of SMILES strings (coho backend used). Every row should start with a new SMILES
INCHIRead - reader for files of InChI strings (InChI Trust backend used). Every row should start with a new InChI
XYZRead - xyz files reader (only structures with explicit hydrogens supported)
PDBRead - PDB files parser (only structures with explicit hydrogens supported)
MRV files should be opened in binary mode, e.g. open('/path/to/data.mrv', 'rb')
[7]:
with MRVRead(open('example.mrv', 'rb')) as f:
    mrv = next(f)
mrv
[7]:
7.3. File writers
Export to the following file formats is supported:
RDFWrite (v2000) - molecules and reactions export in RDF format
SDFWrite (v2000) - molecules and CGR export in SDF format
MRVWrite - molecules and reactions export in MRV format
Writers have the same API as readers. All writers work with text files. Writers have a write() method that accepts a single reaction, molecule or CGR object as its argument.
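As an illustration of the writer side of this API, the sketch below defines a toy class with write(), close() and context manager support, writing into an in-memory text buffer. RecordWriter and its '$$$$' record delimiter are hypothetical and serve only to show the shape of the interface, not the CGRtools implementation.

```python
from io import StringIO


class RecordWriter:
    """Hypothetical writer: stores each record followed by a '$$$$' delimiter line."""

    def __init__(self, file):
        self._file = file

    def write(self, record):
        self._file.write(f'{record}\n$$$$\n')

    def close(self):
        self._file.flush()  # push buffered text to the underlying file

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()


buffer = StringIO()
with RecordWriter(buffer) as writer:
    for record in ('first', 'second'):
        writer.write(record)
written = buffer.getvalue()
```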
[8]:
with RDFWrite('out.rdf') as f: # context manager supported
    for r in data:
        f.write(r)
# file out.rdf will be overwritten
[9]:
f = RDFWrite('out.rdf') # ongoing writing into a single file
for r in data:
    f.write(r)
f.write(r1)
f.close() # close file. Flushes Python writer buffers.
7.4. CGR can be stored in and loaded from MDL SDF.
The SDF-CGR specification is described in a white paper in the manuscript's Supporting Materials.
[10]:
from CGRtools.files import *
from io import StringIO
with StringIO() as f, SDFWrite(f) as w:
    w.write(cgr2) # file writing in SDF format
    mdl = f.getvalue() # get formatted file to print out
print(mdl) # This is how a CGR looks.
# Notice that most fields are conventional MOL fields; data S-groups (DAT) are used for dynamic bond and atom specification
12 11 0 0 0 0 999 V2000
4.4914 1.4289 0.0000 O 0 0 0 0 0 0 0 0 0 8 0 0
2.4289 0.7145 0.0000 O 0 0 0 0 0 0 0 0 0 10 0 0
1.4289 0.4125 0.0000 O 0 0 0 0 0 0 0 0 0 1 0 0
6.9203 0.4125 0.0000 O 0 5 0 0 0 0 0 0 0 2 0 0
0.0000 0.4125 0.0000 C 0 0 0 0 0 0 0 0 0 3 0 0
0.7144 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 4 0 0
3.6664 1.4289 0.0000 C 0 0 0 0 0 0 0 0 0 5 0 0
3.2539 0.7145 0.0000 C 0 0 0 0 0 0 0 0 0 6 0 0
3.2539 2.1434 0.0000 O 0 0 0 0 0 0 0 0 0 7 0 0
3.6664 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 9 0 0
6.2058 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 11 0 0
5.4914 0.4125 0.0000 C 0 0 0 0 0 0 0 0 0 12 0 0
1 7 8 0 0 0 0
2 8 8 0 0 0 0
3 6 1 0 0 0 0
3 8 8 0 0 0 0
4 11 1 0 0 0 0
4 7 8 0 0 0 0
5 6 1 0 0 0 0
7 8 1 0 0 0 0
7 9 2 0 0 0 0
8 10 2 0 0 0 0
11 12 1 0 0 0 0
M STY 5 1 DAT 2 DAT 3 DAT 4 DAT 5 DAT
M SAL 1 1 4
M SDT 1 dynatom
M SDD 1 0.0000 0.3333 DAU ALL 0 0
M SED 1 c+1
M SAL 2 2 1 7
M SDT 2 dynbond
M SDD 2 0.0000 0.6667 DAU ALL 0 0
M SED 2 1>0
M SAL 3 2 2 8
M SDT 3 dynbond
M SDD 3 0.0000 1.0000 DAU ALL 0 0
M SED 3 1>0
M SAL 4 2 3 8
M SDT 4 dynbond
M SDD 4 0.0000 1.3333 DAU ALL 0 0
M SED 4 0>1
M SAL 5 2 4 7
M SDT 5 dynbond
M SDD 5 0.0000 1.6667 DAU ALL 0 0
M SED 5 0>1
M END
> <CdId>
1872
> <solvent>
3
> <temperature>
129.5
> <tabulated_constant>
-6.87
$$$$
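The S-group lines in the output above can be collected with a few lines of plain Python, which helps to see how each dynamic atom or bond is described. This is only a sketch: parse_dynamic_sgroups is a hypothetical helper that splits on whitespace instead of following the fixed-column layout of the format, and the sample lines are a subset copied from the output above. Values such as 1>0 appear to encode the bond order in reactants followed by the order in products, and c+1 a charge change; treat that reading as an assumption and consult the SDF-CGR specification for the authoritative definition.

```python
from collections import defaultdict

# A subset of the S-group lines from the SDF output above
sgroup_lines = """\
M SAL 1 1 4
M SDT 1 dynatom
M SED 1 c+1
M SAL 2 2 1 7
M SDT 2 dynbond
M SED 2 1>0
M SAL 4 2 3 8
M SDT 4 dynbond
M SED 4 0>1
""".splitlines()


def parse_dynamic_sgroups(lines):
    """Group SAL (atoms), SDT (field name) and SED (value) lines by S-group index."""
    groups = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] != 'M':
            continue  # not an S-group property line
        tag, index = parts[1], int(parts[2])
        if tag == 'SAL':  # parts[3] is the atom count, the atom numbers follow
            count = int(parts[3])
            groups[index]['atoms'] = tuple(int(x) for x in parts[4:4 + count])
        elif tag == 'SDT':
            groups[index]['field'] = parts[3]
        elif tag == 'SED':
            groups[index]['value'] = parts[3]
    return dict(groups)


dynamic = parse_dynamic_sgroups(sgroup_lines)
for index, group in sorted(dynamic.items()):
    print(index, group['field'], group['atoms'], group['value'])
```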
[11]:
with StringIO(mdl) as f, SDFRead(f) as r: # import SDF file with CGR
    cgr3 = next(r)
print(cgr3)
print(type(cgr3))
C(O[.>-]C(C([.>-][O->0]CC)([->.]O)=O)(=O)[->.]O)C
<class 'CGRtools.containers.cgr.CGRContainer'>
7.5. Pickle support
CGRtools containers fully support pickle dumping and loading.
Pickle dumps are more compact than MDL files and can be used as temporary storage.
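For temporary storage, a whole list of objects can go through a single dump and load. The sketch below uses only the standard library; the plain dicts are stand-ins for container objects (an assumption made so that the example runs without CGRtools).

```python
from pickle import dump, load
from tempfile import TemporaryFile

# plain dicts stand in for container objects (an assumption of this sketch)
records = [
    {'name': 'ethanol', 'atoms': ('C', 'C', 'O')},
    {'name': 'water', 'atoms': ('O',)},
]

with TemporaryFile() as f:  # opened in binary mode by default
    dump(records, f)        # store the whole list in one dump
    f.seek(0)               # rewind before reading back
    restored = load(f)
```

The restored list is an equal but independent copy of the original. As with any pickle data, load dumps only from sources you trust.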
[12]:
from pickle import loads, dumps
[13]:
loads(dumps(r1)) # load reaction from Pickle dump
[13]: