Pybel

Pybel provides convenience functions and classes that make it simpler to use the Open Babel libraries from Python, especially for file input/output and for accessing the attributes of atoms and molecules. The Atom and Molecule classes used by Pybel can be converted to and from the OBAtom and OBMol used by the openbabel module. These features are discussed in more detail below.

The rationale and technical details behind Pybel are described in O’Boyle et al [omh2008]. To support further development of Pybel, please cite this paper if you use Pybel to obtain results for publication.

Information on the Pybel API can be found at the interactive Python prompt using the help() function. The full API is also listed in the next section (see Pybel API).

To use Pybel, use from openbabel import pybel.

[omh2008]N.M. O’Boyle, C. Morley and G.R. Hutchison. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem. Cent. J. 2008, 2, 5. [Link]

Atoms and Molecules

A Molecule can be created in any of three ways:

  1. From an :obapi:`OBMol`, using Molecule(myOBMol)
  2. By reading from a file (see Input/Output below)
  3. By reading from a string (see Input/Output below)

An Atom be created in two different ways:

  1. From an :obapi:`OBAtom`, using Atom(myOBAtom)
  2. By accessing the atoms attribute of a Molecule

Molecules have the following attributes: atoms, charge, data, dim, energy, exactmass, formula, molwt, spin, sssr, title and unitcell (if crystal data). The atoms attribute provides a list of the Atoms in a Molecule. The data attribute returns a dictionary-like object for accessing and editing the data fields associated with the molecule (technically, it’s a MoleculeData object, but you can use it like it’s a regular dictionary). The unitcell attribute gives access to any unit cell data associated with the molecule (see :obapi:`OBUnitCell`). The remaining attributes correspond directly to attributes of OBMols: e.g. formula is equivalent to :obapi:`OBMol::GetFormula() <OpenBabel::OBMol::GetFormula>`. For more information on what these attributes are, please see the Open Babel C++ documentation for :obapi:`OBMol`.

For example, let’s suppose we have an SD file containing descriptor values in the data fields:

>>> mol = next(readfile("sdf", "calculatedprops.sdf")) # (readfile is described below)
>>> print(mol.molwt)
100.1
>>> print(len(mol.atoms))
16
>>> print(mol.data.keys())
{'Comment': 'Created by CDK', 'NSC': 1, 'Hydrogen Bond Donors': 3,
 'Surface Area': 342.43, .... }
>>> print(mol.data['Hydrogen Bond Donors'])
3
>>> mol.data['Random Value'] = random.randint(0,1000) # Add a descriptor containing noise

Molecules have a write() method that writes a representation of a Molecule to a file or to a string. See Input/Output below. They also have a calcfp() method that calculates a molecular fingerprint. See Fingerprints below.

The draw() method of a Molecule generates 2D coordinates and a 2D depiction of a molecule. It uses the OASA library by Beda Kosata to do this. The default options are to show the image on the screen (show=True), not to write to a file (filename=None), to calculate 2D coordinates (usecoords=False) but not to store them (update=False).

The addh() and removeh() methods allow hydrogens to be added and removed.

If a molecule does not have 3D coordinates, they can be generated using the make3D() method. By default, this includes 50 steps of a geometry optimisation using the MMFF94 forcefield. The list of available forcefields is stored in the forcefields variable. To further optimise the structure, you can use the localopt() method, which by default carries out 500 steps of an optimisation using MMFF94. Note that hydrogens need to be added before calling localopt().

The calcdesc() method of a Molecule returns a dictionary containing descriptor values for LogP, Polar Surface Area (“TPSA”) and Molar Refractivity (“MR”). A list of the available descriptors is contained in the variable descs.

If only one or two descriptor values are required, you can specify the names as follows: calcdesc(["LogP", "TPSA"]). Since the data attribute of a Molecule is also a dictionary, you can easily add the result of calcdesc() to an SD file (for example) as follows:

mol = next(readfile("sdf", "without_desc.sdf"))
descvalues = mol.calcdesc()
# In Python, the update method of a dictionary allows you
# to add the contents of one dictionary to another
mol.data.update(descvalues)
output = Outputfile("sdf", "with_desc.sdf")
output.write(mol)
output.close()

For convenience, a Molecule provides an iterator over its Atoms. This is used as follows:

for atom in myMolecule:
   # do something with atom

Atoms have the following attributes: atomicmass, atomicnum, coords, exactmass, formalcharge, heavyvalence, heterovalence, hyb, idx, implicitvalence, isotope, partialcharge, spin, type, valence, vector. The .coords attribute provides a tuple (x, y, z) of the atom’s coordinates. The remaining attributes are as for the Get methods of :obapi:`OBAtom`.

Input/Output

One of the strengths of Open Babel is the number of chemical file formats that it can handle (see Supported File Formats and Options). Pybel provides a dictionary of the input and output formats in the variables informats and outformats where the keys are the three-letter codes for each format (e.g. pdb) and the values are the descriptions (e.g. Protein Data Bank format).

Pybel greatly simplifies the process of reading and writing molecules to and from strings or files. There are two functions for reading Molecules:

  1. readstring() reads a Molecule from a string
  2. readfile() provides an iterator over the Molecules in a file

Here are some examples of their use. Note in particular the use of next() to access the first (and possibly only) molecule in a file:

>>> mymol = readstring("smi", "CCCC")
>>> print(mymol.molwt)
58
>>> for mymol in readfile("sdf", "largeSDfile.sdf")
... print(mymol.molwt)
>>> singlemol = next(readfile("pdb", "1CRN.pdb"))

If a single molecule is to be written to a molecule or string, the write() method of the Molecule should be used:

  1. mymol.write(format) returns a string
  2. mymol.write(format, filename) writes the Molecule to a file. An optional additional parameter, overwrite, should be set to True if you wish to overwrite an existing file.

For files containing multiple molecules, the Outputfile class should be used instead. This is initialised with a format and filename (and optional overwrite parameter). To write a Molecule to the file, the write() method of the Outputfile is called with the Molecule as a parameter. When all molecules have been written, the close() method of the Outputfile should be called.

Here are some examples of output using the Pybel methods and classes:

>>> print(mymol.write("smi"))
'CCCC'
>>> mymol.write("smi", "outputfile.txt")
>>> largeSDfile = Outputfile("sdf", "multipleSD.sdf")
>>> largeSDfile.write(mymol)
>>> largeSDfile.write(myothermol)
>>> largeSDfile.close()

Fingerprints

A Fingerprint can be created in either of two ways:

  1. From a vector returned by the OpenBabel GetFingerprint() method, using Fingerprint(myvector)
  2. By calling the calcfp() method of a Molecule

The calcfp() method takes an optional argument, fptype, which should be one of the fingerprint types supported by OpenBabel (see Molecular fingerprints and similarity searching). The list of supported fingerprints is stored in the variable fps. If unspecified, the default fingerprint (FP2) is calculated.

Once created, the Fingerprint has two attributes: fp gives the original OpenBabel vector corresponding to the fingerprint, and bits gives a list of the bits that are set.

The Tanimoto coefficient of two Fingerprints can be calculated using the | operator.

Here is an example of its use:

>>> from openbabel import pybel
>>> smiles = ['CCCC', 'CCCN']
>>> mols = [pybel.readstring("smi", x) for x in smiles] # Create a list of two molecules
>>> fps = [x.calcfp() for x in mols] # Calculate their fingerprints
>>> print(fps[0].bits, fps[1].bits)
[261, 385, 671] [83, 261, 349, 671, 907]
>>> print(fps[0] | fps[1]) # Print the Tanimoto coefficient
0.3333

SMARTS matching

Pybel also provides a simplified API to the Open Babel SMARTS pattern matcher. A Smarts object is created, and the findall() method is then used to return a list of the matches to a given Molecule.

Here is an example of its use:

>>> mol = readstring("smi","CCN(CC)CC") # triethylamine
>>> smarts = Smarts("[#6][#6]") # Matches an ethyl group
>>> print(smarts.findall(mol))
[(1, 2), (4, 5), (6, 7)]

Combining Pybel with openbabel.py

It is easy to combine the ease of use of Pybel with the comprehensive coverage of the Open Babel toolkit that openbabel.py provides. Pybel is really a wrapper around openbabel.py, with the result that the OBAtom and OBMol used by openbabel.py can be interconverted to the Atom and Molecule used by Pybel.

The following example shows how to read a molecule from a PDB file using Pybel, and then how to use openbabel.py to add hydrogens. It also illustrates how to find out information on what methods and classes are available, while at the interactive Python prompt.

>>> from openbabel import pybel
>>> mol = next(pybel.readfile("pdb", "1PYB"))
>>> help(mol)
Help on Molecule in module pybel object:
...
 |  Attributes:
 |     atoms, charge, dim, energy, exactmass, flags, formula,
 |     mod, molwt, spin, sssr, title.
...
 |  The original Open Babel molecule can be accessed using the attribute:
 |     OBMol
...
>>> print(len(mol.atoms), mol.molwt)
3430 49315.2
>>> dir(mol.OBMol) # Show the list of methods provided by openbabel.py
['AddAtom', 'AddBond', 'AddConformer', 'AddHydrogens', 'AddPolarHydrogens', ... ]
>>> mol.OBMol.AddHydrogens()
>>> print(len(mol.atoms), mol.molwt)
7244 49406.0

The next example is an extension of one of the openbabel.py examples at the top of this page. It shows how a molecule could be created using openbabel.py, and then written to a file using Pybel:

from openbabel import openbabel, pybel

mol = openbabel.OBMol()
a = mol.NewAtom()
a.SetAtomicNum(6)   # carbon atom
a.SetVector(0.0, 1.0, 2.0) # coordinates
b = mol.NewAtom()
mol.AddBond(1, 2, 1)   # atoms indexed from 1

pybelmol = pybel.Molecule(mol)
pybelmol.write("sdf", "outputfile.sdf")