ChemSpider

Version Release: August 21st 2007

User Guide

An Online Chemical Structure Database and Properties Generation Service BETA

ChemZoo, Inc.

Copyright © 2007 ChemZoo, Inc.  All rights reserved.

ChemSpider is a trademark of ChemZoo, Inc.

While all content is believed to be correct at the time of publication, it is provided solely for general-purpose information.  The content is supplied “As-Is”.

ChemSketch is a registered trademark of ACD/Labs Inc. All rights reserved.

 

Information in this document is subject to change without notice and is provided “as is” with no warranty.  ChemZoo, Inc., makes no warranty of any kind with regard to this material. ChemZoo, Inc., shall not be liable for errors contained herein or for any direct, indirect, special, incidental, or consequential damages in connection with the use of this material.


About ChemSpider

ChemSpider is a chemistry search engine. It has been built to aggregate and index chemical structures and their associated information into a single searchable repository and to make it available to everybody at no charge. ChemSpider is a value-added offering because many properties have been added to each chemical structure in the database: structure identifiers such as SMILES, InChI, IUPAC and Index Names, as well as many physicochemical properties. We intend ChemSpider to offer the fastest chemical structure searches available online, delivered with the flexibility and usability necessary to encourage repeat usage.

What problems will ChemSpider solve? There are tens, if not hundreds, of chemical structure databases and no single way to search across them. There are databases of curated literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data and more.

The only way to know whether a specific piece of information is available for a chemical structure is to have simultaneous access to all of these databases. Since many of these databases are commercial, there is no easy way to determine what information they hold, and the same is often true even of the open access databases. With ChemSpider the intention is to aggregate into a single database all chemical structures available within open access and commercial databases and to provide the necessary pointers from the ChemSpider search engine to the information of interest. This service will allow users either to access the data immediately via open access links or to have the information necessary to continue their searches in commercially available systems. The question “is there specific information about my chemical?” will be answered. Accessing the information may require a commercial transaction with the appropriate provider.

About this User Guide

This guide provides an overview of how to use the capabilities at www.chemspider.com. It is designed for either online use or to be printed and used as a hard copy manual.

This user guide is provided in electronic form and is readable in any PDF reader. If you cannot locate a particular topic please do a text string search for the relevant word or phrase.

For More Information…

The ChemSpider website is a work in progress and will be updated on an ongoing basis. The URL is  

http://www.chemspider.com/

Our Web site will likely be accessed at the rate of thousands of hits per day once it gains exposure to the user base.  We will be constantly updating the information on the Web site.  In addition to the search and prediction tools available at the site, you can read our Spinneret webzine, participate on our blog when it is posted and browse the Frequently Asked Questions.

If you would like to stay informed about ChemSpider, please sign up for e-mail broadcasts at:

http://www.chemspider.com/Register.aspx

 

How to Contact Us

We are accessible in numerous ways as detailed at:

http://www.chemspider.com/Contact.aspx

General issues should be directed via email to:

info-at-chemspider-dot-com


1.1     An Overview

ChemSpider is delivered to you via a website and offers numerous tools of general value to chemists. It allows you to search a chemical database containing millions of chemical structures and various associated property information. The searches can be based on alphanumeric text or chemical structure.

Also available are a number of transactional services that either take structural inputs or produce structural outputs.

The services available are:

·       Generation of systematic identifiers – SMILES and InChI strings

·       Conversion of a systematic name, a trivial name, a registry number, a SMILES string or InChI code to a chemical structure

·       The prediction of physicochemical properties based on ACD/Labs algorithms

1.2     Related Software

ChemSpider is integrated with two chemical structure drawing tools: a Structure Drawing Applet and a PC-based drawing package, ACD/ChemSketch. Both of these software components are described in separate documentation and will not be discussed in detail in this manual.

2.1     Objectives

This chapter will familiarize you with

·       Navigating the ChemSpider website;

·       General information on the ChemSpider interface; and

·       Features available within the ChemSpider website.

2.2     Navigating ChemSpider

To access ChemSpider enter www.chemspider.com into the navigation entry box of an Internet Browser. ChemSpider has been rigorously tested on Internet Explorer version 7 and exercised on Mozilla/Firefox. The Home Page offers the route by which to navigate across the majority of the website. The Navigation interface is illustrated below with descriptive annotations.

ChemSpider Home

·       About ChemSpider – Provides an overview of ChemSpider and keeps you updated with news and views

·       Search – This is the search interface to allow access to the ChemSpider database

·       Predictions – Provides online access to structure-based predictions of systematic identifiers and physicochemical properties

·       Help and FAQ – Access to manuals and frequently asked questions. This page will be updated on an ongoing basis

·       Contact us – Contact information for ChemSpider

·       Spinneret Webzine – Weaving the Chemical Web one molecule at a time (the writings of David Bradley)

 

2.2 Search

The ChemSpider search screen illustrated below allows searching according to identifier, properties and calculated properties. Each will be explained in detail.

2.2.1     Search by Identifier

The Identifier Search allows searching of the database by five structure identifiers. These are systematic names, trivial names, registry numbers of various types, SMILES codes and InChI strings. These are defined in more detail below. If the “Match Full Name” box is checked then the search will find the exact string only. If the box is not checked then a search for the term “oxazole” will find all chemical names containing that substring.
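The effect of the “Match Full Name” box can be illustrated with a minimal sketch. This is not the ChemSpider implementation; the function name, the sample names and the case-insensitive comparison are assumptions made purely for illustration.

```python
def name_matches(query: str, name: str, match_full_name: bool) -> bool:
    """Return True if a stored name satisfies the query.

    Assumption: comparison ignores case and surrounding whitespace.
    """
    q = query.strip().lower()
    n = name.strip().lower()
    # Box checked: the whole string must match. Box unchecked: substring match.
    return q == n if match_full_name else q in n


# Hypothetical names standing in for database records.
names = ["oxazole", "benzoxazole", "1,3-oxazole-4-carboxylic acid", "pyridine"]

exact_hits = [n for n in names if name_matches("oxazole", n, True)]
substring_hits = [n for n in names if name_matches("oxazole", n, False)]

print(exact_hits)
print(substring_hits)
```

With the box checked only the name “oxazole” itself is returned; with the box unchecked every name containing that substring is returned.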

 

Identifier Type

Systematic Name – this includes two primary types of systematic nomenclature – IUPAC names and Index (CAS-type) names.

Trivial Name – this allows searching by one or more trivial or trade names associated with a chemical structure.

Registry Numbers – search on various registry numbers associated with chemical structures.

SMILES strings – the SMILES strings listed in the database are those generated using the ACD/Labs software.

InChI strings – InChI strings are canonicalized by default. Since all InChI strings are generated using the InChI DLLs provided by IUPAC, searching should require only a simple copy and paste into the search box.

 

Some example identifiers for the same molecule are shown below:

Systematic names: 8-chloro-1-methyl-6-phenyl-4H-[1,2,4]triazolo[4,3-a][1,4]benzodiazepine

Trivial names: Xanax

Registry numbers: 249-349-2 (EINECS Number), 28981-97-7

SMILES codes: Clc2cc3/C(=N\Cc1nnc(C)n1c3cc2)c4ccccc4

InChI strings: InChI=1/C17H13ClN4/c1-11-20-21-16-10-19-17(12-5-3-2-4-6-12)14-9-13(18)7-8-15(14)22(11)16/h2-9H,10H2,1H3
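Before pasting an identifier into the search box it can be useful to check its form. The sketch below is an illustration only and is not part of ChemSpider: it recognizes InChI strings by their “InChI=” prefix and validates CAS-style registry numbers with the standard CAS check-digit rule (each digit before the check digit is weighted by its position counted from the right, and the sum modulo 10 must equal the check digit). The function names are hypothetical.

```python
import re

# CAS-style registry number: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit_ok(number: str) -> bool:
    """Validate the check digit of a CAS-style registry number."""
    match = CAS_PATTERN.match(number.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the rightmost digit by 1, the next by 2, and so on.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def guess_identifier_type(text: str) -> str:
    """Rough classification of a search string (illustrative heuristic)."""
    s = text.strip()
    if s.startswith("InChI="):
        return "InChI"
    if cas_check_digit_ok(s):
        return "CAS-style registry number"
    # Names, SMILES and other registry formats are not distinguished here.
    return "other"


print(guess_identifier_type("28981-97-7"))
print(guess_identifier_type("InChI=1/C17H13ClN4/c1-11-20-21-16-10-19-17"))
print(guess_identifier_type("249-349-2"))
```

The EINECS number in the example above does not follow the CAS pattern, so this sketch reports it as “other”; a fuller classifier would need a separate rule for each registry format.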

2.2.2     Search by Properties

The Search by Properties option provides the ability to search by one or more properties intrinsic to the molecule. At BETA release, formula searches match the exact molecular formula only. Molecular weight (Mw) and the various mass values can be searched with a “defect” (a tolerance window around the target value), a common search method for analytical scientists. Further details are given in the table below.

 

 

Property Type

Empirical Formula – A molecular formula is a concise way of expressing information about the atoms that constitute a particular chemical compound. For molecular compounds, it identifies each constituent element by its chemical symbol and indicates the number of atoms of each element found in each discrete molecule of that compound.

Molecular Weight – The molecular weight (Mw) of a substance is the mass of one molecule of that substance, relative to the unified atomic mass unit u (equal to 1/12 the mass of one atom of carbon-12).

Nominal Mass – The Nominal Mass (Mn) of a molecule is the sum of the integer-rounded masses of the most abundant isotope of each element forming the molecule.

Average Mass – The Average Mass (Mav) of a molecule is the sum of the average atomic weights of the elements from which it is composed.

Monoisotopic Mass – The Monoisotopic Mass (Mmi) of a molecule is the sum of the exact masses of the most abundant naturally occurring stable isotope of each of its atoms.

 

A common search for a scientist, especially a mass spectrometrist, is a search on a monoisotopic mass with a defect. For example, 121.4536 +/- 0.0002 would search for a mass of 121.4536 with a defect of 0.0002, that is, for masses between 121.4534 and 121.4538.
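The mass definitions above, and the idea of searching a mass within a defect, can be made concrete with a short sketch. It is illustrative only: the element table holds rounded standard values for a handful of elements, the formula parser handles simple formulae without brackets, and all function names are assumptions rather than ChemSpider features. The formula used is that of the example molecule in Section 2.2.1.

```python
import re

# symbol: (monoisotopic mass, average atomic weight, nominal mass)
# Rounded standard values; only a few elements are listed for illustration.
ELEMENTS = {
    "H": (1.007825, 1.008, 1),
    "C": (12.000000, 12.011, 12),
    "N": (14.003074, 14.007, 14),
    "O": (15.994915, 15.999, 16),
    "Cl": (34.968853, 35.45, 35),
}


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C17H13ClN4' (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def masses(formula: str) -> tuple:
    """Return (monoisotopic, average, nominal) mass for a formula."""
    counts = parse_formula(formula)
    mono = sum(ELEMENTS[s][0] * n for s, n in counts.items())
    average = sum(ELEMENTS[s][1] * n for s, n in counts.items())
    nominal = sum(ELEMENTS[s][2] * n for s, n in counts.items())
    return mono, average, nominal


def within_defect(value: float, target: float, defect: float) -> bool:
    """True if value lies in the window target +/- defect."""
    return abs(value - target) <= defect


mono, average, nominal = masses("C17H13ClN4")
print(round(mono, 4), round(average, 2), nominal)
print(within_defect(mono, 308.083, 0.001))
```

A search “with a defect” then amounts to keeping every record for which `within_defect` is true for the chosen mass type.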

2.2.3     Search by Calculated Properties and Results Display

The Search by Calculated Properties allows the user to search by one or more calculated properties for the molecule. All properties provided in release version 1 of ChemSpider use algorithms provided as part of a collaboration with Advanced Chemistry Development (ACD/Labs). The search capabilities allow searches by the range of a particular property, and usage should be intuitive. Simply enter the range of the property to be searched and click on the Search button. The search below demonstrates a search on all molecules with a calculated value of logP between 3 and 4 (as predicted using ACD/LogP batch software).

 

 

The Results are shown initially in a tabular display known as the Brief Display mode.

Alternative displays include the “More Details” mode and the Single Record mode, both shown below. The Single Record mode shows the chemical structure, the systematic names as generated by ACD/Labs nomenclature software, Trivial Names and Registry Numbers if available, SMILES and InChI systematic identifiers and intrinsic properties of the molecule.

Notice the shuttle buttons below the single record, which allow scrolling through all hits. From left to right the four shuttle buttons are:

·       BACK TO FIRST RECORD

·       BACK ONE RECORD

·       FORWARD ONE RECORD

·       FORWARD TO LAST RECORD

Notice that it is possible to select either a Structure format for the Single Record Mode or a Properties format. The Properties format displays a series of predicted properties generated using the ACD/Labs PhysChem predictors. The Properties format is shown below.

 

Notice that at the top of the page is a button saying Show Search Form. Selecting this button will display the form containing the original search criteria. Similarly, Hide Search Form will remove the Search Form from the display in order to make review of the results page easier.

2.2.4     Searching Multiple Properties Simultaneously

It is possible to search on multiple properties simultaneously as shown below.

The search above is a simultaneous search of molecular weight, logP, Rule of 5 Failures, number of hydrogen bond acceptors, number of hydrogen bond donors, number of freely rotatable bonds (FRB) and polar surface area (PSA). Notice that in front of each property is a check box, and all are selected by default.

This search produces five hits, two of which are shown below.

The search can be expanded simply by unchecking certain properties. For example, removing the PSA constraint by unchecking its box provides 9 hits. Removing both PSA and FRB from the search gives 21 hits. This type of Boolean AND search provides a powerful way to constrain the search.
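The Boolean AND behavior described in this section can be sketched as follows. Everything here is hypothetical: the records, property keys and ranges are invented for illustration and do not reproduce the hit counts quoted above. The Rule of 5 failure count uses the commonly cited Lipinski thresholds (molecular weight over 500, logP over 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors); whether ChemSpider counts failures in exactly this way is an assumption.

```python
def rule_of_5_failures(mw: float, logp: float, hbd: int, hba: int) -> int:
    """Count violations of the commonly cited Lipinski thresholds."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])


def search(records: list, ranges: dict, checked: set) -> list:
    """Keep records that satisfy every checked range (Boolean AND)."""
    return [
        rec for rec in records
        if all(ranges[p][0] <= rec[p] <= ranges[p][1] for p in checked)
    ]


# Hypothetical records standing in for database entries.
records = [
    {"name": "A", "mw": 310.0, "logp": 3.2, "hbd": 1, "hba": 4, "psa": 40.0, "frb": 2},
    {"name": "B", "mw": 320.0, "logp": 3.8, "hbd": 2, "hba": 5, "psa": 95.0, "frb": 3},
    {"name": "C", "mw": 305.0, "logp": 3.5, "hbd": 0, "hba": 3, "psa": 120.0, "frb": 9},
    {"name": "D", "mw": 480.0, "logp": 3.1, "hbd": 1, "hba": 6, "psa": 45.0, "frb": 1},
]
for rec in records:
    rec["ro5"] = rule_of_5_failures(rec["mw"], rec["logp"], rec["hbd"], rec["hba"])

# (low, high) range for each searchable property.
ranges = {
    "mw": (300, 350), "logp": (3, 4), "ro5": (0, 0),
    "psa": (0, 60), "frb": (0, 5),
}

all_checked = set(ranges)
hits_all = search(records, ranges, all_checked)
hits_no_psa = search(records, ranges, all_checked - {"psa"})
hits_no_psa_frb = search(records, ranges, all_checked - {"psa", "frb"})

print([r["name"] for r in hits_all])
print([r["name"] for r in hits_no_psa])
print([r["name"] for r in hits_no_psa_frb])
```

Each unchecked property removes one condition from the AND, so the hit list can only stay the same or grow as boxes are unchecked.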

2.3     Services Menu

The services menu provides tools allowing

·         generation of a structure from a name or identifier

·         drawing a chemical structure as an input to property generation algorithms

·         prediction of various structure identifiers and physicochemical properties using ACD/Labs algorithms

2.3.1     Generate Structure

The default view when entering the Generate Properties window is shown below.

To generate a structure from a systematic name, trivial name, registry number, SMILES or InChI code simply type or paste it into the text box and click on the button. Depending on whether the Image or Applet check boxes are selected the structure will be displayed as an image or inside a structure drawing applet as shown below.

 

IMAGE DISPLAY          APPLET DISPLAY

To draw a structure before generating properties, simply select the applet and sketch the chemical structure in it. For details regarding use of the applet, consult the online manual or the reference guide.

Once the structure is drawn it can be saved to the desktop as a molfile using the button.

2.3.2     Generate Properties

Once a structure has been drawn or converted from a chemical name or identifier, generating the list of structure identifiers and calculated properties is as simple as clicking on the button. The resulting output is shown below. In some cases the algorithms cannot predict a property for a particular molecule and will return the result “Cannot calculate”, as shown below for Dielectric Constant.

 

2.3.3     Online Structure and Substructure Searching of ChemSpider

In order to search structures online there are a number of workflows that can be followed. These will be outlined separately.

2.3.3.1     Searching Using SMILES or InChI

In order to search ChemSpider using SMILES or InChI strings navigate to the Services page and input the appropriate string and convert to the structure. When the structure is generated use the Search ChemSpider button to search the database. Only the exact structure can be searched in this manner.

2.3.3.2     Searching Using the Name

In order to search ChemSpider using a name navigate to the Services page and input the appropriate name and convert to the structure. When the structure is generated use the Search ChemSpider button to search the database. Only the exact structure can be searched in this manner.

 

2.3.3.3     Searching Using a Structural Input

In order to search ChemSpider using a chemical structure navigate to the Search page and use one of two options to input the structure.

 

A molfile can be loaded from the desktop by using the Browse button to navigate to the appropriate molfile. Once selected use the Load button to load the molfile. The structure will then be displayed as shown below.

 

 

Alternatively select the Applet button and sketch the structure into the structure drawing window.

Select the appropriate radio button to choose either an exact structure or substructure search and then click on the Search button. The result will be displayed in the usual Tabular format by default as shown below. A substructure search of eugenol, shown above, provides a set of structural hits as shown below. The nature of substructure searches will be discussed in further detail in Section 2.4.

 

 

2.3.4     Desktop Structure/Substructure Searching and Property Prediction in ChemSpider

Searching ChemSpider, or predicting properties with ChemSpider, via the ChemSketch interface can be very useful since it can dramatically speed up the process of structure generation. ChemSketch makes structure drawing easy, and it also enables conversion of SMILES strings and InChI strings to structures prior to searching. To convert these identifiers, simply use the Tools>Generate menu item.

 

 

In the commercial version of ChemSketch it is also possible to generate structures from names, but the freeware version only allows structure drawing, structure from SMILES and structure from InChI as shown above. For further details about drawing structures, please refer to the ChemSketch manual.

Once a structure is either drawn or generated, simply click on the ChemSpider button on the top menu bar (for this the websearch add-on for ChemSpider needs to have been installed). The button is shown below.

 

The dialog box shown below will appear and the service of choice may be selected: either search the exact structure against the database, search for the drawn substructure or predict properties for the drawn structure using the ChemSpider Properties Service.

 

 

The results can be shown in either a browser window such as Firefox or Internet Explorer or in a window which pops up over ChemSketch as shown below. Choose the appropriate option as to how you wish results to be displayed. The results of a property prediction are shown below in a window which appears over ChemSketch.

 

 

From this window an operation of interest may be to search the structure in the ChemSpider database.

 

When a substructure search is executed from the ChemSketch interface the results will be displayed in exactly the same format as for a web-based search, as shown below. This search used the structure of aspirin as the basis of the search.

 

 

2.4     The Details of Substructure Searching

Substructure searches in ChemSpider are based on a substructural core, with the flexibility of the search limited to the positions occupied by non-explicit hydrogens. For the structure of eugenol below, the structure has first been expanded to show explicit hydrogens. A substructure search using this molecule as the input will then assume that ANY atom in any hybridization can be attached at the positions marked A.

 

 

Eugenol

Explicit Hydrogens

Any Atom Substitutions
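The general idea of matching a core while leaving some positions open to any atom can be sketched with a small backtracking matcher. This is not the ChemSpider search algorithm. It works on heavy-atom graphs only, ignores bond orders, and models a “closed” position with a hypothetical `locked` flag: a locked query atom must have exactly the same number of neighbours in the hit, while an unlocked atom may carry extra substituents (the A positions). All class and function names are assumptions made for illustration.

```python
class Mol:
    """Minimal heavy-atom graph: element symbols plus bonded index pairs."""

    def __init__(self, atoms, bonds, locked=()):
        self.atoms = list(atoms)
        self.adj = {i: set() for i in range(len(self.atoms))}
        for a, b in bonds:
            self.adj[a].add(b)
            self.adj[b].add(a)
        # Indices of query atoms that may NOT gain extra neighbours.
        self.locked = set(locked)


def has_substructure(query: Mol, target: Mol) -> bool:
    """True if the query graph can be embedded in the target graph."""
    mapping = {}  # query atom index -> target atom index

    def compatible(q: int, t: int) -> bool:
        if query.atoms[q] != target.atoms[t]:
            return False
        if q in query.locked and len(target.adj[t]) != len(query.adj[q]):
            return False
        # Every already-mapped neighbour of q must be bonded to t.
        return all(mapping[n] in target.adj[t] for n in query.adj[q] if n in mapping)

    def extend(q: int) -> bool:
        if q == len(query.atoms):
            return True
        for t in range(len(target.atoms)):
            if t not in mapping.values() and compatible(q, t):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]
        return False

    return extend(0)


# Query: a C-O fragment. Locking the oxygen closes it to further substitution.
query_locked = Mol(["C", "O"], [(0, 1)], locked=[1])
query_open = Mol(["C", "O"], [(0, 1)])

ethanol = Mol(["C", "C", "O"], [(0, 1), (1, 2)])
dimethyl_ether = Mol(["C", "O", "C"], [(0, 1), (1, 2)])

print(has_substructure(query_locked, ethanol))
print(has_substructure(query_locked, dimethyl_ether))
print(has_substructure(query_open, dimethyl_ether))
```

With the oxygen locked, ethanol is a hit but dimethyl ether is not, because its oxygen carries a second substituent; with the oxygen left open, both are hits.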

 

With this basis for substructure searches, some examples of the structures obtained from a substructure search using eugenol as the basis of the search are shown below. In both cases some of the A atoms are removed for ease of comparison, and bonds have been reoriented to more accurately represent the molecule under comparison.

 

Example 1

Example 2