This article discusses some common molecular file formats, including their usage and how to convert between them. It also lists a few sources of freely available chemical data on the Internet. Chemical information is usually provided as files or streams, and many formats have been created, with varying degrees of documentation. The format can be identified in three ways (see the Chemical MIME section).
Sources of Chemical Data
Here is a short list of sources of freely available molecular data. Many more resources are available on the Internet than are listed here. Links to these sources are given in the references below.
Chemical Markup Language
Chemical Markup Language (CML) is an open standard for representing molecular and other chemical data. The open source project includes an XML Schema, source code for parsing and working with CML data, and an active community. The articles Tools for Working with Chemical Markup Language and XML for Chemistry and Biosciences discuss CML in more detail. CML data files are accepted by many tools, including JChemPaint, Jmol, XDrawChem and MarvinView.
Protein Data Bank Format
The Protein Data Bank (PDB) format is commonly used for proteins, but it can be used for other types of molecules as well. It was originally designed as a fixed-column-width format and thus officially has a built-in maximum number of atoms (the atom serial number field is five columns wide, allowing at most 99,999 atoms); however, many tools can read files that exceed the limit. Some PDB files contain an optional section (CONECT records) describing atom connectivity as well as positions. Because these files are sometimes used to describe macromolecular assemblies or molecules represented in explicit solvent, they can grow very large and are often compressed. Some tools, such as Jmol, can read gzipped PDB files. The PDB maintains the specifications of the PDB file format and of its XML alternative, PDBML. The typical file extension for a PDB file is .pdb, although some older files use .ent or .brk. Some molecular modeling tools write nonstandard PDB-style files that adapt the basic format to their own needs.
GROMACS format
The GROMACS file format family was created for use with the molecular simulation software package GROMACS. It closely resembles the PDB format but was designed for storing output from molecular dynamics simulations, so it allows additional numerical precision and optionally retains information about particle velocity as well as position at a given point in the simulation trajectory. It does not store connectivity information, which in GROMACS is obtained from separate molecule and system topology files.
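Both the PDB and GRO formats described above are fixed-column text formats, so reading coordinates mostly comes down to slicing each line at fixed offsets. The following Python sketch assumes the column layout of the PDB format version 3.x (ATOM/HETATM records, coordinates in angstroms) and the default three-decimal GRO layout (title line, atom-count line, one atom per line, box vectors last, coordinates in nm). The function names read_pdb_atoms, read_gro_atoms and open_text are hypothetical; files that exceed the PDB atom limit or use extended GRO precision need a full library.

```python
import gzip


def read_pdb_atoms(lines):
    """Return (name, x, y, z) for each ATOM/HETATM record of a PDB file.

    Fixed PDB columns (1-based): atom name 13-16, x 31-38, y 39-46,
    z 47-54; coordinates are in angstroms.
    """
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            name = line[12:16].strip()
            x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
            atoms.append((name, x, y, z))
    return atoms


def read_gro_atoms(lines):
    """Return (name, x, y, z) for each atom of a single GRO frame.

    Line 1 is a title and line 2 the atom count; each atom line then has
    the atom name in columns 11-15 and x/y/z in 21-28, 29-36, 37-44
    (1-based, in nm). Optional velocities and the final box-vector line
    are ignored.
    """
    lines = list(lines)
    natoms = int(lines[1])
    atoms = []
    for line in lines[2:2 + natoms]:
        name = line[10:15].strip()
        x, y, z = (float(line[i:i + 8]) for i in (20, 28, 36))
        atoms.append((name, x, y, z))
    return atoms


def open_text(path):
    """Open a possibly gzip-compressed file (e.g. "1abc.pdb.gz") as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Typical use would be `with open_text("protein.pdb.gz") as fh: atoms = read_pdb_atoms(fh)`, where the file name is only an example. Slicing by column rather than splitting on whitespace matters because fixed-width fields may run together when values are large.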
The typical file extension for a GROMACS file is .gro.
CHARMM format
The CHARMM molecular dynamics package can read and write a number of standard chemical and biochemical file formats; however, the CARD (coordinate) and PSF (protein structure file) formats are largely unique to CHARMM. The CARD format is fixed-column-width, resembles the PDB format, and is used exclusively for storing atomic coordinates. The PSF file contains atomic connectivity information (which describes atomic bonds) and is required before beginning a simulation. The typical file extensions are .crd and .psf, respectively.
Ghemical file format
The Ghemical software can use OpenBabel to import and export a number of file formats. By default, however, it uses the GPR format. This file is composed of several parts, separated by tags (!Header, !Info, !Atoms, !Bonds, !Coord, !PartialCharges and !End). The proposed MIME type for this format is application/x-ghemical.
SYBYL Line Notation
SYBYL Line Notation (SLN) is a chemical line notation. Based on SMILES, it incorporates a complete syntax for specifying relative stereochemistry. SLN has a rich query syntax that allows the specification of Markush queries. The syntax also supports the specification of combinatorial libraries.
Example SLNs
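SMILES
The Simplified Molecular Input Line Entry Specification (SMILES) is a line notation for molecules. SMILES strings include connectivity but do not include 2D or 3D coordinates. Hydrogen atoms are usually not written explicitly; they are implied by the normal valences of the other atoms. Atoms of the so-called organic subset are represented by their element symbols B, C, N, O, F, P, S, Cl, Br, and I; other atoms, and charged atoms, are written in square brackets. The symbol "=" represents double bonds and "#" represents triple bonds. Branches are enclosed in parentheses. Ring closures are indicated by pairs of matching digits. For example, ethanol is CCO, acetic acid is CC(=O)O, and cyclohexane is C1CCCCC1.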
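The rules above (plain element symbols, "=" and "#" bonds, parenthesized branches, digit ring closures) are enough to recover the connectivity of simple SMILES strings. The following Python sketch is deliberately minimal: it assumes bracket-free, non-aromatic SMILES with single-digit ring closures, and it ignores stereochemistry, charges and implicit hydrogens. The function parse_smiles is hypothetical; real work should use a toolkit such as OpenBabel.

```python
import re

# Element symbols from the list above; "Cl" and "Br" come first so that
# "Cl" is not read as "C" followed by an unknown "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[=#()]|\d")


def parse_smiles(smiles):
    """Parse a bracket-free, non-aromatic SMILES string.

    Returns (atoms, bonds): atoms is a list of element symbols and bonds a
    list of (i, j, order) tuples with 0-based atom indices.
    """
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES feature in " + repr(smiles))
    atoms, bonds = [], []
    branch_stack = []  # atom to return to when ")" closes a branch
    open_rings = {}    # ring-closure digit -> (atom index, bond order)
    prev = None        # atom that the next atom will bond to
    order = 1          # order of the next bond: 1 (default), 2 (=) or 3 (#)
    for tok in tokens:
        if tok == "=":
            order = 2
        elif tok == "#":
            order = 3
        elif tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok.isdigit():
            if tok in open_rings:
                # Second occurrence closes the ring. A bond symbol may be
                # written at either end; keep the higher order given.
                start, start_order = open_rings.pop(tok)
                bonds.append((start, prev, max(order, start_order)))
            else:
                open_rings[tok] = (prev, order)
            order = 1
        else:
            atoms.append(tok)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev = len(atoms) - 1
            order = 1
    if branch_stack or open_rings:
        raise ValueError("unclosed branch or ring in " + repr(smiles))
    return atoms, bonds
```

For example, `parse_smiles("CCO")` returns `(['C', 'C', 'O'], [(0, 1, 1), (1, 2, 1)])`. The branch stack is what lets CC(=O)O attach both oxygens to the second carbon.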
XYZ
The XYZ file format is a simple format that usually gives the number of atoms on the first line and a comment on the second, followed by one line per atom with its atomic symbol (or atomic number) and Cartesian coordinates.
MDL number
The MDL number contains a unique identification number for each reaction and variation. The format is RXXXnnnnnnnn: R indicates a reaction, XXX indicates which database contains the reaction record, and the numeric portion, nnnnnnnn, is an 8-digit number.
Other Common Formats
Among the most widely used industry standards are the chemical table file formats, such as the Structure Data Format (SDF). SDF files are text files that adhere to a strict format for representing multiple chemical structure records and associated data fields. The format was originally developed and published by Molecular Design Limited (MDL). MOL is another file format from MDL; it is documented in Chapter 4 of the MDL white paper CTfile Formats. PubChem also provides XML and ASN.1 file formats as export options from its online database. Both are text based as exported, although ASN.1 is more often used as a binary format. Many other formats are listed in the table below.
Converting Between Formats
OpenBabel and JOELib are freely available open source tools specifically designed for converting between file formats. Their chemical expert systems support large atom-type conversion tables. The general form of an OpenBabel command is
    babel -i input_format input_file -o output_format output_file
For example, to convert the file epinephrine.sdf from SDF to CML, use the command
    babel -i sdf epinephrine.sdf -o cml epinephrine.cml
The resulting file is epinephrine.cml. A number of tools intended for viewing and editing molecular structures can read files in several formats and write them out in others. The tools JChemPaint (based on the Chemistry Development Kit), XDrawChem (based on OpenBabel), Chime, and Jmol fit into this category.
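To illustrate the bookkeeping a conversion involves, the following Python sketch turns an SDF file into a multi-frame XYZ file. It assumes V2000 molfile records: three header lines, a counts line whose columns 1-3 give the atom count, an atom block with x, y and z in 10-column fields and the element symbol in columns 32-34, data items introduced by a line such as "> <NAME>", and records terminated by a line reading $$$$. The function names are hypothetical, and bonds are dropped because XYZ cannot store them.

```python
def split_records(sdf_text):
    """Yield each SDF record as a list of lines; records end with '$$$$'."""
    record = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield record  # last record without a closing $$$$


def data_items(record):
    """Collect '> <NAME>' data items; each value runs until a blank line."""
    items, key = {}, None
    for line in record:
        if line.startswith(">") and "<" in line:
            start = line.index("<") + 1
            key = line[start:line.index(">", start)]
            items[key] = []
        elif key is not None:
            if line.strip():
                items[key].append(line)
            else:
                key = None
    return {name: "\n".join(value) for name, value in items.items()}


def sdf_to_xyz(sdf_text, title_field=None):
    """Convert every V2000 record of an SDF file into one XYZ frame.

    Only element symbols and coordinates survive; the comment line is the
    record name or, if title_field is given, the value of that data item.
    """
    frames = []
    for record in split_records(sdf_text):
        counts = record[3]
        if counts.rstrip().endswith("V3000"):
            raise ValueError("V3000 records are not handled by this sketch")
        natoms = int(counts[0:3])  # counts line: atom count in columns 1-3
        comment = record[0].strip()
        if title_field is not None:
            comment = data_items(record).get(title_field, comment)
        lines = [str(natoms), comment]
        for atom in record[4:4 + natoms]:
            # Atom block: x, y, z in 10-column fields, symbol in columns 32-34.
            x, y, z = (float(atom[i:i + 10]) for i in (0, 10, 20))
            symbol = atom[31:34].strip()
            lines.append(f"{symbol:<2} {x:10.4f} {y:10.4f} {z:10.4f}")
        frames.append("\n".join(lines) + "\n")
    return "".join(frames)
```

For a file like epinephrine.sdf, `sdf_to_xyz(open("epinephrine.sdf").read())` would return the XYZ text. Dedicated converters do much more, such as handling V3000 records, aromaticity and atom typing, which is why the tools above are preferable in practice.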
MDL MOL files can be converted to SVG, the graphics format recommended by Wikipedia, with the freeware converter Mol2Svg [1].
The Chemical MIME Project
"Chemical MIME" is a de facto approach for adding MIME types to chemical streams.
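A program typically chooses a reader from a stream's MIME type or, failing that, from the file extension. The Python sketch below registers a few chemical MIME types with the standard mimetypes module and looks them up by file name. The type names are, to the best of our knowledge, those used by the chemical MIME conventions (application/x-ghemical is the type proposed for Ghemical files earlier in this article), and the ".gpr" extension for Ghemical is an assumption; treat the table as illustrative and consult the specification for the authoritative list.

```python
import mimetypes

# Illustrative extension -> chemical MIME type table; check the Chemical MIME
# specification for the authoritative names. ".gpr" for Ghemical is assumed.
CHEMICAL_TYPES = {
    ".pdb": "chemical/x-pdb",
    ".xyz": "chemical/x-xyz",
    ".mol": "chemical/x-mdl-molfile",
    ".sdf": "chemical/x-mdl-sdfile",
    ".cml": "chemical/x-cml",
    ".smi": "chemical/x-daylight-smiles",
    ".gpr": "application/x-ghemical",
}

for ext, mime in CHEMICAL_TYPES.items():
    # Overrides any mapping for these extensions already read from the system.
    mimetypes.add_type(mime, ext)


def chemical_type(filename):
    """Return (MIME type, compression encoding) guessed from a file name."""
    return mimetypes.guess_type(filename)
```

Because mimetypes also recognizes compression suffixes, `chemical_type("1abc.pdb.gz")` returns `("chemical/x-pdb", "gzip")`, which matches the common practice of distributing large PDB files gzipped.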
The definitive specification is at http://www.ch.ic.ac.uk/chemime/, which is updated when major new types appear.
Chemical MIME Support
For Unix/Linux, a tar.gz package is available that registers chemical MIME types on your system. Programs can then register as a viewer, editor or processor for these formats, so that full support for chemical MIME types is available.
chemical-mime-data: http://downloads.sourceforge.net/chemical-mime/
References