The Chemical Thesaurus
Reaction Chemistry Database


A review of an earlier version of The Chemical Thesaurus appeared in 2001 that perfectly sums up everything that The Chemical Thesaurus reaction chemistry database project is trying to achieve, and we could not have put it better ourselves:

"The Chemical Thesaurus is a reaction chemistry information system that extends traditional references by providing hyperlinks between related information. The program goes a long way toward meeting its ambitious goal of creating a nonlinear reference for reaction information. With its built-in connections, organizing themes, and multiple ways to sort and view data, The Chemical Thesaurus is much greater than the sum of the data in its database.

"The program does an excellent job of removing the artificial barriers between different subdisciplinary areas of chemistry by presenting a unified vision of inorganic and organic reaction chemistry."

K.R. Cousins, JACS, 123, 35, pp 8645-6 (2001)



OK, tell me more... Why is the software called "The Chemical Thesaurus"?

The word thesaurus means storehouse, and The Chemical Thesaurus is a storehouse of information about chemical species [entities] and chemical reactions, interactions and processes.

Also, the application behaves rather like the thesaurus built into our word processors that allows us to jump from word-to-word by meaning:

The Chemical Thesaurus allows us to jump from chemical to chemical via the associated interactions, reactions and processes. For example, it is possible to click through the industrial synthesis of nylon-6,6:

And, with the Chemical Thesaurus it is possible to click back to find out how nylon-6,6 is made.

The Chemical Thesaurus is a reaction chemistry explorer.


How do I use the software?

Just click around... Explore... Discover...

That said, this website is designed for people with an interest in chemistry and The Chemical Thesaurus may be a bit perplexing to non-scientists.

Briefly, The Chemical Thesaurus reaction chemistry database & web application consists of just seven inter-linked screens:

• The Main Index screen
• List Chemical Entities (plus an associated/expanded sorts & finds page)
• Chemical Entities data page
• List Interactions, Reactions & Processes
• Interactions, Reactions & Processes data page
• List Mechanisms & Collections
• Mechanisms & Collections data page

The thing to remember is that a particular interaction, reaction or process can be found in either of two ways: by searching for the chemical entities that take part as substrates, reagents, solvents, catalysts, products or by-products, or by how the interaction, reaction or process is classified by mechanism or collection.
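The two routes to a reaction can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the web application: the `Reaction` class, the two finder functions and the classification labels are all hypothetical, and the two sample reactions are ordinary textbook examples.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    """Hypothetical record: one interaction, reaction or process."""
    name: str
    participants: dict  # entity name -> role (substrate, reagent, catalyst, ...)
    classifications: set = field(default_factory=set)  # mechanisms or collections


# Two textbook reactions used purely as sample data.
REACTIONS = [
    Reaction(
        name="Haber process",
        participants={
            "nitrogen": "substrate",
            "hydrogen": "reagent",
            "iron": "catalyst",
            "ammonia": "product",
        },
        classifications={"inorganic industrial chemistry"},
    ),
    Reaction(
        name="Fischer esterification of acetic acid",
        participants={
            "acetic acid": "substrate",
            "ethanol": "reagent",
            "sulfuric acid": "catalyst",
            "ethyl acetate": "product",
            "water": "by-product",
        },
        classifications={"organic functional group chemistry"},
    ),
]


def find_by_entity(reactions, entity, role=None):
    """Route 1: reactions in which an entity takes part, optionally in a given role."""
    return [
        r for r in reactions
        if entity in r.participants
        and (role is None or r.participants[entity] == role)
    ]


def find_by_classification(reactions, label):
    """Route 2: reactions filed under a given mechanism or collection."""
    return [r for r in reactions if label in r.classifications]
```

For instance, `find_by_entity(REACTIONS, "ammonia", role="product")` and `find_by_classification(REACTIONS, "inorganic industrial chemistry")` both lead to the Haber process, one via a participating entity and one via the classification.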


Older, Stand-Alone [non-web] Version of The Chemical Thesaurus


Chemistry: The Study of Matter and its Changes

Chemistry is often described as the study of matter and its changes. This is crucial because the relational database schema that underlies The Chemical Thesaurus – the very architecture of the application – is explicitly designed in terms of matter and the changes that occur to matter.

Matter is considered in terms of chemical entities.

Changes to matter are considered in terms of the interactions, reactions and/or processes of defined chemical entities.

The term chemical entity is used because it is inclusive and can be used to group together all objects of chemical interest including: atoms, isotopes, molecular substances & discrete molecules, photons, metals, alloys, ionic salts, network materials, electrons, ions, radicals, reactive intermediates, generic species such as nucleophile, and even specialist apparatus like the Dean & Stark trap.

No other term is so general:

The sodium ion, Na+, is a chemical species but not a substance or a material.
Diamond is a material and it is a substance, but not a species.
Aldehydes and nucleophiles are hypothetical, generic objects.

The Dean & Stark trap is glassware.

The classification of matter into the various types of chemical entity used in The Chemical Thesaurus is discussed in detail in the Chemogenesis web book.

A particular chemical entity may have one name or several synonyms. For example, the compound CH3I is commonly called both methyl iodide and iodomethane, and both names appear in the synonyms database.
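A minimal sketch of how synonyms can point at a single entity is given below. It uses Python's built-in sqlite3 module as a stand-in relational engine, and the table names, column names and `lookup` helper are assumptions made for illustration; they are not the actual schema of The Chemical Thesaurus.

```python
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    id      INTEGER PRIMARY KEY,
    formula TEXT NOT NULL
);
-- Hypothetical synonyms table: many names may refer to one entity.
CREATE TABLE synonym (
    name      TEXT PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entity(id)
);
INSERT INTO entity (id, formula) VALUES (1, 'CH3I');
INSERT INTO synonym (name, entity_id)
VALUES ('methyl iodide', 1), ('iodomethane', 1);
""")


def lookup(name):
    """Return (entity id, formula) for any known name, or None if unknown."""
    return conn.execute(
        "SELECT e.id, e.formula "
        "FROM entity AS e JOIN synonym AS s ON s.entity_id = e.id "
        "WHERE s.name = ?",
        (name,),
    ).fetchone()
```

Looking up either "methyl iodide" or "iodomethane" returns the same CH3I record, which is the behaviour a synonyms table is there to provide.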

All chemical changes can be described by chemical equations:

  • 2 H2 + O2 –> 2 H2O
  • crude oil –> methane, propane, butane...
  • A + B –> C

The reaction equation is a powerful metaphor able to describe processes from elementary particle interactions to biochemistry.

Reaction equations can be balanced in terms of numbers of entities, mass, enthalpy, entropy and Gibbs free energy, or they may be unbalanced.
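As an illustration of one of these senses of "balanced", the sketch below checks only that the atoms of each element are conserved. It says nothing about enthalpy, entropy or Gibbs free energy. The function names are hypothetical, and the tiny formula parser assumes simple formulas without brackets or charges.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple formula such as 'H2O' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_counts(side):
    """Total atoms on one side, given a list of (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for symbol, n in atom_counts(formula).items():
            total[symbol] += coefficient * n
    return total


def is_balanced(reactants, products):
    """True if every element has the same atom count on both sides."""
    return side_counts(reactants) == side_counts(products)
```

With this sketch, `is_balanced([(2, "H2"), (1, "O2")], [(2, "H2O")])` is true, while the unbalanced form `is_balanced([(1, "H2"), (1, "O2")], [(1, "H2O")])` is false because one oxygen atom is unaccounted for.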

Hypothetical interactions and processes can be described.

Both physical changes and chemical changes can be modelled by chemical reaction equations.

Actually, there is no theoretical or clear-cut separation between "physical" and "chemical" change, although the distinction may sometimes be useful with beginning science students. Technically, all material changes are changes in phase space.



The RDBMS Engine

The Chemical Thesaurus reaction chemistry database on the web uses the MySQL relational database engine to serve several relationally linked database tables. There are several entry points into the table system:


What Data is Included?

It is not possible to add all chemistry to any one database, so the decision has been made to fill The Chemical Thesaurus reaction chemistry database with as much simple, fundamental and important reaction chemistry as possible. The policy has been to scatter wide rather than to pile deep. That said, it is hoped that the database now contains all of the reaction chemistry knowledge that a chemistry major would be expected to be familiar with, specialist modules excepted.

Although the database contains some references to the primary and secondary literature, the data is mainly textbook level. But this should be seen as a strength rather than a weakness because The Chemical Thesaurus reaction chemistry database, in tandem with the Chemogenesis web book, attempts to describe and model reaction chemistry space from the ground up. Currently, The Chemical Thesaurus reaction chemistry database holds information on:

  • quarks, leptons & selected hadrons
  • the proton, neutron & electrons
  • isotopes, atoms, atomic ions
  • nucleosynthesis & radioactive decay series
  • simple molecules & molecular ions
  • VSEPR geometries
  • main group chemistry
  • inorganic industrial chemistry
  • organic industrial chemistry
  • organic functional groups & FG reaction chemistry
  • reaction mechanisms
  • Lewis acids, Lewis bases & Lewis acid/base complexes
  • redox agents, radicals, diradicals, photochemistry
  • pericyclic processes
  • Brønsted acids & conjugate bases
  • material types, polymers, minerals, alloys
  • explosives, flame chemistry
  • selected natural products, common pharmaceuticals & their classes
  • and more...

The Chemical Thesaurus holds sample data on:

  • organic chemistry of real species
  • synthetic routes
  • transition metal chemistry
  • organometallic chemistry
  • biochemistry

These are truly vast areas of human knowledge and comprehensive coverage is totally outside the scope of the current iteration of this project. Detailed information about the chemistry of real species is held in the primary literature, a resource that consists of more than a hundred scientific journals, plus various academic and commercial chemistry databases:

  • CAS (the Chemical Abstracts Service) has a database of 27 million substances (as of Jan 2006), and about 4000 more are added per day, which is more than a million a year
  • Beilstein: 9 million organic substances
  • Gmelin: 2 million inorganic compounds

What are all these "(generic)" entities?

Chemistry is commonly discussed in terms of hypothetical species with ideal behaviour, with real species assigned to these ideal, generic species. Consider the statement:

"Acetaldehyde and propanal are aldehydes."

Acetaldehyde and propanal are real chemical entities, while the hypothetical aldehyde is an idealised generic species.

The term "Markush structure" or "Markush group" is sometimes used for a generic species, particularly in the patent literature.

This logic is formalised and developed in The Chemical Thesaurus. This is possible because the reaction chemistry database can hold information about any type of chemical object:

chemical reagents
molecular ions
reactive intermediates
and generic species such as: aldehyde (generic)

Moreover, the software allows the user to jump between real species and their associated generic species.

For example, acetic acid is a carboxylic acid and clicking on the Carboxylic acid (generic) link will jump to a page where all of the carboxylic acids in the database are listed.

Don't worry, it is much easier to do with a click of the mouse than it is to explain in words! But you may have been wondering what all the references to "generic" were. Generic species are always listed with (generic) after the name to avoid confusion.
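For readers who prefer code to words, the jump between real and generic species can be pictured as a two-way lookup. The sketch below is hypothetical: the `GENERIC_OF` table and both helper functions are invented for illustration, and formic acid is added as a second carboxylic acid only to make the listing more interesting.

```python
# Hypothetical mapping: real species -> the generic species they belong to.
GENERIC_OF = {
    "acetic acid": ["carboxylic acid (generic)"],
    "formic acid": ["carboxylic acid (generic)"],
    "acetaldehyde": ["aldehyde (generic)"],
    "propanal": ["aldehyde (generic)"],
}


def generics_of(species):
    """Jump from a real species to its generic species."""
    return GENERIC_OF.get(species, [])


def members_of(generic):
    """Jump from a generic species to every real species assigned to it."""
    return sorted(s for s, generics in GENERIC_OF.items() if generic in generics)
```

Here `generics_of("acetic acid")` gives the carboxylic acid (generic) entry, and `members_of("carboxylic acid (generic)")` lists acetic acid and formic acid, mirroring the click from a data page to the generic listing and back.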

A great deal of chemical education involves understanding the chemistry of generic species, and learning how to assign real species to their generic species. This approach is integral to how The Chemical Thesaurus is organised.

Test your knowledge by going to the Chemistry Tutorials & Drills web site.


Retro Synthetic Analysis

Retro Synthetic Analysis (RSA) is a technique employed in advanced synthetic organic chemistry to help design the sequence of reactions leading to a large, multifunctional molecular entity, such as a natural product or pharmaceutical agent.

The idea is to logically find the synthetic building blocks required for construction by "disconnection".

This is achieved by looking for strategic bonds and the potential functional group inter-conversions in a molecule, and then deducing the synthetic entities, or "synthons", required to construct the desired molecule in the lab.

For example, acetic anhydride can be disconnected into an "acetyl cation synthon" and an "acetate ion synthon":

There is no actual reaction in which an acetyl cation reacts with an acetate anion, because both ions require counter ions. However, the RSA approach is conceptually very useful.

RSA deconstruction logic has been extended in The Chemical Thesaurus to main group chemistry. For example, the trivial Na+ plus Cl– reaction to give sodium chloride is shown as a retro synthetic disconnection:


Chemical Naming & Identification Issues

Please note that even simple chemistry can generate naming problems. For example:

The chemistry of elemental sulfur is commonly written in terms of S – as it is here – but the species S does not exist, at least not below 1000°C.

Flowers of sulfur, the common yellow soft crystalline form of the element, is S8. If this species were to be used in the reaction chemistry database all stoichiometries would have to be multiplied by 8 and the numbers would become unnecessarily cumbersome.

The species S1 is invented for the sake of simplicity.
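To see why, here is a small sketch that rewrites an equation written with S so that it uses S8 instead, then clears the resulting fractions. The function name and the (coefficient, species) representation are hypothetical, and the burning of sulfur to sulfur dioxide is used as the worked example.

```python
from fractions import Fraction
from math import lcm  # requires Python 3.9 or later


def rewrite_with_s8(reactants, products):
    """Rewrite an equation that uses 'S' so that it uses 'S8' instead.

    Each side is a list of (coefficient, species) pairs. Every S becomes
    1/8 of an S8, then all coefficients are scaled up to whole numbers.
    """
    def swap(side):
        return [
            (Fraction(c, 8), "S8") if s == "S" else (Fraction(c), s)
            for c, s in side
        ]

    new_reactants, new_products = swap(reactants), swap(products)
    scale = lcm(*(c.denominator for c, _ in new_reactants + new_products))
    return (
        [(int(c * scale), s) for c, s in new_reactants],
        [(int(c * scale), s) for c, s in new_products],
    )
```

Applied to S + O2 –> SO2, that is `rewrite_with_s8([(1, "S"), (1, "O2")], [(1, "SO2")])`, the result is S8 + 8 O2 –> 8 SO2: chemically the same statement, but with every other coefficient multiplied by 8.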

Likewise, there are two types of proton, H+, in the database:

The proton of high energy physics: H+(vacuum).

The proton associated with Brønsted acid reaction chemistry: H+(solvated).

A decision has been made to have separate entries for these two types of proton.

A decision has also been made to have separate entries for minerals and reagent chemicals.

The reason is that few minerals are chemically pure and chemists like composition to be defined within 1% or better. The decision to separate minerals from chemical reagents leads to occasional double entries, such as two entries for gypsum: gypsum the mineral of variable composition and gypsum the pure chemical reagent.

Another problem results from the usual conventions of writing chemical equations: reaction products and by-products are expressed as pure materials even though they seldom are.

For example, an aqueous industrial manufacturing process may produce sulfuric acid in water as a by-product. Clicking on the sulfuric acid icon will transport the user to the concentrated sulfuric acid data page, yet it is not possible (with any energetic efficiency, at least) to convert aqueous sulfuric acid into concentrated sulfuric acid.

Thus, some chemical intelligence is required when navigating through the relationally linked database tables.



Queries, Suggestions, Bugs, Errors, Typos...

If you have any:

Queries
Comments
Suggestions for links or future developments
Bug, typo or grammatical or factual error reports on this page or site,

please contact Mark R. Leach, the author, using mrl@meta-synthesis.com

This free, open access web resource is an ongoing project and your input is appreciated.

© Mark R. Leach 1999-2009