2013 Winner: BioCAM: Biological and Chemical Annotation by Mode-of-Action

Project Information
BioCAM: Biological and Chemical Annotation by Mode-of-Action
Physical and Biological Sciences
CHEM 195
Since computers were first integrated into the life sciences and the human genome was initially sequenced, the rate of data acquisition has climbed exponentially. While new technologies have the capacity to produce large amounts of data, analyzing the massive amounts generated poses significant problems. A single dataset gives an excellent view of one aspect of a biological, chemical, or physical system, but is limited by the constraint that analysis of a single feature imposes. For example, genomics provides insight into an organism's genetic potential, but provides no information about gene expression or gene products, while proteomics reports on the expression and physical characteristics of proteins, but provides no data about non-expressed or non-protein-encoding genes. Combining “-omics” (genomics, metabolomics, proteomics, transcriptomics, etc.) datasets is the logical next step and has been extremely successful, ultimately leading to the development of systems biology.

The field of natural products has greatly benefited from these new “big data” high-throughput approaches. Natural products research takes advantage of the compounds that plants, animals, fungi, and bacteria make as defenses and to communicate with one another and their environment. These compounds can be used as drugs because they are bioactive and affect biological systems in a variety of ways. Drugs isolated from these natural sources are responsible for nearly two-thirds of cancer therapeutics and over half of all FDA-approved drugs, but natural products drug development currently suffers from expensive and labor-intensive techniques and limited discovery of new chemistry. Traditional microbial natural products research involves extracting all of the compounds from the microorganism, screening the extracts with a binary hit/no-hit screen, and isolating and characterizing the active compound(s). This approach is very slow, involving lengthy chromatography steps, screening, re-screening, and structure elucidation, and is hampered by repeated identification of previously discovered chemistry. High-throughput and more diagnostic screening platforms have streamlined the process of choosing extracts for follow-up analysis, but they have not sped up the annotation of extracts containing novel chemistry. Some de-replication platforms for the annotation of known compounds in extracts have been published, but they usually involve manual analysis. In the Linington Lab, we created a computational system that compares data from a screen that reports on the bioactivity of an extract with data from an analysis that reports on all the compounds within the extract, in order to quickly identify the bioactive compound(s).

Cytological Profiling (CP) is a biological screen that not only identifies compounds that are bioactive in HeLa cancer cells, but also gives each compound a 'fingerprint' that describes how it affects the cells. If we group extracts with similar CP fingerprints, we often find they contain the same bioactive compound(s), but it is difficult to pick out the pertinent compound(s) from the extract. Using Ultra High-Performance Liquid Chromatography coupled to High Resolution Mass Spectrometry (UPLC-HRMS), we can record physical properties of each compound in an extract. By combining CP and UPLC-HRMS data, we can look for compounds that are present in extracts that have similar CP fingerprints. If a compound is present in all the extracts within a CP cluster, it is likely a bioactive compound. Cytological Profiling also gives a detailed fingerprint showing how that compound affects cells at the primary screening stage, with very little follow-up analysis. By knowing the bioactive compounds present in all of the extracts, we can further separate CP mode-of-action clusters into subclusters of extracts with similar modes-of-action and similar chemistry.
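The core idea of looking for compounds shared by every extract in a CP cluster can be illustrated with a minimal Python sketch. This is not the BioCAM implementation: the function name, the `min_fraction` parameter, and the extract and compound identifiers are all hypothetical, and the sketch assumes each UPLC-HRMS feature has already been reduced to a single compound label per extract.

```python
from collections import Counter


def shared_compounds(cluster_extracts, extract_features, min_fraction=1.0):
    """Return compound labels found in at least `min_fraction` of a cluster's extracts.

    cluster_extracts: extract IDs grouped by similar CP fingerprint (hypothetical input).
    extract_features: dict mapping extract ID -> set of compound labels (hypothetical input).
    min_fraction: 1.0 requires presence in every extract; lower values are an
        assumed relaxation, not something described in the abstract.
    """
    if not cluster_extracts:
        return []
    counts = Counter()
    for extract in cluster_extracts:
        # Use a set so each compound counts once per extract.
        counts.update(set(extract_features[extract]))
    threshold = min_fraction * len(cluster_extracts)
    return sorted(label for label, n in counts.items() if n >= threshold)


# Illustrative, made-up data: three extracts in one CP cluster.
extract_features = {
    "ext_A": {"cmpd_1", "cmpd_2", "cmpd_3"},
    "ext_B": {"cmpd_2", "cmpd_3", "cmpd_4"},
    "ext_C": {"cmpd_3", "cmpd_5"},
}
cp_cluster = ["ext_A", "ext_B", "ext_C"]

candidates = shared_compounds(cp_cluster, extract_features)
```

With this toy data only `cmpd_3` occurs in all three extracts, so it is the sole candidate for the cluster's bioactive compound. A real pipeline would also need to match features across extracts by mass and retention time within some tolerance, which this sketch leaves out.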

In this thesis, I outline the later stages of development of Cytological Profiling and its incorporation into BioCAM: Biological and Chemical Annotation by Mode-of-Action. BioCAM was designed to de-replicate our natural products microbial extract library in an automated and high-throughput manner and to mine functional information about the extracts and the compounds within them. This is the first instance of an automated platform to identify the bioactive constituents (both known and unknown) of complex mixtures. We have successfully used the platform to de-replicate over 100 extracts and identify several possible novel compounds. Although we used this platform with CP to identify potential cancer therapeutics, it could easily be used with any other screening process. In the future, we plan to use another screening process, BioMAP (antiBiotic Mode of Action Profiler), with our BioCAM platform to identify potential antibiotics in order to combat the rising number of multidrug-resistant bacterial strains. We can also use this platform to annotate microbial genomes or study the poorly understood mechanisms behind regulation of bacterial drug production. Many fields outside of natural products could benefit from combining two datasets, including studies of pathways and genes involved in cancer and stem cell differentiation.
Students
  • Emerson Glassey (Merrill)
Mentors