Natural Product-Based Drug Discovery


Natural products have been a major source of medicine throughout history. In ancient Egypt, people already used willow bark to ease pain and fever. Its active constituent, salicin, is structurally related to aspirin. However, extracting or synthesizing natural products is a major obstacle, because modern industry relies on high-throughput screening (HTS) and on large chemical libraries.1 As a result, over the past few decades pharmaceutical companies have tended to de-emphasize natural products and focus more on synthetic small-molecule drugs.2

This raises problems. Demand for novel drugs is growing as the diversity of diseases increases.1 Yet although pharmaceutical R&D spending rises every year, the number of drugs approved annually has not increased (it is even lower than in the 1990s). Failure rates in phase II and phase III trials are high, owing to lack of efficacy or off-target side effects.3

As a result, recent years have seen renewed interest in exploring the unique chemical space of natural products and in designing small molecules that mimic nature’s chemistry.4 Compared with synthetic drug-like small molecules, natural products tend to have more sp3-hybridized bridgehead atoms, more chiral centers, a higher oxygen content but a lower nitrogen content, and more aliphatic than aromatic rings. Rigorous guidelines exist for designing, filtering, and screening potential drug molecules, for example Lipinski’s “rule of five”.5 It restricts drug-like candidates to a narrow range: no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, a molecular mass of no more than 500 Da, and a logP of no more than 5. However, 20% of natural products break these rules,4,6 and many drugs that lie in the bRo5 (beyond rule of five) chemical space still show potential to treat life-threatening diseases (for example, HIV protease inhibitors, anticancer agents, and cardiac stimulants).7 Hence natural products are again providing insights for future drug discovery.
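The rule of five described above is easy to express as a filter. The following is a minimal sketch, assuming the descriptors have already been computed elsewhere; the `Descriptors` class, the function name, and the two example molecules with their values are hypothetical and purely illustrative. A criterion counts as broken when a molecule has more than 5 donors, more than 10 acceptors, a mass above 500 Da, or a logP above 5.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_mass: float  # in Da
    log_p: float


def rule_of_five_violations(d: Descriptors) -> list:
    """Return the names of the rule-of-five criteria that the molecule breaks."""
    checks = {
        "h_bond_donors > 5": d.h_bond_donors > 5,
        "h_bond_acceptors > 10": d.h_bond_acceptors > 10,
        "molecular_mass > 500 Da": d.molecular_mass > 500,
        "log_p > 5": d.log_p > 5,
    }
    return [name for name, broken in checks.items() if broken]


# Made-up descriptor values for illustration only (not measured data).
small_molecule = Descriptors(h_bond_donors=1, h_bond_acceptors=4,
                             molecular_mass=180.2, log_p=1.2)
macrocycle = Descriptors(h_bond_donors=5, h_bond_acceptors=12,
                         molecular_mass=1202.6, log_p=3.0)

print(rule_of_five_violations(small_molecule))  # no criteria broken
print(rule_of_five_violations(macrocycle))      # acceptors and mass broken
```

A molecule like the hypothetical macrocycle would be discarded by a strict filter, which is exactly how a screening library can lose bRo5 compounds that might still be useful.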

Natural Products for Drug Design4

Natural products are a promising starting point for lead discovery. Pseudo-retrosynthetic fragmentation (the RECAP method) of 210,000 natural products yields many fragments that can serve as models in de novo drug design. According to the authors’ analysis of a large compound set drawn from DrugBank, ChEMBL, the Dictionary of Natural Products, and the Traditional Chinese Medicine Database, these natural-product-derived fragments occur frequently in approved drugs and can be regarded as “privileged”.

The obstacles mentioned above can now be addressed by modern chemo- and bioinformatic approaches, especially with computer assistance. As computational technology has developed, techniques such as deep learning and artificial intelligence have emerged that simplify natural product scaffolds, assist drug design from natural-product-derived fragments, and predict the macromolecular targets of the designed molecules. Fig. 1 shows a brief workflow that uses such techniques: various structures are generated from synthetically available building blocks, based on a natural product template (DOGS method). These structures are then ranked by their topological pharmacophore feature similarity to the template molecule (CATS method). After manual simplification, the designed molecules are passed to the SPiDER software for target prediction, which completes the de novo design approach. Of course, thorough in vitro and in vivo assessment is still mandatory, and one drawback is that only previously studied targets can be predicted.

Fig. 1. The molecular design strategy and target prediction algorithm.8,9

New Insights

A recent study published in PNAS offers further insights into natural product-based drug discovery.10 It reports a quantitative analysis of both the number of compounds and compound novelty over the past decades. The results of the study are summarized here.

First, the trends in chemical diversity. Fig. 2 shows the number of new natural products per year and the rate of novel compound isolation as a percentage of total natural product isolation. Novelty is quantified using Tanimoto similarity scores between all molecule pairs (the score is defined as the number of features common to both molecules divided by the total number of unique features; for further discussion, see the Supporting Information10). As the figure shows, the number of new natural products rose dramatically and has held steady for decades, even after pharmaceutical companies had almost left the field. The authors attribute this to the increasing globalization of research. On the other hand, the percentage of new natural products with novel structures has decreased. Despite this declining percentage, further analysis shows that the absolute number of molecules with low similarity to known compounds has remained high in recent years.
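The Tanimoto definition above can be written in a few lines. This is a minimal sketch, assuming each molecule is represented as a plain set of feature labels; real fingerprints are bit vectors computed by cheminformatics software, and the feature names below are made up for illustration.

```python
def tanimoto(features_a: set, features_b: set) -> float:
    """Number of shared features divided by the number of unique features."""
    union = features_a | features_b
    if not union:
        return 0.0  # convention chosen here for two empty feature sets
    return len(features_a & features_b) / len(union)


# Hypothetical feature sets, for illustration only.
known = {"amide", "macrocycle", "phenol", "ester"}
close_analogue = {"amide", "macrocycle", "phenol", "ether"}
new_scaffold = {"polyene", "epoxide", "amide"}

print(tanimoto(known, close_analogue))  # 3 shared / 5 unique features
print(tanimoto(known, new_scaffold))    # 1 shared / 6 unique features
```

Under this kind of scoring, a new compound would count as novel when its highest similarity to any previously published compound stays low.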

Fig. 2. Number of compounds published per year and rate of novel compound isolation as a percentage of total natural product isolation10

Second, the authors show the importance of exploring unusual source organisms or environments by splitting the dataset into subgroups within two major designations. Novel sources remain an important and productive source of added chemical diversity. The subgroup analysis nevertheless shows a similar result: most recently discovered natural products bear structural similarity to previously published compounds.

Third, the study focuses on cyclic tetrapeptides to evaluate the chemical space occupied by natural products. The theoretical structural diversity is far larger than the structural diversity actually observed. Compared with the very large theoretical chemical space available to natural products, only a limited number of key scaffolds are selected again and again.

Finally, the authors turned to another scoring system, Tversky scores (which consider the features unique to A and to B and use a weighting factor to measure how well each compound is a subunit of the other), to determine the distribution of known natural product structures within chemical space. A network diagram shows that known natural product scaffolds occupy only a small fraction of the total available natural product-like chemical space.10 This suggests either that selective pressures in nature drive convergence on certain compound classes, or that not all compounds in the predicted chemical space are accessible.
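The difference from a symmetric score is easiest to see in code. This is a minimal sketch of the general Tversky index on sets of feature labels; the weights `alpha` and `beta` and the feature names are illustrative assumptions, not the settings used in the study.

```python
def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Shared features over shared plus weighted unique features of a and b."""
    shared = len(a & b)
    denominator = shared + alpha * len(a - b) + beta * len(b - a)
    return shared / denominator if denominator else 0.0


# Hypothetical feature sets: the fragment is fully contained in the parent.
fragment = {"amide", "phenol"}
parent = {"amide", "phenol", "macrocycle", "ester"}

# alpha=1, beta=0 penalizes only features unique to the first argument,
# so the score asks "how much of the first molecule is found in the second?"
print(tversky(fragment, parent, alpha=1, beta=0))  # fragment lies inside parent
print(tversky(parent, fragment, alpha=1, beta=0))  # parent is only half covered
print(tversky(fragment, parent, alpha=1, beta=1))  # equal weights give Tanimoto
```

With equal weights of 1 the index reduces to the Tanimoto score, which gives the same value in both directions and so cannot show that one molecule is contained in the other.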

This study provides a comprehensive retrospective analysis of natural products. It reveals the boundaries of the related chemical space and may therefore guide the future design of plausible natural product-like synthetic screening libraries. It also points out the importance of “bottom-up” approaches that uncover the unexpressed genetic potential of microorganisms. There are also some gaps to note. As mentioned in the “Limitations of this Analysis” section,10 plant-derived natural products are not included in the study because no appropriate database was accessible. In addition, the scoring systems themselves, Tanimoto and Tversky scores, have limitations.11 The Tanimoto score yields a single symmetric value for a pair of molecules, so it cannot indicate when a newly discovered compound is a substructure of an existing natural product.


To conclude, natural products, “an old source for new drugs”, have proven to be privileged structures in drug discovery. Fully understanding how natural products are formed and mapping the chemical space they occupy may lead to better use of them.



  1. Morrow, J. K., Tian, L. & Zhang, S. Molecular networks in drug discovery. Crit. Rev. Biomed. Eng. 38, 143–156 (2010).
  2. Koehn, F. E. & Carter, G. T. The evolving role of natural products in drug discovery. Nat. Rev. Drug Discov. 4, 206–220 (2005).
  3. Kola, I. & Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 3, 1–5 (2004).
  4. Rodrigues, T., Reker, D., Schneider, P. & Schneider, G. Counting on natural products for drug design. Nat. Chem. 8, 531–541 (2016).
  5. Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997).
  6. Quinn, R. J. et al. Developing a drug-like natural product library. J. Nat. Prod. 71, 464–468 (2008).
  7. Doak, B. C., Over, B., Giordanetto, F. & Kihlberg, J. Oral druggable space beyond the rule of 5: Insights from drugs and clinical candidates. Chem. Biol. 21, 1115–1142 (2014).
  8. Reker, D., Rodrigues, T., Schneider, P. & Schneider, G. Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus. Proc. Natl. Acad. Sci. 111, 4067–4072 (2014).
  9. Friedrich, L., Rodrigues, T., Neuhaus, C. S., Schneider, P. & Schneider, G. From complex natural products to simple synthetic mimetics by computational de novo design. Angew. Chem. Int. Ed. 55, 6789–6792 (2016).
  10. Pye, C. R., Bertin, M. J., Lokey, R. S., Gerwick, W. H. & Linington, R. G. Retrospective analysis of natural products provides insights for future discovery trends. Proc. Natl. Acad. Sci. 114, 5601–5606 (2017).
  11. Palazzolo, A. M. E., Simons, C. L. W. & Burke, M. D. The natural productome. Proc. Natl. Acad. Sci. 114, 5564–5566 (2017).


Data-Driven Approach to Drug Toxicity Prediction

A research article published in Cell Chemical Biology in 2016

Gayvert, K. M., Madhukar, N. S., & Elemento, O. (2016). A Data-Driven Approach to Predicting Successes and Failures of Clinical Trials. Cell Chemical Biology, 23(10), 1294–1301.

David C. Young mentioned in 2009 that the side effects of drugs are usually not identified until clinical trials, which may result in a drug failing clinical trials after a large amount of money has already been spent. Clinical trial failures for safety reasons have skyrocketed over the past three decades. How can this obstacle be overcome? The first thing that comes to my mind is Big Data. Data-driven approaches have been used in almost every field to solve all kinds of problems, which is why I have started blogging about research articles from this cutting-edge area that interest me.

In this specific case, Elemento et al. sought to use a “moneyball” approach, inspired by the effective use of sabermetrics in predicting successful baseball players (I don’t know baseball at all), to predict clinical toxicity, which is highly related to the successes and failures of clinical trials. This approach is called Predicting the Odds of Clinical Trial Outcomes Using Random Forest (PrOCTOR).

[Figure 1]

This approach is shown in the figure (for a detailed illustration, check the video presented by the author: click here). It integrates chemical properties, drug-likeness measures, and target-based properties of a molecule into a random forest model to predict whether the drug is likely to fail clinical trials for toxicity reasons.
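To get a feel for how a random forest turns such features into a prediction, here is a toy sketch in plain Python. It is not the PrOCTOR implementation: the three features, the eight training rows, and the labels are all made up, and each tree is only a one-split stump. It keeps the two ingredients that define a random forest, a bootstrap sample of the training drugs and a random subset of features for each tree, and lets the trees vote.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the one-feature threshold split with the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Made-up training data: [molecular mass, logP, target expression score].
rows = [[250, 1.0, 0.10], [300, 1.5, 0.20], [280, 2.0, 0.15], [320, 2.5, 0.30],
        [600, 5.5, 0.80], [650, 6.0, 0.90], [700, 5.8, 0.70], [620, 6.5, 0.85]]
labels = ["approved"] * 4 + ["failed_toxicity"] * 4

forest = fit_forest(rows, labels)
print(predict(forest, [200, 0.5, 0.05]))
print(predict(forest, [800, 7.0, 0.95]))
```

In practice one would use an established library and far more data. The point here is only that no single feature decides the outcome: each tree looks at part of the picture, and the forest combines the votes.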

The set of 48 features taken into account in this research is listed in this file (click here).