Breiman, L. Machine Learning (2001) 45: 5. doi:10.1023/A:1010933404324
See a relevant random-forest-based paper in the chemical biology field:
With the development of modern supercomputers, ab initio methods seem more feasible than before for application to large systems, while empirical methods can achieve more precise predictions when addressing the problems of a specific project.
Two research papers published in 2002 by Hiroaki Kitano.
Kitano, H. (2002). Systems biology: a brief overview. Science (New York, N.Y.), 295(5560), 1662–4. https://doi.org/10.1126/science.1069492
Kitano, H. (2002). Computational systems biology. Nature, 420(6912), 206–210. https://doi.org/10.1038/nature01254
What is systems biology?
A research article published in Cell Chemical Biology in 2016
Gayvert, K. M., Madhukar, N. S., & Elemento, O. (2016). A Data-Driven Approach to Predicting Successes and Failures of Clinical Trials. Cell Chemical Biology, 23(10), 1294–1301. https://doi.org/10.1016/j.chembiol.2016.07.023
David C. Young noted in 2009 that the side effects of drugs are usually not identified until clinical trials, which may result in a drug failing clinical trials after a large amount of money has already been spent. [https://doi.org/10.1002/9780470451854] Failures in clinical trials due to safety reasons have skyrocketed over the past three decades. How can this obstacle be overcome? The first thing that comes to my mind is Big Data. Data-driven approaches have been used in almost all areas to solve different problems, which is the reason I started to blog about research articles in this cutting-edge area that interest me.
In this specific case, Elemento et al. sought to use a “moneyball” approach, inspired by the effective use of sabermetrics in predicting successful baseball players (I don’t know baseball at all), to predict clinical toxicity, which is highly related to the successes and failures of clinical trials. This approach is called Predicting the Odds of Clinical Trial Outcomes Using Random Forest (PrOCTOR).
This approach is shown in this figure (for a detailed illustration, check the video presented by the author: click here). It integrates chemical properties, drug-likeness measures, and target-based properties of a molecule into a random forest model to predict whether the drug is likely to fail clinical trials for toxicity reasons.
The set of 48 features taken into account in this research is listed in this file (click here).
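To make the idea of feeding several kinds of molecular features into a random forest more concrete, here is a minimal sketch in plain Python. It is not the PrOCTOR implementation: the three features, their values, and the labels are invented for illustration, the function names (`fit_stump`, `fit_forest`, `predict_proba`) are hypothetical, and each tree is simplified to a depth-1 decision stump. It only shows the general mechanism: bootstrap samples, a random subset of features per tree, and a vote across trees.

```python
import random
from collections import Counter

# Hypothetical toy data, NOT taken from the paper. Each row is
# (molecular_weight, drug_likeness_score, target_expression).
# Label 1 = failed for toxicity, 0 = approved.
X = [
    (310.0, 0.82, 1.2), (285.0, 0.75, 0.8), (350.0, 0.68, 1.5), (330.0, 0.90, 0.5),
    (520.0, 0.35, 4.1), (610.0, 0.28, 3.6), (480.0, 0.41, 5.0), (555.0, 0.30, 4.4),
]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:
        # No usable split (e.g. all sampled rows identical): always predict the majority.
        m = majority(y)
        return (0, float("inf"), m, m)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Train stumps on bootstrap samples, each seeing a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest, row):
    """Fraction of stumps that vote 'fails for toxicity' (label 1)."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


forest = fit_forest(X, y)
safe_row = (250.0, 0.95, 0.3)   # invented molecule resembling the approved group
toxic_row = (600.0, 0.20, 4.5)  # invented molecule resembling the failed group
print(predict_proba(forest, safe_row), predict_proba(forest, toxic_row))
```

A real analysis would use full decision trees, many more molecules, and the 48 features from the paper. The output of `predict_proba` here is just a vote fraction across stumps and should not be read as a calibrated probability.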