Large, chemically diverse dataset of logP measurements for benchmarking studies.
Lipophilicity is a crucial parameter in drug development because it affects both the ADME properties and the target affinity of drug candidates. In the early stages of drug discovery, accurate tools for logP prediction are highly desirable. Many calculation methods have been developed to aid pharmaceutical scientists in drug research; however, almost all suffer from insufficient accuracy and from variable performance in several regions of the chemical space associated with new chemical entities. The low predictive power of existing software packages can be explained by the limited availability and/or variable quality of the experimental logP values in the training sets used: these values stem from differing protocols and cover chemical space poorly. In this study, a set of 1000 diverse test compounds was selected out of 4.5 million; logP values of 759 purchasable compounds from this set (46% non-ionizable, 30% basic, 17% acidic, 0.5% zwitterionic and 6.5% ampholytes) were determined experimentally by UHPLC with UV detection or, when necessary, MS detection. As a result, a collection of 707 validated logP values ranging from 0.30 to 7.50 is now available for benchmarking existing approaches, and for developing new ones, to predict octanol/water partition coefficients of chemical compounds.