In silico ADME in drug design – enhancing the impact

Each year the pharmaceutical industry makes thousands of compounds, many of which do not meet the desired efficacy or pharmacokinetic properties, describing the absorption, distribution, metabolism and excretion (ADME) behavior. Parameters such as lipophilicity, solubility and metabolic stability can be measured in high throughput in vitro assays. However, a compound needs to be synthesized in order to be tested. In silico models for these endpoints exist, although with varying quality. Such models can be used before synthesis and, together with a potency estimation, influence the decision to make a compound. In practice, it appears that often only one or two predicted properties are considered prior to synthesis, usually including a prediction of lipophilicity. While it is important to use all information when deciding which compound to make, it is somewhat challenging to combine multiple predictions unambiguously. This work investigates the possibility of combining in silico ADME predictions to define the minimum required potency for a specified human dose with sufficient confidence. Using a set of drug discovery compounds, in silico predictions were utilized to compare the relative ranking based on minimum potency calculation with the outcomes from the selection of lead compounds. The approach was also tested on a set of marketed drugs and the influence of the input parameters investigated.


Introduction
Drug design is a multi-parameter optimization process, with pharmacokinetic properties describing absorption, distribution, metabolism and excretion (ADME) of a compound being important to consider early on. These parameters describe the pharmacokinetics of a compound and are key to determining the dose required for efficacy. Parameters such as lipophilicity, solubility, and metabolic stability can be measured in a high throughput manner in vitro and are, thus, often used as early ADME screens. During lead identification and lead optimization phases a substantial number of compounds is synthesized and characterized in such screening assays for selection towards additional, more costly in vitro and in vivo experiments to be able to finally select a single candidate drug. However, a compound needs to be physically available to be subjected to such assays. Thus, drug industry generates thousands of compounds per month out of which many do not show desirable ADME properties.
In silico models for such ADME endpoints, on the other hand, have been available for a long time, though with varying quality and usability [1][2][3][4][5][6]. Ideally, such models would be used before synthesis and, together with any potency estimation for the specific case, influence the decision to make a compound. Consequent usage of such in silico predictions has the potential to considerably reduce the number of synthesized compounds with inadequate ADME properties. In practice, it seems that most often only a few predicted properties are really considered before synthesis, usually including lipophilicity as important parameter. While it is understood that all available knowledge should be used to select which compound to make, it is not easy to define how the outcome of various predictions can be combined unambiguously. Lately, the use of multi-parameter optimization and scoring tools has been proposed and shown to be of value [7,8]. However, the definition of the scoring functions can be difficult and may be a bit arbitrary. As physiologically meaningful scoring function the predicted dose to man (D2M) has been suggested to utilize as early as possible. Several reports validated early D2M predictions both from preclinical in vivo data [9] as well as from in vitro data combined with in silico predictions [10]. However, using in silico predictions only was not found to be reliable enough, since both potency and pharmacokinetic properties would need to be accurately predicted.
Here we concentrate on the pharmacokinetic properties only and suggest to use in silico ADME models to predict what potency would be required for a specific compound to enable coverage during the whole dosing interval. By reverting the D2M equation and setting the target dose to, for example, 100 mg once daily, the achievable plasma concentration can be calculated from in silico predicted parameters and the minimum required potency ("threshold pIC50") defined as ranking score for virtual compounds. We show how such a ranking score could be used in the discovery setting on the example of a set of 27 compounds leading towards a candidate structure [11]. We also test the concept on a set of known drugs with information about dosing and human pharmacokinetics [9] and compare the results from purely in silico predictions to those from in vitro data and from human in vivo data. We also investigate how the in silico derived parameters influence the outcome of the equations, highlighting concepts to help drug discovery to improve pharmacokinetic properties during design.

Inverse dose-to-man (D2M) prediction
Equations used are standard pharmacokinetic equations [9,12,13] based on a one compartment model considering immediate absorption to estimate the average (C ave , Eq. 1), minimum (C min , Eq. 2) and maximum (C max , Eq. 3) concentration of a drug at steady state. (1) where F is bioavailability (Eq. 4), D po is the daily oral dose, Cl is the plasma clearance estimated using the well-stirred model (Eq. 5),  is the dosing interval, typically set to 24 h, V is the volume of distribution (predicted) and k e is the elimination constant (Eq. 6).
where F abs (fraction absorbed) is estimated from permeability (Eq. 7), F g (fraction escaping gut metabolism) is set to 1 and F h (fraction escaping hepatic metabolism) is estimated from Cl (Eq. 8). (5) with Cl b being blood clearance, Q h the liver blood flow (20 ml/min/kg), CLint in vivo the in vivo intrinsic clearance estimated from in vitro CLint (Eq. 9), and fu b the fraction unbound in blood.
where R b is the blood plasma ratio and set to 1 for neutrals and bases, whereas set to 0.55 for acids and zwitterions.
where fu is fraction unbound in plasma (predicted).
where CF is an empirical correction factor adopted for the simplified regression offset approach (CF=3, see below), and CLint sc is the scaled intrinsic clearance from the human hepatocyte incubation (Eq. 9a).
where CLint HH is the intrinsic clearance in human hepatocytes (predicted), SF HH are the scaling factors (120 million cells per g liver x 1680 g liver weight / 70 kg body weight = 2.88) and fu inc is the fraction unbound in the hepatocyte incubation (predicted).
The minimum potency value, e.g., "threshold pIC50" based on C min , is then calculated as the negative logarithm of the predicted concentration considering the specific fold coverage, typically set to 1 or 3 (Eq. 10).
Simplified regression offset approach -correction factor (CF) The regression offset approach [15] was shown to improve in vitro to in vivo scaling of clearance, especially when experiments where compared from different laboratories (sites) and using different batches of hepatocytes or microsomes. Extensive in-house data analysis (data not shown) indicated that overall the regression slope can be set to 1 and the offset to about 0.5, giving an equivalent level of correct predictions in three species, human, rat and dog. Thus, in vivo CLint can be estimated by multiplying the scaled CLint with a correction factor of 3. Here we adopt this procedure to estimate in vivo CLint directly from in silico predictions.

In silico models
Five in silico models were used as input for the above calculations. Human volume of distribution at steady state, intrinsic Caco2 permeability, human plasma protein binding, human hepatocyte intrinsic clearance, and fraction unbound in the hepatocyte incubation. All but the volume of distribution model used AstraZeneca in-house experimental data. All models are available within AstraZeneca.

Human volume of distribution (V)
Human volume of distribution data for about 700 compounds were collated from literature [16][17][18] and randomly split into a training and test set (n = 544 and 144, respectively). Both acids, bases, neutral and zwitterionic compounds are included in both datasets. It is a random forest model [19] using physicochemical descriptors [20] including ACD log P and log D [21] and clog P [22]. The experimental data is modelled in its logarithmic form. The prediction explains ~70 % of the variance of the test set, and has an error in prediction of 0.4 log units.

Intrinsic Caco2 permeability (P app )
The model is built on in-house data of intrinsic Caco2 permeability, apparent permeability measured in Caco2 cells in presence of a defined transport inhibitor cocktail as described earlier [14], and updated about every sixth month. The present training set consists of >4,000 data points. About 100 simple physicochemical descriptors [20] including ACD log P and log D [21] and clog P [22] are calculated from the compound structures. The data is modelled as log P app using random forest regression [19] as implemented in scikit-learn [23]. The latest temporal test set comprises about 300 compounds and shows an R 2 of 0.4 and a root mean squared error of predictions (RMSEP) of 0.6 (log scale), with about 60 % of the compounds being within 3-fold of the experimental value. The model can be used to distinguish between high and low permeability compounds with classification accuracy above 0.8.

Human plasma protein binding as fraction unbound (fu)
The model is built on in-house data of human plasma protein binding generated using equilibrium dialysis in high throughput assays as described earlier [24][25][26]. It is updated monthly. The present training set consists of almost 90,000 compounds. The data is modelled as log K (= log[(fraction unbound)/(fraction bound)]). The modelling procedure utilizes support vector machines [27] with a linear kernel, signature descriptors [28] and the conformal prediction framework [29] as implemented in cpsign from GenettaSoft [30]. The latest temporal test set comprising 750 compounds shows an R 2 of 0.7 and an RMSEP of 0.4, with about 80 % of the compounds being within 3-fold of the experimental value of fu.

Human hepatocyte intrinsic clearance (CLint HH )
The model is built on in-house data of human hepatocyte intrinsic clearance generated in high throughput assays using incubations of either cryopreserved or fresh human hepatocytes at 37 °C for up to 120 min as described earlier [26,31,32]. It is updated about every sixth month. The present training set contains more than 11,000 compounds and the data is modelled as log CLint. The modelling procedure utilizes support vector machines [27] with a linear kernel, signature descriptors [28] and the conformal prediction framework [29] as implemented in cpsign from GenettaSoft [30]. The latest temporal test set comprising almost 200 compounds shows an R 2 of 0.2 and an RMSEP of 0.4, with ~75 % of the compounds being within 3-fold of the experimental CLint value.

Fraction unbound in the hepatocyte incubation (fu inc )
The model is built on in-house binding data measured in cryopreserved rat hepatocyte incubations as described earlier [33]. The model is updated approximately every sixth month and the present data set contains about 1,700 compounds. The data is modelled as log K (= log[(fraction unbound)/(fraction doi: 10.5599/admet.6.1. 470 19 bound)]). The modelling procedure utilizes support vector machines [27] with a radial basis function kernel, signature descriptors [28] and the conformal prediction framework [29] as implemented in cpsign from GenettaSoft [30]. The latest temporal test set comprising about 200 compounds shows an R 2 of 0.5 and an RMSEP of 0.5, with ~75 % of the compounds being within 3-fold of the experimental fu inc value.

Global Sensitivity Analysis
Global sensitivity analysis was performed by a quasi-Monte Carlo method using the Fourier amplitude sensitivity test (FAST) and Sobol' sensitivity, which are implemented in the Global sensitivity analysis toolbox (GSAT) [34] in MATLAB [35] using 20,000 sample points.

Local Sensitivity Analysis
Local sensitivity analysis was performed by taking Jacobian matrices of the model with respect to the model parameters and other parameter points [36]. These matrices contain actual parameter values which can be examined, or plotted graphically against other parameters, to show which have the greatest effect at each point.
where = ( 1 , … , ) is a vector of parameter points to be examined, = ( 1 , … , ) is a vector of other parameters which are being examined, and x is a scalar output from the model. This was performed in MATLAB [35] using automatic differentiation of the parameters via the myAD toolkit [37].

Data sets
Two data sets were used for the present analysis. Data set 1 was a set of 27 drug discovery compounds exemplifying important compounds leading towards candidate selection [11]. Data set 2 was a set of 21 marketed drugs, for which potency data, dosing information and human pharmacokinetic data had been collated and used to show feasibility of early D2M predictions [9].

Calculation of required minimum potency (threshold pIC50) for a set of drug discovery compounds
As proof of concept a set of drug discovery molecules leading towards a candidate structure [11] was evaluated. Intrinsic clearance in human hepatocytes, fraction unbound in the incubation, fraction unbound in plasma, volume of distribution and intrinsic Caco2 permeability were predicted for all compounds using the present in-house models (see Table 1) and minimum and maximum plasma concentrations for a once daily oral dose of 100 mg calculated using equations 2 and 3. The project aimed at minimum plasma concentrations with 3-fold coverage over potency measured as pIC50 in a whole blood assay. Thus, threshold pIC50 values were calculated from C min,total /3 (see Table 2). Blood plasma ratios for these mainly neutral compounds were considered to be 1. Protein binding was considered in the scaling approach, but threshold pIC50s were based on resulting total plasma concentration to match the potency measurement.
The data indicated that some of the compounds had very short half-life and thus a once daily dose resulted in a high threshold pIC50 to cover potency over 24 hours and a high C max /C min ratio. Using a 10 times higher dose would reduce the threshold pIC50 by one unit, but show the same C max /C min ratio, since the applied model assumes linear kinetics. Considering only compounds with a threshold pIC50 below 9 and a C max /C min ratio below 100 about 12 compounds remained. This set of compounds included compound 15b, the compound finally selected, and three of the remaining four compounds of higher interest for which rat in vivo results were reported. Compound 22, which according to the present analysis was least favourable, was actually found in the rat study to have short half-life and, thus, not suitable to progress.
In summary, it seems that for this data set the threshold pIC50 together with the predicted C max /C min ratio could be used for ranking the compounds and correctly identify those with inferior pharmacokinetics. The results also highlight the danger in deprioritizing compounds just by applying a cut-off for, for example, metabolic stability. Here, three compounds had (predicted) CLint values higher than 20 µl/min/10 6 cells, compounds 12, 14a and 19. For 12, a reference compound, threshold pIC50 was one of the lowest in the whole set based on a predicted blood clearance below 5 ml/min/kg, whereas 14a showed an intermediate threshold pIC50 (8.2) and blood clearance just above 5 ml/min/kg. Only for compound 19 the threshold pIC50 was estimated to be on the high end within the present set, even though the blood clearance was again predicted as just above 5 ml/min/kg. 22, the compound with the highest threshold pIC50 has almost double the blood clearance despite a lower hepatocyte CLint (~ 8 µl/min/10 6 cells). Rank order differences between in vitro (or in silico) CLint and predicted in vivo clearance can be explained by the different binding properties: High binding in the incubation, i.e., low fu inc values, will potentiate the metabolic instability measured in the incubation, whereas high binding in plasma, i.e., low fu, can to some extent mitigate low metabolic stability seen in vitro. Furthermore, minimum concentrations mainly depend on a compound's half-life, determined by clearance and volume of distribution. Thus, volume of distribution is another important factor that again may change the rank order.

Calculation of threshold potency for a set of known drugs
As second test, we investigated the approach for a set of marketed drugs where information on dosing and human pharmacokinetic data had been collated by McGinnity et al. [9]. This dataset was previously used to evaluate the validity of early dose to man predictions [9,10]. Table 3 shows in silico ADME parameters for the 21 drugs predicted with the present models at AstraZeneca. To take into account that these drugs are not necessarily given once daily only, we adopted the dosing regimen suggestion by McGinnity et al. [9] when calculating the minimum required potency, threshold pX (see Table 4). The resulting threshold pX values are below 10 for all but four of the compounds, thus indicating that the approach is able to estimate pharmacokinetic behaviour to some extent. Additionally, the C max /C min ratio was estimated as below 100 for most of the compounds, even though it showed values above 1000 for three of them (ritonavir, bisoprolol and diclofenac). The availability of both in vitro [10] and in vivo [9] data for the compounds prompted us to investigate, to what extent the usage of experimental data would change the results (see Table 5 and Figure 1).  [9], based on their predicted t 1/2 : t 1/2 >8h  24h; t 1/2 4-8 h  12h; t 1/2 2-4h  8h; t 1/2 1-2 h  6h; t 1/2 < 1h  4h; b C min,total : predicted minimum total concentration at steady state; c C min,free : predicted minimum concentration at steady state corrected for plasma protein binding; d C max,total : predicted maximum total concentration at steady state; e threshold pX from C min,free : required minimum pX estimated from free C min for coverage over dosing interval; f C max /C min : predicted ratio between maximum and minimum total concentration at steady state. Table 5. Threshold pX based on a daily dose of 100 mg (1.4 mg/kg), given as 1-6 doses per day, for 21 known drugs [9], derived from in silico data (as above), in vitro data [9,10,14], and actual human in vivo data [9] Drug name Threshold pX from in silico a  Table 4; b threshold pX calculated from in vitro values using the same procedure as in Table 4 (CLint HH and fu values from ref [10], P app values from [9] and [14] as specified, and in-house values for fu inc ; c threshold pX calculated from in vivo values [9] using the same procedure as in Table 4; d P app value from ref [9]; e P app value from ref [14]; f fu inc value from in-house in silico model; g no value (not enough experimental data available for calculation); h in-house value for CLint HH ; i fu value from ref [17]; j P app from in-house in silico model; k in-house value for P app .
Most threshold pX values can be found within the range of 7-9, with a couple of values, especially in vivo derived, below 7 and others, most often in silico derived, above 9. Only one of the compounds for which human in vivo PK data was available showed an in vivo derived threshold pX values above 10, whereas all in vitro derived threshold pX values were actually below 10. The one compound with a higher in vivo data derived value, diclofenac, was earlier recognized as not being properly described by the one compartment PK model employed here [9]. For about a third of the compounds all three values are very close within about one log unit, e.g., carvedilol or nitrendipine. Other compounds have a somewhat higher spread, most often the in vitro value closer to the in vivo derived value, e.g., acebutolol or betaxolol. For a few compounds the in silico derived value is clearly differing from the other two, e.g., bisoprolol, diazepam and ritonavir. To better understand why we find the in silico predictions differ, we investigated how well the in silico and in vitro data can predict human in vivo properties (see Figures 2-4). Note that cyclosporine and desloratidine were excluded from this analysis, since there was not enough in vitro data for the former and no in vivo data for the latter compound. or in vivo (black triangles) data as described in text. Figure 2 shows that in silico data tends to overpredict human clearance. The two compounds on the left, diazepam and ritonavir, are obvious extremes, and their higher estimate of the threshold potency value are most likely related to the clearance misprediction. Note that clearance predictions from in vitro are clearly closer to the line of unity. However, we find two compounds, diclofenac and metoprolol, to be under predicted by more than 3-fold using the present scaling approach. This can also be seen in a more optimistic threshold pX value when compared to in vivo (see Figure 1). Overall, about two thirds of the compounds have in silico predicted clearance values within 3-fold from the experimental in vivo values, and all but two compounds are within 3-fold utilizing in vitro values.  Also the underprediction of half-life from in silico data is mostly due to the clearance prediction, while half-life prediction from in vitro, using the same in silico derived value for distribution volume, is clearly enhanced.
Human fraction absorbed seems to be the most difficult parameter to predict correctly (see Figure 4). While both in silico and in vitro predictions identify the two compounds with F abs below 0.4, compounds with F abs between 0.4 and 0.6 were only recognized in two or three cases from in silico and in vitro data, respectively. However, this failure is not necessarily a concern in the present case, since the relationship between F abs and plasma concentrations is assumed to be linear and a 2-fold change in fraction absorbed will lead to a 2-fold change in plasma concentration and thereby only to a 0.3 log units change for the threshold pX. Actually, Page [10] suggested, that estimation of F abs as either 1 (acids) or 0.5 (all other ion classes) is sufficiently accurate for early D2M predictions.

Influence of basic parameters -intrinsic clearance, fraction unbound in the hepatocyte incubation, plasma protein binding and volume of distribution
Using pharmacokinetic equations assuming a one compartment model both dose and fraction absorbed are linearly related to the derived plasma concentrations and it is easily understood how either of these parameters will influence the threshold pIC50 calculation. The remaining four input parameters, CLint HH , fu inc , fu, and V, on the other hand are more intricately interlinked, especially since the first three also are included in the scaling approach converting in vitro (in silico) CLint to in vivo clearance using the well-stirred liver model (equations 5 and 9). In order to better understand how these parameters influence the outcome, threshold pIC50 values for hypothetical compounds with hepatocyte CLint values varying from 1 to 200 l/min/10 6 cells, values for fraction unbound in the incubation varying from 0.01 to 0.7, values for protein binding (fu) varying from 0.001 to 0.3, and volume of distribution varying from 0.2 to 3 L/kg were calculated (see Figure 5).
As expected, lower intrinsic clearance as measured in hepatocytes leads to a lower threshold pIC50 estimate. However, the sensitivity of this correlation varies both with the fu inc /fu b ratio and the volume of distribution: lower volume of distribution leads to higher sensitivity towards intrinsic clearance. Additionally, higher fu inc /fu b ratios lead to lower sensitivity of the intrinsic clearance as well as of volume of distribution (see Figure 5, panel in upper right corner). Lower V leads to higher threshold pIC50 values and higher fu inc /fu b ratios essentially lead to lower threshold pIC50s, at least as long as potency in blood assays, i.e., defined from total concentrations, is considered. The threshold pIC50 shifts up by three log units in the uppermost panel row (fu = 0.001) when free plasma concentrations are taken into account, whereas the threshold pIC50 in the lower panel row (fu = 0.3) shifts only by about 0.5 log units.

Sensitivity analysis
Sensitivity analysis is a useful tool for determining which parameters are important. Global sensitivity analysis is used to explore the entire parameter space, considering physiologically relevant values to determine the relative importance of each of them. Here it was shown that volume of distribution is the most influential parameter by far (see Figure 6).
Local sensitivity analysis, on the other hand, can define how sensitive the calculation is to a parameter, when another parameter is scanned over its defined range. Here, we checked the influence of the remaining parameters for different values of volume of distribution (see Figure 7). CLint HH and fu have overlapping influence, considering a 10 % change, whereas fu inc is exacty opposite. Since in vivo CLint is multiplied by fu, when calculating Cl b using the well stirred model (Eq. 5) and is calculated from CLint HH divided by fu inc (eqs 9 and 9a) this is not really surprising. Note that the influence of the parameters is highest at low volume of distribution values, decreases quickly and does not change further as soon as V reaches a value of 2 L/kg. Considering typical volume ranges for acids, bases and neutrals with median values of 0.2 L/kg, 2.9 L/kg and 1.3 L/kg, respectively [10,17], as indicated in Figure 7, it is clear that especially for acids a small change in either hepatocyte CLint or binding properties can make a big difference for the prediction of threshold pIC50 determined from C min .

Influence of in silico model quality
The quality of the in silico models needs to be considered for the present analysis. Here we use AstraZeneca's in-house models for the five parameters, two of which have rather extensive training sets with more than 10,000 data points each, gathered over a long period of time (>10 years), whereas two, fu inc and P app , have intermediate training set size, and the fifth, volume of distribution, is based on literature data and uses only about 500 compounds. For the four models employing in-house data, about 70-75 % of the compounds within a temporal test set, i.e., compounds that have not been available when the model was built, were found to be within 3-fold of the experimental value. These results are rather encouraging when considering that the experimental variability in general is assumed to be about 2-fold [38]. The models are being regularly updated, as it was shown earlier that continuous updates of models enables new chemistry to be well represented and, thus, likely better predicted by the models [39]. It should be emphasized that the predictivity of the models needs to be investigated for the drug discovery project or chemical series in question when this approach is to be applied. Note that, as was shown in Figure 2, clearance prediction are reasonably accurate from in silico for most compounds in data set 2 and can be directly verified as soon as in vitro data is available for a compound.
The remaining parameter, human distribution volume, is less easily available. The QSAR model used here is based on human literature data and cannot be updated as straightforwardly as models using data from in-house screens. Also, external data is not necessarily relevant to newest in-house chemistry, and model outcome can usually only be verified at later stages since in vivo data is required. Nevertheless, the predictivity was considered good, with an error of prediction to be 0.4 log units (2.5-fold), and scrutinizing literature or databases for new data [40][41][42] in fairly regular intervals should ensure continued high quality of the model.

Conclusions
The present study suggests the usage of in silico ADME predictions to estimate which plasma concentration a new chemical structure may achieve when given orally using a defined dose and dosing regimen. From the plasma concentration a minimum required potency can be deduced, here referred to as threshold potency (or threshold pIC50). While it was postulated, that early dose predictions from in silico was not yet possible [10], the idea here is to utilise predictions only to consider a compound's summarized pharmacokinetic properties for ranking compounds within a series early on in the design process. Combining the in silico predictions to a physiologically meaningful score, reduces the risk of deselecting a compound just because one of the parameters is outside an arbitrarily chosen limit.
Using the approach for a set of drug discovery compounds, it was shown that the threshold pIC50 was able to define which compounds had a higher or lower chance of success, with the finally selected compound in the former set. It was assumed that the approach is most useful in lead optimisation stage, when the requirements for potency coverage are known.
Additionally, it was shown that the threshold potency values calculated for a set of marketed drugs were in a reasonable range for most of the compounds, when appropriate dosing regimens were considered for each. These results were also compared with the outcome, when experimental in vitro or human in vivo data were considered as input for the same procedure. In silico outcome was in many cases similar to both in vitro and in vivo, and when there were bigger differences usually the in vitro was closer to the in vivo outcome. Thus, in silico prediction can be easily verified by in vitro experiments as soon as a compound is made.