Prediction of hERG inhibition of drug discovery compounds using biomimetic HPLC measurements

The major causes of failure of drug discovery compounds in clinics are the lack of efficacy and toxicity. To reduce late-stage failures in the drug discovery process, it is essential to estimate early the probability of adverse effects and potential toxicity. Cardiotoxicity is one of the most often observed problems related to a compound’s inhibition of the hERG channel responsible for the potassium cation flux. Biomimetic HPLC methods can be used for the early screening of a compound’s lipophilicity, protein binding and phospholipid partition. Based on the published hERG pIC50 data of 90 marketed drugs and their measured biomimetic properties, a model has been developed to predict the hERG inhibition using the measured binding of compounds to alpha-1-acid-glycoprotein (AGP) and immobilised artificial membrane (IAM). A representative test set of 16 compounds was carefully selected. The training set, involving the remaining compounds, served to establish the linear model. The mechanistic model supports the hypothesis that compounds have to traverse the cell membrane and bind to the hERG ion channel to cause the inhibition. The AGP and the hERG ion channel show structural similarity, as both bind positively charged compounds with strong shape selectivity. In contrast, a good IAM partition is a prerequisite for cell membrane traversal. For reasons of comparison, a corresponding model was derived by replacing the measured biomimetic properties with calculated physicochemical properties. The model established with the measured biomimetic binding properties proved to be superior and can explain over 70% of the variance of the hERG pIC50 values.


Introduction
It has been recognised that the physicochemical properties of drug candidates can be related to the late-stage attrition of compounds in the drug development process. The early problems with bioavailability and absorption have been successfully addressed by optimising solubility and permeability [1]. Recently, toxicity and the lack of efficacy have been identified as the major causes of compound attrition in clinics. Together, preclinical toxicity and adverse events account for approximately one-third of all attrition cases [2].
Cardiotoxicity is one of the major causes of concern during clinical trials together with liver and central nervous system (CNS) toxicity [3]. It accounts for approximately 27 % of drug development failures, and it does not seem to be restricted to specific high-risk therapeutic areas [4]. One particular focus of cardiovascular adverse effects has been drug-induced arrhythmia or "proarrhythmia" as a consequence of an increased recognition of a relationship between drug-induced QT interval prolongation and Torsades de Pointes (TdP) [5]. TdP is a dangerous type of proarrhythmia, described as a rare ventricular tachycardia with potential sudden cardiac death, which has led to approximately one-third of all drug withdrawals between 1990 and 2006 [4]. Furthermore, 15 % of drugs still on the market can cause QT prolongation, and 4 % are associated with TdP arrhythmia risk. Therefore, it is important to recognize a compound's cardiotoxicity potential early in the drug discovery process, not only because of the associated loss of human life or health, but also because of the enormous financial loss in investment and future revenue potential [6].
The cardiac action potential is regulated by the flow of ionic currents across cardiomyocyte membranes. Many drugs can bind to ion channels, block ionic flow and disrupt the regulation of the action potential [7]. Upon blockade, the action potential lasts longer, which results in an increased duration of the QT interval that can be observed in electrocardiograph (ECG) traces. Disturbing the QT interval may lead to instability in the heart rhythm [8]. Patients with long QT syndrome (LQTS) exhibit a significant predisposition for TdP-type cardiac arrhythmia [9]. A prolongation of the cardiac action potential and the QT interval has been associated with loss of function or drug-trapping inside the central cavity of the Kv11.1 [10] potassium channel, which is encoded by hERG (human Ether-a-go-go Related Gene) and carries the rapid delayed rectifier potassium current (IKr) [7,11]. This channel has a tetrameric structure formed by co-assembly of four identical subunits, each composed of six helical transmembrane domains (denoted S1-S6). The S4 domain contains six positive charges, typical for voltage-gated K+ channels [12]. The channel pore is asymmetrical, and its dimensions change depending on its state (open, closed or inactivated). The hERG channel has been shown to interact with a wide range of drugs owing to the unique shape of the ligand-binding site, its hydrophobic character and the large vestibule of the channel [13,14]. The risk tolerance for QT prolongation may vary significantly depending on the dose and indication of the drug. Documented hERG-blocking activity reduces the value of a molecule, as it increases the risk of clinical failure. It has also been estimated that about 60 % of drugs in development exhibit hERG block [11].
Various attempts have been made to predict the hERG inhibition potential of drugs in silico to avoid the synthesis of risky molecules [15]. When studying therapeutic areas and the safety margins regarding the free therapeutic plasma concentration of drugs [16], it was found that a wide variety of drugs, including antiarrhythmic, antibacterial, antipsychotic and pain-killer drugs, showed potential risk. As toxicity, just like potency, is dose-dependent, it is essential to relate the hERG inhibitory concentration to the drugs' free therapeutic plasma concentration. It was found that a less than 30-fold difference between the therapeutic and inhibitory concentration indicates a high risk. Redfern et al. [16] also investigated the relative value of preclinical cardiac electrophysiology data (in vitro and in vivo) for predicting the risk of TdP in clinical use. In vivo telemetry experiments in non-rodents (typically dogs) are the ultimate preclinical test for cardiotoxicity. However, their high cost severely limits their use at the earlier discovery stage [17]. In vitro voltage-clamp techniques are widely used to provide real-time mechanistic information on ion channels [18]. The experiments are performed in mammalian cells transfected with the gene for hERG. The overwhelming majority of predictive hERG models have been built using mammalian patch clamp data. Techniques such as fluorescence-based assays with cells transfected with hERG and radioligand (typically dofetilide or MK-499) displacement assays [17] have also been successfully used. Since the success of any model building depends on the quality of the biological data, it was important to carefully select reliable and informative cardiotoxicity data for a wide variety of drugs in order to develop a continuous model.
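The 30-fold safety margin mentioned above can be illustrated with a short calculation. The following is a minimal sketch, assuming the margin is taken as the ratio of the hERG IC50 to the free therapeutic plasma concentration in the same units; the function names and the example concentrations are hypothetical and are not taken from Redfern et al. [16].

```python
def herg_safety_margin(herg_ic50_um, free_cmax_um):
    """Ratio of hERG IC50 to free therapeutic plasma concentration (same units)."""
    if herg_ic50_um <= 0 or free_cmax_um <= 0:
        raise ValueError("concentrations must be positive")
    return herg_ic50_um / free_cmax_um


def is_high_risk(herg_ic50_um, free_cmax_um, threshold=30.0):
    """True when the margin falls below the 30-fold threshold quoted in the text."""
    return herg_safety_margin(herg_ic50_um, free_cmax_um) < threshold


# Hypothetical compound: hERG IC50 = 3 uM, free plasma concentration = 0.5 uM
print(herg_safety_margin(3.0, 0.5), is_high_risk(3.0, 0.5))
```

The threshold is kept as a parameter because the acceptable margin may depend on the dose and indication of the drug.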
As the determination of the half-maximal inhibitory concentration (IC50) value requires measurements of inhibitory activities at multiple concentrations, the IC50 information was considered more reliable and was selected over the inhibition-type entries used for positive/negative classification. Therefore, the IC50 values of drugs against hERG and their negative logarithmic values (pIC50) were collected from the literature.
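For readers less familiar with the pIC50 scale, the conversion from IC50 can be sketched as follows. This is a minimal illustration that assumes the usual definition, pIC50 = -log10(IC50 in mol/L); the function name and the unit table are hypothetical helpers.

```python
import math

# Conversion factors from common concentration units to mol/L
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pic50_from_ic50(ic50, unit="uM"):
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])


# An IC50 of 1 uM corresponds to a pIC50 of about 6
print(pic50_from_ic50(1.0, "uM"))
```

On this scale, a compound with a pIC50 greater than 5 has an IC50 below 10 µM.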
Certain physicochemical properties of molecules have been recognized as early indicators of potential problems with early drug discovery compounds [19]. Besides lipophilicity [20], solubility [21] and permeability, biomimetic properties such as protein [22] and phospholipid binding [23] can be measured at the early stages of the drug discovery process [24]. The chromatographic technique provides an automated, high throughput and reliable measurement of important properties of the drug discovery compounds [25] that can be used to estimate later stage in vivo properties of compounds such as the volume of distribution, the unbound volume of distribution [26] and the drug efficiency [27]. Measurements can also estimate cell membrane partition and skin penetration of compounds based on chromatographic principles [28,29]. Various toxicity indicators have already been related to a compound's physicochemical properties, including hERG inhibition and hepatotoxicity [30]. The toxicity potential of compounds has been studied using the immobilised artificial membrane (IAM) chromatography [31]. In this work, several chromatography-based techniques were investigated to search for the properties of the compounds that could be used to predict their toxicity, with special emphasis on cardiotoxicity.
In this study, hERG pIC50 data from a set of 90 diverse marketed drugs from a wide range of therapeutic areas and with different physicochemical properties were correlated with their measured biomimetic properties. The measurement of the biomimetic properties of the available drugs was conducted in our laboratories. Generic gradient HPLC methods were used to determine the Chromatographic Hydrophobicity Index (CHI) [32,33] using mobile phases at three different pH values. The protein binding of the compounds was measured using immobilised human serum albumin (HSA) [22] and alpha-1-acid-glycoprotein (AGP) stationary phases [34]. The phospholipid binding was measured using the immobilised artificial membrane (IAM) stationary phase [23]. The aim was to establish relationships between the cardiotoxicity potential and the biomimetic binding properties of the drugs and to evaluate their predictive performance.

Experimental
The drugs were obtained from Sigma-Aldrich (Merck) and dissolved in dimethylsulfoxide (DMSO) at 10 mM concentration. The 10 µL stock solutions were diluted down to 100 µL before injecting them onto an Agilent 1100 HPLC system.

CHI lipophilicity measurements
The Chromatographic Hydrophobicity Index (CHI) was measured using the compounds' calibrated gradient retention times obtained from an Agilent 1100 HPLC fitted with a Gemini NX-C-18 column (Phenomenex Ltd, Macclesfield, UK) with dimensions of 50 x 3 mm and 5 µm particle size. Mobile phase A was 0.01 M formic acid (pH 2.6), 50 mM ammonium acetate adjusted to pH 7.4, or 50 mM ammonium acetate adjusted to pH 10.5, to determine the lipophilicity of the compounds at acidic, neutral and alkaline pH, respectively. Mobile phase B was 100 % acetonitrile. The flow rate was 1.0 mL/min. A linear acetonitrile gradient from 0 to 100 % was used, reaching 100 % acetonitrile in 3.5 min. The 100 % acetonitrile mobile phase was maintained for an additional 1 min before it was returned to 0 % at 4.7 min. The gradient run cycle time was 6 min, with an additional equilibration time of 1 min before the next injection. The standard deviation of the retention times from repeated injections was ±0.005 min. The retention times of a standard set of compounds listed in Table 1 were used to convert the drug retention times to CHI values. CHI approximates the acetonitrile concentration at which the compound elutes and can be converted to the octanol/water log D scale using CHI log D = 0.0525 × CHI - 1.467 [35].

Table 1. The CHI values of the calibration set of compounds at three pHs [33]. These values were obtained by fitting the isocratically determined CHI values and the gradient retention time values. The standard error ranged from 0.1 to 0.8 CHI values.
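The calibration step described in this section can be sketched in a few lines. The example below assumes that the conversion from gradient retention time to CHI is a straight-line fit through the calibration compounds, and then applies the CHI log D equation quoted above [35]. The calibration pairs are placeholders chosen for illustration only; they are not the values of Table 1.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def chi_log_d(chi):
    """CHI log D = 0.0525 * CHI - 1.467, as quoted in the text [35]."""
    return 0.0525 * chi - 1.467


# Placeholder calibration data (retention time in min, CHI); NOT Table 1 values
cal_retention = [1.0, 2.0, 3.0]
cal_chi = [20.0, 50.0, 80.0]
slope, intercept = fit_line(cal_retention, cal_chi)

# Convert the retention time of a hypothetical drug to CHI and CHI log D
drug_chi = slope * 2.5 + intercept
print(drug_chi, chi_log_d(drug_chi))
```

In practice the fit would be repeated for each mobile phase pH, since the calibration compounds have different CHI values at pH 2.6, 7.4 and 10.5.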

Measurements of plasma protein binding using Chiralpak HSA and AGP columns
The protein binding measurements were carried out on Chiralpak HSA and Chiralpak AGP columns with dimensions of 3 x 50 mm and 5 μm particle size (Chiral Technologies Europe, France). The mobile phase was 50 mM ammonium acetate adjusted to pH 7.4, with a 1.2 mL/min flow rate. The standard isopropanol (IPA) gradient reached 35 % in 3.5 min, which was maintained for 1 min before returning to 0 % at 4.7 min. The cycle time was 6 min with an additional 1 min re-equilibration time. Racemic warfarin showed separation of its enantiomers at retention times of 3.58 and 3.77 min. The precision of the retention time measurements was within ±0.01 min. The calibration set of compounds and their literature % binding data, which were also converted to log k data, are shown in Table 2.

Table 2. The protein binding data of the marketed drug molecules that were used to calibrate the retention times obtained on the chiral protein columns (Chiralpak HSA and Chiralpak AGP). The % binding data obtained by equilibrium dialysis were converted to log k data using log k = log (%binding/(101 - %binding)).
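The conversion between % binding and log k given in the caption of Table 2 can be written out as a short sketch. It assumes a base-10 logarithm; the function names are hypothetical. Using 101 rather than 100 in the denominator keeps the expression finite for compounds reported as 100 % bound.

```python
import math


def log_k_from_percent_bound(percent_bound):
    """log k = log10(%binding / (101 - %binding)), as in the Table 2 caption."""
    if not 0.0 < percent_bound <= 100.0:
        raise ValueError("percent bound must be in the interval (0, 100]")
    return math.log10(percent_bound / (101.0 - percent_bound))


def percent_bound_from_log_k(log_k):
    """Inverse of the conversion above: %binding = 101 * k / (1 + k)."""
    k = 10.0 ** log_k
    return 101.0 * k / (1.0 + k)


# Round trip for a hypothetical compound that is 95 % bound
log_k = log_k_from_percent_bound(95.0)
print(log_k, percent_bound_from_log_k(log_k))
```

The inverse function is what turns a log k value, obtained from a calibrated retention time, back into an estimated % binding.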

Measurements of phospholipid-binding at pH 7.4 using an IAM column
The phospholipid binding was measured using an IAM PC.DD2 column with dimensions of 100 x 4.6 mm (Regis Technologies Inc., Morton Grove, IL, USA). The gradient retention times were measured using a 50 mM ammonium acetate mobile phase with the pH adjusted to 7.4. The mobile phase flow rate was 1.5 mL/min. The acetonitrile gradient was applied to reach 90 % in 4.75 min. The 90 % acetonitrile concentration was maintained for an additional 0.5 min (to 5.25 min) and returned to 0 % by 5.5 min. The cycle time was 6 min, and an additional 1 min equilibration time was applied while the injector prepared for the next injection. Repeating the retention time measurements provided a standard deviation of ±0.005 min. The gradient retention times were calibrated with the acetophenone homologues for which the CHI IAM values have been established using isocratic measurements [34]. Table 3 shows the calibration set of compounds and their predetermined CHI IAM values. The CHI index on the IAM column (CHI IAM) approximates the acetonitrile concentration in the mobile phase at which the compound elutes. CHI IAM values above 45 indicate strong phospholipid binding. The CHI IAM values were converted to log k IAM values using equation 1. The log k IAM value represents the equivalent value derived from several isocratic measurements with log retention factors extrapolated to 100 % aqueous mobile phase [23]. The log k IAM values can be converted to log K (IAM) values, which show linear relationships with the octanol/water partition coefficients [26]. Equation 2 shows the conversion.

Database search for pIC 50 values
Assessing the risk of a blockade of the human ether-à-go-go related gene potassium channels could greatly facilitate the development of therapeutic compounds and the withdrawal of hazardous marketed drugs. The development of high-throughput automated patch clamp assays has increased the amount of hERG-associated data available in public databases [17]. Integrated databases are now available using the ChEMBL and PubChem public databases. A large integrated database created by Sato et al. [36] has been used in this study. This database curates hERG-related data from in vitro assays, such as binding assays (radioligand displacement assays) and electrophysiological assays (automated patch-clamp assays), in ChEMBL, PubChem, GOSTAR, NIH Chemical Genomics Center (NCGC) and hERGCentral and integrates them into the largest database on hERG inhibition. The IC50 values of the compounds, expressed in molar concentrations, and their pIC50 values were carefully searched and collected from this database, which is freely available at https://drugdesign.riken.jp/hERGdb/. Data entries using inequality signs, NULL values and value ranges were excluded. Where the reported data for the same compound differed, mean values were calculated and used for the model building, while outlier values were omitted when the deviation in the results was significant (data points not falling within three standard deviations of the mean).

Calculated physicochemical properties
ADME Boxes v.3.0 software (Pharma Algorithm) was used to calculate various physicochemical parameters of the investigated compounds, such as the octanol-water partition coefficient (log P) and the distribution coefficient (log D) at pH 7.4, the number of hydrogen bond donor (HBD) and acceptor (HBA) groups, Abraham's hydrogen bond acidity (A) and basicity (B), the total polar surface area (TPSA), the molecular weight (MW), as well as the molecular fractions of positively charged (F+), negatively charged (F-) and zwitterionic species (Fz) at pH 7.4.
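The curation rules described for the database search (exclusion of entries with inequality signs, NULL values and ranges, followed by averaging with a three-standard-deviation outlier filter) can be sketched as below. This is one possible reading of those rules, not the procedure actually used; the raw-entry format and the function names are assumptions, and the outlier filter is applied only when at least three values remain.

```python
import statistics


def parse_entry(raw):
    """Return a float for a clean numeric entry, or None if it must be excluded."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text or text.upper() == "NULL":
        return None
    if any(sign in text for sign in "<>≤≥"):
        return None  # inequality entries are excluded
    try:
        return float(text)
    except ValueError:
        return None  # value ranges such as "5.1-6.0" fail to parse and are excluded


def consensus_pic50(raw_values, n_sd=3.0):
    """Mean of the usable entries after dropping points beyond n_sd standard deviations."""
    values = [v for v in map(parse_entry, raw_values) if v is not None]
    if not values:
        return None
    if len(values) >= 3:
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        values = [v for v in values if abs(v - mean) <= n_sd * sd]
    return statistics.mean(values)


# Hypothetical raw entries for a single compound
print(consensus_pic50(["5.0", ">6", "NULL", "5.1-6.0", "5.4"]))
```

Note that when the standard deviation is computed from the same values, a three-standard-deviation rule cannot flag any point unless more than about ten values are available, so for sparsely measured compounds the consensus is simply the mean of the usable entries.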

Statistical and visualisation software
JMP v13.0 (SAS Institute Inc) and SPSS 23.0 (IBM SPSS Statistics) were used for the statistical calculations and the stepwise regression analysis. For visualisation, the chemically aware visualisation tools of Stardrop (Optibrium Ltd) were used to create the plots.

Table 4 contains the collected and quality-checked pIC50 data of the investigated 90 drug molecules with their generic names and the measured biomimetic HPLC data. The drugs used in the training set and test set are listed separately in alphabetical order.

Table 5 contains the calculated physicochemical properties of the investigated compounds. The test set is listed separately in alphabetical order in the last part of the table.

Selection of the test set of compounds
Constructed toxicity models require external validation to prove their predictive ability. Hence, a test set, usually consisting of about 20 % of the entire set, is necessary to evaluate the established models in terms of their predictive performance [37]. For that reason, a principal component analysis using the calculated physicochemical properties of the compounds was performed. By plotting the first two principal components (Figure 1), four compounds from each quadrant were selected by taking into account the compounds' therapeutic areas to ensure the test set's diversity. The remaining compounds were used for modelling as the training set. Table 6 shows the therapeutic areas of the compounds selected as the test set.
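The quadrant-based selection can be illustrated with a small sketch. It assumes that the first two principal component scores have already been computed for each compound, and it automates the choice with a simple greedy rule that prefers therapeutic areas not yet represented. The selection reported here may well have been made by inspection, so this is only one way of reproducing the idea; all compound names, scores and areas below are hypothetical.

```python
def quadrant(pc1, pc2):
    """Identify the quadrant of the PC1/PC2 plot by the signs of the two scores."""
    return (pc1 >= 0.0, pc2 >= 0.0)


def select_test_set(compounds, per_quadrant=4):
    """compounds: list of (name, pc1, pc2, therapeutic_area) tuples."""
    groups = {}
    for name, pc1, pc2, area in compounds:
        groups.setdefault(quadrant(pc1, pc2), []).append((name, area))

    selected, seen_areas = [], set()
    for key in sorted(groups):
        picked = []
        # First pass: prefer therapeutic areas not yet represented in the test set
        for name, area in groups[key]:
            if len(picked) < per_quadrant and area not in seen_areas:
                picked.append(name)
                seen_areas.add(area)
        # Second pass: fill any remaining places in this quadrant
        for name, area in groups[key]:
            if len(picked) < per_quadrant and name not in picked:
                picked.append(name)
        selected.extend(picked)
    return selected


# Hypothetical principal component scores and therapeutic areas
compounds = [
    ("cmpd_A", 1.2, 0.8, "antipsychotic"),
    ("cmpd_B", 0.4, 2.1, "antibacterial"),
    ("cmpd_C", -1.0, 0.5, "antipsychotic"),
    ("cmpd_D", -0.3, 1.7, "antiarrhythmic"),
    ("cmpd_E", -2.2, -0.9, "analgesic"),
    ("cmpd_F", 0.9, -1.4, "antibacterial"),
]
print(select_test_set(compounds, per_quadrant=1))
```

With four compounds per quadrant, this yields a 16-compound test set, and every compound not selected remains in the training set.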
The best model using only in silico calculated properties, the molecular weight (MW), the number of H-bond donor (HBD) and acceptor (HBA) groups, the polar surface area (TPSA), the logarithm of the calculated octanol/water partition coefficient of the neutral form and the combined ionised form of the molecules at
It was found that log P correlates better than log D with pIC50, and that only the negatively charged molecular fraction (F-) is a statistically significant additional physicochemical parameter next to log P. The statistical insignificance of F+ may be attributed to its positive sign due to ionization being counterbalanced by its positive influence on pIC50. The statistical parameters of this model are much worse than those of the model using measured AGP binding (log k AGP) and membrane partition (log k IAM) data.
The estimated pIC50 values of the test set have been calculated using equation 3 and plotted in Figure 2. The test set compounds are marked with larger circles. It can be seen in Figure 2 that the majority of the positively charged compounds show a pIC50 value greater than 5 log units in the in vitro hERG experiments. Acidic and zwitterionic compounds show only weak hERG inhibition. It is also interesting to note that the AGP binding data showed a strong correlation with the compounds' binding to the hERG channel. The explanation for this may lie in the similarity of the two proteins. It was found [38] that the AGP binding site can be described as a funnel-like structure. The side of the funnel represents a lipophilic region. The width of the funnel's top provides a steric hindrance for molecules wider than the funnel, while at the narrow end of the funnel are the negatively charged sialic acid residues that bind positive charges if they fit into the deep end of the funnel. The structure is illustrated in Figure 3. The IAM binding, which reflects the compound's membrane partition, was also significant in the model, which is not surprising as the ion channel is located in the membrane. The compound needs to have high membrane affinity to be able to approach the channel. The positive charge also promotes the binding to the negatively charged surface. Both the AGP and IAM stationary phases show strong shape selectivity, which is also essential to hERG inhibition. Although a wide variety of molecules show high pIC50 values, the shape of the molecule is very important because of the well-defined size of the channel opening. This fact reduces the power of in silico models if only 2D descriptors are used in the model building. As a result, a 3D description of the molecules would probably enhance the success of in silico models.

[Figure panels: AGP binding; hERG K+ ion-channel blockage]
The steric structure and the negatively charged surface of AGP and the hERG ion channel suggest strong similarities. Compounds that block the channel have to penetrate the cell and have a relatively high concentration in the cell membrane where the potassium ion channel can be found [39]. This explains the importance of the membrane-binding properties in the model, as shown in Figure 4.
Validation of both models was performed by predictions of the 16 compounds included in the test set. The results are presented in Table 7. It can be seen that the prediction of the test set was superior in the case of the model derived with the measured properties, and the residuals did not exceed double the model error (0.693). On the other hand, predictions from the model derived with the calculated properties exhibited much worse residuals in most cases, with the pIC 50 predictions of irbesartan and lamotrigine exceeding 1 log unit.
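The residual check used in this validation can be sketched as follows. The example assumes that 0.693 is the model error and that the acceptance limit is twice that value, which is one reading of the sentence above; the function name and the observed and predicted values are hypothetical and are not taken from Table 7.

```python
import math


def validate(observed, predicted, model_error, factor=2.0):
    """Return residuals, the test-set RMSE and the compounds outside the limit."""
    residuals = {name: observed[name] - predicted[name] for name in observed}
    rmse = math.sqrt(sum(r * r for r in residuals.values()) / len(residuals))
    limit = factor * model_error
    flagged = sorted(name for name, r in residuals.items() if abs(r) > limit)
    return residuals, rmse, flagged


# Hypothetical observed and predicted pIC50 values for two test compounds
observed = {"cmpd_A": 5.0, "cmpd_B": 6.0}
predicted = {"cmpd_A": 5.5, "cmpd_B": 8.0}
residuals, rmse, flagged = validate(observed, predicted, model_error=0.693)
print(residuals, rmse, flagged)
```

Running the same check on the predictions of both models gives a direct comparison of how many test compounds fall outside the acceptance limit in each case.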

Conclusions
The hERG channel inhibition properties of drugs and drug discovery compounds are an important attribute as compounds with strong binding can cause cardiotoxic side effects during clinical trials. Early recognition of a compound's hERG inhibition potential is important to avoid the progression of compounds that fail later because of cardiotoxicity.
It has been demonstrated that two biomimetic HPLC property measurements can be used for screening molecules for hERG inhibition potential at an early stage of the drug discovery process. The model is based on the strong similarity between the AGP and hERG channel structures. Both attract positively charged compounds with a significant degree of lipophilicity. Both exhibit steric hindrance depending on the size and shape of the molecule being investigated. The membrane partition is also an important parameter as it reveals the membrane affinity of the compounds where the ion channel receptor is located. It has also been shown that two-dimensional physicochemical descriptors cannot provide an acceptable model for estimating the hERG pIC 50 of molecules.