Research Interests

  • Machine Learning

    Bayesian Methods, Kernel Methods

  • Computational Biology

    Cancer Biology, Emerging Infectious Diseases, Personalized Medicine

Research Summary

My research interests revolve around developing machine learning algorithms that enable scientific discovery from heterogeneous data. Machine learning methods are usually constrained by the quality of the feature representations and similarity measures that describe objects. My main focus has been to learn good feature representations and similarity measures by drawing on my background in statistics and optimization to develop theoretically well-founded algorithms that uncover the underlying mechanisms of the complex systems under consideration. I have focused on applications that lie at the intersection of theoretical and applied research, in particular those arising in computational biology.

In the future, machine learning will be used even more widely to analyze high-dimensional data coming from complex systems such as cancer. In this vein, my current research agenda integrates novel machine learning solutions into cancer biology. However, it is not yet clear how the results of these studies can be translated into changes in practice at the bedside. There are three major challenges in cancer biology: (i) analyzing the rapidly increasing amount of high-throughput data with computational methods, possibly incorporating prior knowledge as well, (ii) developing computational methods that support personalized cancer therapies based on the biological characterization of each patient, and (iii) exploring the molecular mechanisms of cancer to identify and validate novel biomarkers.

Publications

Nek2A Prevents Centrosome Clustering and Induces Cell Death in Cancer Cells via KIF2C Interaction

Batuhan Mert Kalkan, Selahattin Can Ozcan, Enes Cicek, Mehmet Gonen, Ceyda Acilan

Journal: Cell Death & Disease, vol. 15, p. 222, 2024

Abstract

Unlike normal cells, cancer cells frequently exhibit supernumerary centrosomes, leading to formation of multipolar spindles that can trigger cell death. Nevertheless, cancer cells with supernumerary centrosomes escape the deadly consequences of unequal segregation of genomic material by coalescing their centrosomes into two poles. This unique trait of cancer cells presents a promising target for cancer therapy, focusing on selectively attacking cells with supernumerary centrosomes. Nek2A is a kinase involved in mitotic regulation, including the centrosome cycle, where it phosphorylates linker proteins to separate centrosomes. In this study, we investigated if Nek2A also prevents clustering of supernumerary centrosomes, akin to its separation function. Reduction of Nek2A activity, achieved through knockout, silencing, or inhibition, promotes centrosome clustering, whereas its overexpression results in inhibition of clustering. Significantly, prevention of centrosome clustering induces cell death, but only in cancer cells with supernumerary centrosomes, both in vitro and in vivo. Notably, none of the known centrosomal (e.g., CNAP1, Rootletin, Gas2L1) or non-centrosomal (e.g., TRF1, HEC1) Nek2A targets were implicated in this machinery. Additionally, Nek2A operated via a pathway distinct from other proteins involved in centrosome clustering mechanisms, like HSET and NuMA. Through TurboID proximity labeling analysis, we identified novel proteins associated with the centrosome or microtubules, expanding the known interaction partners of Nek2A. KIF2C, in particular, emerged as a novel interactor, confirmed through coimmunoprecipitation and localization analysis. The silencing of KIF2C diminished the impact of Nek2A on centrosome clustering and rescued cell viability. Additionally, elevated Nek2A levels were indicative of better patient outcomes, specifically in those predicted to have excess centrosomes. 
Therefore, while Nek2A is a proposed target, its use must be specifically adapted to the broader cellular context, especially considering centrosome amplification. Discovering partners such as KIF2C offers fresh insights into cancer biology and new possibilities for targeted treatment.

Geolocation Risk Scores for Credit Scoring Models

Erdem Ünal, Uğur Aydın, Murat Koraş, Barış Akgün, Mehmet Gönen

Conference: Proceedings of the 9th International Conference on Machine Learning, Optimization, and Data Science (LOD 2023), pp. 34–44, 2023

Abstract

Customer location is considered one of the most informative pieces of demographic data for predictive modeling. It has been widely used in various sectors including finance. Commercial banks use this information in the evaluation of their credit scoring systems. Generally, customer city and district are used as demographic features. Even if these features are quite informative, they are not fully capable of capturing the socioeconomic heterogeneity of customers within cities or districts. In this study, we introduced a micro-region approach as an alternative to this district or city approach. We created features based on the characteristics of micro-regions and developed predictive credit risk models. Since the models only used micro-region specific data, we were able to apply them to all possible locations and calculate risk scores for each micro-region. We showed their positive contribution to our regular credit risk models.

Epigenetic‐Focused CRISPR/Cas9 Screen Identifies (Absent, Small, or Homeotic)2‐Like Protein (ASH2L) as a Regulator of Glioblastoma Cell Survival

Ezgi Ozyerli-Goknar, Ezgi Yagmur Kala, Ali Cenk Aksu, Ipek Bulut, Ahmet Cingöz, Sheikh Nizamuddin, Martin Biniossek, Fidan Seker‐Polat, Tunc Morova, Can Aztekin, Sonia H. Y. Kung, Hamzah Syed, Nurcan Tuncbag, Mehmet Gönen, Martin Philpott, Adam P. Cribbs, Ceyda Acilan, Nathan A. Lack, Tamer T. Onder, H. T. Marc Timmers, Tugba Bagci-Onder

Journal: Cell Communication and Signaling, vol. 21, p. 328, 2023

Abstract

Background: Glioblastoma is the most common and aggressive primary brain tumor with extremely poor prognosis, highlighting an urgent need for developing novel treatment options. Identifying epigenetic vulnerabilities of cancer cells can provide excellent therapeutic intervention points for various types of cancers.

Method: In this study, we investigated epigenetic regulators of glioblastoma cell survival through CRISPR/Cas9 based genetic ablation screens using a customized sgRNA library EpiDoKOL, which targets critical functional domains of chromatin modifiers.

Results: Screens conducted in multiple cell lines revealed ASH2L, a histone lysine methyltransferase complex subunit, as a major regulator of glioblastoma cell viability. ASH2L depletion led to cell cycle arrest and apoptosis. RNA sequencing and greenCUT&RUN together identified a set of cell cycle regulatory genes, such as TRA2B, BARD1, KIF20B, ARID4A and SMARCC1, that were downregulated upon ASH2L depletion. Mass spectrometry analysis revealed the interaction partners of ASH2L in glioblastoma cell lines as SET1/MLL family members including SETD1A, SETD1B, MLL1 and MLL2. We further showed that glioblastoma cells had a differential dependency on expression of SET1/MLL family members for survival. The growth of ASH2L‐depleted glioblastoma cells was markedly slower than controls in orthotopic in vivo models. TCGA analysis showed high ASH2L expression in glioblastoma compared to low grade gliomas and immunohistochemical analysis revealed significant ASH2L expression in glioblastoma tissues, attesting to its clinical relevance. Therefore, high throughput, robust and affordable screens with focused libraries, such as EpiDoKOL, hold great promise to enable rapid discovery of novel epigenetic regulators of cancer cell survival, such as ASH2L.

Conclusion: Together, we suggest that targeting ASH2L could serve as a new therapeutic opportunity for glioblastoma.

MOKPE: Drug–Target Interaction Prediction via Manifold Optimization Based Kernel Preserving Embedding

Oğuz C. Binatlı, Mehmet Gönen

Journal: BMC Bioinformatics, vol. 24, p. 276, 2023

Abstract

Background: In many applications of bioinformatics, data stem from distinct heterogeneous sources. One of the well-known examples is the identification of drug–target interactions (DTIs), which is of significant importance in drug discovery. In this paper, we propose a novel framework, manifold optimization based kernel preserving embedding (MOKPE), to efficiently solve the problem of modeling heterogeneous data. Our model projects heterogeneous drug and target data into a unified embedding space by preserving drug–target interactions and drug–drug, target–target similarities simultaneously.

Results: We performed ten replications of ten-fold cross validation on four different drug–target interaction network data sets for predicting DTIs for previously unseen drugs. The classification evaluation metrics showed better or comparable performance compared to previous similarity-based state-of-the-art methods. We also evaluated MOKPE on predicting unknown DTIs of a given network. Our implementation of the proposed algorithm in R together with the scripts that replicate the reported experiments is publicly available at https://github.com/ocbinatli/mokpe.

Higher Rates of Cefiderocol Resistance among NDM Producing Klebsiella Bloodstream Isolates Applying EUCAST over CLSI Breakpoints

Burcu Isler, Cansel Vatansever, Berna Özer, Güle Çınar, Abdullah Tarık Aslan, Caitlin Falconer, Michelle J. Bauer, Brian Forde, Funda Şimşek, Necla Tülek, Hamiyet Demirkaya, Şirin Menekşe, Halis Akalin, İlker İnanç Balkan, Mehtap Aydın, Elif Tükenmez Tigen, Safiye Koçulu Demir, Mahir Kapmaz, Şiran Keske, Özlem Doğan, Çiğdem Arabacı, Serap Yağcı, Gülşen Hazırolan, Veli Oğuzalp Bakır, Mehmet Gönen, Neşe Saltoğlu, Alpay Azap, Özlem Azap, Murat Akova, Önder Ergönül, Füsun Can, David L. Paterson, Patrick N. A. Harris

Journal: Infectious Diseases, vol. 55, no. 9, pp. 607–613, 2023

Abstract

Background: Cefiderocol is generally active against carbapenem-resistant Klebsiella spp. (CRK) with higher MICs against metallo-beta-lactamase producers. There is a variation in cefiderocol interpretive criteria determined by EUCAST and CLSI. Our objective was to test CRK isolates against cefiderocol and compare cefiderocol susceptibilities using EUCAST and CLSI interpretive criteria.

Methods: A unique collection (n = 254) of mainly OXA-48-like- or NDM-producing CRK bloodstream isolates were tested against cefiderocol with disc diffusion (Mast Diagnostics, UK). Beta-lactam resistance genes and multilocus sequence types were identified using bioinformatics analyses on complete bacterial genomes.

Results: Median cefiderocol inhibition zone diameter was 24 mm (interquartile range [IQR] 24–26 mm) for all isolates and 18 mm (IQR 15–21 mm) for NDM producers. We observed significant variability between cefiderocol susceptibilities using EUCAST and CLSI breakpoints, such that 26% and 2% of all isolates, and 81% and 12% of the NDM producers were resistant to cefiderocol using EUCAST and CLSI interpretive criteria, respectively.

Conclusions: Cefiderocol resistance rates among NDM producers are high using EUCAST criteria. Breakpoint variability may have significant implications on patient outcomes. Until more clinical outcome data are available, we suggest using EUCAST interpretive criteria for cefiderocol susceptibility testing.

Effectiveness of Tocilizumab in Non-Intubated Cases with COVID-19: A Systematic Review and Meta-Analysis

Şiran Keske, Merve Akyol, Cem Tanrıöver, Batu Özlüşen, Rüştü Emre Akcan, Ulaş Güler, Bilgin Sait, Bahar Kaçmaz, Mehmet Gönen, Önder Ergönül

Journal: Infection, vol. 51, pp. 1619–1628, 2023

Abstract

Purpose: Tocilizumab, a monoclonal IL-6 receptor blocker, is an effective agent for severe-to-critical cases of COVID-19; however, its target patients for the optimum use need to be detailed. We performed a systematic review and meta-analysis to define its effect among severely ill but non-intubated cases with COVID-19.

Methods: We searched PubMed, Scopus, Web of Science, MEDLINE, Cochrane Central Register of Controlled Trials (CENTRAL), medRxiv, and bioRxiv until February 13, 2022, for non-intubated cases, and included randomized-controlled trials (RCT) based on bias assessment. The primary outcomes were the requirement of invasive mechanical ventilation and mortality. Random-effects and fixed-effect models were used. The heterogeneity was measured using the χ² and I² statistics, with χ² p ≤ 0.05 and I² ≥ 50% indicating the presence of significant heterogeneity. We registered the study to the International Prospective Register of Systematic Reviews (PROSPERO) with the registration number CRD42021232575.

Results: Among 261 articles, 11 RCTs were included. The pooled analysis of the 11 RCTs demonstrated that the rate of mortality was significantly lower in the tocilizumab group than in the control group (20.0% vs 24.2%; OR: 0.84, 95% CI 0.73–0.96; heterogeneity I² = 0%, p = 0.82). The mechanical ventilation rate was lower in the tocilizumab group than in the control group (27% vs 35.2%; OR: 0.76, 95% CI 0.67–0.86; heterogeneity I² = 6%, p = 0.39).

Conclusion: Among non-intubated severe COVID-19 cases, tocilizumab reduces the risk of invasive mechanical ventilation and mortality compared to standard-of-care treatment.
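
As an illustration of the heterogeneity criterion named in the Methods of this abstract, the following is a minimal sketch of how Cochran's Q and the I² statistic are commonly computed from study-level effects under inverse-variance weighting. The function names and the example inputs are hypothetical and are not taken from the paper; the χ² p-value would additionally require a chi-square distribution function, which is left out here.

```python
def heterogeneity(effects, variances):
    """Return (Q, df, I2 in percent) for study-level effects.

    effects:   per-study effect estimates (e.g. log odds ratios)
    variances: per-study sampling variances of those estimates
    Uses fixed-effect inverse-variance weights (an assumption of this sketch).
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 = max(0, (Q - df) / Q), expressed as a percentage
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


def is_heterogeneous(i2_percent, threshold=50.0):
    """Apply the I2 >= 50% rule quoted in the abstract."""
    return i2_percent >= threshold


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    q, df, i2 = heterogeneity([0.0, 1.0], [0.1, 0.1])
    print(q, df, i2, is_heterogeneous(i2))
```

With identical study effects, Q is zero and I² is reported as 0%, which matches the convention of truncating negative values at zero.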

High Prevalence of ArmA-16S rRNA Methyltransferase Among Aminoglycoside Resistant Klebsiella pneumoniae Bloodstream Isolates

Burcu Isler, Caitlin Falconer, Cansel Vatansever, Berna Özer, Güle Çınar, Abdullah Tarık Aslan, Brian Forde, Patrick Harris, Funda Şimşek, Necla Tülek, Hamiyet Demirkaya, Şirin Menekşe, Halis Akalin, İlker İnanç Balkan, Mehtap Aydın, Elif Tükenmez Tigen, Safiye Koçulu Demir, Mahir Kapmaz, Şiran Keske, Özlem Doğan, Çiğdem Arabacı, Serap Yağcı, Gülşen Hazırolan, Veli Oğuzalp Bakır, Mehmet Gönen, Neşe Saltoğlu, Alpay Azap, Özlem Azap, Murat Akova, Önder Ergönül, Füsun Can, David L. Paterson

Journal: Journal of Medical Microbiology, vol. 71, no. 12, p. 001621, 2022

Abstract

Introduction: Aminoglycosides are used for the treatment of carbapenemase-producing Klebsiella pneumoniae (CPK) infections. 16S rRNA methyltransferases (RMTs) confer resistance to all aminoglycosides and are often cocarried with NDM.

Hypothesis/Gap Statement: There is a dearth of studies looking at the aminoglycoside resistance mechanisms for invasive CPK isolates, particularly in OXA-48 endemic settings.

Aim: We aimed to determine the prevalence of RMTs and their association with beta lactamases and MLSTs amongst aminoglycoside-resistant CPK bloodstream isolates in an OXA-48 endemic setting.

Methodology: CPK isolates (n=181), collected as part of a multicentre cohort study, were tested for amikacin, gentamicin and tobramycin susceptibility using custom-made sensititre plates (GN2XF, Thermo Fisher Scientific). All isolates were previously subjected to whole-genome sequencing. Carbapenemases, RMTs, MLSTs and plasmid incompatibility groups were detected on the assembled genomes.

Results: Of the 181 isolates, 109 (60%) were resistant to all three aminoglycosides, and 96 of 109 (88%) aminoglycoside-resistant isolates carried an RMT (85 ArmA, 10 RmtC, 4 RmtF1; three isolates cocarried ArmA and RmtC). Main clonal types associated with ArmA were ST2096 (49/85, 58%) and ST14 (24/85, 28%), harbouring mainly OXA-232 and OXA-48 + NDM, respectively. RmtC was cocarried with NDM (5/10) on ST395, and NDM + OXA-48 or NDM + KPC (4/10) on ST14, ST15 and ST16. All RMT producers also carried CTX-M-15, and the majority cocarried SHV-106, TEM-150 and multiple other antibiotic resistance genes. The majority of the isolates harboured a combination of IncFIB, IncH and IncL/M type plasmids. Non-NDM producing isolates remained susceptible to ceftazidime-avibactam.

Conclusion: Aminoglycoside resistance amongst CPK bloodstream isolates is extremely common and mainly driven by clonal spread of ArmA carried on ST2096 and ST14, associated with OXA-232 and OXA-48 + NDM carriage, respectively.

Identifying Tissue- and Cohort-Specific RNA Regulatory Modules in Cancer Cells Using Multitask Learning

Milad Mokharidoost, Philipp G. Maass, Mehmet Gönen

Journal: Cancers, vol. 14, no. 19, p. 4939, 2022

Abstract

MicroRNA (miRNA) alterations significantly impact the formation and progression of human cancers. miRNAs interact with messenger RNAs (mRNAs) to facilitate degradation or translational repression. Thus, identifying miRNA–mRNA regulatory modules in cohorts of primary tumor tissues is fundamental for understanding the biology of tumor heterogeneity and for precise diagnosis and treatment. We established a multitask learning sparse regularized factor regression (MSRFR) method to determine key tissue- and cohort-specific miRNA–mRNA regulatory modules from expression profiles of tumors. MSRFR simultaneously models the sparse relationship between miRNAs and mRNAs and extracts tissue- and cohort-specific miRNA–mRNA regulatory modules separately. We tested the model’s ability to determine cohort-specific regulatory modules of multiple cancer cohorts from the same tissue and their underlying tissue-specific regulatory modules by extracting similarities between cancer cohorts (i.e., blood, kidney, and lung). We also detected tissue-specific and cohort-specific signatures in the corresponding regulatory modules by comparing our findings from various other tissues. We show that MSRFR effectively determines cancer-related miRNAs in cohort-specific regulatory modules, distinguishes tissue- and cohort-specific regulatory modules from each other, and extracts tissue-specific information from different cohorts of disease-related tissue. Our findings indicate that the MSRFR model can support current efforts in precision medicine to define tumor-specific miRNA–mRNA signatures.

Corporate Network Analysis Based on Graph Learning

Emre Atan, Ali Duymaz, Funda Sarısözen, Uğur Aydın, Murat Koraş, Barış Akgün, Mehmet Gönen

Conference: Proceedings of the 8th International Conference on Machine Learning, Optimization, and Data Science (LOD 2022), pp. 268–278, 2022

Abstract

We constructed a financial network based on the relationships of the customers in our database with our other customers or other bank customers using our large-scale data set of money transactions. There are two main aims in this study. Our first aim is to identify the most profitable customers by prioritizing companies in terms of centrality based on the volume of money transfers between companies. This requires acquiring new customers, deepening relationships with existing customers and activating inactive customers. Our second aim is to determine the effect of customers on related customers as a result of the financial deterioration in this network. In this study, the network was built from a data set of money transfers between companies. Text similarity algorithms were used to match the company title in the database with the title given in the transfer. For companies that are not customers of our bank, information such as the IBAN number is assigned as a unique identifier. We showed that the average profitability of the top 30% of customers in terms of centrality is five times higher than that of the remaining customers. Besides, the variables we created to examine the effect of financial disruptions on other customers contributed an additional 1% Gini coefficient to the model that the bank is currently using, even though it is difficult to improve a strong model that already works with a high Gini coefficient.

A Kernel-Based Multilayer Perceptron Framework to Identify Pathways Related to Cancer Stages

Marzieh Soleimanpoor, Milad Mokharidoost, Mehmet Gönen

Conference: Proceedings of the 8th International Conference on Machine Learning, Optimization, and Data Science (LOD 2022), pp. 62–77, 2022

Abstract

Standard machine learning algorithms have limited knowledge extraction capability in discriminating cancer stages based on genomic characterizations, due to the strongly correlated nature of high-dimensional genomic data. Moreover, activation of pathways plays a crucial role in the growth and progression of cancer from early-stage to late-stage. That is why we implemented a novel kernel-based neural network framework that integrates pathways and gene expression data using multiple kernels and discriminates early- and late-stages of cancers. Our goal is to identify the relevant molecular mechanisms of the biological processes which might be driving cancer progression. As the input of developed multilayer perceptron (MLP), we constructed kernel matrices on multiple views of expression profiles of primary tumors extracted from pathways. We used Hallmark and Pathway Interaction Database (PID) datasets to restrict the search area to interpretable solutions. We applied our algorithm to 12 cancer cohorts from the Cancer Genome Atlas (TCGA), including more than 5100 primary tumors. The results showed that our algorithm could extract meaningful and disease-specific mechanisms of cancers. We tested the predictive performance of our MLP algorithm and compared it against three existing classification algorithms, namely, random forests, support vector machines, and multiple kernel learning. Our MLP method obtained better or comparable predictive performance against these algorithms.

Spatial Prediction of COVID-19 Pandemic Dynamics in the United States

Çiğdem Ak, Alex D. Chitsazan, Mehmet Gönen, Ruth Etzioni, Aaron J. Grossberg

Journal: ISPRS International Journal of Geo-Information, vol. 11, no. 9, p. 470, 2022

Abstract

The impact of COVID-19 across the United States (US) has been heterogeneous, with rapid spread and greater mortality in some areas compared with others. We used geographically-linked data to test the hypothesis that the risk for COVID-19 was defined by location and sought to define which demographic features were most closely associated with elevated COVID-19 spread and mortality. We leveraged geographically-restricted social, economic, political, and demographic information from US counties to develop a computational framework using structured Gaussian process to predict county-level case and death counts during the pandemic’s initial and nationwide phases. After identifying the most predictive information sources by location, we applied an unsupervised clustering algorithm and topic modeling to identify groups of features most closely associated with COVID-19 spread. Our model successfully predicted COVID-19 case counts of unseen locations after examining case counts and demographic information of neighboring locations, with overall Pearson’s correlation coefficient and the proportion of variance explained as 0.96 and 0.84 during the initial phase and 0.95 and 0.87 during the nationwide phase, respectively. Aside from population metrics, presidential vote margin was the most consistently selected spatial feature in our COVID-19 prediction models. Urbanicity and 2020 presidential vote margins were more predictive than other demographic features. Models trained using death counts showed similar performance metrics. Topic modeling showed that counties with similar socioeconomic and demographic features tended to group together, and some of these feature sets were associated with COVID-19 dynamics. Clustering of counties based on these feature groups found by topic modeling revealed groups of counties that experienced markedly different COVID-19 spread. 
We conclude that topic modeling can be used to group similar features and identify counties with similar features in epidemiologic research.

Efficient Multitask Multiple Kernel Learning with Application to Cancer Research

Arezou Rahimi, Mehmet Gönen

Journal: IEEE Transactions on Cybernetics, vol. 52, no. 9, pp. 8716–8728, 2022

Abstract

Multitask multiple kernel learning (MKL) algorithms combine the capabilities of incorporating different data sources into the prediction model and using the data from one task to improve the accuracy on others. However, these methods do not necessarily produce interpretable results. Restricting the solutions to the set of interpretable solutions increases the computational burden of the learning problem significantly, leading to computationally prohibitive run times for some important biomedical applications. That is why we propose a multitask MKL formulation with a clustering of tasks and develop a highly time-efficient solution approach for it. Our solution method is based on the Benders decomposition and treating the clustering problem as finding a given number of tree structures in a graph; hence, it is called the forest formulation. We use our method to discriminate early-stage and late-stage cancers using genomic data and gene sets and compare our algorithm against two other algorithms. The two other algorithms are based on different approaches for linearization of the problem while all algorithms make use of the cutting-plane method. Our results indicate that as the number of tasks and/or the number of desired clusters increase, the forest formulation becomes increasingly favorable in terms of computational performance.

Comparison of Ceftazidime-Avibactam Susceptibility Testing Methods Against OXA-48-like Carrying Klebsiella Blood Stream Isolates

Burcu Isler, Cansel Vatansever, Berna Özer, Güle Çınar, Abdullah Tarık Aslan, Adam Stewart, Peter Simos, Caitlin Falconer, Michelle J Bauer, Brian Forde, Patrick Harris, Funda Şimşek, Necla Tülek, Hamiyet Demirkaya, Şirin Menekşe, Halis Akalin, İlker İnanç Balkan, Mehtap Aydın, Elif Tükenmez Tigen, Safiye Koçulu Demir, Mahir Kapmaz, Şiran Keske, Özlem Doğan, Çiğdem Arabacı, Serap Yağcı, Gülşen Hazırolan, Veli Oğuzalp Bakır, Mehmet Gönen, Neşe Saltoğlu, Alpay Azap, Özlem Azap, Murat Akova, Önder Ergönül, David L. Paterson, Füsun Can

Journal: Diagnostic Microbiology and Infectious Disease, vol. 104, no. 1, p. 115745, 2022

Abstract

Ceftazidime-avibactam exhibits good in vitro activity against carbapenem resistant Klebsiella carrying OXA-48-like enzymes. We tested two hundred unique carbapenem resistant Klebsiella blood stream isolates (71% with single OXA-48-like carbapenemases, including OXA-48, n = 62; OXA-232, n = 57; OXA-244, n = 17; OXA-181, n = 5) that were collected as part of a multicentre study against ceftazidime-avibactam using Etest (bioMérieux, Marcy-l’Étoile, France), 10/4 μg disc (Thermo Fisher) and Sensititre Gram Negative EURGNCOL Plates (Lyophilized panels, Sensititre, Thermo Fisher) with the aim of comparing the performances of the Etest and disc to that of Sensititre. Ceftazidime-avibactam MIC50/90 was 2/>16 mg/L for the entire collection and was 2/4 mg/L for single OXA-48-like producers. Categorical and essential agreements between the Etest and Sensititre were 100% and 97%, respectively. Categorical agreement between the disc and Sensititre was 100%. Etest and 10/4 μg discs are suitable alternatives to Sensititre for ceftazidime-avibactam sensitivity testing for OXA-48-like producers.

Fast and Interpretable Genomic Data Analysis Using Multiple Approximate Kernel Learning

Ayyüce Begüm Bektaş, Çiğdem Ak, Mehmet Gönen

Journal: Bioinformatics, vol. 38, pp. i77–i83, 2022

Abstract

Motivation: Dataset sizes in computational biology have increased drastically with the help of improved data collection tools and the increasing size of patient cohorts. Previous kernel-based machine learning algorithms proposed for increased interpretability started to fail with large sample sizes, owing to their lack of scalability. To overcome this problem, we proposed a fast and efficient multiple kernel learning (MKL) algorithm, intended particularly for large-scale data, that integrates kernel approximation and group Lasso formulations into a conjoint model. Our method extracts significant and meaningful information from the genomic data while conjointly learning a model for out-of-sample prediction. It is scalable with increasing sample size by approximating instead of calculating distinct kernel matrices.

Results: To test our computational framework, namely, Multiple Approximate Kernel Learning (MAKL), we ran experiments on three cancer datasets and showed that MAKL is capable of outperforming the baseline algorithm while using only a small fraction of the input features. We also reported selection frequencies of approximated kernel matrices associated with feature subsets (i.e. gene sets/pathways), which helps to see their relevance for the given classification task. Our fast and interpretable MKL algorithm producing sparse solutions is promising for computational biology applications considering its scalability and the highly correlated structure of genomic datasets, and it can be used to discover new biomarkers and new therapeutic guidelines.

Machine Learning as a Clinical Decision Support Tool for Patients with Acromegaly

Cem Sulu, Ayyüce Begüm Bektaş, Serdar Şahin, Emre Durcan, Zehra Kara, Ahmet Numan Demir, Hande Mefkure Özkaya, Necmettin Tanrıöver, Nil Çomunoğlu, Osman Kızılkılıç, Nurperi Gazioğlu, Mehmet Gönen, Pınar Kadıoğlu

Journal: Pituitary, vol. 25, no. 3, pp. 486–495, 2022

Abstract

Objective: To develop machine learning (ML) models that predict postoperative remission, remission at last visit, and resistance to somatostatin receptor ligands (SRL) in patients with acromegaly and to determine the clinical features associated with the prognosis.

Methods: We studied outcomes using the area under the receiver operating characteristics (AUROC) values, which were reported as the performance metric. To determine the importance of each feature and to ease interpretation, Shapley Additive explanations (SHAP) values, which help explain the outputs of ML models, were used.

Results: One-hundred fifty-two patients with acromegaly were included in the final analysis. The mean AUROC values resulting from 100 independent replications were 0.728 for postoperative 3 months remission status classification, 0.879 for remission at last visit classification, and 0.753 for SRL resistance status classification. Extreme gradient boosting model demonstrated that preoperative growth hormone (GH) level, age at operation, and preoperative tumor size were the most important predictors for early remission; resistance to SRL and preoperative tumor size represented the most important predictors of remission at last visit, and postoperative 3-month insulin-like growth factor 1 (IGF1) and GH levels (random and nadir) together with the sparsely granulated somatotroph adenoma subtype served as the most important predictors of SRL resistance.

Conclusions: ML models may serve as valuable tools in the prediction of remission and SRL resistance.
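
For readers unfamiliar with the performance metric reported in this abstract, the following is a minimal sketch of how an AUROC value can be computed from labels and predicted scores, and how values from several replications can be averaged. It uses the rank-based (pairwise comparison) definition of AUROC; the function names and the toy inputs are hypothetical and do not come from the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(replications):
    """Average AUROC over (labels, scores) pairs, one pair per replication."""
    values = [auroc(labels, scores) for labels, scores in replications]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only
    print(auroc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a cohort of this size; a sort-based implementation would be preferable for large data sets.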

Characteristics and Outcomes of Carbapenemase Harbouring Carbapenem‐Resistant Klebsiella spp. Bloodstream Infections: A Multicentre Prospective Cohort Study in an OXA‐48 Endemic Setting

Burcu Isler, Berna Özer, Güle Çınar, Abdullah Tarık Aslan, Cansel Vatansever, Caitlin Falconer, İştar Dolapçı, Funda Şimşek, Necla Tülek, Hamiyet Demirkaya, Şirin Menekşe, Halis Akalin, İlker İnanç Balkan, Mehtap Aydın, Elif Tükenmez Tigen, Safiye Koçulu Demir, Mahir Kapmaz, Şiran Keske, Özlem Doğan, Çiğdem Arabacı, Serap Yağcı, Gülşen Hazırolan, Veli Oğuzalp Bakır, Mehmet Gönen, Mark D. Chatfield, Brian Forde, Neşe Saltoğlu, Alpay Azap, Özlem Azap, Murat Akova, David L. Paterson, Füsun Can, Önder Ergönül

Journal: European Journal of Clinical Microbiology & Infectious Diseases, vol. 41, no. 5, pp. 841–847, 2022

Abstract

A prospective, multicentre observational cohort study of carbapenem-resistant Klebsiella spp. (CRK) bloodstream infections was conducted in Turkey from June 2018 to June 2019. One hundred eighty-seven patients were recruited. Single OXA-48-like carbapenemases predominated (75%), followed by OXA-48-like/NDM coproducers (16%). OXA-232 constituted 31% of all OXA-48-like carbapenemases and was mainly carried on ST2096. Thirty-day mortality was 44% overall and 51% for ST2096. In the multivariate Cox regression analysis, SOFA score and immunosuppression were significant predictors of 30-day mortality and ST2096 had a non-significant effect. All OXA-48-like producers remained susceptible to ceftazidime-avibactam.

A Meta-Analysis for the Role of Aminoglycosides and Tigecyclines in Combined Regimens Against Colistin- and Carbapenem-Resistant Klebsiella pneumoniae Bloodstream Infections

Yusuf Mert Demirlenk, Lal Sude Gücer, Duygu Uçku, Cem Tanrıöver, Merve Akyol, Zeynepgül Kalay, Erinç Barçın, Rüştü Emre Akcan, Füsun Can, Mehmet Gönen, Önder Ergönül

Journal: European Journal of Clinical Microbiology & Infectious Diseases, vol. 41, no. 5, pp. 761–769, 2022

Abstract

We aimed to describe the effect of aminoglycosides and tigecycline in reducing mortality in colistin- and carbapenem-resistant Klebsiella pneumoniae (ColR-CR-Kp) infections. We included studies with defined outcomes after active or non-active antibiotic treatment of ColR-CR-Kp infections. Active treatment was defined as adequate antibiotic use for at least 3 days (72 h) after the diagnosis of ColR-CR-Kp infection by culture. The Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement and the PRISMA 2020 checklist were applied. Crude and adjusted odds ratios (OR) with 95% confidence intervals (CI) were calculated and pooled in the random effects model. Adding aminoglycosides to the existing treatment regimen reduced overall mortality significantly (OR 0.34, 95% CI 0.20–0.58). Overall mortality was 34% in patients treated with aminoglycoside-combined regimens and 60% in patients treated with non-aminoglycoside regimens. Treatment with tigecycline was not found to reduce mortality (OR: 0.76, 95% CI: 0.47–1.23). Our results suggest that adding an aminoglycoside to the existing regimen for colistin- and carbapenem-resistant Klebsiella pneumoniae infections reduces mortality significantly.
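
As a rough illustration of how odds ratios can be pooled in a random effects model, the sketch below uses the DerSimonian-Laird estimator. The abstract does not state which estimator was used, so this choice is an assumption, as are the function name `pool_random_effects` and the example study values, which are invented for demonstration and are not taken from the paper.

```python
import math


def pool_random_effects(odds_ratios, ci_lowers, ci_uppers):
    """Pool odds ratios with the DerSimonian-Laird random-effects estimator.

    Assumption: each study reports an OR with a symmetric 95% CI on the
    log scale, so the standard error can be recovered from the CI width.
    Returns (pooled OR, CI lower, CI upper, between-study variance tau^2).
    """
    z = 1.959964  # standard normal quantile for a two-sided 95% interval
    y = [math.log(o) for o in odds_ratios]
    se = [(math.log(u) - math.log(l)) / (2 * z)
          for l, u in zip(ci_lowers, ci_uppers)]

    # Fixed-effect (inverse-variance) weights and estimate
    w = [1.0 / s ** 2 for s in se]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird estimate of tau^2
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in se]
    y_re = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se_re = math.sqrt(1.0 / sum(w_star))
    return (math.exp(y_re), math.exp(y_re - z * se_re),
            math.exp(y_re + z * se_re), tau2)


if __name__ == "__main__":
    # Hypothetical studies (made-up values, for illustration only)
    pooled = pool_random_effects([0.30, 0.45, 0.28],
                                 [0.12, 0.20, 0.10],
                                 [0.75, 1.01, 0.78])
    print("pooled OR = %.2f (95%% CI %.2f-%.2f), tau^2 = %.3f" % pooled)
```

When the studies agree closely, the estimated between-study variance is zero and the result coincides with the fixed-effect pooled estimate.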

Elimination of Healthcare Associated Acinetobacter baumannii Infection in a Highly Endemic Region

Onder Ergonul, Gizem Tokca, Şiran Keske, Ebru Donmez, Bahar Madran, Azize Kömür, Mehmet Gönen, Fusun Can

Journal: International Journal of Infectious Diseases, vol. 114, pp. 11–14, 2022

Abstract

This paper describes the elimination of healthcare associated Acinetobacter baumannii infections in a highly endemic region. A prospective, observational study was performed between October 2012 and October 2017. Acinetobacter baumannii was isolated from 59 patients, and >95% similarity was demonstrated among isolates of seven patients (DiversiLab™, BioMérieux). Carbapenemase activity was detected in 15 out of 17 (88%) isolates, and all were OXA-23 type. The control of Acinetobacter baumannii outbreaks can be achieved by the close follow-up supported by molecular techniques, strict application of infection control measures, and isolation of the transferred patients.

Effectiveness of Favipiravir in COVID-19: A Live Systematic Review

Batu Özlüşen, Şima Kozan, Rüştü Emre Akcan, Mekselina Kalender, Doğukan Yaprak, İbrahim Batuhan Peltek, Şiran Keske, Mehmet Gönen, Önder Ergönül

Journal: European Journal of Clinical Microbiology & Infectious Diseases, vol. 40, pp. 2575–2583, 2021

Abstract

We performed a systematic review and meta-analysis of the effectiveness of Favipiravir on fatality and the requirement for mechanical ventilation in the treatment of moderate to severe COVID-19 patients. We searched the available literature and reported it using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Until June 1, 2021, we searched PubMed, bioRxiv, medRxiv, ClinicalTrials.gov, Cochrane Central Register of Controlled Trials (CENTRAL), and Google Scholar using the keyword “Favipiravir” and terms synonymous with COVID-19. Studies of Favipiravir treatment compared to standard of care among moderate and severe COVID-19 patients were included. Risk of bias assessment was performed using the Revised Cochrane risk of bias tool for randomized trials (RoB 2) and the ROBINS-I assessment tool for non-randomized studies. We defined the outcome measures as fatality and requirement for mechanical ventilation. A total of 2702 studies were identified and 12 clinical trials with 1636 patients were analyzed. Nine out of 12 studies were randomized controlled trials. Among the randomized studies, one study had low risk of bias, six studies had moderate risk of bias, and two studies had high risk of bias. Observational studies were identified as having moderate risk of bias, and the non-randomized study was found to have serious risk of bias. Our meta-analysis did not reveal any significant difference between the intervention and the comparator in fatality rate (OR 1.11, 95% CI 0.64–1.94) or mechanical ventilation requirement (OR 0.50, 95% CI 0.13–1.95). There is no significant difference in fatality rate and mechanical ventilation requirement between Favipiravir treatment and the standard of care in moderate and severe COVID-19 patients.

PrognosiT: Pathway/Gene Set-Based Tumour Volume Prediction Using Multiple Kernel Learning

Ayyüce Begüm Bektaş, Mehmet Gönen

Journal: BMC Bioinformatics, vol. 22, p. 537, 2021

Abstract

Background: Identification of molecular mechanisms that determine tumour progression in cancer patients is a prerequisite for developing new disease treatment guidelines. Even though the predictive performance of current machine learning models is promising, extracting significant and meaningful knowledge from the data simultaneously during the learning process is a difficult task considering the high-dimensional and highly correlated nature of genomic datasets. Thus, there is a need for models that not only predict tumour volume from gene expression data of patients but also use prior information coming from pathway/gene sets during the learning process, to distinguish molecular mechanisms which play a crucial role in tumour progression and, therefore, disease prognosis.

Results: In this study, instead of initially choosing several pathways/gene sets from an available set and training a model on this previously chosen subset of genomic features, we built a novel machine learning algorithm, PrognosiT, that accomplishes both tasks together. We tested our algorithm on thyroid carcinoma patients using gene expression profiles and cancer-specific pathways/gene sets. The predictive performance of our novel multiple kernel learning algorithm (PrognosiT) was comparable to or even better than that of random forest (RF) and support vector regression (SVR). It is also notable that, to predict tumour volume, PrognosiT used fewer than one-tenth of the gene expression features that the RF and SVR algorithms used.

Conclusions: PrognosiT was able to obtain comparable or even better predictive performance than SVR and RF. Moreover, we demonstrated that during the learning process, our algorithm managed to extract relevant and meaningful pathway/gene set information related to the studied cancer type, which provides insights about its progression and aggressiveness. We also compared the expression of the genes selected by our algorithm in tumour and normal tissues, and we then discussed the up- and down-regulated genes selected by our algorithm while learning, which could be beneficial for determining new biomarkers.

The Seroprevalence of SARS-CoV-2 Antibodies among Health Care Workers Before the Era of Vaccination: A Systematic Review and Meta-Analysis

İlker Kayı, Bahar Madran, Şiran Keske, Özge Karanfil, Jose Ramon Arribas, Natalia Pshenichnaya, Nicola Petrosillo, Mehmet Gönen, Önder Ergönül

Journal: Clinical Microbiology and Infection, vol. 27, no. 9, pp. 1241–1249, 2021

Abstract

Background: The prevalence of SARS-CoV-2 infection among health care workers (HCWs) provides information about the spread of COVID-19 within health care facilities and about the risk groups.

Objectives: We aimed to describe the rate of SARS-CoV-2 seroprevalence and its determinants among HCWs.

Data sources: We used Web of Science, PubMed, Scopus, MEDLINE, EBSCOhost and Cochrane Library.

Study eligibility criteria: We included the reports of SARS-CoV-2 seroprevalence with a sample size of minimum 1000 HCWs.

Methods: The study was registered at the International Prospective Register of Systematic Reviews (PROSPERO, no: CRD42021230456). We used PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement. The keywords were “COVID-19”, “SARS-CoV-2”, “Coronavirus”, “seroprevalence”, “health care workers” and “risk factors”.

Results: In total, 4329 reports were retrieved and duplicates were removed; after filtering according to the title and abstract, 25 studies were selected. Risk of bias was assessed in 25 studies; it was low in 13 studies, medium in four studies, and high in eight studies. In the meta-analysis using the random-effects model, the weighted average of seroprevalence was calculated as 8% (95% CI: 6-10%). The pooled seroprevalence rates of the selected variables that have a rate over the average were male HCWs with 9% (95% CI: 7-11%); HCWs from ethnic minorities with 13% (95% CI: 9-17%); high exposure 9% (95% CI: 6-13%); exposure to the virus outside the health care setting 22% (95% CI: 14-32%).

Conclusions: Our analysis indicates a SARS-CoV-2 seroprevalence rate of 8% among studies that included >1000 HCWs for the year 2020, before vaccinations started. The most common risk factors associated with a higher seroprevalence rate were ethnicity, male gender and having a higher number of household contacts. Working as a frontline HCW was inconsistent in its association with higher seroprevalence.

Effectiveness of Different Types of Masks in Aerosol Dispersion in SARS-CoV-2 Infection

Gokhan Tanisali, Ahmet Sozak, Abdul Samet Bulut, Tolga Ziya Sander, Ozlem Dogan, Cağdas Dağ, Mehmet Gönen, Fusun Can, Hasan DeMirci, Onder Ergonul

Journal: International Journal of Infectious Diseases, vol. 109, pp. 310–314, 2021

Abstract

Objective: To compare the effectiveness of different mask types in limiting the dispersal of coughed air.

Method: The Schlieren method with a single curved mirror was used in this study. Coughed air has a slightly higher temperature than ambient air, which generates a refractive index gradient. A curved mirror with a radius of curvature of 10 m and a diameter of 60 cm was used. The spread of the cough wavefront was investigated among five subjects wearing: (1) no mask; (2) a single surgical mask; (3) a double surgical mask; (4) a cloth mask; (5) a valveless N95 mask; and (6) a valved N95 mask.

Results: All mask types reduced the size of the contaminated region significantly. The percentage reduction in the cross-sectional area of the contaminated region for the same mask types on different subjects revealed by normalized data suggests that the fit of a mask plays an important role.

Conclusions: No significant difference in the spread of coughed air was found between the use of a single surgical mask or a double surgical mask. Cloth masks may be effective, depending on the quality of the cloth. Valved N95 masks exclusively protect the user. The fit of a mask is an important factor to minimize the contaminated region.

Assessment of Quarter Billion Primary Care Prescriptions from a Nationwide Antimicrobial Stewardship Program

Mehmet Gönen, Mesil Aksoy, Fatma İşli, Umut Emre Gürpınar, Pınar Göbel, Hakkı Gürsöz, Önder Ergönül

Journal: Scientific Reports, vol. 11, p. 14621, 2021

Abstract

We described the significance of systematically monitoring nationwide antimicrobial stewardship programs (ASPs) in primary care. All the prescriptions given by family physicians were recorded in the Prescription Information System established by the Turkish Medicines and Medical Devices Agency of the Ministry of Health. We calculated, for each prescription, the “antibiotics amount” as the number of boxes times the number of items per box for medicines that belong to antiinfectives for systemic use (i.e., J01 block in the Anatomical Therapeutic Chemical Classification System). We compared the antibiotics amount before (2015) and after (2016) the extensive training programs for the family physicians. We included 266,389,209 prescriptions from state-operated family healthcare units (FHUs) between January 1, 2015 and December 31, 2016. These prescriptions were given by 26,313 individual family physicians in 22,518 FHUs for 50,713,181 individual patients. At least one antimicrobial was given in 37,024,232 (28.31%) prescriptions in 2015 and 36,154,684 (26.66%) prescriptions in 2016. The most common diagnosis was “acute upper respiratory infections (AURI)” (i.e., J00-J06 block in the 10th revision of the International Statistical Classification of Diseases and Related Health Problems) with 28.05%. The average antibiotics amount over prescriptions with AURI decreased in 79 out of 81 provinces, and the overall rate of decrease in average antibiotics amount was 8.33%, where 28 and 53 provinces experienced decreases (ranging between 28.63% and −3.05%) above and below this value, respectively. In the most successful province, the highest decrease in the average amount of “other beta-lactam antibacterials” per prescription for AURI was 49.63% in January. Computational analyses on a big data set collected from a nationwide healthcare system made a significant contribution to improving ASPs.
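
The “antibiotics amount” definition above (number of boxes times number of items per box, restricted to the J01 block) can be sketched as follows. The record layout, field names, and helper functions are hypothetical and serve only to illustrate the arithmetic; the actual schema of the Prescription Information System is not described in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PrescriptionLine:
    """One medicine on a prescription (hypothetical record layout)."""
    boxes: int
    items_per_box: int
    atc_code: str  # Anatomical Therapeutic Chemical code, e.g. "J01CA04"


def antibiotics_amount(lines: Iterable[PrescriptionLine]) -> int:
    """Sum boxes x items per box over medicines in the ATC J01 block."""
    return sum(line.boxes * line.items_per_box
               for line in lines
               if line.atc_code.startswith("J01"))


def percent_decrease(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent.

    A negative result means the amount increased.
    """
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Made-up prescription: one J01 antibiotic and one non-antibiotic item
    prescription = [
        PrescriptionLine(boxes=2, items_per_box=10, atc_code="J01CA04"),
        PrescriptionLine(boxes=1, items_per_box=20, atc_code="N02BE01"),
    ]
    print(antibiotics_amount(prescription))
    print(percent_decrease(before=12.0, after=11.0))
```

Under this sign convention, a negative value such as the −3.05% quoted above corresponds to a province where the average amount increased.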

Virulence Determinants of Colistin-Resistant K. pneumoniae High-Risk Clones

Ozlem Dogan, Cansel Vatansever, Nazli Atac, Ozgur Albayrak, Sercin Karahuseyinoglu, Ozgun Ekin Sahin, Bilge Kaan Kilicoglu, Atalay Demiray, Onder Ergonul, Mehmet Gönen, Fusun Can

Journal: Biology, vol. 10, p. 436, 2021

Abstract

We proposed the hypothesis that high-risk clones of colistin-resistant K. pneumoniae (ColR-Kp) possess a high number of virulence factors and have enhanced survival capacity against neutrophil activity. We studied virulence genes of ColR-Kp isolates and the neutrophil response in 142 patients with invasive ColR-Kp infections. The ST101 and ST395 ColR-Kp infections had higher 30-day mortality (58%, p = 0.005 and 75%, p = 0.003). The presence of the yersiniabactin biosynthesis gene (ybtS) and the ferric uptake operon associated gene (kfu) was significantly higher in ST101 (99%, p ≤ 0.001) and ST395 (94%, p < 0.012). Being in ICU (OR: 7.9; CI: 1.43–55.98; p = 0.024), kfu (OR: 27.0; CI: 5.67–179.65; p < 0.001) and ST101 (OR: 17.2; CI: 2.45–350.40; p = 0.01) were found to be predictors of 30-day mortality. Although the neutrophil uptake of kfu+-ybtS+ ColR-Kp was significantly higher than that of kfu−-ybtS− ColR-Kp (phagocytosis rate: 78% vs. 65%, p < 0.001), the kfu+-ybtS+ ColR-Kp survived more than kfu−-ybtS− ColR-Kp (median survival index: 7.90 vs. 4.22; p = 0.001). The kfu+-ybtS+ ColR-Kp stimulated excessive NET formation. Iron uptake systems in high-risk clones of colistin-resistant K. pneumoniae enhance survival against the neutrophil phagocytic defense and stimulate excessive NET formation. Drugs targeting iron uptake systems would be a promising approach for the treatment of infections with colistin-resistant high-risk clones of K. pneumoniae.

Protein Dynamics Analysis Identifies Candidate Cancer Driver Genes and Mutations in TCGA Data

Jan Fehmi Sayılgan, Türkan Haliloğlu, Mehmet Gönen

Journal: Proteins: Structure, Function, and Bioinformatics, vol. 89, pp. 721–730, 2021

Abstract

Recently, it has been shown that cancer missense mutations selectively target the neighborhood of hinge residues, which are key sites in protein dynamics. Here, we show that this approach can be extended to find previously unknown candidate mutations and genes. To this aim, we developed a computational pipeline to detect significantly enriched three-dimensional (3D) clustering of missense mutations around hinge residues. The hinge residues were detected by applying a Gaussian network model. By systematically analyzing the PanCancer compendium of somatic mutations in nearly 10,000 tumors from the Cancer Genome Atlas, we identified candidate genes and mutations in addition to well-known ones. For instance, we found significantly enriched 3D clustering of missense mutations in known cancer genes including CDK4, CDKN2A, TCL1A, and MAPK1. Besides these known genes, we also identified significantly enriched 3D clustering of missense mutations around hinge residues in PLA2G4A, which may lead to excessive phosphorylation of the extracellular signal-regulated kinases. Furthermore, we demonstrated that hinge-based features improve pathogenicity prediction for missense mutations. Our results show that considering clustering around hinge residues can help us explain the functional role of the mutations in known cancer genes and identify candidate genes.

Trends and Factors Associated with Modification or Discontinuation of the Initial Antiretroviral Regimen During the First Year of Treatment in the Turkish HIV-TR Cohort, 2011–2017

Volkan Korten, Deniz Gökengin, Gülhan Eren, Taner Yıldırmak, Serap Gencer, Haluk Eraksoy, Dilara Inan, Figen Kaptan, Başak Dokuzoğuz, Ilkay Karaoglan, Ayşe Willke, Mehmet Gönen, Önder Ergönül, on behalf of the HIV-TR Study Group

Journal: AIDS Research and Therapy, vol. 18, p. 4, 2021

Abstract

Background: There is limited evidence on the modification or stopping of antiretroviral therapy (ART) regimens, including novel antiretroviral drugs. The aim of this study was to evaluate the discontinuation of first ART before and after the availability of better tolerated and less complex regimens by comparing the frequency, reasons and associations with patient characteristics.

Methods: A total of 3019 ART-naive patients registered in the HIV-TR cohort who started ART between Jan 2011 and Feb 2017 were studied. Only the first modification within the first year of treatment for each patient was included in the analyses. Reasons were classified as listed in the coded form in the web-based database. Cumulative incidences were analysed using competing risk function and factors associated with discontinuation of the ART regimen were examined using Cox proportional hazards models and Fine-Gray competing risk regression models.

Results: The initial ART regimen was discontinued in 351 out of 3019 eligible patients (11.6%) within the first year. The main reason for discontinuation was intolerance/toxicity (45.0%), followed by treatment simplification (9.7%), patient willingness (7.4%), poor compliance (7.1%), prevention of future toxicities (6.0%), virologic failure (5.4%), and provider preference (5.4%). Non-nucleoside reverse transcriptase inhibitor (NNRTI)-based (aHR = 4.4, [95% CI 3.0–6.4]; p < 0.0001) or protease inhibitor (PI)-based regimens (aHR = 4.3, [95% CI 3.1–6.0]; p < 0.0001) relative to integrase strand transfer inhibitor (InSTI)-based regimens were significantly associated with ART discontinuation. ART initiated at a later period (2015-Feb 2017) (aHR = 0.6, [95% CI 0.4–0.9]; p < 0.0001) was less likely to be discontinued. A lower rate of treatment discontinuation for intolerance/toxicity was observed with InSTI-based regimens (2.0%) than with NNRTI- (6.6%) and PI-based regimens (7.5%) (p < 0.001). The percentage of patients who achieved HIV RNA < 200 copies/mL within 12 months of ART initiation was 91% in the ART discontinued group vs. 94% in the continued group (p > 0.05).

Conclusion: ART discontinuation due to intolerance/toxicity and virologic failure decreased over time. InSTI-based regimens were less likely to be discontinued than PI- and NNRTI-based ART.

Keywords: Antiretroviral therapy, Treatment modification, Integrase strand transfer inhibitor, Treatment outcome, Cohort study

National Case Fatality Rates of the COVID-19 Pandemic

Önder Ergönül, Merve Akyol, Cem Tanrıöver, Henning Tiemeier, Eskild Petersen, Nicola Petrosillo, Mehmet Gönen

Journal: Clinical Microbiology and Infection, vol. 27, no. 1, pp. 118–124, 2021

Abstract

Objectives: The case fatality rate (CFR) of coronavirus disease 2019 (COVID-19) varies significantly between countries. We aimed to describe the associations of health indicators with the national CFRs of COVID-19.

Methods: We identified, for each country, health indicators potentially associated with the national CFRs of COVID-19. We extracted data for 18 variables from international administrative data sources for 34 member countries of the Organization for Economic Cooperation and Development (OECD). We excluded the collinear variables and examined the 16 variables in multivariable analysis. A dynamic web-based model was developed to analyse and display the associations for the CFRs of COVID-19. We followed the Guideline for Accurate and Transparent Health Estimates Reporting (GATHER).

Results: In multivariable analysis, the variables significantly associated with the increased CFRs were percent of obesity in ages >18 years (β = 3.26, 95% CI = [1.20, 5.33], p = 0.003), tuberculosis incidence (β = 3.15, 95% CI = [1.09, 5.22], p = 0.004), duration (days) since first death due to COVID-19 (β = 2.89, 95% CI = [0.83, 4.96], p = 0.008), median age (β = 2.83, 95% CI = [0.76, 4.89], p = 0.009). The COVID-19 test rate (β = -3.54, 95% CI = [-5.60, -1.47], p = 0.002), hospital bed density (β = -2.47, 95% CI = [-4.54, -0.41], p = 0.021), and rural population ratio (β = -2.19, 95% CI = [-4.25, -0.13], p = 0.039) decreased the CFR.

Conclusions: The pandemic hits population-dense cities. Available hospital beds should be increased. Test capacity should be increased to enable more effective diagnostic tests. Older patients and patients with obesity and their caregivers should be warned about a potentially increased risk.

Improving Fraud Detection and Concept Drift Adaptation in Credit Card Transactions Using Incremental Gradient Boosting Trees

Barış Bayram, Bilge Köroğlu, Mehmet Gönen

Conference: Proceedings of the 19th IEEE International Conference on Machine Learning and Applications (ICMLA 2020), pp. 545–550, 2020

Abstract

Due to the increase in the use of credit cards in electronic shopping, card payments for online commerce have rapidly become a popular trend, which has also led to growth in the number of retailers. Because of these various online shopping options, more frequent variation in the spending behaviors of customers and the purchasing trends of online markets, known as the concept drift problem, can be observed over time, which also causes an increase in the need for novel fraudulent strategies. This drifting problem may significantly hinder the performance of state-of-the-art fraud detection approaches on real credit card transaction data, which also has the imbalanced class distribution problem. In this study, a card-based incremental Gradient Boosting Tree (GBT) is investigated to detect credit card frauds and to adapt in real time to drifts occurring in online transactions. In card-based incremental learning, the transactions of the fraudulent credit cards reported each day are incrementally learned by the GBT model. The card-based incremental GBT model is compared with the regular GBT model and with retraining on a new transaction set formed by combining the previous set and the transactions of the cards reported as fraudulent. The experiments were carried out on 4 months of real transaction data from December 2019 to March 2020, in which the concept drift problem occurred in December, dramatically affecting the performance of the GBT model. In these experiments, improvements in fraud detection performance were realized in all months, and the effectiveness of card-based incremental learning was verified by comparing it with transaction-based incremental learning, which may cause the catastrophic forgetting problem.

An Efficient Framework to Identify Key miRNA–mRNA Regulatory Modules in Cancer

Milad Mokhtaridoost, Mehmet Gönen

Journal: Bioinformatics, vol. 36, no. 26, pp. i592–i600, 2020

Abstract

Motivation: Micro RNAs (miRNAs) are known as the important components of RNA silencing and post-transcriptional gene regulation, and they interact with messenger RNAs (mRNAs) either by degradation or by translational repression. miRNA alterations have a significant impact on the formation and progression of human cancers. Accordingly, it is important to establish computational methods with high predictive performance to identify cancer-specific miRNA–mRNA regulatory modules.

Results: We presented a two-step framework to model miRNA–mRNA relationships and identify cancer-specific modules between miRNA and mRNA from their matched expression profiles of more than 9000 primary tumors. We first estimated the regulatory matrix between miRNA and mRNA expression profiles by solving multiple linear programming problems. We then formulated a unified regularized factor regression (RFR) model that simultaneously estimates the effective number of modules (i.e. latent factors) and extracts modules by decomposing regulatory matrix into two low-rank matrices. Our RFR model groups correlated miRNAs together and correlated mRNAs together, and also controls sparsity levels of both matrices. These attributes lead to interpretable results with high predictive performance. We applied our method on a very comprehensive data collection including 32 TCGA cancer types. To find the biological relevance of our approach, we performed functional gene set enrichment and survival analyses. A large portion of the identified modules are significantly enriched in Hallmark, PID and KEGG pathways/gene sets. To validate the identified modules, we also performed literature validation as well as validation using experimentally supported miRTarBase database.

Identifying Key miRNA–mRNA Regulatory Modules in Cancer Using Sparse Multivariate Factor Regression

Milad Mokhtaridoost, Mehmet Gönen

Conference: Proceedings of the 6th International Conference on Machine Learning, Optimization, and Data Science (LOD 2020), pp. 422–433, 2020

Abstract

The interactions between microRNAs (miRNAs) and messenger RNAs (mRNAs) are known to have a major effect on the formation and progression of cancer. In this study, we identified regulatory modules of 32 cancer types using a sparse multivariate factor regression model on matched miRNA and mRNA expression profiles of more than 9,000 primary tumors. We used an algorithm that decomposes the coefficient matrix into two low-rank matrices with separate sparsity-inducing penalty terms on each. The first matrix linearly transforms the predictors to a set of latent factors, and the second one regresses the responses using these factors. Our solution significantly outperformed another decomposition-based approach in terms of normalized root mean squared error in all 32 cohorts. We demonstrated the biological relevance of our results by performing survival and gene set enrichment analyses. The validation of overall results indicated that our solution is highly efficient for identifying key miRNA–mRNA regulatory modules.

A Multitask Multiple Kernel Learning Formulation for Discriminating Early- and Late-Stage Cancers

Arezou Rahimi, Mehmet Gönen

Journal: Bioinformatics, vol. 36, no. 12, pp. 3766–3772, 2020

Abstract

Motivation: Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction.

Results: We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature.

Androgen Receptor Binding Sites are Highly Mutated in Prostate Cancer

Tunç Morova, Daniel R. McNeill, Nada Lallous, Mehmet Gönen, Kush Dalal, David M. Wilson III, Attila Gürsoy, Özlem Keskin, Nathan A. Lack

Journal: Nature Communications, vol. 11, p. 832, 2020

Abstract

Androgen receptor (AR) signalling is essential in nearly all prostate cancers. Any alterations to AR-mediated transcription can have a profound effect on carcinogenesis and tumour growth. While mutations of the AR protein have been extensively studied, little is known about those somatic mutations that occur at the non-coding regions where AR binds DNA. Using clinical whole genome sequencing, we show that AR binding sites have a dramatically increased rate of mutations that is greater than that of any other transcription factor and specific to prostate cancer. Demonstrating that this may be common to lineage-specific transcription factors, estrogen receptor binding sites were also found to have an elevated rate of mutations in breast cancer. We provide evidence that these mutations at AR binding sites, and likely those of other related transcription factors, are caused by faulty repair of abasic sites. Overall, this work demonstrates that non-coding AR binding sites are frequently mutated in prostate cancer and can impact enhancer activity.

A Prospective Prediction Tool for Understanding Crimean–Congo Haemorrhagic Fever Dynamics in Turkey

Çiğdem Ak, Önder Ergönül, Mehmet Gönen

Journal: Clinical Microbiology and Infection, vol. 26, no. 1, pp. 123.e1–123.e7, 2020

Abstract

Objectives: We aimed to develop a prospective prediction tool on Crimean–Congo haemorrhagic fever (CCHF) to identify geographic regions at risk. The tool could support public health decision makers in implementation of an effective control strategy in a timely manner.

Methods: We used monthly surveillance data between 2004 and 2015 to predict case counts between 2016 and 2017 prospectively. The Turkish nationwide surveillance dataset collected by the Ministry of Health contained 10,411 confirmed CCHF cases. We collected potential explanatory covariates about climate, land use, and animal and human population at risk to capture spatiotemporal transmission dynamics. We developed a structured Gaussian process algorithm and prospectively tested this tool by predicting the future year’s cases given past years’ cases.

Results: We predicted the annual cases in 2016 and 2017 as 438 and 341, whereas the observed cases were 432 and 343, respectively. Pearson’s correlation coefficient and normalized root mean squared error values for 2016 and 2017 predictions were (0.83; 0.58) and (0.87; 0.52), respectively. The most important covariates were found to be the number of settlements with fewer than 25,000 inhabitants, latitude, longitude, and potential evapotranspiration (evaporation and transpiration).
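
The two evaluation metrics quoted above can be computed as in the sketch below. The abstract does not specify how the root mean squared error was normalized; dividing by the standard deviation of the observed values is an assumption made here for illustration, and the function names and example numbers are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def nrmse(observed, predicted):
    """RMSE divided by the standard deviation of the observed values.

    Assumption: this normalization is one common convention; under it,
    always predicting the observed mean scores exactly 1.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    var = sum((o - mean_obs) ** 2 for o in observed) / n
    return math.sqrt(mse / var)


if __name__ == "__main__":
    # Made-up monthly case counts for a single region (illustration only)
    observed = [0, 2, 9, 14, 6, 1]
    predicted = [1, 3, 7, 12, 8, 1]
    print(pearson(observed, predicted), nrmse(observed, predicted))
```

With this normalization, values below 1 indicate predictions that are closer to the observations than a constant prediction at the observed mean.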

Conclusions: The main driving factors of CCHF dynamics were human population at risk in rural areas, geographical dependency, and climate effect on ticks. Our model was able to prospectively predict the numbers of CCHF cases. Our proof-of-concept study also provided insight into possible mechanisms of infectious diseases and identified important directions for practice and policy to combat emerging infectious diseases.

Path2Surv: Pathway/Gene Set-Based Survival Analysis Using Multiple Kernel Learning

Onur Dereli, Ceyda Oğuz, Mehmet Gönen

Journal: Bioinformatics, vol. 35, no. 24, pp. 5137–5145, 2019

Abstract

Motivation: Survival analysis methods that integrate pathways/gene sets into their learning model could identify molecular mechanisms that determine survival characteristics of patients. Rather than first picking the predictive pathways/gene sets from a given collection and then training a predictive model on the subset of genomic features mapped to these selected pathways/gene sets, we developed a novel machine learning algorithm (Path2Surv) that conjointly performs these two steps using multiple kernel learning.

Results: We extensively tested our Path2Surv algorithm on 7655 patients from 20 cancer types using cancer-specific pathway/gene set collections and gene expression profiles of these patients. Path2Surv statistically significantly outperformed survival random forest (RF) on 12 out of 20 datasets and obtained comparable predictive performance against survival support vector machine (SVM) using significantly fewer gene expression features (i.e. less than 10% of what survival RF and survival SVM used).
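The core idea of multiple kernel learning on gene sets can be sketched as follows: one kernel is computed per pathway/gene set from the expression features mapped to it, and the kernels are combined with non-negative weights. This is a minimal illustration only. The function names, the Gaussian kernel choice, and the fixed weights are assumptions; Path2Surv learns the weights jointly with the survival model, which is not shown here.

```python
import numpy as np


def gaussian_kernel(X, width):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * width ** 2))


def combined_kernel(X, gene_sets, weights, width=1.0):
    """Weighted sum of one kernel per gene set.

    X         : (patients x genes) expression matrix
    gene_sets : list of lists of column indices, one list per pathway/gene set
    weights   : non-negative kernel weights (hypothetical, fixed here)
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    K = np.zeros((X.shape[0], X.shape[0]))
    for genes, eta in zip(gene_sets, weights):
        if eta > 0:  # gene sets with zero weight contribute nothing
            K += eta * gaussian_kernel(X[:, genes], width)
    return K
```

A gene set whose weight is zero drops out of the combined kernel entirely, which is how a sparse weight vector translates into using fewer gene expression features.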

The Fungal Metabolite Chaetocin is a Sensitizer for Pro-Apoptotic Therapies in Glioblastoma

Ezgi Ozyerli-Goknar, Ilknur Sur-Erdem, Fidan Seker, Ahmet Cingöz, Alisan Kayabolen, Zeynep Kahya, Fırat Uyulur, Melike Gezen, Nazife Tolay, Batu Erman, Mehmet Gönen, James Dunford, Udo Oppermann, Tugba Bagci-Onder

JournalCell Death & Disease, vol. 10, p. 894, 2019

Abstract

Glioblastoma Multiforme (GBM) is the most common and aggressive primary brain tumor. Despite recent developments in surgery, chemo- and radio-therapy, a currently poor prognosis of GBM patients highlights an urgent need for novel treatment strategies. TRAIL (TNF Related Apoptosis Inducing Ligand) is a potent anti-cancer agent that can induce apoptosis selectively in cancer cells. GBM cells frequently develop resistance to TRAIL which renders clinical application of TRAIL therapeutics inefficient. In this study, we undertook a chemical screening approach using a library of epigenetic modifier drugs to identify compounds that could augment TRAIL response. We identified the fungal metabolite chaetocin, an inhibitor of histone methyl transferase SUV39H1, as a novel TRAIL sensitizer. Combining low subtoxic doses of chaetocin and TRAIL resulted in very potent and rapid apoptosis of GBM cells. Chaetocin also effectively sensitized GBM cells to further pro-apoptotic agents, such as FasL and BH3 mimetics. Chaetocin mediated apoptosis sensitization was achieved through ROS generation and consequent DNA damage induction that involved P53 activity. Chaetocin induced transcriptomic changes showed induction of antioxidant defense mechanisms and DNA damage response pathways. Heme Oxygenase 1 (HMOX1) was among the top upregulated genes, whose induction was ROS-dependent and HMOX1 depletion enhanced chaetocin mediated TRAIL sensitization. Finally, chaetocin and TRAIL combination treatment revealed efficacy in vivo. Taken together, our results provide a novel role for chaetocin as an apoptosis priming agent and its combination with pro-apoptotic therapies might offer new therapeutic approaches for GBMs.

Identification of SERPINE1 as a Regulator of Glioblastoma Cell Dispersal with Transcriptome Profiling

Fidan Seker, Ahmet Cingoz, Ilknur Sur-Erdem, Nazli Erguder, Alp Erkent, Fırat Uyulur, Myvizhi Esai Selvan, Zeynep Hülya Gümüş, Mehmet Gönen, Halil Bayraktar, Hiroaki Wakimoto, Tugba Bagci-Onder

JournalCancers, vol. 11, p. 1651, 2019

Abstract: High mortality rates of glioblastoma (GBM) patients are partly attributed to the invasive behavior of tumor cells that exhibit extensive infiltration into adjacent brain tissue, leading to rapid, inevitable, and therapy-resistant recurrence. In this study, we analyzed transcriptome of motile (dispersive) and non-motile (core) GBM cells using an in vitro spheroid dispersal model and identified SERPINE1 as a modulator of GBM cell dispersal. Genetic or pharmacological inhibition of SERPINE1 reduced spheroid dispersal and cell adhesion by regulating cell-substrate adhesion. We examined TGFβ as a potential upstream regulator of SERPINE1 expression. We also assessed the significance of SERPINE1 in GBM growth and invasion using TCGA glioma datasets and a patient-derived orthotopic GBM model. SERPINE1 expression was associated with poor prognosis and mesenchymal GBM in patients. SERPINE1 knock-down in primary GBM cells suppressed tumor growth and invasiveness in the brain. Together, our results indicate that SERPINE1 is a key player in GBM dispersal and provide insights for future anti-invasive therapy design.

Keywords: GBM; transcriptome analysis; dispersal

Promoters of Colistin Resistance in Acinetobacter baumannii Infections

Elif Nurtop, Fulya Bayındır Bilman, Sirin Menekse, Ozlem Kurt Azap, Mehmet Gönen, Onder Ergonul, Fusun Can

JournalMicrobial Drug Resistance, vol. 25, no. 7, pp. 997–1002, 2019

Abstract

Objectives: We aimed to describe the mechanisms of colistin resistance in Acinetobacter baumannii.

Methods: Twenty-nine patients diagnosed with colistin-resistant A. baumannii infection were included in the study. The mutations in pmrCAB, lpxA, lpxC, and lpxD genes, expressions of pmrCAB, carbapenemases and mcr-1 positivity were studied.

Results: Twenty-seven (93%) of the patients received IV colistin therapy during their stay, and the case fatality rate was 45%. All of the mutations in pmrC and pmrB were found to be accompanied by a mutation in lpxD. The most common mutations were I42V and L150F in pmrC (65%), E117K in lpxD (65%), and A138T in pmrB (58.6%). The colistin minimum inhibitory concentrations (MICs) of the isolates having any of these four mutations were higher than those of the isolates with no mutations (p < 0.001). The two most common mutations in pmrC (I42V and L150F) were found to be associated with higher expressions of pmrA and pmrC and higher colistin MIC values (p = 0.010 and 0.031). All isolates were blaOXA-23 positive.

Conclusions: Coexistence of the lpxD mutation along with mutations in pmrCAB indicates synergistic function of these genes in development of colistin resistance in A. baumannii.

A Multitask Multiple Kernel Learning Algorithm for Survival Analysis with Application to Cancer Biology

Onur Dereli, Ceyda Oğuz, Mehmet Gönen

ConferenceProceedings of the 36th International Conference on Machine Learning (ICML 2019), pp. 1576–1585, 2019

Abstract

Predictive performance of machine learning algorithms on related problems can be improved using multitask learning approaches. Rather than performing survival analysis on each data set to predict survival times of cancer patients, we developed a novel multitask approach based on multiple kernel learning (MKL). Our multitask MKL algorithm both works on multiple cancer data sets and integrates cancer-related pathways/gene sets into survival analysis. We tested our algorithm, named Path2MSurv, on the Cancer Genome Atlas data sets by analyzing gene expression profiles of 7,655 patients from 20 cancer types together with cancer-specific pathway/gene set collections. Path2MSurv obtained better or comparable predictive performance when benchmarked against random survival forest, survival support vector machine, and the single-task variant of our algorithm. Path2MSurv has the ability to identify key pathways/gene sets in predicting survival times of patients from different cancer types.

Protein Dynamics Analysis Reveals that Missense Mutations in Cancer-Related Genes Appear Frequently on Hinge-Neighboring Residues

Jan Fehmi Sayılgan, Türkan Haliloğlu, Mehmet Gönen

JournalProteins: Structure, Function, and Bioinformatics, vol. 87, pp. 512–519, 2019

Abstract

Missense mutations have various effects on protein structures, also leading to distorted protein dynamics that plausibly affects the function. We hypothesized that missense mutations in cancer-related genes selectively target hinge-neighboring residues that orchestrate collective structural dynamics. To test our hypothesis, we selected 69 cancer-related genes from the Cancer Gene Census database and their representative protein structures from the Protein Data Bank. We first identified the hinge residues in two global modes of motion by applying the Gaussian Network Model. We then showed that missense mutations are significantly enriched on hinge-neighboring residues in oncogenes and tumor suppressor genes. We observed that several oncogenes (eg, MAP2K1, PTPN11, and KRAS) and tumor suppressor genes (eg, EZH2, CDKN2C, and RHOA) strongly exhibit this phenomenon. This study highlights and rationalizes the functional importance of missense mutations on hinge-neighboring residues in cancer.

The Quality of ECG Data Acquisition, and Diagnostic Performance of a Novel Adhesive Patch for Ambulatory Cardiac Rhythm Monitoring in Arrhythmia Detection

M. Remzi Karaoğuz, Ece Yurtseven, Gamze Aslan, Bilgen Gülşen Deliormanlı, Ömer Adıgüzel, Mehmet Gönen, Ko-Mai Li, Elif Nur Yılmaz

JournalJournal of Electrocardiology, vol. 54, pp. 28–35, 2019

Abstract

Background: Short and long ambulatory electrocardiographic monitoring with different systems is a widely used method to detect cardiac arrhythmias. In this study, we aimed to evaluate the effectiveness of a novel monitoring device on cardiac arrhythmia detection.

Methods: We used two different protocols to evaluate device performance. For the first one, 36 healthy subjects were enrolled. The standard 12-lead, 24-hour Holter monitoring and the novel single-lead electrocardiogram (ECG) Patch Monitor (EPM) device (BeyondCare®, Rooti Labs Ltd., Taipei City, Taiwan) were simultaneously applied to all subjects for 24 hours. The quality of ECG data acquisition of the novel system was compared to that of the standard Holter. The second phase included 73 patients who were referred from our outpatient arrhythmia clinic for evaluation of symptoms relevant to cardiac arrhythmias. Advanced algorithms and statistical methods (cross-correlation method, Pearson’s correlation coefficient, Bland-Altman plots) were used to process and verify the acquired data.

Results: The overall average beats-per-minute correlation between BeyondCare® and the standard 12-lead Holter was found to be 98% in 33 healthy subjects. The mean percentage of invalid measurements in BeyondCare® was 1.6% while the Holter’s was 1.7%. In the second protocol of the study, prospective data from 67 patients who were referred for evaluation of their symptoms relevant to cardiac arrhythmias showed that the mean BeyondCare® wear time was 4.7±0.5 days out of five total days per protocol. The mean analyzable wear time was 93.6%. The water-resistant design enabled 73.5% of the participants to take a shower. 7.3% of participants had minor skin irritations related to the electrodes. Among the patients with detected arrhythmia (40.2% of all patients), 29.6% had their first arrhythmia after the initial two-day period. A clinically significant pause was detected in one patient, ventricular tachycardia was detected in four patients, and supraventricular tachycardia was detected in 15 patients. Paroxysmal atrial fibrillation was identified in seven patients. Three of them had their first episodes after the second day of monitoring.

Conclusion: The BeyondCare® Patch was well tolerated and allowed prolonged periods of continuous ECG monitoring, which may improve clinical accuracy and the detection of arrhythmias by a cloud-based artificial intelligence operating system.

Keywords: Ambulatory electrocardiographic monitoring, ECG patch monitoring device, Cardiac arrhythmias

The Role of AcrAB–TolC Efflux Pumps on Quinolone Resistance of E. coli ST131

Nazli Atac, Ozlem Kurt‐Azap, Istar Dolapci, Aysegul Yesilkaya, Onder Ergonul, Mehmet Gonen, Fusun Can

JournalCurrent Microbiology, vol. 75, no. 12, pp. 1661–1666, 2018

Abstract

Escherichia coli ST131 is a cause for global concern because of its high multidrug resistance and several virulence factors. In this study, the contribution of the acrAB–TolC efflux system of E. coli ST131 to fluoroquinolone resistance was evaluated. A total of 111 nonrepetitive ciprofloxacin-resistant E. coli isolates were included in the study. Multilocus sequence typing was used for genotyping. Expressions of acrA, acrB, and TolC efflux pump genes were measured by RT-PCR. Mutations in marA, gyrA, parC, and aac(6′)-Ib-cr positivity were studied by Sanger sequencing. Sixty-four (57.7%) of the isolates were classified as ST131, and 52 (81.3%) of the ST131 isolates belonged to the H30-Rx subclone. In ST131, CTX-M 15 positivity (73%) and aac(6′)-Ib-cr carriage (75%) were significantly higher than those in non-ST131 (12.8% and 51%, respectively) (P < 0.05). The ampicillin–sulbactam resistance (83%) was higher, and gentamicin resistance (20%) was lower in ST131 than in non-ST131 (64% and 55%, respectively) (P = 0.001 and P = 0.0002). Numbers of the isolates with MDR or XDR profiles did not differ between the two groups. Multiple in-dels (up to 16) were recorded in all quinolone-resistant isolates. However, the marA gene was more overexpressed in ST131 compared to that in non-ST131 (median 5.98 vs. 3.99; P = 0.0007). Belonging to the H30-Rx subclone, isolation site, and ciprofloxacin MIC values did not correlate with efflux pump expressions. In conclusion, the marA regulatory gene of AcrAB–TolC efflux pump system has a significant impact on quinolone resistance and progression to MDR profile in ST131 clone. Efflux pump inhibitors might be alternative drugs for the treatment of infections caused by E. coli ST131 if used synergistically in combination with antibiotics.

Structured Gaussian Processes with Twin Multiple Kernel Learning

Çiğdem Ak, Önder Ergönül, Mehmet Gönen

ConferenceProceedings of the 10th Asian Conference on Machine Learning (ACML 2018), pp. 65–80, 2018

Abstract

Vanilla Gaussian processes (GPs) have prohibitive computational needs for very large data sets. To overcome this difficulty, special structures in the covariance matrix, if they exist, should be exploited using decomposition methods such as the Kronecker product. In this paper, we integrated the Kronecker decomposition approach into a multiple kernel learning (MKL) framework for GP regression. We first formulated a regression algorithm with the Kronecker decomposition of structured kernels for spatiotemporal modeling to learn the contribution of spatial and temporal features as well as learning a model for out-of-sample prediction. We then evaluated the performance of our proposed computational framework, namely, structured GPs with twin MKL, on two different real data sets to show its efficiency and effectiveness. MKL helped us extract relative importance of input features by assigning weights to kernels calculated on different subsets of temporal and spatial features.
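Why Kronecker structure helps can be illustrated with a short sketch. When the covariance over a complete space-time grid factorizes as a spatial kernel times a temporal kernel, the linear system needed for GP regression can be solved from the eigendecompositions of the two small kernels, without ever forming the full covariance matrix. The code below shows this generic trick only; the function and variable names are assumptions, and it does not implement the twin MKL formulation of the paper.

```python
import numpy as np


def kron_gp_solve(K_space, K_time, Y, noise_var):
    """Solve (K_space ⊗ K_time + noise_var * I) vec(A) = vec(Y) for A.

    K_space : (n_s x n_s) symmetric positive semidefinite spatial kernel
    K_time  : (n_t x n_t) symmetric positive semidefinite temporal kernel
    Y       : (n_s x n_t) observations on the full space-time grid
    vec(.) denotes row-major flattening, matching np.kron and ndarray.ravel.
    """
    lam_s, Q_s = np.linalg.eigh(K_space)
    lam_t, Q_t = np.linalg.eigh(K_time)
    # Rotate the observations into the joint eigenbasis.
    Y_rot = Q_s.T @ Y @ Q_t
    # Eigenvalues of the Kronecker product are all pairwise products.
    Y_rot = Y_rot / (np.outer(lam_s, lam_t) + noise_var)
    # Rotate back to the original coordinates.
    return Q_s @ Y_rot @ Q_t.T
```

The two eigendecompositions cost on the order of n_s^3 + n_t^3 operations, whereas factorizing the full covariance directly would cost on the order of (n_s n_t)^3.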

Systematic Review and Meta-Analysis of Postexposure Prophylaxis for Crimean-Congo Hemorrhagic Fever Virus among Healthcare Workers

Önder Ergönül, Şiran Keske, Melis Gökçe Çeldir, İlayda Arjen Kara, Natalia Pshenichnaya, Gulzhan Abuova, Lucille Blumberg, Mehmet Gönen

JournalEmerging Infectious Diseases, vol. 24, no. 9, pp. 1642–1648, 2018

Abstract

We performed a systematic review and meta-analysis on the effectiveness of ribavirin use for the prevention of infection and death of healthcare workers exposed to patients with Crimean-Congo hemorrhagic fever virus (CCHFV) infection. Splashes with blood or bodily fluids (odds ratio [OR] 4.2), being a nurse or physician (OR 2.1), and treating patients who died from CCHFV infection (OR 3.8) were associated with healthcare workers acquiring CCHFV infection; 7% of the workers who received postexposure prophylaxis (PEP) with ribavirin and 89% of those who did not became infected. PEP with ribavirin reduced the odds of infection (OR 0.01, 95% CI 0–0.03), and ribavirin use ≤48 hours after symptom onset reduced the odds of death (OR 0.03, 95% CI 0–0.58). The odds of death increased 2.4-fold every day without ribavirin treatment. Ribavirin should be recommended as PEP and early treatment for workers at medium-to-high risk for CCHFV infection.

Spatiotemporal Prediction of Infectious Diseases Using Structured Gaussian Processes with Application to Crimean-Congo Hemorrhagic Fever

Çiğdem Ak, Önder Ergönül, İrfan Şencan, Mehmet Ali Torunoğlu, Mehmet Gönen

JournalPLoS Neglected Tropical Diseases, vol. 12, no. 8, p. e0006737, 2018

Abstract

Background: Infectious diseases are one of the primary healthcare problems worldwide, leading to millions of deaths annually. To develop effective control and prevention strategies, we need reliable computational tools to understand disease dynamics and to predict future cases. These computational tools can be used by policy makers to make more informed decisions.

Methodology/Principal findings: In this study, we developed a computational framework based on Gaussian processes to perform spatiotemporal prediction of infectious diseases and exploited the special structure of similarity matrices in our formulation to obtain a very efficient implementation. We then tested our framework on the problem of modeling Crimean–Congo hemorrhagic fever cases between years 2004 and 2015 in Turkey.

Conclusions/Significance: We showed that our Gaussian process formulation obtained better results than two frequently used standard machine learning algorithms (i.e., random forests and boosted regression trees) under temporal, spatial, and spatiotemporal prediction scenarios. These results showed that our framework has the potential to make an important contribution to public health policy makers.

Discriminating Early- and Late-Stage Cancers Using Multiple Kernel Learning on Gene Sets

Arezou Rahimi, Mehmet Gönen

JournalBioinformatics, vol. 34, no. 13, pp. i412–i421, 2018

Abstract

Motivation: Identifying molecular mechanisms that drive cancers from early to late stages is highly important to develop new preventive and therapeutic strategies. Standard machine learning algorithms could be used to discriminate early- and late-stage cancers from each other using their genomic characterizations. Even though these algorithms would get satisfactory predictive performance, their knowledge extraction capability would be quite restricted due to highly correlated nature of genomic data. That is why we need algorithms that can also extract relevant information about these biological mechanisms using our prior knowledge about pathways/gene sets.

Results: In this study, we addressed the problem of separating early- and late-stage cancers from each other using their gene expression profiles. We proposed to use a multiple kernel learning (MKL) formulation that makes use of pathways/gene sets (i) to obtain satisfactory/improved predictive performance and (ii) to identify biological mechanisms that might have an effect in cancer progression. We extensively compared our proposed MKL on gene sets algorithm against two standard machine learning algorithms, namely, random forests and support vector machines, on 20 diseases from the Cancer Genome Atlas cohorts for two different sets of experiments. Our method obtained statistically significantly better or comparable predictive performance on most of the datasets using significantly fewer gene expression features. We also showed that our algorithm was able to extract meaningful and disease-specific information that gives clues about the progression mechanism.

Pan-Cancer Transcriptional Signatures Predictive of Oncogenic Mutations Reveal that Fbw7 Regulates Cancer Cell Oxidative Metabolism

Ryan J. Davis*, Mehmet Gönen*, Daciana H. Margineantu*, Shlomo Handeli, Jherek Swanger, Pia Hoellerbauer, Patrick J. Paddison, Haiwei Gu, Daniel Raftery, Jonathan E. Grim, David M. Hockenbery, Adam A. Margolin, Bruce E. Clurman

*Joint first authors

JournalProceedings of the National Academy of Sciences of the United States of America, vol. 115, no. 21, pp. 5462–5467, 2018

Abstract

The Fbw7 (F-box/WD repeat-containing protein 7) ubiquitin ligase targets multiple oncoproteins for degradation and is commonly mutated in cancers. Like other pleiotropic tumor suppressors, Fbw7’s complex biology has impeded our understanding of how Fbw7 mutations promote tumorigenesis and hindered the development of targeted therapies. To address these needs, we employed a transfer learning approach to derive gene-expression signatures from The Cancer Genome Atlas datasets that predict Fbw7 mutational status across tumor types and identified the pathways enriched within these signatures. Genes involved in mitochondrial function were highly enriched in pan-cancer signatures that predict Fbw7 mutations. Studies in isogenic colorectal cancer cell lines that differed in Fbw7 mutational status confirmed that Fbw7 mutations increase mitochondrial gene expression. Surprisingly, Fbw7 mutations shifted cellular metabolism toward oxidative phosphorylation and caused context-specific metabolic vulnerabilities. Our approach revealed unexpected metabolic reprogramming and possible therapeutic targets in Fbw7-mutant cancers and provides a framework to study other complex, oncogenic mutations.

Impact of the ST101 Clone on Fatality among Patients with Colistin-Resistant Klebsiella pneumoniae Infection

Fusun Can, Sirin Menekse, Pelin Ispir, Nazlı Atac, Ozgur Albayrak, Tuana Demir, Doruk Can Karaaslan, Salih Nafiz Karahan, Mahir Kapmaz, Ozlem Kurt Azap, Funda Timurkaynak, Serap Simsek Yavuz, Seniha Basaran, Fugen Yoruk, Alpay Azap, Safiye Koculu, Nur Benzonana, Nathan A. Lack, Mehmet Gönen, Onder Ergonul

JournalJournal of Antimicrobial Chemotherapy, vol. 73, no. 5, pp. 1235–1241, 2018

Abstract

Objectives: We describe molecular characteristics of colistin resistance and its impact on patient mortality.

Methods: A prospective cohort study was performed in seven different Turkish hospitals. The genotype of each isolate was determined by MLST and repetitive extragenic palindromic PCR (rep-PCR). Alterations in the mgrB were detected by sequencing. Upregulation of pmrCAB, phoQ and pmrK was quantified by RT-PCR. mcr-1 and the genes encoding OXA-48, NDM-1 and KPC were amplified by PCR.

Results: A total of 115 patients diagnosed with colistin-resistant K. pneumoniae (ColR-Kp) infection were included. Patients were predominantly males (55%) with a median age of 63 (IQR 46–74) and the 30 day mortality rate was 61%. ST101 was the most common ST and accounted for 68 (59%) of the ColR-Kp. The 30 day mortality rate in patients with these isolates was 72%. In ST101, 94% (64/68) of the isolates had an altered mgrB gene, whereas the alteration occurred in 40% (19/47) of non-ST101 isolates. The OXA-48 and NDM-1 carbapenemases were found in 93 (81%) and 22 (19%) of the total 115 isolates, respectively. In multivariate analysis for the prediction of 30 day mortality, ST101 (OR 3.4, CI 1.46–8.15, P = 0.005) and ICU stay (OR 7.4, CI 2.23–29.61, P = 0.002) were found to be significantly associated covariates.

Conclusions: Besides ICU stay, ST101 was found to be a significant independent predictor of patient mortality among those infected with ColR-Kp. A significant association was detected between ST101 and OXA-48. ST101 may become a global threat in dissemination of colistin resistance and increased morbidity and mortality of K. pneumoniae infection.

A Community Challenge for Inferring Genetic Predictors of Gene Essentialities through Analysis of a Functional Screen of Cancer Cell Lines

Mehmet Gönen*, Barbara A. Weir*, Glenn S. Cowley*, Francisca Vazquez*, Yuanfang Guan*, Alok Jaiswal*, Masayuki Karasuyama*, Vladislav Uzunangelov*, Tao Wang*, Aviad Tsherniak, Sara Howell, Daniel Marbach, Bruce Hoff, Thea C. Norman, Antti Airola, Adrian Bivol, Kerstin Bunte, Daniel Carlin, Sahil Chopra, Alden Deran, Kyle Ellrott, Peddinti Gopalacharyulu, Kiley Graim, Samuel Kaski, Suleiman A. Khan, Yulia Newton, Sam Ng, Tapio Pahikkala, Evan Paull, Artem Sokolov, Hao Tang, Jing Tang, Krister Wennerberg, Yang Xie, Xiaowei Zhan, Fan Zhu, Broad-DREAM Community, Tero Aittokallio, Hiroshi Mamitsuka, Joshua M. Stuart, Jesse S. Boehm, David E. Root, Guanghua Xiao, Gustavo Stolovitzky, William C. Hahn, Adam A. Margolin

*Joint first authors

JournalCell Systems, vol. 5, no. 5, pp. 485–497, 2017

Summary

We report the results of a DREAM challenge designed to predict preferential/relative genetic vulnerabilities/essentialities based on a novel data set testing 98,000 shRNAs against 149 cancer cell lines. We analyzed the results of over 3,000 submissions over a period of four months. We found that algorithms combining essentiality data across multiple genes demonstrated increased accuracy; gene expression was the most informative molecular data; the identity of the gene being predicted was far more important than the modeling strategy; well-predicted genes and selected molecular features showed enrichment in functional categories; and frequently selected expression features correlate with survival in primary tumors. This study establishes benchmarks for gene essentiality prediction, presents a community resource for future comparison against this benchmark, and provides insights into factors influencing the ability to predict gene essentiality from functional genetic screens. This study demonstrated the value of releasing pre-publication data publicly to engage the community in an open research collaboration.

Cytokine Response in Crimean-Congo Hemorrhagic Fever Virus Infection

Önder Ergönül, Ceren Şeref, Şebnem Eren, Aysel Çelikbaş, Nurcan Baykam, Başak Dokuzoğuz, Mehmet Gönen, Füsun Can

JournalJournal of Medical Virology, vol. 89, no. 10, pp. 1707–1713, 2017

Abstract

Background: We described the predictive role of cytokines in fatality of Crimean-Congo Hemorrhagic Fever Virus (CCHFV) infection by using daily clinical sera samples.

Methods: Consecutive serum samples of the selected patients in different severity groups and healthy controls were examined by using human cytokine 17-plex assay.

Results: We included 12 (23%) mild, 30 (58%) moderate, 10 (19%) severe patients, and 10 healthy volunteers. The mean age of the patients was 52 (sd 15), and 52% were female. Forty-six patients (88%) received ribavirin. During disease course, the median levels of IL-6, IL-8, IL-10, IL-10/12, IFN-γ, MCP-1 and MIP-1b were found to be significantly higher among CCHF patients than the healthy controls. Within the first five days after onset of disease, among the fatal cases, the median levels of IL-6 and IL-8 were found to be significantly higher than among survivors (Figure 3), and MCP-1 was elevated among fatal cases, but statistical significance was not detected. In receiver operating characteristic (ROC) analysis, IL-8 (92%), IL-6 (92%), MCP-1 (79%) were found to be the most significant cytokines in predicting the fatality rates in the early period of the disease (5 days).

Conclusion: IL-6 and IL-8 can predict poor outcome within the first five days of disease course. Elevated IL-6 and IL-8 levels within the first five days could be used as prognostic markers.

Modeling Gene-Wise Dependencies Improves the Identification of Drug Response Biomarkers in Cancer Studies

Olga Nikolova, Russell Moser, Christopher Kemp, Mehmet Gönen, Adam A. Margolin

JournalBioinformatics, vol. 33, no. 9, pp. 1362–1369, 2017

Abstract

Motivation: In recent years, vast advances in biomedical technologies and comprehensive sequencing have revealed the genomic landscape of common forms of human cancer in unprecedented detail. The broad heterogeneity of the disease calls for rapid development of personalized therapies. Translating the readily available genomic data into useful knowledge that can be applied in the clinic remains a challenge. Computational methods are needed to aid these efforts by robustly analyzing genome-scale data from distinct experimental platforms for prioritization of targets and treatments.

Results: We propose a novel, biologically-motivated, Bayesian multitask approach, which explicitly models gene-centric dependencies across multiple and distinct genomic platforms. We introduce a genewise prior and present a fully Bayesian formulation of a group factor analysis model. In supervised prediction applications, our multitask approach leverages similarities in response profiles of groups of drugs that are more likely to be related to true biological signal, which leads to more robust performance and improved generalization ability. We evaluate the performance of our method on molecularly characterized collections of cell lines profiled against two compound panels, namely the Cancer Cell Line Encyclopedia and the Cancer Therapeutics Response Portal. We demonstrate that accounting for the gene-centric dependencies enables leveraging information from multi-omic input data and improves prediction and feature selection performance. We further demonstrate the applicability of our method in an unsupervised dimensionality reduction application by inferring genes essential to tumorigenesis in the pancreatic ductal adenocarcinoma and lung adenocarcinoma patient cohorts from The Cancer Genome Atlas.

Integrating Gene Set Analysis and Nonlinear Predictive Modeling of Disease Phenotypes Using a Bayesian Multitask Formulation

Mehmet Gönen

JournalBMC Bioinformatics, vol. 17, p. 1311, 2016

Abstract

Motivation: Identifying molecular signatures of disease phenotypes is studied using two mainstream approaches: (i) Predictive modeling methods such as linear classification and regression algorithms are used to find signatures predictive of phenotypes from genomic data, which may not be robust due to limited sample size or highly correlated nature of genomic data. (ii) Gene set analysis methods are used to find gene sets on which phenotypes are linearly dependent by bringing prior biological knowledge into the analysis, which may not capture more complex nonlinear dependencies. Thus, formulating an integrated model of gene set analysis and nonlinear predictive modeling is of great practical importance.

Results: In this study, we propose a Bayesian binary classification framework to integrate gene set analysis and nonlinear predictive modeling. We then generalize this formulation to multitask learning setting to model multiple related datasets conjointly. Our main novelty is the probabilistic nonlinear formulation that enables us to robustly capture nonlinear dependencies between genomic data and phenotype even with small sample sizes. We demonstrate the performance of our algorithms using repeated random subsampling validation experiments on two cancer and two tuberculosis datasets by predicting important disease phenotypes from genome-wide gene expression data. We are able to obtain comparable or even better predictive performance than a baseline Bayesian nonlinear algorithm and to identify sparse sets of relevant genes and gene sets on all datasets. We also show that our multitask learning formulation enables us to further improve the generalization performance and to better understand biological processes behind disease phenotypes.

Ultrasensitive Proteomic Quantitation of Cellular Signaling by Digitized Nanoparticle-Protein Counting

Thomas Jacob, Anupriya Agarwal, Damien Ramunno-Johnson, Thomas O’Hare, Mehmet Gönen, Jeffrey W. Tyner, Brian J. Druker, Tania Q. Vu

JournalScientific Reports, vol. 6, p. 28163, 2016

Abstract

Many important signaling and regulatory proteins are expressed at low abundance and are difficult to measure in single cells. We report a molecular imaging approach to quantitate protein levels by digitized, discrete counting of nanoparticle-tagged proteins. Digitized protein counting provides ultrasensitive molecular detection of proteins in single cells that surpasses conventional methods of quantitating total diffuse fluorescence, and offers a substantial improvement in protein quantitation. We implement this digitized proteomic approach in an integrated imaging platform, the single cell-quantum dot platform (SC-QDP), to execute sensitive single cell phosphoquantitation in response to multiple drug treatment conditions and using limited primary patient material. The SC-QDP: 1) identified pAKT and pERK phospho-heterogeneity and insensitivity in individual leukemia cells treated with a multi-drug panel of FDA-approved kinase inhibitors, and 2) revealed subpopulations of drug-insensitive CD34+ stem cells with high pCRKL and pSTAT5 signaling in chronic myeloid leukemia patient blood samples. This ultrasensitive digitized protein detection approach is valuable for uncovering subtle but important differences in signaling, drug insensitivity, and other key cellular processes amongst single cells.

AUC Maximization in Bayesian Hierarchical Models

Mehmet Gönen

Conference: Proceedings of the 22nd European Conference on Artificial Intelligence (ECAI 2016), pp. 21–27, 2016

Abstract

Area under the curve (AUC) measures, such as the area under the receiver operating characteristics curve (AUROC) and the area under the precision-recall curve (AUPR), are known to be more appropriate than the error rate, especially for imbalanced data sets. There are several algorithms to optimize AUC measures instead of minimizing the error rate. However, this idea has not been fully exploited in Bayesian hierarchical models owing to the difficulties in inference. Here, we formulate a general Bayesian inference framework, called Bayesian AUC Maximization (BAM), to integrate AUC maximization into Bayesian hierarchical models by borrowing the pairwise and listwise ranking ideas from the information retrieval literature. To showcase our BAM framework, we develop two Bayesian linear classifier variants for two ranking approaches and derive their variational inference procedures. We perform validation experiments on four biomedical data sets to demonstrate the better predictive performance of our framework over its error-minimizing counterpart in terms of average AUROC and AUPR values.
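
The pairwise view of AUROC mentioned in this abstract can be illustrated with a small sketch. It is not the BAM inference procedure from the paper; it only shows the standard identity that AUROC equals the fraction of (positive, negative) pairs that a scoring function ranks correctly, with ties counted as half. The function name `pairwise_auroc` and the toy data are assumptions made for illustration.

```python
def pairwise_auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Labels are assumed to be 1 (positive) or 0 (negative).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example (hypothetical data): three of the four pairs are ordered correctly.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(pairwise_auroc(toy_scores, toy_labels))
```

Because the quantity is defined over pairs rather than single samples, it does not depend on the class proportions, which is why it is often preferred over the error rate on imbalanced data.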

Understanding Emotional Impact of Images Using Bayesian Multiple Kernel Learning

He Zhang, Mehmet Gönen, Zhirong Yang, Erkki Oja

Journal: Neurocomputing, vol. 165, pp. 3–13, 2015

Abstract

Affective classification and retrieval of multimedia such as audio, image, and video have become emerging research areas in recent years. Previous research focused on designing features and developing feature extraction methods. Generally, multimedia content can be represented with different feature representations (i.e., views). However, the most suitable feature representation related to people's emotions is usually not known a priori. We propose here a novel Bayesian multiple kernel learning algorithm for affective classification and retrieval tasks. The proposed method can make use of different representations simultaneously (i.e., multiview learning) to obtain a better prediction performance than using a single feature representation (i.e., single-view learning) or a subset of features, with the advantage of automatic feature selection. In particular, our algorithm has been implemented within a multilabel setup to capture the correlation between emotions, and the Bayesian formulation enables our method to produce probabilistic outputs for measuring a set of emotions triggered by a single image. As a case study, we perform classification and retrieval experiments with our algorithm for predicting people's emotional states evoked by images, using generic low-level image features. The empirical results with our approach on the widely used International Affective Picture System (IAPS) data set outperform several existing methods in terms of classification performance and interpretability of results.

A Community Effort to Assess and Improve Drug Sensitivity Prediction Algorithms

James C. Costello*, Laura M. Heiser*, Elisabeth Georgii*, Mehmet Gönen, Michael P. Menden, Nicholas J. Wang, Mukesh Bansal, Muhammad Ammad-ud-din, Petteri Hintsanen, Suleiman A. Khan, John-Patrick Mpindi, Olli Kallioniemi, Antti Honkela, Tero Aittokallio, Krister Wennerberg, NCI DREAM Community, James J. Collins, Dan Gallahan, Dinah Singer, Julio Saez-Rodriguez, Samuel Kaski, Joe W. Gray, Gustavo Stolovitzky

*Joint first authors

Journal: Nature Biotechnology, vol. 32, no. 12, pp. 1202–1212, 2014

Abstract

Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.

Kernelized Bayesian Matrix Factorization

Mehmet Gönen, Samuel Kaski

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 10, pp. 2047–2060, 2014

Abstract

We extend kernelized matrix factorization with a full-Bayesian treatment and with an ability to work with multiple side information sources expressed as different kernels. Kernels have been introduced to integrate side information about the rows and columns, which is necessary for making out-of-matrix predictions. We discuss specifically binary output matrices but extensions to real-valued matrices are straightforward. We extend the state of the art in two key aspects: (i) A full-conjugate probabilistic formulation of the kernelized matrix factorization enables an efficient variational approximation, whereas full-Bayesian treatments are not computationally feasible in the earlier approaches. (ii) Multiple side information sources are included, treated as different kernels in multiple kernel learning, which additionally reveals which side sources are informative. We then show that the framework can also be used for supervised and semi-supervised multilabel classification and multi-output regression, by considering samples and outputs as the domains where matrix factorization operates. Our method outperforms alternatives in predicting drug-protein interactions on two data sets. On multilabel classification, our algorithm obtains the lowest Hamming losses on 10 out of 14 data sets compared to five state-of-the-art multilabel classification algorithms. We finally show that the proposed approach outperforms alternatives in multi-output regression experiments on a yeast cell cycle data set.

Drug Susceptibility Prediction Against a Panel of Drugs Using Kernelized Bayesian Multitask Learning

Mehmet Gönen, Adam A. Margolin

Journal: Bioinformatics, vol. 30, no. 17, pp. i556–i563, 2014

Abstract

Motivation: Human immunodeficiency virus (HIV) and cancer require personalized therapies owing to their inherent heterogeneous nature. For both diseases, large-scale pharmacogenomic screens of molecularly characterized samples have been generated with the hope of identifying genetic predictors of drug susceptibility. Thus, computational algorithms capable of inferring robust predictors of drug responses from genomic information are of great practical importance. Most of the existing computational studies that consider drug susceptibility prediction against a panel of drugs formulate a separate learning problem for each drug, which cannot make use of commonalities between subsets of drugs.

Results: In this study, we propose to solve the problem of drug susceptibility prediction against a panel of drugs in a multitask learning framework by formulating a novel Bayesian algorithm that combines kernel-based non-linear dimensionality reduction and binary classification (or regression). The main novelty of our method is the joint Bayesian formulation of projecting data points into a shared subspace and learning predictive models for all drugs in this subspace, which helps us to eliminate off-target effects and drug-specific experimental noise. Another novelty of our method is the ability to handle phenotype values that are missing owing to experimental conditions and quality control reasons. We demonstrate the performance of our algorithm via cross-validation experiments on two benchmark drug susceptibility datasets of HIV and cancer. Our method obtains statistically significantly better predictive performance on most of the drugs compared with baseline single-task algorithms that learn drug-specific models. These results show that predicting drug susceptibility against a panel of drugs simultaneously within a multitask learning framework improves overall predictive performance over single-task learning approaches.

Integrative and Personalized QSAR Analysis in Cancer by Kernelized Bayesian Matrix Factorization

Muhammad Ammad-ud-din, Elisabeth Georgii, Mehmet Gönen, Tuomo Laitinen, Olli Kallioniemi, Krister Wennerberg, Antti Poso, Samuel Kaski

Journal: Journal of Chemical Information and Modeling, vol. 54, no. 8, pp. 2347–2359, 2014

Abstract

With data from recent large-scale drug sensitivity measurement campaigns, it is now possible to build and test models predicting responses for more than one hundred anticancer drugs against several hundreds of human cancer cell lines. Traditional quantitative structure–activity relationship (QSAR) approaches focus on small molecules in searching for their structural properties predictive of the biological activity in a single cell line or a single tissue type. We extend this line of research in two directions: (1) an integrative QSAR approach predicting the responses to new drugs for a panel of multiple known cancer cell lines simultaneously and (2) a personalized QSAR approach predicting the responses to new drugs for new cancer cell lines. To solve the modeling task, we apply a novel kernelized Bayesian matrix factorization method. For maximum applicability and predictive performance, the method optionally utilizes genomic features of cell lines and target information on drugs in addition to chemical drug descriptors. In a case study with 116 anticancer drugs and 650 cell lines, we demonstrate the usefulness of the method in several relevant prediction scenarios, differing in the amount of available information, and analyze the importance of various types of drug features for the response prediction. Furthermore, after predicting the missing values of the data set, a complete global map of drug response is explored to assess treatment potential and treatment range of therapeutically interesting anticancer drugs.

Multi-Task and Multi-View Learning of User State

Melih Kandemir, Akos Vetek, Mehmet Gönen, Arto Klami, Samuel Kaski

Journal: Neurocomputing, vol. 139, pp. 97–106, 2014

Abstract

Several computational approaches have been proposed for inferring the affective state of the user, motivated for example by the goal of building improved interfaces that can adapt to the user's needs and internal state. While fairly good results have been obtained for inferring the user state under highly controlled conditions, a considerable amount of work remains to be done for learning high-quality estimates of subjective evaluations of the state in more natural conditions. In this work, we discuss how two recent machine learning concepts, multi-view learning and multi-task learning, can be adapted for user state recognition, and demonstrate them on two data collections of varying quality. Multi-view learning enables combining multiple measurement sensors in a justified way while automatically learning the importance of each sensor. Multi-task learning, in turn, tells how multiple learning tasks can be learned together to improve the accuracy. We demonstrate the use of two types of multi-task learning: learning both multiple state indicators and models for multiple users together. We also illustrate how the benefits of multi-task learning and multi-view learning can be effectively combined in a unified model by introducing a novel algorithm.

Coupled Dimensionality Reduction and Classification for Supervised and Semi-Supervised Multilabel Learning

Mehmet Gönen

Journal: Pattern Recognition Letters, vol. 38, pp. 132–141, 2014

Abstract

Coupled training of dimensionality reduction and classification has previously been proposed to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find the intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of Hamming loss, average AUC, macro F1, and micro F1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and semi-supervised learning tasks.

Localized Data Fusion for Kernel k-Means Clustering with Application to Cancer Biology

Mehmet Gönen, Adam A. Margolin

Conference: Advances in Neural Information Processing Systems 27 (NIPS 2014), pp. 1305–1313, 2014

Abstract

In many modern applications from, for example, bioinformatics and computer vision, samples have multiple feature representations coming from different data sources. Multiview learning algorithms try to exploit all this available information to obtain a better learner in such scenarios. In this paper, we propose a novel multiple kernel learning algorithm that extends kernel k-means clustering to the multiview setting, which combines kernels calculated on the views in a localized way to better capture sample-specific characteristics of the data. We demonstrate the better performance of our localized data fusion approach on a human colon and rectal cancer data set by clustering patients. Our method finds more relevant prognostic patient groups than global data fusion methods when we evaluate the results with respect to three commonly used clinical biomarkers.

Bayesian Multiview Dimensionality Reduction for Learning Predictive Subspaces

Mehmet Gönen, Gülefşan Bozkurt Gönen, Fikret Gürgen

Conference: Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), pp. 387–392, 2014

Abstract

Multiview learning tries to exploit different feature representations to obtain better learners. For example, in video and image recognition problems, there are many possible feature representations such as color- and texture-based features. There are two common ways of exploiting multiple views: forcing similarity (i) in predictions and (ii) in latent subspace. In this paper, we introduce a novel Bayesian multiview dimensionality reduction method coupled with supervised learning to find predictive subspaces, and we present its inference details. Experiments show that our proposed method obtains very good results on image recognition tasks in terms of classification and retrieval performances.

Embedding Heterogeneous Data by Preserving Multiple Kernels

Mehmet Gönen

Conference: Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), pp. 381–386, 2014

Abstract

Heterogeneous data may arise in many real-life applications under different scenarios. In this paper, we formulate a general framework to address the problem of modeling heterogeneous data. Our main contribution is a novel embedding method, called multiple kernel preserving embedding (MKPE), which projects heterogeneous data into a unified embedding space by preserving cross-domain interactions and within-domain similarities simultaneously. These interactions and similarities between data points are approximated with Gaussian kernels to transfer local neighborhood information to the projected subspace. We also extend our method for out-of-sample embedding using a parametric formulation in the projection step. The performance of MKPE is illustrated on two tasks: (i) modeling biological interaction networks and (ii) cross-domain information retrieval. Empirical results of these two tasks validate the predictive performance of our algorithm.

Kernelized Bayesian Transfer Learning

Mehmet Gönen, Adam A. Margolin

Conference: Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI 2014), pp. 1831–1839, 2014

Abstract

Transfer learning considers related but distinct tasks defined on heterogeneous domains and tries to transfer knowledge between these tasks to improve generalization performance. It is particularly useful when we do not have a sufficient amount of labeled training data in some tasks, which may be very costly, laborious, or even infeasible to obtain. Instead, learning the tasks jointly enables us to effectively increase the amount of labeled training data. In this paper, we formulate a kernelized Bayesian transfer learning framework that is a principled combination of kernel-based dimensionality reduction models with task-specific projection matrices to find a shared subspace and a coupled classification model for all of the tasks in this subspace. Our two main contributions are: (i) two novel probabilistic models for binary and multiclass classification, and (ii) very efficient variational approximation procedures for these models. We illustrate the generalization performance of our algorithms on two different applications. In computer vision experiments, our method outperforms the state-of-the-art algorithms on nine out of 12 benchmark supervised domain adaptation experiments defined on two object recognition data sets. In cancer biology experiments, we use our algorithm to predict mutation status of important cancer genes from gene expression profiles using two distinct cancer populations, namely, patient-derived primary tumor data and in-vitro-derived cancer cell line data. We show that we can increase our generalization performance on primary tumors using cell lines as an auxiliary data source.

Bayesian Supervised Dimensionality Reduction

Mehmet Gönen

Journal: IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 2179–2189, 2013

Abstract

Dimensionality reduction is commonly used as a preprocessing step before training a supervised learner. However, coupled training of dimensionality reduction and supervised learning steps may improve the prediction performance. In this paper, we introduce a simple and novel Bayesian supervised dimensionality reduction method that combines linear dimensionality reduction and linear supervised learning in a principled way. We present both Gibbs sampling and variational approximation approaches to learn the proposed probabilistic model for multiclass classification. We also extend our formulation towards model selection using automatic relevance determination in order to find the intrinsic dimensionality. Classification experiments on three benchmark data sets show that the new model significantly outperforms seven baseline linear dimensionality reduction algorithms on very low dimensions in terms of generalization performance on test data. The proposed model also obtains the best results on an image recognition task in terms of classification and retrieval performances.

Supervised Multiple Kernel Embedding for Learning Predictive Subspaces

Mehmet Gönen

Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2381–2389, 2013

Abstract

For supervised learning problems, dimensionality reduction is generally applied as a preprocessing step. However, coupled training of dimensionality reduction and supervised learning steps may improve the prediction performance. In this paper, we propose a novel dimensionality reduction algorithm coupled with a supervised kernel-based learner, called supervised multiple kernel embedding, that integrates multiple kernel learning into dimensionality reduction and performs prediction on the projected subspace with a joint optimization framework. Combining multiple kernels allows us to combine different feature representations and/or similarity measures toward a unified subspace. We perform experiments on one digit recognition and two bioinformatics data sets. Our proposed method significantly outperforms multiple kernel Fisher discriminant analysis followed by a standard kernel-based learner, especially on low dimensions.

Localized Algorithms for Multiple Kernel Learning

Mehmet Gönen, Ethem Alpaydın

Journal: Pattern Recognition, vol. 46, no. 3, pp. 795–807, 2013

Abstract

Instead of selecting a single kernel, multiple kernel learning (MKL) uses a weighted sum of kernels where the weight of each kernel is optimized during training. Such methods assign the same weight to a kernel over the whole input space, and we discuss localized multiple kernel learning (LMKL) that is composed of a kernel-based learning algorithm and a parametric gating model to assign local weights to kernel functions. These two components are trained in a coupled manner using a two-step alternating optimization algorithm. Empirical results on benchmark classification and regression data sets validate the applicability of our approach. We see that LMKL achieves higher accuracy compared with canonical MKL on classification problems with different feature representations. LMKL can also identify the relevant parts of images using the gating model as a saliency detector in image recognition problems. In regression tasks, LMKL improves the performance significantly or reduces the model complexity by storing significantly fewer support vectors.
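
The difference between a global weighted sum of kernels and locally gated kernel weights can be sketched as follows. This is only an illustration under stated assumptions, not the training algorithm from the paper: the softmax form of the gating model, the function names, and the parameter layout (`gate_params` as one `(v, b)` pair per kernel) are all assumptions chosen for this example, and the gating parameters are taken as given rather than learned by alternating optimization.

```python
import math


def linear_kernel(x, z):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, width=1.0):
    """Gaussian (RBF) kernel with an assumed width parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def global_mkl_kernel(x, z, kernels, weights):
    """Canonical MKL: the same weight for each kernel over the whole input space."""
    return sum(w * k(x, z) for w, k in zip(weights, kernels))


def softmax_gate(x, gate_params):
    """Assumed gating model: one softmax output per kernel, depending on x."""
    logits = [sum(vi * xi for vi, xi in zip(v, x)) + b for v, b in gate_params]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def localized_mkl_kernel(x, z, kernels, gate_params):
    """Localized combination: each kernel is weighted by the gates at both inputs."""
    gate_x = softmax_gate(x, gate_params)
    gate_z = softmax_gate(z, gate_params)
    return sum(gx * k(x, z) * gz for gx, k, gz in zip(gate_x, kernels, gate_z))


kernels = [linear_kernel, gaussian_kernel]
x, z = [1.0, 0.0], [1.0, 0.0]

# With all-zero gating parameters every gate equals 1/2, so the localized
# kernel reduces to a uniformly weighted sum scaled by 1/4.
flat_gates = [([0.0, 0.0], 0.0), ([0.0, 0.0], 0.0)]
print(global_mkl_kernel(x, z, kernels, [0.5, 0.5]))
print(localized_mkl_kernel(x, z, kernels, flat_gates))
```

Multiplying each kernel by the gate values at both inputs keeps the combined function symmetric, and with non-zero gating parameters different regions of the input space can favor different kernels.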

Predicting Emotional States of Images Using Bayesian Multiple Kernel Learning

He Zhang, Mehmet Gönen, Zhirong Yang, Erkki Oja

Conference: Proceedings of the 20th International Conference on Neural Information Processing (ICONIP 2013), pp. 274–282, 2013

Abstract

Images usually convey information that can influence people’s emotional states. Such affective information can be used by search engines and social networks for better understanding the user’s preferences. We propose here a novel Bayesian multiple kernel learning method for predicting the emotions evoked by images. The proposed method can make use of different image features simultaneously to obtain a better prediction performance, with the advantage of automatically selecting important features. Specifically, our method has been implemented within a multilabel setup in order to capture the correlations between emotions. Due to its probabilistic nature, our method is also able to produce probabilistic outputs for measuring a distribution of emotional intensities. The experimental results on the International Affective Picture System (IAPS) dataset show that the proposed approach achieves a better classification performance and provides a more interpretable feature selection capability than the state-of-the-art methods.

Affective Abstract Image Classification and Retrieval Using Multiple Kernel Learning

He Zhang, Zhirong Yang, Mehmet Gönen, Markus Koskela, Jorma Laaksonen, Timo Honkela, Erkki Oja

Conference: Proceedings of the 20th International Conference on Neural Information Processing (ICONIP 2013), pp. 166–175, 2013

Abstract

Emotional semantic image retrieval systems aim at incorporating the user's affective states for responding adequately to the user's interests. One challenge is to select features specific to image affect detection. Another challenge is to build effective learning models or classifiers to bridge the so-called "affective gap". In this work, we study the affective classification and retrieval of abstract images by applying a multiple kernel learning framework. An image can be represented by different feature spaces and multiple kernel learning can utilize all these feature representations simultaneously (i.e., multiview learning), such that it jointly learns the feature representation weights and corresponding classifier in an intelligent manner. Our experimental results on two abstract image datasets demonstrate the advantage of the multiple kernel learning framework for image affect detection in terms of feature selection, classification performance, and interpretation.

Kernelized Bayesian Matrix Factorization

Mehmet Gönen, Suleiman A. Khan, Samuel Kaski

Conference: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), pp. 864–872, 2013

Abstract

We extend kernelized matrix factorization with a fully Bayesian treatment and with an ability to work with multiple side information sources expressed as different kernels. Kernel functions have been introduced to matrix factorization to integrate side information about the rows and columns (e.g., objects and users in recommender systems), which is necessary for making out-of-matrix (i.e., cold start) predictions. We discuss specifically bipartite graph inference, where the output matrix is binary, but extensions to more general matrices are straightforward. We extend the state of the art in two key aspects: (i) A fully conjugate probabilistic formulation of the kernelized matrix factorization problem enables an efficient variational approximation, whereas fully Bayesian treatments are not computationally feasible in the earlier approaches. (ii) Multiple side information sources are included, treated as different kernels in multiple kernel learning that additionally reveals which side information sources are informative. Our method outperforms alternatives in predicting drug-protein interactions on two data sets. We then show that our framework can also be used for solving multilabel learning problems by considering samples and labels as the two domains where matrix factorization operates. Our algorithm obtains the lowest Hamming loss values on 10 out of 14 multilabel classification data sets compared to five state-of-the-art multilabel learning algorithms.

Kernelized Bayesian Matrix Factorization

Mehmet Gönen, Muhammad Ammad-ud-din, Suleiman A. Khan, Samuel Kaski

Workshop: NIPS Workshop on Machine Learning in Computational Biology, 2013

Predicting Drug–Target Interactions from Chemical and Genomic Kernels Using Bayesian Matrix Factorization

Mehmet Gönen

Journal: Bioinformatics, vol. 28, no. 18, pp. 2304–2310, 2012

Abstract

Motivation: Identifying interactions between drug compounds and target proteins has a great practical importance in the drug discovery process for known diseases. Existing databases contain very few experimentally validated drug–target interactions and formulating successful computational methods for predicting interactions remains challenging.

Results: In this study, we consider four different drug–target interaction networks from humans involving enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. We then propose a novel Bayesian formulation that combines dimensionality reduction, matrix factorization and binary classification for predicting drug–target interaction networks using only chemical similarity between drug compounds and genomic similarity between target proteins. The novelty of our approach comes from the joint Bayesian formulation of projecting drug compounds and target proteins into a unified subspace using the similarities and estimating the interaction network in that subspace. We propose using a variational approximation in order to obtain an efficient inference scheme and give its detailed derivations. Lastly, we demonstrate the performance of our proposed method in three different scenarios: (i) exploratory data analysis using low-dimensional projections, (ii) predicting interactions for the out-of-sample drug compounds and (iii) predicting unknown interactions of the given network.

Probabilistic and Discriminative Group-Wise Feature Selection Methods for Credit Risk Analysis

Gülefşan Bozkurt Gönen, Mehmet Gönen, Fikret Gürgen

Journal: Expert Systems with Applications, vol. 39, no. 14, pp. 11709–11717, 2012

Abstract

Many financial organizations such as banks and retailers use computational credit risk analysis (CRA) tools heavily due to recent financial crises and stricter regulations. This strategy enables them to manage their financial and operational risks within the pool of financial institutes. Machine learning algorithms, especially binary classifiers, are very popular for that purpose. In real-life applications such as CRA, feature selection algorithms are used to decrease data acquisition cost and to increase interpretability of the decision process. Using feature selection methods directly on CRA data sets may not help due to categorical variables such as marital status. Such features are usually converted into binary features using 1-of-k encoding, and eliminating a subset of features from a group does not help in terms of data collection cost or interpretability. In this study, we propose to use the probit classifier with a proper prior structure and multiple kernel learning with a proper kernel construction procedure to perform group-wise feature selection (i.e., eliminating a group of features together if they are not helpful). Experiments on two standard CRA data sets show the validity and effectiveness of the proposed binary classification algorithm variants.
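
The motivation for group-wise selection can be made concrete with a small sketch. It does not implement the probit or multiple kernel learning methods from the paper; it only shows why dropping individual 1-of-k columns does not reduce data collection cost. The helper names, the column naming scheme, and the toy records are hypothetical.

```python
def one_of_k_encode(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


def variables_still_needed(kept_columns, column_to_variable):
    """Original variables that must still be collected for the kept columns."""
    return sorted({column_to_variable[column] for column in kept_columns})


# Hypothetical categorical variable expanded into three binary columns.
marital_status = ["single", "married", "single", "divorced"]
categories, rows = one_of_k_encode(marital_status)
print(categories)  # category order used for the binary columns
print(rows[0])     # encoding of the first record

# Map every binary column back to the variable it came from.
column_to_variable = {"marital=" + c: "marital" for c in categories}
column_to_variable["income"] = "income"

# Keeping even one column of the group still requires asking for marital status.
print(variables_still_needed(["marital=married", "income"], column_to_variable))

# Only removing the whole group removes the variable from data collection.
print(variables_still_needed(["income"], column_to_variable))
```

Treating the columns derived from one variable as a single unit is what allows a selection method to switch the variable on or off as a whole.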

A Bayesian Multiple Kernel Learning Framework for Single and Multiple Output Regression

Mehmet Gönen

Conference: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pp. 354–359, 2012

Abstract

Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is focused on classification formulations and there are few attempts for regression. We propose a fully conjugate Bayesian formulation and derive a deterministic variational approximation for single output regression. We then show that the proposed formulation can be extended to multiple output regression. We illustrate the effectiveness of our approach on a single output benchmark data set. Our framework outperforms previously reported results with better generalization performance on two image recognition data sets using both single and multiple output formulations.

Bayesian Efficient Multiple Kernel Learning

Mehmet Gönen

Conference: Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pp. 1–8, 2012

Bayesian Supervised Multilabel Learning with Coupled Embedding and Classification

Mehmet Gönen

Conference: Proceedings of the 12th SIAM International Conference on Data Mining (SDM 2012), pp. 367–378, 2012

Abstract

Coupled training of dimensionality reduction and classification has previously been proposed to improve the prediction performance for single-label problems. Following this line of research, in this paper, we introduce a novel Bayesian supervised multilabel learning method that combines linear dimensionality reduction with linear binary classification. We present a deterministic variational approximation approach to learn the proposed probabilistic model for multilabel classification. We perform experiments on four benchmark multilabel learning data sets by comparing our method with four baseline linear dimensionality reduction algorithms. Experiments show that the proposed approach achieves good performance values in terms of Hamming loss, macro F1, and micro F1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis.

A Localized MKL Method for Brain Classification with Known Intra-Class Variability

Aydın Ulaş, Mehmet Gönen, Umberto Castellani, Vittorio Murino, Marcella Bellani, Michele Tansella, Paolo Brambilla

WorkshopProceedings of the 3rd International Workshop on Machine Learning in Medical Imaging, pp. 152–159, 2012

Abstract

Automatic decisional systems based on pattern classification methods are becoming very important to support medical diagnosis. In general, the overall objective is to classify between healthy subjects and patients affected by a certain disease. To reach this aim, significant efforts have been spent in finding reliable biomarkers which are able to robustly discriminate between the two populations (i.e., patients and controls). However, in real medical scenarios there are many factors, like gender or age, which make the source data very heterogeneous. This introduces a large intra-class variation that affects the performance of the classification procedure. In this paper we explore how to use the knowledge on heterogeneity factors to improve the classification accuracy. We propose a Clustered Localized Multiple Kernel Learning (CLMKL) algorithm by encoding in the classification model the information on the clusters of a priori known stratifications.

Experiments are carried out for brain classification in schizophrenia. We show that our algorithm performs clearly better than single-kernel Support Vector Machines (SVMs), linear MKL algorithms, and canonical Localized MKL algorithms when the gender information is considered as a priori knowledge.

Multiple Kernel Learning Algorithms

Mehmet Gönen, Ethem Alpaydın

JournalJournal of Machine Learning Research, vol. 12, no. Jul, pp. 2211–2268, 2011

Abstract

In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there are differences between them in complexity as given by the number of stored support vectors, the sparsity of the solution as given by the number of used kernels, and training time complexity. We see that overall, using multiple kernels instead of a single one is useful and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.

Regularizing Multiple Kernel Learning Using Response Surface Methodology

Mehmet Gönen, Ethem Alpaydın

JournalPattern Recognition, vol. 44, no. 1, pp. 159–171, 2011

Abstract

In recent years, several methods have been proposed to combine multiple kernels using a weighted linear sum of kernels. These different kernels may be using information coming from multiple sources or may correspond to using different notions of similarity on the same source. We note that such methods, in addition to the usual ones of the canonical support vector machine formulation, introduce new regularization parameters that affect the solution quality and, in this work, we propose to optimize them using response surface methodology on cross-validation data. On several bioinformatics and digit recognition benchmark data sets, we compare multiple kernel learning and our proposed regularized variant in terms of accuracy, support vector count, and the number of kernels selected. We see that our proposed variant achieves statistically similar or higher accuracy results by using fewer kernel functions and/or support vectors through suitable regularization; it also allows better knowledge extraction because unnecessary kernels are pruned and the favored kernels reflect the properties of the problem at hand.
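As a rough illustration of the "weighted linear sum of kernels" mentioned in the abstract above, the following minimal sketch combines two precomputed kernel matrices with fixed nonnegative weights. The helper names (`linear_kernel`, `gaussian_kernel`, `combine_kernels`), the toy data, and the weights 0.3 and 0.7 are assumptions made for illustration only; in MKL the weights are learned during training, and nothing here reproduces the regularization or response surface procedure of the paper.

```python
import numpy as np

def linear_kernel(X, Z):
    """Linear kernel matrix between the rows of X and the rows of Z."""
    return X @ Z.T

def gaussian_kernel(X, Z, width=1.0):
    """Gaussian (RBF) kernel matrix; `width` is a hypothetical bandwidth."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * width ** 2))

def combine_kernels(kernels, weights):
    """Weighted linear sum of kernel matrices with nonnegative weights."""
    weights = np.asarray(weights, dtype=float)
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if np.any(weights < 0):
        raise ValueError("weights must be nonnegative to keep the sum a valid kernel")
    return sum(w * K for w, K in zip(weights, kernels))

# Toy data: 5 instances with 3 features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K_list = [linear_kernel(X, X), gaussian_kernel(X, X)]
K = combine_kernels(K_list, [0.3, 0.7])
```

Because each weight is nonnegative and each input matrix is positive semidefinite, the combined matrix `K` is again a valid kernel matrix and can be passed to any kernel machine that accepts a precomputed kernel.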

Multitask Learning Using Regularized Multiple Kernel Learning

Mehmet Gönen, Melih Kandemir, Samuel Kaski

ConferenceProceedings of the 18th International Conference on Neural Information Processing (ICONIP 2011), pp. 500–509, 2011

Abstract

Empirical success of kernel-based learning algorithms is very much dependent on the kernel function used. Instead of using a single fixed kernel function, multiple kernel learning (MKL) algorithms learn a combination of different kernel functions in order to obtain a similarity measure that better matches the underlying problem. We study multitask learning (MTL) problems and formulate a novel MTL algorithm that trains coupled but nonidentical MKL models across the tasks. The proposed algorithm is especially useful for tasks that have different input and/or output space characteristics and is computationally very efficient. Empirical results on three data sets validate the generalization performance and the efficiency of our approach.

Combining Data Sources Nonlinearly for Cell Nucleus Classification of Renal Cell Carcinoma

Mehmet Gönen, Aydın Ulaş, Peter Schüffler, Umberto Castellani, Vittorio Murino

WorkshopProceedings of the 1st International Workshop on Similarity-Based Pattern Analysis and Recognition, pp. 250–260, 2011

Abstract

In kernel-based machine learning algorithms, we can learn a combination of different kernel functions in order to obtain a similarity measure that better matches the underlying problem instead of using a single fixed kernel function. This approach is called multiple kernel learning (MKL). In this paper, we formulate a nonlinear MKL variant and apply it for nuclei classification in tissue microarray images of renal cell carcinoma (RCC). The proposed variant is tested on several feature representations extracted from the automatically segmented nuclei. We compare our results with single-kernel support vector machines trained on each feature representation separately and three linear MKL algorithms from the literature. We demonstrate that our variant obtains more accurate classifiers than competing algorithms for RCC detection by combining information from different feature representations nonlinearly.

Supervised Learning of Local Projection Kernels

Mehmet Gönen, Ethem Alpaydın

JournalNeurocomputing, vol. 73, no. 10–12, pp. 1694–1703, 2010

Abstract

We formulate a supervised, localized dimensionality reduction method using a gating model that divides up the input space into regions and selects the dimensionality reduction projection separately in each region. The gating model, the locally linear projections, and the kernel-based supervised learning algorithm which uses them in its kernels are coupled and their training is performed with an alternating optimization procedure. Our proposed local projection kernel projects a data instance into different feature spaces by using the local projection matrices, combines them with the gating model, and performs the dot product in the combined feature space. Empirical results on benchmark data sets for visualization and classification tasks validate the idea. The method is generalizable to regression estimation and novelty detection.

Cost-Conscious Multiple Kernel Learning

Mehmet Gönen, Ethem Alpaydın

JournalPattern Recognition Letters, vol. 31, no. 9, pp. 959–965, 2010

Abstract

Recently, it has been proposed to combine multiple kernels using a weighted linear sum. In certain applications, different kernels may be using different input representations and these methods consider neither the cost of acquiring them nor the cost of evaluating the kernels. We generalize the framework of Multiple Kernel Learning (MKL) for this cost-conscious methodology. On 12 benchmark data sets from the UCI repository, we compare MKL and its cost-conscious variants in terms of accuracy, support vector count, and total cost. Cost-conscious MKL achieves statistically similar accuracy results by using fewer support vectors/kernels by best trading off accuracy brought by each representation/kernel with the concomitant cost. We also test our approach on two popular bioinformatics data sets from MIPS comprehensive yeast genome database (CYGD) and see that integrating the cost factor into kernel combination allows us to obtain cheaper kernel combinations by using fewer active kernels and/or support vectors.

Localized Multiple Kernel Regression

Mehmet Gönen, Ethem Alpaydın

ConferenceProceedings of the 20th IAPR International Conference on Pattern Recognition (ICPR 2010), pp. 1425–1428, 2010

Abstract

Multiple kernel learning (MKL) uses a weighted combination of kernels where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. Our main objective is the formulation of the localized multiple kernel learning (LMKL) framework that allows kernels to be combined with different weights in different regions of the input space by using a gating model. In this paper, we apply the LMKL framework to regression estimation and derive a learning algorithm for this extension. Canonical support vector regression may overfit unless the kernel parameters are selected appropriately; we see that even if we provide more kernels than necessary, LMKL uses only as many as needed and does not overfit due to its inherent regularization.

Supervised and Localized Dimensionality Reduction from Multiple Feature Representations or Kernels

Mehmet Gönen, Ethem Alpaydın

WorkshopNIPS Workshop on New Directions in Multiple Kernel Learning, 2010

Abstract

We propose a supervised and localized dimensionality reduction method that combines multiple feature representations or kernels. Each feature representation or kernel is used where it is suitable through a parametric gating model in a supervised manner for efficient dimensionality reduction and classification, and local projection matrices are learned for each feature representation or kernel. The kernel machine parameters, the local projection matrices, and the gating model parameters are optimized using an alternating optimization procedure composed of kernel machine training and gradient-descent updates. Empirical results on benchmark data sets validate the method in terms of classification accuracy, smoothness of the solution, and ease of visualization.

Machine Learning Integration for Predicting the Effect of Single Amino Acid Substitutions on Protein Stability

Ayşegül Özen*, Mehmet Gönen*, Ethem Alpaydın, Türkan Haliloğlu

*Joint first authors

JournalBMC Structural Biology, vol. 9, p. 66, 2009

Abstract

Background: Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and structure-based features, namely, the amino acid substitution likelihoods, the equilibrium fluctuations of the alpha- and beta-carbon atoms, and the packing density. (2) By implementing different machine learning integration approaches, we combine information from different features or representations. (3) We compare classification vs. regression methods to predict the sign vs. the output of stability change. (4) We allow a reject option for doubtful cases where the risk of misclassification is high.

Results: We investigate three different approaches: early, intermediate and late integration, which respectively combine features, kernels over feature subsets, and decisions. We perform simulations on two data sets: (1) S1615 is used in previous studies, (2) S2783 is the updated version (as of July 2, 2009) extracted also from ProTherm. For S1615 data set, our highest accuracy using both sequence and structure information is 0.842 on cross-validation and 0.904 on testing using early integration. Newly added features, namely, local compositional packing and the mobility extent of the mutated residues, improve accuracy significantly with intermediate integration. For S2783 data set, we also train regression methods to estimate not only the sign but also the amount of stability change and apply risk-based classification to reject when the learner has low confidence and the loss of misclassification is high. The highest accuracy is 0.835 on cross-validation and 0.832 on testing using only sequence information. The percentage of false positives can be decreased to less than 0.005 by rejecting 10 per cent using late integration.

Conclusions: We find that in both early and late integration, combining inputs or decisions is useful in increasing accuracy. Intermediate integration allows assessing the contributions of individual features by looking at the assigned weights. Overall accuracy of regression is not better than that of classification but it has fewer false positives, especially when combined with the reject option. The server for stability prediction for three integration approaches and the data sets are available at http://www.prc.boun.edu.tr/appserv/prc/mlsta.

Multiple Kernel Machines Using Localized Kernels

Mehmet Gönen, Ethem Alpaydın

ConferenceSupplementary Proceedings of the 4th IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB 2009), 2009

Abstract

Multiple kernel learning (MKL) uses a convex combination of kernels where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. The localized multiple kernel learning (LMKL) framework extends the MKL framework to allow combining kernels with different weights in different regions of the input space by using a gating model. LMKL extracts the relative importance of kernels in each region whereas MKL gives their relative importance over the whole input space. In this paper, we generalize the LMKL framework with a kernel-based gating model and derive the learning algorithm for binary classification. Empirical results on toy classification problems are used to illustrate the algorithm. Experiments on two bioinformatics data sets are performed to show that kernel machines can also be localized in a data-dependent way by using kernel values as gating model features. The localized variant achieves significantly higher accuracy on one of the bioinformatics data sets.

Localized Multiple Kernel Learning for Image Recognition

Mehmet Gönen, Ethem Alpaydın

WorkshopNIPS Workshop on Understanding Multiple Kernel Learning Methods, 2009

Abstract

We review our work on localized multiple kernels (Gönen and Alpaydın, 2008, 2009) that allows kernels to be combined with different weights in different regions of the input space by using a gating model. We give example uses in image recognition for combining kernels of different representations and costs.

Multiclass Posterior Probability Support Vector Machines

Mehmet Gönen, Ayşe Gönül Tanuğur, Ethem Alpaydın

JournalIEEE Transactions on Neural Networks, vol. 19, no. 1, pp. 130–139, 2008

Abstract

Tao et al. have recently proposed the posterior probability support vector machine (PPSVM) which uses soft labels derived from estimated posterior probabilities to be more robust to noise and outliers. Tao et al.’s model uses a window-based density estimator to calculate the posterior probabilities and is a binary classifier. We propose a neighbor-based density estimator and also extend the model to the multiclass case. Our bias–variance analysis shows that the decrease in error by PPSVM is due to a decrease in bias. On 20 benchmark data sets, we observe that PPSVM obtains accuracy results that are higher than or comparable to those of canonical SVM using significantly fewer support vectors.

Localized Multiple Kernel Learning

Mehmet Gönen, Ethem Alpaydın

ConferenceProceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 352–359, 2008

Abstract

Recently, instead of selecting a single kernel, multiple kernel learning (MKL) has been proposed which uses a convex combination of kernels, where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. In this paper, we develop a localized multiple kernel learning (LMKL) algorithm using a gating model for selecting the appropriate kernel function locally. The localizing gating model and the kernel-based classifier are coupled and their optimization is done in a joint manner. Empirical results on ten benchmark and two bioinformatics data sets validate the applicability of our approach. LMKL achieves statistically similar accuracy results compared with MKL by storing fewer support vectors. LMKL can also combine multiple copies of the same kernel function localized in different parts. For example, LMKL with multiple linear kernels gives better accuracy results than using a single linear kernel on bioinformatics data sets.
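The idea of a gating model that weights kernels differently in different regions of the input space, as described in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the training algorithm of the paper: the softmax form of the gating function, the product form of the locally combined kernel, and the names `softmax_gating`, `locally_combined_kernel`, `V`, and `v0` are all assumptions, and the gating parameters are fixed here rather than optimized jointly with the classifier.

```python
import numpy as np

def softmax_gating(X, V, v0):
    """Per-instance kernel weights eta[i, m]; each row sums to one.

    V (P x D) and v0 (P,) are hypothetical gating parameters for P kernels.
    """
    scores = X @ V.T + v0
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def locally_combined_kernel(X, kernels, V, v0):
    """Assumed form: K[i, j] = sum_m eta[i, m] * K_m[i, j] * eta[j, m]."""
    eta = softmax_gating(X, V, v0)
    K = np.zeros_like(kernels[0], dtype=float)
    for m, K_m in enumerate(kernels):
        K += np.outer(eta[:, m], eta[:, m]) * K_m
    return K, eta

# Toy data: 6 instances, 2 features, and two copies of the linear kernel.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
kernels = [X @ X.T, X @ X.T]
V = rng.normal(size=(2, 2))   # illustrative gating weights, not learned
v0 = np.zeros(2)
K_local, eta = locally_combined_kernel(X, kernels, V, v0)

# With all gating parameters set to zero every kernel gets weight 1/P everywhere.
K_uniform, eta_uniform = locally_combined_kernel(X, kernels, np.zeros((2, 2)), np.zeros(2))
```

Multiplying each kernel entry by the gating values of both instances keeps the combined matrix symmetric and positive semidefinite, which is why this sketch uses the product form rather than weighting by one instance only.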

Real-Time Shop Floor Control Implementations in BUFAIM Model Factory

Ümit Bilge, Ayçin Polat, Yavuz Tunç, Mehmet Gönen

ConferenceProceedings of the 15th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM 2005), pp. 383–390, 2005