Research Interests

  • Machine Learning

    Bayesian Methods, Kernel Methods

  • Computational Biology

    Cancer Biology, Personalized Medicine

Research Summary

My research interests revolve around developing machine learning algorithms that enable scientific discovery from heterogeneous data. Machine learning methods are usually constrained by the quality of the feature representations and similarity measures used to describe objects. Drawing on my background in statistics and optimization, my main focus has been to learn good feature representations and similarity measures by developing theoretically well-founded algorithms that uncover the underlying mechanisms of the complex systems under consideration. I have focused on applications that lie at the intersection of theoretical and applied research, in particular those arising in computational biology.

In the future, machine learning will become even more widespread for analyzing high-dimensional data coming from complex systems such as cancer. In this vein, my current research agenda integrates novel machine learning solutions into cancer biology. However, it is not yet clear how to translate the results of these studies into practice changes at the bedside. There are three major challenges in cancer biology: (i) analyzing the rapidly increasing amount of high-throughput data with these computational methods, possibly incorporating prior knowledge as well; (ii) developing computational methods that support personalized cancer therapies based on the biological characterization of each patient; and (iii) exploring the molecular mechanisms of cancer to identify and validate novel biomarkers.

Publications


A Community Challenge for Inferring Genetic Predictors of Gene Essentialities through Analysis of a Functional Screen of Cancer Cell Lines

Mehmet Gönen, Barbara A. Weir, Glenn S. Cowley, Francisca Vazquez, Yuanfang Guan, Alok Jaiswal, Masayuki Karasuyama, Vladislav Uzunangelov, Tao Wang, Aviad Tsherniak, Sara Howell, Daniel Marbach, Bruce Hoff, Thea C. Norman, Antti Airola, Adrian Bivol, Kerstin Bunte, Daniel Carlin, Sahil Chopra, Alden Deran, Kyle Ellrott, Peddinti Gopalacharyulu, Kiley Graim, Samuel Kaski, Suleiman A. Khan, Yulia Newton, Sam Ng, Tapio Pahikkala, Evan Paull, Artem Sokolov, Hao Tang, Jing Tang, Krister Wennerberg, Yang Xie, Xiaowei Zhan, Fan Zhu, Broad-DREAM Community, Tero Aittokallio, Hiroshi Mamitsuka, Joshua M. Stuart, Jesse S. Boehm, David E. Root, Guanghua Xiao, Gustavo Stolovitzky, William C. Hahn, Adam A. Margolin

Journal: Cell Systems, in press, 2017

Summary

We report the results of a DREAM challenge designed to predict preferential (relative) genetic essentialities based on a novel data set testing 98,000 shRNAs against 149 cancer cell lines. We analyzed the results of over 3,000 submissions over a period of four months. We found that algorithms combining essentiality data across multiple genes demonstrated increased accuracy; gene expression was the most informative molecular data type; the identity of the gene being predicted was far more important than the modeling strategy; well-predicted genes and selected molecular features showed enrichment in functional categories; and frequently selected expression features correlated with survival in primary tumors. This study establishes benchmarks for gene essentiality prediction, presents a community resource for future comparison against these benchmarks, and provides insights into factors influencing the ability to predict gene essentiality from functional genetic screens. It also demonstrates the value of releasing pre-publication data publicly to engage the community in an open research collaboration.

Keywords

cancer genomics, community challenge, crowdsourcing, functional screen, machine learning, oncogene

Cytokine Response in Crimean-Congo Hemorrhagic Fever Virus Infection

Önder Ergönül, Ceren Şeref, Şebnem Eren, Aysel Çelikbaş, Nurcan Baykam, Başak Dokuzoğuz, Mehmet Gönen, Füsun Can

Journal: Journal of Medical Virology, vol. 89, no. 10, pp. 1707–1713, 2017

Abstract

Background: We described the predictive role of cytokines in the fatality of Crimean-Congo Hemorrhagic Fever Virus (CCHFV) infection using daily clinical serum samples.

Methods: Consecutive serum samples of selected patients in different severity groups and of healthy controls were examined using a human cytokine 17-plex assay.

Results: We included 12 (23%) mild, 30 (58%) moderate, and 10 (19%) severe patients, and 10 healthy volunteers. The mean age of the patients was 52 (SD 15), and 52% were female. Forty-six patients (88%) received ribavirin. During the disease course, the median levels of IL-6, IL-8, IL-10, IL-12, IFN-γ, MCP-1, and MIP-1b were significantly higher among CCHF patients than among healthy controls. Within the first five days after disease onset, the median levels of IL-6 and IL-8 were significantly higher among fatal cases than among survivors (Figure 3); MCP-1 was also elevated among fatal cases, but without statistical significance. In receiver operating characteristic (ROC) analysis, IL-8 (92%), IL-6 (92%), and MCP-1 (79%) were the most significant cytokines in predicting fatality in the early period of the disease (first five days).

Conclusion: IL-6 and IL-8 can predict poor outcome within the first five days of the disease course; elevated IL-6 and IL-8 levels in this period could be used as prognostic markers.

Modeling Gene-Wise Dependencies Improves the Identification of Drug Response Biomarkers in Cancer Studies

Olga Nikolova, Russell Moser, Christopher Kemp, Mehmet Gönen, Adam A. Margolin

Journal: Bioinformatics, vol. 33, no. 9, pp. 1362–1369, 2017

Abstract

Motivation: In recent years, vast advances in biomedical technologies and comprehensive sequencing have revealed the genomic landscape of common forms of human cancer in unprecedented detail. The broad heterogeneity of the disease calls for rapid development of personalized therapies. Translating the readily available genomic data into useful knowledge that can be applied in the clinic remains a challenge. Computational methods are needed to aid these efforts by robustly analyzing genome-scale data from distinct experimental platforms for prioritization of targets and treatments.

Results: We propose a novel, biologically-motivated, Bayesian multitask approach, which explicitly models gene-centric dependencies across multiple and distinct genomic platforms. We introduce a genewise prior and present a fully Bayesian formulation of a group factor analysis model. In supervised prediction applications, our multitask approach leverages similarities in response profiles of groups of drugs that are more likely to be related to true biological signal, which leads to more robust performance and improved generalization ability. We evaluate the performance of our method on molecularly characterized collections of cell lines profiled against two compound panels, namely the Cancer Cell Line Encyclopedia and the Cancer Therapeutics Response Portal. We demonstrate that accounting for the gene-centric dependencies enables leveraging information from multi-omic input data and improves prediction and feature selection performance. We further demonstrate the applicability of our method in an unsupervised dimensionality reduction application by inferring genes essential to tumorigenesis in the pancreatic ductal adenocarcinoma and lung adenocarcinoma patient cohorts from The Cancer Genome Atlas.

Integrating Gene Set Analysis and Nonlinear Predictive Modeling of Disease Phenotypes Using a Bayesian Multitask Formulation

Mehmet Gönen

Journal: BMC Bioinformatics, vol. 17, p. 1311, 2016

Abstract

Motivation: Identifying molecular signatures of disease phenotypes is studied using two mainstream approaches: (i) Predictive modeling methods such as linear classification and regression algorithms are used to find signatures predictive of phenotypes from genomic data, which may not be robust due to limited sample sizes or the highly correlated nature of genomic data. (ii) Gene set analysis methods are used to find gene sets on which phenotypes are linearly dependent by bringing prior biological knowledge into the analysis, which may not capture more complex nonlinear dependencies. Thus, formulating an integrated model of gene set analysis and nonlinear predictive modeling is of great practical importance.

Results: In this study, we propose a Bayesian binary classification framework to integrate gene set analysis and nonlinear predictive modeling. We then generalize this formulation to multitask learning setting to model multiple related datasets conjointly. Our main novelty is the probabilistic nonlinear formulation that enables us to robustly capture nonlinear dependencies between genomic data and phenotype even with small sample sizes. We demonstrate the performance of our algorithms using repeated random subsampling validation experiments on two cancer and two tuberculosis datasets by predicting important disease phenotypes from genome-wide gene expression data. We are able to obtain comparable or even better predictive performance than a baseline Bayesian nonlinear algorithm and to identify sparse sets of relevant genes and gene sets on all datasets. We also show that our multitask learning formulation enables us to further improve the generalization performance and to better understand biological processes behind disease phenotypes.

Keywords

Gene set analysis, Nonlinear predictive modeling, Disease phenotypes, Multiple kernel learning, Cancer, Tuberculosis

Ultrasensitive Proteomic Quantitation of Cellular Signaling by Digitized Nanoparticle-Protein Counting

Thomas Jacob, Anupriya Agarwal, Damien Ramunno-Johnson, Thomas O’Hare, Mehmet Gönen, Jeffrey W. Tyner, Brian J. Druker, and Tania Q. Vu

Journal: Scientific Reports, vol. 6, p. 28163, 2016

Abstract

Many important signaling and regulatory proteins are expressed at low abundance and are difficult to measure in single cells. We report a molecular imaging approach to quantitate protein levels by digitized, discrete counting of nanoparticle-tagged proteins. Digitized protein counting provides ultrasensitive molecular detection of proteins in single cells that surpasses conventional methods of quantitating total diffuse fluorescence, and offers a substantial improvement in protein quantitation. We implement this digitized proteomic approach in an integrated imaging platform, the single cell-quantum dot platform (SC-QDP), to execute sensitive single cell phosphoquantitation in response to multiple drug treatment conditions and using limited primary patient material. The SC-QDP: 1) identified pAKT and pERK phospho-heterogeneity and insensitivity in individual leukemia cells treated with a multi-drug panel of FDA-approved kinase inhibitors, and 2) revealed subpopulations of drug-insensitive CD34+ stem cells with high pCRKL and pSTAT5 signaling in chronic myeloid leukemia patient blood samples. This ultrasensitive digitized protein detection approach is valuable for uncovering subtle but important differences in signaling, drug insensitivity, and other key cellular processes amongst single cells.

Keywords

Phosphoprotein, signaling, nanoparticle, drug insensitivity, kinase inhibitor, proteomics, single cell

AUC Maximization in Bayesian Hierarchical Models

Mehmet Gönen

Conference: Proceedings of the 22nd European Conference on Artificial Intelligence (ECAI 2016), pp. 21–27, 2016

Abstract

The area under the curve (AUC) measures such as the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) are known to be more appropriate than the error rate, especially for imbalanced data sets. There are several algorithms to optimize AUC measures instead of minimizing the error rate. However, this idea has not been fully exploited in Bayesian hierarchical models owing to the difficulties in inference. Here, we formulate a general Bayesian inference framework, called Bayesian AUC Maximization (BAM), to integrate AUC maximization into Bayesian hierarchical models by borrowing the pairwise and listwise ranking ideas from the information retrieval literature. To showcase our BAM framework, we develop two Bayesian linear classifier variants for two ranking approaches and derive their variational inference procedures. We perform validation experiments on four biomedical data sets to demonstrate the better predictive performance of our framework over its error-minimizing counterpart in terms of average AUROC and AUPR values.
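As a concrete illustration of the pairwise ranking view of AUC mentioned above, the sketch below (with hypothetical scores and labels) computes AUROC as the fraction of (positive, negative) pairs in which the positive example is ranked higher; this is the quantity that pairwise AUC-maximizing objectives target, and it is insensitive to class imbalance in a way that the error rate is not.

```python
import numpy as np

def auroc_pairwise(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]   # all pairwise score gaps
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Imbalanced toy data: 2 positives among 10 samples (hypothetical values).
labels = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.15, 0.05, 0.25, 0.35])

# A classifier predicting "negative" everywhere has only 20% error here,
# yet says nothing about ranking quality; the pairwise AUROC does.
print(auroc_pairwise(scores, labels))  # 0.9375 (15 of 16 pairs ranked correctly)
```

This pairwise count is exactly the Wilcoxon-Mann-Whitney statistic, which is what makes ranking-based surrogates a natural route to AUC optimization.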

Understanding Emotional Impact of Images Using Bayesian Multiple Kernel Learning

He Zhang, Mehmet Gönen, Zhirong Yang, and Erkki Oja

Journal: Neurocomputing, vol. 165, pp. 3–13, 2015

Abstract

Affective classification and retrieval of multimedia such as audio, image, and video have become emerging research areas in recent years. Previous research has focused on designing features and developing feature extraction methods. Generally, a multimedia content can be represented with different feature representations (i.e., views). However, the feature representation most suitable for relating to people's emotions is usually not known a priori. We propose here a novel Bayesian multiple kernel learning algorithm for affective classification and retrieval tasks. The proposed method can make use of different representations simultaneously (i.e., multiview learning) to obtain a better prediction performance than using a single feature representation (i.e., single-view learning) or a subset of features, with the advantage of automatic feature selection. In particular, our algorithm has been implemented within a multilabel setup to capture the correlation between emotions, and the Bayesian formulation enables our method to produce probabilistic outputs for measuring a set of emotions triggered by a single image. As a case study, we perform classification and retrieval experiments with our algorithm for predicting people's emotional states evoked by images, using generic low-level image features. The empirical results with our approach on the widely used International Affective Picture System (IAPS) data set outperform several existing methods in terms of classification performance and interpretability of results.

Keywords

Image emotions, multiple kernel learning, multiview learning, variational approximation, low-level image features

A Community Effort to Assess and Improve Drug Sensitivity Prediction Algorithms

James C. Costello, Laura M. Heiser, Elisabeth Georgii, Mehmet Gönen, Michael P. Menden, Nicholas J. Wang, Mukesh Bansal, Muhammad Ammad-ud-din, Petteri Hintsanen, Suleiman A. Khan, John-Patrick Mpindi, Olli Kallioniemi, Antti Honkela, Tero Aittokallio, Krister Wennerberg, NCI DREAM Community, James J. Collins, Dan Gallahan, Dinah Singer, Julio Saez-Rodriguez, Samuel Kaski, Joe W. Gray, and Gustavo Stolovitzky

Journal: Nature Biotechnology, vol. 32, no. 12, pp. 1202–1212, 2014

Abstract

Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.

Kernelized Bayesian Matrix Factorization

Mehmet Gönen and Samuel Kaski

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 10, pp. 2047–2060, 2014

Abstract

We extend kernelized matrix factorization with a full-Bayesian treatment and with an ability to work with multiple side information sources expressed as different kernels. Kernels have been introduced to integrate side information about the rows and columns, which is necessary for making out-of-matrix predictions. We focus specifically on binary output matrices, but extensions to real-valued matrices are straightforward. We extend the state of the art in two key aspects: (i) A full-conjugate probabilistic formulation of the kernelized matrix factorization enables an efficient variational approximation, whereas full-Bayesian treatments are not computationally feasible in the earlier approaches. (ii) Multiple side information sources are included, treated as different kernels in multiple kernel learning, which additionally reveals which side sources are informative. We then show that the framework can also be used for supervised and semi-supervised multilabel classification and multi-output regression, by considering samples and outputs as the domains where matrix factorization operates. Our method outperforms alternatives in predicting drug-protein interactions on two data sets. On multilabel classification, our algorithm obtains the lowest Hamming losses on 10 out of 14 data sets compared to five state-of-the-art multilabel classification algorithms. We finally show that the proposed approach outperforms alternatives in multi-output regression experiments on a yeast cell cycle data set.

Index Terms

Automatic relevance determination, biological interaction networks, large margin learning, matrix factorization, multilabel classification, multiple kernel learning, multiple output regression, variational approximation

Drug Susceptibility Prediction Against a Panel of Drugs Using Kernelized Bayesian Multitask Learning

Mehmet Gönen and Adam A. Margolin

Journal: Bioinformatics, vol. 30, no. 17, pp. i556–i563, 2014

Abstract

Motivation: Human immunodeficiency virus (HIV) and cancer require personalized therapies owing to their inherent heterogeneous nature. For both diseases, large-scale pharmacogenomic screens of molecularly characterized samples have been generated with the hope of identifying genetic predictors of drug susceptibility. Thus, computational algorithms capable of inferring robust predictors of drug responses from genomic information are of great practical importance. Most of the existing computational studies that consider drug susceptibility prediction against a panel of drugs formulate a separate learning problem for each drug, which cannot make use of commonalities between subsets of drugs.

Results: In this study, we propose to solve the problem of drug susceptibility prediction against a panel of drugs in a multitask learning framework by formulating a novel Bayesian algorithm that combines kernel-based non-linear dimensionality reduction and binary classification (or regression). The main novelty of our method is the joint Bayesian formulation of projecting data points into a shared subspace and learning predictive models for all drugs in this subspace, which helps us to eliminate off-target effects and drug-specific experimental noise. Another novelty of our method is its ability to handle missing phenotype values arising from experimental conditions and quality control reasons. We demonstrate the performance of our algorithm via cross-validation experiments on two benchmark drug susceptibility datasets of HIV and cancer. Our method obtains statistically significantly better predictive performance on most of the drugs compared with baseline single-task algorithms that learn drug-specific models. These results show that predicting drug susceptibility against a panel of drugs simultaneously within a multitask learning framework improves overall predictive performance over single-task learning approaches.

Integrative and Personalized QSAR Analysis in Cancer by Kernelized Bayesian Matrix Factorization

Muhammad Ammad-ud-din, Elisabeth Georgii, Mehmet Gönen, Tuomo Laitinen, Olli Kallioniemi, Krister Wennerberg, Antti Poso, and Samuel Kaski

Journal: Journal of Chemical Information and Modeling, vol. 54, no. 8, pp. 2347–2359, 2014

Abstract

With data from recent large-scale drug sensitivity measurement campaigns, it is now possible to build and test models predicting responses for more than one hundred anticancer drugs against several hundreds of human cancer cell lines. Traditional quantitative structure–activity relationship (QSAR) approaches focus on small molecules in searching for their structural properties predictive of the biological activity in a single cell line or a single tissue type. We extend this line of research in two directions: (1) an integrative QSAR approach predicting the responses to new drugs for a panel of multiple known cancer cell lines simultaneously and (2) a personalized QSAR approach predicting the responses to new drugs for new cancer cell lines. To solve the modeling task, we apply a novel kernelized Bayesian matrix factorization method. For maximum applicability and predictive performance, the method optionally utilizes genomic features of cell lines and target information on drugs in addition to chemical drug descriptors. In a case study with 116 anticancer drugs and 650 cell lines, we demonstrate the usefulness of the method in several relevant prediction scenarios, differing in the amount of available information, and analyze the importance of various types of drug features for the response prediction. Furthermore, after predicting the missing values of the data set, a complete global map of drug response is explored to assess treatment potential and treatment range of therapeutically interesting anticancer drugs.

Multi-Task and Multi-View Learning of User State

Melih Kandemir, Akos Vetek, Mehmet Gönen, Arto Klami, and Samuel Kaski

Journal: Neurocomputing, vol. 139, pp. 97–106, 2014

Abstract

Several computational approaches have been proposed for inferring the affective state of the user, motivated for example by the goal of building improved interfaces that can adapt to the user's needs and internal state. While fairly good results have been obtained for inferring the user state under highly controlled conditions, a considerable amount of work remains to be done for learning high-quality estimates of subjective evaluations of the state in more natural conditions. In this work, we discuss how two recent machine learning concepts, multi-view learning and multi-task learning, can be adapted for user state recognition, and demonstrate them on two data collections of varying quality. Multi-view learning enables combining multiple measurement sensors in a justified way while automatically learning the importance of each sensor. Multi-task learning, in turn, shows how multiple learning tasks can be learned together to improve the accuracy. We demonstrate the use of two types of multi-task learning: learning both multiple state indicators and models for multiple users together. We also illustrate how the benefits of multi-task learning and multi-view learning can be effectively combined in a unified model by introducing a novel algorithm.

Keywords

Affect recognition, machine learning, multi-task learning, multi-view learning

Coupled Dimensionality Reduction and Classification for Supervised and Semi-Supervised Multilabel Learning

Mehmet Gönen

Journal: Pattern Recognition Letters, vol. 38, pp. 132–141, 2014

Abstract

Coupled training of dimensionality reduction and classification has been proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find the intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of Hamming loss, average AUC, macro F1, and micro F1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and in semi-supervised learning tasks.

Keywords

Multilabel learning, dimensionality reduction, supervised learning, semi-supervised learning, variational approximation, automatic relevance determination

Localized Data Fusion for Kernel k-Means Clustering with Application to Cancer Biology

Mehmet Gönen and Adam A. Margolin

Conference: Advances in Neural Information Processing Systems 27 (NIPS 2014), pp. 1305–1313, 2014

Abstract

In many modern applications from, for example, bioinformatics and computer vision, samples have multiple feature representations coming from different data sources. Multiview learning algorithms try to exploit all of this available information to obtain a better learner in such scenarios. In this paper, we propose a novel multiple kernel learning algorithm that extends kernel k-means clustering to the multiview setting, which combines kernels calculated on the views in a localized way to better capture sample-specific characteristics of the data. We demonstrate the better performance of our localized data fusion approach on a human colon and rectal cancer data set by clustering patients. Our method finds more relevant prognostic patient groups than global data fusion methods when we evaluate the results with respect to three commonly used clinical biomarkers.
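To make the setting concrete, here is a small sketch (synthetic data, all names hypothetical) of the global-weight baseline: kernels computed on two views are fused with fixed weights and clustered with standard kernel k-means. The paper's contribution replaces these fixed global weights with sample-specific, localized ones.

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel_kmeans(K, k, n_iter=20):
    """Standard kernel k-means on a precomputed kernel matrix K."""
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for c in range(k):
            idx = labels == c
            if not idx.any():            # guard against empty clusters
                dist[:, c] = np.inf
                continue
            m = idx.sum()
            # ||phi(x_i) - center_c||^2 written using kernel values only
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].sum(axis=1) / m
                          + K[np.ix_(idx, idx)].sum() / m**2)
        labels = dist.argmin(axis=1)
    return labels

# Two hypothetical views of the same 60 samples: one informative
# (two well-separated groups), one pure noise.
X1 = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(3, 1, (30, 4))])
X2 = rng.normal(size=(60, 6))
K1, K2 = X1 @ X1.T, X2 @ X2.T            # linear kernel per view

weights = np.array([0.7, 0.3])           # fixed global kernel weights
K = weights[0] * K1 + weights[1] * K2    # global data fusion baseline
labels = kernel_kmeans(K, k=2)
print(np.bincount(labels, minlength=2))
```

A localized method would instead let the relative weight of each view vary from sample to sample, so that, for example, a view that is informative only for a subset of patients can still contribute there.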

Bayesian Multiview Dimensionality Reduction for Learning Predictive Subspaces

Mehmet Gönen, Gülefşan Bozkurt Gönen, and Fikret Gürgen

Conference: Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), pp. 387–392, 2014

Abstract

Multiview learning tries to exploit different feature representations to obtain better learners. For example, in video and image recognition problems, there are many possible feature representations such as color- and texture-based features. There are two common ways of exploiting multiple views: forcing similarity (i) in predictions and (ii) in a latent subspace. In this paper, we introduce a novel Bayesian multiview dimensionality reduction method coupled with supervised learning to find predictive subspaces, and present its inference details. Experiments show that our proposed method obtains very good results on image recognition tasks in terms of classification and retrieval performance.

Embedding Heterogeneous Data by Preserving Multiple Kernels

Mehmet Gönen

Conference: Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), pp. 381–386, 2014

Abstract

Heterogeneous data may arise in many real-life applications under different scenarios. In this paper, we formulate a general framework to address the problem of modeling heterogeneous data. Our main contribution is a novel embedding method, called multiple kernel preserving embedding (MKPE), which projects heterogeneous data into a unified embedding space by preserving cross-domain interactions and within-domain similarities simultaneously. These interactions and similarities between data points are approximated with Gaussian kernels to transfer local neighborhood information to the projected subspace. We also extend our method for out-of-sample embedding using a parametric formulation in the projection step. The performance of MKPE is illustrated on two tasks: (i) modeling biological interaction networks and (ii) cross-domain information retrieval. Empirical results of these two tasks validate the predictive performance of our algorithm.

Kernelized Bayesian Transfer Learning

Mehmet Gönen and Adam A. Margolin

Conference: Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI 2014), pp. 1831–1839, 2014

Abstract

Transfer learning considers related but distinct tasks defined on heterogeneous domains and tries to transfer knowledge between these tasks to improve generalization performance. It is particularly useful when we do not have a sufficient amount of labeled training data in some tasks, which may be very costly, laborious, or even infeasible to obtain. Instead, learning the tasks jointly enables us to effectively increase the amount of labeled training data. In this paper, we formulate a kernelized Bayesian transfer learning framework that is a principled combination of kernel-based dimensionality reduction models with task-specific projection matrices to find a shared subspace and a coupled classification model for all of the tasks in this subspace. Our two main contributions are: (i) two novel probabilistic models for binary and multiclass classification, and (ii) very efficient variational approximation procedures for these models. We illustrate the generalization performance of our algorithms on two different applications. In computer vision experiments, our method outperforms the state-of-the-art algorithms on nine out of 12 benchmark supervised domain adaptation experiments defined on two object recognition data sets. In cancer biology experiments, we use our algorithm to predict the mutation status of important cancer genes from gene expression profiles using two distinct cancer populations, namely, patient-derived primary tumor data and in-vitro-derived cancer cell line data. We show that we can increase our generalization performance on primary tumors using cell lines as an auxiliary data source.

Bayesian Supervised Dimensionality Reduction

Mehmet Gönen

Journal: IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 2179–2189, 2013

Abstract

Dimensionality reduction is commonly used as a preprocessing step before training a supervised learner. However, coupled training of dimensionality reduction and supervised learning steps may improve the prediction performance. In this paper, we introduce a simple and novel Bayesian supervised dimensionality reduction method that combines linear dimensionality reduction and linear supervised learning in a principled way. We present both Gibbs sampling and variational approximation approaches to learn the proposed probabilistic model for multiclass classification. We also extend our formulation towards model selection using automatic relevance determination in order to find the intrinsic dimensionality. Classification experiments on three benchmark data sets show that the new model significantly outperforms seven baseline linear dimensionality reduction algorithms on very low dimensions in terms of generalization performance on test data. The proposed model also obtains the best results on an image recognition task in terms of classification and retrieval performances.

Index Terms

Dimensionality reduction, Gibbs sampling, handwritten digit recognition, image recognition, image retrieval, multiclass classification, subspace learning, variational approximation

Supervised Multiple Kernel Embedding for Learning Predictive Subspaces

Mehmet Gönen

Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2381–2389, 2013

Abstract

For supervised learning problems, dimensionality reduction is generally applied as a preprocessing step. However, coupled training of dimensionality reduction and supervised learning steps may improve the prediction performance. In this paper, we propose a novel dimensionality reduction algorithm coupled with a supervised kernel-based learner, called supervised multiple kernel embedding, that integrates multiple kernel learning into dimensionality reduction and performs prediction on the projected subspace within a joint optimization framework. Combining multiple kernels allows us to combine different feature representations and/or similarity measures toward a unified subspace. We perform experiments on one digit recognition and two bioinformatics data sets. Our proposed method significantly outperforms multiple kernel Fisher discriminant analysis followed by a standard kernel-based learner, especially on low dimensions.

Index Terms

Dimensionality reduction, kernel machines, multiple kernel learning, subspace learning, supervised learning

Localized Algorithms for Multiple Kernel Learning

Mehmet Gönen and Ethem Alpaydın

Journal: Pattern Recognition, vol. 46, no. 3, pp. 795–807, 2013

Abstract

Instead of selecting a single kernel, multiple kernel learning (MKL) uses a weighted sum of kernels where the weight of each kernel is optimized during training. Such methods assign the same weight to a kernel over the whole input space, and we discuss localized multiple kernel learning (LMKL) that is composed of a kernel-based learning algorithm and a parametric gating model to assign local weights to kernel functions. These two components are trained in a coupled manner using a two-step alternating optimization algorithm. Empirical results on benchmark classification and regression data sets validate the applicability of our approach. We see that LMKL achieves higher accuracy compared with canonical MKL on classification problems with different feature representations. LMKL can also identify the relevant parts of images using the gating model as a saliency detector in image recognition problems. In regression tasks, LMKL improves the performance significantly or reduces the model complexity by storing significantly fewer support vectors.

Keywords

Multiple kernel learning, support vector machines, support vector regression, classification, regression, selective attention
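The locally weighted kernel combination behind LMKL can be sketched in a few lines. The softmax gating parameterization and all variable names below are illustrative assumptions for this summary, not the paper's exact formulation:

```python
import numpy as np

def softmax_gating(X, V):
    """Gating model: eta_m(x) proportional to exp(v_m . x), one weight per kernel."""
    scores = X @ V                               # (n_samples, n_kernels)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def localized_kernel(X, kernel_matrices, V):
    """Combined kernel: K[i, j] = sum_m eta_m(x_i) * K_m[i, j] * eta_m(x_j)."""
    eta = softmax_gating(X, V)
    K = np.zeros((X.shape[0], X.shape[0]))
    for m, Km in enumerate(kernel_matrices):
        # Each term is a Hadamard product of PSD matrices, so K stays PSD.
        K += np.outer(eta[:, m], eta[:, m]) * Km
    return K
```

Because each summand is the Hadamard product of a rank-one positive semidefinite matrix with a valid kernel matrix, the combined matrix remains a valid kernel by the Schur product theorem.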

Predicting Emotional States of Images Using Bayesian Multiple Kernel Learning

He Zhang, Mehmet Gönen, Zhirong Yang, and Erkki Oja

Conference: Proceedings of the 20th International Conference on Neural Information Processing (ICONIP 2013), pp. 274–282, 2013

Abstract

Images usually convey information that can influence people’s emotional states. Such affective information can be used by search engines and social networks for better understanding the user’s preferences. We propose here a novel Bayesian multiple kernel learning method for predicting the emotions evoked by images. The proposed method can make use of different image features simultaneously to obtain a better prediction performance, with the advantage of automatically selecting important features. Specifically, our method has been implemented within a multilabel setup in order to capture the correlations between emotions. Due to its probabilistic nature, our method is also able to produce probabilistic outputs for measuring a distribution of emotional intensities. The experimental results on the International Affective Picture System (IAPS) dataset show that the proposed approach achieves a better classification performance and provides a more interpretable feature selection capability than the state-of-the-art methods.

Keywords

Image emotion, low-level image features, multiview learning, multiple kernel learning, variational approximation

Affective Abstract Image Classification and Retrieval Using Multiple Kernel Learning

He Zhang, Zhirong Yang, Mehmet Gönen, Markus Koskela, Jorma Laaksonen, Timo Honkela, and Erkki Oja

Conference: Proceedings of the 20th International Conference on Neural Information Processing (ICONIP 2013), pp. 166–175, 2013

Abstract

Emotional semantic image retrieval systems aim at incorporating the user's affective states for responding adequately to the user's interests. One challenge is to select features specific to image affect detection. Another challenge is to build effective learning models or classifiers to bridge the so-called "affective gap". In this work, we study the affective classification and retrieval of abstract images by applying a multiple kernel learning framework. An image can be represented by different feature spaces, and multiple kernel learning can utilize all these feature representations simultaneously (i.e., multiview learning), such that it jointly learns the feature representation weights and the corresponding classifier in an intelligent manner. Our experimental results on two abstract image datasets demonstrate the advantage of the multiple kernel learning framework for image affect detection in terms of feature selection, classification performance, and interpretation.

Keywords

Image affect, multiple kernel learning, group lasso, low-level image features, image classification and retrieval

Kernelized Bayesian Matrix Factorization

Mehmet Gönen, Suleiman A. Khan, and Samuel Kaski

Conference: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), pp. 864–872, 2013

Abstract

We extend kernelized matrix factorization with a fully Bayesian treatment and with an ability to work with multiple side information sources expressed as different kernels. Kernel functions have been introduced to matrix factorization to integrate side information about the rows and columns (e.g., objects and users in recommender systems), which is necessary for making out-of-matrix (i.e., cold start) predictions. We discuss specifically bipartite graph inference, where the output matrix is binary, but extensions to more general matrices are straightforward. We extend the state of the art in two key aspects: (i) A fully conjugate probabilistic formulation of the kernelized matrix factorization problem enables an efficient variational approximation, whereas fully Bayesian treatments are not computationally feasible in the earlier approaches. (ii) Multiple side information sources are included, treated as different kernels in multiple kernel learning, which additionally reveals which side information sources are informative. Our method outperforms alternatives in predicting drug-protein interactions on two data sets. We then show that our framework can also be used for solving multilabel learning problems by considering samples and labels as the two domains on which matrix factorization operates. Our algorithm obtains the lowest Hamming loss values on 10 out of 14 multilabel classification data sets compared to five state-of-the-art multilabel learning algorithms.

Kernelized Bayesian Matrix Factorization

Mehmet Gönen, Muhammad Ammad-ud-din, Suleiman A. Khan, and Samuel Kaski

Workshop: NIPS Workshop on Machine Learning in Computational Biology, 2013

Predicting Drug–Target Interactions from Chemical and Genomic Kernels Using Bayesian Matrix Factorization

Mehmet Gönen

Journal: Bioinformatics, vol. 28, no. 18, pp. 2304–2310, 2012

Abstract

Motivation: Identifying interactions between drug compounds and target proteins is of great practical importance in the drug discovery process for known diseases. Existing databases contain very few experimentally validated drug–target interactions, and formulating successful computational methods for predicting interactions remains challenging.

Results: In this study, we consider four different drug–target interaction networks from humans involving enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. We then propose a novel Bayesian formulation that combines dimensionality reduction, matrix factorization and binary classification for predicting drug–target interaction networks using only chemical similarity between drug compounds and genomic similarity between target proteins. The novelty of our approach comes from the joint Bayesian formulation of projecting drug compounds and target proteins into a unified subspace using the similarities and estimating the interaction network in that subspace. We propose using a variational approximation in order to obtain an efficient inference scheme and give its detailed derivations. Lastly, we demonstrate the performance of our proposed method in three different scenarios: (i) exploratory data analysis using low-dimensional projections, (ii) predicting interactions for the out-of-sample drug compounds and (iii) predicting unknown interactions of the given network.
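The subspace-scoring idea in this formulation can be illustrated as follows. The projection matrices here are drawn at random purely to fix shapes; the actual method learns them by variational Bayesian inference, which this sketch does not attempt to reproduce:

```python
import numpy as np

def predict_interactions(K_drug, K_target, A, B):
    # Project drug-side and target-side kernel matrices into a shared
    # R-dimensional subspace and score every drug-target pair with an
    # inner product in that subspace.
    G_d = A.T @ K_drug    # (R, n_drugs)
    G_t = B.T @ K_target  # (R, n_targets)
    return G_d.T @ G_t    # real-valued scores; threshold for binary labels
```

Because prediction only needs kernel values against the training objects, the same rule applies to out-of-sample drug compounds, matching the paper's out-of-sample scenario.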

Probabilistic and Discriminative Group-Wise Feature Selection Methods for Credit Risk Analysis

Gülefşan Bozkurt Gönen, Mehmet Gönen, and Fikret Gürgen

Journal: Expert Systems with Applications, vol. 39, no. 14, pp. 11709–11717, 2012

Abstract

Many financial organizations such as banks and retailers rely heavily on computational credit risk analysis (CRA) tools due to recent financial crises and stricter regulations. This strategy enables them to manage their financial and operational risks within the pool of financial institutions. Machine learning algorithms, especially binary classifiers, are very popular for that purpose. In real-life applications such as CRA, feature selection algorithms are used to decrease data acquisition cost and to increase the interpretability of the decision process. Using feature selection methods directly on CRA data sets may not help due to categorical variables such as marital status. Such features are usually converted into binary features using 1-of-k encoding, and eliminating a subset of features from a group does not help in terms of data collection cost or interpretability. In this study, we propose to use the probit classifier with a proper prior structure and multiple kernel learning with a proper kernel construction procedure to perform group-wise feature selection (i.e., eliminating a group of features together if they are not helpful). Experiments on two standard CRA data sets show the validity and effectiveness of the proposed binary classification algorithm variants.

Keywords

Credit risk analysis, feature selection, probit classifier, multiple kernel learning, sparsity

A Bayesian Multiple Kernel Learning Framework for Single and Multiple Output Regression

Mehmet Gönen

Conference: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pp. 354–359, 2012

Abstract

Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is focused on classification formulations and there are few attempts for regression. We propose a fully conjugate Bayesian formulation and derive a deterministic variational approximation for single output regression. We then show that the proposed formulation can be extended to multiple output regression. We illustrate the effectiveness of our approach on a single output benchmark data set. Our framework outperforms previously reported results with better generalization performance on two image recognition data sets using both single and multiple output formulations.

Bayesian Efficient Multiple Kernel Learning

Mehmet Gönen

Conference: Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pp. 1–8, 2012


Bayesian Supervised Multilabel Learning with Coupled Embedding and Classification

Mehmet Gönen

Conference: Proceedings of the 12th SIAM International Conference on Data Mining (SDM 2012), pp. 367–378, 2012

Abstract

Coupled training of dimensionality reduction and classification was proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we introduce a novel Bayesian supervised multilabel learning method that combines linear dimensionality reduction with linear binary classification. We present a deterministic variational approximation approach to learn the proposed probabilistic model for multilabel classification. We perform experiments on four benchmark multilabel learning data sets by comparing our method with four baseline linear dimensionality reduction algorithms. Experiments show that the proposed approach achieves good performance values in terms of Hamming loss, macro F1, and micro F1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis.

A Localized MKL Method for Brain Classification with Known Intra-Class Variability

Aydın Ulaş, Mehmet Gönen, Umberto Castellani, Vittorio Murino, Marcella Bellani, Michele Tansella, and Paolo Brambilla

Workshop: Proceedings of the 3rd International Workshop on Machine Learning in Medical Imaging, pp. 152–159, 2012

Abstract

Automatic decisional systems based on pattern classification methods are becoming very important to support medical diagnosis. In general, the overall objective is to classify between healthy subjects and patients affected by a certain disease. To reach this aim, significant efforts have been spent on finding reliable biomarkers which are able to robustly discriminate between the two populations (i.e., patients and controls). However, in real medical scenarios there are many factors, like gender or age, which make the source data very heterogeneous. This introduces a large intra-class variation by affecting the performance of the classification procedure. In this paper we explore how to use knowledge of heterogeneity factors to improve the classification accuracy. We propose a Clustered Localized Multiple Kernel Learning (CLMKL) algorithm by encoding in the classification model the information on the clusters of a priori known stratifications.

Experiments are carried out for brain classification in schizophrenia. We show that our algorithm performs clearly better than single-kernel Support Vector Machines (SVMs), linear MKL algorithms, and canonical localized MKL algorithms when the gender information is considered as a priori knowledge.

Keywords

Brain imaging, magnetic resonance imaging, computer-aided diagnosis, localized multiple kernel learning, schizophrenia

Multiple Kernel Learning Algorithms

Mehmet Gönen and Ethem Alpaydın

Journal: Journal of Machine Learning Research, vol. 12, pp. 2211–2268, 2011

Abstract

In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there are differences between them in complexity as given by the number of stored support vectors, the sparsity of the solution as given by the number of used kernels, and training time complexity. We see that, overall, using multiple kernels instead of a single one is useful, and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.

Keywords

Support vector machines, kernel machines, multiple kernel learning
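As a concrete illustration of the linear combination this taxonomy centers on, a nonnegative weighted sum of valid kernel matrices is again a valid kernel. The specific base kernels and weights below are arbitrary examples chosen for this sketch:

```python
import numpy as np

def linear_kernel(X):
    # K[i, j] = x_i . x_j
    return X @ X.T

def gaussian_kernel(X, gamma=0.5):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def combine_kernels(kernel_matrices, weights):
    # A conic (nonnegative) combination of kernel matrices remains
    # symmetric positive semidefinite, hence a valid kernel.
    assert all(w >= 0 for w in weights)
    return sum(w * K for w, K in zip(weights, kernel_matrices))
```

MKL algorithms differ mainly in how the weights are learned (e.g., under trace, ℓ1, or ℓ2 constraints); with fixed weights, the combined matrix can be fed directly to any kernel machine that accepts precomputed kernels.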

Regularizing Multiple Kernel Learning Using Response Surface Methodology

Mehmet Gönen and Ethem Alpaydın

Journal: Pattern Recognition, vol. 44, no. 1, pp. 159–171, 2011

Abstract

In recent years, several methods have been proposed to combine multiple kernels using a weighted linear sum of kernels. These different kernels may be using information coming from multiple sources or may correspond to using different notions of similarity on the same source. We note that such methods, in addition to the usual ones of the canonical support vector machine formulation, introduce new regularization parameters that affect the solution quality and, in this work, we propose to optimize them using response surface methodology on cross-validation data. On several bioinformatics and digit recognition benchmark data sets, we compare multiple kernel learning and our proposed regularized variant in terms of accuracy, support vector count, and the number of kernels selected. We see that our proposed variant achieves statistically similar or higher accuracy results by using fewer kernel functions and/or support vectors through suitable regularization; it also allows better knowledge extraction because unnecessary kernels are pruned and the favored kernels reflect the properties of the problem at hand.

Keywords

Support vector machine, multiple kernel learning, regularization, response surface methodology

Multitask Learning Using Regularized Multiple Kernel Learning

Mehmet Gönen, Melih Kandemir, and Samuel Kaski

Conference: Proceedings of the 18th International Conference on Neural Information Processing (ICONIP 2011), pp. 500–509, 2011

Abstract

Empirical success of kernel-based learning algorithms is very much dependent on the kernel function used. Instead of using a single fixed kernel function, multiple kernel learning (MKL) algorithms learn a combination of different kernel functions in order to obtain a similarity measure that better matches the underlying problem. We study multitask learning (MTL) problems and formulate a novel MTL algorithm that trains coupled but nonidentical MKL models across the tasks. The proposed algorithm is especially useful for tasks that have different input and/or output space characteristics and is computationally very efficient. Empirical results on three data sets validate the generalization performance and the efficiency of our approach.

Keywords

Kernel machines, multilabel learning, multiple kernel learning, multitask learning, support vector machines

Combining Data Sources Nonlinearly for Cell Nucleus Classification of Renal Cell Carcinoma

Mehmet Gönen, Aydın Ulaş, Peter Schüffler, Umberto Castellani, and Vittorio Murino

Workshop: Proceedings of the 1st International Workshop on Similarity-Based Pattern Analysis and Recognition, pp. 250–260, 2011

Abstract

In kernel-based machine learning algorithms, we can learn a combination of different kernel functions in order to obtain a similarity measure that better matches the underlying problem instead of using a single fixed kernel function. This approach is called multiple kernel learning (MKL). In this paper, we formulate a nonlinear MKL variant and apply it for nuclei classification in tissue microarray images of renal cell carcinoma (RCC). The proposed variant is tested on several feature representations extracted from the automatically segmented nuclei. We compare our results with single-kernel support vector machines trained on each feature representation separately and three linear MKL algorithms from the literature. We demonstrate that our variant obtains more accurate classifiers than competing algorithms for RCC detection by combining information from different feature representations nonlinearly.

Keywords

Multiple kernel learning, renal cell carcinoma, support vector machines

Supervised Learning of Local Projection Kernels

Mehmet Gönen and Ethem Alpaydın

Journal: Neurocomputing, vol. 73, no. 10–12, pp. 1694–1703, 2010

Abstract

We formulate a supervised, localized dimensionality reduction method using a gating model that divides up the input space into regions and selects the dimensionality reduction projection separately in each region. The gating model, the locally linear projections, and the kernel-based supervised learning algorithm which uses them in its kernels are coupled and their training is performed with an alternating optimization procedure. Our proposed local projection kernel projects a data instance into different feature spaces by using the local projection matrices, combines them with the gating model, and performs the dot product in the combined feature space. Empirical results on benchmark data sets for visualization and classification tasks validate the idea. The method is generalizable to regression estimation and novelty detection.

Keywords

Dimensionality reduction, local embedding, kernel machines, subspace learning

Cost-Conscious Multiple Kernel Learning

Mehmet Gönen and Ethem Alpaydın

Journal: Pattern Recognition Letters, vol. 31, no. 9, pp. 959–965, 2010

Abstract

Recently, it has been proposed to combine multiple kernels using a weighted linear sum. In certain applications, different kernels may be using different input representations, and these methods consider neither the cost of acquiring them nor the cost of evaluating the kernels. We generalize the framework of Multiple Kernel Learning (MKL) for this cost-conscious methodology. On 12 benchmark data sets from the UCI repository, we compare MKL and its cost-conscious variants in terms of accuracy, support vector count, and total cost. Cost-conscious MKL achieves statistically similar accuracy results by using fewer support vectors/kernels, best trading off the accuracy brought by each representation/kernel with the concomitant cost. We also test our approach on two popular bioinformatics data sets from the MIPS comprehensive yeast genome database (CYGD) and see that integrating the cost factor into kernel combination allows us to obtain cheaper kernel combinations by using fewer active kernels and/or support vectors.

Keywords

Support vector machines, kernel combination, multiple kernel learning

Localized Multiple Kernel Regression

Mehmet Gönen and Ethem Alpaydın

Conference: Proceedings of the 20th IAPR International Conference on Pattern Recognition (ICPR 2010), pp. 1425–1428, 2010

Abstract

Multiple kernel learning (MKL) uses a weighted combination of kernels where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. Our main objective is the formulation of the localized multiple kernel learning (LMKL) framework that allows kernels to be combined with different weights in different regions of the input space by using a gating model. In this paper, we apply the LMKL framework to regression estimation and derive a learning algorithm for this extension. Canonical support vector regression may overfit unless the kernel parameters are selected appropriately; we see that even if we provide more kernels than necessary, LMKL uses only as many as needed and does not overfit, thanks to its inherent regularization.

Supervised and Localized Dimensionality Reduction from Multiple Feature Representations or Kernels

Mehmet Gönen and Ethem Alpaydın

Workshop: NIPS Workshop on New Directions in Multiple Kernel Learning, 2010

Abstract

We propose a supervised and localized dimensionality reduction method that combines multiple feature representations or kernels. Each feature representation or kernel is used where it is suitable through a parametric gating model in a supervised manner for efficient dimensionality reduction and classification, and local projection matrices are learned for each feature representation or kernel. The kernel machine parameters, the local projection matrices, and the gating model parameters are optimized using an alternating optimization procedure composed of kernel machine training and gradient-descent updates. Empirical results on benchmark data sets validate the method in terms of classification accuracy, smoothness of the solution, and ease of visualization.

Machine Learning Integration for Predicting the Effect of Single Amino Acid Substitutions on Protein Stability

Ayşegül Özen, Mehmet Gönen, Ethem Alpaydın, and Türkan Haliloğlu

Journal: BMC Structural Biology, vol. 9, p. 66, 2009

Abstract

Background: Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and structure-based features, namely, the amino acid substitution likelihoods, the equilibrium fluctuations of the alpha- and beta-carbon atoms, and the packing density. (2) By implementing different machine learning integration approaches, we combine information from different features or representations. (3) We compare classification vs. regression methods to predict the sign vs. the output of stability change. (4) We allow a reject option for doubtful cases where the risk of misclassification is high.

Results: We investigate three different approaches: early, intermediate and late integration, which respectively combine features, kernels over feature subsets, and decisions. We perform simulations on two data sets: (1) S1615 is used in previous studies, (2) S2783 is the updated version (as of July 2, 2009) extracted also from ProTherm. For S1615 data set, our highest accuracy using both sequence and structure information is 0.842 on cross-validation and 0.904 on testing using early integration. Newly added features, namely, local compositional packing and the mobility extent of the mutated residues, improve accuracy significantly with intermediate integration. For S2783 data set, we also train regression methods to estimate not only the sign but also the amount of stability change and apply risk-based classification to reject when the learner has low confidence and the loss of misclassification is high. The highest accuracy is 0.835 on cross-validation and 0.832 on testing using only sequence information. The percentage of false positives can be decreased to less than 0.005 by rejecting 10 per cent using late integration.

Conclusions: We find that in both early and late integration, combining inputs or decisions is useful in increasing accuracy. Intermediate integration allows assessing the contributions of individual features by looking at the assigned weights. Overall accuracy of regression is not better than that of classification but it has less false positives, especially when combined with the reject option. The server for stability prediction for three integration approaches and the data sets are available at http://www.prc.boun.edu.tr/appserv/prc/mlsta.

Multiple Kernel Machines Using Localized Kernels

Mehmet Gönen and Ethem Alpaydın

Conference: Supplementary Proceedings of the 4th IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB 2009), 2009

Abstract

Multiple kernel learning (MKL) uses a convex combination of kernels where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. The localized multiple kernel learning (LMKL) framework extends MKL to allow combining kernels with different weights in different regions of the input space by using a gating model. LMKL extracts the relative importance of kernels in each region, whereas MKL gives their relative importance over the whole input space. In this paper, we generalize the LMKL framework with a kernel-based gating model and derive the learning algorithm for binary classification. Empirical results on toy classification problems are used to illustrate the algorithm. Experiments on two bioinformatics data sets are performed to show that kernel machines can also be localized in a data-dependent way by using kernel values as gating model features. The localized variant achieves significantly higher accuracy on one of the bioinformatics data sets.

Localized Multiple Kernel Learning for Image Recognition

Mehmet Gönen and Ethem Alpaydın

Workshop: NIPS Workshop on Understanding Multiple Kernel Learning Methods, 2009

Abstract

We review our work on localized multiple kernels (Gönen and Alpaydın, 2008, 2009) that allows kernels to be combined with different weights in different regions of the input space by using a gating model. We give example uses in image recognition for combining kernels of different representations and costs.

Multiclass Posterior Probability Support Vector Machines

Mehmet Gönen, Ayşe Gönül Tanuğur, and Ethem Alpaydın

Journal: IEEE Transactions on Neural Networks, vol. 19, no. 1, pp. 130–139, 2008

Abstract

Tao et al. have recently proposed the posterior probability support vector machine (PPSVM) which uses soft labels derived from estimated posterior probabilities to be more robust to noise and outliers. Tao et al.’s model uses a window-based density estimator to calculate the posterior probabilities and is a binary classifier. We propose a neighbor-based density estimator and also extend the model to the multiclass case. Our bias–variance analysis shows that the decrease in error by PPSVM is due to a decrease in bias. On 20 benchmark data sets, we observe that PPSVM obtains accuracy results that are higher or comparable to those of canonical SVM using significantly fewer support vectors.

Index Terms

Density estimation, kernel machines, multiclass classification, support vector machines (SVMs)
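The neighbor-based posterior estimate described above can be sketched as follows. The choice of k and the Euclidean metric are illustrative assumptions for this summary, not necessarily the paper's exact settings:

```python
import numpy as np

def knn_posterior(X, y, k=5):
    # Estimate P(y = +1 | x_i) as the fraction of positive labels among
    # the k nearest training neighbors of x_i (excluding x_i itself).
    # These soft labels can then replace hard labels in an SVM-style loss.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    posterior = np.empty(len(X))
    for i in range(len(X)):
        neighbors = np.argsort(D[i])[1:k + 1]  # index 0 is the point itself
        posterior[i] = np.mean(y[neighbors] == 1)
    return posterior
```

Points deep inside a class receive posteriors near 0 or 1, while points near the decision boundary or outliers receive intermediate values, which is what makes the resulting classifier more robust to label noise.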

Localized Multiple Kernel Learning

Mehmet Gönen and Ethem Alpaydın

Conference: Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 352–359, 2008

Abstract

Recently, instead of selecting a single kernel, multiple kernel learning (MKL) has been proposed which uses a convex combination of kernels, where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. In this paper, we develop a localized multiple kernel learning (LMKL) algorithm using a gating model for selecting the appropriate kernel function locally. The localizing gating model and the kernel-based classifier are coupled and their optimization is done in a joint manner. Empirical results on ten benchmark and two bioinformatics data sets validate the applicability of our approach. LMKL achieves statistically similar accuracy results compared with MKL by storing fewer support vectors. LMKL can also combine multiple copies of the same kernel function localized in different parts. For example, LMKL with multiple linear kernels gives better accuracy results than using a single linear kernel on bioinformatics data sets.

Real-Time Shop Floor Control Implementations in BUFAIM Model Factory

Ümit Bilge, Ayçin Polat, Yavuz Tunç, and Mehmet Gönen

Conference: Proceedings of the 15th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM 2005), pp. 383–390, 2005