Multi-view cluster analysis with incomplete data to understand treatment effects
Introduction
Granular computing, as defined by Bargiela and Pedrycz in [3], is a computational principle for effectively using granules in data, such as subsets or groups of samples or intervals of parameters, to build an efficient computational model for complex systems with massive quantities of data, information and knowledge. It provides an umbrella to cover any theories, methodologies, techniques, and tools that make use of granules - components or subspaces of a space - in problem solving [42]. It can consist of a structured combination of algorithmic abstraction of data and non-algorithmic, empirical verification of the semantics of these abstractions [3], [43]. Cluster analysis is one such technique: it aims to identify subgroups in a population so that subjects in the same group are more similar to each other than to those in other groups. It has been extensively used in computer vision [29], [45], natural language processing [6], [7], [24] and bioinformatics [21], [26]. In this paper, we propose a method to identify the cluster granules in a patient population in order to analyze treatment study data in which missing values occur. In particular, we take into account the nature of treatment studies, i.e., multiple views of input variables with incomplete data, to model treatment effects.
Multi-view data exist in many real-world applications. For instance, a web page can be represented by the words on the page or by the hyperlinks pointing to it from other pages. Similarly, an image can be represented by the visual features extracted from it or by the text describing it. Multi-view data analytics aims to make full use of the multiple views of data, and has attracted wide interest in recent years, for example in semi-supervised learning with unlabeled data [2], [4], [11] and in unsupervised multi-view data analytics [8], [9], [10], [20], [35]. In this paper, we focus on unsupervised multi-view clustering methods [14], [15], [18], [28], [41], [44], specifically multi-view co-clustering [33], [34], [35]. Consider a dataset in which data matrices have rows representing subjects and columns representing features. The matrices share the same set of subjects, but each matrix has a different set of features. Multi-view co-clustering is a technique to cluster the rows (subjects) consistently across multiple data matrices (sets of features). A family of such methods [33], [34], [35] can find subspaces in each view (rather than using all features in each view) to group subjects consistently across the views. However, existing multi-view co-clustering methods cannot deal with incomplete datasets. Subjects with missing values often need to be removed, or imputation has to be done before clustering. Eliminating data weakens the results by reducing the sample size. On the other hand, imputation may bring a separate layer of uncertainty, especially when some data are missing at random but others are not.
The issue of missing values is common in real-world applications. Data may be missing at random or due to selection bias. For example, in the study of an asthma education intervention [27], some missing values were caused by participants who forgot to visit the school clinic to fill out the form; others were caused by students whose asthma was too serious for them to visit the school clinic to report. The former values are missing at random and the latter are not. The strategies for handling missing values differ according to the reason for the missingness. If the data are missing at random, researchers either use only the samples with complete variables [40] or impute the missing values [12] from the available data; if data are missing systematically, it can be difficult for researchers to recognize and capture the missing patterns.
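The two strategies for data missing at random can be illustrated with a minimal sketch. The toy matrix below is hypothetical, and column-mean imputation is used only as a simple stand-in; it is not the imputation method of [12].

```python
import numpy as np

# Hypothetical data: 4 subjects (rows) by 2 features (columns); NaN marks a missing value.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan],
              [5.0, 6.0]])

# Strategy 1: complete-case analysis keeps only subjects with no missing feature.
complete_rows = ~np.isnan(X).any(axis=1)
X_complete = X[complete_rows]

# Strategy 2: imputation fills each missing value from the available data
# (here: the mean of the observed values in the same column, an assumed choice).
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)

print(X_complete.shape)  # half of the subjects are lost
print(X_imputed)
```

Complete-case analysis halves the sample in this toy example, while imputation keeps every subject but replaces unknown values with estimates whose accuracy cannot be checked.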
In longitudinal studies [16], the missing patterns are complicated and difficult to deal with. A prospective treatment study usually begins with a baseline assessment and follows up through time, and missing values are commonly encountered because study subjects may not be available at all time points. In our heroin dependence treatment study, for example, both random and non-random missing values exist. Because of the mixed missing value patterns, we choose a simple yet effective strategy to handle this problem: we introduce an indicator matrix that records which features are observed for which subjects, and then omit the loss occurring at the missing locations while clustering. Since the missing values are unknown, imputation cannot guarantee recovering the right values. Ignoring the loss at the missing locations should be a better choice.
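The indicator-matrix idea can be sketched as a masked loss. This is a minimal illustration under stated assumptions: a squared-error loss, a hypothetical reconstruction `X_hat`, and the helper name `masked_squared_loss` are all chosen for illustration and are not taken from the method described in Section 3.

```python
import numpy as np

def masked_squared_loss(X, X_hat, M):
    """Sum of squared errors over observed entries only.

    M[i, j] is 1 if feature j was observed for subject i and 0 if it is missing.
    Entries with M[i, j] == 0 contribute nothing to the loss.
    """
    residual = np.where(M.astype(bool), X - X_hat, 0.0)
    return float(np.sum(residual ** 2))

# Hypothetical view: 2 subjects by 3 features; NaN marks a missing value.
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])

# Indicator matrix derived from the data itself.
M = (~np.isnan(X)).astype(int)

# Hypothetical model reconstruction of the same view.
X_hat = np.array([[1.5, 2.0, 3.0],
                  [4.0, 4.0, 6.0]])

loss = masked_squared_loss(X, X_hat, M)
print(M)
print(loss)  # only the four observed entries are compared
```

No value is ever invented for a missing entry; whatever the model predicts at those locations is simply not penalized.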
In multi-view data, if there are many missing values in different views, it is useful but challenging to make the different views compensate for each other's missing information in order to obtain a consistent subject grouping. The most recent multi-view co-clustering methods cannot handle incomplete data that potentially occur in all of the views. Moreover, although imputation methods have been studied for decades, our simulation studies show that even the latest imputation method might not effectively handle mixed missing patterns, and may create another layer of uncertainty in the imputed data. A few recent methods handle incomplete data [22], [30], [31], [37], but they commonly assume either that at least one view is complete for all sample subjects or that each subject has one or more complete views. This is not the case in treatment studies, where every view can have incomplete features.
For each view of the data, all the methods mentioned so far require a subject to have either all of the features in the view or none of them. Two kernel-based methods [31], [37] borrowed the idea of the graph Laplacian to complete the incomplete kernel matrix. The partial multi-view clustering (PVC) method [22] reorganized the data into three parts (in the case of two views): subjects with both views complete, subjects with only view 1 complete, and subjects with only view 2 complete. It then projected them into a latent space and finally ran a standard clustering algorithm in that space. When multiple incomplete views are present, clustering via weighted nonnegative matrix factorization with L21 regularization (the so-called WNMF21) is the method most similar to ours, as it also introduces an indicator matrix. That method used only a single weight matrix to indicate which instance misses which view, whereas we introduce an indicator matrix for each view to indicate the observed entries in that view. Among all the multi-view clustering methods for incomplete data, only ours is not restricted to a specific missing data pattern. In comparison with the common strategy of removing subjects with missing values, our approach can use all observed data in a cluster analysis. In comparison with common methods that impute missing values and then use regular multi-view analytics, our approach is less sensitive to imputation uncertainty. In comparison with other state-of-the-art multi-view incomplete clustering methods, our approach is applicable to any pattern of missing data. We first validate the proposed algorithm in a simulation study, and then use it in a longitudinal treatment study to better understand the differential responses of heroin users to the medication naltrexone.
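The difference in granularity between an instance-level indicator (one entry per subject and view) and entry-level indicators (one matrix per view) can be sketched as follows. The two toy views are hypothetical, and the instance-level matrix only illustrates the "which instance misses which view" bookkeeping; it is not the actual weighting scheme of WNMF21.

```python
import numpy as np

# Two hypothetical views over the same 4 subjects; NaN marks a missing entry.
view1 = np.array([[0.2, 1.1],
                  [np.nan, 0.7],
                  [0.5, 0.9],
                  [np.nan, np.nan]])
view2 = np.array([[3.0, np.nan, 1.0],
                  [2.0, 2.5, 1.5],
                  [np.nan, np.nan, np.nan],
                  [1.0, 0.5, 2.0]])
views = [view1, view2]

# Entry-level: one indicator matrix per view, with the same shape as that view.
entry_masks = [(~np.isnan(V)).astype(int) for V in views]

# Instance-level: a single subjects-by-views matrix; an entry is 1 only when
# the subject has every feature of that view observed.
instance_mask = np.column_stack(
    [np.all(~np.isnan(V), axis=1) for V in views]
).astype(int)

# Number of observed values each representation lets a method use.
n_entry = sum(int(m.sum()) for m in entry_masks)
n_instance = sum(int(V.shape[1] * instance_mask[:, k].sum())
                 for k, V in enumerate(views))

print(instance_mask)
print(n_entry, n_instance)
```

In this toy example no subject has both views complete, so partially observed rows are discarded under the instance-level bookkeeping, whereas the per-view indicator matrices retain every observed value.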
The main contributions of our work include the following two aspects:
1. In terms of methodology, we propose an enhanced multi-view co-clustering algorithm that is capable of dealing with complex patterns of incomplete data, and validate its performance by comparing it against other state-of-the-art methods.
2. In terms of application, we have successfully applied the proposed method to an opioid dependence treatment study and identified meaningful patient subgroups, which would not have been feasible otherwise. By analyzing the study data, we find that features such as changes in craving for heroin in response to cues at baseline could be useful predictors of patient adherence to naltrexone.
The rest of this paper is organized as follows: we describe the longitudinal multi-view data collected in our treatment study in Section 2; an enhanced multi-view co-clustering method is introduced in Section 3 to deal with missing values; Section 4 presents the performance comparison on the synthetic datasets and the statistical analysis results in the case study; we then conclude and discuss in Section 5.
Section snippets
Incomplete data in treatment study
Opioid addiction is a resurgent public health problem in the United States [36]. There are three Food and Drug Administration (FDA) approved medications for the treatment of opioid use disorder in general and heroin addiction in particular. Two of these options are opioid agonists, acting on the principle of opioid substitution, and one, naltrexone, is an opioid antagonist. Naltrexone is an important treatment option because it is pharmacologically analogous to abstinence. However, the
Multi-view co-clustering with incomplete data
Multi-view co-clustering aims to group subjects in the same way across multiple views and to identify the important variables in each view. In other words, multi-view co-clustering groups the subjects into subgroups while simultaneously selecting the variables from the different views that drive the grouping. Since the selected variables from different views identify the same subject groups, the characteristics of each group help show the correlation of the variables
Experiments
We validated the proposed approach in both simulation studies and the analysis of the clinical data collected in our heroin treatment study.
Discussion and conclusion
As data acquisition technologies advance, more and more data collected in real-world applications are from heterogeneous sources, resulting in multi-view datasets. Different views may provide complementary information. Cluster analysis in any single view may miss important cluster characteristics from other views. Simply concatenating all views together cannot guarantee finding clusters recognizable in individual views. To exploit such multiple view information, we have adopted the much-needed
Conflict of interest
None.
Acknowledgment
This work was supported by National Institutes of Health (NIH) grants R01DA037349 and K02DA043063, and National Science Foundation (NSF) grants DBI-1356655, CCF-1514357, and IIS-1718738. Jinbo Bi was also supported by NSF grants IIS-1320586, IIS-1407205, and IIS-1447711. An-Li Wang was also supported by NIH grant R00HD84746.
References (45)
- et al. Enhancing diversity and coverage of document summaries through subspace clustering and clustering-based optimization. Inf. Sci. (2014)
- et al. Consensus and complementarity based maximum entropy discrimination for multi-view classification. Inf. Sci. (2016)
- et al. Semi-supervised multi-view maximum entropy discrimination with expectation Laplacian regularization. Inf. Fusion (2019)
- et al. Auto-weighted multi-view clustering via kernelized graph learning. Pattern Recognit. (2019)
- et al. Self-weighted multi-view clustering with soft capped norm. Knowl. Based Syst. (2018)
- et al. Protein complex identification through Markov clustering with firefly algorithm on dynamic protein–protein interaction networks. Inf. Sci. (2016)
- et al. Semi-supervised concept factorization for document clustering. Inf. Sci. (2016)
- et al. Analysing microarray expression data through effective clustering. Inf. Sci. (2014)
- et al. Multi-type clustering and classification from heterogeneous networks. Inf. Sci. (2018)
- et al. Bi-level multi-source learning for heterogeneous block-wise missing data. Neuroimage (2014)
- Deep low-rank subspace ensemble for multi-view clustering. Inf. Sci.
- Robust low-rank kernel multi-view subspace clustering based on the Schatten p-norm and correntropy. Inf. Sci.
- Multiple correspondence analysis. Encycl. Meas. Stat.
- Co-training and expansion: towards bridging theory and practice.
- The roots of granular computing. 2006 IEEE International Conference on Granular Computing
- Combining labeled and unlabeled data with co-training. Proceedings of the 11th Annual Conference on Computational Learning Theory
- Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program.
- Discriminative k-means Laplacian clustering. Neural Process. Lett.
- Alternative multi-view maximum entropy discrimination. IEEE Trans. Neural Netw. Learn. Syst.
- Multi-kernel maximum entropy discrimination for multi-view learning. Intell. Data Anal.
- Imputation of missing values of tumour stage in population-based cancer registration. BMC Med. Res. Method.
- Unsupervised Learning of Visuomotor Associations.