Lingsong Zhang∗, J. S. Marron, Haipeng Shen and Zhengyuan Zhu
January 21, 2007
Singular Value Decomposition (SVD) is a useful tool in Functional Data Analysis (FDA).
Compared to Principal Component Analysis (PCA), SVD is more fundamental, because SVD simultaneously provides the PCAs in both row and column spaces. We compare SVD and PCA from the FDA viewpoint, and extend the usual SVD to several variations by considering different centerings. A generalized scree plot is proposed to select an appropriate centering in practice.
Several matrix views of the SVD components are introduced to explore different features in data, including SVD surface plots, rotation movies, curve movies and image plots. These methods visualize both column and row information of a two-way matrix simultaneously, relate the matrix to relevant curves, and show local variations and interactions between columns and rows. Several toy examples are designed to compare the different variations of SVD and reveal their differences, and real data examples are used to illustrate the usefulness of the visualization methods.
Key words: Exploratory Data Analysis, Functional Data Analysis, Principal Component Analysis.
Correspondence to Lingsong Zhang, Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, 27599-3260. Email: email@example.com
Functional Data Analysis (FDA) is the study of curves (and more complex objects) as data (Ramsay and Silverman, 1997, 2002). Methods related to Principal Component Analysis (PCA) have provided many insights. Compared to the PCA method, Singular Value Decomposition (SVD) can be thought of as more fundamental, because SVD not only provides a direct approach to calculating the principal components (PCs), but also derives the PCAs in row and column spaces simultaneously. In this paper, we view a set of curves as a two-way data matrix, explore the connections and differences between SVD and PCA from an FDA viewpoint, and propose several visualization methods for the SVD components.
Let $X$ be a data matrix. In the statistical literature, the rows of $X$ are often viewed as observations from an experiment, and the columns of $X$ as covariates. SVD provides a useful factorization of the data matrix $X$, while PCA provides a nearly parallel factorization via eigen-analysis of the sample covariance matrix, which is proportional to $X^T X$ when $X$ is column centered at 0. The eigenvalues of $X^T X$ are then the squares of the singular values of $X$, and the eigenvectors of $X^T X$ are the singular rows of $X$. In this paper, we extend the usual (column centered) PCA method into a general SVD framework, and consider four types of SVDs based on different centerings.
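The relation between SVD and PCA stated above can be checked numerically. The following is a minimal sketch using NumPy on a small random matrix; the data, the matrix size, and all variable names are illustrative assumptions and do not come from the paper.

```python
import numpy as np

# Hypothetical data: 8 observations (rows) and 5 covariates (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))

# Column centering, so that Xc^T Xc is proportional to the sample covariance.
Xc = X - X.mean(axis=0)

# SVD of the centered matrix: Xc = U diag(s) Vt.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigen-analysis of Xc^T Xc; eigh returns eigenvalues in ascending order,
# so reorder them to match the descending singular values.
evals, evecs = np.linalg.eigh(Xc.T @ Xc)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Eigenvalues of Xc^T Xc equal the squared singular values of Xc.
eigs_match = np.allclose(evals, s**2)

# Eigenvectors agree with the rows of Vt up to sign, so the absolute
# inner product of each matched pair should be 1.
vecs_match = np.allclose(np.abs(np.sum(evecs * Vt.T, axis=0)), 1.0)

print(eigs_match, vecs_match)
```

The sign ambiguity is handled through the absolute inner product, because both decompositions determine each direction only up to a factor of -1.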
Several criteria are discussed for model selection, i.e. selecting the appropriate type of SVD, including approximation performance, complexity, and interpretability. We introduce a generalized scree plot, which gives a simple way to understand the tradeoff between model complexity and approximation performance, and provides a visual aid for model selection in terms of these two criteria. See Sections 3 and 5 for details.
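To make the tradeoff between centering choice and approximation performance concrete, the sketch below compares residual sums of squares across centerings. It rests on assumptions not stated in this section: the four centerings are taken to be no centering, column-mean, row-mean, and double centering, and the summary is a simple residual fraction per rank rather than the generalized scree plot defined later in the paper. The function names `center` and `residual_fraction` are hypothetical.

```python
import numpy as np

def center(X, kind):
    """Return X after the requested centering (assumed set of four types)."""
    col = X.mean(axis=0, keepdims=True)   # column means
    row = X.mean(axis=1, keepdims=True)   # row means
    if kind == "none":
        return X.copy()
    if kind == "column":
        return X - col
    if kind == "row":
        return X - row
    if kind == "double":
        return X - col - row + X.mean()
    raise ValueError(f"unknown centering: {kind}")

def residual_fraction(X, kind, max_rank):
    """Residual sum of squares after removing the mean part and a rank-k
    SVD approximation, k = 0..max_rank, as a fraction of sum(X**2)."""
    s = np.linalg.svd(center(X, kind), compute_uv=False)
    # tail[k] = sum of squared singular values with index >= k
    tail = np.append(np.cumsum((s**2)[::-1])[::-1], 0.0)
    return tail[: max_rank + 1] / np.sum(X**2)

# Hypothetical data: 10 curves observed at 6 points, with a nonzero mean level.
rng = np.random.default_rng(1)
X = rng.normal(loc=2.0, size=(10, 6))

fracs = {k: residual_fraction(X, k, max_rank=3)
         for k in ("none", "column", "row", "double")}
for kind, f in fracs.items():
    print(kind, np.round(f, 3))
```

Centering spends parameters on the mean before any SVD component is fitted, so a lower residual at rank 0 comes at the cost of a more complex model. That is the kind of tradeoff the criteria above are meant to weigh.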
Visualization methods can be very helpful in finding underlying features of a data set. In the context of PCA or SVD, common visualization methods include the biplot (Gabriel, 1971), scatter plots between singular columns or singular rows (Section 5.1 in Jolliffe (2002)), etc. The biplot shows the relations between the rows and columns, and the scatter plot can be used to cluster them. However, for FDA data sets, these plots fail to show the functional curves.
In the FDA field, it is also common to plot singular columns or singular rows as curves. Marron et al. (2004) provided a visualization method for functional data (using functional PCA), which shows the functional objects (curves), projections on the PCs and the residuals. When considering a time series of curves,…