Each item has a loading corresponding to each of the 8 components. The sum of an item's squared loadings is known as its communality, and in a full PCA (extracting as many components as items) the communality for each item equals the item's total variance. Summing the squared loadings across the components (columns) gives you the communality estimate for each item, and summing the squared loadings down the items (rows) gives you the eigenvalue for each component.
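This bookkeeping can be sketched in a few lines of NumPy. The correlation matrix below is invented for illustration (it is not the SAQ-8 data); the loadings come straight from its eigendecomposition:

```python
import numpy as np

# Hypothetical 3-item correlation matrix -- not the actual SAQ-8 values.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # order components largest-first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)      # component loadings

# Across the components (columns) for each item: the communality.
communalities = (loadings ** 2).sum(axis=1)
# Down the items (rows) for each component: the eigenvalue.
ss_loadings = (loadings ** 2).sum(axis=0)

print(communalities)                  # each is 1.0: full PCA = total variance
print(np.allclose(ss_loadings, eigvals))   # True
```

With all components extracted, every communality comes out to exactly 1 (the total variance of a standardized item), and the column sums of squared loadings recover the eigenvalues.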
For example, to obtain the first eigenvalue we sum the squared loadings in the first component's column. Recall that the eigenvalue represents the total amount of variance that can be explained by a given principal component. Starting from the first component, each subsequent component is obtained by partialling out the previous components. Therefore the first component explains the most variance and the last component explains the least. Looking at the Total Variance Explained table, you can see the total variance explained by each component.
Because we extracted the same number of components as the number of items, the Initial Eigenvalues column is the same as the Extraction Sums of Squared Loadings column. Since the goal of running a PCA is to reduce our set of variables, it would be useful to have a criterion for selecting the optimal number of components, which is of course smaller than the total number of items.
One criterion is to choose components that have eigenvalues greater than 1. In the Total Variance Explained table, we see the first two components have an eigenvalue greater than 1. This can be confirmed by the Scree Plot, which plots the eigenvalue (total variance explained) against the component number.
Recall that we checked the Scree Plot option under Extraction — Display, so the scree plot should be produced automatically. The first component will always have the highest total variance and the last component will always have the least, but where do we see the largest drop? Using the scree plot we pick two components.
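Both the eigenvalues-greater-than-1 criterion and the scree drops can be computed directly. A minimal sketch, again using an invented 3-item correlation matrix rather than the SAQ-8 (here only one component passes the criterion):

```python
import numpy as np

# Hypothetical correlation matrix; with standardized items the total variance
# equals the number of items, so an eigenvalue above 1 explains more than a
# single item's worth of variance.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

n_components = int((eigvals > 1).sum())    # eigenvalues-greater-than-1 rule
drops = -np.diff(eigvals)                  # scree: look for the largest drop

print(n_components)                        # 1
```

The scree plot is just `eigvals` plotted against component number; `drops` quantifies where the "elbow" sits.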
Picking the number of components is a bit of an art and requires input from the whole research team. Running the two-component PCA is just as easy as running the 8-component solution. The only difference is that under Fixed number of factors — Factors to extract you enter 2. We will focus on the differences in the output between the eight- and two-component solutions. Again, we interpret Item 1 as having a correlation of 0. with Component 1. Glancing at the solution, we see that Item 4 has the highest correlation with Component 1 and Item 2 the lowest.
Similarly, we see that Item 2 has the highest correlation with Component 2 and Item 7 the lowest. The communality is the sum of the squared component loadings, up to the number of components you extract. In the SPSS output you will see a table of communalities. Since PCA is an iterative estimation process, it starts with 1 as the initial estimate of each communality (since this is the total variance across all 8 components), and then proceeds with the analysis until a final communality is extracted.
Notice that the Extraction column is smaller than the Initial column because we only extracted two components. Recall that squaring the loadings and summing across the components (columns) gives us the communality.
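A sketch of why the Extraction column shrinks, using the same invented correlation matrix as before: keeping only the first k components, each item's communality loses the variance carried by the dropped components.

```python
import numpy as np

# Same hypothetical 3-item correlation matrix -- not the SAQ-8 values.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
loadings = vecs * np.sqrt(vals)

k = 2                                             # components actually kept
extraction = (loadings[:, :k] ** 2).sum(axis=1)   # Extraction communalities
initial = (loadings ** 2).sum(axis=1)             # Initial communalities (all 1)

# Total extraction communality equals the sum of the retained eigenvalues.
print(np.isclose(extraction.sum(), vals[:k].sum()))   # True
```

Every entry of `extraction` is at most the corresponding entry of `initial`, mirroring the Extraction versus Initial columns in SPSS.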
Is that surprising? In an 8-component PCA, how many components must you extract so that the communality in the Initial column equals the Extraction column? Answers: 1. F, the eigenvalue is the total communality across all items for a single component. 2. F, you can only sum communalities across items and eigenvalues across components, but if you do, the two totals are equal. The partitioning of variance is what differentiates a principal components analysis from what we call common factor analysis.
Both methods try to reduce the dimensionality of the dataset down to fewer unobserved variables, but whereas PCA assumes that common variance takes up all of the total variance, common factor analysis assumes that total variance can be partitioned into common and unique variance.
It is usually more reasonable to assume that you have not measured your set of items perfectly. The unobserved or latent variable that makes up common variance is called a factor , hence the name factor analysis. The other main difference between PCA and factor analysis lies in the goal of your analysis. If your goal is to simply reduce your variable list down into a linear combination of smaller components then PCA is the way to go. However, if you believe there is some latent construct that defines the interrelationship among items, then factor analysis may be more appropriate.
In this case, we assume that there is a construct called SPSS Anxiety that explains why you see a correlation among all the items on the SAQ-8. We acknowledge, however, that SPSS Anxiety cannot explain all the shared variance among items in the SAQ, so we model the unique variance as well. Based on the results of the PCA, we will start with a two-factor extraction. Note that we raise Maximum Iterations for Convergence above the default, and we will see why later. The most striking difference between this communalities table and the one from the PCA is that the initial estimates are no longer 1.
Recall that for a PCA, we assume the total variance is completely taken up by the common variance or communality, and therefore we pick 1 as our best initial guess. To see this in action for Item 1, run a linear regression where Item 1 is the dependent variable and Items 2–8 are the independent variables. Go to Analyze — Regression — Linear and enter q01 under Dependent and q02 to q08 under Independent(s).
Note that the R-square of 0. from this regression matches the initial communality estimate for Item 1. We could run a regression like this for each of the other items to get all eight communality estimates, but SPSS already does that for us.
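The regression route and the matrix route give the same number. A sketch with an invented correlation matrix: the initial communality (the squared multiple correlation) can be read off the inverse of R, or obtained by regressing an item on all the others.

```python
import numpy as np

# Hypothetical correlation matrix among the items -- not the SAQ-8 values.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

Rinv = np.linalg.inv(R)
smc = 1 - 1 / np.diag(Rinv)        # squared multiple correlation per item

# Cross-check for item 1: R-square from regressing it on the other items.
r = R[0, 1:]                       # correlations of item 1 with the rest
r2 = r @ np.linalg.solve(R[1:, 1:], r)

print(np.isclose(r2, smc[0]))      # True: same number by either route
```

Each entry of `smc` plays the role of the Initial column in the factor-analysis communalities table: strictly less than 1, because the other items never predict an item perfectly.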
Like PCA, factor analysis also uses an iterative estimation process to obtain the final estimates under the Extraction column. Finally, summing all the rows of the Extraction column, we get 3.
This represents the total common variance shared among all items for a two-factor solution. The next table we will look at is Total Variance Explained. In fact, SPSS simply borrows the information from the PCA analysis for use in the factor analysis, and the "factors" in the Initial Eigenvalues column are actually components. The main difference now is in the Extraction Sums of Squared Loadings column.
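SPSS's exact algorithm is not reproduced here, but the flavor of the iteration can be sketched with principal axis factoring: replace the diagonal of R with the current communality estimates, re-extract the factors, and repeat until the estimates stop changing. The loadings below are invented so the true communalities are known in advance.

```python
import numpy as np

# Invented one-factor loadings, so the true communalities are lam**2.
lam = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

h2 = 1 - 1 / np.diag(np.linalg.inv(R))   # start from the SMCs (Initial column)
k = 1                                    # number of factors to extract

for _ in range(500):
    Rr = R.copy()
    np.fill_diagonal(Rr, h2)             # reduced correlation matrix
    vals, vecs = np.linalg.eigh(Rr)
    top = np.argsort(vals)[::-1][:k]
    L = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
    h2_new = (L ** 2).sum(axis=1)        # updated communality estimates
    if np.max(np.abs(h2_new - h2)) < 1e-10:
        break                            # converged: this is the Extraction column
    h2 = h2_new

print(np.round(h2, 3))                   # recovers lam**2
```

Because the matrix was generated from a single factor, the iteration converges to the true communalities; with real data it converges to estimates, and may need many iterations (hence raising the SPSS iteration limit).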
We notice that each corresponding row in the Extraction column is lower than in the Initial column. This is expected, because we assume that total variance can be partitioned into common and unique variance, which means the common variance explained will be lower. Factor 1 explains the most common variance, and just as in PCA, the more factors you extract, the less variance is explained by each successive factor. A subtle note that may be easily overlooked is that when SPSS plots the scree plot or applies the eigenvalues-greater-than-1 criterion (Analyze — Dimension Reduction — Factor — Extraction), it bases them on the Initial and not the Extraction solution.
This is important because the criterion here assumes no unique variance as in PCA, which means that this is the total variance explained not accounting for specific or measurement error.
Note that in the Extraction Sums of Squared Loadings column the second factor has an eigenvalue that is less than 1, but it is still retained because its Initial eigenvalue is above 1. If you want to use this criterion for the common variance explained, you would need to apply the criterion yourself. Answers: 1. When there is no unique variance (PCA assumes this, whereas common factor analysis does not), so this holds in theory but not in practice. 2.
F, it uses the initial PCA solution, and those eigenvalues assume no unique variance. First, note the annotation that 79 iterations were required. If we had simply used the default of 25 iterations in SPSS, we would not have obtained an optimal solution. The elements of the Factor Matrix table are called loadings and represent the correlation of each item with the corresponding factor.
Just as in PCA, squaring each loading and summing down the items (rows) gives the total variance explained by each factor. Note that these sums are no longer called eigenvalues as in PCA.
This number matches the first row under the Extraction column of the Total Variance Explained table. We can repeat this for Factor 2 and get matching results for the second row. Additionally, we can get the communality estimates by summing the squared loadings across the factors (columns) for each item. For example, the result for Item 1 matches the value in the Communalities table for Item 1 under the Extraction column. This means that the sum of squared loadings across factors represents the communality estimate for each item.
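The two bookkeeping rules for the Factor Matrix, with an invented two-factor loading matrix (not the SAQ-8 values):

```python
import numpy as np

# Hypothetical 4-item, 2-factor loading matrix standing in for the Factor Matrix.
F = np.array([[0.7,  0.1],
              [0.6,  0.3],
              [0.5, -0.4],
              [0.4,  0.5]])

# Down the items (rows) of each factor column: variance explained per factor,
# matching the Extraction Sums of Squared Loadings.
var_explained = (F ** 2).sum(axis=0)

# Across the factors (columns) for each item: the communality estimate,
# matching the Extraction column of the Communalities table.
communality = (F ** 2).sum(axis=1)

print(np.isclose(var_explained.sum(), communality.sum()))   # True: same total
```

Summing either way over the whole matrix gives the same grand total, which is why the total common variance can be read off either table.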
The figure below shows the path diagram of the orthogonal two-factor EFA solution shown above (note that only selected loadings are shown).
The loadings represent zero-order correlations of a particular factor with each item. Looking at absolute loadings greater than 0., we can see which items load on which factor. From speaking with the Principal Investigator, we hypothesize that the second factor corresponds to general anxiety with technology rather than anxiety particular to SPSS. Likewise, people with low social anxiety will give similarly low responses to these variables because of their low social anxiety.
The measurement model for a simple one-factor model looks like the diagram below. The arrows go in the opposite direction from PCA. Just like in PCA, the relationships between F and each Y are weighted, and the factor analysis is figuring out the optimal weights. In this model we also have a set of error terms: the variance in each Y that is unexplained by the factor. As you can probably guess, this fundamental difference has many, many implications.
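The diagram's claim that the factor causes the items can be checked by simulation. With invented loadings, data generated as Y = λF + ε reproduces the model-implied correlation λᵢλⱼ between any two items:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.8, 0.7, 0.6])              # hypothetical loadings
n = 200_000

F = rng.standard_normal(n)                   # the common factor
E = rng.standard_normal((n, 3)) * np.sqrt(1 - lam**2)   # unique variance
Y = F[:, None] * lam + E                     # arrows run factor -> items

# The one-factor model implies corr(Y_i, Y_j) = lam_i * lam_j for i != j.
r12 = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]
print(abs(r12 - lam[0] * lam[1]) < 0.01)     # True, up to sampling error
```

This is the measurement-model logic in miniature: the items correlate with each other only because they share the factor, and the error terms carry the rest of each item's variance.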
Not a good explanation. Why are the directions of PCA and FA required to be opposite? The mathematical difference between PCA and FA is not mentioned.
I cannot connect the explanation to the mathematical concepts I already have. We try here to help people understand the concepts and meanings without getting too much into the math. If you prefer to see the math (some people do), there are many options out there.
Very nice explanations. The fundamental concepts are explained in very simple language with informative graphs. Thank you so much. That would make this a much more useful document. Nowhere in the above description of PCA does it describe how the individual variables tie together to create the component.
How does W1 relate to W2? Why do those two particular variables group together?

Suggested readings:

Data-driven subtypes of major depressive disorder: a systematic review. BMC Med.

This paper reviewed 47 studies using PCA and compares methods, challenges, and mistakes when using PCA for composite health measures; it suggests repeating the analysis across samples and using complementary methods such as factor analysis. Methodological issues in determining the dimensionality of composite health measures using principal component analysis: case illustration and suggestions for practice. Quality of Life Research.

This paper outlines common mistakes and errors with EFA from a review of 60 studies in psychology journals, and provides useful suggestions for improved practices related to the use of EFA and reporting in journals. Educational and Psychological Measurement.

This paper reviews the use of EFA and key decisions when conducting EFA, reviewing 28 papers from high-impact nursing journals. Gaskin CJ, Happell B. On exploratory factor analysis: A review of recent evidence, an assessment of current practice, and recommendations for future use. International Journal of Nursing Studies.

Nutritional Epidemiology: comparison of reduced rank regression, partial least-squares regression and PCA.
Snook, S. Component analysis versus common factor analysis: A Monte Carlo study. Psychological Bulletin.

Spearman, C. The American Journal of Psychology, 15.

Widaman, K. Common factors versus components: Principals and principles, errors and misconceptions. Factor analysis at: Historical developments and future directions.

Laura, I really appreciate that you took the time to explain the difference between these techniques.
This was very well written and informative and I loved your use of graphs and diagrams for illustrating your points. Thank you! Could you let me know a source, if this is from somewhere else, to use it in my presentation? I'm glad you found the last diagram helpful. I created that image for this post and am not aware of other sources that have something similar.
Super helpful. In fact, so helpful that I'll be incorporating these ideas into an upcoming grant on coral work! The key difference is the conceptualization of the latent variable. Because LCA estimates categorical latent variables, it's used for classification. I made the image below using the measurement-level icons in JMP to drive the point home. Hope this helps! I appreciate the answer. To clarify further, in EFA one doesn't know a priori the "configural" structure of the data, that is, which observed variables are caused by which latent variables.
In contrast, this is known in CFA and the analyst specifies the model according to that knowledge.