Spearman Pairwise Rank Correlation

What is Spearman's Pairwise Rank Correlation?

Spearman's rank correlation coefficient, denoted by ρ (rho) or rs, is a nonparametric measure of the strength and direction of the association between two ranked variables. Nonparametric methods do not assume a specific distribution, which makes them suitable for skewed or irregular data that does not follow a normal distribution.

The coefficient measures the degree to which a relationship is monotonic. In a monotonic relationship, as one variable increases, the other tends to move in one consistent direction (up or down), though not necessarily at a constant rate. Spearman's coefficient can be used with two continuous or ordinal variables: the values of each variable are first ranked, that is, put in order, and the coefficient is then computed from those ranks. It is the nonparametric version of the Pearson correlation coefficient.

Rho takes values from -1 to +1, where +1 is a perfect positive correlation between ranks, -1 is a perfect negative correlation between ranks, and 0 is no correlation between ranks. A positive value indicates a positive relationship between the two variables, a negative value indicates a negative relationship, and a value of 0 indicates no monotonic association.
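The link to Pearson's coefficient can be written out as a short formula. This is a sketch of the standard definition; the notation R(X) and R(Y) for the rank-transformed variables is introduced here for illustration and is not used elsewhere in this article.

```latex
r_s = \frac{\operatorname{cov}\bigl(R(X),\, R(Y)\bigr)}{\sigma_{R(X)}\,\sigma_{R(Y)}}
```

Here cov is the covariance of the two sets of ranks, and each σ is the standard deviation of one set of ranks. In other words, Spearman's coefficient is Pearson's coefficient applied to the ranks instead of the raw values.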

Why do we use Spearman Pairwise Rank Correlation?

We use Spearman Pairwise Rank Correlation when working with ranked data or when one or more extreme outliers are present. An example of ranked data is a dataset that contains each student's rank on a math exam along with their rank on a science exam in the same class. Extreme outliers strongly affect Pearson's correlation coefficient, because it is computed from the raw values. Spearman's coefficient uses only the ranks, so a single extreme value has much less influence on it.
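The effect of an outlier can be illustrated with a small sketch. The data and the helper names (`pearson`, `simple_ranks`, `spearman`) are made up for this example, and the ranking helper assumes there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson correlation computed from the raw values."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def simple_ranks(values):
    """Rank 1 for the smallest value, 2 for the next, and so on (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(simple_ranks(x), simple_ranks(y))


# Hypothetical data: the last score is replaced by an extreme outlier.
hours = [1, 2, 3, 4, 5, 6]
score_clean = [2, 4, 6, 8, 10, 12]
score_outlier = [2, 4, 6, 8, 10, 500]

pearson_clean = pearson(hours, score_clean)
pearson_outlier = pearson(hours, score_outlier)
spearman_outlier = spearman(hours, score_outlier)

print("Pearson, no outlier:   ", round(pearson_clean, 3))
print("Pearson, with outlier: ", round(pearson_outlier, 3))
print("Spearman, with outlier:", round(spearman_outlier, 3))
```

With these numbers, the outlier pulls Pearson's coefficient well below 1, while Spearman's coefficient stays at 1 because the order of the values has not changed.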

What is the difference between Spearman Pairwise Rank Correlation and the Pearson Correlation?

| Category | Pearson Correlation | Spearman's Rank Correlation |
|---|---|---|
| Type of Relationship | Measures the linear relationship between variables. It assumes that the relationship can be described with a straight line. | Measures monotonic relationships, where variables move consistently in one direction but not necessarily linearly. |
| Data Type | Works with continuous interval or ratio data. | Suitable for ordinal, ranked, interval, or ratio data. |
| Assumptions | Assumes linearity and a normal distribution of the data. | Does not require normality or linearity, and works well with data that is not normally distributed. |
| Sensitivity to Outliers | Sensitive to outliers, because it is computed from the raw values. | Less sensitive to outliers, because it is computed from ranks rather than raw values. |
| Calculation Method | Based on the covariance and standard deviations of the raw values. | Based on ranking the data points and calculating the differences in ranks. |
| Range of Coefficient | Ranges from -1 to 1. A value of -1 indicates a perfect negative correlation, 1 indicates a perfect positive correlation, and 0 indicates no correlation. | Same range, from -1 to 1, with the same interpretation applied to ranks. |
| Ideal Use Cases | Ideal when the data follows a normal distribution and shows linear trends. | Ideal for non-linear or ranked data, or when dealing with outliers. |
| Example | Analyzing the relationship between the height and weight of individuals. | Assessing the relationship between study hours and exam ranks of students. |

How do we calculate Spearman Pairwise Rank Correlation?

We calculate the Spearman Pairwise Rank Correlation with this equation:

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where ρ = the Spearman correlation coefficient, d_i = the difference between the two ranks given to each item of the data, and n = the total number of observations. A rank is the position or order of a variable's value relative to the other values within the dataset.

Spearman's rank correlation converts the original data into ranks so that the monotonic relationship between two variables can be assessed without relying on the specific numerical values of the data points. The steps are:

1. For each variable, arrange the values in ascending order, from the smallest to the largest.
2. Assign a rank to each value based on its position in the sorted order. The smallest value gets a rank of 1, the second smallest gets a rank of 2, and so on. If values are tied, each of them receives the average of the ranks they would otherwise occupy.
3. For each item of the data, compute the difference d_i between its rank on the first variable and its rank on the second variable.
4. Square each difference to get the d-squared values.
5. Sum the d-squared values and substitute the sum, together with n, into the formula above to get the Spearman coefficient.

The formula is exact when all ranks are distinct. When there are tied ranks, it gives an approximation.
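The calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `average_ranks` and `spearman_rho` and the exam scores are made up for the example, and the scores contain no ties, so the rank-difference formula applies exactly.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho from the formula rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    d_squared = [(a - b) ** 2 for a, b in zip(rank_x, rank_y)]
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))


# Hypothetical exam scores for five students.
math_scores = [90, 72, 85, 60, 78]
science_scores = [88, 70, 91, 65, 68]

math_ranks = average_ranks(math_scores)        # [5.0, 2.0, 4.0, 1.0, 3.0]
science_ranks = average_ranks(science_scores)  # [4.0, 3.0, 5.0, 1.0, 2.0]
rho = spearman_rho(math_scores, science_scores)

print(math_ranks)
print(science_ranks)
print(round(rho, 3))  # 0.8
```

For this data, the rank differences are 1, -1, -1, 0, and 1, so the sum of the squared differences is 4 and ρ = 1 - (6 × 4) / (5 × 24) = 0.8.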
