Pearson Correlation Calculator (r)

Measure the linear relationship between two paired numeric variables.

Paste your X and Y data below. Values can be separated by spaces, commas, tabs, or new lines. Pairs match by index.
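
The input rules above could be handled along the following lines. This is a minimal sketch, not the calculator's actual code: the names parse_values and pair_up are hypothetical, and rejecting unequal-length input is an assumption about how mismatched data should be treated.

```python
import re


def parse_values(text):
    """Split on any run of whitespace and/or commas, then convert to floats."""
    tokens = [t for t in re.split(r"[\s,]+", text.strip()) if t]
    return [float(t) for t in tokens]


def pair_up(x_text, y_text):
    """Pair X and Y by index; unequal lengths raise an error (assumption)."""
    xs, ys = parse_values(x_text), parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    return list(zip(xs, ys))


# Mixed separators: comma, tab, newline and space in the same input.
pairs = pair_up("2, 4\t6\n8 10", "58 65 72 78 87")
print(pairs)
```

Non-numeric tokens make float() raise a ValueError here; a real tool might instead skip them or report which token failed.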

The calculator reports: Pearson r, r squared, sample size (n), mean X, mean Y, and strength.
How is this calculated?

Formula: r = sum((xi - mean_x) * (yi - mean_y)) / sqrt(sum((xi - mean_x)^2) * sum((yi - mean_y)^2)). The result ranges from -1 (perfect negative) to +1 (perfect positive). r squared is the share of Y variance explained by X under a linear fit. Source: Pearson (1895); standard statistics textbooks.

About

The Pearson correlation coefficient r is a standardised measure of the linear relationship between two paired numeric variables. It ranges from -1 (perfect inverse) to +1 (perfect direct), with 0 indicating no linear relationship. The tool computes r, r squared, and basic descriptive statistics from any two equal-length arrays.

How it works

r is the covariance of two variables divided by the product of their standard deviations. Geometrically, it is the cosine of the angle between the mean-centred X and Y vectors. Algebraically:

r = cov(X, Y) / (sigma_X * sigma_Y)

  = sum_i [(x_i - mean(X)) * (y_i - mean(Y))]
    ---------------------------------------------------------------
    sqrt( [sum_i (x_i - mean(X))^2] * [sum_i (y_i - mean(Y))^2] )

r squared = proportion of the variance in Y explained by a linear fit on X

Both X and Y must be on interval or ratio scales (numeric, with equal-sized units). The formula is symmetric: r(X,Y) = r(Y,X). It assumes the relationship is linear; a perfect quadratic curve can produce r ~ 0.
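
The properties described above can be checked numerically. The sketch below is illustrative only: pearson_r and cosine_of_centred are hypothetical helper names, and the small data sets are made up for the demonstration.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the deviation-sum formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def cosine_of_centred(xs, ys):
    """Cosine of the angle between the mean-centred X and Y vectors."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    dot = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    return dot / (norm_x * norm_y)


x = [-2, -1, 0, 1, 2]
y_line = [3 * v + 1 for v in x]   # exact straight line
y_quad = [v * v for v in x]       # perfect curve, but not linear
y_mixed = [1, 3, 2, 5, 4]         # arbitrary illustrative values

print(pearson_r(x, y_line))       # straight line: r is 1 up to rounding
print(pearson_r(x, y_quad))       # symmetric parabola: r is 0
print(pearson_r(x, y_mixed), pearson_r(y_mixed, x))   # symmetry
print(cosine_of_centred(x, y_mixed))                  # same value as r
```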

Worked example

A teacher records hours of weekly study and final exam scores for 5 students: study X = [2, 4, 6, 8, 10]; scores Y = [58, 65, 72, 78, 87].

  1. Means: mean(X) = 6, mean(Y) = 72.
  2. Deviations: X - mean = [-4, -2, 0, 2, 4]; Y - mean = [-14, -7, 0, 6, 15].
  3. Cross-product sum: 56 + 14 + 0 + 12 + 60 = 142.
  4. Sum of squares X: 16 + 4 + 0 + 4 + 16 = 40. Sum of squares Y: 196 + 49 + 0 + 36 + 225 = 506.
  5. Apply formula: r = 142 / sqrt(40 x 506) = 142 / sqrt(20,240) = 142 / 142.27 = 0.9981.
  6. r squared: 0.9962, so 99.6 percent of score variance is linearly explained by study hours.
Result: r = 0.998, a near-perfect positive linear relationship. With n = 5, the test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) is about 28.2 on 3 degrees of freedom, so r is statistically significant at p < 0.001 (two-tailed). But correlation alone does not prove that studying causes higher scores, only that the two move together.
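
The six steps can be reproduced in a few lines of Python. This is a sketch for checking the arithmetic; the variable names are illustrative, and the t-statistic line uses the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2), which is an addition to the steps listed above.

```python
import math

hours = [2, 4, 6, 8, 10]
scores = [58, 65, 72, 78, 87]
n = len(hours)

# Step 1: means
mean_x = sum(hours) / n      # 6.0
mean_y = sum(scores) / n     # 72.0

# Step 2: deviations from the mean
dev_x = [x - mean_x for x in hours]
dev_y = [y - mean_y for y in scores]

# Steps 3 and 4: cross-product sum and sums of squares
cross = sum(a * b for a, b in zip(dev_x, dev_y))   # 142.0
ss_x = sum(a * a for a in dev_x)                   # 40.0
ss_y = sum(b * b for b in dev_y)                   # 506.0

# Steps 5 and 6: r and r squared
r = cross / math.sqrt(ss_x * ss_y)
r_squared = r * r

# Test statistic on n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

print(round(r, 4), round(r_squared, 4), round(t_stat, 1))
```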

Reference table

Common interpretation thresholds for the absolute value |r|. The small, medium, and large bands follow Cohen (1988); the remaining labels are common extensions of that scale.

|r|          Verbal label         r squared (variance explained)   Field example
0.00-0.09    Negligible           < 1 percent                      Coin flips vs weather
0.10-0.29    Small / weak         1 to 8 percent                   Personality trait predicting outcomes
0.30-0.49    Medium / moderate    9 to 24 percent                  Education and income
0.50-0.69    Large / strong       25 to 48 percent                 Height of parents vs children
0.70-0.89    Very strong          49 to 79 percent                 SAT verbal vs SAT math
0.90-0.99    Near perfect         81 to 98 percent                 Twin IQs (monozygotic)
1.00         Perfect linear       100 percent                      F = C x 9/5 + 32
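
A lookup for these labels might be sketched as follows. The function name strength_label is hypothetical, and treating each band's lower bound as the cutoff (so that 0.095 counts as negligible) is an assumption, since the table leaves small gaps between bands.

```python
def strength_label(r):
    """Return the verbal label for r based on |r| (cutoffs are lower bounds)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    a = abs(r)
    if a >= 1.0:
        return "Perfect linear"
    bands = [
        (0.90, "Near perfect"),
        (0.70, "Very strong"),
        (0.50, "Large / strong"),
        (0.30, "Medium / moderate"),
        (0.10, "Small / weak"),
    ]
    for lower_bound, label in bands:
        if a >= lower_bound:
            return label
    return "Negligible"


for value in (0.998, -0.35, 0.05, -1.0):
    print(value, strength_label(value))
```

The sign of r is ignored on purpose: the label describes strength only, and direction is read from the sign separately.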

Common pitfalls

  • Confusing correlation with causation. Two variables can correlate strongly because both depend on a third (the "lurking variable"). Ice-cream sales and drownings correlate via summer temperature.
  • Missing non-linear patterns. r near 0 only rules out a linear pattern. A perfect U-shape (y = x^2 with x values symmetric about zero) gives r = 0.
  • Outliers. One extreme point can drag r from 0 to 0.9 or vice versa. Plot the scatter first; consider Spearman or robust alternatives if outliers are real.
  • Aggregating to ecological correlations. Group-level r is often much larger than individual-level r (the ecological fallacy); in the extreme case, Simpson's paradox, the sign of the relationship reverses between levels. Always check the unit of analysis.
  • Statistical significance vs effect size. With n = 10,000, r = 0.02 is "statistically significant" (p < 0.05) but explains only 0.04 percent of variance. Report both r and the confidence interval.
  • Truncated range. Selecting only top scorers compresses Y's variance and shrinks r toward 0 (range restriction).
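
Two of these pitfalls, outliers and range restriction, are easy to demonstrate with small hand-made data sets. The sketch below uses made-up numbers chosen so the effect is visible; pearson_r is a hypothetical helper, not the calculator's code.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the deviation-sum formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Outlier: five points with no linear trend, then one extreme point added.
x_base = [1, 2, 3, 4, 5]
y_base = [3, 1, 2, 1, 3]
r_base = pearson_r(x_base, y_base)                      # 0
r_outlier = pearson_r(x_base + [20], y_base + [20])     # roughly 0.97

# Range restriction: keep only the four largest X values.
x_full = list(range(1, 11))
y_full = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_full = pearson_r(x_full, y_full)                      # about 0.94
r_top = pearson_r(x_full[-4:], y_full[-4:])             # 0.6

print(round(r_base, 3), round(r_outlier, 3))
print(round(r_full, 3), round(r_top, 3))
```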

Frequently asked questions

What does the Pearson r value mean?

Pearson r ranges from -1 to +1 and measures the strength and direction of a linear relationship between two variables. r = +1 is a perfect positive linear fit, 0 is no linear relationship, and -1 is a perfect negative fit. Cohen's 1988 conventions classify |r| as small (0.10), medium (0.30), and large (0.50), but the practical interpretation depends on the field.

What is the difference between r and r squared?

r is the correlation coefficient. r squared (the coefficient of determination) is the share of variance in Y explained by a linear fit on X. r = 0.7 implies r squared = 0.49, so 49 percent of Y variance is linearly explained by X, leaving 51 percent unexplained or noise.
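
The link between r squared and "variance explained" can be verified by fitting a least-squares line and comparing 1 - SS_residual / SS_total with r squared. The following is a sketch with made-up data; the variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)   # total sum of squares of Y

r = sxy / math.sqrt(sxx * syy)

# Ordinary least-squares line y = intercept + slope * x
slope = sxy / sxx
intercept = my - slope * mx

# Residual sum of squares around the fitted line
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
explained = 1 - ss_res / syy

print(round(r ** 2, 4), round(explained, 4))   # the two values agree
```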

Does a high r mean X causes Y?

No. Correlation is symmetric (r(X,Y) = r(Y,X)) and reflects only co-variation. Causal claims require a research design that rules out confounders, reverse causation, and selection (e.g. randomized experiment, instrumental variable, or natural experiment). Ice-cream sales and drowning rates correlate strongly through summer temperature.

When should I use Spearman or Kendall instead of Pearson?

Use Spearman's rho or Kendall's tau when the relationship is monotonic but not linear, when data are ordinal, or when outliers dominate Pearson r. The usual significance tests and confidence intervals for Pearson r assume bivariate normality and constant variance; both are often violated by heavy-tailed financial data, where rank-based correlations give more stable estimates.
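
Spearman's rho is Pearson r computed on ranks, which is why it reaches 1 for any strictly increasing relationship. The sketch below illustrates this with a cubic curve; average_ranks, pearson_r and spearman_rho are hypothetical helper names, and ties are given the mean of their rank positions.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the deviation-sum formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson r applied to the ranks of each variable."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but clearly non-linear

print(round(pearson_r(x, y), 3))      # below 1: the curve is not a line
print(round(spearman_rho(x, y), 3))   # 1.0: the ordering is preserved
```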

Sources

  • Pearson K. (1895), Note on regression and inheritance in the case of two parents, Proceedings of the Royal Society of London.
  • Cohen J. (1988), Statistical Power Analysis for the Behavioral Sciences, 2nd ed., Lawrence Erlbaum (effect-size thresholds).
  • Wasserman L. (2004), All of Statistics, Springer (chapter 14: linear regression and correlation).
  • NIST/SEMATECH e-Handbook of Statistical Methods, section 7.2.6 (correlation interpretation).

Last updated 2026-05-28.