About
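Pearson's r measures linear correlation: -1 = perfect negative, 0 = no linear relationship, +1 = perfect positive. A value near 0 does not rule out a non-linear relationship. r² is the proportion of variance explained: r = 0.7 means 49% of the variance in Y is explained by a linear fit on X. Correlation ≠ causation.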
Formula
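The standard textbook definition is r = Σ(xi - x̄)(yi - ȳ) / √(Σ(xi - x̄)² · Σ(yi - ȳ)²), where x̄ and ȳ are the means of X and Y. The page does not show its own source code, so the following is only a minimal Python sketch of that formula; the function name `pearson_r` and the sample data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (hypothetical, not from the page)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(f"r = {r:.4f}, r² = {r * r:.2f}")  # r = 0.7746, r² = 0.60
```

Dividing by n or n - 1 in the covariance and standard deviations makes no difference here, because the factor cancels between numerator and denominator.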
Frequently asked questions
How accurate is the Correlation Coefficient?
It applies the standard formula. Accuracy is limited only by your input precision. For decisions with material consequences (taxes, medical, legal, structural), use the result as a starting point and verify with a qualified professional in the relevant field.
Is the Correlation Coefficient free to use?
Yes. 100% free, no signup, no payment, no API key. The site is funded by display ads around the tool but not inside the calculation flow.
Are my inputs saved anywhere?
No. All inputs stay in your browser tab. Closing the tab discards them. The site uses Google Analytics for traffic measurement (anonymized) but the analytics never see what you type into the form.
Can I use the Correlation Coefficient on my phone?
Yes. The tool is responsive and tested on iOS Safari, Android Chrome, and major desktop browsers. Touch targets meet Apple's 44pt and Google's 48dp minimum.
Does the Correlation Coefficient work offline?
Yes. Once the page has loaded, it works without internet. The calculation runs in JavaScript on your device.
How do I report a bug or suggest an improvement to the Correlation Coefficient?
Email hi@3tej.com with the URL of this page and a description of what you saw vs expected. We typically respond within 72 hours.
Can I share results from the Correlation Coefficient?
Take a screenshot or copy the output. The page doesn't generate shareable URLs for specific calculations - inputs stay in your browser only.
Why are the results different from another correlation coefficient tool?
Most likely: a different coefficient (e.g., Spearman's rank correlation instead of Pearson's r), different handling of missing or invalid values, or different rounding rules. Check the methodology if both tools document it. Both can be valid for different scenarios.
Is correlation the same as causation?
No. Correlation = two things move together. Causation = one causes the other. Confounders, reverse causation, and coincidence can all create correlation without causation. To establish causation: randomized controlled trials, natural experiments, or instrumental variables.
What's a good correlation coefficient?
Field-dependent. In physics, r = 0.95 is normal; in social science, r = 0.4 can count as strong. r² tells you the % of variance explained: r = 0.7 means ~49% of variance is shared.
Should I report mean or median?
If data is symmetric (no skew, no outliers): mean. If skewed (income, house prices, response times): median. Always report both for unfamiliar audiences plus a measure of spread (σ for symmetric, IQR for skewed).
What sample size do I need?
Depends on effect size, variability, and desired power. For detecting a medium effect (Cohen's d = 0.5) with 80% power at a two-sided α = 0.05: ~64 per group for an independent two-sample t-test. Use a power calculator before running the study.
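The ~64 per group figure can be roughly reproduced with the common normal-approximation formula n ≈ 2 · ((z_α/2 + z_β) / d)². The sketch below assumes a two-sided test and equal group sizes; the function name `n_per_group` is illustrative. The approximation gives 63; exact t-based power calculations add about one more per group, which is where 64 comes from.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # 63 (normal approximation; exact t-based answer is 64)
```

Because n scales with 1/d², halving the effect size roughly quadruples the required sample.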
Why do I get different results from t-test vs Mann-Whitney?
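Student's t-test assumes approximately normal data and equal variances (Welch's variant drops the equal-variance assumption). Mann-Whitney (Wilcoxon rank-sum) is non-parametric - it ranks values and doesn't assume a normal distribution. The two tests also ask different questions (difference in means vs. shift in ranks), so their p-values can differ. On normal data, Mann-Whitney is slightly less powerful; on skewed or heavy-tailed data, it is more robust and often more powerful.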
Descriptive statistics quick reference
| Measure | Formula | Best when |
|---|---|---|
| Mean (arithmetic average) | sum / n | Symmetric data, no outliers |
| Median | Middle value when sorted | Skewed data (income, house prices) |
| Mode | Most frequent value | Categorical data |
| Range | max - min | Quick spread; sensitive to outliers |
| Variance | Σ(xi - mean)² / n (population); divide by n - 1 for a sample | Spread; in squared units |
| Standard deviation (σ) | √Variance | Spread in original units |
| IQR (interquartile range) | Q3 - Q1 | Robust spread; resistant to outliers |
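As a rough illustration, every measure in the table can be computed with Python's standard `statistics` module. The data values are made up for the example. Quartile conventions differ between tools, so the IQR shown here (Python's default "exclusive" method) may not match another calculator exactly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the page

mean = statistics.mean(data)                  # 5
median = statistics.median(data)              # 4.5
mode = statistics.mode(data)                  # 4
value_range = max(data) - min(data)           # 7
variance = statistics.pvariance(data)         # population variance (divides by n): 4
std_dev = statistics.pstdev(data)             # population standard deviation: 2.0
sample_variance = statistics.variance(data)   # sample variance (divides by n - 1)

# Quartiles using the default "exclusive" method
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                                 # 2.5

print(mean, median, mode, value_range, variance, std_dev, iqr)
```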
The normal distribution
The bell curve. Many natural measurements (heights, IQ, exam scores) are approximately normal. Key properties:
- ~68% of values fall within 1 σ of mean
- ~95% within 2 σ
- ~99.7% within 3 σ
A z-score expresses how many σ a value is from the mean: z = (x - mean) / σ. z = 1.96 corresponds to the 97.5th percentile, which is the upper bound of a two-sided 95% interval (2.5% in each tail).
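The 68/95/99.7 figures and the 1.96 cutoff can be checked with `statistics.NormalDist` from the Python standard library. This is a small sketch; the helper names and the example scale (mean 100, σ 15) are assumptions chosen for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def z_score(x, mean, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sigma

for k in (1, 2, 3):
    print(f"within {k} σ: {share_within(k):.4f}")  # 0.6827, 0.9545, 0.9973

print(f"{std_normal.inv_cdf(0.975):.2f}")  # 1.96
print(z_score(130, mean=100, sigma=15))    # 2.0
```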
Hypothesis testing decoder
| Test | Use when | Tells you |
|---|---|---|
| t-test (one sample) | Compare one mean to a known value | Is sample mean different from this number? |
| t-test (independent) | Compare means of two groups | Are these two groups different? |
| t-test (paired) | Compare same subjects before/after | Did treatment change the outcome? |
| ANOVA | Compare means of 3+ groups | Is at least one group different? |
| Chi-square | Categorical data (e.g., 2x2 tables) | Is there association between categories? |
| Pearson correlation | Linear relationship between 2 continuous variables | Strength and direction (-1 to +1) |
| Linear regression | Predict one variable from another | Slope, intercept, R^2 fit |
P-value interpretation
The p-value is the probability of seeing data this extreme (or more) IF the null hypothesis were true. Common misuses:
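- p < 0.05 does NOT mean 'effect is real'. When the null hypothesis is true, ~5% of tests still come out significant by chance, so running many tests makes some false positives likely.
- p > 0.05 does NOT mean 'no effect'. The study could be underpowered.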
- Effect size matters more than p. A statistically significant 0.1% difference is rarely practically important.
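To make the multiple-testing point concrete: if every null hypothesis is true and the tests are independent (both simplifying assumptions), the chance of at least one false positive across m tests is 1 - (1 - α)^m. A minimal sketch, with an illustrative function name:

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 5, 20):
    print(f"{m} tests: {familywise_error_rate(m):.3f}")  # 0.050, 0.226, 0.642
```

With 20 such tests the expected number of false positives is 20 × 0.05 = 1, and the chance of seeing at least one is about 64%.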
Sample size for confidence intervals
For estimating a proportion within ±3 percentage points (e.g., a poll), assuming the worst case of a 50/50 split and a large population:
- 95% confidence: n ≈ 1,067
- 99% confidence: n ≈ 1,843
- For ±5 percentage points: n ≈ 384 (95%) or 664 (99%)
This is why political polls cluster around n=1,000.
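These figures follow from n = z² · p(1 - p) / E², with the worst-case p = 0.5 and the conventional rounded z-values 1.96 and 2.576. The sketch below reproduces them; the function name and lookup table are illustrative. It rounds to the nearest integer to match the numbers above, although rounding up is the more conservative choice and gives one more in each case except the last, which stays at 664.

```python
# Conventional rounded z-values for two-sided confidence levels
Z_SCORES = {0.95: 1.96, 0.99: 2.576}

def poll_sample_size(margin, confidence=0.95, p=0.5):
    """Sample size to estimate a proportion within ±margin (simple random sample)."""
    z = Z_SCORES[confidence]
    return round(z ** 2 * p * (1 - p) / margin ** 2)

print(poll_sample_size(0.03, 0.95))  # 1067
print(poll_sample_size(0.03, 0.99))  # 1843
print(poll_sample_size(0.05, 0.95))  # 384
print(poll_sample_size(0.05, 0.99))  # 664
```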
