Correlation does not imply causation

Useful to know: lists, summing lots of numbers.

Two variables are correlated if there is a statistical relationship between them. However, just because two variables are correlated does not mean that one causes the other. This caution is commonly summarized as “correlation does not imply causation”.

One way of computing a correlation coefficient between two variables $X$ and $Y$ with $n$ measurements $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$ is the Pearson correlation coefficient
$$r = \frac{\operatorname{cov}(X,Y)}{\sigma_X\sigma_Y}$$
where
$$\operatorname{cov}(X,Y) = \sum_{i=1}^n (x_i - \overline{x})(y_i - \overline{y}) = (x_1-\overline{x})(y_1-\overline{y}) + \cdots + (x_n-\overline{x})(y_n-\overline{y})$$
is the covariance between $X$ and $Y$,
$$\sigma_X = \sqrt{\sum_{i=1}^n (x_i - \overline{x})^2} = \sqrt{(x_1 - \overline{x})^2 + \cdots + (x_n-\overline{x})^2} \quad \text{and} \quad \sigma_Y = \sqrt{\sum_{i=1}^n (y_i - \overline{y})^2} = \sqrt{(y_1 - \overline{y})^2 + \cdots + (y_n-\overline{y})^2}$$
are the standard deviations of $X$ and $Y$, and
$$\overline{x} = \frac{1}{n} \sum_{i=1}^n x_i = \frac{x_1 + x_2 + \cdots + x_n}{n} \quad \text{and} \quad \overline{y} = \frac{1}{n} \sum_{i=1}^n y_i = \frac{y_1 + y_2 + \cdots + y_n}{n}$$
are the averages (or means) of the $X$ and $Y$ measurements. The usual $1/n$ normalization factors are omitted from the covariance and the standard deviations because they cancel in the ratio. Here $r$ always lies between $-1$ and $1$, inclusive.
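To make the formula concrete, here is a small worked example. The data are made up purely for illustration and are not part of the problem: take $x = (1, 2, 3)$ and $y = (1, 3, 2)$, so that $n = 3$ and $\overline{x} = \overline{y} = 2$. Then
$$\operatorname{cov}(X,Y) = (1-2)(1-2) + (2-2)(3-2) + (3-2)(2-2) = 1 + 0 + 0 = 1,$$
$$\sigma_X = \sqrt{(1-2)^2 + (2-2)^2 + (3-2)^2} = \sqrt{2} \quad \text{and} \quad \sigma_Y = \sqrt{(1-2)^2 + (3-2)^2 + (2-2)^2} = \sqrt{2},$$
so
$$r = \frac{1}{\sqrt{2}\cdot\sqrt{2}} = \frac{1}{2}.$$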

Given two lists of measurements $x_1, \dots, x_n$ and $y_1, \dots, y_n$, return their Pearson correlation coefficient.

Input: Two lists of measurements, $x$ and $y$, each of length $n$.

Output: The Pearson correlation coefficient $r$ between the two variables.
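One possible way to approach the task is sketched below in Python. This is a minimal sketch rather than a reference solution: the function name `pearson_correlation` and the choice to raise `ValueError` for mismatched, empty, or constant lists are assumptions made here, not requirements stated in the problem.

```python
import math


def pearson_correlation(x, y):
    """Return the Pearson correlation coefficient r of two equal-length lists.

    The name and the error handling are illustrative assumptions.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n == 0:
        raise ValueError("need at least one measurement")

    # Means of the two sets of measurements.
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Unnormalized covariance and standard deviations; the 1/n factors
    # would cancel in the ratio, so they are left out.
    cov = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sigma_x = math.sqrt(sum((xi - x_mean) ** 2 for xi in x))
    sigma_y = math.sqrt(sum((yi - y_mean) ** 2 for yi in y))

    # r is undefined when either list has no spread (division by zero).
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("r is undefined when either list is constant")

    return cov / (sigma_x * sigma_y)
```

Calling `pearson_correlation(xs, ys)` with the two lists from the example input would return a single float. Because of floating-point rounding, results are best compared against an expected value with a small tolerance rather than with exact equality.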

Example input

([5427, 5688, 6198, 6462, 6635, 7336, 7248, 7491, 8161, 8578, 9000], [18.079, 18.594, 19.753, 20.734, 20.831, 23.029, 23.597, 23.584, 22.525, 27.731, 29.449])

Example output


• There are some really good websites for this stuff.