Any statisticians out there?

johngti · August 2020

Just in case, what are your thoughts on this summary data? (It’s from a piece of coursework.)

Pearson’s PMCC at 0.52 (moderate correlation according to the textbook, but only just!)

Spearman’s at 0.38, so poor correlation there.

My view - there’s no correlation. Even though Pearson’s is moderate, it’s only just moderate and Spearman’s is bad enough to say that there’s no correlation in the data.

Colleague’s view: moderate correlation in the data but no correlation in the ranks, so a linear relationship is more likely; go ahead with the regression line etc.

I’m worried that the second approach just comes across as desperately looking for a correlation that isn’t there for the sake of jumping through a hoop.

Thoughts?

capt_slog · August 2020

Thoughts?

I'm glad I didn't do statistics.

johngti · August 2020

capt_slog said:
Thoughts?

I'm glad I didn't do statistics.

Can’t blame you!

I think, for reference, that the answer is as follows. The student is looking to see whether more highly paid footballers score more goals, so Pearson’s is the wrong correlation to look at. Because you’re comparing ranks (i.e. the highest paid should score more goals), it makes more sense to use Spearman’s rank correlation coefficient.

I suspect that’s the approach needed anyway.
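For anyone who wants to see the difference in practice, here is a minimal Python sketch (my own illustration, not from the coursework) showing that Spearman’s coefficient is just Pearson’s formula applied to the ranks, with tied values given their average rank. The wage and goal figures are invented, and `pearson`, `ranks` and `spearman` are hypothetical helper names, not a library API.

```python
"""Sketch: Spearman's rho is Pearson's r computed on ranks.

The wage/goals figures below are made up for illustration; they are
not the student's coursework data.
"""
from math import sqrt


def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's r on the (average) ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical weekly wages (in thousands) and season goals for eight players.
wages = [20, 25, 30, 40, 60, 90, 150, 300]
goals = [2, 5, 3, 8, 6, 4, 12, 10]

print(f"Pearson  r   = {pearson(wages, goals):.2f}")
print(f"Spearman rho = {spearman(wages, goals):.2f}")

# Spearman only sees the order, so squaring the (positive) wages,
# which keeps their order, can change r but leaves rho untouched.
squared = [w ** 2 for w in wages]
print(f"Pearson  r   (wages squared) = {pearson(squared, goals):.2f}")
print(f"Spearman rho (wages squared) = {spearman(squared, goals):.2f}")
```

That invariance is the point of using ranks: the question is "do the better paid score more?", not "do goals rise by a fixed amount per pound?". If SciPy is available, `scipy.stats.spearmanr` does the same job.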

ddraver · August 2020

Yeah I agree...

😶

johngti · August 2020

Excellent. I think.

Mad_Malx · August 2020

Pretty much.

The test choice depends on the data distribution (amongst other things).
Neither of your variables has a normal distribution: they are highly skewed, so the data do not satisfy at least one of the underlying assumptions of the Pearson test. Think of the p-value as a calibrated system: if the assumptions are not met, then the significance value is flawed.
You can test for normality with Shapiro-Wilk, although the bleeding obvious test tells you that neither wages nor goals is normally distributed.
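A rough Python sketch of the "bleeding obvious" check: the moment coefficient of skewness is close to 0 for symmetric data (a normal sample included) and clearly positive for a long right tail, which is what footballers’ wages look like. This is not the Shapiro-Wilk test (SciPy provides `scipy.stats.shapiro` for that); the data and the `sample_skewness` helper are hypothetical.

```python
"""Sketch: a quick skewness check before reaching for Pearson's r.

This is NOT the Shapiro-Wilk test; it is a rough numerical version of
"plot it and look". The data are hypothetical.
"""


def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical weekly wages (in thousands) and season goals for eight players.
wages = [20, 25, 30, 40, 60, 90, 150, 300]
goals = [2, 5, 3, 8, 6, 4, 12, 10]

for name, data in [("wages", wages), ("goals", goals)]:
    # Values well above 0 suggest a long right tail, i.e. not normal.
    print(f"{name}: skewness = {sample_skewness(data):.2f}")
```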

Spearman’s does not make an assumption about the distribution, so it is appropriate here, although it is less powerful than Pearson’s with smaller samples (provided Pearson’s underlying assumptions ARE satisfied).
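For reference, the standard textbook formulas (not taken from this thread) show where the assumptions come in. Pearson’s significance test turns $r$ into a $t$ statistic whose $t_{n-2}$ reference distribution is exact only when the data are bivariate normal. Spearman’s $r_s$, for $n$ untied pairs with rank differences $d_i$, uses only the ranks, so under the null of independence its distribution does not depend on how wages or goals are distributed (for continuous data).

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} \sim t_{n-2}
  \quad \text{(under } H_0\!: \rho = 0 \text{ and bivariate normality)}

r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)}
  \quad \text{(no tied ranks)}
```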

johngti · August 2020

Thanks for that. I hadn’t thought of the lack of a normal distribution being a problem (I’m not a natural statistician - avoided it in my degree!)

Ben6899 · August 2020

I can tell you from simply watching football that there's no correlation.

johngti · August 2020

Well, exactly.
