Morbid statistics

briantrumpet · January 2014

Apologies for the morbid nature of this post (if you've been affected by a death of a friend or family cyclist I suggest you bypass this post).

Something that interests me is the sad but amazingly predictable number of cyclist deaths each year in the UK. Part of the reason this catches my eye is that, in fact, though the 100 or so deaths each year is a horrible waste of life, it is both remarkably low for the number of miles cycled in this country each year (about 3 billion), but also stays within quite a narrow margin too - a "20% increase" actually is (statistically) a tiny increase for the number of miles cycled and number of people on the roads. On average, two cyclists will be killed each week in the UK (compare that with about 40 motorists), and even with the cluster in London in the latter part of 2013, I think the annual totals will be roughly in line with the preceding years.

Now here's the thing - when each of those deaths is, well, if not random, the outcome of a unique set of circumstances (location, people, time, etc.), I can't wrap my head around how consistent the numbers are. Given the random nature of the events, I'd expect a much wider number of deaths - maybe 50 one year, 200 the next - but we are left with two per week, on average, for the past several years. What is 'causing' that consistency, in such widely spaced events?

kieranb · January 2014

It probably indicates that road conditions do not vary much, hence the probability of an accident is fairly constant over time, so you just get fluctuations around a central number.

smidsy · January 2014

briantrumpet wrote:

Given the random nature of the events, I'd expect a much wider number of deaths - maybe 50 one year, 200 the next - but we are left with two per week, on average, for the past several years. What is 'causing' that consistency, in such widely spaced events?

Yes I feel the key there is average.

You could have 6 one week, 15 in a day and nothing for a while - that is the problem with statistics (and averages).

The statistic is constant - I suspect the actual events are very random indeed.

ai_1 · January 2014

briantrumpet wrote:

Apologies for the morbid nature of this post (if you've been affected by a death of a friend or family cyclist I suggest you bypass this post).

Something that interests me is the sad but amazingly predictable number of cyclist deaths each year in the UK. Part of the reason this catches my eye is that, in fact, though the 100 or so deaths each year is a horrible waste of life, it is both remarkably low for the number of miles cycled in this country each year (about 3 billion), but also stays within quite a narrow margin too - a "20% increase" actually is (statistically) a tiny increase for the number of miles cycled and number of people on the roads. On average, two cyclists will be killed each week in the UK (compare that with about 40 motorists), and even with the cluster in London in the latter part of 2013, I think the annual totals will be roughly in line with the preceding years.

Now here's the thing - when each of those deaths is, well, if not random, the outcome of a unique set of circumstances (location, people, time, etc.), I can't wrap my head around how consistent the numbers are. Given the random nature of the events, I'd expect a much wider number of deaths - maybe 50 one year, 200 the next - but we are left with two per week, on average, for the past several years. What is 'causing' that consistency, in such widely spaced events?

There is no mystery here. As you suggest, there's massive randomness in individual accidents but you would expect statistical figures based on a large population and highly randomised variables to be very consistent. Big variations year to year would indicate a specific trend indicating it was not really that random at all.
I think you're misunderstanding the mechanisms at play.

If the number did change from 50 one year to 200 the next it would be incredibly unlikely that there isn't a specific common factor influencing this.
Think of it this way. Let's say I flipped a coin 5 times and then did the same again. There would be nothing odd about the results if I got 4 heads/1 tail the 1st time and 2 heads/3 tails the 2nd time. Agreed? That would mean the occurrence of tails tripled between the two attempts on a completely random event.
Now what if I did 1000 coin flips each time. What's the likelihood of getting vastly more heads than tails the 1st time? It's pretty unlikely. The more times you do a 50/50 test the closer the result will tend to be to 50/50. So the two tests are far more likely to match up pretty closely than if you only have a few events each time.
You COULD still get significantly more heads or tails but that's very unlikely unless there's a cause. If there's a cause then it's not a random result.
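
A rough way to see the coin-flip point is to simulate it. The sketch below is illustrative only: the function name `heads_fraction_range`, the fixed seed and the choice of 200 repeated trials are assumptions made for the example, not anything measured.

```python
import random

def heads_fraction_range(flips_per_trial, trials=200, seed=1):
    """Return (lowest, highest) fraction of heads seen across repeated trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    fractions = []
    for _ in range(trials):
        # Count heads in one trial of fair 50/50 flips
        heads = sum(rng.random() < 0.5 for _ in range(flips_per_trial))
        fractions.append(heads / flips_per_trial)
    return min(fractions), max(fractions)

small = heads_fraction_range(5)      # 5 flips per trial
large = heads_fraction_range(1000)   # 1000 flips per trial

print("5 flips:    heads fraction ranged from %.2f to %.2f" % small)
print("1000 flips: heads fraction ranged from %.2f to %.2f" % large)
```

With only 5 flips per trial the fraction of heads swings widely from one trial to the next; with 1000 flips per trial it should stay close to 0.5 every time.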

ai_1 · January 2014

smidsy wrote:

Yes I feel the key there is average.

You could have 6 one week, 15 in a day and nothing for a while - that is the problem with statistics (and averages)....

that's only a problem with statistics if you're using them inappropriately (which happens a lot). Statistics are incredibly useful when used well but tend to get a bad reputation with the general public due to being horrendously misused/abused more often than they're applied correctly.

smidsy · January 2014

Ai_1 wrote:

smidsy wrote:

Yes I feel the key there is average.

You could have 6 one week, 15 in a day and nothing for a while - that is the problem with statistics (and averages)....

that's only a problem with statistics if you're using them inappropriately (which happens a lot). Statistics are incredibly useful when used well but tend to get a bad reputation with the general public due to being horrendously misused/abused more often than they're applied correctly.

No you have it the wrong way round. Statistics are only used to make/underpin a specific point, thus are only ever used in a way that the author chooses.

If you measure something for long enough you can come up with any number you like (you just select what criteria you want).

Probability is actually what is useful, not statistics (we have covered this on here before somewhere).

If you take those statistics they mean nothing without context - probability gives the context. E.g. that 2 per week could be really bad or really good; without knowing the total number of cyclists we have no idea.

slowbike · January 2014

Statistics are useful - as you say - if you know the context of those statistics.

2 cyclist deaths per week is an ambiguous figure as we cannot gauge the causes of those deaths.
To make it more meaningful we need to know
1) what type of environment those deaths occurred in, e.g. road/offroad/city/town/rural
2) time of day
3) what the primary cause of death was, i.e. if it was an accident then who was the "at fault" party. If it was ill-health then although it is a cyclist death, cycling wasn't the cause of the death.

I would think that if you took the raw data and applied a location stamp to it then most deaths would turn out to occur in a city during a commute, with the accident involving another vehicle - but then I would hazard a guess that there are more cyclists commuting in cities than in other areas.

Bozman · January 2014

It'd be interesting to see the figures if you took London out of the equation.

ricky1980 · January 2014

There isn't sufficient information online that I can see that can be used to draw any conclusions. I am not sure if the respective government or press people have more detailed stats.

But in order to study a trend over time, you naturally need stats collated against time, so looking at a single year's performance compared with the previous year doesn't tell anyone anything.

But the previous post regarding general road conditions would be an interesting hypothesis, to see if the statistics back it up: whether the year-on-year trend in localised areas is constant or trending up/down. You could then easily correlate that with initiatives taken by local authorities to see which scheme or road planning is the most/least successful, find out why it is so good/bad, and apply it nationally if possible.

This wouldn't take a team of statisticians long to correlate and figure out, and it certainly wouldn't cost the government a huge amount of money. But I suspect it is the lack of interest in Whitehall in making roads better for cyclists that is stopping the death rate going down.

That said, the number of people using bikes is on the increase and therefore naturally there will be more accidents.

ai_1 · January 2014

smidsy wrote:

Ai_1 wrote:

smidsy wrote:

Yes I feel the key there is average.

You could have 6 one week, 15 in a day and nothing for a while - that is the problem with statistics (and averages)....

that's only a problem with statistics if you're using them inappropriately (which happens a lot). Statistics are incredibly useful when used well but tend to get a bad reputation with the general public due to being horrendously misused/abused more often than they're applied correctly.

No you have it the wrong way round. Statistics are only used to make/underpin a specific point, thus are only ever used in a way that the author chooses.

If you measure something for long enough you can come up with any number you like (you just select what criteria you want).

Probability is actually what is useful, not statistics (we have covered this on here before somewhere).

If you take those statistics they mean nothing without context - probability gives the context. E.g. that 2 per week could be really bad or really good; without knowing the total number of cyclists we have no idea.

Nonsense. You're just saying that bad statistics are bad statistics. The usefulness and accuracy of statistics is provable and not open for debate. No-one with any idea what they're talking about does debate it. But as I said, statistics are massively abused - especially in advertising and bad news reporting (what you describe as statistics are the bad practices often used in these fields and labelled statistics - they're not statistics, they're either confusion or outright lies presented in fancy dress).
You say probability is useful, not statistics. This makes no sense. What are you saying is the difference? Statistics is the set of tools used to calculate and work with probability-based data. Probability can't really be of any use without statistics. Essentially if you ridicule one you ridicule the other.

smidsy · January 2014

Ai_1 wrote:

Nonsense. You're just saying that bad statistics are bad statistics. The usefulness and accuracy of statistics is provable and not open for debate. No-one with any idea what they're talking about does debate it.

Oh the irony.

Ai_1 wrote:

You say probability is useful, not statistics. This makes no sense. What are you saying is the difference?

Statistics are history (dealing in past events) - they merely provide a numerical way of communicating what has already happened. There is often no context (as is the case with the OP's stats)

Probability deals with future events - it helps us to predict how likely something is to happen based on known parameters. This is the bit that I am saying is missing in the statistics.

Simply quoting a statistic of 2 people die every week means nothing as we have no idea if that represents a high or low risk. People need actual useable data to enable them to make informed choices.

A statistic is a statement of fact (however contrived), nothing more.

ai_1 · January 2014

smidsy wrote:

Ai_1 wrote:

Nonsense. You're just saying that bad statistics are bad statistics. The usefulness and accuracy of statistics is provable and not open for debate. No-one with any idea what they're talking about does debate it.

Oh the irony.

Not really, but perhaps it reads badly. I've elaborated a little below.

smidsy wrote:

Ai_1 wrote:

You say probability is useful, not statistics. This makes no sense. What are you saying is the difference?

Statistics are history (dealing in past events) - they merely provide a numerical way of communicating what has already happened. There is often no context (as is the case with the OP's stats)

Probability deals with future events - it helps us to predict how likely something is to happen based on known parameters. This is the bit that I am saying is missing in the statistics.

Simply quoting a statistic of 2 people die every week means nothing as we have no idea if that represents a high or low risk. People need actual useable data to enable them to make informed choices.

A statistic is a statement of fact (however contrived), nothing more.

Statistics are not simply history. Probability is not the future. They are both mathematical terms not philosophical ideas. Probability and statistics are not two different things. Statistics is a mathematical discipline. Probability is a term dealt with extensively within the field of statistics.

If you studied statistics in school and/or college you'll probably remember that it was part of your maths curriculum. Not history, literary fiction, marketing or religious studies. It's a mathematical discipline. It's based on logical concepts and provable hypotheses. I'm not just saying that statistics are fact because it's my opinion and I'm arrogant. It's a fact. It's not an opinion.

Almost everyone who does engineering, experimental physics or a whole range of other subjects, especially in the sciences, will have had no choice but to do their fair share of statistics. Why?....because that's how you prove you're NOT just making stuff up. It's how you demonstrate correlation between theory and reality, or very often, it's how you discover your theory is flawed. Statistics can be used to analyse the past just as it can be used to predict the future. For complex problems it's often the only tool available. It's often the tool that reveals the truths we're not bright enough to recognise. It's exactly the opposite to what you seem to think it is. If some people doctor their results, misrepresent their data or just don't know what they're doing, that doesn't reflect on statistics. It reflects on them.

I don't particularly enjoy statistics and I'm not an expert but I know enough to understand it's a massively powerful tool. It's one of the pillars on which science, technology and indeed society is based... yet you seem to think it's just a cynical marketing tool.

Used well, statistical tools are probably the most accurate, revealing and objective we have.

I can't understand what point you're trying to make when you say that a statistic is a statement of fact and nothing more. A probability is a fact .... where are we going with this?

tootsie323 · January 2014

One query to throw into this discussion. You state that the number of cycling deaths per year is remarkably consistent, but how many more people have taken to bicycles? If the number of cyclists is on the increase but the number of deaths is not, this would suggest that the proportion of cyclist deaths is actually dropping.
I'm making the assumptions that (i) the number of deaths per year is an absolute, as opposed to proportional, value and (ii) that the number of people cycling is on the up.

keef66 · January 2014

I work in life sciences and statistical analyses are used to determine whether the differences between observed results are likely to be due to experimental treatments or due to natural variation.

Almost every statistician I have ever met has cautioned against over reliance on statistical analysis.

I particularly like:

"Some people use statistics in the same way a drunk uses a lamp-post; more for support than illumination"

briantrumpet · January 2014

tootsie323 wrote:

One query to throw into this discussion. You state that the number of cycling deaths per year is remarkably consistent, but how many more people have taken to bicycles? If the number of cyclists is on the increase but the number of deaths is not, this would suggest that the proportion of cyclist deaths is actually dropping.
I'm making the assumptions that (i) the number of deaths per year is an absolute, as opposed to proportional, value and (ii) that the number of people cycling is on the up.

That might well be right. But it's still the rarity but relative consistency of total fatalities that I still can't wrap my head around, as, unlike the tossing a coin thing, this is roughly one death every 30,000,000 miles cycled in the UK (if my maths and the Government figures are vaguely correct), not a 1:2 chance. I guess a better analogy might be with a rare genetic disease that kills just a handful of people a year ... though, in that case, I suppose we do know that a certain proportion of people with a particular disease are likely to die, whereas cyclists' deaths aren't pre-programmed in the genes.
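
For what it's worth, the rough arithmetic behind that figure can be checked in a couple of lines. This is only a sketch using the approximate numbers quoted at the top of the thread (about 3 billion miles and about 100 deaths a year); the variable names are made up for illustration.

```python
miles_per_year = 3_000_000_000   # "about 3 billion" miles, from the first post
deaths_per_year = 100            # "100 or so" deaths, from the first post

miles_per_death = miles_per_year / deaths_per_year
deaths_per_week = deaths_per_year / 52

print(f"{miles_per_death:,.0f} miles per death")
print(f"{deaths_per_week:.1f} deaths per week")
```

On those inputs it comes out at 30 million miles per death and a shade under two deaths a week, which matches the figures quoted so far.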

But I did fail A-level maths. And I never did do statistics, because I was in the top set. Haha.

ai_1 · January 2014

The coin flip analogy is valid. The purpose of the analogy was to explain how something with massive variation in a small sample population becomes predictable as the sample size grows. The outcome will correlate closely with the probability for a large sample. This is not the case for a small sample. The actual probability value is unimportant to understanding the concept. It is true for a 0.00002 probability just as it is for a 0.5 probability.

This is the answer to the question you asked at the start of the thread.
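
To put rough numbers on that, one can treat each mile cycled as an independent trial with a tiny chance of a fatality. That independence is a modelling assumption made purely for this sketch, and the inputs are the approximate figures quoted earlier in the thread (about 3 billion miles, roughly one death per 30 million miles).

```python
import math

n = 3_000_000_000        # miles cycled per year (approximate figure from the thread)
p = 1 / 30_000_000       # assumed chance of a fatality on any given mile

mean = n * p                        # expected deaths per year
sd = math.sqrt(n * p * (1 - p))     # binomial standard deviation

print(f"expected deaths per year: {mean:.0f}")
print(f"standard deviation:       {sd:.1f}")
print(f"50 deaths would be {(mean - 50) / sd:.1f} sd below the mean")
print(f"200 deaths would be {(200 - mean) / sd:.1f} sd above the mean")
```

Under that assumption the yearly total has a mean of about 100 and a standard deviation of about 10, so a year with 50 or 200 deaths would be far outside the expected spread. Real accidents aren't perfectly independent (weather, traffic and so on), so treat this as a rough guide only.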

smidsy · January 2014

keef66 wrote:

I work in life sciences and statistical analyses are used to determine whether the differences between observed results are likely to be due to experimental treatments or due to natural variation.

Almost every statistician I have ever met has cautioned against over reliance on statistical analysis.

I particularly like:

"Some people use statistics in the same way a drunk uses a lamp-post; more for support than illumination"

At last someone talking sense.

smidsy · January 2014

Ai_1 wrote:

The coin flip analogy is valid.

OK you flip a coin 100 times and it lands on heads every time.

Statistically how many times has it landed on heads?

What is the probability it will land on heads the next flip?

ai_1 · January 2014

smidsy wrote:

Ai_1 wrote:

The coin flip analogy is valid.

OK you flip a coin 100 times and it lands on heads every time.

Statistically how many times has it landed on heads?

It has landed on heads 100 times. I'm not sure why you've got the word "statistically" at the start of the sentence though.

smidsy wrote:

What is the probability it will land on heads the next flip?

The probability remains 0.5 as it was for every other flip.

I presume you're trying to make a point but I'm at a loss to see what it is. The above in no way disagrees with anything I've said or invalidates anything. The analogy is valid. There is no mystery in the how or why of any of this. It's pretty basic maths. These aren't my ideas or opinions. It's pretty common, unremarkable knowledge. Look it up or ask the opinion of someone you'll believe. I'm just wasting my time at this point.

smidsy · January 2014

Ai_1 wrote:

It has landed on heads 100 times. I'm not sure why you've got the word "statistically" at the start of the sentence though.

Statistically it lands on heads 100% of the time.

Ai_1 wrote:

probability remains 0.5 as it was for every other flip.

Exactly.

Ai_1 wrote:

I presume you're trying to make a point but I'm at a loss to see what it is.

Statistically it would be guaranteed to land on heads, but in reality it has a 50/50 chance.

So statistics and probability give 2 different answers, not the same as you would have us believe.

nathancom · January 2014

The probability that Smidsy is wrong is exactly 1. Probability is a branch of mathematics that is used extensively within Statistics. Statistics is simply a group of methods for analysing sets of data in order to reach meaningful conclusions. Your anterior/posterior dichotomy between Statistics and Probability is entirely invalid. Probability very much relies on data from past events so as to create a statistical model that can be applied to equivalent events that take place equally in the past, present or future.

smidsy · January 2014

The probability that someone is wrong can never be 1 as there are always 2 possible outcomes. I therefore need not listen to anything else you stated in your overly contrived (attempt to look superior) post, as you cannot even grasp the basics.

nathancom · January 2014

Clearly I am stating that the outcome of you being correct is not possible, therefore there is only 1 outcome: you being wrong. This is simply because you are making up your own definition of terms that already have clear definitions. It is a shame you think as clear a definition of terms as I could manage is contrived, but throughout this thread you have tried to redefine mathematical and analytical terms to suit your own argument, just so you can try and appear correct. Go look the words up in a dictionary is my advice. The probability of you being wrong is still 1.

You have come out with some real guff, like 'statistics deals with history, probability with the future' and then told someone else trying to politely correct your misinformation with the truth that he was wrong. Stunningly wrongheaded.

smidsy · January 2014

nathancom wrote:

Clearly I am stating that the outcome of you being correct is not possible, therefore there is only 1 outcome: you being wrong.

Which statistically may be true but in all probability you are still not correct. And the outcome of me being wrong is not even possible. There is still a chance though.

This has now ceased to be a sufficient challenge and is certainly adding nothing to the thread, so I am out.

nathancom · January 2014

smidsy wrote:

nathancom wrote:

Clearly I am stating that the outcome of you being correct is not possible, therefore there is only 1 outcome: you being wrong.

Which statistically may be true but in all probability you are still not correct. And the outcome of me being wrong is not even possible. There is still a chance though.

This has now ceased to be a sufficient challenge and is certainly adding nothing to the thread, so I am out.

Go and look the words up in a dictionary. You are talking rubbish.

Statistics -
The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.

Probability -
The likelihood of the occurrence of an event. The probability of event A is written P(A). Probabilities are always numbers between 0 and 1, inclusive.

ai_1 · January 2014

smidsy wrote:

Ai_1 wrote:

It has landed on heads 100 times. I'm not sure why you've got the word "statistically" at the start of the sentence though.

Statistically it lands on heads 100% of the time.

Ai_1 wrote:

probability remains 0.5 as it was for every other flip.

Exactly.

Ai_1 wrote:

I presume you're trying to make a point but I'm at a loss to see what it is.

Statistically it would be guaranteed to land on heads, but in reality it has a 50/50 chance.

So statistics and probability give 2 different answers, not the same as you would have us believe.

That's just a load of rubbish.....

You're coming up with spurious conclusions due to your own misunderstanding and then attributing the blame to an entire field of mathematics instead of listening to those trying to explain where you've gone wrong. It's astonishing.

To be clear. You're willing to believe that all mathematicians, scientists, engineers and actuaries amongst others are morons rather than consider the possibility that you should look up the meaning of the word "statistics".

I'm not sure if it's funny, sad or both.
I just hope no one is taking you seriously.

slowbike · January 2014

smidsy wrote:

Ai_1 wrote:

It has landed on heads 100 times. I'm not sure why you've got the word "statistically" at the start of the sentence though.

Statistically it lands on heads 100% of the time.

Ai_1 wrote:

probability remains 0.5 as it was for every other flip.

Exactly.

Ai_1 wrote:

I presume you're trying to make a point but I'm at a loss to see what it is.

Statistically it would be guaranteed to land on heads, but in reality it has a 50/50 chance.

So statistics and probability give 2 different answers, not the same as you would have us believe.

Statistically speaking you'd be more likely to get another head though - probably a rigged coin, either weighted on one side or with 2 head sides. Or perhaps it was just the magnetic flux at that point in time. Either way, the probability of throwing a tails has been reduced to almost zero with so many heads being thrown in a row.
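
One hedged way to make that reasoning concrete is a Bayes' rule calculation. The prior below (one coin in a million being double-headed) is a made-up number purely for illustration, and the variable names are hypothetical.

```python
from fractions import Fraction

prior_rigged = Fraction(1, 1_000_000)   # assumed prior: 1 in a million coins is double-headed
prior_fair = 1 - prior_rigged

heads_seen = 100
like_rigged = Fraction(1)                    # a double-headed coin always shows heads
like_fair = Fraction(1, 2) ** heads_seen     # a fair coin: 0.5 per flip, 100 times over

# Bayes' rule: P(rigged | 100 heads in a row)
posterior_rigged = (prior_rigged * like_rigged) / (
    prior_rigged * like_rigged + prior_fair * like_fair
)

# Chance the next flip is heads, averaged over both kinds of coin
p_next_heads = posterior_rigged * 1 + (1 - posterior_rigged) * Fraction(1, 2)

print(float(posterior_rigged))
print(float(p_next_heads))
```

Even starting from a very sceptical prior, 100 heads in a row leaves the rigged-coin explanation overwhelmingly more likely than the fair one. The 0.5 answer holds only if you insist the coin is known to be fair.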

djhermer · January 2014

What was the probability of such a potentially morbid thread turning into one so thoroughly entertaining?

I'm no mathematician, statistician or probabilitist, but I do love watching someone dig themselves deeper.

Keep going.

ai_1 · January 2014

Slowbike wrote:

smidsy wrote:

Ai_1 wrote:

It has landed on heads 100 times. I'm not sure why you've got the word "statistically" at the start of the sentence though.

Statistically it lands on heads 100% of the time.

Ai_1 wrote:

probability remains 0.5 as it was for every other flip.

Exactly.

Ai_1 wrote:

I presume you're trying to make a point but I'm at a loss to see what it is.

Statistically it would be guaranteed to land on heads, but in reality it has a 50/50 chance.

So statistics and probability give 2 different answers, not the same as you would have us believe.

Statistically speaking you'd be more likely to get another head though - probably a rigged coin, either weighted on one side or with 2 head sides. Or perhaps it was just the magnetic flux at that point in time. Either way, the probability of throwing a tails has been reduced to almost zero with so many heads being thrown in a row.

Slowbike,

In this analogy the assumption was that the coin flip was a 50/50. Under those circumstances the probability of 100 consecutive heads is so infinitesimally small that it is for all sensible purposes impossible.
So, I would agree with you that the 100 in a row result is so hugely statistically significant that you'd certainly be justified in saying the test is flawed and not actually random. I didn't want to go into this as I was having enough trouble trying to get across an explanation of much more basic concepts.
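
Just to show how small that probability is, here is a one-line check, assuming a fair coin and independent flips (the variable name is only illustrative):

```python
p_100_heads = 0.5 ** 100   # chance of 100 heads in a row from a fair coin
print(p_100_heads)         # on the order of 1e-30
```

That is about one chance in 2 to the power of 100, which is why the result would point to a flawed test rather than to luck.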

chris_bass · January 2014

I think the misunderstanding and somewhat heated discussion is stemming from what each side means by statistics.

If you go out and collect a load of data and present them, the cold, hard data (with no analysis) could be called the statistics.

so, 200 people die a year is a statistic. if, on average, 200 people died every year for 5 years and you were just looking at these statistics and someone asked you how many people will die next year, you'd probably say 200.

but statistics can also be the interpretation of the data as well, when something becomes statistically significant or using chi-squared, z-tests and all that jazz. so if someone asked you the same question and you used statistics (in this meaning) you would probably have a different answer!

neither person is wrong, they are just using the word statistics differently!

in my opinion anyway, i have a 90% confidence limit on this theory by the way!!

smidsy · January 2014

OK my very last word on this. I can see that my attempt to keep things simple has been confused.

Your definitions agree with my simple summary.

nathancom wrote:

Statistics -
The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.

History. You can not collect something which does not exist (has not occurred).

nathancom wrote:

Probability -
The likelihood of the occurrence of an event. The probability of event A is written P(A). Probabilities are always numbers between 0 and 1, inclusive.

Future. If something has already happened it has no likelihood - it is absolute/definitive/already occurred.

I have never said that statistics are not used as part of prediction, nor have I claimed that they are unrelated to probability.
