In a comment a couple of days back in response to Petesquiz, I wrote:

If science were compared to a beautiful sandcastle, with towers and spires and turrets, statistics is the wave that sweeps over it and reduces it back to a low hummock of wet sand.

Many years ago, in the very first introductory lecture on probability and statistics, the tutor drew a rectangle on the blackboard, and several circles inside the rectangle, and invited students to imagine that the rectangle was a solid plane in which there were circular holes. He further invited us to imagine that there was a rain of particles falling on this plane, such that some landed on the plane, and some fell through the holes. Assuming an even distribution of particles across the plane, he then asked: “On average, what fraction of all the particles would fall through the holes?”

As one of the students in the lecture theatre, I was quite stunned by this question. And I had no idea what the answer might be. Or even if there was an answer. But the tutor went on to say that we knew that all the particles fell somewhere inside the rectangle, whose area R we also knew. And that we knew that the particles that fell through the holes landed only on the holes, and we knew the area C of these holes. Therefore on average C/R of the particles would fall through the holes, and this was also the probability that any particle would fall through a hole. How beautifully rational! This was the very first lesson in a long series of lessons, most of which I didn’t understand.

Now I think that, in retrospect, if I had not been so stunned, I might have raised my hand and said that his question assumed that we didn’t know the *exact positions* where all the particles landed. We only had a vague idea. And if we knew exactly, we would be able to give an *exact* answer to any question of this kind. So in one experiment with 10,000 particles scattered by hand over the rectangular plane, exactly 3,237 particles would fall through the holes, and in another experiment exactly 3,352 particles would fall through the holes. Nature always knows the exact answers to these questions: it is we humans who don’t know.
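As a rough illustration of the tutor's C/R argument, and of the way each individual scattering still produces its own exact count, here is a minimal Python sketch. The unit-square plane, the three hole positions and radii, and the function names are all assumptions invented for the example; nothing here comes from the original lecture.

```python
import math
import random

# Hypothetical layout: a 1 x 1 plane with three non-overlapping circular holes,
# each given as (centre_x, centre_y, radius). These values are illustrative only.
HOLES = [(0.25, 0.25, 0.20), (0.70, 0.30, 0.15), (0.50, 0.75, 0.20)]
R = 1.0 * 1.0                                   # area of the rectangle
C = sum(math.pi * r * r for _, _, r in HOLES)   # total area of the holes


def falls_through(x, y):
    """Return True if the point (x, y) lies inside any hole."""
    return any((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in HOLES)


def simulate(n_particles, seed=0):
    """Scatter n_particles uniformly over the plane; count those that fall through."""
    rng = random.Random(seed)
    return sum(falls_through(rng.random(), rng.random()) for _ in range(n_particles))


if __name__ == "__main__":
    n = 10_000
    for seed in range(3):
        hits = simulate(n, seed)
        print(f"run {seed}: {hits} of {n} fell through "
              f"({hits / n:.4f}); C/R = {C / R:.4f}")
```

Each seed stands in for one hand-scattering: the count differs from run to run, while the fraction should stay close to C/R when the particles really are spread evenly.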

And therefore, in a profound sense, all probabilistic reasoning is reasoning that grows out of ignorance. It grows out of *not* knowing all the facts. It grows out of nescience.

And therefore any true and perfect science must necessarily be devoid of probabilistic reasoning of this kind. And any science which uses probabilistic arguments of this kind can be automatically described as a defective science. And the more that probabilistic arguments are used in any science, the more defective that science will be.

And since, to the best of my understanding, Quantum Physics employs statistical or probabilistic arguments throughout, we may dismiss Quantum Physics as a defective science (and perhaps the very best example of defective science). And in fact, since nobody understands quantum physics, the “success” of quantum physics has resulted in it becoming not so much a science as a nescience. It’s become a black hole in knowledge.

The “beautiful sandcastle” to which I was referring in my comment was the non-probabilistic science of Galileo and Kepler and Newton, which gave precise answers to questions: e.g. the semi-major axis of the Earth’s elliptical orbit has a length of 1.496 × 10^8 km. But the arrival of probabilistic or statistical arguments was the arrival of a wave that reduced the beautiful sandcastle to a low hummock of wet sand. For with probability exactitude and precision vanish. Everything gets blurred. The length of the semi-major axis becomes iffy. It might be this, or it might be that. And if we add to that the fact that we are using mathematical calculators which might or might not be providing us with the exact right answers, then science – knowledge – must dissolve away completely into nescience.

And very arguably we are now witnessing the dissolution of the beautiful sandcastle of exact science into a low hummock of nescience. For it is now being argued that if there is even a 0.001% probability that Environmental Tobacco Smoke kills anybody in any year, then that means that in a population of 1 million people, ETS will cause the deaths of 10 people – which is utterly intolerable. Antismoking Tobacco Control zealots now conjure up hundreds of millions of dead people using statistical arguments of this kind. e.g. A Billion Lives. And these deaths are regarded as being as certain as the deaths recorded in drownings or traffic accidents.

People may object, like Roberto, that such arguments “can be easily refuted”, but it is unfortunately the case that *they are not being refuted*. They have instead become accepted common knowledge. They are repeated *ad nauseam* in the mainstream media. They are what Everybody Knows. Everybody knows that smoking causes lung cancer. When I lived in Devon I knew a woman whose father died of it. He hadn’t been a smoker, but he had once worked as a bartender in a smoky pub: *post hoc ergo propter hoc*. We have descended back into a medieval mindset in which anything, however improbable, can cause anything else.

Tho the mathematical and statistical tends to make my head hurt, your sand castle picture made me intuitively and instantly understand your point. Brilliant analogy. And yes, the Billion Lives that the vapers have apparently convinced themselves they’re saving has always struck me as pure concocted nonsense. Why not 79,487? 543,972? How about none?

Just found this on fb. Dissection of A Billion Lives including its math. I like Dave G and a lot of the people involved in this but ….no.

http://taking-liberties.squarespace.com/blog/2016/10/29/review-a-billion-lives.html#.WBVHmU1nVF4.twitter

So in one experiment with 10,000 particles scattered by hand over the rectangular plane, exactly 3,237 particles would fall through the holes, and in another experiment exactly 3,352 particles would fall through the holes. Nature always knows the exact answers to these questions: it is we humans who don’t know.

Your knowledge of probability is sadly so badly lacking you don’t even know how bad it is.

We are well aware that we don’t know the answers to most questions. But what we do often know is the limits of our lack of knowledge very precisely.

We can calculate that the variation in your example between 3,352 and 3,237 is actually far wider than will usually occur in nature: about 1% chance (using binomial calculations on a mean halfway between the two). So we know the normal amount will almost certainly be in the region of 3,300 and the chance of deviations past 3,200 and 3,400 is in the 2% region.
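A calculation of this kind can be sketched in a few lines of Python. This is only one possible reading of "binomial calculations on a mean halfway between the two": it assumes 10,000 independent particles, each with the same probability p = 0.32945 of falling through, and the function names are invented for the example. The percentages it prints depend on exactly which interval is chosen, so they need not match the rounded figures quoted above.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def prob_outside(lo, hi, n, p):
    """P(X < lo or X > hi) for X ~ Binomial(n, p)."""
    inside = sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))
    return 1.0 - inside


N = 10_000                      # particles per experiment
P = (3237 + 3352) / 2 / N       # assumed hit probability: mean halfway between the two counts

if __name__ == "__main__":
    mean = N * P
    sd = math.sqrt(N * P * (1 - P))
    print(f"mean = {mean:.1f}, standard deviation = {sd:.1f}")
    print(f"P(count outside 3237..3352) = {prob_outside(3237, 3352, N, P):.3f}")
    print(f"P(count outside 3200..3400) = {prob_outside(3200, 3400, N, P):.3f}")
```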

I happen to believe that most risk assessments of the dangers of environmental smoke are exaggerated, often to the point of nonsense. But you cannot wave away a risk by pretending you don’t believe in risks, because “they are only probabilities”.

I hope you’re happy to tell the relatives of people killed by drunk drivers that the drivers weren’t at fault because there was only a small probability they would kill anyone. The thing is that small probabilities do add up, to give large numbers of actual deaths.

The “beautiful sandcastle” to which I was referring in my comment was the non-probabilistic science of Galileo and Kepler and Newton, which gave precise answers to questions:

Precise answers, sure, but wrong answers. Wrong enough that they worked out that there were problems with them over 100 years ago and started looking for the correct equations.

That you cannot follow the probabilistic equations of modern science is a lack of understanding on your part. It should not be confused with lack of beauty — because they are very beautiful indeed.

That they are probabilistic at the micro level does not mean that they are probabilistic at the macro level, because they are not.

We can calculate that the variation in your example between 3,352 and 3,237 is actually far wider than will usually occur in nature: about 1% chance (using binomial calculations on a mean halfway between the two). So we know the normal amount will almost certainly be in the region of 3,300 and the chance of deviations past 3,200 and 3,400 is in the 2% region.

What utter nonsense you are writing! Those two numbers weren’t the result of any calculation of mine, or of anyone else. They are both numbers that I made up! They are inventions, fabrications! It is quite absurd for you to have taken these two numbers and found either the mean or the deviation (standard or otherwise) with your “binomial calculations”. You have taken two meaningless numbers and drawn a meaningless conclusion from them.

I’ll quite readily agree that probability theory isn’t my strongest mathematical suit, because I hardly ever use it, except in occasional adventures. I don’t know anyone who ever does. My Newtonian orbital simulation model – which I wrote myself – uses no probability or statistics whatsoever. And I very much doubt that even JPL use any either in constructing their ephemerides.

I am aware that Newtonian mechanics is not accurate. It doesn’t work on the galactic scale. But I have no belief that any statistician will provide us with improved equations. It will require a visionary of the scale of Kepler or Newton to do that.

Lastly, can you point me to where I wrote/waved that:

But you cannot wave away a risk by pretending you don’t believe in risks, because “they are only probabilities”.

To the contrary, I would suggest that the dangers of environmental smoke are very well known, hence the various Clean Air Acts of 1956, 1968 and 1993, etc.

Sound evidence of real harm from diesel emission particulates is also emerging, which is ironic given that diesel was considered by ‘experts’ who’d done the studies to be a more environmentally friendly alternative to petrol and, as such, attracted less punitive duty in order to encourage diesel engine cars (the recent re-think on fats, after decades of ‘expert’ advice and recommendations, is another example). The idea that anyone can seriously identify harm from SHS against a backdrop of widespread environmental pollution is nonsensical, made worse by the fact that smokers have been deliberately treated as scapegoats in order to downplay some actual causes of ill health.

Great post, Frank! As a smoker, I am thoroughly offended and disgusted by the smoking bans and anti-smoking propaganda I encounter on a daily basis. However, as an engineer and simply a member of society, I am distraught by the modern molestation of science as a discipline.

Probability and statistics are great tools for discovering and detecting trends … trends that once discovered need to be supported or disproven by the scientific method. To see probability and statistics perverted in such a way as to indicate ‘proof’ of causation makes me sick. Imagine if instead of the laws of motion, Isaac Newton wrote a paper about how 79.46% of the times he sat under an apple tree, an apple would fall and hit him on the head, so … voilà, gravity exists! Most likely we wouldn’t even know his name, as he would have been laughed out of the field. However, we see so many modern ‘scientific theories’ based off of similar claims, and the majority of people simply accept them at face value as though they are the result of genuine science. If this trend continues much longer, we, the human race, are completely doomed.

Couldn’t agree more! Statistics are a useful tool in science, but they can only lead you in a direction where, hopefully, proper justifying proof of a hypothesis can be found…or not!

Odd thought. There are 9 planets. Copernicus supposed that these planets moved in circles around the sun. Kepler used the observations by Tycho Brahe of these planets to fit ellipses to their orbits. And Newton then used his law of gravitation to generate Kepler’s ellipses.

I can’t help but think that if statisticians had been around back then, they would have declared that 9 was a statistically insignificant sample of planets, and that there would need to be at least 900 planets (maybe even 9 million) before any useful information could be extracted from them. And there would have been no Copernicus, Kepler, or Newton.

If epidemiologists had been around, they would have taken someone else’s rough and ready estimates of distances and performed a few adjustments to obtain six figure accuracy. Then they’d have calculated the average and standard deviation of them. Finding the earth’s solar distance to be outside their confidence interval they would have concluded that that proved that we are all turtles and that anyone who disagreed was in the pay of big banana.

: D

HABITAT III NEW URBAN AGENDA: Draft outcome document for adoption in Quito, October 2016

10 September 2016. https://habitat3.org/the-new-urban-agenda/

Frank et al., if you haven’t already, take a look at this UN kollectivist view of future cities coming to a local authority near you soon. Uniform Soviet style minimal spaced buildings and green spaces, compressing the ‘environmental footprint’ of burgeoning humanity to nix while guaranteeing ‘clean air’ and spiritual obliteration.

https://www2.habitat3.org/bitcache/97ced11dcecef85d41f74043195e5472836f6291?vid=588897&disposition=inline&op=view

Oy. Utopia! Nirvana! They pledge to end all poverty, disease, discrimination and violence, making sure that every “human settlement” (gotta love their language) houses the proper quotas of women and minorities, welcomes illegals, and has healthy communal “green spaces” from which, undoubtedly, smokers will be banned–unless, of course, they’re also banned from the housing.

Frank, I stand my ground: these arguments can be easily refuted even with the same (epidemiological) statistics invoked by tobacco controllers. The key issue is that they openly lie about the results of the studies. Notice that I do not claim that epidemiological studies on ETS are methodologically well designed, especially the “meta-analyses”, in which many (often not well related) smaller studies are joined as a large single study. Likely, these studies are not very robust methodologically speaking, as ETS exposure involves many, many biasing and “confounding” factors that are extremely hard to quantify: classification bias (who is a smoker?), degree of ventilation, time of exposure, genetics, medical history, women who lie about their smoking status because of social custom (in Asian countries), etc. Yet, even if we assume these studies are (more or less) OK, they do not reveal a statistically significant risk to justify smoking bans. Simply put: the controllers have disavowed the studies they themselves commissioned and (being a powerful lobby) have forced a prohibitionist agenda based on lies. This is acknowledged by several prominent epidemiologists, such as Dr Geoffrey Kabat (co-author with James Enstrom of perhaps the best quality ETS study). However, tobacco control also controls peer reviewed medical journals on everything that concerns ETS. Thus, physicians (even as prominent as Kabat) who express dissent on the “accepted” truth on ETS (even if they actually support bans) will not get critical articles published.

In short: the scandal is not the lack of validity of statistics, but the fact that harm from ETS has become a sort of very widespread indisputable truth based on lies, thus it is essentially a dogma (indistinguishable from witchcraft) that everybody believes (and that justifies bans). This is not because of the limitations of statistics (which I agree are evident), but because of the abuse of statistics by a politically powerful “health lobby” financed by public and Pharma money.

A good reference criticizing this abuse of statistics is the article “Lies, damned lies & 400,000 smoking related deaths”, by Levy and Marimont (can be googled). They criticize how results on ETS are misrepresented, and how they are then believed because laymen are profoundly ignorant of statistics. Levy and Marimont also show how the 400,000 numbers are crudely fabricated by a “garbage in garbage out” SAMMEC program. They provide references (which I have read) where the methodological assumptions in SAMMEC’s design are criticized: SAMMEC uses as reference sample a high income population, while simply using a more socially representative sample already reduces these deaths to less than 300,000. Simply changing weight factors for each “smoking related disease” (which are assigned arbitrarily) and eliminating multiple deaths leads to a larger decrease. Again: it is not that statistics is a flawed discipline, the problem is that it is used incorrectly and dishonestly to accommodate a political agenda.

Finally: to be correctly applicable, statistics requires large samples and effective control of biasing and confounding factors. Most ETS studies (in fact most Public Health studies on lifestyle issues) fail to meet these conditions. ETS harm studies have found relative risks that are NOT statistically significant, meaning statistical instability or low predictive power: any slight design modification or uncontrolled biasing or confounding factors could yield very large deviations of evaluated relative risk, which (following the standard methodological advice in epidemiology) implies dismissing such risk, as is done whenever smoking is not being studied. Obviously, on ETS (and to a lesser degree on primary smoking) the controllers do not even follow these standard rules. This is the scandal, not the lack of validity of statistics.

Sorry for the long post.

Whilst I cannot disagree with your main thrust, what I would ask though is, “What is a statistically significant number and who determines it?”

I come from a background in Chemistry where, for example, in just 12 grams of Carbon you have 6.02 x 10 to the power of 23 atoms of Carbon and I would argue that is a statistically significant number for when you perform an experiment on them.

So, when you come to people, the maximum sample size is 7 x 10 to the power of 9 and, unlike the carbon atoms each of the 7 billion is different. This I believe is where the weakness of statistical analysis comes in. As I understand it, one of the basic assumptions in statistical analysis is that the subjects in any analysis respond identically to each other under the test circumstances. In human experiments, that’s why the subjects are split into bands based on sex and again by age.

Although you’ve demonstrated that, in the relevant studies, the results were not statistically significant, I’d argue that it is not possible to get statistically significant results, ever, when studying humans.

Hi Petesquiz. My background is in Physics, so I understand what you mean. Huge samples (ensembles) of the size of Avogadro’s number involving identical particles make it easier to design consistent statistical experiments in Physics and Chemistry. Although quantum particles are no longer distinguishable, this doesn’t change things much as long as you consider large ensembles. All this explains the predictive power and success of Statistical Mechanics and applied Quantum Physics. However, sample size is distinct from statistical significance. You can have a statistical correlation that is not statistically significant even if your sample is of the size of Avogadro’s number. Also, statistical experiments can be ill-designed (roughly for the reasons you mention when comparing atoms with people) even if the sample is huge. Also, depending on the problem at hand, a sample size of 7 × 10^9 is already sufficiently large to get consistent and meaningful statistical results for a well-designed experiment or observation.

For any sample and any experiment, “statistical significance” of a correlation is measured in terms of 95% confidence intervals. Why 95% and not (say) 99%? Well, it is simply an arbitrary (though reasonable) agreed convention, just like many other agreed conventions in science. As long as you refer to this convention, results of a given experiment or observation are meaningful and can be compared with other experiments or observations.
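To make the convention concrete for a relative risk, here is a minimal Python sketch using the common log-based approximation for the confidence interval. The cohort counts (120 cases among 10,000 exposed, 100 among 10,000 unexposed) are invented purely for illustration and do not come from any study mentioned in this thread; the function name is likewise hypothetical.

```python
import math


def relative_risk_ci(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Relative risk with an approximate confidence interval (log method).

    z = 1.96 corresponds to the conventional 95% level; z = 2.576 to 99%.
    """
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Standard error of ln(RR) for two independent cohorts
    se_log = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to give a relative risk of 1.2
    rr, lower, upper = relative_risk_ci(120, 10_000, 100, 10_000)
    significant = not (lower <= 1.0 <= upper)
    print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), "
          f"significant at the 95% level: {significant}")
```

Under this convention a relative risk counts as statistically significant only if the interval excludes 1.0; a wider interval (for example at 99%) makes that harder to achieve.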

When applied to human populations, the consistency of statistical results also requires a large sample, but sizes in the thousands (say 100,000) are large enough. Polls are based on samples of around 1000 that fulfill certain criteria. The consistency of results strongly depends on the design, which necessarily depends on the problem being studied or observed. For infectious (communicable) diseases the design simplifies (even if the actual cause is not known) because it is easier to know if the causing agent acts or does not act, and when acting in different individuals there is likely not to be a lot of variation in the effects and symptoms (there are a few strains of malaria, not hundreds of them). Therefore, it is easier to interpret the statistical significance of epidemiological studies. However, this is not so for non-communicable disease and chronic disease related to issues of lifestyle, which involve many variables that are extremely hard to quantify. Achieving a good experimental design becomes extremely complicated (even for large sample sizes). Therefore, statistical significance of relative risks must be taken with reservations, and risks that are not statistically significant are more likely spurious. Thus, a < 20% relative risk of lung cancer (even lower figures for vascular disease) from ETS is completely unreliable and baseless, which means that smoking bans are 99% based on ideology and prejudice. But this does not mean that statistics is a flawed discipline.
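The remark about poll sizes can be illustrated with the textbook margin-of-error formula for a proportion. This is a minimal sketch that assumes simple random sampling and the worst case p = 0.5; real polls use weighting and other design criteria that it ignores, and the function name is invented for the example.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"n = {n:>7}: margin of error = +/- {100 * margin_of_error(n):.2f} percentage points")
```

The margin shrinks only with the square root of the sample size, and it says nothing about bias or confounding, which is the design problem described above.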

In my modest opinion, the tragedy of modern Public Health medicine is its attempt to treat complicated health phenomena related to lifestyles (like smoking, drinking, over-eating, sugar) as if they were simple communicable disease. To be able to treat (say) smoking as a sort of communicable disease (the "smoking epidemic") they argue that simple statistical methods that work for communicable disease apply to "smoking related" disease, which is a totally ignorant assumption. Yet, controllers do not care about basing their anti-smoking crusade on sound science; their power and fear mongering allow them to ignore facts and get away with it.

Thanks for that. It helps me to better understand the use/misuse of statistical analysis.