African American Boys and Autism
A shocking reanalysis of CDC data claimed that African American boys had a greater risk of autism associated with when they were vaccinated, and that the agency had covered up this finding. The study ignited a wave of conspiracy charges across the web. But did its author understand how to do statistical analysis?
In 2004, Dr. Frank DeStefano et al. from the Centers for Disease Control and Prevention (CDC) published a case-control study that compared the vaccination records of autistic kids with those of non-autistic kids in a population of children in Atlanta, Georgia. The study found no link between autism and the timing of the MMR vaccine. DeStefano’s paper and several others pointing to the lack of evidence of a relationship between autism and vaccination were coauthored by William Thompson, another CDC vaccination expert.
Recently, Thompson shared CDC data from this study with Brian Hooker, who has a PhD in biochemical engineering and who appears to hold deep convictions about the relationship of autism to vaccination. Hooker reanalyzed the data, adding some original data that the CDC analysis had left out, and found a large effect for African American boys that had not previously been reported.
This effect, a 3.4-fold increase in the risk of autism, sent shockwaves through the anti-vaccination movement this summer, and those shockwaves were amplified by a YouTube video claiming evidence of a CDC cover-up.
But Hooker, who appears to have had little training in statistical analysis, made errors that can seriously distort the statistical picture, making associations appear that may well be spurious. Perhaps this is why Hooker’s paper was retracted. But a retraction does not prove a paper false, and a closer look may shed some light on the critical importance of study design.
First of all, Hooker chose to look at African American boys because he already knew there was an association in his sample. Though this might intuitively seem like the obvious and right thing to do, it isn’t. In fact, it’s a basic error in statistical analysis. If you first look at your data and then pick the most extreme subgroup out of all the data you have, you are likely to find associations that arose by chance and do not exist in the population as a whole. This is why a study’s protocol is designed before the data are collected; choosing the comparisons after the data have been collected is tantamount to cherry picking.
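To see why picking the most extreme subgroup after looking at the data is a problem, here is a small simulation. It is a sketch under invented assumptions, not a model of the CDC data: a hypothetical number of subgroups, 150 cases and 150 controls in each, and the same 90 percent vaccination probability for cases and controls, so there is no real association anywhere. The function names are made up for this illustration; the test used is a standard pooled two-proportion z-test.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:  # every child vaccinated (or none): no evidence either way
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def cherry_pick_rate(n_subgroups, n_per_group=150, p_vaccinated=0.9,
                     n_sims=1000, seed=1):
    """Share of simulated null datasets in which the most extreme subgroup
    shows a 'significant' case-control difference (p < 0.05)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        smallest_p = 1.0
        for _ in range(n_subgroups):
            # Cases and controls have the SAME vaccination probability,
            # so any difference we see is pure chance.
            x_cases = sum(rng.random() < p_vaccinated for _ in range(n_per_group))
            x_controls = sum(rng.random() < p_vaccinated for _ in range(n_per_group))
            p = two_proportion_p_value(x_cases, n_per_group, x_controls, n_per_group)
            smallest_p = min(smallest_p, p)
        if smallest_p < 0.05:
            hits += 1
    return hits / n_sims


print("one prespecified subgroup:", cherry_pick_rate(n_subgroups=1))
print("best of eight subgroups:  ", cherry_pick_rate(n_subgroups=8))
```

With one subgroup chosen in advance, the false-positive rate should land near the nominal 5 percent. Taking the best of eight hypothetical subgroups pushes it toward 1 − 0.95^8, roughly a third, even though nothing is going on in any of them.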
Cherry picking aside, big questions remain about what exactly Hooker did (statistically speaking) in his paper. Let’s say, for the sake of argument, that the protocol had been designed to consider African American kids as a separate group before any data were collected. Did Hooker try to calculate the relative risk of autism, depending on vaccination status? Unfortunately, using DeStefano’s data, he simply couldn’t have.
To explain why the relative risk of autism could not have been calculated with the CDC data, we take a brief but relevant foray into conditional probability: DeStefano et al. found that among autistic kids, vaccination rates were similar to those among kids who did not have autism.
This is distinctly different from the question of whether, among vaccinated kids, there is a higher rate of autism.
In the language of conditional probability, if A is the condition that someone has autism, and B is the condition that someone was vaccinated early, then DeStefano et al. evaluated the probability that a child was vaccinated early given that the child has autism. In mathematical notation this is written Pr(B | A), where the | stands for “given”.
DeStefano compared this conditional probability with the probability that a child was vaccinated early given that the child does not have autism, denoted Pr(B | not-A). Neither is the same as the probability of autism among vaccinated children, Pr(A | B).
There is a way to get from Pr(B | A) to Pr(A | B) – if you know the overall probabilities. Bayes’ Theorem says that
Pr(A | B) = Pr(B | A) × Pr(A) / Pr(B)
So if you know the probability of vaccination Pr(B) and the probability of autism Pr(A), then you can go from the probability of vaccination given autism to the probability of autism given vaccination.
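As a worked example of the formula, the sketch below first builds Pr(B) from the law of total probability and then applies Bayes’ Theorem. The inputs are assumptions for illustration: Pr(B | A) and Pr(B | not-A) are the 36-month vaccination rates that DeStefano et al. reported (93.4 and 90.6 percent, quoted later in this piece), and Pr(A) = 0.015 is simply a value picked from the CDC’s 1 to 2 percent range, not a measured prevalence for the study population.

```python
def total_probability(p_b_given_a, p_b_given_not_a, p_a):
    """Pr(B) = Pr(B | A) * Pr(A) + Pr(B | not-A) * (1 - Pr(A))."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' Theorem: Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B)."""
    return p_b_given_a * p_a / p_b


# Inputs for illustration only.
p_a = 0.015              # Pr(A): ASSUMED autism prevalence (CDC range: 1 to 2 percent)
p_b_given_a = 0.934      # Pr(B | A): vaccinated by 36 months, autistic kids
p_b_given_not_a = 0.906  # Pr(B | not-A): vaccinated by 36 months, non-autistic kids

p_b = total_probability(p_b_given_a, p_b_given_not_a, p_a)
p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(f"Pr(B)     = {p_b:.4f}")          # 0.9064
print(f"Pr(A | B) = {p_a_given_b:.4f}")  # 0.0155
```

Note that Pr(B) itself had to be assembled using Pr(A), so without a trustworthy figure for the prevalence of autism in the population the data came from, the conversion cannot be made.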
Why is this important? Because it points to two fatal mistakes Hooker made in his attempt to discredit DeStefano et al.
In his introduction, Hooker describes the work by DeStefano et al. in these terms: “the study authors assessed relative risk for obtaining an autism diagnosis for those children receiving the first MMR vaccine before and after 18 months, 24 months and 36 months of age. Destefano [sic.] et al. found a statistically significant relative risk of 1.49 (95% confidence interval [CI]: 1.04 – 2.14) at the 36 month cut-off (i.e., in a comparison of children receiving the MMR before versus after 36 months).”
In this account, it would seem that DeStefano et al. found that kids who received the MMR vaccine before 36 months had a higher risk of autism compared to kids who received it after 36 months.
But DeStefano et al. were not calculating the risk of autism given vaccination; they found the rate of vaccination given autism. Hooker implies that children who receive the vaccination are more likely to get autism. But DeStefano showed that children who have autism are more likely to have been vaccinated.
There’s another subtle point, which reveals Hooker’s lack of expertise in statistics: DeStefano et al. were speaking about an odds ratio, while Hooker was speaking about relative risk. The odds ratio and the relative risk point in the same direction: if either is above 1.0, it implies that one group has more risk than the other. But odds ratios and relative risks can tell very different stories; here’s a nice example of how.
In DeStefano et al.’s data, 93.4 percent of autistic kids had been vaccinated by 36 months of age, compared with 90.6 percent of kids who did not have autism. The reported odds ratio was 1.49, and it was statistically significant.
On the other hand, the ratio of the two vaccination rates, which you could call the “relative risk” of vaccination given autism, is 93.4 / 90.6, or about 1.03. In other words, kids who have autism were about 3 percent more likely to have been vaccinated by 36 months than kids who do not.
This is a long way from the outrage produced by Hooker’s confusing statement, which suggests that children vaccinated with MMR before 36 months of age have a 49 percent greater chance of autism. Nope. Autistic kids have a higher rate of vaccination. According to DeStefano, this higher rate of vaccination results from parents seeking medical care and educational services for children with autism.
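The arithmetic behind the odds ratio and the ratio of vaccination rates can be checked directly from the reported percentages. The sketch below (variable names chosen for this illustration) computes a crude odds ratio, which comes out near 1.47 rather than the published 1.49; the published figure presumably comes from an analysis that respects the matched design, which two rounded percentages cannot reproduce, so treat this as an approximation.

```python
def odds(p):
    """Convert a proportion into odds, p / (1 - p)."""
    return p / (1 - p)


p_vacc_autism = 0.934     # share of autistic kids vaccinated by 36 months
p_vacc_no_autism = 0.906  # share of non-autistic kids vaccinated by 36 months

odds_ratio = odds(p_vacc_autism) / odds(p_vacc_no_autism)  # crude, ignores matching
ratio_of_rates = p_vacc_autism / p_vacc_no_autism           # "relative risk" of vaccination

print(f"crude odds ratio:           {odds_ratio:.2f}")      # 1.47
print(f"ratio of vaccination rates: {ratio_of_rates:.2f}")  # 1.03
```

The same 2.8-point gap in vaccination rates yields an odds ratio near 1.5 but a ratio of rates near 1.03, because odds magnify differences between proportions that are close to 100 percent.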
But what about Hooker’s hook?
The biggest statistical challenge of the autism/vaccination debate is deciding which data can be compared. Ideally, we could simply corral a group of kids from birth, have them lead completely equivalent lives, and vaccinate only half of them, giving the other half just saline or no injections at all. Then we could watch and wait: are the vaccinated kids more likely to develop autism? Such an experiment would obviously be unethical for multiple reasons, so instead our main tool is observational: design a study, collect data, and analyze those data.
Hooker maintains that DeStefano didn’t use all the data he collected, and that, had the missing data been included, a much stronger relationship between vaccination and autism would have emerged. In particular, Hooker’s retracted article claims to have shown that African American boys do have a higher risk of autism if they are vaccinated before 36 months. And Hooker used the DeStefano data to show it, or so he claims.
While it may seem of little statistical import, the design of the DeStefano study is essential and determines the kinds of results one can obtain from the data collected. The DeStefano study is a case-control study, which means that kids with autism were included (the cases) and then a separate set of kids without autism (the controls) were matched by various features to be similar to the ones with autism, except that they aren’t autistic.
A key element of a case-control study is that the controls are chosen to resemble the cases in every respect that could otherwise bias the comparison. For example, if you wanted to see if there’s a link between candy and cavities, you might compare those who get cavities (cases) to those who don’t (controls) and see whether those who get cavities eat more candy. But you would want to make sure both groups have similar brushing habits and dental insurance; otherwise candy might be unfairly blamed when the difference in cavities is in fact entirely due to poor brushing.
A case-control study allows you to compare the rates of exposure to a possibly causal factor for autism; for example, one can explore whether kids who have autism are more likely to have been vaccinated than kids who do not have autism. But it cannot tell you how prevalent autism is among people who get vaccinated. That’s because you start with the cases and choose how many controls to add, rather than observing how cases arise in a large population. The case-control design is especially useful when only a small percentage of the population could be designated as a “case”. According to the CDC, approximately 1 to 2 percent of all U.S. children have an autism spectrum disorder.
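A quick calculation shows why a case-control sample says nothing by itself about how common autism is among vaccinated children. The sketch assumes the 36-month vaccination rates reported by DeStefano et al. (93.4 and 90.6 percent) hold in the sample and uses the study’s sample sizes (624 cases, 1824 controls); the 1:10 design with 6240 controls is hypothetical, and the helper name is invented for this example.

```python
def share_autistic_among_vaccinated(n_cases, n_controls, p_vacc_cases, p_vacc_controls):
    """Expected fraction of autistic children among the vaccinated children
    in a case-control sample with the given numbers of cases and controls."""
    vaccinated_cases = n_cases * p_vacc_cases
    vaccinated_controls = n_controls * p_vacc_controls
    return vaccinated_cases / (vaccinated_cases + vaccinated_controls)


# Same cases, same vaccination rates; only the investigator-chosen number of controls changes.
for n_controls in (1824, 6240):  # DeStefano's sample vs. a HYPOTHETICAL 1:10 design
    share = share_autistic_among_vaccinated(624, n_controls, 0.934, 0.906)
    print(f"{n_controls} controls: {share:.1%} of vaccinated children are autistic")
```

The apparent “rate of autism among vaccinated children” is about 26 percent with a roughly 1:3 ratio of cases to controls and about 9 percent with a 1:10 ratio. Neither is close to the 1 to 2 percent the CDC reports, because the investigator, not nature, set the number of controls.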
Yet Hooker did something funky with the data: he pulled out the African American kids and analyzed them separately. One problem with pulling a subgroup out of a case-control study is that race was not directly controlled for in the DeStefano study. DeStefano included 624 kids with an autism spectrum disorder (ASD) and 1824 controls, matched on school, birth year, and sex.
The first question one might ask is whether the African American controls were still valid controls. The controls were matched to cases on school, birth year, and sex, not race, so once the analysis is restricted to African American children, the matched case-control sets are broken apart, and bias could easily creep in.
Second, instead of following the case-control design of DeStefano et al., he compared rates of autism among kids who had (or had not) received vaccines by certain ages. He compared the rate of autism among kids who had their first MMR before 18 months to the rate among kids who had their first MMR after 18 months, and then repeated the analysis for 24 months and 36 months. (At least, that is what he seems to believe he has done in one section of his paper; in another section, he implies he is looking at vaccination rates among autistic versus non-autistic kids, which is the best he could do with his data.)
What’s wrong with this idea of taking case-control data and doing a different analysis? Again, the problem lies in how the data are selected. Hooker’s analysis would make sense if the data had been obtained in a cohort study. In a cohort study, one first chooses a population that has some exposure to a possibly dangerous substance (say, vaccination), and then waits to see who develops the disease (in this case, ASD).
The essential part of a cohort study is that the comparison group is not exposed to vaccination but is otherwise similar. One records how many children in each group develop autism, and finally compares the two rates. But the cases Hooker uses did not arise at random among a group of people exposed to vaccination; they were chosen because they were cases. In particular, there was no attempt in the choice of cases to make them representative of all kids who develop autism in a specific population. This means that they may or may not represent the kids who would have developed autism had a group of children (some vaccinated and some not) been observed over time.
Finally, there is something strange about how Hooker’s results emerge. Remember that Hooker mistakenly spoke about odds ratios as if they were relative risks when describing the results of the DeStefano et al. study. He reports having rerun those analyses and obtained exactly the same numbers as DeStefano’s odds ratios for the total group, with exactly the same confidence intervals. It is highly improbable that a relative risk comparing the likelihood of autism with and without the vaccine would come out at exactly the same value as the odds ratio of vaccination given autism versus vaccination given no autism, with exactly the same confidence interval. That seems almost too good to be true, and it is. We’re left wondering exactly what calculation Hooker’s paper reports when it comes to African American boys.
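To see why identical numbers would be surprising, the sketch below applies both calculations to one 2×2 table. The counts are hypothetical: they were rebuilt by rounding the 36-month percentages onto 624 cases and 1824 controls and they ignore the matching, so they will not reproduce the published interval exactly. `odds_ratio_ci` uses Woolf’s log-based interval; `risk_ratio_ci` treats the sample as if it were a cohort (the kind of calculation Hooker’s paper appears to describe) and uses the usual log-based interval for a risk ratio. Both names are invented for this illustration.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def odds_ratio_ci(a, b, c, d):
    """Odds ratio (a*d)/(b*c) with Woolf's log-based 95% CI.
    a = vaccinated cases, b = vaccinated controls,
    c = unvaccinated cases, d = unvaccinated controls."""
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, est * math.exp(-Z95 * se), est * math.exp(Z95 * se)


def risk_ratio_ci(a, b, c, d):
    """Share of cases among vaccinated vs. unvaccinated children, treating the
    sample as if it were a cohort (the inappropriate step), with a log-based 95% CI."""
    risk_vacc = a / (a + b)
    risk_unvacc = c / (c + d)
    est = risk_vacc / risk_unvacc
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return est, est * math.exp(-Z95 * se), est * math.exp(Z95 * se)


# HYPOTHETICAL counts: rounded 36-month percentages applied to 624 cases and
# 1824 controls, ignoring the matched design.
a, c = 583, 41    # cases: vaccinated / not vaccinated by 36 months
b, d = 1653, 171  # controls: vaccinated / not vaccinated by 36 months

for label, (est, lo, hi) in [("odds ratio", odds_ratio_ci(a, b, c, d)),
                             ("cohort-style ratio", risk_ratio_ci(a, b, c, d))]:
    print(f"{label}: {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

On this table the odds ratio is about 1.47 and the cohort-style ratio about 1.35, with different confidence intervals. The two measures only approximately coincide when cases make up a small fraction of each exposure group, which by design they do not in a case-control sample.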
The Hooker paper shows the damage that can be done when a study’s design does not fit the question being asked and its data are then analyzed incorrectly. These flaws will be obvious to statisticians and to scientists who understand statistical analysis. The problem is how to undo the damage among a public that is skeptical of scientific authority and suspicious of, or even hostile toward, vaccination.
Please note that this is a forum for statisticians and mathematicians to critically evaluate the design and statistical methods used in studies. The subjects (products, procedures, treatments, etc.) of the studies being evaluated are neither endorsed nor rejected by Sense About Science USA. We encourage readers to use these articles as a starting point to discuss better study design and statistical analysis. While we strive for factual accuracy in these posts, they should not be considered journalistic works, but rather pieces of academic writing.