We’ve talked about the philosophy of science, cognitive bias, cognitive dissonance and material-reductionism. All of these things impact what science is, how we ‘do’ science and how we interpret results.
Now I’d like to shift gears a bit and talk about randomized controlled trials (RCTs). This is an important subject because RCTs have become the primary way we evaluate new pharmaceuticals and other medical interventions - including acupuncture.
First off, let’s define what we’re talking about. A randomized controlled trial is a scientific study methodology used primarily in the pharmaceutical industry. For some reason it has come to be considered the ‘gold standard’ for evaluating drug efficacy. Here’s how it works: a patient population is gathered, typically people experiencing a particular disease state. This population is randomly divided into at least two groups: a treatment group and a placebo group. The treatment group receives the pharmaceutical being studied; the placebo group receives a sugar pill or some other intervention that appears identical to the real treatment but is actually inert. The group receiving placebo is known as the control. Most of the time these studies are blinded, typically double-blind: neither the researchers nor the patients are supposed to know whether any given individual is receiving the real medication or the placebo.
To summarize: a randomized controlled trial takes a set of people, randomly divides them into treatment and placebo groups, and then administers the treatment such that no one involved knows whether they are administering or receiving the real thing.
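That randomization step is simple enough to sketch in a few lines of Python. This is a toy illustration only; the function name and the even 50/50 split are my own choices, not part of any real trial protocol:

```python
import random

def randomize(participants, seed=None):
    # Shuffle the pool and split it down the middle into a
    # treatment group and a placebo (control) group.
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

treatment, control = randomize(range(100), seed=42)
print(len(treatment), len(control))  # 50 50
```

Real trials use more careful allocation schemes (block or stratified randomization), but the core idea is exactly this: chance, not a human, decides who gets the real thing.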
Trials usually have a hard end date. After a pre-defined number of treatments, doses or days, the study is halted and the gathered data are analyzed. Researchers are interested in several things:
How did the actual substance compare to placebo? We’ve learned that any treatment, real or not, carries some element of placebo effect in its outcome. The simple fact that patients believe they’re being treated can often improve their condition. When it comes to pharmaceuticals, we’re not interested in the placebo portion of any improvement. The placebo control group gives us a statistical estimate, for this particular group of people, of how much merely thinking they were receiving treatment affected their condition. That amount of improvement is then ‘subtracted’ from the results of the treatment group so we can get an idea of how effective the pharmaceutical is by itself.
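The ‘subtraction’ works out to simple arithmetic. Here’s a quick Python illustration with made-up improvement scores (the numbers are hypothetical, not from any real trial):

```python
# Hypothetical average improvement in some symptom score for each group.
treatment_improvement = 12.0  # group that got the real drug
placebo_improvement = 4.5     # control group that got the inert pill

# Subtract the placebo effect to estimate the drug's own contribution.
drug_effect = treatment_improvement - placebo_improvement
print(drug_effect)  # 7.5
```

In other words, of the 12 points of improvement the treated group reported, only about 7.5 can be credited to the drug itself in this made-up example.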
What sort of side effects were experienced during the trial? This is an important piece of information. From the standpoint of medical ethics, potential side effects have to be disclosed to any patient who could wind up taking the drug.
What sort of adverse events were experienced during the trial? Also an important piece of information. Adverse events can be thought of as highly undesirable side effects. They’re usually much more serious and relate directly to the safety of the drug under study.
In order for this information to be meaningful, studies need large numbers of people (the study ’n’) who have the condition or health issue in question and who represent a cross-section of the general population. This is where we start to run into trouble.
A large study size (bigger ’n’) gives us a more statistically reliable outcome. However, that outcome applies at the population level. In other words, the larger the study, the more precise the average result, but the less it tells us about how any given individual may respond to that particular treatment. It’s kind of a ‘study uncertainty principle’.
Studies are also often not true cross-sections of a given population. For a variety of ethical reasons, pregnancy will often disqualify a person from being a study participant. This means, for a wide range of drugs, we have no idea of safety or efficacy in pregnant women. Children and women who may be breast-feeding are also often excluded from studies for the same ethical reasons, which gives rise to a similar problem.
Depending on geographic area, study demographics in terms of racial composition can end up skewed. We know, for example, that drugs for treating hypertension can have different outcomes which appear to depend somewhat on the race and age of the patient under treatment. It’s important to have as wide a population cross-section as possible, including racial factors. Unless, of course, the drug is targeted at a particular slice of the demographic and would likely never be used by the rest of it.
Real people with real health issues are often much more complicated than study participants. Most studies try to limit so-called ‘co-morbidities’. This means they want people in the trial with one particular health issue and no other complicating conditions. That doesn’t describe the majority of actual patients, though.
Study outcomes are often reported as drug ‘x’ reduces some health issue by ‘y%’ in an RCT with ’n’ participants with a ‘p’ value < 0.05. Let’s pick this apart.
“Drug ‘x’ reduces some health issue by ‘y%’”. Sounds impressive, especially in commercials. The problem is, it’s a population-level statistic that doesn’t tell us much about how any individual patient might fare. Also, that ‘y%’ reduction was measured in a population with no, or at least minimal, concurrent health issues. Any given patient might not match that profile. They might be on other drugs that speed up or slow down the rate at which the new drug is metabolized. A variety of factors can come into play that may keep an individual from getting the same result as the study group.
Let’s take the flu shot as an example. Randomized controlled trials claim the flu vaccine is about 68% effective (1). Sounds impressive. These studies typically expose study participants to the flu, vaccinate them against that flu strain and then see how many people actually develop the flu. Pretty straightforward. Here’s the problem: this is telling us how effective the vaccine is when we are deliberately exposed to a known strain and then vaccinated against that strain. This is not how things work in ‘the wild’. The question then becomes, what are my odds of getting the flu, under real world conditions, in the first place? Well, as it turns out, pretty low. On average an unvaccinated person has about a 7% chance of getting the flu (1). Your risk drops to 1.9% with vaccination, an absolute reduction of 5.1 percentage points (1). That’s not nearly as impressive as 68%. No wonder they don’t use that statistic in marketing.
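The gap between those two numbers is the difference between relative and absolute risk reduction, and it’s easy to check with a few lines of Python using the real-world figures above:

```python
# Real-world risks quoted above (1).
risk_unvaccinated = 0.07   # ~7% chance of getting the flu without the shot
risk_vaccinated = 0.019    # ~1.9% chance with it

# Absolute risk reduction: the modest drop of ~5.1 percentage points.
arr = risk_unvaccinated - risk_vaccinated

# Relative risk reduction: the kind of big number marketing prefers.
rrr = arr / risk_unvaccinated

print(round(arr, 3))  # 0.051
print(round(rrr, 2))  # 0.73, i.e. roughly a 73% relative reduction
```

Notice that dividing the same small absolute drop by the small baseline risk produces a headline-sized relative number, which is why it pays to ask which kind of ‘reduction’ a claim is quoting.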
The flu example above also highlights the difference, in a medical context, between two words that most folks (especially in the media) seem to think mean the same thing: efficacy and effectiveness. Used in a medical context, these two words DO NOT mean the same thing (2). Efficacy is how well a treatment performs under controlled laboratory conditions (2). Effectiveness is how well it performs in the real world (2). The 68% number is the flu vaccine’s efficacy, NOT its effectiveness.
“In an RCT with ’n’ participants”. There are a few problems related to the study size, or ’n’. First off, if it’s too small then the study isn’t telling us much that is useful. How wide a net are you casting with an n of 20 or 30? Remember, these folks will be randomly divided into at least two groups, so we could end up with only 10 or so actually receiving treatment. We like to see large study sizes, but as I mentioned earlier this pulls us further and further away from how any given individual might react to the therapy. A study might also start with a decent-sized group only to have folks drop out before completion for a variety of reasons. It’s important to know how many dropped, why they dropped and which group they were in (control or treatment). Studies, particularly big Pharma studies, often leave this detail out.
“With a ‘p’ value < 0.05”. This is a bit of statistical information. The ‘p’ value tells us how likely the observed result would be if the treatment actually did nothing. In scientific terms, it’s the probability of finding the observed result when the null hypothesis of a study question is true (3). The null hypothesis of a study usually amounts to “no difference between the study groups” (3). There is a conventional cutoff for significance, usually 0.05: if there truly were no difference between the groups, a result at least this extreme would be expected less than 5% of the time. This is often loosely described as researchers being ‘95% sure’ the treatment did what they thought it would do, though strictly speaking that’s not what a ‘p’ value measures. The 0.05 cutoff is arbitrary, and there have been arguments put forth that ‘p’ values should be lower before we can derive any meaning from them. The other thing to note here is that the ‘p’ value works like a switch: either a result meets the ‘p’ limit and is significant, or it doesn’t and is not. Some researchers have been hedging by claiming “nearly significant” for ‘p’ values bigger than the current limit of 0.05. That’s not how things work (3).
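One way to make the null-hypothesis idea concrete is a permutation test: shuffle the group labels many times and ask how often chance alone produces a difference as big as the one actually observed. Here’s a minimal sketch in pure Python, with made-up outcome scores (everything here is illustrative, not from any real study):

```python
import random

def permutation_p_value(treatment, control, n_perm=10_000, seed=0):
    # Probability of seeing a difference in group means at least as
    # large as the observed one, if the group labels were meaningless
    # (the null hypothesis of 'no difference between study groups').
    rng = random.Random(seed)
    observed = abs(sum(treatment) / len(treatment) - sum(control) / len(control))
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / (len(pooled) - n_t))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Made-up symptom scores for illustration only.
treated = [8, 9, 7, 10, 9, 8, 11, 9]
placebo = [5, 6, 7, 5, 6, 4, 6, 5]
p = permutation_p_value(treated, placebo)
print(p < 0.05)  # True: shuffling almost never reproduces a gap this large
```

The point of the sketch is the logic, not the numbers: a small ‘p’ means the observed gap would rarely arise by chance, which is exactly what “the probability of the result when the null hypothesis is true” cashes out to.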
And then we have a couple of biases that creep into the picture: study and publication. These two subjects are worthy of entire blog entries in and of themselves. I’ll give a short explanation here and, if there’s interest, I’ll throw up another post later with more detail.
Bias in this context is defined as any tendency which prevents unprejudiced consideration of a question (4). In research, bias occurs when systematic error is introduced into sampling or testing by selecting or encouraging one outcome or answer over others (4). Because scientists are humans and all humans are subject to bias, this issue is always present. The question is: to what degree is bias present in any particular piece of research or publication?
And that’s about as far down the study/publication bias rabbit hole as I’d like to go right now. This is a very complicated subject, as there are lots of ways bias can creep into study method, peer review and ultimate publication.
To wrap this up, I’d like to point out that doctors themselves often don’t find RCTs very useful. They’re not particularly interested in how a drug fares against placebo. They are, however, interested in how a new therapy compares to an existing therapy for a given condition. Unfortunately, only a limited slice of new drugs is tested in this ‘head to head’ manner. Typically these are cancer therapies or treatments for other conditions where we would run into ethical problems by deliberately leaving a segment of the study population untreated. For anyone interested in the MD perspective, I suggest Dr. Ben Goldacre’s TED talk (https://youtu.be/RKmxL8VYy0M).
(1) Frost, J. (9 November 2016). How Effective Are Flu Shots? Retrieved from http://blog.minitab.com/blog/adventures-in-statistics-2/how-effective-are-flu-shots using the Wayback Machine (web.archive.org)
(2) Joi, P. (19 November 2020). What is the difference between efficacy and effectiveness? Retrieved from https://www.gavi.org/vaccineswork/what-difference-between-efficacy-and-effectiveness?gclid=CjwKCAiAqaWdBhAvEiwAGAQltpAQm0hJ_xXc4_GKePxC5Nxk21qK1-Zw8abtDpVo5R9-mDFHkCoNShoCvaAQAvD_BwE
(3) P Values (2022). Retrieved from https://www.statsdirect.com/help/basics/p_values.htm
(4) Pannucci, C.J., Wilkins, E.G. (1 August 2010). Identifying and Avoiding Bias in Research. Plastic and Reconstructive Surgery. 126(2); 619-625.