p = 0.06: So Close Yet So Far?

October 9, 2020 Michael Nguyen-Truong Early Career Research Community Statistics

The annual Summer Biomechanics, Bioengineering, and Biotransport (SB³C) conference, originally scheduled to be held at a ski resort in Vail, was unfortunately moved to a virtual platform due to the COVID-19 pandemic. On the message boards, a conference participant quipped that the shift to an online meeting was like being presented with a p-value of 0.06 (a p-value being the probability of seeing data at least as extreme as those observed if there were truly no effect, commonly used to judge statistical significance). In a flood of replies, conference attendees and even the organizers agreed, with much applause and laughter, that this shift was painful: despite being only a virtual message away from any given participant, we were so close yet so far from one another. While that jest brought me a good chuckle, it was a stark reminder of my upsetting run-ins with p = 0.06.

After collecting my data from a spate of weeklong experiments involving long and late hours in the lab, I was sitting at my desk ready to analyze them. The data, nested in uniform gray and white cells in an Excel spreadsheet, were neatly compiled and grouped. I did what I always do first: I calculated the averages and standard errors of the two groups of data. The averages differed considerably (check). The errors were relatively small (check). Now in a good mood, I plotted my data as a bar plot and saw elevated bars of distinct heights, each with tight standard error bars (check). All that was left was to adorn the graph with an asterisk above one of the bars, like a blinking star atop a Christmas tree, to indicate a statistically significant difference between the two groups. Before proceeding, however, I first needed to run a statistical analysis (i.e., obtain a p-value) comparing the two groups, which was and always will be the most nerve-wracking part. As I was entering the data into the statistical software, my palms got sweaty and my heart raced. I heaved a sigh, selected the Mann-Whitney U test from my test options (the two groups were not normally distributed), and said “here goes nothing” as I pressed Enter.

The result: p = 0.06. My jaw dropped and I facepalmed at the sight of this abhorrent p-value. That was when I realized that my data, although the groups looked clearly different, were now reduced to a measly “trend.” In my biomedical engineering undergraduate and graduate training, I had been taught to regard p < 0.05 as the gold standard for determining statistical significance; a p-value slightly greater than 0.05 could be regarded only as a trend. Therefore, those particular data, while valid and with the potential to spark other research avenues, could not be submitted or published on their own; if they were to be presented, they would need some significant or positive data as a counterpart. This p-value, so close yet so far from being significant, was like being awarded Honorable Mention from the National Science Foundation Graduate Research Fellowship Program (similar to a “runner-up” or “finalist” distinction in any given competition).
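For readers who want to see how such a borderline result can arise, the workflow described above (group means, standard errors, then a Mann-Whitney U test) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two groups of five values are invented for the example and are not the data from these experiments, the function names are hypothetical, and the exact p-value is obtained by brute-force enumeration, which is practical only for very small groups. In everyday use, a library routine such as `scipy.stats.mannwhitneyu` would normally be the tool of choice.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def u_statistic(a, b):
    """U for group a: count of (x, y) pairs with x > y; ties count one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def mann_whitney_exact(a, b):
    """Return (U, two-sided exact p-value) by enumerating every relabelling.

    Every way of assigning len(a) of the pooled values to group a is treated
    as equally likely under the null hypothesis of no group difference.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    centre = n_a * n_b / 2  # expected U when the groups do not differ
    u_obs = u_statistic(a, b)
    observed = abs(u_obs - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabellings at least as far from the centre as the observed U.
        if abs(u_statistic(grp_a, grp_b) - centre) >= observed - 1e-12:
            extreme += 1
    return u_obs, extreme / total


# Hypothetical measurements, invented purely for illustration.
control = [1.2, 1.5, 1.7, 1.9, 3.1]
treated = [2.1, 2.4, 2.8, 3.3, 3.6]

for name, group in (("control", control), ("treated", treated)):
    print(f"{name}: mean = {mean(group):.2f}, SE = {standard_error(group):.2f}")

u, p = mann_whitney_exact(control, treated)
print(f"U = {u:.0f}, p = {p:.3f}")
print("significant at 0.05" if p < 0.05 else "a 'trend': not significant at 0.05")
```

With these made-up numbers the script reports U = 3 and p = 0.056: a single control value that overlaps the treated group is enough to push the result just past the 0.05 threshold, even though the group means look clearly different.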

From previous discussions with peers and mentors, I learned that there is always a push to publish positive or statistically significant findings rather than negative or non-statistically significant ones, because they garner more excitement and are looked upon more favorably by reviewers. Of course, no one openly claims that negative or non-statistically significant work cannot be published, but the preference for positive results remains an unspoken belief that many scientists hold. Consequently, results like mine are often consigned to the pile of “unpublished data,” never to be included in a table or graph and forever banished from the light of day (or, in this case, a manuscript).

However, findings like these are still important and can have considerable impact, provided that an appropriate statistical test was used. Negative results inform the scientific community about what has been attempted but failed. Knowing that two groups do not differ significantly, or are similar, is just as useful as knowing that they do differ significantly. Even knowing that a result is a trend (e.g., p = 0.06) may lead to further investigation or to the identification of putative, but currently unknown, mechanisms.

In the end, reports of what hasn’t worked or does not differ would save all scientists a great deal of time, resources, and energy. Especially now, during a global pandemic, time is of the essence, and knowing what has failed would help researchers worldwide narrow down promising vaccine and treatment candidates. The effects are not limited to slowed research progress across the community, either. Hesitation over, or rejection of, manuscripts that include these important data can adversely affect publication productivity and funding outcomes, which in turn can limit career advancement, especially in academia.

Our job as scientists is to report the true phenomena we observe from rigorous experimentation, free from emotion and bias. Thus, publishing non-statistically significant or negative data should become more acceptable given its (significant) value to the scientific community. Journals should be willing to devote a section to null but valid data (e.g., collections in Nature, PLOS, etc.), and productivity assessments should be made holistically, not just based on what was successful, in order to gauge a researcher’s true output more accurately. Empirical data doesn’t always have to be exciting to be useful: I used to fear it, but I, for one, now welcome the “boring” data with open arms.

Featured image is under Pexels License and free to use.