Talk:Misuse of p-values

From Wikipedia, the free encyclopedia

Semi-protected edit request on 4 September 2018

References #4 and #5 are identical. Please edit the citation to reference #5 that follows the sentence "The p-value fallacy is a common misinterpretation of the p-value whereby a binary classification of hypotheses as true or false is made, based on whether or not the corresponding p-values are statistically significant." to refer to reference #4 instead. Amoriarty21 (talk) 23:11, 4 September 2018 (UTC)

Done, thank you. Gulumeemee (talk) 04:11, 5 September 2018 (UTC)

P-value Fallacy Section Should Be Removed

This issue was raised years ago, and it appears that the conclusion in this talk page was that "p value fallacy" is not a standard, consistently defined term. Apparently, a single user has been fighting for its inclusion in this article, but it seems to me that is not enough. Certainly giving "p value fallacy" an entire section in the article amounts to undue weight for a term that is hardly ever actually used in science or statistics--making the term's inclusion here misleading regarding what terminology is commonly used. Moreover, as was pointed out in an earlier discussion on this talk page, in the rare cases when the term "p value fallacy" is actually used, it isn't used consistently. Thus, including a section on the "p value fallacy" is not only unnecessary for understanding the topic of the article, but is also potentially confusing. 164.67.15.175 (talk) 21:12, 24 September 2018 (UTC)

The top hit on GScholar - https://scholar.google.com/scholar?hl=en&as_sdt=0%2C39&q=p-value+fallacy - has over 1k citations, which I think makes the definition used there, at least, worth inclusion. — Charles Stewart (talk) 08:45, 25 September 2018 (UTC)

The "top hit on Google scholar" that you're referring to (which is actually an opinion piece) defines the p-value fallacy as "the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result." That is NOT the definition given in this wiki article: "The p-value fallacy is a common misinterpretation of the p-value whereby a binary classification of hypotheses as true or false is made." Thus, the "top hit on Google scholar" actually illustrates the point that "p value fallacy" is an inconsistently defined and potentially confusing term. Furthermore, the "p value fallacy" (as defined in the "top hit on Google scholar") isn't even demonstrably a fallacy, though the authors may consider it so. Thus, including it in this wiki article amounts to POV, which is inappropriate. This is supposed to be an article about objective MISUNDERSTANDINGS, not about controversial opinions. 23.242.198.189 (talk) 01:57, 26 September 2018 (UTC)

I agree that we should not be saying that there is a particular inference that is the p-value fallacy, but the fact that this term has some currency justifies a section by that name. What should go in that section is another matter. — Charles Stewart (talk) 07:16, 27 September 2018 (UTC)

That seems rather backward to me. It doesn't make sense to include a section just because we like the name of the section, without consideration for whether the content of the section is actually relevant to the topic of the article. Note also that the fact that a term or phrase has "some currency" is not enough to make that term merit a section in the article. People have come up with all sorts of terms, many of which have "some currency." That doesn't mean they all belong in an article on misunderstanding p-values. 164.67.15.175 (talk) 00:04, 29 September 2018 (UTC)

Just checking in, I see that still no one has provided any counterargument in favor of keeping the content of the "p value fallacy" section. Please remove it. 23.242.198.189 (talk) 01:23, 12 October 2018 (UTC)

The section belongs here. See refs in the section. Headbomb {t · c · p · b} 01:35, 12 October 2018 (UTC)

"See the refs" is not a legitimate argument--especially given that the refs were obviously already "seen" because they were addressed in this discussion. 23.242.198.189 (talk) 07:22, 16 October 2018 (UTC)

Let's settle this once and for all, now that the inappropriately applied "semi-protected status" has been lifted. We can go through the section sentence-by-sentence and see that it is not valid.

Sentence 1: The p-value fallacy is a common misinterpretation of the p-value whereby a binary classification of hypotheses as true or false is made, based on whether or not the corresponding p-values are statistically significant.

The cited source for that sentence defining the p-value fallacy is A PAPER THAT DOES NOT EVEN CONTAIN THE TERM "P-VALUE FALLACY." So right off the bat, we can see there is something very wrong here.

Sentence 2: The term 'p-value fallacy' was coined in 1999 by Steven N. Goodman.

The "p-value fallacy" defined by Goodman in the cited article is NOT what is described in the preceding sentence (the "binary classification of hypotheses as true or false"). Instead, Goodman defines "p-value fallacy" as "the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result." In other words, Goodman is making a Bayesian critique of p-values. In fact, Goodman's paper is an OPINION PIECE that criticizes the use of "frequentist statistics" altogether! Goodman's opinion that using p-values in conventional frequentist null hypothesis testing is based on "fallacy" is just that--an opinion. It would be relevant in an article on controversies or debates about p-values, but this wiki article is supposed to be about MISUSES of p-values, so including POV here directly contradicts wiki policy.

Sentence 3: This fallacy is contrary to the intent of the statisticians who originally supported the use of p-values in research.

This is more POV that cites the same Goodman article. Curiously, this sentence also cites a Sterne and Smith article (another opinion piece), which DOES NOT EVEN CONTAIN THE TERM "P-VALUE FALLACY."

Sentence 4: As described by Sterne and Smith, "An arbitrary division of results, into 'significant' or 'non-significant' according to the P value, was not the intention of the founders of statistical inference."

That may or may not be true. It doesn't actually matter, because again, that Sterne and Smith opinion piece DOES NOT EVEN CONTAIN THE TERM "P-VALUE FALLACY," and what Sterne and Smith are describing here does not even appear to be equivalent to what Goodman defined as the p-value fallacy.

Sentence 5: In contrast, common interpretations of p-values discourage the ability to distinguish statistical results from scientific conclusions, and discourage the consideration of background knowledge such as previous experimental results.

This is POV again, that again cites the opinion piece by Goodman.

Sentence 6: It has been argued that the correct use of p-values is to guide behavior, not to classify results, that is, to inform a researcher's choice of which hypothesis to accept, not to provide an inference about which hypothesis is true.

This is POV yet again, that yet again cites the opinion piece by Goodman. At least here, the wording includes the phrase "It has been argued that..." to acknowledge the POV. It should be noted that in addition to citing the Goodman piece, the sentence also cites another article (one by Dixon). Dixon's article, in contrast to Goodman's, does in fact define the p-value fallacy similarly to how it is defined in Sentence 1. However, the fact is that the term SIMPLY HAS NOT CAUGHT ON. A Google scholar search shows that even the handful of articles that have cited some aspect or another of the Dixon paper have rarely (if ever) used the term "p-value fallacy." The same goes for articles that have cited the Goodman paper. In fact, if you search Google scholar for articles containing the phrase "p-value fallacy," in nearly every hit the phrase only appears in the reference section of the article (as part of a citation of the Goodman paper).

In summary, the "p-value fallacy" is: (a) not a term that is in common enough use to merit mention, (b) a term that, even when it is used, is not used consistently, as this very wiki article illustrates, and (c) when used as the person who originally "coined" the term intended, not really a definitive fallacy, so it does not belong in this wiki article because it constitutes partisan Bayesian POV. It should also be noted that the problems with the "p-value fallacy" section have been mentioned numerous times before, going back years (search this talk page to see). It's time to put this silliness to bed once and for all. The section is unnecessary (because the term is fairly obscure), inappropriate (because it contains POV), and confusing (because it can't even agree with itself about the definition of the term it's talking about).

A final note: The main advocate for keeping the section has been the editor Headbomb, who showed similar resistance to removing the COMPLETELY INCORRECT section on the false discovery rate a while back (as shown in this talk page). When challenged to present an argument for keeping the "p-value fallacy" section (scroll up a few paragraphs), Headbomb said simply the following: "The section belongs here. See refs in the section." I hope that I have sufficiently demonstrated here that, after "seeing the refs," it is clearer than ever that the section does NOT belong here. — Preceding unsigned comment added by 23.242.198.189 (talk) 04:47, 8 September 2019 (UTC)

No link back to the p-value article

Shouldn’t there be at least one? — Preceding unsigned comment added by 194.5.225.252 (talk) 16:01, 2 December 2019 (UTC)

There is a link at the start of the second sentence. Mgnbar (talk) 16:21, 2 December 2019 (UTC)

As Mgnbar noted, there IS in fact a link to the p-value article. Moreover, even if there weren't, that's the type of minor, noncontroversial edit that should simply be performed, without opening a new section of the talk page. 130.182.24.154 (talk) 18:51, 4 December 2019 (UTC)

Opposite sides of 0.05

@23.242.198.189: You reverted my addition

  1. "Studies with p-values on opposite sides of 0.05 are not in conflict." "Studies statistically conflict only when the difference between their results is unlikely to have occurred by chance, corresponding to when their confidence intervals show little or no overlap".[1]

with the comment "Reverted good faith edit. It isn't clear what "in conflict" means. This seems like a subjective thing, not an objective misconception." In this case "in conflict" means to disagree about the underlying reality or to contradict each other. As far as I understand it, it is not subjective at all, but maybe we can find a wording that is better. Do you (or anyone else) have an idea to more clearly express the misconception? Nuretok (talk) 14:24, 3 March 2021 (UTC)

The author of the linked article proposes that studies "statistically conflict" if the confidence intervals have "little to no overlap." But "statistically conflict" is not a standard term, and it isn't clear conceptually what it means. Moreover, the author's criteria for "statistical conflict" are subjective. For example, why "little to no overlap" and not simply "no overlap"? One could argue that if there is any overlap at all between two independent confidence intervals for the same parameter, then the two intervals are compatible. Or one could argue that every pair of independent confidence intervals is in conflict, even if they heavily overlap, because they do not have exactly the same upper and lower bounds. Or one could argue that two confidence intervals conflict only if they disagree about the direction of the effect. Or one could argue that the very idea of conflict between two confidence intervals is nonsensical, because if you conducted two independent studies to estimate the same parameter, then the correct thing to do would be to compute a single confidence interval using the data pooled from the two studies. In any case, I think the misconception the author is actually getting at is already listed in the wiki article: the misconception that "there is generally a scientific reason to consider results on opposite sides of [.05] as qualitatively different." I see no reason to add a separate misconception to the list that says basically the same thing while introducing the questionable and potentially confusing concept of "statistical conflict." 23.242.198.189 (talk) 10:16, 12 March 2021 (UTC)

Thank you for your explanation. I agree that it basically is a rewording of the sentence which is already on the list. Nuretok (talk) 19:28, 12 March 2021 (UTC)
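
For anyone who wants to see the point of this thread in numbers, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two studies and their estimates are made up, the tests are simple normal-approximation (Wald) tests, and the pooling is a plain inverse-variance (fixed-effect) combination. None of it comes from the Goodman article, and the function names are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def wald(estimate, se, z_crit=1.959964):
    """Return (p-value, 95% CI) for H0: effect = 0, normal approximation."""
    p = two_sided_p(estimate / se)
    return p, (estimate - z_crit * se, estimate + z_crit * se)

def difference_p(est_a, se_a, est_b, se_b):
    """p-value for H0: two independent studies estimate the same effect."""
    se_diff = math.hypot(se_a, se_b)  # sqrt(se_a^2 + se_b^2)
    return two_sided_p((est_a - est_b) / se_diff)

def pool_fixed_effect(est_a, se_a, est_b, se_b):
    """Inverse-variance (fixed-effect) pooled estimate and standard error."""
    w_a, w_b = 1 / se_a**2, 1 / se_b**2
    pooled_est = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    pooled_se = math.sqrt(1 / (w_a + w_b))
    return pooled_est, pooled_se

# Hypothetical studies (made-up numbers): same precision, different estimates.
EST_A, SE_A = 0.25, 0.10
EST_B, SE_B = 0.10, 0.10

p_a, ci_a = wald(EST_A, SE_A)
p_b, ci_b = wald(EST_B, SE_B)
p_diff = difference_p(EST_A, SE_A, EST_B, SE_B)
pooled_est, pooled_se = pool_fixed_effect(EST_A, SE_A, EST_B, SE_B)
p_pooled, ci_pooled = wald(pooled_est, pooled_se)

print(f"Study A: p = {p_a:.3f}, 95% CI = ({ci_a[0]:.3f}, {ci_a[1]:.3f})")
print(f"Study B: p = {p_b:.3f}, 95% CI = ({ci_b[0]:.3f}, {ci_b[1]:.3f})")
print(f"Test of A vs. B difference: p = {p_diff:.3f}")
print(f"Pooled: p = {p_pooled:.3f}, 95% CI = ({ci_pooled[0]:.3f}, {ci_pooled[1]:.3f})")
```

With these made-up inputs, Study A falls below .05 and Study B falls above it, yet the direct test of the difference between the two estimates is itself far from significant and the two intervals overlap. The last line shows the pooled interval mentioned above as the alternative to comparing the two intervals at all.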

References

  1. ^ Goodman, Steven (2008-07-01). "A Dirty Dozen: Twelve P-Value Misconceptions". Seminars in Hematology. Interpretation of Quantitative Research. 45 (3): 135–140. doi:10.1053/j.seminhematol.2008.04.003. ISSN 0037-1963.

The redirect P-hunting has been listed at redirects for discussion to determine whether its use and function meets the redirect guidelines. Readers of this page are welcome to comment on this redirect at Wikipedia:Redirects for discussion/Log/2024 April 21 § P-hunting until a consensus is reached. Utopes (talk / cont) 17:35, 21 April 2024 (UTC)