Talk:Statistical distance
This article is rated Start-class on Wikipedia's content assessment scale.
The contents of the Statistically close page were merged into Statistical distance on 11 July 2023. For the contribution history and old versions of the redirected page, please see its history; for the discussion at that location, see its talk page.
Comparison table
It would be nice to have a comparison table showing how far each distance or divergence measure is from being a metric. Please feel free to fill in missing (-) data; references are welcome too. I mostly copied the data from the articles on these distances:
| Divergence between distributions | Symmetric | Nonnegative | Triangle inequality | Identity of indiscernibles | Metric |
|---|---|---|---|---|---|
| Kullback–Leibler divergence | no | yes | no | yes | no |
| Hellinger distance | yes | yes | yes | yes | yes |
| Total variation distance of probability measures | yes | yes | yes | yes | yes |
| Jensen–Shannon divergence | yes | yes | no | yes | no |
| Jensen–Shannon distance | yes | yes | yes | yes | yes |
| Lévy–Prokhorov metric | yes | yes | yes | yes | yes |
| Bhattacharyya distance | yes | yes | no | yes | no |
| Wasserstein metric | yes | yes | yes | yes | yes |

| Divergence between a point and a distribution | Symmetric | Nonnegative | Triangle inequality | Identity of indiscernibles | Metric |
|---|---|---|---|---|---|
| Mahalanobis distance | - | - | - | - | - |
Olli Niemitalo (talk) 10:22, 3 December 2014 (UTC)
- That's nice. I've completed missing entries on the total variation distance. - Saibod (talk) 23:54, 3 March 2016 (UTC)
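For future editors who want to sanity-check entries, here is a minimal sketch (Python; the two-point example distributions are my own toy choices, not from any source) that numerically illustrates two of the "no" entries above: the Kullback–Leibler divergence is not symmetric, and the Bhattacharyya distance can violate the triangle inequality.

```python
# Sketch: spot-check two "no" entries in the table with toy
# two-point distributions (values chosen purely for illustration).
from math import log, sqrt

def kl(p, q):
    """Kullback-Leibler divergence: sum_i p_i * ln(p_i / q_i)."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bhattacharyya(p, q):
    """Bhattacharyya distance: -ln(sum_i sqrt(p_i * q_i))."""
    return -log(sum(sqrt(pi * qi) for pi, qi in zip(p, q)))

P, Q, R = [0.9, 0.1], [0.1, 0.9], [0.5, 0.5]

# "Symmetric: no" for Kullback-Leibler: the two directions differ.
print(kl(P, R), kl(R, P))  # ~0.368 vs ~0.511

# "Triangle inequality: no" for Bhattacharyya:
# here D(P, Q) > D(P, R) + D(R, Q).
print(bhattacharyya(P, Q))                       # ~0.511
print(bhattacharyya(P, R) + bhattacharyya(R, Q)) # ~0.223
```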
There is no mention of the statistical distance used in 100% of the crypto papers I've encountered: SD(u, v) = 1/2 ∑ |v_i - u_i|. Is there a reason for that, or is it just missing? For example, "Three XOR-Lemmas -- An Exposition" by Oded Goldreich states it. "Randomness Extraction and Key Derivation Using the CBC, Cascade and HMAC Modes" by Yevgeniy Dodis, Rosario Gennaro, Johan Hastad, Hugo Krawczyk and Tal Rabin states the same definition. Those are the first two papers I checked.
David in oregon (talk) 05:08, 1 October 2016 (UTC)
Never mind. It was a special case of the total variation distance. David in oregon (talk) 23:27, 1 October 2016 (UTC)
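For anyone else who lands here wondering the same thing: a minimal sketch (Python; the example distributions are made up for illustration) showing that the crypto-style definition above coincides with the total variation distance on discrete distributions.

```python
# Sketch: the "statistical distance" used in cryptography,
# SD(u, v) = 1/2 * sum_i |v_i - u_i|, equals the total variation
# distance max_A |u(A) - v(A)| over events A, for discrete
# distributions. Toy distributions chosen for illustration.
from itertools import chain, combinations

u = [0.5, 0.3, 0.2]
v = [0.4, 0.4, 0.2]

# Crypto-style definition: half the L1 distance.
sd = 0.5 * sum(abs(vi - ui) for ui, vi in zip(u, v))

# Total variation: maximize |u(A) - v(A)| over all events A
# (all subsets of the outcome space -- feasible for tiny examples).
indices = range(len(u))
events = chain.from_iterable(
    combinations(indices, r) for r in range(len(u) + 1))
tv = max(abs(sum(u[i] for i in A) - sum(v[i] for i in A)) for A in events)

print(sd, tv)  # both 0.1 -- the two definitions agree
```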
Proposed merge of Statistically close into Statistical distance
Statistically close is a type of measurement within the wider statistical distance topic. Emir of Wikipedia (talk) 22:17, 6 May 2023 (UTC)
I agree. Thatsme314 (talk) 09:10, 5 June 2023 (UTC)
- Merger complete. Klbrain (talk) 05:20, 11 July 2023 (UTC)
Apparent contradiction with the "Distance" article
The name "statistical distance" leads the reader to think that it is a distance in the mathematical sense. However, it is not always one, since it does not necessarily satisfy the axioms of a distance. It would be nice to make that fact clearer by saying that typical statistical distances are not distances. At present the article only says that statistical distances are not metrics, which leads the reader to believe that distances and metrics are not the same thing. 65.254.109.31 (talk) 15:05, 25 May 2023 (UTC)
- My view is that Statistical distance#Terminology gives readers a sufficient sense of the imprecision (or differences in definition used by different authors) of measures that use that term. However, please do add text to the article if you think that this will make that distinction clearer! Klbrain (talk) 05:24, 11 July 2023 (UTC)
Kullback–Leibler divergence can be infinite
The article is incorrect in classifying Kullback–Leibler as a divergence. The Kullback–Leibler divergence can be infinite and hence fails to be a "divergence", which (per the article's definition) must be real-valued. (Cf. Lawvere metric spaces, which are basically divergences, except that they allow ∞ and require the triangle inequality.) Maybe we should talk about "extended" divergences(/premetrics/prametrics/quasi-distances). Thatsme314 (talk) 09:41, 5 June 2023 (UTC)
- It is a divergence, and it cannot be infinite. It is just undefined for some inputs. Tercer (talk) 16:56, 5 June 2023 (UTC)
- No, you’re mistaken; the domain of a divergence must be, according to the article, a Cartesian product S × S, where S is the set of probability measures on the σ-algebra in question, unless you’re using the convention that → denotes a partial function. Thatsme314 (talk) 00:39, 6 June 2023 (UTC)
- There's no such restriction. This article is not that precise because, frankly, this is just pedantry. The KL divergence can either be defined on the whole set of probability distributions by letting its codomain be the extended real line, or its domain can be restricted to those probability distributions for which the formula is defined. In either case, it is a divergence.
- Moreover, it is the prototypical example of a divergence; if anything went wrong, the definition of divergence would be changed to fit the KL divergence.
- In any case, Wikipedia is about sources, not the mathematical knowledge of the editors, so we are calling it a divergence here simply because the sources do so. Tercer (talk) 07:53, 6 June 2023 (UTC)
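To make the disputed case concrete, here is a minimal sketch (Python; the toy distributions are my own choice) of exactly where the defining sum diverges.

```python
# Sketch: KL(P||Q) = sum_i P_i * ln(P_i / Q_i) blows up whenever
# Q assigns zero probability to an outcome that P gives positive
# probability. Toy distributions chosen for illustration.
from math import log

P = [0.5, 0.5]
Q = [1.0, 0.0]  # Q puts no mass on the second outcome

def kl(p, q):
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue             # the 0 * ln(0/q) term is taken as 0
        if qi == 0:
            return float("inf")  # the infinite/undefined case at issue
        total += pi * log(pi / qi)
    return total

print(kl(P, Q))  # inf: extended-real-valued, or undefined, by convention
```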
Unclear terminology
The section Distances as metrics contains this sentence:
"Note that condition 1 and 2 together produce positive definiteness)".
But the link to a definition of positive definiteness does not give a definition.
Apparently whoever wrote that sentence was not aware that the linked article contains not one but two distinct definitions of a positive definite function.
That is simply bad writing if you either don't know what you're linking to, or don't care.
I hope someone familiar with this subject can fix this.
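For reference, here is the reading I assume the quoted sentence intends, written out as a short math sketch: in the metric-space sense, "positive definiteness" of a distance d is conditions 1 (non-negativity) and 2 (identity of indiscernibles) taken together, which is distinct from the positive definite functions described in the linked article.

```latex
% Metric-space sense of "positive definiteness" of a distance d
% (my reading of the quoted sentence; not the positive definite
% functions/kernels defined in the linked article):
% condition 1 (non-negativity) together with condition 2
% (identity of indiscernibles):
\[
  d(x, y) \ge 0
  \quad\text{and}\quad
  \bigl( d(x, y) = 0 \iff x = y \bigr).
\]
```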