In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.
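As a concrete illustration (using the common normalization, which the excerpt above does not spell out), for two discrete distributions $P = (p_1, \dots, p_k)$ and $Q = (q_1, \dots, q_k)$ the Hellinger distance can be written as

$$ H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^{2} }, \qquad 0 \le H(P, Q) \le 1. $$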
Applying this theorem to the KL divergence yields the Donsker–Varadhan representation. Attempting to apply this theorem to the general $\alpha$-divergence with $\alpha \in (-\infty, 0) \cup (0, 1)$ does not yield a closed-form solution.
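For reference (not stated in the excerpt), the Donsker–Varadhan representation mentioned here is usually written as

$$ D_{\mathrm{KL}}(P \,\|\, Q) = \sup_{T} \left\{ \mathbb{E}_{P}[T] - \log \mathbb{E}_{Q}\!\left[ e^{T} \right] \right\}, $$

where the supremum is taken over bounded measurable functions $T$.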
To find a distribution Q that is closest to P, we can minimize the KL divergence and compute an information projection. While the KL divergence is a statistical distance, it is not a metric, the most familiar type of distance, but instead a divergence. [4]
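A minimal numerical sketch of the KL divergence for discrete distributions, assuming probability vectors given as NumPy arrays (the function name kl_divergence is illustrative, not from the excerpt):

    import numpy as np

    def kl_divergence(p, q):
        """D_KL(P || Q) for discrete distributions given as probability vectors.

        Terms with p_i = 0 contribute 0 by convention; q_i must be > 0
        wherever p_i > 0 for the divergence to be finite.
        """
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    # Asymmetry: D_KL(P||Q) != D_KL(Q||P) in general, which is one reason
    # the KL divergence is a divergence rather than a metric.
    p = [0.6, 0.3, 0.1]
    q = [0.5, 0.25, 0.25]
    print(kl_divergence(p, q), kl_divergence(q, p))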
In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects. These objects can be two random variables, two probability distributions, or two samples; the distance can also be taken between an individual sample point and a population or a wider sample of points.
The total variation distance (or half the $L^1$ norm) arises as the optimal transportation cost when the cost function is $c(x, y) = \mathbf{1}_{x \neq y}$, that is,

$$ \tfrac{1}{2} \| P - Q \|_{1} = \delta(P, Q) = \inf \{ \Pr(X \neq Y) : \operatorname{Law}(X) = P,\ \operatorname{Law}(Y) = Q \} = \inf_{\pi} \operatorname{E}_{\pi}\!\left[ \mathbf{1}_{x \neq y} \right], $$

where the expectation is taken with respect to the probability measure $\pi$ on the space where $(x, y)$ lives, and the infimum is taken over all such $\pi$ with marginals $P$ and $Q$, respectively.
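In the discrete case the identity above reduces to half the sum of absolute differences; a minimal sketch, assuming probability vectors as NumPy arrays:

    import numpy as np

    def total_variation(p, q):
        """Total variation distance between two discrete distributions,
        computed as half the L1 norm of their difference."""
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        return 0.5 * float(np.abs(p - q).sum())

    print(total_variation([0.6, 0.3, 0.1], [0.5, 0.25, 0.25]))  # 0.15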
The only divergence for probabilities over a finite alphabet that is both an f-divergence and a Bregman divergence is the Kullback–Leibler divergence. [8] The squared Euclidean divergence is a Bregman divergence (corresponding to the function $x^{2}$) but not an f-divergence.
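To make the Bregman side concrete (a standard derivation, not part of the excerpt): for a strictly convex generator $F$, the Bregman divergence is

$$ B_{F}(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle, $$

and choosing $F(x) = \|x\|^{2}$ gives

$$ B_{F}(p, q) = \|p\|^{2} - \|q\|^{2} - 2 \langle q,\, p - q \rangle = \|p - q\|^{2}, $$

which is the squared Euclidean divergence referred to above.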
Viewing the Kullback–Leibler divergence as a measure of distance, the I-projection is the "closest" distribution to q among all the distributions in P. The I-projection is useful in setting up information geometry, notably because of the following inequality, valid when P is convex: [1]
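The inequality itself is missing from the excerpt; in its standard Pythagorean-type form, with $p^{*}$ denoting the I-projection of $q$ onto $P$, it reads

$$ D_{\mathrm{KL}}(p \,\|\, q) \geq D_{\mathrm{KL}}(p \,\|\, p^{*}) + D_{\mathrm{KL}}(p^{*} \,\|\, q) \qquad \text{for all } p \in P. $$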
By Chentsov's theorem, the Fisher information metric on statistical models is the only Riemannian metric (up to rescaling) that is invariant under sufficient statistics. [3][4] It can also be understood to be the infinitesimal form of the relative entropy (i.e., the Kullback–Leibler divergence); specifically, it is the Hessian of ...
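The relationship alluded to at the truncated end of the excerpt is commonly stated as follows (a standard identity; the parametric family $p_{\theta}$ and the index notation are mine, not from the excerpt):

$$ g_{jk}(\theta) = \mathbb{E}_{\theta}\!\left[ \frac{\partial \log p_{\theta}(x)}{\partial \theta_{j}} \, \frac{\partial \log p_{\theta}(x)}{\partial \theta_{k}} \right] = \left. \frac{\partial^{2}}{\partial \theta'_{j} \, \partial \theta'_{k}} D_{\mathrm{KL}}\!\left( p_{\theta} \,\|\, p_{\theta'} \right) \right|_{\theta' = \theta}, $$

i.e., the Fisher information metric is the Hessian of the Kullback–Leibler divergence evaluated at coinciding arguments.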