Sufficiency and minimal sufficiency
Sufficient statistic
I find that in the notation of a statistic it is better to reflect the dependence on the argument. So I write $T(X)$ for a statistic, where $X$ is a sample, instead of a faceless single letter such as $T$ or $U$.
Definition 1. The statistic $T(X)$ is called sufficient for the parameter $\theta$ if the distribution of $X$ conditional on $T(X)$ does not depend on $\theta$.
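Definition 1 can be checked by brute force in a small discrete model. The sketch below assumes an i.i.d. Bernoulli($\theta$) sample with $T(X)$ equal to the sum of the observations; this model and the function names `joint_pmf` and `conditional_given_sum` are illustrative choices, not part of the original text. It computes the distribution of $X$ conditional on $T(X)=t$ for two different values of $\theta$.

```python
from itertools import product

def joint_pmf(x, theta):
    """Joint pmf of an i.i.d. Bernoulli(theta) sample x (a tuple of 0s and 1s)."""
    s = sum(x)
    return theta ** s * (1 - theta) ** (len(x) - s)

def conditional_given_sum(n, t, theta):
    """Distribution of X conditional on T(X) = sum(X) = t, as a dict x -> probability."""
    # The level set {x : T(x) = t}, found by enumerating all binary samples of length n.
    level_set = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    total = sum(joint_pmf(x, theta) for x in level_set)  # P(T(X) = t)
    return {x: joint_pmf(x, theta) / total for x in level_set}

# Same conditioning event, two different parameter values (assumed for illustration).
cond_a = conditional_given_sum(n=4, t=2, theta=0.3)
cond_b = conditional_given_sum(n=4, t=2, theta=0.8)
for x in cond_a:
    print(x, round(cond_a[x], 6), round(cond_b[x], 6))
```

In this model the two columns agree: given the sum, every arrangement of zeros and ones is equally likely whatever $\theta$ is, which is what sufficiency of the sum means here.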
The main results on sufficiency and minimal sufficiency become transparent if we look at them from the point of view of Maximum Likelihood (ML) estimation.
Let $f_X(x,\theta)$ be the joint density of the vector $X=(X_1,...,X_n)$, where $\theta$ is a parameter (possibly a vector). The ML estimator is obtained by maximizing over $\theta$ the function $f_X(x,\theta)$ with $x$ fixed at the observed data. The estimator depends on the data and can be denoted $\hat{\theta}_{ML}(x)$.
Fisher-Neyman theorem. $T(X)$ is sufficient for $\theta$ if and only if the joint density can be represented as
(1) $f_X(x,\theta)=g(T(x),\theta)k(x)$
where, as the notation suggests, $g$ depends on $x$ only through $T(x)$ and $k$ does not depend on $\theta$.
Maximizing the left side of (1) over $\theta$ is the same thing as maximizing $g(T(x),\theta)$ because $k$ does not depend on $\theta$. But this means that $\hat{\theta}_{ML}(x)$ depends on $x$ only through $T(x)$. A sufficient statistic is all you need to find the ML estimator. This interpretation is easier to understand than the definition of sufficiency.
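The claim that the ML estimator depends on the data only through a sufficient statistic can be illustrated numerically. This is a minimal sketch under assumptions not made in the original: an i.i.d. Bernoulli($\theta$) sample, the sum as the sufficient statistic, and a simple grid search (the helper `ml_estimate` is hypothetical) in place of an analytic maximizer.

```python
import math

def log_likelihood(x, theta):
    """Log-likelihood of an i.i.d. Bernoulli(theta) sample; uses x only through sum(x)."""
    s = sum(x)
    return s * math.log(theta) + (len(x) - s) * math.log(1 - theta)

def ml_estimate(x, grid_size=999):
    """Grid-search maximizer of the log-likelihood over theta in (0, 1)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda theta: log_likelihood(x, theta))

x = (1, 0, 0, 1, 1)
y = (0, 1, 1, 1, 0)  # a different sample with the same sum, 3
z = (1, 0, 0, 0, 0)  # a sample with a different sum, 1

print(ml_estimate(x), ml_estimate(y), ml_estimate(z))
```

The samples `x` and `y` differ but share the value of the statistic, so the search returns the same estimate for both; `z` has a different sum and gets a different estimate.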
Minimal sufficient statistic
Definition 2. A sufficient statistic $T(X)$ is called minimal sufficient if for any other sufficient statistic $S(X)$ there exists a function $h$ such that $T(X)=h(S(X))$.
A level set is a set of type $\{x:T(x)=c\}$ for a constant $c$ (which in general can be a constant vector). See the visualization of level sets. A level set is also called a preimage and denoted $T^{-1}(c)=\{x:T(x)=c\}$. When $T$ is one-to-one the preimage contains at most one point. When $T$ is not one-to-one some preimages contain more than one point. The wider a preimage is, the less information about the sample the statistic carries (because many data sets are mapped to a single point and you cannot tell one data set from another by looking at the statistic value). In the definition of the minimal sufficient statistic we have
$\{x:T(x)=c\}=\{x:h(S(x))=c\}=\{x:S(x)\in h^{-1}(c)\}$.
Since $h^{-1}(c)$ generally contains more than one point, this shows that the level sets of $T$ are generally wider than those of $S$. Since this is true for any sufficient $S$, $T(X)$ carries less information about $X$ than any other sufficient statistic.
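The comparison of level sets can be made concrete by enumeration. The following sketch assumes binary samples of length 3 and two statistics chosen for illustration: the sum, and the pair consisting of the first observation and the sum of the rest. The sum is a function of the pair, which is the relation "one statistic is a function of the other" from Definition 2. All names in the code are hypothetical.

```python
from collections import defaultdict
from itertools import product

def level_sets(statistic, samples):
    """Group samples by the value of the statistic: value -> list of samples."""
    groups = defaultdict(list)
    for x in samples:
        groups[statistic(x)].append(x)
    return dict(groups)

def coarse(x):
    """The sum of the sample."""
    return sum(x)

def fine(x):
    """A finer statistic: the first observation and the sum of the rest."""
    return (x[0], sum(x[1:]))

def link(s):
    """Function with coarse(x) == link(fine(x)) for every sample x."""
    return s[0] + s[1]

samples = list(product((0, 1), repeat=3))
coarse_sets = level_sets(coarse, samples)
fine_sets = level_sets(fine, samples)

for c, xs in sorted(coarse_sets.items()):
    print("sum =", c, "->", xs)
for s, xs in sorted(fine_sets.items()):
    print("pair =", s, "->", xs, "inside level set of sum", link(s))
```

Every level set of the finer statistic sits inside one level set of the sum, so the sum has fewer and wider level sets and tells us less about the sample.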
Definition 2 is an existence statement and is difficult to verify directly because it contains the quantifiers "for any" and "exists". Again it's better to relate it to ML estimation.
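Suppose for two sets of data $x,y$ there is a positive number $k(x,y)$ such that
(2) $f_X(x,\theta)=k(x,y)f_X(y,\theta)$.
Maximizing the left side we get the estimator $\hat{\theta}_{ML}(x)$. Maximizing $f_X(y,\theta)$ we get $\hat{\theta}_{ML}(y)$. Since $k(x,y)$ does not depend on $\theta$, (2) tells us that $\hat{\theta}_{ML}(x)=\hat{\theta}_{ML}(y)$. Thus, if two sets of data satisfy (2), the ML method cannot distinguish between $x$ and $y$ and supplies the same estimator. Let us call $x,y$ indistinguishable if there is a positive number $k(x,y)$ such that (2) is true.
An equation $T(x)=T(y)$ means that $x,y$ belong to the same level set.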
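Condition (2) says that the ratio of the two densities is the same positive number for every parameter value. A numerical check of this on a finite grid of parameter values can be sketched as follows, assuming an i.i.d. normal sample with unknown mean and unit variance. The model, the grid, the tolerance and the function names are all assumptions made for illustration, and a finite grid only gives evidence, not a proof.

```python
import math

def log_density(x, theta):
    """Log joint density of an i.i.d. N(theta, 1) sample x."""
    n = len(x)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((xi - theta) ** 2 for xi in x)

def log_ratio_spread(x, y, thetas):
    """Max minus min of log f(x, theta) - log f(y, theta) over the grid of thetas."""
    values = [log_density(x, t) - log_density(y, t) for t in thetas]
    return max(values) - min(values)

def indistinguishable(x, y, thetas, tol=1e-9):
    """True if the likelihood ratio is numerically constant over the grid."""
    return log_ratio_spread(x, y, thetas) < tol

thetas = [-2.0, -0.5, 0.0, 1.0, 3.0]
x = (1.0, 2.0, 3.0)
y = (0.0, 2.0, 4.0)  # same sum as x
z = (1.0, 2.0, 4.0)  # different sum

print(indistinguishable(x, y, thetas))
print(indistinguishable(x, z, thetas))
```

In this model the log of the ratio equals a term free of the mean plus the mean times the difference of the two sums, so the ratio is constant exactly when the sums agree.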
Characterization of minimal sufficiency. A statistic $T(X)$ is minimal sufficient if and only if its level sets coincide with sets of indistinguishable $x,y$.
The advantage of this formulation is that it relates a geometric notion of level sets to the ML estimator properties. The formulation in the guide by J. Abdey is:
A statistic $T(X)$ is minimal sufficient if and only if the equality $T(x)=T(y)$ is equivalent to (2).
Rewriting (2) as
(3) $\frac{f_X(x,\theta)}{f_X(y,\theta)}=k(x,y)$
we get a practical way of finding a minimal sufficient statistic: form the ratio on the left of (3) and find the sets along which the ratio does not depend on $\theta$. Those sets will be level sets of $T(X)$.
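The recipe based on the likelihood ratio can be tried on a small discrete model. The sketch below assumes an i.i.d. Bernoulli($\theta$) sample of length 4 and checks constancy of the ratio on a finite grid of parameter values. The model, the grid and the helper names are illustrative assumptions. It partitions all possible samples into classes along which the ratio does not depend on $\theta$ and prints which sums occur in each class.

```python
import math
from itertools import product

def log_pmf(x, theta):
    """Log joint pmf of an i.i.d. Bernoulli(theta) sample x."""
    s = sum(x)
    return s * math.log(theta) + (len(x) - s) * math.log(1 - theta)

def ratio_is_constant(x, y, thetas, tol=1e-9):
    """True if f(x, theta) / f(y, theta) is numerically the same for all thetas."""
    values = [log_pmf(x, t) - log_pmf(y, t) for t in thetas]
    return max(values) - min(values) < tol

def indistinguishability_classes(samples, thetas):
    """Partition samples into classes on which the likelihood ratio is constant."""
    classes = []
    for x in samples:
        for cls in classes:
            # Compare with one representative; constancy of the ratio is transitive.
            if ratio_is_constant(x, cls[0], thetas):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
samples = list(product((0, 1), repeat=4))
classes = indistinguishability_classes(samples, thetas)
for cls in classes:
    print("sums in class:", sorted({sum(x) for x in cls}), "size:", len(cls))
```

Each class contains samples with a single value of the sum, and different classes have different sums, so in this model the classes are the level sets of the sum.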