Mixture density Information & Mixture density Links at HealthHaven.com

In statistics, a mixture density is a probability density function that is expressed as a convex combination of other probability density functions. It can be interpreted as a two-step sampling process: first pick one of the component densities according to some probability distribution, then draw a sample from the chosen component. Because of this two-step interpretation, mixture densities are also called hierarchical models; see also graphical models and hierarchical Bayes models.

It is important to distinguish between a random variable whose density is a weighted sum of a set of component densities (i.e. a mixture) and a random variable whose value is the sum of the values of two or more independent random variables, in which case the density is given by the convolution of the individual densities.
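The distinction can be checked numerically. The following is a minimal sketch, assuming two illustrative normal components N(0, 1) and N(10, 1) with equal weights; the names `draw_mixture` and `draw_sum`, the seed and the sample size are choices made for this illustration only.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_DRAWS = 20000

def draw_mixture():
    # Two-step draw: choose a component with probability 1/2, then sample from it.
    mean = 0.0 if rng.random() < 0.5 else 10.0
    return rng.gauss(mean, 1.0)

def draw_sum():
    # Sum of two independent draws; its density is the convolution, here N(10, 2).
    return rng.gauss(0.0, 1.0) + rng.gauss(10.0, 1.0)

mixture_draws = [draw_mixture() for _ in range(N_DRAWS)]
sum_draws = [draw_sum() for _ in range(N_DRAWS)]

# Theory: the mixture has mean 5 and variance 1 + 25 = 26,
# whereas the sum has mean 10 and variance 2.
print("mixture:", statistics.fmean(mixture_draws), statistics.pvariance(mixture_draws))
print("sum:    ", statistics.fmean(sum_draws), statistics.pvariance(sum_draws))
```

The mixture is bimodal with a large spread, while the sum is a single normal distribution centred at 10.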

Such distributions arise in many contexts in the literature and are often cited as a means to represent non-normality or facilitate the identification and/or characterisation of sub-populations or categories within empirical data. Finite mixtures are of use in this latter regard since they are amenable to the representation of systems displaying macro-scale heterogeneity.

Finite mixtures

Density of a mixture of three Gaussians (μ = 5, 10, 15, σ = 2) with equal weights

Given a set of probability density functions p_1(x),\dots,p_n(x) (called the mixture components) and weights w_1,\dots,w_n such that w_i \geq 0 and \sum w_i = 1, the sum (which is a convex combination)

 q(x) = \sum_{i=1}^n \, w_i \, p_i(x)

is called the mixture density. This type of mixture, being a finite sum, is called a finite mixture, and in applications, an unqualified reference to a "mixture density" usually means a finite mixture.
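As an illustration, the sketch below evaluates and samples the three-component Gaussian mixture from the figure caption (means 5, 10, 15, standard deviation 2, equal weights). The function names, the integration grid and the seed are assumptions made for this example.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sigmas):
    """q(x) = sum_i w_i p_i(x) for normal components."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def sample_mixture(rng, weights, means, sigmas):
    """Two-step draw: pick a component index by weight, then sample that component."""
    i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return rng.gauss(means[i], sigmas[i])

weights = [1 / 3, 1 / 3, 1 / 3]
means = [5.0, 10.0, 15.0]
sigmas = [2.0, 2.0, 2.0]

# Riemann-sum check that q integrates to approximately 1 over [-10, 30].
step = 0.01
area = sum(mixture_pdf(-10.0 + k * step, weights, means, sigmas) * step
           for k in range(4001))

rng = random.Random(1)
draws = [sample_mixture(rng, weights, means, sigmas) for _ in range(5000)]
print(area, sum(draws) / len(draws))
```

The numerical area should be close to 1 and the sample average close to the mixture mean of 10.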

The mixture components are often not arbitrary probability density functions, but instead are members of a parametric family (such as normal distributions), with different values for a parameter or parameters, in which case one may write the sum as:

 q(x) = \sum_{i=1}^n \, w_i \, p(x,a_i)

for one parameter, or

 q(x) = \sum_{i=1}^n \, w_i \, p(x,a_i,b_i)

for two parameters, and so forth.

Infinite mixtures

More generally, one can mix over an infinite family of components, replacing the finite sum by an integral.

Consider a probability density function p(x,a) for a variable x, parameterized by a. That is, for each value of a in some set A, p(x,a) is a probability density function with respect to x. Given a probability density function w (meaning that w is nonnegative and integrates to 1), the function

 q(x) = \int_A \, w(a) \, p(x,a) \, da

is again a probability density function for x, called the mixture density defined by the mixture components p(x,a) and the weighting function w.

Note that this reduces to the case of a finite mixture if w has finite support, and that it is possible to have discrete but infinite mixtures, such as weights at the integers.
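A simple continuous mixture can be checked numerically. The sketch below assumes, purely for illustration, that p(x, a) is a normal density with mean a and standard deviation sigma, and that the weighting function w is a normal density with mean 0 and standard deviation tau. In that case the mixture is known to be normal with mean 0 and variance sigma^2 + tau^2, which the midpoint-rule integral should reproduce.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def continuous_mixture_pdf(x, sigma, tau, half_width=10.0, steps=4000):
    """Approximate q(x) = integral of w(a) p(x, a) da with a midpoint rule.

    The range [-half_width * tau, half_width * tau] truncates negligible tails of w.
    """
    lo = -half_width * tau
    da = 2.0 * half_width * tau / steps
    total = 0.0
    for k in range(steps):
        a = lo + (k + 0.5) * da
        total += normal_pdf(a, 0.0, tau) * normal_pdf(x, a, sigma) * da
    return total

sigma, tau = 1.0, 2.0
for x in (0.0, 1.5, 4.0):
    closed_form = normal_pdf(x, 0.0, math.sqrt(sigma ** 2 + tau ** 2))
    print(x, continuous_mixture_pdf(x, sigma, tau), closed_form)
```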

Continuous infinite mixtures often have other names, such as hierarchical models.

History

Mixture distributions and the problem of mixture decomposition, that is, the identification of the constituent components and their parameters, have been cited in the literature as far back as 1846 (Quetelet in McLachlan [1], 2000), although common reference is made to the work of Karl Pearson (1894) as the first author to explicitly address the decomposition problem, in characterising non-normal attributes of forehead to body length ratios in female shore crab populations. The motivation for this work was provided by the zoologist Walter Frank Raphael Weldon, who had speculated in 1893 (in Tarter and Lock [2]) that asymmetry in the histogram of these ratios could signal evolutionary divergence. Pearson's approach was to fit a univariate mixture of two normals to the data by choosing the five parameters of the mixture such that the empirical moments matched those of the model.

While his work was successful in identifying two potentially distinct sub-populations and in demonstrating the flexibility of mixtures as a moment matching tool, the formulation required the solution of a 9th degree (nonic) polynomial which at the time posed a significant computational challenge.

Subsequent works focused on addressing these problems, but it was not until the advent of the modern computer and the popularisation of Maximum Likelihood (ML) parameterisation techniques that research really took off [3]. Since that time there has been a vast body of research on the subject spanning areas such as Fisheries research, Agriculture, Botany, Economics, Medicine, Genetics, Psychology, Palaeontology, Electrophoresis, Finance, Sedimentology/Geology and Zoology (see Titterington [4] for an overview).

Properties

Convexity

A general linear combination of probability density functions is not necessarily a probability density, since it may be negative or it may integrate to something other than 1. However, a convex combination of probability density functions preserves both of these properties (non-negativity and integrating to 1), and thus mixture densities are themselves probability density functions.

Moments and Modes

Assuming suitable integrability criteria, then for continuous p_i and some function H(\cdot)

 \mathbb{E}[H(x)] = \int_{-\infty}^{\infty} \sum_{i = 1}^n w_i p_i(x) H(x) dx = \sum_{i = 1}^n w_i \int_{-\infty}^{\infty}  p_i(x) H(x) dx  =  \sum_{i = 1}^n w_i \mathbb{E}_i[H(x)].

An equivalent relation also holds for discrete p_i.

It is a trivial matter to note that the jth moment about zero (i.e. choosing H(x) = x^j) is simply a weighted average of the jth moments of the components. Moments about the mean, H(x) = (x - \mu)^j, involve a binomial expansion

\mathbb{E}[(x - \mu)^j] = \sum_{i = 1}^n w_i \mathbb{E}_i[(x - \mu_i + \mu_i - \mu)^j]   = \sum_{i=1}^n \sum_{k=0}^j \left( \begin{array}{c} j \\ k \end{array} \right) (\mu_i - \mu)^{j-k} w_i \mathbb{E}_i[(x- \mu_i)^k]

where \mu_i denotes the mean of the ith component. In the case of a mixture of one-dimensional Gaussian (normal) distributions with weights w_i, means \mu_i and variances \sigma_i^2, the total mean and variance will be:
 \mathbb{E}[x] = \mu = \sum_{i = 1}^n w_i \mu_i
 \mathbb{E}[(x - \mu)^2] = \sigma^2 = \sum_{i = 1}^n w_i (\mu_i^2 + \sigma_i^2) - \mu^2
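
The two formulas above can be evaluated directly. The sketch below applies them to the three-Gaussian example from the figure caption (means 5, 10, 15, standard deviation 2, equal weights); the function name is hypothetical.

```python
def mixture_mean_and_variance(weights, means, variances):
    """Mean and variance of a mixture from component weights, means and variances."""
    mu = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (m * m + v) for w, m, v in zip(weights, means, variances))
    return mu, second_moment - mu * mu

mu, var = mixture_mean_and_variance([1 / 3] * 3, [5.0, 10.0, 15.0], [4.0] * 3)
# Mean is 10; variance is 4 + 50/3 (about 20.67), up to floating-point rounding.
print(mu, var)
```

The mixture variance (about 20.67) is much larger than the common component variance of 4, because the spread of the component means contributes the additional 50/3.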

These relations highlight the potential of mixture distributions to display non-trivial higher-order moments such as skewness and kurtosis (fat tails) and multi-modality, even in the absence of such features within the components themselves. Marron and Wand (1992) give an illustrative account of the flexibility of this framework.

For some mixture families, such as mixtures of exponential distributions, the question of modality is simple: all such mixtures are unimodal (see Frühwirth-Schnatter 2005 [5], Ch. 1). For normal mixtures, by contrast, the question is a complex one. Conditions for the number of modes in a multivariate normal mixture are explored by Ray and Lindsay [6], extending the earlier work on univariate (Robertson and Fryer, 1969; Behboodian, 1970) and multivariate distributions (Carreira-Perpinan and Williams, 2003).

Here the problem of evaluating the modes of an n-component mixture in a D-dimensional space is reduced to the identification of critical points (local minima, maxima and saddle points) on a manifold referred to as the ridgeline surface

 x^{*}(\alpha) = \left[ \sum_{i=1}^{n} \alpha_i \Sigma_i^{-1} \right]^{-1} \times \left[  \sum_{i=1}^{n}  \alpha_i \Sigma_i^{-1} \mu_i \right]

where \alpha belongs to the (n − 1)-dimensional unit simplex  \mathcal{S}_n  =   \{ \alpha \in \mathbb{R}^n: \alpha_i \in [0,1], \sum_{i=1}^n \alpha_i = 1 \} and  \Sigma_i \in \mathbb{R}^{D \times D}, \mu_i \in \mathbb{R}^D correspond to the covariance and mean of the ith component. Ray and Lindsay consider the case in which n − 1 < D, showing a one-to-one correspondence between the modes of the mixture and those of the elevation function h(\alpha) = q(x^{*}(\alpha)). Thus one may identify the modes by solving   \frac{d h(\alpha)}{d \alpha} = 0 with respect to \alpha and determining the value x^{*}(\alpha).

Using graphical tools, the potential multi-modality of mixtures with n \in \{2, 3\} components is demonstrated; in particular, it is shown that the number of modes may exceed n and that the modes may not coincide with the component means. For two components they develop a graphical tool for analysis by instead solving the aforementioned derivative condition with respect to w_1 and expressing the solutions as a function  \Pi(\alpha), \alpha \in [0,1], so that the number and location of modes for a given value of w_1 corresponds to the number of intersections of the graph with the line \Pi(\alpha) = w_1. This in turn can be related to the number of oscillations of the graph and therefore to solutions of  \frac{d \Pi(\alpha)}{d \alpha} = 0, leading to an explicit solution for a two-component homoscedastic mixture given by

 1 - \alpha(1 - \alpha) \, d_M(\mu_1, \mu_2, \Sigma)^2

where  d_M(\mu_1, \mu_2, \Sigma) = \sqrt{(\mu_2 - \mu_1)^{\top} \Sigma^{-1}(\mu_2 - \mu_1)} is the Mahalanobis distance.

Since the above is quadratic it follows that in this instance there are at most two modes irrespective of the dimension or the weights.
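
The bound of two modes can be checked by brute force in one dimension, where the Mahalanobis distance reduces to |\mu_2 - \mu_1| / \sigma. The sketch below is a simple grid search, not the ridgeline method of Ray and Lindsay; the function names, grid size and test parameters are assumptions made for this illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def count_modes(w1, mu1, mu2, sigma, points=4001):
    """Count local maxima of w1 * N(mu1, sigma^2) + (1 - w1) * N(mu2, sigma^2) on a grid."""
    lo = min(mu1, mu2) - 4.0 * sigma
    hi = max(mu1, mu2) + 4.0 * sigma
    step = (hi - lo) / (points - 1)
    q = []
    for k in range(points):
        x = lo + k * step
        q.append(w1 * normal_pdf(x, mu1, sigma) + (1.0 - w1) * normal_pdf(x, mu2, sigma))
    # A grid point is counted as a mode if the density rises into it from the
    # left and does not rise further to the right.
    return sum(1 for k in range(1, points - 1) if q[k] > q[k - 1] and q[k] >= q[k + 1])

print(count_modes(0.5, 0.0, 1.0, 1.0))  # means one standard deviation apart
print(count_modes(0.5, 0.0, 4.0, 1.0))  # means four standard deviations apart
```

With equal weights, closely spaced means give a single mode and widely separated means give two; in neither case does the count exceed two.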

Parameter Estimation and System Identification

A variety of approaches to the problem of mixture decomposition have been proposed, many of which focus on maximum likelihood methods such as expectation maximization (EM) or maximum a posteriori estimation (MAP). Generally these methods consider separately the question of parameter estimation and system identification, that is to say a distinction is made between the determination of the number and functional form of components within a mixture and the estimation of the corresponding parameter values. Some notable departures are the graphical methods as outlined in Tarter and Lock [2] and more recently minimum message length (MML) techniques such as Figueiredo and Jain [7] and to some extent the moment matching pattern analysis routines suggested by McWilliam and Loh (2009)[8].

EM and Maximum Likelihood

EM is seemingly the most popular technique used to determine the parameters of a mixture with an a priori given number of components. It is of particular appeal for finite normal mixtures, where closed-form expressions are possible, such as in the following iterative algorithm by Dempster et al. (1977):

 w_s^{(j+1)} = \frac{1}{N} \sum_{t =1}^N h_s^{(j)}(t)
 \mu_s^{(j+1)}  =  \frac{\sum_{t =1}^N h_s^{(j)}(t) x^{(t)}}{\sum_{t =1}^N h_s^{(j)}(t)}
 \Sigma_s^{(j+1)}  =  \frac{\sum_{t =1}^N h_s^{(j)}(t) [x^{(t)}-\mu_s^{(j+1)}][x^{(t)}-\mu_s^{(j+1)}]^{\top}}{\sum_{t =1}^N h_s^{(j)}(t)}

with the posterior probabilities

 h_s^{(j)}(t) = \frac{w_s^{(j)} p_s(x^{(t)}; \mu_s^{(j)},\Sigma_s^{(j)}) }{ \sum_{i = 1}^n w_i^{(j)} p_i(x^{(t)}; \mu_i^{(j)}, \Sigma_i^{(j)})}.

Thus, on the basis of the current estimate for the parameters, the conditional probability of a given observation x^{(t)} being generated from state s is determined for each  t = 1, \dots, N, with N being the sample size. The parameters are then updated such that the new component weights correspond to the average conditional probability, and each component mean and covariance is the component-specific weighted average of the mean and covariance of the entire sample.
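
The updates can be sketched for the univariate case as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the data are synthetic (two well-separated normal components), the initial values and iteration count are arbitrary, and the variance update uses the freshly updated mean, which is the standard exact M-step. All function and variable names are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(data, weights, means, variances):
    """One EM iteration; also returns the log-likelihood of the parameters passed in."""
    n_comp = len(weights)
    resp = []  # resp[t][s] is the posterior probability h_s(t)
    log_lik = 0.0
    # E-step: posterior probability of each component for each observation.
    for x in data:
        joint = [weights[s] * normal_pdf(x, means[s], variances[s]) for s in range(n_comp)]
        total = sum(joint)
        log_lik += math.log(total)
        resp.append([j / total for j in joint])
    # M-step: weighted averages using the posterior probabilities as weights.
    resp_sums = [sum(h[s] for h in resp) for s in range(n_comp)]
    new_weights = [r / len(data) for r in resp_sums]
    new_means = [sum(h[s] * x for h, x in zip(resp, data)) / resp_sums[s]
                 for s in range(n_comp)]
    new_vars = [sum(h[s] * (x - new_means[s]) ** 2 for h, x in zip(resp, data)) / resp_sums[s]
                for s in range(n_comp)]
    return new_weights, new_means, new_vars, log_lik

# Synthetic data (an assumption for this sketch): 200 draws each from N(0, 1) and N(10, 1).
rng = random.Random(2)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]

weights, means, variances = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
log_liks = []
for _ in range(30):
    weights, means, variances, ll = em_step(data, weights, means, variances)
    log_liks.append(ll)

print(weights, means, variances)
```

Recording the log-likelihood at each iteration makes it easy to confirm that it never decreases from one iteration to the next.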

Dempster et al. also showed that each successive EM iteration will not decrease the likelihood, a property not shared by other gradient-based maximization techniques. Moreover, EM naturally embeds within it constraints on the probability vector and, for sufficiently large sample sizes, positive definiteness of the covariance iterates. This is a key advantage, since explicitly constrained methods incur extra computational costs to check and maintain appropriate values. Theoretically EM is a first-order algorithm and as such converges slowly to a fixed-point solution. Redner and Walker (1984) make this point, arguing in favour of superlinear and second-order Newton and quasi-Newton methods and reporting slow convergence in EM on the basis of their empirical tests. They do concede that convergence in likelihood was rapid even if convergence in the parameter values themselves was not. A discussion of the relative merits of EM and other algorithms vis-à-vis convergence can be found in Xu and Jordan [9].

Other common objections to the use of EM are that it has a propensity to spuriously identify local maximisers [1] and that it is sensitive to initial values. One may address these problems by evaluating EM at several initial points in the parameter space, but this is computationally costly, and other approaches such as the annealing EM method of Ueda and Nakano (1998) (in which the initial components are essentially forced to overlap, providing a less heterogeneous basis for initial guesses) may be preferable.

Figueiredo and Jain [7] note that convergence to 'meaningless' parameter values obtained at the boundary (where regularity conditions break down, e.g. Ghosh and Sen (1985)) is frequently observed when the number of model components exceeds the optimal/true one. On this basis they suggest a unified approach to estimation and identification in which the initial n is chosen to greatly exceed the expected optimal value. Their optimization routine is constructed via a minimum message length (MML) criterion that effectively eliminates a candidate component if there is insufficient information to support it. In this way it is possible to systematize reductions in n and consider estimation and identification jointly.

Moment Matching

The method of moment matching is one of the oldest techniques for determining the mixture parameters, dating back to Karl Pearson's seminal work of 1894. In this approach the parameters of the mixture are determined such that the composite distribution has moments matching given values. In many instances, extraction of solutions to the moment equations (see the previous section entitled Moments and Modes) may present non-trivial algebraic or computational problems. Moreover, numerical analysis by Day [10] has indicated that such methods may be inefficient compared to EM. Nonetheless there has been renewed interest in this method, e.g. Craigmile and Titterington (1998) and Wang [11].
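
The idea can be illustrated with a deliberately restricted case that has a closed-form solution. The sketch below is not Pearson's five-parameter problem: it assumes an equal-weight mixture of N(-m, s^2) and N(+m, s^2), for which the variance is s^2 + m^2 and the fourth central moment is 3 s^4 + 6 s^2 m^2 + m^4, so that 3 * variance^2 minus the fourth moment equals 2 m^4. The function name is hypothetical.

```python
import math

def fit_symmetric_two_normal(variance, fourth_central_moment):
    """Match the variance and fourth central moment of an equal-weight mixture
    of N(-m, s^2) and N(+m, s^2); returns (m, s)."""
    gap = 3.0 * variance ** 2 - fourth_central_moment  # equals 2 * m^4 in this model
    if gap < 0.0:
        raise ValueError("moments are heavier-tailed than this restricted model allows")
    m = (gap / 2.0) ** 0.25
    s_squared = variance - m * m
    if s_squared < 0.0:
        raise ValueError("moments are inconsistent with this restricted model")
    return m, math.sqrt(s_squared)

# With m = 2 and s = 1 the model moments are variance 5 and fourth moment 43.
m, s = fit_symmetric_two_normal(5.0, 43.0)
print(m, s)
```

In practice the two input moments would be estimated from data, and the general case with unequal weights and variances leads to the harder polynomial problem described in the History section.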

McWilliam and Loh (2009) consider the characterisation of a hyper-cuboid normal mixture copula in large dimensional systems for which EM would be computationally prohibitive. Here a pattern analysis routine is used to generate multivariate tail-dependencies consistent with a set of univariate and (in some sense) bivariate moments. The performance of this method is then evaluated using equity log-return data with Kolmogorov-Smirnov test statistics suggesting a good descriptive fit.

Graphical Methods

Tarter and Lock [2] describe a graphical approach to mixture identification in which a kernel function is applied to an empirical frequency plot so as to reduce intra-component variance. In this way one may more readily identify components having differing means. While this λ-method does not require prior knowledge of the number or functional form of the components, its success does rely on the choice of the kernel parameters, which to some extent implicitly embeds assumptions about the component structure.

Applications

Mixture densities express complex densities in terms of simpler densities (the mixture components). They are used both because they provide a good model for certain data sets (where different subsets of the data exhibit different characteristics and can best be modeled separately) and because they can be more mathematically tractable, since the individual mixture components can be studied more easily than the overall mixture density.

Mixture densities can be used to model a statistical population with subpopulations, where the mixture components are the densities on the subpopulations, and the weights are the proportions of each subpopulation in the overall population.

Mixture densities can also be used to model experimental error or contamination: one assumes that most of the samples measure the desired phenomenon, while a small fraction come from a different, contaminating distribution.

Parametric statistics that assume no error often fail on such mixture densities. For example, statistics that assume normality often fail disastrously in the presence of even a few outliers, and one instead uses robust statistics.
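
The effect of contamination can be seen in a small simulation. The sketch below assumes, purely for illustration, that 95% of observations come from N(0, 1) and 5% from a contaminating N(50, 1), and compares the sample mean with the median as a simple robust alternative. The function name, contamination rate, sample size and seed are choices made for this example.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is repeatable

def draw_contaminated(eps=0.05):
    """With probability eps draw from the contaminating N(50, 1), else from N(0, 1)."""
    if rng.random() < eps:
        return rng.gauss(50.0, 1.0)
    return rng.gauss(0.0, 1.0)

sample = [draw_contaminated() for _ in range(2000)]
sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)

# The mean is pulled toward the outliers (expected value 0.05 * 50 = 2.5),
# while the median stays close to the centre of the main component.
print(sample_mean, sample_median)
```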

References

  1. G. J. McLachlan (2000), Finite Mixture Models, Wiley
  2. Michael E. Tarter (1993), Model Free Curve Estimation, Chapman and Hall
  3. G. J. McLachlan (1988), Mixture Models: Inference and Applications to Clustering, Dekker
  4. D. M. Titterington (1985), Statistical Analysis of Finite Mixture Distributions, Wiley
  5. Sylvia Frühwirth-Schnatter (2006), Finite Mixture and Markov Switching Models, Springer
  6. S. Ray and B. Lindsay (2005), "The topography of multivariate normal mixtures", The Annals of Statistics 33 (5): 2042–2065
  7. M. A. T. Figueiredo and A. K. Jain (2002), "Unsupervised Learning of Finite Mixture Models", IEEE Transactions on Pattern Analysis and Machine Intelligence 24: 381–396
  8. N. McWilliam and K. Loh (2008), Incorporating Multidimensional Tail-Dependencies in the Valuation of Credit Derivatives (Working Paper)
  9. L. Xu and M. I. Jordan (1996), "On Convergence Properties of the EM Algorithm for Gaussian Mixtures", Neural Computation 8: 129–151
  10. N. E. Day (1969), "Estimating the components of a mixture of two normal distributions", Biometrika 56: 463–474
  11. J. Wang (2001), "Generating daily changes in market variables using a multivariate mixture of normal distributions", Proceedings of the 33rd winter conference on simulation, IEEE Computer Society: 283–289

See also

Mixture
Hierarchical models



