| Title: | Mixture-of-Experts Wishart Models for Covariance Data |
|---|---|
| Description: | Methods for maximum likelihood and Bayesian estimation for the Wishart mixture model and the mixture-of-experts Wishart (MoE-Wishart) model. The package provides four inference algorithms for these models, each implemented using the expectation–maximization (EM) algorithm for maximum likelihood estimation and a fully Bayesian approach via Gibbs-within-Metropolis–Hastings sampling. |
| Authors: | The Tien Mai [aut], Zhi Zhao [aut, cre] |
| Maintainer: | Zhi Zhao <[email protected]> |
| License: | GPL-3 |
| Version: | 1.2 |
| Built: | 2026-06-04 22:12:41 UTC |
| Source: | https://github.com/zhizuio/moewishart |
Compute AIC, BIC, and ICL for EM fits; and PSIS-LOO expected
log predictive density (elpd_loo) for Bayesian fits. Supports
mixturewishart (finite mixture) and moewishart (MoE with
covariates in gating).
computeIC(fit)computeIC(fit)
fit |
A fitted object returned by |
For EM fits:
AIC: ,
where is the number of free parameters and
is the maximized log-likelihood (last EM iteration).
BIC: .
ICL: ,
i.e., BIC plus the entropy term (classification likelihood
approximation).
Parameter counting :
For mixturewishart:
,
where are mixture weights, each has
free parameters, and adds 1
per component when estimated.
For moewishart:
,
where are gating regression coefficients
(last class is reference with zero column).
For Bayesian fits:
elpd_loo: computed via loo::loo using per-observation
log-likelihood draws. Requires fit$loglik_individual
as a matrix of size (draws by observations),
as produced by the provided samplers. Returns elpd_loo,
se_elpd_loo, p_loo, and looic.
Notes:
AIC/BIC/ICL are defined for MLE (EM) fits. For Bayesian fits, prefer elpd_loo (or WAIC) rather than AIC/BIC.
The entropy term in ICL uses EM responsibilities
(field tau for mixturewishart,
gamma for moewishart).
- For method="em": a list with fields AIC, BIC, ICL.
- For method="bayes": a list with fields ICL and elpd of class
'"loo"' as returned by [loo::loo()] that contains fields 'estimates',
'pointwise', 'diagnostics'
# Bayesian example (MoE) # simulate data set.seed(123) n <- 500 # subjects p <- 2 # True gating coefficients (last column zero) set.seed(123) Xq <- 3 K <- 3 betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K) betas[, K] <- 0 dat <- simData(n, p, Xq = 3, K = 3, betas = betas, pis = c(0.35, 0.40, 0.25), nus = c(8, 16, 3) ) set.seed(123) fit <- moewishart( dat$S, X = cbind(1, dat$X), K = 3, mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K) mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1) niter = 100, burnin = 50 ) computeIC(fit)# Bayesian example (MoE) # simulate data set.seed(123) n <- 500 # subjects p <- 2 # True gating coefficients (last column zero) set.seed(123) Xq <- 3 K <- 3 betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K) betas[, K] <- 0 dat <- simData(n, p, Xq = 3, K = 3, betas = betas, pis = c(0.35, 0.40, 0.25), nus = c(8, 16, 3) ) set.seed(123) fit <- moewishart( dat$S, X = cbind(1, dat$X), K = 3, mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K) mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1) niter = 100, burnin = 50 ) computeIC(fit)
The empirical per-drug dose-dose covariance from the Cancer Therapeutics Response Portal (CTRP v2, 2015) database https://portals.broadinstitute.org/ctrp. Based on a subset of n=374 drugs profiled at the same p=5 concentrations: 0.002, 0.016, 0.130, 1.000 and 8.300uM. Using replicate viability measurements (cell lines), we computed one p×p covariance matrix S(d) per drug, yielding n= 374 matrices.
CTRPCTRP
An object of class list of length 374.
# Load the covariance data from the CTRP dataset data("CTRP", package = "moewishart") head(CTRP)# Load the covariance data from the CTRP dataset data("CTRP", package = "moewishart") head(CTRP)
Compute the (log) density of a -dimensional Wishart distribution
at an SPD matrix . Returns either the
log-density or the density depending on logarithm.
dWishart(S, nu, Sigma, detS_val = NULL, logarithm = TRUE)dWishart(S, nu, Sigma, detS_val = NULL, logarithm = TRUE)
S |
Numeric |
nu |
Numeric. Degrees of freedom |
Sigma |
Numeric |
detS_val |
Optional numeric. Precomputed |
logarithm |
Logical. If |
Let with degrees of freedom
and scale matrix (SPD). The density is:
where is the multivariate gamma function and
is the dimension.
Note that
(i) detS_val can be supplied to avoid recomputing ,
which is useful inside EM/MCMC loops, and
(ii) small diagonal jitter is added internally to and
when computing determinants or solves for numerical stability.
Constraints: (i) and must be SPD, and (ii) the Wishart
requires .
A numeric scalar: the log-density if logarithm = TRUE,
otherwise the density.
set.seed(123) p <- 3 # Construct an SPD Sigma A <- matrix(rnorm(p * p), p, p) Sigma <- crossprod(A) + diag(p) * 0.5 # Draw a Wishart matrix using base stats::rWishart() W <- drop(rWishart(1, df = p + 5, Sigma = Sigma)) # Evaluate log-density at W dWishart(W, nu = p + 5, Sigma = Sigma, logarithm = TRUE)set.seed(123) p <- 3 # Construct an SPD Sigma A <- matrix(rnorm(p * p), p, p) Sigma <- crossprod(A) + diag(p) * 0.5 # Draw a Wishart matrix using base stats::rWishart() W <- drop(rWishart(1, df = p + 5, Sigma = Sigma)) # Evaluate log-density at W dWishart(W, nu = p + 5, Sigma = Sigma, logarithm = TRUE)
Compute the log of the multivariate gamma function
for dimension and parameter .
lmvgamma(a, p)lmvgamma(a, p)
a |
Numeric. Argument of |
p |
Integer. Dimension |
The multivariate gamma function is defined by:
Constraints: (i) (positive integer), and
(ii) to keep all gamma terms finite (as in the Wishart
normalization constant).
A numeric scalar equal to .
# Dimension p <- 3 # Evaluate log multivariate gamma at a = nu/2 nu <- p + 5 lmvgamma(a = nu / 2, p = p)# Dimension p <- 3 # Evaluate log multivariate gamma at a = nu/2 nu <- p + 5 lmvgamma(a = nu / 2, p = p)
Fit finite mixtures of Wishart-distributed SPD matrices using either a
Bayesian sampler or the EM algorithm. The input S_list is a list
of SPD matrices. Under component ,
with degrees of freedom
and SPD scale matrix . Mixture weights
sum to .
mixturewishart( S_list, K, niter = 3000, burnin = 1000, method = "bayes", thin = 1, alpha = NULL, nu0 = NULL, Psi0 = NULL, init_pi = NULL, init_nu = NULL, init_Sigma = NULL, marginal.z = TRUE, estimate_nu = TRUE, nu_prior_a = 2, nu_prior_b = 0.1, mh_sigma = 1, n_restarts = 3, restart_iters = 20, tol = 1e-06, verbose = TRUE )mixturewishart( S_list, K, niter = 3000, burnin = 1000, method = "bayes", thin = 1, alpha = NULL, nu0 = NULL, Psi0 = NULL, init_pi = NULL, init_nu = NULL, init_Sigma = NULL, marginal.z = TRUE, estimate_nu = TRUE, nu_prior_a = 2, nu_prior_b = 0.1, mh_sigma = 1, n_restarts = 3, restart_iters = 20, tol = 1e-06, verbose = TRUE )
S_list |
List of length |
K |
Integer. Number of mixture components. |
niter |
Integer. Total iterations. Bayesian mode: total MCMC
iterations (including burn-in). EM mode: maximum EM iterations
(alias to |
burnin |
Integer. Number of burn-in iterations (Bayesian mode). |
method |
Character; one of |
thin |
Integer. Thinning interval for saving draws (Bayesian). |
alpha |
Numeric vector length |
nu0 |
Numeric. Inverse-Wishart prior df for |
Psi0 |
Numeric |
init_pi |
Optional numeric vector length |
init_nu |
Optional numeric vector length |
init_Sigma |
Optional list of |
marginal.z |
Logical. If |
estimate_nu |
Logical. If |
nu_prior_a |
Numeric. Prior hyperparameter |
nu_prior_b |
Numeric. Prior hyperparameter |
mh_sigma |
Numeric scalar or length- |
n_restarts |
Integer. Number of random restarts for EM. Ignored in Bayesian mode. |
restart_iters |
Integer. Number of short EM iterations per restart used to select a good initialization. Ignored in Bayesian mode. |
tol |
Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally elsewhere. |
verbose |
Logical. If |
Mixture mixture model:
.
Algorithms:
method = "bayes": Samples latent labels , weights
, component scales , and optionally
. Uses a Dirichlet prior for , inverse-
Wishart prior for , and a prior on
when estimate_nu = TRUE. Degrees-of-freedom are updated
via MH on with proposal sd mh_sigma.
Can integrate out when sampling if
marginal.z = TRUE.
method = "em": Maximizes the observed-data log-
likelihood via EM. The E-step computes responsibilities via
Wishart log-densities. The M-step updates and
; optionally updates when
estimate_nu = TRUE. Supports multiple random restarts.
Note that
(i) All matrices in S_list must be SPD. Small ridge terms may be
added internally for stability, and
(ii) Multiple EM restarts are recommended for robustness on difficult datasets.
A list whose structure depends on method:
For method = "bayes":
pi_ik: array (nsave x n x K),
saved per-observation weights.
pi: matrix (nsave x K), saved mixture
proportions.
nu: matrix (nsave x K), saved degrees-
of-freedom.
Sigma: list of length nsave; each is an
array () of draws.
z: matrix (nsave x n), saved
allocations.
sigma_posterior_mean: array (), posterior mean of .
loglik: numeric vector (length niter), log-
likelihood trace.
loglik_individual: matrix (niter x
n), per-observation log-likelihood.
For method = "em":
pi: numeric vector length , mixture
proportions.
Sigma: list length K, each a
SPD matrix.
nu: numeric vector length K, degrees-of-
freedom.
tau: matrix (), responsibilities.
loglik: numeric vector, log-likelihood per EM
iteration.
iterations: integer, number of EM iterations
performed.
# simulate data set.seed(123) n <- 500 # subjects p <- 2 dat <- simData(n, p, K = 3, pis = c(0.35, 0.40, 0.25), nus = c(8, 16, 3) ) set.seed(123) fit <- mixturewishart( dat$S, K = 3, mh_sigma = c(0.2, 0.1, 0.1), # tune this for MH acceptance 20-40% niter = 100, burnin = 50 ) # Posterior means for degrees of freedom of Wishart distributions: nu_mcmc <- fit$nu[-c(1:fit$burnin), ] colMeans(nu_mcmc)# simulate data set.seed(123) n <- 500 # subjects p <- 2 dat <- simData(n, p, K = 3, pis = c(0.35, 0.40, 0.25), nus = c(8, 16, 3) ) set.seed(123) fit <- mixturewishart( dat$S, K = 3, mh_sigma = c(0.2, 0.1, 0.1), # tune this for MH acceptance 20-40% niter = 100, burnin = 50 ) # Posterior means for degrees of freedom of Wishart distributions: nu_mcmc <- fit$nu[-c(1:fit$burnin), ] colMeans(nu_mcmc)
Fit a mixture-of-experts model for symmetric positive-definite (SPD) matrices with covariate-dependent mixing proportions (gating network). Components are Wishart-distributed. Supports Bayesian sampling and EM-based maximum-likelihood estimation.
moewishart( S_list, X, K, niter = 3000, burnin = 1000, method = "bayes", thin = 1, nu0 = NULL, Psi0 = NULL, init_nu = NULL, estimate_nu = TRUE, nu_prior_a = 2, nu_prior_b = 0.1, mh_sigma = 0.1, mh_beta = 0.05, sigma_beta = 10, init = NULL, tol = 1e-06, ridge = 1e-08, verbose = TRUE )moewishart( S_list, X, K, niter = 3000, burnin = 1000, method = "bayes", thin = 1, nu0 = NULL, Psi0 = NULL, init_nu = NULL, estimate_nu = TRUE, nu_prior_a = 2, nu_prior_b = 0.1, mh_sigma = 0.1, mh_beta = 0.05, sigma_beta = 10, init = NULL, tol = 1e-06, ridge = 1e-08, verbose = TRUE )
S_list |
List of length |
X |
Numeric matrix |
K |
Integer. Number of mixture components (experts). |
niter |
Integer. Total iterations. Bayesian mode: total MCMC iterations (including burn-in). EM mode: maximum EM iterations. |
burnin |
Integer. Number of burn-in iterations (Bayesian mode). |
method |
Character; one of |
thin |
Integer. Thinning interval for saving draws (Bayesian). |
nu0 |
Numeric. Inverse-Wishart prior df for |
Psi0 |
Numeric |
init_nu |
Optional numeric vector length |
estimate_nu |
Logical. If |
nu_prior_a |
Numeric. Prior hyperparameter |
nu_prior_b |
Numeric. Prior hyperparameter |
mh_sigma |
Numeric scalar or length- |
mh_beta |
Numeric scalar or length- |
sigma_beta |
Numeric. Prior sd of the Gaussian prior on |
init |
Optional list with fields for EM initialization, e.g.,
|
tol |
Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally. |
ridge |
Numeric. Small diagonal ridge added to |
verbose |
Logical. If |
MoE-Wishart Model:
Observation: is a SPD matrix. Given
allocation , with df and scale .
Gating (MoE): Let be -dimensional covariates.
Mixing weights follow a
softmax regression:
,
where , is
. Identifiability: last column of
is fixed to zero.
Algorithms:
Bayesian (method = "bayes"): Metropolis-within-Gibbs
sampler for , , optional , and
. Gaussian priors on with sd
sigma_beta. Proposals use mh_sigma for
and mh_beta for .
EM (method = "em"): E-step responsibilities using
Wishart log-densities and softmax gating. M-step updates
, optional , and via
weighted multinomial logistic regression (BFGS).
Note that:
(i) include an intercept column in X; none is added by default, and
(ii) all S_list elements must be SPD. A small ridge may be
added for stability.
A list whose fields depend on method:
For method = "bayes":
Beta_samples: array (nsave x q x
K), saved draws of (last column zero).
nu: matrix (nsave x K), draws
of .
Sigma: list of length nsave; each
element is an array () of
draws.
z_samples: matrix (nsave x n), draws
of allocations.
pi_ik: array (nsave x n x K),
per-observation gating probabilities.
pi_mean: matrix (n x K), posterior
mean of gating probabilities.
loglik: numeric vector (length niter),
log-likelihood trace.
loglik_individual: matrix (niter x
n), per-observation log-likelihood.
For method = "em":
K, p, q, n: problem dimensions.
Beta: matrix (), gating coefficients
with last column zero (reference class).
Sigma: list length K, each a
SPD matrix (scale).
nu: numeric vector length K, degrees of
freedom.
gamma: matrix (), final
responsibilities.
loglik: numeric vector, log-likelihood by EM
iteration.
iter: integer, number of EM iterations performed.
# simulate data set.seed(123) n <- 500 # subjects p <- 2 # True gating coefficients (last column zero) set.seed(123) Xq <- 3 K <- 3 betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K) betas[, K] <- 0 dat <- simData(n, p, Xq = 3, K = 3, betas = betas, pis = c(0.35, 0.40, 0.25), nus = c(8, 16, 3) ) set.seed(123) fit <- moewishart( dat$S, X = cbind(1, dat$X), K = 3, mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K) mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1) niter = 500, burnin = 200 ) # Posterior means for degrees of freedom of Wishart distributions: nu_mcmc <- fit$nu[-c(1:fit$burnin), ] colMeans(nu_mcmc)# simulate data set.seed(123) n <- 500 # subjects p <- 2 # True gating coefficients (last column zero) set.seed(123) Xq <- 3 K <- 3 betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K) betas[, K] <- 0 dat <- simData(n, p, Xq = 3, K = 3, betas = betas, pis = c(0.35, 0.40, 0.25), nus = c(8, 16, 3) ) set.seed(123) fit <- moewishart( dat$S, X = cbind(1, dat$X), K = 3, mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K) mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1) niter = 500, burnin = 200 ) # Posterior means for degrees of freedom of Wishart distributions: nu_mcmc <- fit$nu[-c(1:fit$burnin), ] colMeans(nu_mcmc)
Draw MCMC trace plot of parameters
plotMCMC(object, estimator = "nu", coeff.idx = 2)plotMCMC(object, estimator = "nu", coeff.idx = 2)
object |
returned object from Bayesian MoE or mixture model |
estimator |
specified parameter for traceplot |
coeff.idx |
choice of one covariate to show its coefficients in clusters |
A trace plot
# simulate data set.seed(123) n <- 500 # subjects p <- 2 # True gating coefficients (last column zero) set.seed(123) Xq <- 3 K <- 3 betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K) betas[, K] <- 0 dat <- simData(n, p, Xq = 3, K = 3, betas = betas, pis = c(0.35, 0.40, 0.25), nus = c(8, 16, 3) ) set.seed(123) fit <- moewishart( dat$S, X = cbind(1, dat$X), K = 3, mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K) mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1) niter = 500, burnin = 200 ) plotMCMC(fit, estimator = "nu")# simulate data set.seed(123) n <- 500 # subjects p <- 2 # True gating coefficients (last column zero) set.seed(123) Xq <- 3 K <- 3 betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K) betas[, K] <- 0 dat <- simData(n, p, Xq = 3, K = 3, betas = betas, pis = c(0.35, 0.40, 0.25), nus = c(8, 16, 3) ) set.seed(123) fit <- moewishart( dat$S, X = cbind(1, dat$X), K = 3, mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K) mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1) niter = 500, burnin = 200 ) plotMCMC(fit, estimator = "nu")
Generate random draws from a Dirichlet distribution with parameter
vector . Each draw is a length-
probability vector on the simplex.
rdirichlet(n, alpha)rdirichlet(n, alpha)
n |
Integer. Number of independent Dirichlet draws to generate. |
alpha |
Numeric vector of positive concentration parameters
|
Definition:
If independently for
(shape , rate 1), then the
normalized vector follows
.
Note that
alpha must be a numeric vector with strictly positive entries.
A numeric matrix of size , where each row is
an independent Dirichlet draw that sums to .
set.seed(123) # 3-dimensional Dirichlet with asymmetric concentration alpha <- c(2, 5, 3) rdirichlet(5, alpha)set.seed(123) # 3-dimensional Dirichlet with asymmetric concentration alpha <- c(2, 5, 3) rdirichlet(5, alpha)
Draw a random sample from an inverse-Wishart distribution
using the identity
iff
. This implementation accepts
directly for speed.
sampleIW(nu, Psi_inv)sampleIW(nu, Psi_inv)
nu |
Numeric. Degrees of freedom |
Psi_inv |
Numeric |
Sampling scheme:
Sample using rWishart.
Return , which has
.
Parameterization:
is the degrees of freedom, must satisfy
.
is the SPD scale matrix of the inverse-Wishart.
This function expects its inverse as input
for efficiency (avoids repeated matrix inversions).
Note that:
(i) internally calls rWishart(1, df = nu, Sigma = Psi_inv), and
(ii) returns solve(W); if numerical issues arise, consider
adding a small ridge to prior to sampling.
A SPD matrix distributed as
.
set.seed(123) p <- 3 # Construct an SPD scale matrix Psi A <- matrix(rnorm(p * p), p, p) Psi <- crossprod(A) + diag(p) * 0.5 Psi_inv <- solve(Psi) # Draw one inverse-Wishart sample S <- sampleIW(nu = p + 5, Psi_inv = Psi_inv) Sset.seed(123) p <- 3 # Construct an SPD scale matrix Psi A <- matrix(rnorm(p * p), p, p) Psi <- crossprod(A) + diag(p) * 0.5 Psi_inv <- solve(Psi) # Draw one inverse-Wishart sample S <- sampleIW(nu = p + 5, Psi_inv = Psi_inv) S
Generate synthetic SPD matrices from either: (i) a finite mixture of Wishart components with fixed mixing proportions, or (ii) a mixture-of-experts (MoE) where mixing proportions depend on covariates via a softmax gating model.
simData( n = 200, p = 2, Xq = 0, K = NA, betas = NULL, pis = c(0.4, 0.6), nus = c(8, 12), Sigma = NULL )simData( n = 200, p = 2, Xq = 0, K = NA, betas = NULL, pis = c(0.4, 0.6), nus = c(8, 12), Sigma = NULL )
n |
Integer. Number of observations to simulate. |
p |
Integer. Dimension of the Wishart distribution (matrix size
|
Xq |
Integer. Number of covariates for the gating network
(MoE case). If |
K |
Integer. Number of latent components. Required when
|
betas |
Numeric matrix |
pis |
Numeric vector of length |
nus |
Numeric vector length |
Sigma |
Optional list length |
Models:
Fixed mixture (no covariates, Xq = 0):
, and
.
Mixture-of-experts (covariates, Xq > 0):
Let . The mixing weights are
given by softmax regression
. Labels are drawn from
and
.
Simulation steps:
Construct pis:
If Xq = 0, replicate the provided pis
over n rows.
If Xq > 0, generate X ~ N(0, I) and compute
softmax probabilities using betas (last column set
to zero by default identifiability).
If Sigma is not provided, create default
matrices (SPD) depending on K and p.
Sample labels .
Draw from via
rWishart.
Note that:
(i) in the MoE case, no intercept is automatically added to X.
Use Xq to include desired covariates; the default
betas is randomly generated with betas[, K] = 0, and
(ii) provided Sigma must be a list of SPD
matrices. Provided nus must exceed .
A list with the following elements:
S: list of length n of simulated SPD matrices
.
z: integer vector length n of component labels.
nu: numeric vector length of degrees of freedom.
pi: matrix of mixing probabilities
(rows sum to ).
Sigma_list: list length of the scale matrices
used for simulation.
X: matrix of covariates if
Xq > 0, otherwise NULL.
betas: the gating coefficient matrix used when
Xq > 0, otherwise NULL.
# simulate data from mixture model (no covariates) set.seed(123) n <- 200 # subjects p <- 10 dat <- simData(n, p, K = 3, pis = c(0.35, 0.40, 0.25), nus = c(8, 12, 3) ) str(dat)# simulate data from mixture model (no covariates) set.seed(123) n <- 200 # subjects p <- 10 dat <- simData(n, p, K = 3, pis = c(0.35, 0.40, 0.25), nus = c(8, 12, 3) ) str(dat)