Package 'moewishart'

Title: Mixture-of-Experts Wishart Models for Covariance Data
Description: Methods for maximum likelihood and Bayesian estimation for the Wishart mixture model and the mixture-of-experts Wishart (MoE-Wishart) model. The package provides four inference algorithms for these models, each implemented using the expectation–maximization (EM) algorithm for maximum likelihood estimation and a fully Bayesian approach via Gibbs-within-Metropolis–Hastings sampling.
Authors: The Tien Mai [aut], Zhi Zhao [aut, cre]
Maintainer: Zhi Zhao <[email protected]>
License: GPL-3
Version: 1.2
Built: 2026-06-04 22:12:41 UTC
Source: https://github.com/zhizuio/moewishart

Help Index


Information criteria for Wishart mixtures and MoE models

Description

Compute AIC, BIC, and ICL for EM fits; and PSIS-LOO expected log predictive density (elpd_loo) for Bayesian fits. Supports mixturewishart (finite mixture) and moewishart (MoE with covariates in gating).

Usage

computeIC(fit)

Arguments

fit

A fitted object returned by mixturewishart() or moewishart().

Details

For EM fits:

  • AIC: AIC=2k2\mathrm{AIC} = 2k - 2\ell, where kk is the number of free parameters and \ell is the maximized log-likelihood (last EM iteration).

  • BIC: BIC=klogn2\mathrm{BIC} = k \log n - 2\ell.

  • ICL: ICL=BIC+i=1nk=1Kτiklogτik\mathrm{ICL} = \mathrm{BIC} + \sum_{i=1}^n \sum_{k=1}^K \tau_{ik}\log \tau_{ik}, i.e., BIC plus the entropy term (classification likelihood approximation).

Parameter counting kk:

  • For mixturewishart: k=(K1)+Kp(p+1)2+KI[estimate ν]k = (K-1) + K \cdot \frac{p(p+1)}{2} + K \cdot \mathbb{I}[\text{estimate }\nu], where (K1)(K-1) are mixture weights, each Σk\Sigma_k has p(p+1)2\frac{p(p+1)}{2} free parameters, and νk\nu_k adds 1 per component when estimated.

  • For moewishart: k=q(K1)+Kp(p+1)2+KI[estimate ν]k = q\,(K-1) + K \cdot \frac{p(p+1)}{2} + K \cdot \mathbb{I}[\text{estimate }\nu], where q(K1)q\,(K-1) are gating regression coefficients (last class is reference with zero column).

For Bayesian fits:

  • elpd_loo: computed via loo::loo using per-observation log-likelihood draws. Requires fit$loglik_individual as a matrix of size S×nS \times n (draws by observations), as produced by the provided samplers. Returns elpd_loo, se_elpd_loo, p_loo, and looic.

Notes:

  • AIC/BIC/ICL are defined for MLE (EM) fits. For Bayesian fits, prefer elpd_loo (or WAIC) rather than AIC/BIC.

  • The entropy term in ICL uses EM responsibilities τik\tau_{ik} (field tau for mixturewishart, gamma for moewishart).

Value

- For method="em": a list with fields AIC, BIC, ICL. - For method="bayes": a list with fields ICL and elpd of class '"loo"' as returned by [loo::loo()] that contains fields 'estimates', 'pointwise', 'diagnostics'

Examples

# Bayesian example (MoE)

# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
# True gating coefficients (last column zero)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0
dat <- simData(n, p,
  Xq = 3, K = 3, betas = betas,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)
set.seed(123)
fit <- moewishart(
  dat$S,
  X = cbind(1, dat$X), K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K)
  mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1)
  niter = 100, burnin = 50
)
computeIC(fit)

CTRP drug response covariances data

Description

The empirical per-drug dose-dose covariance from the Cancer Therapeutics Response Portal (CTRP v2, 2015) database https://portals.broadinstitute.org/ctrp. Based on a subset of n=374 drugs profiled at the same p=5 concentrations: 0.002, 0.016, 0.130, 1.000 and 8.300uM. Using replicate viability measurements (cell lines), we computed one p×p covariance matrix S(d) per drug, yielding n= 374 matrices.

Usage

CTRP

Format

An object of class list of length 374.

Examples

# Load the covariance data from the CTRP dataset
data("CTRP", package = "moewishart")
head(CTRP)

density of Wishart distribution

Description

Compute the (log) density of a pp-dimensional Wishart distribution Wp(ν,Σ)W_p(\nu, \Sigma) at an SPD matrix SS. Returns either the log-density or the density depending on logarithm.

Usage

dWishart(S, nu, Sigma, detS_val = NULL, logarithm = TRUE)

Arguments

S

Numeric p×pp \times p SPD matrix at which to evaluate the density.

nu

Numeric. Degrees of freedom ν\nu (must exceed p1p-1).

Sigma

Numeric p×pp \times p SPD scale matrix Σ\Sigma.

detS_val

Optional numeric. Precomputed logS\log|S| to reuse; if NULL, it is computed internally.

logarithm

Logical. If TRUE, return log-density; otherwise return density.

Details

Let SWp(ν,Σ)S \sim W_p(\nu, \Sigma) with degrees of freedom ν\nu and scale matrix Σ\Sigma (SPD). The density is:

f(Sν,Σ)=S(νp1)/2exp{12tr(Σ1S)}2νp/2Σν/2Γp(ν/2),f(S \mid \nu, \Sigma) = \frac{|S|^{(\nu - p - 1)/2}\, \exp\{-\tfrac{1}{2}\,\mathrm{tr}(\Sigma^{-1}S)\}} {2^{\nu p/2}\,|\Sigma|^{\nu/2}\,\Gamma_p(\nu/2)},

where Γp()\Gamma_p(\cdot) is the multivariate gamma function and pp is the dimension.

Note that (i) detS_val can be supplied to avoid recomputing logS\log|S|, which is useful inside EM/MCMC loops, and (ii) small diagonal jitter is added internally to SS and Σ\Sigma when computing determinants or solves for numerical stability.

Constraints: (i) SS and Σ\Sigma must be SPD, and (ii) the Wishart requires ν>p1\nu > p - 1.

Value

A numeric scalar: the log-density if logarithm = TRUE, otherwise the density.

Examples

set.seed(123)
p <- 3
# Construct an SPD Sigma
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p) * 0.5
# Draw a Wishart matrix using base stats::rWishart()
W <- drop(rWishart(1, df = p + 5, Sigma = Sigma))
# Evaluate log-density at W
dWishart(W, nu = p + 5, Sigma = Sigma, logarithm = TRUE)

Log multivariate gamma function

Description

Compute the log of the multivariate gamma function logΓp(a)\log \Gamma_p(a) for dimension pp and parameter aa.

Usage

lmvgamma(a, p)

Arguments

a

Numeric. Argument of Γp()\Gamma_p(\cdot) (often ν/2\nu/2 in Wishart contexts).

p

Integer. Dimension pp of the multivariate gamma.

Details

The multivariate gamma function Γp(a)\Gamma_p(a) is defined by:

Γp(a)=πp(p1)/4j=1pΓ ⁣(a+1j2).\Gamma_p(a) = \pi^{\,p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left(a + \frac{1-j}{2}\right).

Constraints: (i) p{1,2,}p \in \{1,2,\dots\} (positive integer), and (ii) a>(p1)/2a > (p-1)/2 to keep all gamma terms finite (as in the Wishart normalization constant).

Value

A numeric scalar equal to logΓp(a)\log \Gamma_p(a).

Examples

# Dimension
p <- 3
# Evaluate log multivariate gamma at a = nu/2
nu <- p + 5
lmvgamma(a = nu / 2, p = p)

EM/Bayesian estimation for Wishart mixture model

Description

Fit finite mixtures of Wishart-distributed SPD matrices using either a Bayesian sampler or the EM algorithm. The input S_list is a list of p×pp \times p SPD matrices. Under component kk, Sizi=kWp(νk,Σk)S_i \mid z_i=k \sim W_p(\nu_k, \Sigma_k) with degrees of freedom νk\nu_k and SPD scale matrix Σk\Sigma_k. Mixture weights πk\pi_k sum to 11.

Usage

mixturewishart(
  S_list,
  K,
  niter = 3000,
  burnin = 1000,
  method = "bayes",
  thin = 1,
  alpha = NULL,
  nu0 = NULL,
  Psi0 = NULL,
  init_pi = NULL,
  init_nu = NULL,
  init_Sigma = NULL,
  marginal.z = TRUE,
  estimate_nu = TRUE,
  nu_prior_a = 2,
  nu_prior_b = 0.1,
  mh_sigma = 1,
  n_restarts = 3,
  restart_iters = 20,
  tol = 1e-06,
  verbose = TRUE
)

Arguments

S_list

List of length nn of SPD matrices, each p×pp \times p. These are the observed matrices modeled by a mixture of Wisharts.

K

Integer. Number of mixture components.

niter

Integer. Total iterations. Bayesian mode: total MCMC iterations (including burn-in). EM mode: maximum EM iterations (alias to maxiter).

burnin

Integer. Number of burn-in iterations (Bayesian mode).

method

Character; one of c("bayes","em"). Selects sampler or optimizer.

thin

Integer. Thinning interval for saving draws (Bayesian).

alpha

Numeric vector length KK (Dirichlet prior on π\pi) or NULL to default to rep(1, K) (Bayesian).

nu0

Numeric. Inverse-Wishart prior df for Σk\Sigma_k (Bayesian). Default: p+2p + 2.

Psi0

Numeric p×pp \times p SPD matrix. Inverse-Wishart prior scale for Σk\Sigma_k (Bayesian). Default: diag(p).

init_pi

Optional numeric vector length KK summing to 11. EM initialization for mixture weights. If NULL, random or data-driven initialization is used.

init_nu

Optional numeric vector length KK of initial degrees of freedom. Used in both modes.

init_Sigma

Optional list of KK SPD matrices (each p×pp \times p). EM initialization for Σk\Sigma_k.

marginal.z

Logical. If TRUE, integrates out π\pi when sampling zz (collapsed step) in Bayesian mode. If FALSE, samples zz conditional on current π\pi.

estimate_nu

Logical. If TRUE, estimate/update νk\nu_k (MH in Bayesian mode; Newton/EM in EM). If FALSE, νk\nu_k are fixed.

nu_prior_a

Numeric. Prior hyperparameter aa for νk\nu_k (Bayesian), used when estimate_nu = TRUE.

nu_prior_b

Numeric. Prior hyperparameter bb for νk\nu_k (Bayesian), used when estimate_nu = TRUE.

mh_sigma

Numeric scalar or length-KK vector. Proposal sd for MH updates on log(νk)\log(\nu_k) (Bayesian, when estimating ν\nu).

n_restarts

Integer. Number of random restarts for EM. Ignored in Bayesian mode.

restart_iters

Integer. Number of short EM iterations per restart used to select a good initialization. Ignored in Bayesian mode.

tol

Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally elsewhere.

verbose

Logical. If TRUE, print progress information.

Details

Mixture mixture model: p(Si)=k=1KπkfW(Siνk,Σk)p(S_i) = \sum_{k=1}^K \pi_k \, f_W(S_i \mid \nu_k, \Sigma_k).

Algorithms:

  1. method = "bayes": Samples latent labels zz, weights π\pi, component scales Σk\Sigma_k, and optionally νk\nu_k. Uses a Dirichlet prior for π\pi, inverse- Wishart prior for Σk\Sigma_k, and a prior on νk\nu_k when estimate_nu = TRUE. Degrees-of-freedom are updated via MH on log(νk)\log(\nu_k) with proposal sd mh_sigma. Can integrate out π\pi when sampling zz if marginal.z = TRUE.

  2. method = "em": Maximizes the observed-data log- likelihood via EM. The E-step computes responsibilities via Wishart log-densities. The M-step updates πk\pi_k and Σk\Sigma_k; optionally updates νk\nu_k when estimate_nu = TRUE. Supports multiple random restarts.

Note that (i) All matrices in S_list must be SPD. Small ridge terms may be added internally for stability, and (ii) Multiple EM restarts are recommended for robustness on difficult datasets.

Value

A list whose structure depends on method:

  • For method = "bayes":

    • pi_ik: array (nsave x n x K), saved per-observation weights.

    • pi: matrix (nsave x K), saved mixture proportions.

    • nu: matrix (nsave x K), saved degrees- of-freedom.

    • Sigma: list of length nsave; each is an array (p×p×Kp \times p \times K) of Σk\Sigma_k draws.

    • z: matrix (nsave x n), saved allocations.

    • sigma_posterior_mean: array (p×p×Kp \times p \times K), posterior mean of Σk\Sigma_k.

    • loglik: numeric vector (length niter), log- likelihood trace.

    • loglik_individual: matrix (niter x n), per-observation log-likelihood.

  • For method = "em":

    • pi: numeric vector length KK, mixture proportions.

    • Sigma: list length K, each a p×pp \times p SPD matrix.

    • nu: numeric vector length K, degrees-of- freedom.

    • tau: matrix (n×Kn \times K), responsibilities.

    • loglik: numeric vector, log-likelihood per EM iteration.

    • iterations: integer, number of EM iterations performed.

Examples

# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
dat <- simData(n, p,
  K = 3,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)

set.seed(123)
fit <- mixturewishart(
  dat$S,
  K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # tune this for MH acceptance 20-40%
  niter = 100, burnin = 50
)

# Posterior means for degrees of freedom of Wishart distributions:
nu_mcmc <- fit$nu[-c(1:fit$burnin), ]
colMeans(nu_mcmc)

EM/Bayesian estimation for Wishart MoE model

Description

Fit a mixture-of-experts model for symmetric positive-definite (SPD) matrices with covariate-dependent mixing proportions (gating network). Components are Wishart-distributed. Supports Bayesian sampling and EM-based maximum-likelihood estimation.

Usage

moewishart(
  S_list,
  X,
  K,
  niter = 3000,
  burnin = 1000,
  method = "bayes",
  thin = 1,
  nu0 = NULL,
  Psi0 = NULL,
  init_nu = NULL,
  estimate_nu = TRUE,
  nu_prior_a = 2,
  nu_prior_b = 0.1,
  mh_sigma = 0.1,
  mh_beta = 0.05,
  sigma_beta = 10,
  init = NULL,
  tol = 1e-06,
  ridge = 1e-08,
  verbose = TRUE
)

Arguments

S_list

List of length nn of SPD matrices, each p×pp \times p. These are the observed responses modeled by the MoE.

X

Numeric matrix n×qn \times q of covariates for the gating network. Include an intercept column if desired.

K

Integer. Number of mixture components (experts).

niter

Integer. Total iterations. Bayesian mode: total MCMC iterations (including burn-in). EM mode: maximum EM iterations.

burnin

Integer. Number of burn-in iterations (Bayesian mode).

method

Character; one of c("bayes", "em"). Selects sampler or optimizer.

thin

Integer. Thinning interval for saving draws (Bayesian).

nu0

Numeric. Inverse-Wishart prior df for Σk\Sigma_k (Bayesian). Default: p+2p + 2 if NULL.

Psi0

Numeric p×pp \times p SPD matrix. Inverse-Wishart prior scale for Σk\Sigma_k (Bayesian). Default: diag(p) if NULL.

init_nu

Optional numeric vector length KK of initial dfs νk\nu_k. Used for initialization.

estimate_nu

Logical. If TRUE, estimate νk\nu_k (MH in Bayesian; Newton/EM in EM). If FALSE, keep νk\nu_k fixed at init_nu.

nu_prior_a

Numeric. Prior hyperparameter aa for νk\nu_k (Bayesian), used when estimate_nu = TRUE.

nu_prior_b

Numeric. Prior hyperparameter bb for νk\nu_k (Bayesian), used when estimate_nu = TRUE.

mh_sigma

Numeric scalar or length-KK vector. Proposal sd for MH updates on log(νk)\log(\nu_k) (Bayesian, when estimating ν\nu).

mh_beta

Numeric scalar or length-K1K-1 vector. Proposal sd for MH updates of the free BB columns (Bayesian).

sigma_beta

Numeric. Prior sd of the Gaussian prior on BB (Bayesian).

init

Optional list with fields for EM initialization, e.g., beta, Sigma, nu. See return structure.

tol

Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally.

ridge

Numeric. Small diagonal ridge added to Σk\Sigma_k updates in EM for numerical stability.

verbose

Logical. If TRUE, print progress information.

Details

MoE-Wishart Model:

  • Observation: SiS_i is a p×pp \times p SPD matrix. Given allocation zi=kz_i=k, SiziWp(νk,Σk)S_i \mid z_i \sim W_p(\nu_k, \Sigma_k) with df νk\nu_k and scale Σk\Sigma_k.

  • Gating (MoE): Let XiX_i be qq-dimensional covariates. Mixing weights πik=Pr(zi=kXi)\pi_{ik} = \Pr(z_i=k \mid X_i) follow a softmax regression: πik=exp(ηik)/j=1Kexp(ηij)\pi_{ik} = \exp(\eta_{ik})/\sum_{j=1}^K \exp(\eta_{ij}), where ηi=XiB\eta_i = X_i^\top B, BB is q×Kq \times K. Identifiability: last column of BB is fixed to zero.

Algorithms:

  1. Bayesian (method = "bayes"): Metropolis-within-Gibbs sampler for zz, Σk\Sigma_k, optional νk\nu_k, and BB. Gaussian priors on BB with sd sigma_beta. Proposals use mh_sigma for log(νk)\log(\nu_k) and mh_beta for BB.

  2. EM (method = "em"): E-step responsibilities using Wishart log-densities and softmax gating. M-step updates Σk\Sigma_k, optional νk\nu_k, and BB via weighted multinomial logistic regression (BFGS).

Note that: (i) include an intercept column in X; none is added by default, and (ii) all S_list elements must be SPD. A small ridge may be added for stability.

Value

A list whose fields depend on method:

  • For method = "bayes":

    • Beta_samples: array (nsave x q x K), saved draws of BB (last column zero).

    • nu: matrix (nsave x K), draws of νk\nu_k.

    • Sigma: list of length nsave; each element is an array (p×p×Kp \times p \times K) of Σk\Sigma_k draws.

    • z_samples: matrix (nsave x n), draws of allocations.

    • pi_ik: array (nsave x n x K), per-observation gating probabilities.

    • pi_mean: matrix (n x K), posterior mean of gating probabilities.

    • loglik: numeric vector (length niter), log-likelihood trace.

    • loglik_individual: matrix (niter x n), per-observation log-likelihood.

  • For method = "em":

    • K, p, q, n: problem dimensions.

    • Beta: matrix (q×Kq \times K), gating coefficients with last column zero (reference class).

    • Sigma: list length K, each a p×pp \times p SPD matrix (scale).

    • nu: numeric vector length K, degrees of freedom.

    • gamma: matrix (n×Kn \times K), final responsibilities.

    • loglik: numeric vector, log-likelihood by EM iteration.

    • iter: integer, number of EM iterations performed.

Examples

# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
# True gating coefficients (last column zero)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0
dat <- simData(n, p,
  Xq = 3, K = 3, betas = betas,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)

set.seed(123)
fit <- moewishart(
  dat$S,
  X = cbind(1, dat$X), K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K)
  mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1)
  niter = 500, burnin = 200
)

# Posterior means for degrees of freedom of Wishart distributions:
nu_mcmc <- fit$nu[-c(1:fit$burnin), ]
colMeans(nu_mcmc)

Trace plot of parameters

Description

Draw MCMC trace plot of parameters

Usage

plotMCMC(object, estimator = "nu", coeff.idx = 2)

Arguments

object

returned object from Bayesian MoE or mixture model

estimator

specified parameter for traceplot

coeff.idx

choice of one covariate to show its coefficients in clusters

Value

A trace plot

Examples

# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
# True gating coefficients (last column zero)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0
dat <- simData(n, p,
  Xq = 3, K = 3, betas = betas,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)

set.seed(123)
fit <- moewishart(
  dat$S,
  X = cbind(1, dat$X), K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K)
  mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1)
  niter = 500, burnin = 200
)

plotMCMC(fit, estimator = "nu")

Dirichlet random sampling

Description

Generate random draws from a Dirichlet distribution with parameter vector αR+K\alpha \in \mathbb{R}_+^K. Each draw is a length-KK probability vector on the simplex.

Usage

rdirichlet(n, alpha)

Arguments

n

Integer. Number of independent Dirichlet draws to generate.

alpha

Numeric vector of positive concentration parameters α=(α1,,αK)\alpha = (\alpha_1,\dots,\alpha_K). Its length KK defines the dimension of the simplex.

Details

Definition: If YkGamma(αk,1)Y_k \sim \mathrm{Gamma}(\alpha_k, 1) independently for k=1,,Kk=1,\dots,K (shape αk\alpha_k, rate 1), then the normalized vector Xk=Yk/j=1KYjX_k = Y_k / \sum_{j=1}^K Y_j follows Dirichlet(α)\mathrm{Dirichlet}(\alpha).

Note that alpha must be a numeric vector with strictly positive entries.

Value

A numeric matrix of size n×Kn \times K, where each row is an independent Dirichlet draw that sums to 11.

Examples

set.seed(123)
# 3-dimensional Dirichlet with asymmetric concentration
alpha <- c(2, 5, 3)
rdirichlet(5, alpha)

Fast sampler for the inverse-Wishart distribution

Description

Draw a random sample from an inverse-Wishart distribution IWp(ν,Ψ)\mathcal{IW}_p(\nu, \Psi) using the identity SIWp(ν,Ψ)S \sim \mathcal{IW}_p(\nu, \Psi) iff S1Wp(ν,Ψ1)S^{-1} \sim W_p(\nu, \Psi^{-1}). This implementation accepts Ψ1\Psi^{-1} directly for speed.

Usage

sampleIW(nu, Psi_inv)

Arguments

nu

Numeric. Degrees of freedom ν\nu of the inverse-Wishart (must exceed p1p - 1).

Psi_inv

Numeric p×pp \times p SPD matrix equal to Ψ1\Psi^{-1}, the inverse of the inverse-Wishart scale matrix.

Details

Sampling scheme:

  • Sample WWp(ν,Ψ1)W \sim W_p(\nu, \Psi^{-1}) using rWishart.

  • Return S=W1S = W^{-1}, which has IWp(ν,Ψ)\mathcal{IW}_p(\nu, \Psi).

Parameterization:

  • ν\nu is the degrees of freedom, must satisfy ν>p1\nu > p - 1.

  • Ψ\Psi is the SPD scale matrix of the inverse-Wishart. This function expects its inverse Ψ1\Psi^{-1} as input for efficiency (avoids repeated matrix inversions).

Note that: (i) internally calls rWishart(1, df = nu, Sigma = Psi_inv), and (ii) returns solve(W); if numerical issues arise, consider adding a small ridge to Ψ1\Psi^{-1} prior to sampling.

Value

A p×pp \times p SPD matrix SS distributed as IWp(ν,Ψ)\mathcal{IW}_p(\nu, \Psi).

Examples

set.seed(123)
p <- 3
# Construct an SPD scale matrix Psi
A <- matrix(rnorm(p * p), p, p)
Psi <- crossprod(A) + diag(p) * 0.5
Psi_inv <- solve(Psi)

# Draw one inverse-Wishart sample
S <- sampleIW(nu = p + 5, Psi_inv = Psi_inv)
S

Simulate data from a Wishart mixture or mixture-of-experts model

Description

Generate synthetic SPD matrices from either: (i) a finite mixture of Wishart components with fixed mixing proportions, or (ii) a mixture-of-experts (MoE) where mixing proportions depend on covariates via a softmax gating model.

Usage

simData(
  n = 200,
  p = 2,
  Xq = 0,
  K = NA,
  betas = NULL,
  pis = c(0.4, 0.6),
  nus = c(8, 12),
  Sigma = NULL
)

Arguments

n

Integer. Number of observations to simulate.

p

Integer. Dimension of the Wishart distribution (matrix size p×pp \times p).

Xq

Integer. Number of covariates for the gating network (MoE case). If Xq = 0, a standard mixture (no covariates) is simulated.

K

Integer. Number of latent components. Required when Xq > 0. If Xq = 0, defaults to length(pis).

betas

Numeric matrix Xq×KXq \times K of gating coefficients used when Xq > 0. If NULL, random coefficients are generated and the last column is set to zero (reference class).

pis

Numeric vector of length KK giving fixed mixture proportions when Xq = 0. Ignored when Xq > 0.

nus

Numeric vector length KK, degrees of freedom νk\nu_k for each component (must exceed p1p - 1).

Sigma

Optional list length KK of SPD scale matrices Σk\Sigma_k (each p×pp \times p). If NULL, defaults are generated based on K and p.

Details

Models:

  • Fixed mixture (no covariates, Xq = 0): ziCategorical(π)z_i \sim \mathrm{Categorical}(\pi), and Sizi=kWp(νk,Σk)S_i \mid z_i=k \sim W_p(\nu_k, \Sigma_k).

  • Mixture-of-experts (covariates, Xq > 0): Let XiRXqX_i \in \mathbb{R}^{Xq}. The mixing weights are πik=Pr(zi=kXi)\pi_{ik} = \Pr(z_i=k \mid X_i) given by softmax regression πik=exp(XiBk)/j=1Kexp(XiBj)\pi_{ik} = \exp(X_i^\top B_k) / \sum_{j=1}^K \exp(X_i^\top B_j). Labels ziz_i are drawn from Categorical(πi)\mathrm{Categorical}(\pi_i) and Sizi=kWp(νk,Σk)S_i \mid z_i=k \sim W_p(\nu_k, \Sigma_k).

Simulation steps:

  1. Construct pis:

    • If Xq = 0, replicate the provided pis over n rows.

    • If Xq > 0, generate X ~ N(0, I) and compute softmax probabilities using betas (last column set to zero by default identifiability).

  2. If Sigma is not provided, create default Σk\Sigma_k matrices (SPD) depending on K and p.

  3. Sample labels ziCategorical(πi)z_i \sim \mathrm{Categorical}(\pi_i).

  4. Draw SiS_i from Wp(νzi,Σzi)W_p(\nu_{z_i}, \Sigma_{z_i}) via rWishart.

Note that: (i) in the MoE case, no intercept is automatically added to X. Use Xq to include desired covariates; the default betas is randomly generated with betas[, K] = 0, and (ii) provided Sigma must be a list of SPD p×pp \times p matrices. Provided nus must exceed p1p - 1.

Value

A list with the following elements:

  • S: list of length n of simulated SPD matrices SiS_i.

  • z: integer vector length n of component labels.

  • nu: numeric vector length KK of degrees of freedom.

  • pi: matrix n×Kn \times K of mixing probabilities (rows sum to 11).

  • Sigma_list: list length KK of the scale matrices used for simulation.

  • X: matrix n×Xqn \times Xq of covariates if Xq > 0, otherwise NULL.

  • betas: the gating coefficient matrix used when Xq > 0, otherwise NULL.

Examples

# simulate data from mixture model (no covariates)
set.seed(123)
n <- 200 # subjects
p <- 10
dat <- simData(n, p,
  K = 3,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 12, 3)
)
str(dat)