% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ombc_gmm.R
\name{ombc_gmm}
\alias{ombc_gmm}
\title{Sequentially identify outliers while fitting a Gaussian mixture model.}
\usage{
ombc_gmm(
  x,
  comp_num,
  max_out,
  gross_outs = rep(FALSE, nrow(x)),
  init_scheme = c("update", "reinit", "reuse"),
  mnames = "VVV",
  nmax = 1000,
  atol = 1e-08,
  init_z = NULL,
  init_model = NULL,
  init_method = c("hc", "kmpp"),
  init_scaling = FALSE,
  kmpp_seed = 123,
  fixed_labels = NULL,
  verbose = TRUE
)
}
\arguments{
\item{x}{Data, with one row per observation.}

\item{comp_num}{Number of mixture components.}

\item{max_out}{Maximum number of outliers.}

\item{gross_outs}{Logical vector of length \code{nrow(x)} identifying gross
outliers. By default, no observations are marked as gross outliers.}

\item{init_scheme}{Which initialisation scheme to use: \code{"update"},
\code{"reinit"}, or \code{"reuse"}.}

\item{mnames}{Model names for \code{mixture::gpcm}.}

\item{nmax}{Maximum number of iterations for \code{mixture::gpcm}.}

\item{atol}{EM convergence tolerance threshold for \code{mixture::gpcm}.}

\item{init_z}{Initial component assignment probability matrix.}

\item{init_model}{Initial mixture model (\code{mixture::gpcm} \code{best_model}).}

\item{init_method}{Method used to initialise each mixture model: \code{"hc"}
or \code{"kmpp"}.}

\item{init_scaling}{Logical value controlling whether the data should be
scaled for initialisation.}

\item{kmpp_seed}{Optional seed for k-means++ initialisation.}

\item{fixed_labels}{Cluster labels that are known a priori. See the
\code{label} argument in \code{mixture::gpcm}.}

\item{verbose}{Logical value controlling whether the iteration count is
printed.}
}
\value{
\code{ombc_gmm} returns an object of class "outliermbc_gmm", which is essentially
a list with the following elements:
\describe{
\item{\code{labels}}{Vector of mixture component labels with outliers denoted by
0.}
\item{\code{outlier_bool}}{Logical vector indicating if an observation has been
classified as an outlier.}
\item{\code{outlier_num}}{Number of observations classified as outliers.}
\item{\code{outlier_rank}}{Order in which observations are removed from the data
set. Observations which were provisionally removed,
including those that were eventually not classified
as outliers, are ranked from \code{1} to \code{max_out}. All
gross outliers have rank \code{1}. If there are
\code{gross_num} gross outliers, then the observations
removed during the main algorithm itself will be
numbered from \code{gross_num + 1} to \code{max_out}.
Observations that were never removed have rank \code{0}.}
\item{\code{gross_outs}}{Logical vector identifying the gross outliers. This is
identical to the \code{gross_outs} vector passed to this
function as an argument / input.}
\item{\code{mix}}{Output from \code{mixture::gpcm} fitted to the non-outlier
observations.}
\item{\code{loglike}}{Vector of log-likelihood values for each iteration.}
\item{\code{removal_dens}}{Vector of mixture densities for the removed
observations. These are the lowest mixture densities
at each iteration.}
\item{\code{distrib_diff_vec}}{Vector of aggregated cross-component
dissimilarity values for each iteration.}
\item{\code{distrib_diff_mat}}{Matrix of component-specific dissimilarity values
for each iteration.}
\item{\code{call}}{Arguments / parameter values used in this function call.}
\item{\code{version}}{Version of \code{outlierMBC} used in this function call.}
\item{\code{conv_status}}{Logical vector indicating which iterations' mixture
models reached convergence during model-fitting.}
}
}
\description{
This function performs model-based clustering and outlier identification. It
does so by iteratively fitting a Gaussian mixture model and removing the
observation that is least likely under the model. Its procedure is summarised
below:
\enumerate{
\item Fit a Gaussian mixture model to the data.
\item Compute a dissimilarity between the theoretical and observed distributions
of the scaled squared sample Mahalanobis distances for each mixture
component.
\item Aggregate across the components to obtain a single dissimilarity value.
\item Remove the observation with the lowest mixture density.
\item Repeat Steps 1-4 until \code{max_out} observations have been removed.
\item Identify the number of outliers which minimised the aggregated
dissimilarity, remove only those observations, and fit a Gaussian mixture
model to the remaining data.
}
}
\examples{
ombc_gmm_k3n1000o10 <- ombc_gmm(
  gmm_k3n1000o10[, 1:2],
  comp_num = 3, max_out = 20
)

plot_curve(ombc_gmm_k3n1000o10)
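
# A minimal, hedged sketch of inspecting the result, assuming the elements
# listed under Value can be accessed with `$` as for an ordinary list.
# Number of observations classified as outliers:
ombc_gmm_k3n1000o10$outlier_num
# Component sizes, with outliers labelled 0:
table(ombc_gmm_k3n1000o10$labels)
# Row indices of the observations classified as outliers:
which(ombc_gmm_k3n1000o10$outlier_bool)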
}
