% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/disbayes_hier.R
\name{disbayes_hier}
\alias{disbayes_hier}
\title{Bayesian estimation of chronic disease epidemiology from incomplete data -
hierarchical model for case fatalities.}
\usage{
disbayes_hier(
  data,
  group,
  gender = NULL,
  inc_num = NULL,
  inc_denom = NULL,
  inc_prob = NULL,
  inc_lower = NULL,
  inc_upper = NULL,
  prev_num = NULL,
  prev_denom = NULL,
  prev_prob = NULL,
  prev_lower = NULL,
  prev_upper = NULL,
  mort_num = NULL,
  mort_denom = NULL,
  mort_prob = NULL,
  mort_lower = NULL,
  mort_upper = NULL,
  rem_num = NULL,
  rem_denom = NULL,
  rem_prob = NULL,
  rem_lower = NULL,
  rem_upper = NULL,
  age = "age",
  cf_init = 0.01,
  eqage = 30,
  eqagehi = NULL,
  cf_model = "default",
  inc_model = "smooth",
  rem_model = "const",
  prev_zero = FALSE,
  sprior = c(1, 1, 1),
  hp_fixed = NULL,
  nfold_int_guess = 5,
  nfold_int_upper = 100,
  nfold_slope_guess = 5,
  nfold_slope_upper = 100,
  mean_int_prior = c(0, 10),
  mean_slope_prior = c(5, 5),
  gender_int_priorsd = 0.82,
  gender_slope_priorsd = 0.82,
  inc_prior = c(1.1, 0.1),
  rem_prior = c(1.1, 1),
  method = "opt",
  draws = 1000,
  iter = 10000,
  stan_control = NULL,
  ...
)
}
\arguments{
\item{data}{Data frame containing some of the variables below.  The
variables below are provided as character strings naming columns in this
data frame.   For each disease measure available, one of the following three
combinations of variables must be specified:

(1) numerator and denominator (2) estimate and denominator (3) estimate
with lower and upper credible limits.

Mortality must be supplied, and at least one of incidence and prevalence.
If remission is assumed to be possible, then remission data should also be supplied (see below).

Estimates refer to the probability of having some event within a year, rather than rates.  Rates per year \eqn{r} can be
converted to probabilities \eqn{p} as \eqn{p = 1 - \exp(-r)}{p = 1 - exp(-r)}, assuming the rate is constant
within the year.
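
As a rough illustration of this conversion in base R (the rates below are
hypothetical values chosen for the example, not taken from any data source):

\preformatted{
r <- c(0.01, 0.1, 0.5)   # hypothetical event rates per year
p <- 1 - exp(-r)         # annual probabilities, assuming a constant rate within the year
r_back <- -log(1 - p)    # inverse conversion, recovering the original rates
}

For small rates the probability is close to the rate, and it is always
slightly smaller than the rate.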

For estimates based on registry data assumed to cover the whole
population, the denominator will be the population size.}

\item{group}{Variable in the data representing the area (or other grouping
factor).}

\item{gender}{If \code{NULL} (the default) then the data are assumed to describe a single homogeneous
gender, and there should be one row per year of age.  Otherwise, set
\code{gender} to a character string naming the variable in the data
representing gender (or other categorical grouping factor).  Gender will then
be treated as a fixed additive effect, so the linear effect of gender on log
case fatality is the same in each area.  The data should have one row per
year of age and gender.}

\item{inc_num}{Numerator for the incidence data, assumed to represent the
observed number of new cases within a year among a population of size
\code{inc_denom}.}

\item{inc_denom}{Denominator for the incidence data.

The function \code{\link{ci2num}} can be used to convert a published
estimate and interval for a proportion to an implicit numerator and
denominator.

Note that to include extra uncertainty beyond that implied by a published
interval, the numerator and denominator could be multiplied by a constant.
For example, multiplying both the numerator and denominator by 0.5 would
give the data source half its original weight.}

\item{inc_prob}{Estimate of the incidence probability}

\item{inc_lower}{Lower credible limit for the incidence estimate}

\item{inc_upper}{Upper credible limit for the incidence estimate}

\item{prev_num}{Numerator for the estimate of prevalence, i.e.
number of people currently with a disease.}

\item{prev_denom}{Denominator for the estimate of prevalence (e.g. the size
of the survey used to obtain the prevalence estimate)}

\item{prev_prob}{Estimate of the prevalence probability}

\item{prev_lower}{Lower credible limit for the prevalence estimate}

\item{prev_upper}{Upper credible limit for the prevalence estimate}

\item{mort_num}{Numerator for the estimate of the mortality probability, i.e.
number of deaths attributed to the disease under study within a year}

\item{mort_denom}{Denominator for the estimate of the mortality probability (e.g.
the population size, if the estimates were obtained from a comprehensive
register)}

\item{mort_prob}{Estimate of the mortality probability}

\item{mort_lower}{Lower credible limit for the mortality estimate}

\item{mort_upper}{Upper credible limit for the mortality estimate}

\item{rem_num}{Numerator for the estimate of the remission probability, i.e. number
of people observed to recover from the disease within a year.

Remission
data should be supplied if remission is permitted in the model, either as
a numerator and denominator or as an estimate with lower and upper credible limits.
Conversely, if no remission data are supplied, then remission is assumed
to be impossible.  These "data" may represent a prior judgement rather than
observation - lower denominators or wider credible limits represent
weaker prior information.}

\item{rem_denom}{Denominator for the estimate of the remission probability}

\item{rem_prob}{Estimate of the remission probability}

\item{rem_lower}{Lower credible limit for the remission estimate}

\item{rem_upper}{Upper credible limit for the remission estimate}

\item{age}{Variable in the data indicating the year of age.  This must
start at age zero, but can end at any age.}

\item{cf_init}{Initial guess at a typical case fatality value, for an
average age.}

\item{eqage}{Case fatalities (and incidence and remission rates) are assumed to be equal for
all ages below this age, inclusive, when using the smoothed model.}

\item{eqagehi}{Case fatalities (and incidence and remission rates) are assumed to be equal for
all ages above this age, inclusive, when using the smoothed model.}

\item{cf_model}{The following alternative models for case fatality are
supported:

\code{"default"} (the default). Random intercepts and slopes, and no
further restriction.

\code{"interceptonly"}.  Random intercepts, but common slopes.

\code{"increasing"}. Case fatality is assumed to be an increasing function
of age (note it is constant below \code{eqage} in all models) with a
common slope for all groups.

\code{"common"} Case fatality is an unconstrained function of age
which is common to all areas, i.e. it has the same parameter values in
every area.  This and \code{"increasing_common"} are used in situations
where you want to compare a model with area-specific rates with a single model for
the data aggregated over areas.  Modelling the area-disaggregated data using
a common function for all areas is equivalent to a model for the aggregated data,
and can be statistically compared (using cross-validation) with a model with
area-specific rates.

\code{"increasing_common"} Case fatality is an increasing function of age
which is common to all areas.

\code{"const"} Case fatality is assumed to be constant with age, for all
ages, but different in each area.

\code{"const_common"} Case fatality is a constant over all ages and areas.

In all models, case fatality is a smooth function of age.}

\item{inc_model}{Model for how incidence varies with age.

\code{"smooth"} (the default). Incidence is modelled as a smooth spline
function of age, independently for each area (and gender).

\code{"indep"} Incidence rates for each year of age, area (and gender) are
estimated independently.}

\item{rem_model}{Model for how remission varies with age.  Currently
supported models are \code{"const"} for a constant remission rate over all
ages, \code{"smooth"} for a smooth spline, or \code{"indep"} for a different remission rate estimated
independently for each age with no smoothing.}

\item{prev_zero}{If \code{TRUE}, attempt to estimate prevalence at age zero
from the data, as part of the Bayesian model, even if the observed prevalence is zero.
Otherwise (the default) this is assumed to be zero if the count is zero, and estimated
otherwise.}

\item{sprior}{Rates of the exponential prior distributions used to penalise
the coefficients of the spline model.   The default of 1 should adapt
appropriately to the data, but higher values give stronger smoothing, and
lower values give weaker smoothing, if required.

This can be a named vector with names \code{"inc","cf","rem"} in any
order, giving the prior smoothness rates for incidence, case fatality and
remission.  If any of these are not smoothed they can be excluded, e.g.
\code{sprior = c(cf=10, inc=1)}.

This can also be an unnamed vector of three elements, where the first
refers to the spline model for incidence, the second for case fatality,
the third for remission. If one of the rates (e.g. remission) is not being
modelled with a spline, any number can be supplied here and it is just
ignored.}

\item{hp_fixed}{A list with one named element for each hyperparameter
to be fixed.  The value should be either
\itemize{
\item a number (to fix the hyperparameter at this number)
\item \code{TRUE} (to fix the hyperparameter at the posterior mode from a training run
where it is not fixed)
}

If the element is either \code{NULL}, \code{FALSE}, or omitted from the list,
then the hyperparameter is given a prior and estimated as part of the Bayesian model.

The hyperparameters that can be fixed are
\itemize{
\item \code{scf} Smoothness parameter for the spline relating case fatality to age.
\item \code{sinc} Smoothness parameter for the spline relating incidence to age.
\item \code{scfmale} Smoothness parameter for the spline defining how the gender
effect relates to age.  Only for models with additive gender and area effects.
\item \code{sd_int} Standard deviation of random intercepts for case fatality.
\item \code{sd_slope} Standard deviation of random slopes for case fatality.
}

For example, to fix the case fatality smoothness to 1.2, fix the incidence
smoothness to its posterior mode, and estimate all the other hyperparameters,
specify \code{hp_fixed = list(scf=1.2, sinc=TRUE)}.}

\item{nfold_int_guess}{Prior guess at the ratio of case fatality between a
high risk (97.5\% quantile) and low risk (2.5\% quantile) area.}

\item{nfold_int_upper}{Prior upper 95\% credible limit for the ratio in
average case fatality between a high risk (97.5\% quantile) and low risk
(2.5\% quantile) area.}

\item{nfold_slope_guess, nfold_slope_upper}{These two arguments define the
prior distribution for the variance in the random
linear effects of age on log case fatality.   They define a prior guess
and upper 95\% credible limit for the ratio of case fatality slopes
between a high trend (97.5\% quantile) and low trend (2.5\% quantile) area.
(Note that the model is not exactly linear, since departures from
linearity are defined through a spline model.  See the Jackson et al.
paper for details.)}

\item{mean_int_prior}{Vector of two elements giving the prior mean and
standard deviation respectively for the mean random intercept for log case
fatality.}

\item{mean_slope_prior}{Vector of two elements giving the prior mean and
standard deviation respectively for the mean random slope for log case
fatality.}

\item{gender_int_priorsd}{Prior standard deviation for the additive effect
of gender on log case fatality}

\item{gender_slope_priorsd}{Prior standard deviation for the additive effect
of gender on the linear age slope of log case fatality}

\item{inc_prior}{Vector of two elements giving the Gamma shape and rate parameters of the
prior for the incidence rate.  Only used if \code{inc_model="indep"}, for each age-specific rate.}

\item{rem_prior}{Vector of two elements giving the Gamma shape and rate parameters of the
prior for the remission rate, used in both \code{rem_model="const"} and \code{rem_model="indep"}.}

\item{method}{String indicating the inference method, defaulting to
\code{"opt"}.

If \code{method="mcmc"} then a sample from the posterior is drawn using Markov Chain Monte Carlo
sampling, via \pkg{rstan}'s \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}} function.   This is the most
accurate but the slowest method.

If \code{method="opt"}, then instead of an MCMC sample from the posterior,
\code{disbayes} returns the posterior mode calculated using optimisation, via
\pkg{rstan}'s \code{\link[rstan:stanmodel-method-optimizing]{rstan::optimizing()}} function.
A sample from a normal approximation to the (real-line-transformed)
posterior distribution is drawn in order to obtain credible intervals.

If the optimisation fails to converge (non-zero return code), try increasing the
number of iterations from the default of 10000, e.g. \code{disbayes_hier(..., iter=20000, ...)}, or changing the algorithm with \code{disbayes_hier(..., algorithm="Newton", ...)}.

If there is an error message that mentions \code{chol}, then
the computed Hessian matrix is not positive definite at the reported optimum, hence credible intervals
cannot be computed.
This can occur either because of numerical error in computation of the Hessian, or because the
reported optimum is wrong.  If you are willing to believe
the optimum and live without credible intervals, then set \code{draws=0} to skip
computation of the Hessian.   To examine the problematic Hessian, set
\code{hessian=TRUE,draws=0}, then look at the \code{$fit$hessian} component of the
\code{disbayes} return object.   If it can be inverted, apply \code{sqrt(diag(solve(H)))} to the Hessian \code{H}, and
check for \code{NaN}s, which indicate the problematic parameters.
Otherwise, diagonal entries of the Hessian matrix that are very small
may indicate parameters that are poorly identified from the data, leading to computational
problems.
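
A minimal sketch of this check in base R, using a toy 2 x 2 matrix \code{H}
in place of a real Hessian.  The matrix, its parameter names, its sign
convention and the threshold for a "very small" diagonal entry are all
assumptions made purely for illustration:

\preformatted{
# Toy stand-in for the Hessian (hypothetical values): the second
# parameter has curvature of the wrong sign
H <- matrix(c(2, 0, 0, -1), nrow = 2,
            dimnames = list(c("par1", "par2"), c("par1", "par2")))
se <- suppressWarnings(sqrt(diag(solve(H))))  # NaN where the inverse has a negative diagonal entry
bad <- which(is.nan(se))                      # positions and names of the problematic parameters
small <- abs(diag(H)) < 1e-8                  # arbitrary threshold for near-zero diagonal entries
}

Here \code{bad} picks out the second parameter, and \code{small} flags no
entries.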

If \code{method="vb"}, then variational Bayes methods are used, via \pkg{rstan}'s
\code{\link[rstan:stanmodel-method-vb]{rstan::vb()}} function.  This is labelled as "experimental" by
\pkg{rstan}.  It might give a better approximation to the posterior
than \code{method="opt"}, but has not been investigated much for \code{disbayes} models.}

\item{draws}{Number of draws from the normal approximation to the posterior
when using \code{method="opt"}.}

\item{iter}{Number of iterations for MCMC sampling, or maximum number of iterations for optimization.}

\item{stan_control}{(\code{method="mcmc"} only). List of options supplied as the \code{control} argument
to \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}} in \pkg{rstan} for the main model fit.}

\item{...}{Further arguments passed to \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}} to
control MCMC sampling, or \code{\link[rstan:stanmodel-method-optimizing]{rstan::optimizing()}} to control
optimisation, in Stan.}
}
\value{
A list including the following components

\code{call}: Function call that was used.

\code{fit}: An object containing posterior samples from the fitted model,
in the \code{stanfit} format returned by the \code{\link[rstan]{stan}}
function in the \pkg{rstan} package.

\code{method}:  Inference method that was chosen.

\code{nage}: Number of years of age in the data

\code{narea}: Number of areas (or other grouping variable that defines the hierarchical model).

\code{ng}: Number of genders (or other categorical variable whose effect is treated as
additive with the area effect).

\code{groups}: Names of the areas (or other grouping variable), taken from the factor levels in the
original data.

\code{genders}: Names of the genders (or other categorical variable), taken from the factor levels in the
original data.

\code{dat}: A list containing the input data in the form of numerators
and denominators.

\code{stan_data}: Full list of data supplied to Stan

\code{stan_inits}: Full list of parameter initial values supplied to Stan

\code{trend}: Whether a time trend was modelled

\code{hp_fixed}: Values of any hyperparameters that are fixed during the main model fit.
}
\description{
A variant of \code{\link{disbayes}} in which data from different areas can be
related in a hierarchical model and, optionally, the effect of gender can be
treated as additive with the effect of area.  This is much more computationally
intensive than the basic model in \code{\link{disbayes}}.  Time trends are not
supported in this function.
}
\references{
Jackson C, Zapata-Diomedi B, Woodcock J.
"Bayesian multistate modelling of incomplete chronic disease burden data"
\url{https://arxiv.org/abs/2111.14100}
}
