% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tsraking.R
\name{tsraking}
\alias{tsraking}
\title{Restore cross-sectional (contemporaneous) aggregation constraints}
\usage{
tsraking(
  data_df,
  metadata_df,
  alterability_df = NULL,
  alterSeries = 1,
  alterTotal1 = 0,
  alterTotal2 = 0,
  alterAnnual = 0,
  tolV = 0.001,
  tolP = NA,
  warnNegResult = TRUE,
  tolN = -0.001,
  id = NULL,
  verbose = FALSE,

  # New in G-Series 3.0
  Vmat_option = 1,
  warnNegInput = TRUE,
  quiet = FALSE
)
}
\arguments{
\item{data_df}{(mandatory)

Data frame (object of class "data.frame") that contains the time series data to be reconciled. It must minimally
contain variables corresponding to the component series and cross-sectional control totals specified in the
metadata data frame (argument \code{metadata_df}). If more than one observation (period) is provided, the sum of
the provided component series values will also be preserved as part of implicit temporal constraints.}

\item{metadata_df}{(mandatory)

Data frame (object of class "data.frame") that describes the cross-sectional aggregation constraints
(additivity rules) for the raking problem. Two character variables must be included in the metadata data frame:
\code{series} and \code{total1}. Two variables are optional: \code{total2} (character) and \code{alterAnnual} (numeric). The values
of variable \code{series} represent the variable names of the component series in the input time series data frame
(argument \code{data_df}). Similarly, the values of variables \code{total1} and \code{total2} represent the variable names of
the 1\if{html}{\out{<sup>}}st\if{html}{\out{</sup>}} and 2\if{html}{\out{<sup>}}nd\if{html}{\out{</sup>}} dimension cross-sectional control totals in the input time series data
frame. Variable \code{alterAnnual} contains the alterability coefficient for the temporal constraint associated with
each component series. When specified, the latter will override the default alterability coefficient specified
with argument \code{alterAnnual}.}

\item{alterability_df}{(optional)

Data frame (object of class "data.frame"), or \code{NULL}, that contains the alterability coefficients variables.
They must correspond to a component series or a cross-sectional control total, that is, a variable with the same
name must exist in the input time series data frame (argument \code{data_df}). The values of these alterability
coefficients will override the default alterability coefficients specified with arguments \code{alterSeries},
\code{alterTotal1} and \code{alterTotal2}. When the input time series data frame contains several observations and the
alterability coefficients data frame contains only one, the alterability coefficients are used (repeated) for
all observations of the input time series data frame. Alternatively, the alterability coefficients data frame
may contain as many observations as the input time series data frame.

\strong{Default value} is \code{alterability_df = NULL} (default alterability coefficients).}

\item{alterSeries}{(optional)

Nonnegative real number specifying the default alterability coefficient for the component series values. It
will apply to component series for which alterability coefficients have not already been specified in the
alterability coefficients data frame (argument \code{alterability_df}).

\strong{Default value} is \code{alterSeries = 1.0} (nonbinding component series values).}

\item{alterTotal1}{(optional)

Nonnegative real number specifying the default alterability coefficient for the 1\if{html}{\out{<sup>}}st\if{html}{\out{</sup>}} dimension
cross-sectional control totals. It will apply to cross-sectional control totals for which alterability
coefficients have not already been specified in the alterability coefficients data frame (argument
\code{alterability_df}).

\strong{Default value} is \code{alterTotal1 = 0.0} (binding 1\if{html}{\out{<sup>}}st\if{html}{\out{</sup>}} dimension cross-sectional control totals).}

\item{alterTotal2}{(optional)

Nonnegative real number specifying the default alterability coefficient for the 2\if{html}{\out{<sup>}}nd\if{html}{\out{</sup>}} dimension
cross-sectional control totals. It will apply to cross-sectional control totals for which alterability
coefficients have not already been specified in the alterability coefficients data frame (argument
\code{alterability_df}).

\strong{Default value} is \code{alterTotal2 = 0.0} (binding 2\if{html}{\out{<sup>}}nd\if{html}{\out{</sup>}} dimension cross-sectional control totals).}

\item{alterAnnual}{(optional)

Nonnegative real number specifying the default alterability coefficient for the component series temporal
constraints (e.g., annual totals). It will apply to component series for which alterability coefficients
have not already been specified in the metadata data frame (argument \code{metadata_df}).

\strong{Default value} is \code{alterAnnual = 0.0} (binding temporal control totals).}

\item{tolV, tolP}{(optional)

Nonnegative real number, or \code{NA}, specifying the tolerance, in absolute value or percentage, to be used
when performing the final test in the case of binding totals (alterability coefficient of \eqn{0.0}
for temporal or cross-sectional control totals). The test compares the input binding control totals with
the ones calculated from the reconciled (output) component series. Arguments \code{tolV} and \code{tolP} cannot both be
specified (one must be specified while the other must be \code{NA}).

\strong{Example:} to set a tolerance of 10 \emph{units}, specify \verb{tolV = 10, tolP = NA}; to set a tolerance of 1\%,
specify \verb{tolV = NA, tolP = 0.01}.
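
The following base R sketch illustrates the kind of comparison described above. It is an illustration under stated
assumptions only: function \code{within_tol()} is hypothetical (not a package function) and the relative discrepancy used
with \code{tolP} (absolute discrepancy divided by the absolute value of the input total) is an assumed definition that
may differ from the one implemented in the package.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical check of a binding total against its recalculated value
within_tol <- function(total_in, total_out, tolV = 0.001, tolP = NA) {
  # Exactly one of tolV and tolP must be specified (the other must be NA)
  stopifnot(xor(is.na(tolV), is.na(tolP)))
  discrepancy <- abs(total_in - total_out)
  if (!is.na(tolV)) {
    discrepancy <= tolV                   # absolute tolerance (units)
  } else {
    discrepancy / abs(total_in) <= tolP   # assumed relative tolerance (0.01 = 1 pct)
  }
}

within_tol(40, 40.0005)                        # TRUE  (0.0005 <= 0.001)
within_tol(40, 40.5)                           # FALSE (0.5 > 0.001)
within_tol(40, 40.3, tolV = NA, tolP = 0.01)   # TRUE  (0.75 pct <= 1 pct)
}\if{html}{\out{</div>}}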

\strong{Default values} are \code{tolV = 0.001} and \code{tolP = NA}.}

\item{warnNegResult}{(optional)

Logical argument specifying whether a warning message is generated when a negative value created by the
function in the reconciled (output) series is smaller than the threshold specified by argument \code{tolN}.

\strong{Default value} is \code{warnNegResult = TRUE}.}

\item{tolN}{(optional)

Negative real number specifying the threshold for the identification of negative values. A value is
considered negative when it is smaller than this threshold.

\strong{Default value} is \code{tolN = -0.001}.}

\item{id}{(optional)

String vector (minimum length of 1), or \code{NULL}, specifying the names of additional variables to be transferred
from the input time series data frame (argument \code{data_df}) to the output time series data frame, the
object returned by the function (see section \strong{Value}). By default, the output series data frame only contains
the variables listed in the metadata data frame (argument \code{metadata_df}).

\strong{Default value} is \code{id = NULL}.}

\item{verbose}{(optional)

Logical argument specifying whether information on intermediate steps with execution time (real time,
not CPU time) should be displayed. Note that specifying argument \code{quiet = TRUE} would \emph{nullify} argument
\code{verbose}.

\strong{Default value} is \code{verbose = FALSE}.}

\item{Vmat_option}{(optional)

Specification of the option for the variance matrices (\eqn{V_e} and \eqn{V_\varepsilon}; see section \strong{Details}):\tabular{cl}{
   \strong{Value} \tab \strong{Description} \cr
   \code{1} \tab Use vectors \eqn{x} and \eqn{g} in the variance matrices. \cr
   \code{2} \tab Use vectors \eqn{|x|} and \eqn{|g|} in the variance matrices. \cr
}


See Ferland (2016) and subsection \strong{Arguments \code{Vmat_option} and \code{warnNegInput}} in section \strong{Details} for
more information.

\strong{Default value} is \code{Vmat_option = 1}.}

\item{warnNegInput}{(optional)

Logical argument specifying whether a warning message is generated when a negative value smaller than
the threshold specified by argument \code{tolN} is found in the input time series data frame (argument \code{data_df}).

\strong{Default value} is \code{warnNegInput = TRUE}.}

\item{quiet}{(optional)

Logical argument specifying whether or not to display only essential information such as warnings and errors.
Specifying \code{quiet = TRUE} would also \emph{nullify} argument \code{verbose} and is equivalent to \emph{wrapping} your
\code{\link[=tsraking]{tsraking()}} call with \code{\link[=suppressMessages]{suppressMessages()}}.

\strong{Default value} is \code{quiet = FALSE}.}
}
\value{
The function returns a data frame containing the reconciled component series, reconciled cross-sectional control
totals and variables specified with argument \code{id}. Note that the "data.frame" object can be explicitly coerced to
another type of object with the appropriate \verb{as*()} function (e.g., \code{tibble::as_tibble()} would coerce it to a tibble).
}
\description{
\if{html,text}{(\emph{version française: 
\url{https://StatCan.github.io/gensol-gseries/fr/reference/tsraking.html}})}

\emph{Replication of the G-Series 2.0 SAS\eqn{^\circledR}{®} TSRAKING procedure (PROC TSRAKING). See the
G-Series 2.0 documentation for details (Statistics Canada 2016).}

This function will restore cross-sectional aggregation constraints in a system of time series. The
aggregation constraints may come from a 1 or 2-dimensional table. Optionally, temporal constraints
can also be preserved.

\code{\link[=tsraking]{tsraking()}} is usually called in practice through \code{\link[=tsraking_driver]{tsraking_driver()}} in order to reconcile
all periods of the time series system in a single function call.
}
\details{
This function returns the generalized least squares solution of a specific, simple variant of the general
regression-based raking model proposed by Dagum and Cholette (Dagum and Cholette 2006). The model, in matrix form, is:
\deqn{\displaystyle
\begin{bmatrix} x \\ g \end{bmatrix} = 
\begin{bmatrix} I \\ G \end{bmatrix} \theta + 
\begin{bmatrix} e \\ \varepsilon \end{bmatrix}
}{[x; g] = [I; G] theta + [e; epsilon]}
where
\itemize{
\item \eqn{x} is the vector of the initial component series values.
\item \eqn{\theta} is the vector of the final (reconciled) component series values.
\item \eqn{e \sim \left( 0, V_e \right)}{e ~ (0, V_e)} is the vector of the measurement errors of \eqn{x} with covariance
matrix \eqn{V_e = \mathrm{diag} \left( c_x x \right)}{V_e = diag(c_x x)}, or \eqn{V_e = \mathrm{diag} \left( \left| 
c_x x \right| \right)}{V_e = diag(|c_x x|)} when argument \code{Vmat_option = 2}, where \eqn{c_x} is the vector of the
alterability coefficients of \eqn{x}.
\item \eqn{g} is the vector of the initial control totals, including the component series temporal totals (when
applicable).
\item \eqn{\varepsilon \sim (0, V_\varepsilon)}{epsilon ~ (0, V_epsilon)} is the vector of the measurement errors of
\eqn{g} with covariance matrix \eqn{V_\varepsilon = \mathrm{diag} \left( c_g g \right)}{V_epsilon = diag(c_g g)}, or
\eqn{V_\varepsilon = \mathrm{diag} \left( \left| c_g g \right| \right)}{V_epsilon = diag(|c_g g|)} when argument
\code{Vmat_option = 2}, where \eqn{c_g} is the vector of the alterability coefficients of \eqn{g}.
\item \eqn{G} is the matrix of aggregation constraints, including the implicit temporal constraints (when applicable).
}

The generalized least squares solution is:
\deqn{\displaystyle 
\hat{\theta} = x + V_e G^{\mathrm{T}} \left( G V_e G^{\mathrm{T}} + V_\varepsilon \right)^+ \left( g - G x \right)
}{theta^hat = x + V_e G' (G V_e G' + V_epsilon)^{+} (g - G x)}
where \eqn{A^{+}} designates the Moore-Penrose inverse of matrix \eqn{A}.
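
As an illustration only, the closed-form solution above can be sketched in base R. Helper functions \code{mp_inverse()}
and \code{gls_rake()} are hypothetical: they are not package functions and the actual implementation may differ. The
data are those of Example 1 in section \strong{Examples}, with the default alterability coefficients.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Moore-Penrose inverse from the singular value decomposition A = U D V'
# (hypothetical helper, not a package function)
mp_inverse <- function(A, tol = sqrt(.Machine$double.eps)) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)
  v <- s$v[, keep, drop = FALSE]
  u <- s$u[, keep, drop = FALSE]
  tcrossprod(sweep(v, 2, s$d[keep], "/"), u)   # V D^+ U'
}

# Generalized least squares solution of a single raking problem
# (hypothetical helper; c_x and c_g are the alterability coefficients)
gls_rake <- function(x, g, G, c_x, c_g, Vmat_option = 1) {
  v_e <- c_x * x     # diagonal of V_e
  v_eps <- c_g * g   # diagonal of V_epsilon
  if (Vmat_option == 2) {
    v_e <- abs(v_e)
    v_eps <- abs(v_eps)
  }
  G_Ve <- sweep(G, 2, v_e, "*")                              # G V_e
  M <- tcrossprod(G_Ve, G) + diag(v_eps, nrow = length(g))   # G V_e G' + V_epsilon
  resid <- g - drop(crossprod(t(G), x))                      # g - G x
  lambda <- crossprod(t(mp_inverse(M)), resid)               # M^+ (g - G x)
  x + drop(crossprod(G_Ve, lambda))                          # x + V_e G' M^+ (g - G x)
}

x <- c(cars = 25, vans = 5)
g <- c(total = 40)
G <- matrix(1, nrow = 1, ncol = 2)   # constraint: cars + vans = total
theta <- gls_rake(x, g, G, c_x = c(1, 1), c_g = 0)
theta        # cars = 33.33..., vans = 6.66... (proportional allocation)
sum(theta)   # 40
}\if{html}{\out{</div>}}

The calls to \code{crossprod()} and \code{tcrossprod()} are ordinary matrix products written in function form.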

\code{\link[=tsraking]{tsraking()}} solves a single raking problem, i.e., either a single period of the time series system, or a single
temporal group (e.g., all periods of a given year) when temporal total preservation is required. Several calls to
\code{\link[=tsraking]{tsraking()}} are therefore necessary in order to reconcile all the periods of the time series system.
\code{\link[=tsraking_driver]{tsraking_driver()}} can achieve this in a single call: it conveniently determines the required set of raking
problems to be solved and internally generates the individual calls to \code{\link[=tsraking]{tsraking()}}.
\subsection{Alterability Coefficients}{

Alterability coefficients \eqn{c_x} and \eqn{c_g} conceptually represent the measurement errors associated with the
input component series values \eqn{x} and control totals \eqn{g} respectively. They are nonnegative real numbers which,
in practice, specify the extent to which an initial value can be modified in relation to other values. Alterability
coefficients of \eqn{0.0} define fixed (binding) values while alterability coefficients greater than \eqn{0.0} define
free (nonbinding) values. Increasing the alterability coefficient of an initial value results in more changes for that
value in the reconciled (output) data and, conversely, fewer changes when decreasing the alterability coefficient. The
default alterability coefficients are \eqn{1.0} for the component series values and \eqn{0.0} for the cross-sectional
control totals and, when applicable, the component series temporal totals. These default alterability coefficients
result in a proportional allocation of the discrepancies to the component series. Setting the component series
alterability coefficients to the inverse of the component series initial values would result in a uniform allocation
of the discrepancies instead. \emph{Almost binding} totals can be obtained in practice by specifying very small
(almost \eqn{0.0}) alterability coefficients relative to those of the (nonbinding) component series.
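
To illustrate the two allocation rules, the sketch below uses the special case of a single binding control total, for
which the solution given in section \strong{Details} reduces to
\eqn{\hat{\theta}_i = x_i + c_i x_i \left( g - \sum_j x_j \right) / \sum_j c_j x_j}{theta_i = x_i + c_i x_i (g - sum(x)) / sum(c x)}.
Function \code{rake_1d()} is a hypothetical helper written for this illustration, not a package function. The data are
those of Example 1 in section \strong{Examples}.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Single binding total: allocate the discrepancy g - sum(x) in proportion
# to c_x * x (hypothetical helper, not a package function)
rake_1d <- function(x, g, c_x) x + c_x * x * (g - sum(x)) / sum(c_x * x)

x <- c(cars = 25, vans = 5)
g <- 40

# Default coefficients (1.0): proportional allocation of the discrepancy (10)
prop <- rake_1d(x, g, c_x = c(1, 1))
prop   # cars = 33.33..., vans = 6.66...

# Coefficients equal to the inverse of the initial values: uniform allocation
unif <- rake_1d(x, g, c_x = 1 / x)
unif   # cars = 30, vans = 10
}\if{html}{\out{</div>}}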

\strong{Temporal total preservation} refers to the fact that temporal totals, when applicable, are usually kept
“as close as possible” to their initial value. \emph{Pure preservation} is achieved by default with binding temporal
totals while the change is minimized with nonbinding temporal totals (in accordance with the set of alterability
coefficients).
}

\subsection{Arguments \code{Vmat_option} and \code{warnNegInput}}{

These arguments allow for an alternative handling of negative values in the input data, similar to that of \code{\link[=tsbalancing]{tsbalancing()}}.
Their default values correspond to the G-Series 2.0 behaviour (SAS\eqn{^\circledR}{®} PROC TSRAKING) for which equivalent
options are not defined. PROC TSRAKING was developed with "nonnegative input data only" in mind, like SAS\eqn{^\circledR}{®}
PROC BENCHMARKING in G-Series 2.0, which did not allow negative values with proportional benchmarking either. This explains
the "suspicious use of proportional raking" warning issued in the presence of negative values by PROC TSRAKING in G-Series 2.0 and
by this function when \code{warnNegInput = TRUE} (default). However, (proportional) raking in the presence of negative values generally works well
with \code{Vmat_option = 2} and produces reasonable, intuitive solutions. E.g., while the default \code{Vmat_option = 1} fails at
solving constraint \code{A + B = C} with input data \code{A = 2}, \code{B = -2}, \code{C = 1} and the default alterability coefficients,
\code{Vmat_option = 2} returns the (intuitive) solution \code{A = 2.5}, \code{B = -1.5}, \code{C = 1} (25\% increase for \code{A} and \code{B}). See
Ferland (2016) for more details.
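
The arithmetic of this example can be verified with a few lines of base R. This is only a sketch of the formula given
at the beginning of section \strong{Details}, specialized to a single binding total with the default alterability
coefficients; it is not the package's implementation.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{x <- c(A = 2, B = -2)   # initial component series values
g <- 1                  # binding control total C

# Vmat_option = 1: the diagonal of V_e is x and G V_e G' reduces to sum(x)
sum(x)   # 0: the discrepancy cannot be allocated

# Vmat_option = 2: the diagonal of V_e is |x| and G V_e G' reduces to sum(|x|)
v_e <- abs(x)
sol <- x + v_e * (g - sum(x)) / sum(v_e)
sol      # A = 2.5, B = -1.5
}\if{html}{\out{</div>}}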
}

\subsection{Treatment of Missing (\code{NA}) Values}{

Missing values in the input time series data frame (argument \code{data_df}) or alterability coefficients data frame
(argument \code{alterability_df}) for any of the raking problem data (variables listed in the metadata data frame
with argument \code{metadata_df}) will generate an error message and stop the function execution.
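
Missing values can be detected before calling the function. The sketch below uses base R only; data frames
\code{my_series} and \code{my_metadata} are made up for the illustration, and the same check could be applied to an
alterability coefficients data frame.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{my_metadata <- data.frame(series = c("cars", "vans"),
                          total1 = c("total", "total"))
my_series <- data.frame(cars = c(25, NA), vans = c(5, 6), total = c(40, 41))

# Variables involved in the raking problem (total2 is optional and may be absent)
rk_vars <- unique(c(my_metadata$series, my_metadata$total1, my_metadata$total2))

# Raking problem variables that contain at least one missing value
na_vars <- rk_vars[vapply(my_series[rk_vars], anyNA, logical(1))]
na_vars   # "cars"
}\if{html}{\out{</div>}}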
}
}
\section{Comparing \code{\link[=tsraking]{tsraking()}} and \code{\link[=tsbalancing]{tsbalancing()}}}{
\itemize{
\item \code{\link[=tsraking]{tsraking()}} is limited to one- and two-dimensional aggregation table raking problems (with temporal total
preservation if required) while \code{\link[=tsbalancing]{tsbalancing()}} handles more general balancing problems (e.g., higher dimensional
raking problems, nonnegative solutions, general linear equality and inequality constraints as opposed to aggregation
rules only, etc.).
\item \code{\link[=tsraking]{tsraking()}} returns the generalized least squares solution of the Dagum and Cholette regression-based raking
model (Dagum and Cholette 2006) while \code{\link[=tsbalancing]{tsbalancing()}} solves the corresponding quadratic minimization problem using
a numerical solver. In most cases, \emph{convergence to the minimum} is achieved and the \code{\link[=tsbalancing]{tsbalancing()}} solution matches
the (exact) \code{\link[=tsraking]{tsraking()}} least squares solution. It may not be the case, however, if convergence could not be achieved
after a reasonable number of iterations. Having said that, only on very rare occasions will the \code{\link[=tsbalancing]{tsbalancing()}}
solution \emph{significantly} differ from the \code{\link[=tsraking]{tsraking()}} solution.
\item \code{\link[=tsbalancing]{tsbalancing()}} is usually faster than \code{\link[=tsraking]{tsraking()}}, especially for large raking problems, but is generally more
sensitive to the presence of (small) inconsistencies in the input data associated with the redundant constraints of
fully specified (over-specified) raking problems. \code{\link[=tsraking]{tsraking()}} handles these inconsistencies by using the
Moore-Penrose inverse (uniform distribution among all binding totals).
\item \code{\link[=tsbalancing]{tsbalancing()}} accommodates the specification of sparse problems in their reduced form. This is not true in the
case of \code{\link[=tsraking]{tsraking()}} where aggregation rules must always be fully specified since a \emph{complete data cube} without
missing data is expected as input (every single \emph{inner-cube} component series must contribute to all dimensions of
the cube, i.e., to every single \emph{outer-cube} marginal total series).
\item Both tools handle negative values in the input data differently by default. While the solutions of raking problems
obtained from \code{\link[=tsbalancing]{tsbalancing()}} and \code{\link[=tsraking]{tsraking()}} are identical when all input data points are positive, they will
differ if some data points are negative (unless argument \code{Vmat_option = 2} is specified with \code{\link[=tsraking]{tsraking()}}).
\item While both \code{\link[=tsbalancing]{tsbalancing()}} and \code{\link[=tsraking]{tsraking()}} allow the preservation of temporal totals, time management is not
incorporated in \code{\link[=tsraking]{tsraking()}}. For example, the construction of the processing groups (sets of periods of each raking
problem) is left to the user with \code{\link[=tsraking]{tsraking()}} and separate calls must be submitted for each processing group (each
raking problem). That's where helper function \code{\link[=tsraking_driver]{tsraking_driver()}} comes in handy with \code{\link[=tsraking]{tsraking()}}.
\item \code{\link[=tsbalancing]{tsbalancing()}} returns the same set of series as the input time series object while \code{\link[=tsraking]{tsraking()}} returns the set
of series involved in the raking problem plus those specified with argument \code{id} (which could correspond to a subset
of the input series).
}
}

\examples{
###########
# Example 1: Simple 1-dimensional raking problem where the values of `cars` and `vans`
#            must sum up to the value of `total`.

# Problem metadata
my_metadata1 <- data.frame(series = c("cars", "vans"),
                           total1 = c("total", "total"))
my_metadata1

# Problem data
my_series1 <- data.frame(cars = 25, vans = 5, total = 40)

# Reconcile the data
out_raked1 <- tsraking(my_series1, my_metadata1)

# Initial data
my_series1

# Reconciled data
out_raked1

# Check the output cross-sectional constraint
all.equal(rowSums(out_raked1[c("cars", "vans")]), out_raked1$total)

# Check the control total (fixed)
all.equal(my_series1$total, out_raked1$total)


###########
# Example 2: 2-dimensional raking problem similar to the 1st example but adding
#            regional sales for the 3 prairie provinces (Alb., Sask. and Man.)
#            and where the sales of vans in Sask. are non-alterable
#            (alterability coefficient = 0), with `quiet = TRUE` to avoid
#            displaying the function header.

# Problem metadata
my_metadata2 <- data.frame(series = c("cars_alb", "cars_sask", "cars_man",
                                      "vans_alb", "vans_sask", "vans_man"),
                           total1 = c(rep("cars_total", 3),
                                      rep("vans_total", 3)),
                           total2 = rep(c("alb_total", "sask_total", "man_total"), 2))
my_metadata2

# Problem data
my_series2 <- data.frame(cars_alb = 12, cars_sask = 14, cars_man = 13,
                         vans_alb = 20, vans_sask = 20, vans_man = 24,
                         alb_total = 30, sask_total = 31, man_total = 32,
                         cars_total = 40, vans_total = 53)

# Reconcile the data
out_raked2 <- tsraking(my_series2, my_metadata2,
                       alterability_df = data.frame(vans_sask = 0),
                       quiet = TRUE)

# Initial data
my_series2

# Reconciled data
out_raked2

# Check the output cross-sectional constraints
all.equal(rowSums(out_raked2[c("cars_alb", "cars_sask", "cars_man")]), out_raked2$cars_total)
all.equal(rowSums(out_raked2[c("vans_alb", "vans_sask", "vans_man")]), out_raked2$vans_total)
all.equal(rowSums(out_raked2[c("cars_alb", "vans_alb")]), out_raked2$alb_total)
all.equal(rowSums(out_raked2[c("cars_sask", "vans_sask")]), out_raked2$sask_total)
all.equal(rowSums(out_raked2[c("cars_man", "vans_man")]), out_raked2$man_total)

# Check the control totals (fixed)
tot_cols <- union(unique(my_metadata2$total1), unique(my_metadata2$total2))
all.equal(my_series2[tot_cols], out_raked2[tot_cols])

# Check the value of vans in Saskatchewan (fixed at 20)
all.equal(my_series2$vans_sask, out_raked2$vans_sask)
}
\references{
Bérubé, J. and S. Fortier (2009). "PROC TSRAKING: An in-house SAS\eqn{^\circledR}{®} procedure for balancing
time series". In \strong{JSM Proceedings, Business and Economic Statistics Section}. Alexandria, VA: American Statistical
Association.

Dagum, E. B. and P. Cholette (2006). \strong{Benchmarking, Temporal Distribution and Reconciliation Methods
of Time Series}. Springer-Verlag, New York, Lecture Notes in Statistics, Vol. 186.

Ferland, M. (2016). "Negative Values with PROC TSRAKING". \strong{Internal document}. Statistics Canada, Ottawa,
Canada.

Fortier, S. and B. Quenneville (2009). "Reconciliation and Balancing of Accounts and Time Series".
In \strong{JSM Proceedings, Business and Economic Statistics Section}. Alexandria, VA: American Statistical Association.

Quenneville, B. and S. Fortier (2012). "Restoring Accounting Constraints in Time Series – Methods and
Software for a Statistical Agency". \strong{Economic Time Series: Modeling and Seasonality}. Chapman & Hall, New York.

Statistics Canada (2016). "The TSRAKING Procedure". \strong{G-Series 2.0 User Guide}. Statistics Canada,
Ottawa, Canada.

Statistics Canada (2018). \strong{Theory and Application of Reconciliation (Course code 0437)}.
Statistics Canada, Ottawa, Canada.
}
\seealso{
\code{\link[=tsraking_driver]{tsraking_driver()}} \code{\link[=tsbalancing]{tsbalancing()}} \code{\link[=rkMeta_to_blSpecs]{rkMeta_to_blSpecs()}} \code{\link[=gs.gInv_MP]{gs.gInv_MP()}} \code{\link[=build_raking_problem]{build_raking_problem()}} \link{aliases}
}
