% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/computePercentiles.R
\name{computePercentiles}
\alias{computePercentiles}
\title{Compute percentiles of column values.}
\usage{
computePercentiles(channel, tableName, columnName = NULL,
  columns = columnName, temporal = FALSE, percentiles = c(ifelse(temporal,
  5, 0), 5, 10, 25, 50, 75, 90, 95, 100), by = NULL, where = NULL,
  nameInDataFrame = "column", stringsAsFactors = FALSE, test = FALSE,
  parallel = FALSE)
}
\arguments{
\item{channel}{connection object as returned by \code{\link{odbcConnect}}}

\item{tableName}{Aster table name}

\item{columnName}{deprecated. Use vector \code{columns} instead.}

\item{columns}{names of the columns to compute percentiles on}

\item{temporal}{logical: TRUE indicates that all columns are temporal, otherwise numeric.
Temporal percentiles have 2 values. The first is character \code{value}, which represents the temporal
percentile (date, time, timestamp or datetime). The second is integer \code{epoch}: the number of seconds
since 1970-01-01 00:00:00-00 (can be negative) or, for interval values (including \code{time}),
the total number of seconds in the interval.}

\item{percentiles}{integer vector of percentiles to compute. For numeric columns, values \code{0, 25, 50, 75, 100}
are always added if omitted. For temporal columns, \code{25, 50, 75, 100} are added. Percentile 0
(the minimum) must be included explicitly for temporal columns because computing it costs
more than the other percentiles.}

\item{by}{optional vector of column names to group by. Percentiles are computed separately for each
group, for example for faceting. Used with \code{\link{createBoxplot}} together with the column name
for the x-axis and for wrap or grid faceting.}

\item{where}{specifies criteria that table rows must satisfy before percentiles are
computed. The criteria are expressed as SQL predicates (the contents of a
\code{WHERE} clause, without the \code{WHERE} keyword).}

\item{nameInDataFrame}{name of the column in the returned data frame that stores the table column name(s)
defined by parameter \code{columns}. \code{NULL} omits this column from the data
frame (not recommended when computing percentiles for multiple columns).}

\item{stringsAsFactors}{logical: should columns returned as character and not excluded by \code{as.is}
and not converted to anything else be converted to factors?}

\item{test}{logical: if TRUE, only show what would be done without executing it (similar to parameter
\code{test} in \link{RODBC} functions such as \link{sqlSave} and \link{sqlUpdate}).}

\item{parallel}{logical: enable parallel calls to the Aster database. This option requires a parallel
backend to be enabled and registered (see examples). Parallel execution requires an ODBC \code{channel}
obtained without an explicit password: either with \code{\link{odbcConnect}(dsn)} or
\code{\link{odbcDriverConnect}} calls, but not with \code{\link{odbcConnect}(dsn, user, password)}.}
}
\value{
For numeric data the function returns a data frame with percentile values organized 
  into the following columns:
  \itemize{
    \item \emph{percentile} percentile to compute (from 0 to 100): contains all valid values 
      from \code{percentiles}
    \item \emph{value} computed percentile
    \item \emph{column} table column name. Rename \code{column} with parameter \code{nameInDataFrame}
      or omit this column altogether by setting it to \code{NULL}.
    \item \emph{by[1], by[2], ...} if parameter \code{by} is specified, these columns contain the values
      of the grouping columns for the computed percentiles.
  }
  For temporal data the function returns a data frame with percentile values organized 
  into the following columns:
  \itemize{
    \item \emph{percentile} percentile to compute (from 0 to 100): contains all valid values 
      from \code{percentiles}
    \item \emph{value} computed percentile value converted from its temporal data type to its character 
      representation.
    \item \emph{epoch} epoch of the corresponding temporal percentile value: for \code{date} and 
      \code{timestamp} values, the number of seconds since 1970-01-01 00:00:00-00 (can be negative); 
      for interval values (including \code{time}), the total number of seconds in the interval. 
    \item \emph{column} table column name. Rename \code{column} with parameter \code{nameInDataFrame}
      or omit this column altogether by setting it to \code{NULL}.
    \item \emph{by[1], by[2], ...} if parameter \code{by} is specified, these columns contain the values
      of the grouping columns for the computed percentiles.
  }
}
\description{
Compute percentiles, including boxplot quartiles, across the values of each
column in \code{columns}. Use parameter \code{by} to compute multiple sets
of percentiles: vector \code{by} may contain an arbitrary number 
of column names, and percentiles are computed for each combination
of values from these columns. Note that when the computed quartiles
are passed to function \code{\link{createBoxplot}}, it can use up to
3 of these columns by displaying them along the x-axis and inside
facets.
}
\examples{
if(interactive()){
# initialize connection to Lahman baseball database in Aster 
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

# ipouts percentiles for AL pitching from 2000 onward
ipop = computePercentiles(conn, "pitching", columns="ipouts",
                          where = "lgid = 'AL' and yearid >= 2000")
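
# The same computation in test mode: show what would be done
# without running any queries against Aster.
computePercentiles(conn, "pitching", columns="ipouts",
                   where = "lgid = 'AL' and yearid >= 2000", test=TRUE)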

# ipouts percentiles by league
ipopLg = computePercentiles(conn, "pitching", columns="ipouts", by="lgid")

# percentiles on temporal columns
playerAllDates = computePercentiles(conn, "master_enh", 
                    columns=c('debut','finalgame','birthdate','deathdate'),
                    temporal=TRUE, percentiles=c(0))
createBoxplot(playerAllDates, x='column', value='epoch', useIQR=TRUE, 
              title="Boxplots for Date columns (epoch values)", 
              legendPosition="none")
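
# Parallel computation over several columns (parameter parallel).
# A minimal sketch that assumes the doParallel package is installed and
# that table pitching has columns 'er' and 'so' (as in the Lahman schema).
# Register the backend before the call and stop the cluster when done.
library(doParallel)
cl = makeCluster(2)
registerDoParallel(cl)
pitchingPct = computePercentiles(conn, "pitching",
                                 columns=c('ipouts','er','so'),
                                 parallel=TRUE)
stopCluster(cl)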

}
}

